Files
deepdrft/product-notes/phase-9-mix-visualizer-redesign.md

332 lines
21 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Phase 9 — 8.K: Mix Visualizer Redesign (Design Spec)
Status: **design-complete, post-Phase-9.** Author: product-designer. Date: 2026-06-13
(interview answers captured 2026-06-13).
**Out of Phase-9-completion scope** — Phase 9 closes without this. This is an implementation-ready
spec; a future wave can be dispatched straight from it. **No code has been written by this doc.**
Cross-references: `PLAN.md §9.8` (Wave 8 entry, 8.K — marked post-Phase-9), `product-notes/phase-9-wave-8-remediation.md §4`,
`product-notes/phase-9-release-medium-types.md §5.4` (the original `MixWaveformVisualizer` design),
`DeepDrftAPI/Services/UnifiedReleaseService.cs` (the `MixWaveformBucketCount = 2048` compute),
`DeepDrftContent/Processors/WaveformProfileService.cs` (the datum compute + storage).
---
## Purpose
Daniel wants the Mix Visualizer **completely redesigned** from the current static silhouette into a
**scrolling, playback-coupled waveform** — a musical score going by, bit by bit. He was interviewed
before any design was committed; this document is the captured result. It is a finished spec, not a
question set.
One-line brief: **a windowed segment of the mix's waveform, showing only the currently-playing region,
scrolling bottom-to-top, coupled to playback, zoom-coupled to apparent scroll speed, rendered as a
theme-aware glassy lava-lamp background element, strictly read-only.**
---
## Current implementation (grounded, read 2026-06-13)
What exists today, so the redesign is anchored in the real starting point:
- **Component:** `MixWaveformVisualizer.razor` + `.razor.cs` in `DeepDrftPublic.Client/Controls/`.
- **What it renders today:** a **static** full-viewport background. It fetches a stored loudness profile
(`WaveformProfileDto`, base64 loudness bytes [0,255]) via `IReleaseDataService.GetMixWaveform(releaseId)`,
and builds **one closed SVG silhouette path** — a vertically mirrored continuous wave around the
horizontal midline, stretched across the full viewport via `preserveAspectRatio="none"`. A single
still shape; it does not move.
- **Layout:** full-page background behind the Mix detail content — `MixDetail.razor` places
`<MixWaveformVisualizer>` behind a `.mix-detail-foreground` stacking layer.
- **Played-portion wash:** a `<rect>` clipped to the silhouette, width = `PlaybackPosition * width`,
washes the played portion. `PlaybackPosition` is a normalized [0,1] input.
- **Inert seek seam:** `OnSeek` callback + two-way `PlaybackPosition` binding exist but click-to-seek is
**not wired**. **This seam is now dropped from the design** — see §D (the redesign is read-only).
- **Data:** the profile is the high-resolution Mix datum — a **fixed 2048-bucket** loudness profile
(`UnifiedReleaseService.MixWaveformBucketCount = 2048`), computed server-side at upload from the
track's WAV (`WaveformProfileService.ComputeAndStoreAsync`) and stored in the `mix-waveforms` vault
keyed by the track's EntryKey. **Crucially: the bucket count is fixed at 2048 regardless of mix
length** — a 3-minute mix and a 90-minute mix both get exactly 2048 buckets. This is the load-bearing
constraint for §F.
- **Design boundary (from §5.4):** deliberately **NOT** the player-bar peak-bar idiom
(`SpectrumVisualizer` / `LevelMeterFab`). Those own the player bar; the Mix visualizer has its own
visual language. That boundary holds in the redesign.
---
## A. Motion model
**The waveform scrolls like a musical score going by, bit by bit.** It is a **windowed segment showing
only the currently-playing region** — *not* the whole mix laid out and scrolled through, and *not* an
ambient free-running animation. The window is a moving slice of the mix centered on (or anchored to) the
playhead.
- **Coupled to playback.** The scroll exists *because* the track is playing. Scroll position = playback
position. When playback pauses, the scroll holds (see §E for the idle/backgrounded behavior). When
nothing is playing, there is no scroll — the panel shows a still slice (or the at-rest window at
position 0).
- **Direction: bottom-to-top (scrolling up).** New audio enters from the bottom and flows upward;
already-played audio exits off the top. This is fixed — confirmed intent, not a parameter.
- **"Now" anchor.** Because the window shows only the currently-playing region, the playhead sits at a
fixed line within the window and the waveform flows past it. **Recommendation: place "now" at or near
the vertical center** of the visible window, so the listener sees a short lead-in (audio about to
play, below center) and a short trail-out (just-played audio, above center). This reads most like a
score going by. (A top-anchored "now" line — everything visible is unplayed, flowing up to meet a line
at the top — is the alternative; center is the recommended default for the lava-lamp feel. Tunable.)
- **Start and end of the mix.** At the very start, the window scrolls *in* from a partially-empty state
(no audio below the lead-in yet); at the very end, it scrolls *out* to empty as the trail-out exits
the top. No looping, no hold-and-repeat — it begins and ends with the audio.
---
## B. Zoom / resolution coupling — the Guitar Hero model
**Mental model (b), confirmed:** zoom controls **how short a time-span fills the screen**. Zoomed in =
a shorter span of audio occupies the full window height, so at a constant playback rate the audio
traverses the window faster → **faster apparent scroll**. Zoomed out = a longer span fills the window →
slower apparent scroll. Daniel's analogy: **Guitar Hero** — higher difficulty is more "zoomed in," notes
appear and move faster.
This is a single coupled dimension: one zoom control drives both the visible time-span *and* (as a
consequence) the apparent scroll speed. There is **no independent speed control** — speed is a function
of zoom and the (fixed) playback rate.
### The hard anchor (load-bearing)
**At maximum zoom (fastest), exactly one quarter note is visible at 180 BPM.**
- One quarter note at 180 BPM = 60 / 180 s = **0.333 s (333 ms)** of audio.
- So the **most-zoomed window shows ~333 ms of audio**, top to bottom.
This anchors the *fast* end of the zoom range precisely. The *slow* end (minimum zoom — how much of the
mix is visible at most) is tunable; see the recommended default range below.
### Recommended default zoom range (smart guess now, tune later)
Daniel asked for a smart, aesthetically-pleasing default guess from current UI trends, range to be
tuned later. Recommendation:
- **Max zoom (anchor):** visible window = **0.333 s** (1 quarter note @ 180 BPM). This is the floor of
the time-span range.
- **Min zoom (default guess):** visible window = **~30 s**. A 30-second window scrolling up gives a calm,
readable "structure of the mix" view — you can see phrase-level shape without it feeling static. (For
reference: at 30 s the apparent scroll is ~90× slower than at the 0.333 s max-zoom end.)
- **Default opening zoom:** **~812 s visible.** Open the panel at a mid-calm zoom — enough motion to
read as alive (a lava lamp, not a frozen image), not so fast it reads as frantic. **Recommend 10 s** as
the default opening window. This is the "you glance at it and it's pleasantly drifting" setting.
- **Visible-window time-span range, then:** **0.333 s → 30 s** (a ~90× range), default open at **10 s**.
These are starting numbers chosen for feel; Daniel can tune the min-zoom ceiling and default opening
window once it's on screen. The **max-zoom 0.333 s anchor is fixed** (it's a stated requirement), and it
is what drives the datum-resolution analysis in §F.
### Slider persistence and default
- **Default:** open every Mix at the default opening zoom (~10 s window) — see above.
- **Persistence (recommendation):** persist the slider position **within a listening session** (so
scrubbing zoom on one mix carries to the next mix opened in the same session) but **reset to default on
a fresh page load**. This avoids a confusing "why is this mix zoomed weird" on return without making
the control feel forgetful mid-session. Low-stakes; tunable. (Cookie/localStorage if cross-session
persistence is later wanted.)
---
## C. Aesthetics — lava lamp, not test equipment
**Purely style and pleasure.** This is a **theming/background element**, not an informational readout.
The goal is hypnotic/ambient — something you can stare at — not "read the structure of the mix." If a
choice trades legibility-of-structure for beauty-of-motion, take beauty.
- **Theme-aware gradients.** The fill uses the active palette — **"Charleston in the Day"** (light) /
**"Lowcountry Summer Nights"** (dark) — as gradients, not flat color. It must respond to the dark-mode
toggle live (the same `DarkModeSettings` cascade the rest of the client uses). Pull gradient stops from
the MudBlazor palette so a palette change carries automatically.
- **Glassy treatment.** Frosted/translucent layering — think backdrop blur, soft luminous edges, a sense
of depth rather than a hard-edged silhouette. The waveform should feel like lit glass moving behind the
content, not a chart.
- **Form of the wave.** Keep a **filled, flowing shape** (the mirrored-silhouette lineage), but rendered
as a glassy gradient-filled band rather than a solid silhouette. High-resolution here means *smooth*
(no visible stair-stepping at any zoom), not *detailed-as-data*. The wave is a luminous ribbon flowing
upward.
- **Played vs. unplayed.** In a windowed, score-going-by model, "played" is simply "already scrolled off
the top." There is **no separate played-portion wash** like today's clipped rect — the motion itself
encodes progress. (Optionally, a subtle luminosity gradient across the window — brightest at the "now"
line, dimming toward the edges — can reinforce the playhead without a hard played/unplayed boundary.
Optional polish, not required.)
- **Layout.** Stays a **full-page background** behind the Mix detail content, as today — the detail
content (title, metadata, play control) sits over it via the existing `.mix-detail-foreground` stacking
layer. The redesign changes what's *in* the background, not where it sits.
---
## D. Interaction — strictly read-only
**NO SEEKING. The visualizer is strictly read-only.** It is a background/theming element, not an
interactive control. **Drop the inert click-to-seek seam** (`OnSeek`, the two-way `PlaybackPosition`
write-back) from the design — the redesigned component takes playback position as **one-way input only**
and never writes back.
- **No click-to-seek, no scrub, no controls on the panel.** The only thing that affects the visualizer
is (a) the playback position (input) and (b) the zoom slider — and even the zoom slider is a *viewing*
control, not a *playback* control.
- **Touch/mobile.** No touch gestures for seeking or scrubbing. The visualizer is display-only on
mobile. (Whether the zoom slider is exposed on mobile is a layout call — recommend yes, as a small
control, since it's the one knob; but it must never become a seek surface.)
- **Reusable/composable, but as a theme element.** The panel must remain a **reusable, composable
component** (give it a `ReleaseId` and a playback-position input, it renders itself — the current
component already self-fetches its datum, keep that). But it is integrated as a **theme/background
element**, not an interactive widget. If it's ever embedded elsewhere (a mix card preview, an embed),
it's still a read-only flowing backdrop. Design it so it works as a background at full-page size; small-
size embedding is a nice-to-have, not a requirement.
**Net contract change from today:** the component keeps a one-way `PlaybackPosition` input and a
`ReleaseId`; it **loses** `OnSeek` and the two-way write-back; it **gains** a zoom input (slider-bound)
and an internal animation loop.
---
## E. Performance & technical constraints
Daniel is **open to the rendering-tech shift** the scrolling animation requires — but with a hard
constraint: **NO TRICKS. Industry-standard patterns only, and comment the code well so Daniel can follow
it.** This is an explicit implementation constraint, not a preference.
- **Rendering tech.** A smooth, continuous bottom-to-top scroll at high resolution is a per-frame
animation; the current static SVG path won't animate smoothly. **Recommend HTML5 Canvas 2D** as the
default target: it is the industry-standard, well-documented, legible choice for a single flowing
waveform, it handles theme-aware gradients (`createLinearGradient`) and glassy compositing
(`globalAlpha`, `filter: blur()` / layered draws) directly, and it is far easier to comment and follow
than WebGL. **Reserve WebGL only if** Canvas 2D can't hold 60fps at the glassy treatment on target
devices — and if so, justify the move in a comment, and stay with standard, textbook WebGL (no exotic
shader tricks). Default: Canvas 2D.
- **Authoring:** follows the existing TypeScript-interop discipline (`DeepDrftPublic/Interop/audio/`
compiled to `wwwroot/js/`), one module per responsibility, consistent with the audio stack. The
visualizer animation module is new TS; the Blazor component drives it via a thin interop bridge
(pass datum + playback position + zoom; the TS owns the `requestAnimationFrame` loop).
- **No tricks, well-commented:** the scroll math (mapping playback time → window offset → which datum
samples are visible → screen Y), the zoom→time-span mapping, and the gradient/glass compositing each
get clear comments. Daniel must be able to read the module and follow how a quarter-note-at-180-BPM
becomes 333ms becomes N visible samples becomes pixels.
- **Frame budget.** Target **60fps on desktop**, graceful degrade on weaker devices/mobile (drop to a
lower internal sample density or a simpler gradient before dropping frames). No hard device floor set;
design to degrade, not to break.
- **Streaming interplay.** The audio player is a chunked streaming pipeline. The visualizer's datum is a
**separate, pre-computed, fully-downloaded** profile (not derived from the live stream) — so the scroll
animation does **not** depend on decode/buffer state and can run as soon as the datum is fetched and
playback position is flowing. It need not react to buffering. (If playback stalls on a buffer
underrun, the scroll holds because playback position holds — that falls out naturally from
playback-coupling.)
- **Idle / battery.** **Pause or slow the animation when the mix is paused or the tab is backgrounded**,
to avoid a CPU-hot idle animation. `requestAnimationFrame` already naturally throttles in a
backgrounded tab; additionally, gate the loop on "is playing" so a paused mix isn't burning frames.
This is the standard, no-tricks way to keep it cool.
---
## F. Data / datum resolution — the load-bearing analysis
Daniel's principle: **capture at a high enough resolution regardless of content length** — a long mix
must not be under-sampled by a fixed bucket count. This is a direct challenge to the current datum, which
is exactly a fixed bucket count. The question: does the stored datum suffice for the 333ms-at-max-zoom
requirement, or does it need to change?
### The current datum, measured against the anchor
The Mix datum is a **fixed 2048-bucket** loudness profile, content-length-agnostic
(`MixWaveformBucketCount = 2048`). So each bucket spans `mixDuration / 2048` of audio:
| Mix length | Seconds per bucket | Buckets in a 333 ms max-zoom window |
|------------|--------------------|--------------------------------------|
| 3 min (180 s) | 0.088 s | **~3.8 buckets** |
| 10 min (600 s) | 0.293 s | **~1.1 buckets** |
| 30 min (1800 s)| 0.879 s | **~0.38 buckets** |
| 60 min (3600 s)| 1.758 s | **~0.19 buckets** |
| 90 min (5400 s)| 2.637 s | **~0.13 buckets** |
**Conclusion: the current fixed-2048 datum fails the requirement for any mix longer than a few minutes.**
At max zoom the window must render ~333 ms of audio smoothly. To draw a smooth filled curve across the
window you want on the order of **60120 sample points** in that window. The current datum delivers
*fractions of a single sample* for a typical (1090 min) DJ mix — the max-zoom window would be a flat
line or a single interpolated segment. Even a short 3-minute mix gives only ~3.8 buckets in the window —
far short of smooth. The fixed bucket count is precisely the under-sampling-of-long-content failure
Daniel's principle warns against.
### Recommendation: switch to a content-length-aware (constant time-resolution) capture
**The stored datum should be captured at a constant *time* resolution, not a constant *bucket count*.**
Instead of "always 2048 buckets," capture "always N samples per second of audio" — so a 90-minute mix
gets proportionally more samples than a 3-minute one, and the time-resolution (seconds per sample) is the
same regardless of length. This is the direct expression of "high enough resolution regardless of content
length."
**Target sample density (concrete):**
- The max-zoom window is 333 ms. Target **~100 sample points across that window** for a smooth glassy
curve → `100 / 0.333 s`**300 samples/sec**.
- Round to a clean, defensible target: **~333 samples/sec ≈ one sample every 3 ms.** (333/sec makes the
333ms window hold exactly ~111 samples — comfortably smooth.)
- **State the target explicitly: capture the Mix loudness datum at ≈ 333 samples/second (≈ 3 ms/sample),
constant across all mix lengths.**
**What that costs (datum size):**
| Mix length | Samples @ 333/s | Bytes (1 byte/sample, current quantization) |
|------------|------------------|----------------------------------------------|
| 10 min | ~200,000 | ~200 KB |
| 30 min | ~600,000 | ~600 KB |
| 60 min | ~1,200,000| ~1.2 MB |
| 90 min | ~1,800,000| ~1.8 MB |
These are tractable as a one-time downloaded datum (the player already streams multi-megabyte audio; a
~1MB profile fetched once per mix detail page is fine). If size becomes a concern, two standard,
no-tricks mitigations:
1. **Cap + floor.** Capture at 333/s but cap the absolute sample count for extreme outliers (e.g. cap at
~2M samples ≈ a 100-min mix), accepting slightly-below-target density only past that length.
2. **Tiered / multi-resolution datum (mipmap-style).** Store the high-density datum *plus* a coarse
overview (e.g. the existing 2048-bucket profile) and let the renderer pick the right tier for the
current zoom — high-density only when zoomed in, coarse when zoomed out. This is the textbook approach
(it's how audio editors render waveforms at varying zoom) and stays "industry-standard, no tricks." It
also keeps the zoomed-*out* (30s window) view cheap. **Recommended if datum size is a concern;
otherwise the single high-density datum is simpler.**
**Compute-side change (for the future implementation wave):** `WaveformProfileService.ComputeAndStoreAsync`
already takes a `bucketCount` parameter, and `UnifiedReleaseService` already passes a Mix-specific value
(2048). The change is to compute the Mix bucket count **from the audio duration** (`bucketCount =
ceil(durationSeconds * 333)`) instead of a constant, optionally capped per above. The storage format,
vault, wire DTO (`WaveformProfileDto``BucketCount` + base64 `Data`), and fetch path **do not change**
`BucketCount` simply becomes variable, which the DTO already supports (it's just an int). This is a
contained, backward-compatible datum change: existing 2048-bucket mixes still render (coarsely at max
zoom); re-running the generate trigger re-captures at the new density. **This datum change is part of the
8.K implementation wave, not a Phase 9 deliverable.**
### Summary of the datum recommendation
- **Current fixed-2048 datum: insufficient** for the 333ms-at-max-zoom anchor on any real-length mix.
- **Switch to constant-time-resolution capture at ≈ 333 samples/sec (≈ 3 ms/sample)**, content-length-aware.
- **Datum size** ~1.2 MB for a 60-min mix — tractable; use a **tiered/mipmap datum** if size or
zoomed-out cost matters.
- **No wire/format change** — only the bucket count becomes duration-derived and variable.
---
## G. Scope & sequencing
- **Out of Phase-9-completion scope.** Phase 9 closes without 8.K. This is a **post-Phase-9
implementation wave**, dispatchable straight from this spec.
- **This is a data-needs-change, not a replace-in-place.** Beyond the new rendering, it requires the
datum-resolution change in §F (duration-derived capture). Sequence the datum change first (or together)
— the renderer is only as good as the samples it's fed.
- **The three things that drive the build's shape, settle-first order:**
1. **Datum resolution (§F)** — switch to constant-time-resolution capture. Everything visual depends on
having enough samples in the max-zoom window.
2. **Rendering tech (§E)** — Canvas 2D (default), TS-interop module owning the `rAF` loop, well
commented, no tricks.
3. **Motion + zoom mapping (§A, §B)** — the windowed bottom-to-top scroll and the Guitar-Hero
zoom→time-span→apparent-speed coupling, anchored at 333ms max zoom.
- **Read-only (§D)** simplifies the build: no seek wiring, no gesture handling — drop the existing inert
seam.
**This spec is complete enough to dispatch an implementation wave.** Open items are tuning knobs (exact
min-zoom ceiling, default opening window, "now" anchor position, slider persistence scope, single vs.
tiered datum) — none block starting; all are called out inline with recommended defaults.