21 KiB
Phase 9 — 8.K: Mix Visualizer Redesign (Design Spec)
Status: design-complete, post-Phase-9. Author: product-designer. Date: 2026-06-13 (interview answers captured 2026-06-13). Out of Phase-9-completion scope — Phase 9 closes without this. This is an implementation-ready spec; a future wave can be dispatched straight from it. No code has been written by this doc.
Cross-references: PLAN.md §9.8 (Wave 8 entry, 8.K — marked post-Phase-9), product-notes/phase-9-wave-8-remediation.md §4,
product-notes/phase-9-release-medium-types.md §5.4 (the original MixWaveformVisualizer design),
DeepDrftAPI/Services/UnifiedReleaseService.cs (the MixWaveformBucketCount = 2048 compute),
DeepDrftContent/Processors/WaveformProfileService.cs (the datum compute + storage).
Purpose
Daniel wants the Mix Visualizer completely redesigned from the current static silhouette into a scrolling, playback-coupled waveform — a musical score going by, bit by bit. He was interviewed before any design was committed; this document is the captured result. It is a finished spec, not a question set.
One-line brief: a windowed segment of the mix's waveform, showing only the currently-playing region, scrolling bottom-to-top, coupled to playback, zoom-coupled to apparent scroll speed, rendered as a theme-aware glassy lava-lamp background element, strictly read-only.
Current implementation (grounded, read 2026-06-13)
What exists today, so the redesign is anchored in the real starting point:
- Component:
MixWaveformVisualizer.razor+.razor.csinDeepDrftPublic.Client/Controls/. - What it renders today: a static full-viewport background. It fetches a stored loudness profile
(
WaveformProfileDto, base64 loudness bytes [0,255]) viaIReleaseDataService.GetMixWaveform(releaseId), and builds one closed SVG silhouette path — a vertically mirrored continuous wave around the horizontal midline, stretched across the full viewport viapreserveAspectRatio="none". A single still shape; it does not move. - Layout: full-page background behind the Mix detail content —
MixDetail.razorplaces<MixWaveformVisualizer>behind a.mix-detail-foregroundstacking layer. - Played-portion wash: a
<rect>clipped to the silhouette, width =PlaybackPosition * width, washes the played portion.PlaybackPositionis a normalized [0,1] input. - Inert seek seam:
OnSeekcallback + two-wayPlaybackPositionbinding exist but click-to-seek is not wired. This seam is now dropped from the design — see §D (the redesign is read-only). - Data: the profile is the high-resolution Mix datum — a fixed 2048-bucket loudness profile
(
UnifiedReleaseService.MixWaveformBucketCount = 2048), computed server-side at upload from the track's WAV (WaveformProfileService.ComputeAndStoreAsync) and stored in themix-waveformsvault keyed by the track's EntryKey. Crucially: the bucket count is fixed at 2048 regardless of mix length — a 3-minute mix and a 90-minute mix both get exactly 2048 buckets. This is the load-bearing constraint for §F. - Design boundary (from §5.4): deliberately NOT the player-bar peak-bar idiom
(
SpectrumVisualizer/LevelMeterFab). Those own the player bar; the Mix visualizer has its own visual language. That boundary holds in the redesign.
A. Motion model
The waveform scrolls like a musical score going by, bit by bit. It is a windowed segment showing only the currently-playing region — not the whole mix laid out and scrolled through, and not an ambient free-running animation. The window is a moving slice of the mix centered on (or anchored to) the playhead.
- Coupled to playback. The scroll exists because the track is playing. Scroll position = playback position. When playback pauses, the scroll holds (see §E for the idle/backgrounded behavior). When nothing is playing, there is no scroll — the panel shows a still slice (or the at-rest window at position 0).
- Direction: bottom-to-top (scrolling up). New audio enters from the bottom and flows upward; already-played audio exits off the top. This is fixed — confirmed intent, not a parameter.
- "Now" anchor. Because the window shows only the currently-playing region, the playhead sits at a fixed line within the window and the waveform flows past it. Recommendation: place "now" at or near the vertical center of the visible window, so the listener sees a short lead-in (audio about to play, below center) and a short trail-out (just-played audio, above center). This reads most like a score going by. (A top-anchored "now" line — everything visible is unplayed, flowing up to meet a line at the top — is the alternative; center is the recommended default for the lava-lamp feel. Tunable.)
- Start and end of the mix. At the very start, the window scrolls in from a partially-empty state (no audio below the lead-in yet); at the very end, it scrolls out to empty as the trail-out exits the top. No looping, no hold-and-repeat — it begins and ends with the audio.
B. Zoom / resolution coupling — the Guitar Hero model
Mental model (b), confirmed: zoom controls how short a time-span fills the screen. Zoomed in = a shorter span of audio occupies the full window height, so at a constant playback rate the audio traverses the window faster → faster apparent scroll. Zoomed out = a longer span fills the window → slower apparent scroll. Daniel's analogy: Guitar Hero — higher difficulty is more "zoomed in," notes appear and move faster.
This is a single coupled dimension: one zoom control drives both the visible time-span and (as a consequence) the apparent scroll speed. There is no independent speed control — speed is a function of zoom and the (fixed) playback rate.
The hard anchor (load-bearing)
At maximum zoom (fastest), exactly one quarter note is visible at 180 BPM.
- One quarter note at 180 BPM = 60 / 180 s = 0.333 s (333 ms) of audio.
- So the most-zoomed window shows ~333 ms of audio, top to bottom.
This anchors the fast end of the zoom range precisely. The slow end (minimum zoom — how much of the mix is visible at most) is tunable; see the recommended default range below.
Recommended default zoom range (smart guess now, tune later)
Daniel asked for a smart, aesthetically-pleasing default guess from current UI trends, range to be tuned later. Recommendation:
- Max zoom (anchor): visible window = 0.333 s (1 quarter note @ 180 BPM). This is the floor of the time-span range.
- Min zoom (default guess): visible window = ~30 s. A 30-second window scrolling up gives a calm, readable "structure of the mix" view — you can see phrase-level shape without it feeling static. (For reference: at 30 s the apparent scroll is ~90× slower than at the 0.333 s max-zoom end.)
- Default opening zoom: ~8–12 s visible. Open the panel at a mid-calm zoom — enough motion to read as alive (a lava lamp, not a frozen image), not so fast it reads as frantic. Recommend 10 s as the default opening window. This is the "you glance at it and it's pleasantly drifting" setting.
- Visible-window time-span range, then: 0.333 s → 30 s (a ~90× range), default open at 10 s.
These are starting numbers chosen for feel; Daniel can tune the min-zoom ceiling and default opening window once it's on screen. The max-zoom 0.333 s anchor is fixed (it's a stated requirement), and it is what drives the datum-resolution analysis in §F.
Slider persistence and default
- Default: open every Mix at the default opening zoom (~10 s window) — see above.
- Persistence (recommendation): persist the slider position within a listening session (so scrubbing zoom on one mix carries to the next mix opened in the same session) but reset to default on a fresh page load. This avoids a confusing "why is this mix zoomed weird" on return without making the control feel forgetful mid-session. Low-stakes; tunable. (Cookie/localStorage if cross-session persistence is later wanted.)
C. Aesthetics — lava lamp, not test equipment
Purely style and pleasure. This is a theming/background element, not an informational readout. The goal is hypnotic/ambient — something you can stare at — not "read the structure of the mix." If a choice trades legibility-of-structure for beauty-of-motion, take beauty.
- Theme-aware gradients. The fill uses the active palette — "Charleston in the Day" (light) /
"Lowcountry Summer Nights" (dark) — as gradients, not flat color. It must respond to the dark-mode
toggle live (the same
DarkModeSettingscascade the rest of the client uses). Pull gradient stops from the MudBlazor palette so a palette change carries automatically. - Glassy treatment. Frosted/translucent layering — think backdrop blur, soft luminous edges, a sense of depth rather than a hard-edged silhouette. The waveform should feel like lit glass moving behind the content, not a chart.
- Form of the wave. Keep a filled, flowing shape (the mirrored-silhouette lineage), but rendered as a glassy gradient-filled band rather than a solid silhouette. High-resolution here means smooth (no visible stair-stepping at any zoom), not detailed-as-data. The wave is a luminous ribbon flowing upward.
- Played vs. unplayed. In a windowed, score-going-by model, "played" is simply "already scrolled off the top." There is no separate played-portion wash like today's clipped rect — the motion itself encodes progress. (Optionally, a subtle luminosity gradient across the window — brightest at the "now" line, dimming toward the edges — can reinforce the playhead without a hard played/unplayed boundary. Optional polish, not required.)
- Layout. Stays a full-page background behind the Mix detail content, as today — the detail
content (title, metadata, play control) sits over it via the existing
.mix-detail-foregroundstacking layer. The redesign changes what's in the background, not where it sits.
D. Interaction — strictly read-only
NO SEEKING. The visualizer is strictly read-only. It is a background/theming element, not an
interactive control. Drop the inert click-to-seek seam (OnSeek, the two-way PlaybackPosition
write-back) from the design — the redesigned component takes playback position as one-way input only
and never writes back.
- No click-to-seek, no scrub, no controls on the panel. The only thing that affects the visualizer is (a) the playback position (input) and (b) the zoom slider — and even the zoom slider is a viewing control, not a playback control.
- Touch/mobile. No touch gestures for seeking or scrubbing. The visualizer is display-only on mobile. (Whether the zoom slider is exposed on mobile is a layout call — recommend yes, as a small control, since it's the one knob; but it must never become a seek surface.)
- Reusable/composable, but as a theme element. The panel must remain a reusable, composable
component (give it a
ReleaseIdand a playback-position input, it renders itself — the current component already self-fetches its datum, keep that). But it is integrated as a theme/background element, not an interactive widget. If it's ever embedded elsewhere (a mix card preview, an embed), it's still a read-only flowing backdrop. Design it so it works as a background at full-page size; small- size embedding is a nice-to-have, not a requirement.
Net contract change from today: the component keeps a one-way PlaybackPosition input and a
ReleaseId; it loses OnSeek and the two-way write-back; it gains a zoom input (slider-bound)
and an internal animation loop.
E. Performance & technical constraints
Daniel is open to the rendering-tech shift the scrolling animation requires — but with a hard constraint: NO TRICKS. Industry-standard patterns only, and comment the code well so Daniel can follow it. This is an explicit implementation constraint, not a preference.
- Rendering tech. A smooth, continuous bottom-to-top scroll at high resolution is a per-frame
animation; the current static SVG path won't animate smoothly. Recommend HTML5 Canvas 2D as the
default target: it is the industry-standard, well-documented, legible choice for a single flowing
waveform, it handles theme-aware gradients (
createLinearGradient) and glassy compositing (globalAlpha,filter: blur()/ layered draws) directly, and it is far easier to comment and follow than WebGL. Reserve WebGL only if Canvas 2D can't hold 60fps at the glassy treatment on target devices — and if so, justify the move in a comment, and stay with standard, textbook WebGL (no exotic shader tricks). Default: Canvas 2D.- Authoring: follows the existing TypeScript-interop discipline (
DeepDrftPublic/Interop/audio/compiled towwwroot/js/), one module per responsibility, consistent with the audio stack. The visualizer animation module is new TS; the Blazor component drives it via a thin interop bridge (pass datum + playback position + zoom; the TS owns therequestAnimationFrameloop). - No tricks, well-commented: the scroll math (mapping playback time → window offset → which datum samples are visible → screen Y), the zoom→time-span mapping, and the gradient/glass compositing each get clear comments. Daniel must be able to read the module and follow how a quarter-note-at-180-BPM becomes 333ms becomes N visible samples becomes pixels.
- Authoring: follows the existing TypeScript-interop discipline (
- Frame budget. Target 60fps on desktop, graceful degrade on weaker devices/mobile (drop to a lower internal sample density or a simpler gradient before dropping frames). No hard device floor set; design to degrade, not to break.
- Streaming interplay. The audio player is a chunked streaming pipeline. The visualizer's datum is a separate, pre-computed, fully-downloaded profile (not derived from the live stream) — so the scroll animation does not depend on decode/buffer state and can run as soon as the datum is fetched and playback position is flowing. It need not react to buffering. (If playback stalls on a buffer underrun, the scroll holds because playback position holds — that falls out naturally from playback-coupling.)
- Idle / battery. Pause or slow the animation when the mix is paused or the tab is backgrounded,
to avoid a CPU-hot idle animation.
requestAnimationFramealready naturally throttles in a backgrounded tab; additionally, gate the loop on "is playing" so a paused mix isn't burning frames. This is the standard, no-tricks way to keep it cool.
F. Data / datum resolution — the load-bearing analysis
Daniel's principle: capture at a high enough resolution regardless of content length — a long mix must not be under-sampled by a fixed bucket count. This is a direct challenge to the current datum, which is exactly a fixed bucket count. The question: does the stored datum suffice for the 333ms-at-max-zoom requirement, or does it need to change?
The current datum, measured against the anchor
The Mix datum is a fixed 2048-bucket loudness profile, content-length-agnostic
(MixWaveformBucketCount = 2048). So each bucket spans mixDuration / 2048 of audio:
| Mix length | Seconds per bucket | Buckets in a 333 ms max-zoom window |
|---|---|---|
| 3 min (180 s) | 0.088 s | ~3.8 buckets |
| 10 min (600 s) | 0.293 s | ~1.1 buckets |
| 30 min (1800 s) | 0.879 s | ~0.38 buckets |
| 60 min (3600 s) | 1.758 s | ~0.19 buckets |
| 90 min (5400 s) | 2.637 s | ~0.13 buckets |
Conclusion: the current fixed-2048 datum fails the requirement for any mix longer than a few minutes. At max zoom the window must render ~333 ms of audio smoothly. To draw a smooth filled curve across the window you want on the order of 60–120 sample points in that window. The current datum delivers fractions of a single sample for a typical (10–90 min) DJ mix — the max-zoom window would be a flat line or a single interpolated segment. Even a short 3-minute mix gives only ~3.8 buckets in the window — far short of smooth. The fixed bucket count is precisely the under-sampling-of-long-content failure Daniel's principle warns against.
Recommendation: switch to a content-length-aware (constant time-resolution) capture
The stored datum should be captured at a constant time resolution, not a constant bucket count. Instead of "always 2048 buckets," capture "always N samples per second of audio" — so a 90-minute mix gets proportionally more samples than a 3-minute one, and the time-resolution (seconds per sample) is the same regardless of length. This is the direct expression of "high enough resolution regardless of content length."
Target sample density (concrete):
- The max-zoom window is 333 ms. Target ~100 sample points across that window for a smooth glassy
curve →
100 / 0.333 s≈ 300 samples/sec. - Round to a clean, defensible target: ~333 samples/sec ≈ one sample every 3 ms. (333/sec makes the 333ms window hold exactly ~111 samples — comfortably smooth.)
- State the target explicitly: capture the Mix loudness datum at ≈ 333 samples/second (≈ 3 ms/sample), constant across all mix lengths.
What that costs (datum size):
| Mix length | Samples @ 333/s | Bytes (1 byte/sample, current quantization) |
|---|---|---|
| 10 min | ~200,000 | ~200 KB |
| 30 min | ~600,000 | ~600 KB |
| 60 min | ~1,200,000 | ~1.2 MB |
| 90 min | ~1,800,000 | ~1.8 MB |
These are tractable as a one-time downloaded datum (the player already streams multi-megabyte audio; a ~1MB profile fetched once per mix detail page is fine). If size becomes a concern, two standard, no-tricks mitigations:
- Cap + floor. Capture at 333/s but cap the absolute sample count for extreme outliers (e.g. cap at ~2M samples ≈ a 100-min mix), accepting slightly-below-target density only past that length.
- Tiered / multi-resolution datum (mipmap-style). Store the high-density datum plus a coarse overview (e.g. the existing 2048-bucket profile) and let the renderer pick the right tier for the current zoom — high-density only when zoomed in, coarse when zoomed out. This is the textbook approach (it's how audio editors render waveforms at varying zoom) and stays "industry-standard, no tricks." It also keeps the zoomed-out (30s window) view cheap. Recommended if datum size is a concern; otherwise the single high-density datum is simpler.
Compute-side change (for the future implementation wave): WaveformProfileService.ComputeAndStoreAsync
already takes a bucketCount parameter, and UnifiedReleaseService already passes a Mix-specific value
(2048). The change is to compute the Mix bucket count from the audio duration (bucketCount = ceil(durationSeconds * 333)) instead of a constant, optionally capped per above. The storage format,
vault, wire DTO (WaveformProfileDto — BucketCount + base64 Data), and fetch path do not change —
BucketCount simply becomes variable, which the DTO already supports (it's just an int). This is a
contained, backward-compatible datum change: existing 2048-bucket mixes still render (coarsely at max
zoom); re-running the generate trigger re-captures at the new density. This datum change is part of the
8.K implementation wave, not a Phase 9 deliverable.
Summary of the datum recommendation
- Current fixed-2048 datum: insufficient for the 333ms-at-max-zoom anchor on any real-length mix.
- Switch to constant-time-resolution capture at ≈ 333 samples/sec (≈ 3 ms/sample), content-length-aware.
- Datum size ~1.2 MB for a 60-min mix — tractable; use a tiered/mipmap datum if size or zoomed-out cost matters.
- No wire/format change — only the bucket count becomes duration-derived and variable.
G. Scope & sequencing
- Out of Phase-9-completion scope. Phase 9 closes without 8.K. This is a post-Phase-9 implementation wave, dispatchable straight from this spec.
- This is a data-needs-change, not a replace-in-place. Beyond the new rendering, it requires the datum-resolution change in §F (duration-derived capture). Sequence the datum change first (or together) — the renderer is only as good as the samples it's fed.
- The three things that drive the build's shape, settle-first order:
- Datum resolution (§F) — switch to constant-time-resolution capture. Everything visual depends on having enough samples in the max-zoom window.
- Rendering tech (§E) — Canvas 2D (default), TS-interop module owning the
rAFloop, well commented, no tricks. - Motion + zoom mapping (§A, §B) — the windowed bottom-to-top scroll and the Guitar-Hero zoom→time-span→apparent-speed coupling, anchored at 333ms max zoom.
- Read-only (§D) simplifies the build: no seek wiring, no gesture handling — drop the existing inert seam.
This spec is complete enough to dispatch an implementation wave. Open items are tuning knobs (exact min-zoom ceiling, default opening window, "now" anchor position, slider persistence scope, single vs. tiered datum) — none block starting; all are called out inline with recommended defaults.