diff --git a/product-notes/spectrum-seeker.md b/product-notes/spectrum-seeker.md
new file mode 100644
index 0000000..7d4bfd1
--- /dev/null
+++ b/product-notes/spectrum-seeker.md
@@ -0,0 +1,605 @@
+# WaveformSeeker — loudness-waveform seekbar to replace the MudSlider
+
+Status: approved. Decisions resolved 2026-06-05. Author: product-designer. Date: 2026-06-05.
+**Plan only — no code edits made by this doc.**
+
+---
+
+## 1. Summary
+
+Replace the `MudSlider`-based scrub bar in `PlayerSeekZone.razor` with a new
+`<WaveformSeeker>` component that renders the track's **loudness profile** as a
+high-density vertical bar chart and serves as the seek surface (click / drag to seek).
+
+The point is to make the seekbar *informative*: instead of a featureless line, the
+listener sees the track's energy shape — the quiet intro, the drop, the breakdown, the
+outro — and can scrub against that shape. This is the established "waveform scrubber"
+idiom from SoundCloud, Overcast, and most DAW transport bars. We are borrowing it
+deliberately; the novel part for us is only that the profile is **preprocessed server-side
+and shipped as a small quantized array**, so the visual paints the instant a track loads rather
+than waiting for the audio to decode.
+
+The loudness measure is **not hardcoded to RMS**. The first implementation computes RMS, but
+the compute path is built around a swappable `ILoudnessAlgorithm` abstraction (§5a) so a
+different perceptual loudness profile (e.g. LUFS) can be substituted later without touching the
+component, the wire format, or the storage. The component and the data are named for the
+*concept* (waveform / loudness profile), not the algorithm.
+
+Two visualizations currently coexist in the seek zone. They are being separated by
+*kind*:
+
+- **Real-time spectrum** (FFT frequency bars, `SpectrumVisualizer.razor`) — a *live* readout
+  of "what is sounding right now." This moves **up, above the volume slider**.
+- **Static loudness-over-time** (the new `WaveformSeeker`) — a *whole-track* readout of "how loud
+  is each moment." This takes over the seek area.
+
+This is a clean conceptual split: live-frequency lives with the output level (volume),
+whole-track-amplitude lives with the transport position (seek). The current arrangement
+(real-time spectrum behind the seek slider) conflates the two.
+
+### Naming (decided)
+
+"Spectrum" properly means frequency content; what this component shows is **amplitude over
+time**, not spectrum. The component is named honestly: **`WaveformSeeker`** (decided), which
+reads correctly against the live `SpectrumVisualizer` (frequency) without implying FFT data is
+in the payload. The *data* is named for the concept, not the algorithm: **`WaveformProfile`** /
+`WaveformProfileDto` / `waveformBuckets` / a `profile` field — so substituting the loudness
+algorithm (RMS → LUFS, §5a) never forces a rename of the type that carries it.
+
+---
+
+## 2. Current state (what we're changing)
+
+The seek zone today (`PlayerSeekZone.razor`):
+
+```razor
+@* component names per §3a; attributes and wrapper markup omitted *@
+<SpectrumVisualizer ... />   @* live FFT bars, sits on top *@
+<MudSlider ... />            @* the scrub bar *@
+<TimestampLabel ... />       @* time text *@
+```
+
+Relevant mechanics already in place that the new component must preserve:
+
+- **Seek gesture plumbing** lives in `PlayerSeekZone.razor.cs`: `OnSeekStart` /
+  `OnSeekChange` / `OnSeekEnd` callbacks bubble to `AudioPlayerBar.razor.cs`, which sets
+  `_isSeeking`, tracks `_seekPosition`, and calls `PlayerService.Seek(position)` on release.
+  `DisplayTime` shows the drag position while seeking, real `CurrentTime` otherwise.
+- **`CanSeek`** = `IsLoaded && Duration.HasValue && Duration > 0`. Seek is allowed during
+  streaming, including beyond the buffer (the offset-refetch path in
+  `StreamingAudioPlayerService` / `AudioPlayer.ts.seekBeyondBuffer`). The new component does
+  **not** touch that path — it only produces a target time and hands it to the same
+  `Seek(double)` call.
+- **`SpectrumVisualizer`** is driven entirely by `AudioInteropService.StartSpectrumAnimationAsync`,
+  which subscribes a callback to the TS `SpectrumAnalyzer` (live FFT, ~30fps). It already
+  self-manages animation lifecycle off `PlayerService.StateChanged`. Moving it is a pure
+  layout move — no logic change.
+- **Player layout** (`AudioPlayerBar.razor.css`) is pure-CSS responsive: at ≥600px the row is
+  `[transport] [seek grows] [volume]`; at <600px it's `[transport][volume]` then full-width
+  seek below. Wherever the spectrum lands, it must respect this.
+
+---
+
+## 3. UI layout changes
+
+### 3a. What moves
+
+| Element | Today | After |
+|---|---|---|
+| Live FFT spectrum (`SpectrumVisualizer`) | Inside `PlayerSeekZone`, above the slider | Inside the **volume cluster**, above the volume slider |
+| Scrub bar (`MudSlider`) | `PlayerSeekZone` | Replaced by `WaveformSeeker` (loudness bars + playhead) |
+| Timestamp (`TimestampLabel`) | Below the slider in `PlayerSeekZone` | Stays with the seeker (below or overlaid on the bars) |
+| Volume slider (`VolumeControls`) | Right cluster | Unchanged position; now has the live spectrum stacked above it |
+
+### 3b. Resulting zones
+
+- **Transport zone** — unchanged (play/pause/stop + load spinner).
+- **Volume zone** — becomes a small vertical stack: live FFT spectrum on top, volume
+  slider below. This is a natural pairing ("here's the live output, here's how loud").
+  `VolumeControls.razor` gets the `<SpectrumVisualizer>` stacked above its existing
+  `MudStack`. The wrapper is renamed `VolumeZone` (**decided**) for symmetry with the other
+  two zones.
+- **Seek zone** — becomes the `WaveformSeeker`: a wide loudness bar chart that grows to fill the
+  available width (it inherits the `flex-grow:1` the seek zone has today), with the
+  timestamp beneath.
+
+### 3c. Layout risk
+
+The live spectrum is currently a wide element. Stacking it above the *volume* slider
+constrains it to the narrow right cluster — at ≥600px the volume cluster is only as wide as
+the slider (the CSS halves and flex-start-pins it per commit `78c6803`). A 32-bucket FFT bar
+chart squeezed into ~120px will look cramped.
+
+**Decided: 24 buckets in the volume cluster, parameterized.** The live spectrum renders **24
+buckets** in the narrow volume slot, set via the existing `BucketCount` parameter on
+`SpectrumVisualizer` so the count can be tuned without a code change to the component. 24 reads
+denser than 16 while still fitting the ~120px cluster comfortably.
+
+---
+
+## 4. WaveformSeeker component design
+
+### 4a. Data → geometry
+
+The component receives a normalized loudness profile: `double[] profile`, each value in `[0,1]`,
+representing the loudness measure of a contiguous time slice. Profile length is **N buckets**
+covering the whole track regardless of duration (fixed bucket count, variable bucket
+*duration*). Each bucket renders as one vertical bar; bar height = `profile[i]` scaled to the
+component height (with a small floor, ~2%, so silence is still visible as a hairline — mirrors
+`SpectrumVisualizer.GetBarHeight`).
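+
+A minimal sketch of the value-to-height mapping described above, assuming the floor is applied
+as a simple lower bound. `WaveformGeometry` and `BarHeightPercent` are hypothetical names, and
+the real `SpectrumVisualizer.GetBarHeight` may apply its floor differently:
+
+```csharp
+using System;
+
+public static class WaveformGeometry
+{
+    // Hypothetical helper (not from the codebase): maps one normalized loudness
+    // value in [0,1] to a bar height in percent of the component height.
+    // The floor keeps silent buckets visible as a hairline.
+    public static double BarHeightPercent(double value, double floorPercent = 2.0)
+    {
+        if (double.IsNaN(value)) return floorPercent;   // treat bad data as silence
+        double clamped = Math.Clamp(value, 0.0, 1.0);   // guard against out-of-range input
+        return Math.Max(floorPercent, clamped * 100.0);
+    }
+}
+```
+
+The result could feed the `--bar-height` CSS variable directly, so the floor lives in one place
+rather than in the stylesheet.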
+
+**Bar count.** Two regimes:
+
+- **Preprocessed resolution (N):** how many buckets the backend computes and stores. **N is
+  configurable** (e.g. via `WaveformProfileOptions` bound from DI/config), **default 512**. A
+  high source resolution lets the front end downsample to whatever fits the rendered width
+  without re-fetching. Storage is tiny regardless of N (see §5).
+- **Rendered resolution:** how many bars actually draw, = pixels-available / (bar + gap). **The
+  front end derives its rendered bar count from the available width, regardless of N** — it does
+  not assume the stored N is the bar count. At a typical ~600px seek zone with 2px bars + 1px
+  gaps that's ~200 bars. The component **downsamples N → rendered count** by max-or-mean over
+  each rendered bucket's source range. Use **max** (peak) for the visual — peak-per-bucket gives
+  the punchy DAW look; mean flattens transients.
+
+**Decided: N configurable, default 512; rendered count derived from width; downsample by peak.**
+512 is a clean power-of-two, downsamples evenly to 256/128/64, and is 512 bytes as quantized
+bytes, ~684 characters once base64-encoded (§5b). The wire format is the quantized `byte[]`
+base64 either way; N being configurable does not change the format.
+
+### 4b. Playhead / progress indication
+
+The current position is shown two ways simultaneously (both cheap, both standard):
+
+1. **Played/unplayed split** — bars left of the playhead render in the played colour (moss
+   green `--deepdrft-green-accent`, matching the house waveform identity called out in
+   `track-card-theming.md`), bars right render muted. The split point = `CurrentTime / Duration`.
+2. **Playhead line** — a 1–2px vertical rule at the split, for precision.
+
+While dragging, the split/line follow the pointer (`DisplayTime`), not playback — same
+`_isSeeking` discipline as today.
+
+### 4c. Interaction model
+
+Pointer-based, reusing the existing callback contract so `AudioPlayerBar.razor.cs` is barely
+touched:
+
+- **Hover** → a faint preview line at the cursor + a tooltip/label showing the time under the
+  cursor (`hoverTime = (cursorX / width) * Duration`). Preview only; no seek. (New affordance;
+  the MudSlider had none. Borrowed from SoundCloud/YouTube scrubbers.)
+- **Click** → seek to `clickX / width * Duration`. Fires `OnSeekStart` then immediately
+  `OnSeekEnd(clickTime)`.
+- **Drag** → `pointerdown` starts seeking (`OnSeekStart`), `pointermove` updates the preview
+  position and fires `OnSeekChange(t)` (so `DisplayTime` and the played/unplayed split track
+  the drag live), `pointerup` commits (`OnSeekEnd(t)` → `PlayerService.Seek(t)`).
+  `pointerleave` while dragging commits at the last position (matches current
+  `HandlePointerLeave` behaviour) — or, better, use **pointer capture** (`setPointerCapture`)
+  so a drag that leaves the element keeps tracking until release. Recommend pointer capture;
+  it's the more forgiving gesture and avoids the "lost the drag" feel.
+
+Position math needs the element's pixel width and the pointer's offset. Two implementations:
+
+- **Pure Blazor:** use `@onpointermove`/`@onpointerdown` with `PointerEventArgs.OffsetX` and a
+  cached bounding width (one JS `getBoundingClientRect` call on resize). Simple, no per-frame
+  interop.
+- **Thin JS helper:** a tiny interop that does hit-testing and returns a normalized `[0,1]`
+  fraction. Only worth it if `OffsetX` proves unreliable across the responsive reflows.
+
+**Recommend pure-Blazor pointer events first**, with `OffsetX`/cached width; fall back to a JS
+helper only if hit-testing is flaky. Keeps the new surface out of the TS bundle (see §7).
+
+### 4d. Rendering approach
+
+- **DOM bars** (one `<div>` per rendered bar, CSS `--bar-height`) — exactly how
+  `SpectrumVisualizer` works today, so it's consistent and themeable via existing
+  `deepdrft-` tokens. At ~200 bars this is fine; Blazor diffing over 200 static divs that only
+  change a CSS var on seek is cheap.
+- **Canvas** — one `<canvas>`, drawn via a small JS interop on load + on playhead move. Scales
+  to thousands of bars and avoids 200-node diffs, but pulls the component into the JS interop
+  layer and complicates theming (canvas can't read CSS vars without plumbing).
+
+**Recommend DOM bars** to match the existing visualizer and stay in pure Blazor/CSS. Revisit
+canvas only if profiling shows the seek-time re-render (recolouring the split) janks. The
+played/unplayed split can be done **without** re-rendering every bar by overlaying a clipped
+coloured layer — render the bars once in the played colour, lay a muted-colour copy clipped to
+`width * (1 - progress)` from the right on top. Then a seek only moves one clip rect, not 200
+divs. This is the key perf trick; call it out for the implementer.
+
+### 4e. No-profile-yet state (important)
+
+A track may have no stored loudness profile (legacy tracks uploaded before this feature; profile
+fetch failed; profile still computing). The component must degrade, not break:
+
+- **Fallback bars:** render a flat row of floor-height bars (or a gentle idle shimmer) so the
+  control still reads as a seekbar and **remains fully seekable** (geometry is just time/width;
+  it needs no profile data to seek). Seek must never depend on the profile being present.
+- **Optional client-side compute:** once audio is decoded, the front end *could* compute a
+  loudness profile from the decoded `AudioBuffer`s and fill the bars live (progressive reveal as
+  the stream decodes). This is a real fallback but adds a TS path (§7); treat as a **later
+  enhancement**, not part of the first cut. First cut: preprocessed profile or flat fallback.
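+
+Pulling §4a, §4c and the fallback rule together, the component-side geometry could look roughly
+like the sketch below. `SeekerLayout` and its method names are hypothetical, the 2px/1px bar
+metrics are the example figures from §4a, and none of this is the planned implementation:
+
+```csharp
+using System;
+
+public static class SeekerLayout
+{
+    // Rendered bar count from the available width (2px bar + 1px gap by default).
+    public static int RenderedBarCount(double widthPx, double barPx = 2.0, double gapPx = 1.0)
+    {
+        if (widthPx <= 0) return 0;
+        return (int)Math.Floor(widthPx / (barPx + gapPx));
+    }
+
+    // Downsample the stored N buckets to `count` bars by peak (max) per source range.
+    // A null or empty profile yields all zeroes, which render as floor-height bars.
+    public static double[] BuildBars(double[] profile, int count)
+    {
+        var bars = new double[Math.Max(count, 0)];
+        if (profile == null || profile.Length == 0) return bars;   // flat fallback
+
+        for (int i = 0; i < bars.Length; i++)
+        {
+            // Source range [start, end) for rendered bar i.
+            int start = (int)((long)i * profile.Length / bars.Length);
+            int end = (int)((long)(i + 1) * profile.Length / bars.Length);
+            if (end <= start) end = start + 1;   // more bars than buckets: reuse one bucket
+
+            double peak = 0.0;
+            for (int j = start; j < end && j < profile.Length; j++)
+                peak = Math.Max(peak, profile[j]);
+            bars[i] = peak;
+        }
+        return bars;
+    }
+
+    // Pointer offset to seek time. Uses only width and duration, never the profile.
+    public static double TimeFromOffset(double offsetX, double widthPx, double durationSeconds)
+    {
+        if (widthPx <= 0 || durationSeconds <= 0) return 0.0;
+        double fraction = Math.Clamp(offsetX / widthPx, 0.0, 1.0);
+        return fraction * durationSeconds;
+    }
+}
+```
+
+Because `TimeFromOffset` takes no profile argument, seeking keeps working when `BuildBars`
+falls back to the flat row.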
+
+**Recommend: first cut ships preprocessed-or-flat. Seekability is never gated on the profile.**
+
+---
+
+## 5. Backend loudness preprocessing
+
+This is the load-bearing design decision. Three sub-questions: **how to compute**, **when to
+compute**, **where to store**.
+
+### 5a. How to compute (swappable loudness algorithm)
+
+**The loudness measure is an abstraction, not a hardwired RMS pass** (decided). `WaveformProfileService`
+in `DeepDrftContent` owns the PCM walk, bucketing, normalization, and storage; the per-bucket
+loudness calculation is delegated to an injected **`ILoudnessAlgorithm`** strategy. The first
+implementation is RMS (`RmsLoudnessAlgorithm`); **LUFS** (or another perceptual profile) is the
+named future alternative, droppable in as a second `ILoudnessAlgorithm` without touching the
+service, the wire format, the storage, or the component.
+
+Sketch of the seam (illustrative, not prescriptive):
+
+```csharp
+interface ILoudnessAlgorithm {
+    // given the mono samples for one time slice, return its loudness in [0,1]-able units
+    // (the span's element type is written as float here as an assumption)
+    double Measure(ReadOnlySpan<float> sliceSamples);
+}
+// first impl: RMS — sqrt(mean(sample²)). future: LUFS (K-weighting + gating).
+```
+
+We already own a PCM-WAV parser: `AudioProcessor` in `DeepDrftContent` parses RIFF/WAVE/fmt/
+data, validates PCM, and knows channels / sampleRate / bitsPerSample / blockAlign / dataSize.
+Computing the profile is a straightforward extension of that same buffer walk — **no new audio
+library needed for RMS**. The stack is PCM-only WAV today (`AudioProcessor` rejects non-PCM), so
+`WaveformProfileService` can read samples directly:
+
+1. Locate the `data` chunk (already done in `ValidateWavStructure` / `FindChunk`).
+2. Walk the PCM samples, decode per `bitsPerSample` (16/24/32-bit signed; 8-bit unsigned),
+   average channels to mono.
+3. Partition the sample stream into **N equal time slices** (N from `WaveformProfileOptions`,
+   default 512); hand each slice to `ILoudnessAlgorithm.Measure` to get `bucket[i]`.
+4. Normalize: divide by the max bucket (**peak-normalize** to `[0,1]`, decided) so quiet tracks
+   still show shape. (Trade-off: peak-normalize loses absolute-loudness comparison *between*
+   tracks. Acceptable — the seeker is about *this* track's shape, not cross-track loudness. A
+   future LUFS algorithm that wants absolute units can normalize differently behind the same
+   interface.)
+
+The PCM walk + bucketing + RMS `Measure` is ~40 lines. No external dependency for the RMS path.
+**Do not pull in NAudio or similar** for RMS — the existing parser already does the hard part. A
+future LUFS implementation may justify a dep; if so, that decision rides with *that* algorithm,
+not the service.
+
+Cost: one linear pass over the PCM buffer. For a 100MB WAV that's ~25M stereo sample frames (at
+16-bit) — a few hundred ms, done **once at upload**, never on the playback path.
+
+### 5b. Data format on the wire
+
+Front end needs `double[] profile` length N, each `[0,1]`. **Decided: quantized `byte[]` (each
+bucket 0–255), base64 in JSON**, decoded to `[0,1]` client-side (`b/255.0`). 8-bit quantization
+is *visually* lossless for a bar chart; at N=512 that's 512 bytes raw / ~684 chars base64 —
+negligible to store and ship, and it keeps the profile from bloating the metadata payload if it
+ever rides along with `TrackDto` (see §5d). The format is independent of N and of the loudness
+algorithm — both RMS and a future LUFS profile quantize to the same `[0,1]`→byte wire shape.
+
+### 5c. When to compute
+
+- **On upload (decided, for new tracks):** `UnifiedTrackService.UploadAsync` already processes
+  the WAV (`AddTrackFromWavAsync` → `AudioProcessor`). Add the `WaveformProfileService` pass there,
+  in the same read, and persist the profile alongside the track. Cost is paid once, by the
+  uploader (CMS admin), off the listener's path. This is the natural seam.
+- **CMS PreProcessing panel (decided, for existing tracks):** **not** a CLI command. Existing
+  vault tracks predate the feature, so they need an explicit generation path — surfaced **in the
+  CMS** (`DeepDrftManager`) rather than as an offline job. The CMS track grid shows which tracks
+  are missing a profile and offers **1-click generation** per track (and/or a bulk action). The
+  compute runs server-side via the same `WaveformProfileService`. See Phase 5 (§12) for the panel
+  design.
+- **On demand + cache (rejected):** computing lazily on first profile request spreads cost to
+  first-listen and needs a cache layer + cold-start penalty. Not worth its complexity given
+  upload is the only ingest and the CMS panel covers the backlog explicitly.
+
+**Decided: compute on upload for new tracks; CMS PreProcessing panel for existing ones.** The
+no-profile fallback (§4e) carries the UI in the meantime, so the seeker can ship before every
+existing track has been processed. (Memory note: Daniel favours designing the seam now even when
+deferring the feature — the no-profile fallback *is* that seam.)
+
+### 5d. Where the data lives — vault sidecar (decided)
+
+**Decided: option 3 — a sidecar in the FileDatabase vault + a dedicated endpoint (§6).** Store
+the profile as its own vault entry (e.g. a `profiles` vault keyed by `EntryKey`, or a
+`.profile`/`.wfp` companion next to the audio). The candidates and the reasoning for the choice:
+
+1. **New column on `TrackEntity` / `track` table** (`WaveformProfile byte[]` or `text`). Profile
+   rides with metadata. Pro: one fetch (`GET api/track/page` or `meta/{id}` already returns
+   `TrackDto`). Con: bloats every paged list response by ~512B × pageSize (20 → ~10KB/page) even
+   when the player isn't open; `TrackEntity` is described in `CLAUDE.md` as "a join, *only*
+   metadata" — a binary blob stretches that contract.
+
+2. **New column, but only returned by `meta/{id}` / a dedicated fetch — not by `page`.** Keeps
+   the list lean; the player fetches the profile when a track is selected. Needs the profile
+   field to be omittable from the paged DTO. (Second choice if a vault type is unwelcome — see
+   below.)
+
+3. **A sidecar in the FileDatabase vault** (**chosen**) — store the profile as its own vault
+   entry keyed by `EntryKey`. Pro: keeps it out of SQL entirely, near the binary it describes,
+   consistent with "binary content lives in the vault." Con: a second vault round-trip to serve
+   it; new endpoint.
+
+4. **Computed into the audio stream's header response** — no separate storage; return the profile
+   as a response header / preamble on `GET api/track/{id}`. Couples profile delivery to the audio
+   fetch. Awkward (headers for 512B, or a framing change to the WAV stream). Rejected.
+
+**Rationale for the vault sidecar:**
+
+- It honours the architectural line `CLAUDE.md` draws — `TrackEntity` stays pure metadata, the
+  vault owns "binary stuff about the audio." A loudness profile is derived binary content; it
+  belongs with the binary.
+- It keeps the paged list response unchanged (no regression to `TracksView` load weight).
+- It parallels the existing audio path exactly: the player already does a *separate* content
+  fetch (`TrackMediaClient` → `api/track/{id}`) distinct from the metadata fetch
+  (`TrackClient` → `api/track/page`). The profile is one more content fetch on track-select.
+
+**Fallback if the vault type proves unwelcome: option 2** (SQL column, served only on `meta/{id}`
+or a dedicated route). Simpler to migrate (one EF column) but puts derived binary in SQL. Not the
+chosen path; recorded so the alternative is on the table if the vault sidecar hits friction.
+
+The dual-database split here is real: metadata (SQL) vs derived-binary (vault). The profile is
+derived binary. The vault sidecar keeps the split clean.
+
+---
+
+## 6. New API surface
+
+Per the vault-sidecar storage (§5d, decided), add **one unauthenticated GET** that mirrors
+the existing audio route's shape and proxy path:
+
+### `GET api/track/{trackId}/waveform` (DeepDrftAPI, unauthenticated)
+
+- **Route param `trackId`** (string) = `EntryKey`, same as `GET api/track/{trackId}`.
+- Loads the stored profile for that entry (from the `profiles` vault / sidecar).
+- Returns `200` with `WaveformProfileDto { int BucketCount; string Data; }` (base64 quantized
+  bytes), or `404` if no profile exists for that track (front end then renders the flat fallback,
+  §4e).
+- Unauthenticated, like audio streaming — it's public listener data.
+
+**Proxy:** add the matching forward in `DeepDrftPublic/Controllers/TrackProxyController.cs`
+(currently forwards `page` and `{trackId}`); add `{trackId}/waveform`. Same thin-proxy pattern,
+no logic.
+
+**Client:** a method on `TrackMediaClient` (it owns the `DeepDrft.Content` client and the
+content base address) — `GetWaveformProfileAsync(trackId) → ApiResult<WaveformProfileDto>`. Keeps
+the profile fetch on the content client, consistent with §5d's "profile is content."
+
+The CMS PreProcessing panel (§12 Phase 5) also needs server-side endpoints: a way to query which
+tracks lack a profile and a way to trigger generation. Those are **authenticated CMS routes** on
+`DeepDrftAPI` (ApiKey), distinct from this public read — see Phase 5 for their shape.
+
+### New model
+
+`WaveformProfileDto` in `DeepDrftModels` (new `DTOs/WaveformProfileDto.cs`):
+
+```csharp
+public class WaveformProfileDto {
+    public int BucketCount { get; set; }
+    public string Data { get; set; } = string.Empty;   // base64 of byte[BucketCount], each 0..255
+}
+```
+
+`DeepDrftModels` is referenced by every project (`CLAUDE.md`), so both API and client see it. The
+DTO carries no algorithm tag — it is loudness-in-`[0,1]` regardless of how it was computed.
+
+---
+
+## 7. TypeScript seam
+
+**First cut: no TS changes required.** The preprocessed profile arrives as data over HTTP,
+is decoded in C# (`WaveformProfileDto.Data` → `double[]`), and rendered by the Blazor component
+with pure pointer events (§4c/§4d). The TS audio bundle (`DeepDrftPublic/Interop/audio/`) is
+untouched. The live `SpectrumVisualizer` keeps using the existing
+`startSpectrumAnimation`/`SpectrumAnalyzer` path verbatim — only its *position* in the markup
+changes.
+
+**Deliberately deferred TS work (later enhancement, see §4e):** client-side loudness computation
+from decoded `AudioBuffer`s for the no-profile fallback. That *would* need a new TS module
+(e.g. `WaveformProfiler.ts`) reading `scheduler`'s decoded buffers and bucketing amplitude, plus
+an interop method to stream buckets to Blazor as they fill. It mirrors `SpectrumAnalyzer`'s
+callback pattern. **Not in the first cut** — the flat fallback covers the gap, and the CMS
+PreProcessing panel removes most no-profile cases. Keep this seam in mind so the component's data
+input is an abstract `double[]` that *could* later be fed by either source.
+
+This matters for the component contract: `WaveformSeeker` should take its profile as a
+parameter/observable it doesn't care about the origin of — preprocessed today, possibly
+live-computed later. Don't hard-wire it to the HTTP fetch.
+
+---
+
+## 8. Frontend data flow
+
+```
+Track selected (TracksView.PlayTrack → PlayerService.SelectTrackStreaming)
+   │
+   ├── (existing) audio: TrackMediaClient.GetTrackMedia(entryKey) → stream → TS decode → playback
+   │
+   └── (new) profile: TrackMediaClient.GetWaveformProfileAsync(entryKey) → WaveformProfileDto
+            │
+            └── decode base64 → double[] profile → WaveformSeeker.Profile
+```
+
+Wiring options for *who* fetches the profile and holds it:
+
+- **A. Player service holds it.** `StreamingAudioPlayerService` (or the base
+  `AudioPlayerService`) gains a `WaveformProfile` property, fetched when a track is selected,
+  exposed like `Duration`/`CurrentTime`. `WaveformSeeker` reads it off the cascaded
+  `IStreamingPlayerService`, re-rendering on `StateChanged` — the same pattern
+  `SpectrumVisualizer` and `AudioPlayerBar` already use. **Recommended:** the profile is part
+  of "current track state," and the player service is already the single source the seek zone
+  binds to. One place fetches, one place caches per track, cleared on `Unload`.
+
+- **B. WaveformSeeker fetches its own.** Component takes `EntryKey` + `TrackMediaClient`,
+  fetches in `OnParametersSet` when the key changes. Simpler to reason about in isolation but
+  duplicates "current track" knowledge the player already owns and risks double-fetch / stale
+  key on rapid track switches.
+
+- **C. A dedicated `WaveformProfileViewModel`** (MVVM convention in `CLAUDE.md`) scoped in DI,
+  fetches and caches by `EntryKey`, injected into the component. Cleanest separation, an extra
+  moving part. Reasonable if profiles get reused across views (e.g. mini-waveforms on track
+  cards later — see §10).
+
+**Recommend A for the first cut** (profile as player-service state — matches the established
+binding pattern and the "one source, multiple views" instinct: the seeker is just another view
+over current-track state). Promote to C later if profiles need to be consumed outside the
+player (track-card waveforms).
+
+`CurrentTime` / `Duration` for the playhead come from the player service exactly as
+`PlayerSeekZone` reads them today — no change.
+
+---
+
+## 9. Component & file inventory
+
+New:
+
+- `DeepDrftPublic.Client/Controls/AudioPlayerBar/WaveformSeeker.razor` (+ `.razor.cs`, `.razor.css`)
+- `DeepDrftModels/DTOs/WaveformProfileDto.cs`
+- `DeepDrftContent/Processors/WaveformProfileService.cs` — owns the PCM walk, bucketing,
+  normalization, storage; takes an `ILoudnessAlgorithm`.
+- `DeepDrftContent/Processors/ILoudnessAlgorithm.cs` — the swappable loudness strategy (§5a).
+- `DeepDrftContent/Processors/RmsLoudnessAlgorithm.cs` — first implementation (RMS). LUFS is a
+  future sibling implementation, not built now.
+- `WaveformProfileOptions` (config-bound) — carries `BucketCount` (default 512) and any future
+  algorithm-selection knob.
+- DeepDrftAPI public read route `GET api/track/{trackId}/waveform` in `TrackController.cs` +
+  proxy in `TrackProxyController.cs`.
+- DeepDrftAPI CMS routes (ApiKey) for the PreProcessing panel: query missing-profile tracks +
+  trigger generation (§12 Phase 5).
+- `TrackMediaClient.GetWaveformProfileAsync`
+- (storage) new `profiles` vault constant in `VaultConstants`.
+- (CMS) PreProcessing panel surface in `DeepDrftManager` — see Phase 5 for the component/service
+  inventory (it lands with that phase, not the first cut).
+
+Changed:
+
+- `PlayerSeekZone.razor` — swap `MudSlider` block for `<WaveformSeeker>`; drop the
+  `<SpectrumVisualizer>` (moves to volume).
+- `VolumeControls.razor` → renamed **`VolumeZone.razor`** (decided) — stack `<SpectrumVisualizer>`
+  above the volume slider.
+- `AudioPlayerBar.razor.css` — adjust volume cluster to host the spectrum; seeker sizing.
+- `SpectrumVisualizer` — set `BucketCount=24` for the narrow volume slot (§3c).
+- `AudioPlayerBar.razor.cs` — minimal; seek callbacks already abstract. Possibly hold/clear
+  `WaveformProfile` if §8-A.
+- `StreamingAudioPlayerService` / `AudioPlayerService` — add `WaveformProfile` state + fetch (§8-A).
+- `UnifiedTrackService.UploadAsync` — compute + persist profile on upload via `WaveformProfileService`.
+
+Untouched (important): the entire TS audio bundle, the seek-beyond-buffer offset path,
+`WavOffsetService`, the streaming decode pipeline.
+
+---
+
+## 10. Future options this unlocks (don't build now, leave room for)
+
+- **LUFS (or other perceptual) loudness profile.** The `ILoudnessAlgorithm` seam (§5a) exists
+  precisely so this drops in as a second strategy without touching the component, wire format, or
+  storage. The cheapest of the future moves because the abstraction is built up front.
+- **Track-card mini-waveforms.** Once profiles exist as a reusable resource, `TrackCard` could
+  show a tiny loudness sparkline. This is the argument for the §8-C `WaveformProfileViewModel`
+  eventually, and for storing profiles where non-player surfaces can fetch them cheaply (favours
+  the vault sidecar + endpoint, §5d-3).
+- **Loudness-normalized playback / waveform colouring by energy.** The same profile data could
+  drive auto-gain or heat-coloured bars.
+- **Live-computed profiles** for the no-profile case (§7 deferred TS).
+- **Higher-res zoomed scrub** on long tracks (re-fetch a denser profile for a time window) —
+  why a generous, configurable stored N and client-side downsampling is worth it now.
+
+Keep the component's profile input origin-agnostic and the stored resolution generous so these
+stay cheap to add.
+
+---
+
+## 11. Decisions (resolved 2026-06-05)
+
+All seven forks below are **decided**. Recorded here so the rationale travels with the spec.
+
+1. **Storage location (§5d): vault sidecar + dedicated endpoint — decided ✓.** Profile is derived
+   binary; it lives in the vault, `TrackEntity` stays pure metadata, the paged list stays lean.
+   SQL-column-on-`meta/{id}` is the recorded fallback only if the vault type hits friction.
+2. **Names (§1): component `WaveformSeeker`; data `WaveformProfile` (`WaveformProfileDto`,
+   `waveformBuckets`, `profile`) — decided ✓.** Honest naming; the data is named for the concept,
+   not the algorithm, so RMS→LUFS never forces a rename.
+3. **Live-spectrum bucket count (§3c): 24 buckets, parameterized — decided ✓.** Set via
+   `BucketCount` on `SpectrumVisualizer` so it can be tuned without a code change.
+4. **Stored resolution + wire format (§4a/§5b): N configurable (default 512) via
+   `WaveformProfileOptions`; quantized `byte[]` base64 — decided ✓.** Front end derives its
+   rendered bar count from available width regardless of N.
+5. **Backfill (§5c): CMS PreProcessing panel, not a CLI — decided ✓.** The CMS track grid shows
+   missing-profile tracks and offers 1-click generation per track (and/or bulk); compute runs
+   server-side via `WaveformProfileService`. See §12 Phase 5.
+6. **Normalization (§5a): peak-normalize — decided ✓.** Per-track shape over cross-track absolute
+   loudness; a future LUFS algorithm can normalize differently behind the same interface.
+7. **`VolumeControls` → `VolumeZone` rename (§3b) — decided ✓.** Symmetry with the transport and
+   seek zones.
+
+**Cross-cutting decision (§5a):** the loudness measure is a swappable `ILoudnessAlgorithm`, RMS
+first, LUFS the named future alternative — not hardwired to RMS.
+
+---
+
+## 12. Implementation phases (ordered, delegable)
+
+Sequenced so each phase has a shippable deliverable and the UI can land before existing tracks
+are all preprocessed. Phases 1–2 (backend) and phase 3 (layout move) are **parallelizable** —
+they touch disjoint files and meet only at the client fetch in phase 4. §11 decisions are all
+resolved, so there is no decisions-gate phase.
+
+**Phase 1 — Loudness computation + storage (backend).** `WaveformProfileService` in
+`DeepDrftContent` (extend the existing PCM walk) with an `ILoudnessAlgorithm` strategy and the
+`RmsLoudnessAlgorithm` first implementation. Wire into `UnifiedTrackService.UploadAsync` to
+compute + persist on upload (vault sidecar, §5d). Add `WaveformProfileDto` to `DeepDrftModels` and
+`WaveformProfileOptions` (default N=512).
+*Deliverable:* new uploads get a stored profile; unit-test the RMS math against a known WAV, and
+unit-test that a second `ILoudnessAlgorithm` swaps in cleanly (guards the abstraction).
+
+**Phase 2 — Public read API + proxy + client (backend/transport).** Add
+`GET api/track/{trackId}/waveform`, the proxy forward, and `TrackMediaClient.GetWaveformProfileAsync`.
+*Deliverable:* a track's profile is fetchable end-to-end over HTTP. Can be tested with curl
+before any UI.
+
+**Phase 3 — Layout move (frontend, parallel with 1–2).** Move `SpectrumVisualizer` from
+`PlayerSeekZone` into the volume cluster (renamed `VolumeZone`); adjust CSS (§3c); set
+`BucketCount=24`.
+*Deliverable:* live spectrum sits above the volume slider; seek zone temporarily keeps the
+MudSlider (or a placeholder). Player still fully works. This de-risks the layout independently
+of the new component.
+
+**Phase 4 — WaveformSeeker component (frontend, needs 2 + 3).** Build `WaveformSeeker.razor`:
+DOM bars, played/unplayed split via clip overlay (§4d), pointer-capture seek (§4c), flat
+fallback (§4e), rendered bar count derived from width. Wire profile via player-service state
+(§8-A). Replace the MudSlider in `PlayerSeekZone` with it.
+*Deliverable:* the new seekbar is live for tracks that have a profile; flat-but-seekable for
+those that don't.
+
+**Phase 5 — CMS PreProcessing panel (CMS, after 1).** In `DeepDrftManager`, add a PreProcessing
+feature to the CMS track grid: a column/indicator showing which tracks **lack a waveform profile**
+and a per-track **Generate** action (and/or a bulk "generate all missing" action). The grid
+queries missing-profile state and triggers generation through authenticated CMS API routes on
+`DeepDrftAPI`; the compute runs server-side via the same `WaveformProfileService` (no CLI). New
+surface roughly: a CMS service method on `ICmsTrackService`/`CmsTrackService` for
+list-missing + generate, the backing `DeepDrftAPI` routes (ApiKey), and the grid column/action in
+the Tracks CMS page.
+*Deliverable:* a CMS admin can see and one-click-fill any missing profile; the no-profile fallback
+becomes rare/never as the backlog is worked off in-app.
+
+**Deferred (not scheduled):** live client-side loudness compute (§7), track-card mini-waveforms
+(§10), a LUFS `ILoudnessAlgorithm` (§5a/§10). Tracked here so the component contract stays
+origin-agnostic and the algorithm stays swappable.
+
+---
+
+## 13. What this plan deliberately does NOT do
+
+- Does not touch the streaming decode pipeline, seek-beyond-buffer, or `WavOffsetService`.
+- Does not add an audio-processing dependency (NAudio etc.) for the RMS path — the existing PCM
+  parser suffices. (A future LUFS `ILoudnessAlgorithm` may revisit this, on its own merits.)
+- Does not compute the profile on the playback path — preprocessed only (the whole point).
+- Does not change `TrackEntity`'s metadata contract — the profile lives in the vault sidecar.
+- Does not add a CLI; existing-track preprocessing is the in-CMS PreProcessing panel (§12 Phase 5).
+- Does not require TS bundle changes in the first cut.
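+
+To make the no-dependency point concrete, here is a minimal sketch of the §5a/§5b path
+(bucketed RMS, peak normalization, 8-bit quantization, base64) using only the base class
+library. It assumes mono samples have already been decoded to `float` in `[-1,1]`; the `float`
+element type, `WaveformProfileSketch`, and its method names are illustrative assumptions, not
+the planned implementation:
+
+```csharp
+using System;
+
+// Stand-ins for the interface and RMS strategy named in §5a (signatures assumed).
+public interface ILoudnessAlgorithm
+{
+    double Measure(ReadOnlySpan<float> sliceSamples);
+}
+
+public sealed class RmsLoudnessAlgorithm : ILoudnessAlgorithm
+{
+    // RMS = sqrt(mean(sample^2)); an empty slice counts as silence.
+    public double Measure(ReadOnlySpan<float> sliceSamples)
+    {
+        if (sliceSamples.IsEmpty) return 0.0;
+        double sumSquares = 0.0;
+        foreach (float s in sliceSamples) sumSquares += (double)s * s;
+        return Math.Sqrt(sumSquares / sliceSamples.Length);
+    }
+}
+
+public static class WaveformProfileSketch
+{
+    // Mono samples -> N buckets -> peak-normalize -> quantize to 0..255 -> base64.
+    public static string Encode(ReadOnlySpan<float> mono, int bucketCount, ILoudnessAlgorithm algo)
+    {
+        var buckets = new double[bucketCount];
+        double max = 0.0;
+        for (int i = 0; i < bucketCount; i++)
+        {
+            // Equal time slices: bucket i covers samples [start, end).
+            int start = (int)((long)i * mono.Length / bucketCount);
+            int end = (int)((long)(i + 1) * mono.Length / bucketCount);
+            buckets[i] = algo.Measure(mono.Slice(start, end - start));
+            max = Math.Max(max, buckets[i]);
+        }
+
+        var bytes = new byte[bucketCount];
+        for (int i = 0; i < bucketCount; i++)
+            bytes[i] = max > 0 ? (byte)Math.Round(buckets[i] / max * 255.0) : (byte)0;
+        return Convert.ToBase64String(bytes);
+    }
+
+    // Client-side decode: base64 -> bytes -> [0,1] via b / 255.0.
+    public static double[] Decode(string data)
+    {
+        byte[] bytes = Convert.FromBase64String(data);
+        var profile = new double[bytes.Length];
+        for (int i = 0; i < bytes.Length; i++) profile[i] = bytes[i] / 255.0;
+        return profile;
+    }
+}
+```
+
+A second `ILoudnessAlgorithm` could be passed to `Encode` without changing the quantization or
+the decode step, which is the swap the Phase 1 unit test is meant to guard.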