daniel-c-harvey c10d315a7b docs(product): add approved WaveformSeeker spec
Loudness-waveform seekbar replacing MudSlider; ILoudnessAlgorithm
abstraction (RMS first, LUFS future); vault sidecar storage; CMS
PreProcessing panel for backfill; VolumeZone rename. All decisions
resolved 2026-06-05.
2026-06-05 15:44:40 -04:00

# WaveformSeeker — loudness-waveform seekbar to replace the MudSlider
Status: approved. Decisions resolved 2026-06-05. Author: product-designer. Date: 2026-06-05.
**Plan only — no code edits made by this doc.**
---
## 1. Summary
Replace the `MudSlider`-based scrub bar in `PlayerSeekZone.razor` with a new
`<WaveformSeeker/>` component that renders the track's **loudness profile** as a
high-density vertical bar chart and serves as the seek surface (click / drag to seek).
The point is to make the seekbar *informative*: instead of a featureless line, the
listener sees the track's energy shape — the quiet intro, the drop, the breakdown, the
outro — and can scrub against that shape. This is the established "waveform scrubber"
idiom from SoundCloud, Overcast, and most DAW transport bars. We are borrowing it
deliberately; the novel part for us is only that the profile is **preprocessed server-side
and shipped as a small quantized array**, so the visual paints the instant a track loads rather
than waiting for the audio to decode.
The loudness measure is **not hardcoded to RMS**. The first implementation computes RMS, but
the compute path is built around a swappable `ILoudnessAlgorithm` abstraction (§5a) so a
different perceptual loudness profile (e.g. LUFS) can be substituted later without touching the
component, the wire format, or the storage. The component and the data are named for the
*concept* (waveform / loudness profile), not the algorithm.
Two visualizations currently coexist in the seek zone. They are being separated by
*kind*:
- **Real-time spectrum** (FFT frequency bars, `SpectrumVisualizer.razor`) — a *live* readout
of "what is sounding right now." This moves **up, above the volume slider**.
- **Static loudness-over-time** (the new `WaveformSeeker`) — a *whole-track* readout of "how loud
is each moment." This takes over the seek area.
This is a clean conceptual split: live-frequency lives with the output level (volume),
whole-track-amplitude lives with the transport position (seek). The current arrangement
(real-time spectrum behind the seek slider) conflates the two.
### Naming (decided)
"Spectrum" properly means frequency content; what this component shows is **amplitude over
time**, not spectrum. The component is named honestly: **`WaveformSeeker`** (decided), which
reads correctly against the live `SpectrumVisualizer` (frequency) without implying FFT data is
in the payload. The *data* is named for the concept, not the algorithm: **`WaveformProfile`** /
`WaveformProfileDto` / `waveformBuckets` / a `profile` field — so substituting the loudness
algorithm (RMS → LUFS, §5a) never forces a rename of the type that carries it.
---
## 2. Current state (what we're changing)
The seek zone today (`PlayerSeekZone.razor`):
```razor
<MudStack Row="false" Spacing="0" Class="@Class">
    <SpectrumVisualizer/>                          @* live FFT bars, sits on top *@
    <div class="mx-3" @onpointerdown/up/leave>
        <MudSlider .../>                           @* the scrub bar *@
    </div>
    <TimestampLabel CurrentTime=... Duration=.../> @* time text *@
</MudStack>
```
Relevant mechanics already in place that the new component must preserve:
- **Seek gesture plumbing** lives in `PlayerSeekZone.razor.cs`: `OnSeekStart` /
`OnSeekChange` / `OnSeekEnd` callbacks bubble to `AudioPlayerBar.razor.cs`, which sets
`_isSeeking`, tracks `_seekPosition`, and calls `PlayerService.Seek(position)` on release.
`DisplayTime` shows the drag position while seeking, real `CurrentTime` otherwise.
- **`CanSeek`** = `IsLoaded && Duration.HasValue && Duration > 0`. Seek is allowed during
streaming, including beyond the buffer (the offset-refetch path in
`StreamingAudioPlayerService` / `AudioPlayer.ts.seekBeyondBuffer`). The new component does
**not** touch that path — it only produces a target time and hands it to the same
`Seek(double)` call.
- **`SpectrumVisualizer`** is driven entirely by `AudioInteropService.StartSpectrumAnimationAsync`,
which subscribes a callback to the TS `SpectrumAnalyzer` (live FFT, ~30fps). It already
self-manages animation lifecycle off `PlayerService.StateChanged`. Moving it is a pure
layout move — no logic change.
- **Player layout** (`AudioPlayerBar.razor.css`) is pure-CSS responsive: at ≥600px the row is
`[transport] [seek grows] [volume]`; at <600px it's `[transport][volume]` then full-width
seek below. Wherever the spectrum lands, it must respect this.
---
## 3. UI layout changes
### 3a. What moves
| Element | Today | After |
|---|---|---|
| Live FFT spectrum (`SpectrumVisualizer`) | Inside `PlayerSeekZone`, above the slider | Inside the **volume cluster**, above the volume slider |
| Scrub bar (`MudSlider`) | `PlayerSeekZone` | Replaced by `WaveformSeeker` (loudness bars + playhead) |
| Timestamp (`TimestampLabel`) | Below the slider in `PlayerSeekZone` | Stays with the seeker (below or overlaid on the bars) |
| Volume slider (`VolumeControls`) | Right cluster | Unchanged position; now has the live spectrum stacked above it |
### 3b. Resulting zones
- **Transport zone** — unchanged (play/pause/stop + load spinner).
- **Volume zone** — becomes a small vertical stack: live FFT spectrum on top, volume
slider below. This is a natural pairing ("here's the live output, here's how loud").
`VolumeControls.razor` gets the `<SpectrumVisualizer/>` stacked above its existing
`MudStack`. The wrapper is renamed `VolumeZone` (**decided**) for symmetry with the other
two zones.
- **Seek zone** — becomes the `WaveformSeeker`: a wide loudness bar chart that grows to fill the
available width (it inherits the `flex-grow:1` the seek zone has today), with the
timestamp beneath.
### 3c. Layout risk
The live spectrum is currently a wide element. Stacking it above the *volume* slider
constrains it to the narrow right cluster — at ≥600px the volume cluster is only as wide as
the slider (the CSS halves and flex-start-pins it per commit `78c6803`). A 32-bucket FFT bar
chart squeezed into ~120px will look cramped.
**Decided: 24 buckets in the volume cluster, parameterized.** The live spectrum renders **24
buckets** in the narrow volume slot, set via the existing `BucketCount` parameter on
`SpectrumVisualizer` so the count can be tuned without a code change to the component. 24 reads
denser than 16 while still fitting the ~120px cluster comfortably.
---
## 4. WaveformSeeker component design
### 4a. Data → geometry
The component receives a normalized loudness profile: `double[] profile`, each value in `[0,1]`,
representing the loudness measure of a contiguous time slice. Profile length is **N buckets**
covering the whole track regardless of duration (fixed bucket count, variable bucket
*duration*). Each bucket renders as one vertical bar; bar height = `profile[i]` scaled to the
component height (with a small floor, ~2%, so silence is still visible as a hairline — mirrors
`SpectrumVisualizer.GetBarHeight`).
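A minimal sketch of that height mapping, under stated assumptions: `WaveformGeometry`, `BarHeightPercent`, and the exact 2% floor constant are illustrative names and values, not existing code, and the real `SpectrumVisualizer.GetBarHeight` may differ in detail.

```csharp
using System;

// Hypothetical helper; names and the floor value are assumptions for illustration.
public static class WaveformGeometry
{
    // Assumed floor: ~2% of component height so silent buckets still draw a hairline.
    public const double FloorPercent = 2.0;

    // Maps a normalized loudness value in [0,1] to a bar height in percent of component height.
    public static double BarHeightPercent(double value)
    {
        if (double.IsNaN(value)) value = 0.0;            // treat bad input as silence
        double clamped = Math.Clamp(value, 0.0, 1.0);    // guard against out-of-range input
        return Math.Max(FloorPercent, clamped * 100.0);
    }
}
```

The result could feed the per-bar `--bar-height` CSS variable, for example `WaveformGeometry.BarHeightPercent(profile[i])`.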
**Bar count.** Two regimes:
- **Preprocessed resolution (N):** how many buckets the backend computes and stores. **N is
configurable** (e.g. via `WaveformProfileOptions` bound from DI/config), **default 512**. A
high source resolution lets the front end downsample to whatever fits the rendered width
without re-fetching. Storage is tiny regardless of N (see §5).
- **Rendered resolution:** how many bars actually draw, = pixels-available / (bar + gap). **The
front end derives its rendered bar count from the available width, regardless of N** — it does
not assume the stored N is the bar count. At a typical ~600px seek zone with 2px bars + 1px
gaps that's ~200 bars. The component **downsamples N → rendered count** by max-or-mean over
each rendered bucket's source range. Use **max** (peak) for the visual — peak-per-bucket gives
the punchy DAW look; mean flattens transients.
**Decided: N configurable, default 512; rendered count derived from width; downsample by peak.**
512 is a clean power of two, downsamples evenly to 256/128/64, and is 512 bytes as quantized
bytes (~684 characters once base64-encoded, §5b). The wire format is the quantized `byte[]`
base64 either way; N being configurable does not change the format.
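The width-derived bar count and the peak downsample could look roughly like the sketch below. `WaveformDownsampler` and its method names are hypothetical, and the 2px bar / 1px gap defaults are just the example figures from this section.

```csharp
using System;

// Hypothetical helper; not part of the existing codebase.
public static class WaveformDownsampler
{
    // Number of bars that fit: available width divided by (bar + gap), at least 1.
    public static int RenderedBarCount(double widthPx, double barPx = 2, double gapPx = 1)
        => Math.Max(1, (int)Math.Floor(widthPx / (barPx + gapPx)));

    // Peak-per-bucket downsample from the stored N values to renderedCount bars.
    public static double[] DownsamplePeak(ReadOnlySpan<double> source, int renderedCount)
    {
        if (renderedCount <= 0) throw new ArgumentOutOfRangeException(nameof(renderedCount));
        var result = new double[renderedCount];
        if (source.Length == 0) return result;

        for (int i = 0; i < renderedCount; i++)
        {
            // Integer math keeps the source ranges contiguous and covering the whole profile.
            int start = (int)((long)i * source.Length / renderedCount);
            int end = (int)((long)(i + 1) * source.Length / renderedCount);
            if (end <= start) end = start + 1;   // more bars than buckets: reuse the nearest bucket

            double peak = 0.0;
            for (int j = start; j < end; j++) peak = Math.Max(peak, source[j]);
            result[i] = peak;
        }
        return result;
    }
}
```

With a 600px seek zone, `RenderedBarCount(600)` gives 200, and `DownsamplePeak(profile, 200)` reduces a 512-bucket profile to 200 bars without refetching.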
### 4b. Playhead / progress indication
The current position is shown two ways simultaneously (both cheap, both standard):
1. **Played/unplayed split** — bars left of the playhead render in the played colour (moss
green `--deepdrft-green-accent`, matching the house waveform identity called out in
`track-card-theming.md`), bars right render muted. The split point = `CurrentTime / Duration`.
2. **Playhead line** — a 1–2px vertical rule at the split, for precision.
While dragging, the split/line follow the pointer (`DisplayTime`), not playback — same
`_isSeeking` discipline as today.
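As a sketch, the split point can be computed in one place so the played/unplayed colouring and the playhead line agree. `PlayheadMath` and `SplitFraction` are hypothetical names; the parameters stand in for the `_isSeeking` / `_seekPosition` state that `AudioPlayerBar.razor.cs` already tracks.

```csharp
using System;

// Hypothetical helper; names are assumptions for illustration.
public static class PlayheadMath
{
    // Fraction of the bar row that is "played", in [0,1].
    // While seeking, the drag position drives the split instead of real playback time.
    public static double SplitFraction(double currentTime, double? duration, bool isSeeking, double seekPosition)
    {
        if (!(duration > 0)) return 0.0;   // null, zero, negative, or NaN duration: nothing played
        double t = isSeeking ? seekPosition : currentTime;
        return Math.Clamp(t / duration.Value, 0.0, 1.0);
    }
}
```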
### 4c. Interaction model
Pointer-based, reusing the existing callback contract so `AudioPlayerBar.razor.cs` is barely
touched:
- **Hover** → a faint preview line at the cursor + a tooltip/label showing the time under the
cursor (`hoverTime = (cursorX / width) * Duration`). Preview only; no seek. (New affordance;
the MudSlider had none. Borrowed from SoundCloud/YouTube scrubbers.)
- **Click** → seek to `clickX / width * Duration`. Fires `OnSeekStart` then immediately
`OnSeekEnd(clickTime)`.
- **Drag** → `pointerdown` starts seeking (`OnSeekStart`), `pointermove` updates the preview
position and fires `OnSeekChange(t)` (so `DisplayTime` and the played/unplayed split track
the drag live), `pointerup` commits (`OnSeekEnd(t)` → `PlayerService.Seek(t)`).
`pointerleave` while dragging commits at the last position (matches current
`HandlePointerLeave` behaviour) — or, better, use **pointer capture** (`setPointerCapture`)
so a drag that leaves the element keeps tracking until release. Recommend pointer capture;
it's the more forgiving gesture and avoids the "lost the drag" feel.
Position math needs the element's pixel width and the pointer's offset. Two implementations:
- **Pure Blazor:** use `@onpointermove`/`@onpointerdown` with `PointerEventArgs.OffsetX` and a
cached bounding width (one JS `getBoundingClientRect` call on resize). Simple, no per-frame
interop.
- **Thin JS helper:** a tiny interop that does hit-testing and returns a normalized `[0,1]`
fraction. Only worth it if `OffsetX` proves unreliable across the responsive reflows.
**Recommend pure-Blazor pointer events first**, with `OffsetX`/cached width; fall back to a JS
helper only if hit-testing is flaky. Keeps the new surface out of the TS bundle (see §7).
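The position math for the pure-Blazor route is small enough to sketch. `SeekMath` and its methods are hypothetical; `offsetX` is assumed to come from `PointerEventArgs.OffsetX` and `cachedWidthPx` from the one-off bounding-rect measurement described above.

```csharp
using System;

// Hypothetical helper; names are assumptions for illustration.
public static class SeekMath
{
    // Converts a pointer offset within the seeker into a normalized [0,1] fraction.
    public static double FractionFromOffset(double offsetX, double cachedWidthPx)
    {
        if (!(cachedWidthPx > 0)) return 0.0;   // width not measured yet
        return Math.Clamp(offsetX / cachedWidthPx, 0.0, 1.0);
    }

    // hoverTime / seek target = (cursorX / width) * Duration
    public static double TimeFromOffset(double offsetX, double cachedWidthPx, double durationSeconds)
        => FractionFromOffset(offsetX, cachedWidthPx) * Math.Max(0.0, durationSeconds);
}
```

Clamping matters most with pointer capture, where a captured drag can report offsets outside the element; the clamp pins those to the start or end of the track.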
### 4d. Rendering approach
- **DOM bars** (one `<div>` per rendered bar, CSS `--bar-height`) — exactly how
`SpectrumVisualizer` works today, so it's consistent and themeable via existing
`deepdrft-` tokens. At ~200 bars this is fine; Blazor diffing over 200 static divs that only
change a CSS var on seek is cheap.
- **Canvas** — one `<canvas>`, drawn via a small JS interop on load + on playhead move. Scales
to thousands of bars and avoids 200-node diffs, but pulls the component into the JS interop
layer and complicates theming (canvas can't read CSS vars without plumbing).
**Recommend DOM bars** to match the existing visualizer and stay in pure Blazor/CSS. Revisit
canvas only if profiling shows the seek-time re-render (recolouring the split) janks. The
played/unplayed split can be done **without** re-rendering every bar by overlaying a clipped
coloured layer — render the bars once in the played colour, lay a muted-colour copy clipped to
`width * (1 - progress)` from the right on top. Then a seek only moves one clip rect, not 200
divs. This is the key perf trick; call it out for the implementer.
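One way to express the clip, sketched under assumptions: the muted overlay spans the full bar row and uses the standard CSS `clip-path: inset(top right bottom left)` function to hide its left (played) portion. `SplitOverlay` and `MutedLayerStyle` are hypothetical names, and the spec does not prescribe `clip-path` over other clipping techniques.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; the choice of clip-path is an assumption, not a decision in this spec.
public static class SplitOverlay
{
    // Builds the inline style for the muted (unplayed) overlay layer.
    // The left inset hides the played portion, revealing the played-colour bars underneath.
    public static string MutedLayerStyle(double progress)
    {
        double leftInset = Math.Clamp(progress, 0.0, 1.0) * 100.0;
        // Invariant culture so the decimal separator is always '.', as CSS requires.
        string pct = leftInset.ToString("0.###", CultureInfo.InvariantCulture);
        return $"clip-path: inset(0 0 0 {pct}%);";
    }
}
```

A seek then changes only this one style string on the overlay element, leaving the bar divs untouched.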
### 4e. No-profile-yet state (important)
A track may have no stored loudness profile (legacy tracks uploaded before this feature; profile
fetch failed; profile still computing). The component must degrade, not break:
- **Fallback bars:** render a flat row of floor-height bars (or a gentle idle shimmer) so the
control still reads as a seekbar and **remains fully seekable** (geometry is just time/width;
it needs no profile data to seek). Seek must never depend on the profile being present.
- **Optional client-side compute:** once audio is decoded, the front end *could* compute a
loudness profile from the decoded `AudioBuffer`s and fill the bars live (progressive reveal as
the stream decodes). This is a real fallback but adds a TS path (§7); treat as a **later
enhancement**, not part of the first cut. First cut: preprocessed profile or flat fallback.
**Recommend: first cut ships preprocessed-or-flat. Seekability is never gated on the profile.**
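A sketch of the preprocessed-or-flat choice, assuming a flat profile is simply all zeros that the bar-height floor then draws as hairlines. `WaveformFallback` and `ProfileOrFlat` are hypothetical names.

```csharp
#nullable enable
using System;

// Hypothetical helper; names are assumptions for illustration.
public static class WaveformFallback
{
    // Returns the profile to render: the real one if present and non-empty,
    // otherwise a flat row of zeros sized to the rendered bar count.
    public static double[] ProfileOrFlat(double[]? profile, int renderedCount)
        => profile is { Length: > 0 } ? profile : new double[Math.Max(1, renderedCount)];
}
```

Seek geometry never reads this array, so seeking behaves identically whichever branch is taken.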
---
## 5. Backend loudness preprocessing
This is the load-bearing design decision. Three sub-questions: **how to compute**, **when to
compute**, **where to store**.
### 5a. How to compute (swappable loudness algorithm)
**The loudness measure is an abstraction, not a hardwired RMS pass** (decided). `WaveformProfileService`
in `DeepDrftContent` owns the PCM walk, bucketing, normalization, and storage; the per-bucket
loudness calculation is delegated to an injected **`ILoudnessAlgorithm`** strategy. The first
implementation is RMS (`RmsLoudnessAlgorithm`); **LUFS** (or another perceptual profile) is the
named future alternative, droppable in as a second `ILoudnessAlgorithm` without touching the
service, the wire format, the storage, or the component.
Sketch of the seam (illustrative, not prescriptive):
```csharp
public interface ILoudnessAlgorithm
{
    // Given the mono samples for one time slice, return its loudness in [0,1]-able units.
    double Measure(ReadOnlySpan<float> sliceSamples);
}
// First impl: RMS = sqrt(mean(sample²)). Future: LUFS (K-weighting + gating).
```
We already own a PCM-WAV parser: `AudioProcessor` in `DeepDrftContent` parses RIFF/WAVE/fmt/
data, validates PCM, and knows channels / sampleRate / bitsPerSample / blockAlign / dataSize.
Computing the profile is a straightforward extension of that same buffer walk — **no new audio
library needed for RMS**. The stack is PCM-only WAV today (`AudioProcessor` rejects non-PCM), so
`WaveformProfileService` can read samples directly:
1. Locate the `data` chunk (already done in `ValidateWavStructure` / `FindChunk`).
2. Walk the PCM samples, decode per `bitsPerSample` (16/24/32-bit signed; 8-bit unsigned),
average channels to mono.
3. Partition the sample stream into **N equal time slices** (N from `WaveformProfileOptions`,
default 512); hand each slice to `ILoudnessAlgorithm.Measure` to get `bucket[i]`.
4. Normalize: divide by the max bucket (**peak-normalize** to `[0,1]`, decided) so quiet tracks
still show shape. (Trade-off: peak-normalize loses absolute-loudness comparison *between*
tracks. Acceptable — the seeker is about *this* track's shape, not cross-track loudness. A
future LUFS algorithm that wants absolute units can normalize differently behind the same
interface.)
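Step 2 above (decode per bit depth, average to mono) might be sketched as follows. `PcmDecoder` and its methods are hypothetical and not the existing `AudioProcessor` API; the sketch assumes little-endian integer PCM, which is what the WAV format specifies for these bit depths.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper; not the existing AudioProcessor API.
public static class PcmDecoder
{
    // Decodes one little-endian PCM sample starting at data[offset] to a float in [-1, 1).
    public static float DecodeSample(ReadOnlySpan<byte> data, int offset, int bitsPerSample) => bitsPerSample switch
    {
        8  => (data[offset] - 128) / 128f,   // 8-bit WAV PCM is unsigned, centred on 128
        16 => BinaryPrimitives.ReadInt16LittleEndian(data.Slice(offset, 2)) / 32768f,
        // 24-bit: assemble three bytes, sign-extending from the top byte.
        24 => (data[offset] | (data[offset + 1] << 8) | (unchecked((sbyte)data[offset + 2]) << 16)) / 8388608f,
        32 => BinaryPrimitives.ReadInt32LittleEndian(data.Slice(offset, 4)) / 2147483648f,
        _  => throw new NotSupportedException($"Unsupported bit depth: {bitsPerSample}")
    };

    // Averages all channels of one frame (one sample per channel) to a single mono value.
    public static float DecodeFrameMono(ReadOnlySpan<byte> data, int frameOffset, int channels, int bitsPerSample)
    {
        int bytesPerSample = bitsPerSample / 8;
        float sum = 0f;
        for (int c = 0; c < channels; c++)
            sum += DecodeSample(data, frameOffset + c * bytesPerSample, bitsPerSample);
        return sum / channels;
    }
}
```

Walking the `data` chunk then amounts to calling `DecodeFrameMono` at offsets `0, blockAlign, 2 * blockAlign, ...` up to `dataSize`.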
The PCM walk + bucketing + RMS `Measure` is ~40 lines. No external dependency for the RMS path.
**Do not pull in NAudio or similar** for RMS — the existing parser already does the hard part. A
future LUFS implementation may justify a dep; if so, that decision rides with *that* algorithm,
not the service.
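To make the "~40 lines" claim concrete, here is a hedged sketch of the RMS strategy plus bucketing and peak normalization. The interface mirrors the seam sketched in this section and is redeclared only so the example is self-contained; `LoudnessProfileSketch` is a hypothetical name, not the planned `WaveformProfileService`. For brevity it takes already-decoded mono samples in memory, whereas a real implementation could stream slice by slice.

```csharp
using System;

// Redeclared from the seam sketch in this section so the example compiles on its own.
public interface ILoudnessAlgorithm
{
    double Measure(ReadOnlySpan<float> sliceSamples);
}

public sealed class RmsLoudnessAlgorithm : ILoudnessAlgorithm
{
    // RMS = sqrt(mean(sample^2)); an empty slice measures as silence.
    public double Measure(ReadOnlySpan<float> sliceSamples)
    {
        if (sliceSamples.IsEmpty) return 0.0;
        double sumSquares = 0.0;
        foreach (float s in sliceSamples) sumSquares += (double)s * s;
        return Math.Sqrt(sumSquares / sliceSamples.Length);
    }
}

// Hypothetical stand-in for the bucketing + normalization part of the service.
public static class LoudnessProfileSketch
{
    // Splits mono samples into bucketCount equal time slices, measures each, peak-normalizes to [0,1].
    public static double[] Compute(ReadOnlySpan<float> monoSamples, int bucketCount, ILoudnessAlgorithm algorithm)
    {
        if (bucketCount <= 0) throw new ArgumentOutOfRangeException(nameof(bucketCount));
        var buckets = new double[bucketCount];
        double max = 0.0;
        for (int i = 0; i < bucketCount; i++)
        {
            int start = (int)((long)i * monoSamples.Length / bucketCount);
            int end = (int)((long)(i + 1) * monoSamples.Length / bucketCount);
            buckets[i] = algorithm.Measure(monoSamples.Slice(start, end - start));
            max = Math.Max(max, buckets[i]);
        }
        if (max > 0.0)
            for (int i = 0; i < bucketCount; i++) buckets[i] /= max;   // all-silent input stays all zeros
        return buckets;
    }
}
```

Swapping algorithms is then a matter of passing a different `ILoudnessAlgorithm` to `Compute`; nothing else in the sketch changes.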
Cost: one linear pass over the PCM buffer. For a 100MB WAV at 16-bit stereo (4 bytes per frame)
that's ~25M sample frames, a few hundred ms of work, done **once at upload**, never on the
playback path.
### 5b. Data format on the wire
Front end needs `double[] profile` length N, each `[0,1]`. **Decided: quantized `byte[]` (each
bucket 0–255), base64 in JSON**, decoded to `[0,1]` client-side (`b/255.0`). 8-bit quantization
is *visually* lossless for a bar chart; at N=512 that's 512 bytes raw / ~684 chars base64 —
negligible to store and ship, and it keeps the profile from bloating the metadata payload if it
ever rides along with `TrackDto` (see §5d). The format is independent of N and of the loudness
algorithm — both RMS and a future LUFS profile quantize to the same `[0,1]`→byte wire shape.
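The round trip can be sketched as below. `ProfileWireFormat` is a hypothetical name; `Convert.ToBase64String` and `Convert.FromBase64String` are the standard .NET calls, and rounding to the nearest byte is an assumption (truncation would also fit the description).

```csharp
using System;

// Hypothetical helper; names and the rounding choice are assumptions for illustration.
public static class ProfileWireFormat
{
    // Quantizes [0,1] values to one byte each (0..255) and base64-encodes them.
    public static string Encode(ReadOnlySpan<double> profile)
    {
        var bytes = new byte[profile.Length];
        for (int i = 0; i < profile.Length; i++)
            bytes[i] = (byte)Math.Round(Math.Clamp(profile[i], 0.0, 1.0) * 255.0);
        return Convert.ToBase64String(bytes);
    }

    // Inverse: base64 -> bytes -> b / 255.0.
    public static double[] Decode(string base64)
    {
        byte[] bytes = Convert.FromBase64String(base64);
        var profile = new double[bytes.Length];
        for (int i = 0; i < bytes.Length; i++) profile[i] = bytes[i] / 255.0;
        return profile;
    }
}
```

For N=512 the encoded string is 684 characters (171 groups of 3 bytes, 4 characters each), matching the figure above. The worst-case quantization error is half a step, about 0.002, which is well below one pixel at typical bar heights.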
### 5c. When to compute
- **On upload (decided, for new tracks):** `UnifiedTrackService.UploadAsync` already processes
the WAV (`AddTrackFromWavAsync` → `AudioProcessor`). Add the `WaveformProfileService` pass there,
in the same read, and persist the profile alongside the track. Cost is paid once, by the
uploader (CMS admin), off the listener's path. This is the natural seam.
- **CMS PreProcessing panel (decided, for existing tracks):** **not** a CLI command. Existing
vault tracks predate the feature, so they need an explicit generation path — surfaced **in the
CMS** (`DeepDrftManager`) rather than as an offline job. The CMS track grid shows which tracks
are missing a profile and offers **1-click generation** per track (and/or a bulk action). The
compute runs server-side via the same `WaveformProfileService`. See Phase 5 (§12) for the panel
design.
- **On demand + cache (rejected):** computing lazily on first profile request spreads cost to
first-listen and needs a cache layer + cold-start penalty. Not worth its complexity given
upload is the only ingest and the CMS panel covers the backlog explicitly.
**Decided: compute on upload for new tracks; CMS PreProcessing panel for existing ones.** The
no-profile fallback (§4e) carries the UI in the meantime, so the seeker can ship before every
existing track has been processed. (Memory note: Daniel favours designing the seam now even when
deferring the feature — the no-profile fallback *is* that seam.)
### 5d. Where the data lives — vault sidecar (decided)
**Decided: option 3 — a sidecar in the FileDatabase vault + a dedicated endpoint (§6).** Store
the profile as its own vault entry (e.g. a `profiles` vault keyed by `EntryKey`, or a
`.profile`/`.wfp` companion next to the audio). The candidates and the reasoning for the choice:
1. **New column on `TrackEntity` / `track` table** (`WaveformProfile byte[]` or `text`). Profile
rides with metadata. Pro: one fetch (`GET api/track/page` or `meta/{id}` already returns
`TrackDto`). Con: bloats every paged list response by ~512B × pageSize (20 → ~10KB/page) even
when the player isn't open; `TrackEntity` is described in `CLAUDE.md` as "a join, *only*
metadata" — a binary blob stretches that contract.
2. **New column, but only returned by `meta/{id}` / a dedicated fetch — not by `page`.** Keeps
the list lean; the player fetches the profile when a track is selected. Needs the profile
field to be omittable from the paged DTO. (Second choice if a vault type is unwelcome — see
below.)
3. **A sidecar in the FileDatabase vault** (**chosen**) — store the profile as its own vault
entry keyed by `EntryKey`. Pro: keeps it out of SQL entirely, near the binary it describes,
consistent with "binary content lives in the vault." Con: a second vault round-trip to serve
it; new endpoint.
4. **Computed into the audio stream's header response** — no separate storage; return the profile
as a response header / preamble on `GET api/track/{id}`. Couples profile delivery to the audio
fetch. Awkward (headers for 512B, or a framing change to the WAV stream). Rejected.
**Rationale for the vault sidecar:**
- It honours the architectural line `CLAUDE.md` draws — `TrackEntity` stays pure metadata, the
vault owns "binary stuff about the audio." A loudness profile is derived binary content; it
belongs with the binary.
- It keeps the paged list response unchanged (no regression to `TracksView` load weight).
- It parallels the existing audio path exactly: the player already does a *separate* content
fetch (`TrackMediaClient` → `api/track/{id}`) distinct from the metadata fetch
(`TrackClient` → `api/track/page`). The profile is one more content fetch on track-select.
**Fallback if the vault type proves unwelcome: option 2** (SQL column, served only on `meta/{id}`
or a dedicated route). Simpler to migrate (one EF column) but puts derived binary in SQL. Not the
chosen path; recorded so the alternative is on the table if the vault sidecar hits friction.
The dual-database split here is real: metadata (SQL) vs derived-binary (vault). The profile is
derived binary. The vault sidecar keeps the split clean.
---
## 6. New API surface
Per the vault-sidecar storage (§5d, decided), add **one unauthenticated GET** that mirrors
the existing audio route's shape and proxy path:
### `GET api/track/{trackId}/waveform` (DeepDrftAPI, unauthenticated)
- **Route param `trackId`** (string) = `EntryKey`, same as `GET api/track/{trackId}`.
- Loads the stored profile for that entry (from the `profiles` vault / sidecar).
- Returns `200` with `WaveformProfileDto { int BucketCount; string Data; }` (base64 quantized
bytes), or `404` if no profile exists for that track (front end then renders the flat fallback,
§4e).
- Unauthenticated, like audio streaming — it's public listener data.
**Proxy:** add the matching forward in `DeepDrftPublic/Controllers/TrackProxyController.cs`
(currently forwards `page` and `{trackId}`); add `{trackId}/waveform`. Same thin-proxy pattern,
no logic.
**Client:** a method on `TrackMediaClient` (it owns the `DeepDrft.Content` client and the
content base address) — `GetWaveformProfileAsync(trackId) → ApiResult<WaveformProfileDto>`. Keeps
the profile fetch on the content client, consistent with §5d's "profile is content."
The CMS PreProcessing panel (§12 Phase 5) also needs server-side endpoints: a way to query which
tracks lack a profile and a way to trigger generation. Those are **authenticated CMS routes** on
`DeepDrftAPI` (ApiKey), distinct from this public read — see Phase 5 for their shape.
### New model
`WaveformProfileDto` in `DeepDrftModels` (new `DTOs/WaveformProfileDto.cs`):
```csharp
public class WaveformProfileDto
{
    public int BucketCount { get; set; }
    public string Data { get; set; } // base64 of byte[BucketCount], each 0..255
}
```
`DeepDrftModels` is referenced by every project (`CLAUDE.md`), so both API and client see it. The
DTO carries no algorithm tag — it is loudness-in-`[0,1]` regardless of how it was computed.
---
## 7. TypeScript seam
**First cut: no TS changes required.** The preprocessed profile arrives as data over HTTP,
is decoded in C# (`WaveformProfileDto.Data` → `double[]`), and rendered by the Blazor component
with pure pointer events (§4c/§4d). The TS audio bundle (`DeepDrftPublic/Interop/audio/`) is
untouched. The live `SpectrumVisualizer` keeps using the existing
`startSpectrumAnimation`/`SpectrumAnalyzer` path verbatim — only its *position* in the markup
changes.
**Deliberately deferred TS work (later enhancement, see §4e):** client-side loudness computation
from decoded `AudioBuffer`s for the no-profile fallback. That *would* need a new TS module
(e.g. `WaveformProfiler.ts`) reading `scheduler`'s decoded buffers and bucketing amplitude, plus
an interop method to stream buckets to Blazor as they fill. It mirrors `SpectrumAnalyzer`'s
callback pattern. **Not in the first cut** — the flat fallback covers the gap, and the CMS
PreProcessing panel removes most no-profile cases. Keep this seam in mind so the component's data
input is an abstract `double[]` that *could* later be fed by either source.
This matters for the component contract: `WaveformSeeker` should take its profile as a
parameter/observable whose origin it does not care about (preprocessed today, possibly
live-computed later). Don't hard-wire it to the HTTP fetch.
---
## 8. Frontend data flow
```
Track selected (TracksView.PlayTrack → PlayerService.SelectTrackStreaming)
├── (existing) audio: TrackMediaClient.GetTrackMedia(entryKey) → stream → TS decode → playback
└── (new) profile: TrackMediaClient.GetWaveformProfileAsync(entryKey) → WaveformProfileDto
    └── decode base64 → double[] profile → WaveformSeeker.Profile
```
Wiring options for *who* fetches the profile and holds it:
- **A. Player service holds it.** `StreamingAudioPlayerService` (or the base
`AudioPlayerService`) gains a `WaveformProfile` property, fetched when a track is selected,
exposed like `Duration`/`CurrentTime`. `WaveformSeeker` reads it off the cascaded
`IStreamingPlayerService`, re-rendering on `StateChanged` — the same pattern
`SpectrumVisualizer` and `AudioPlayerBar` already use. **Recommended:** the profile is part
of "current track state," and the player service is already the single source the seek zone
binds to. One place fetches, one place caches per track, cleared on `Unload`.
- **B. WaveformSeeker fetches its own.** Component takes `EntryKey` + `TrackMediaClient`,
fetches in `OnParametersSet` when the key changes. Simpler to reason about in isolation but
duplicates "current track" knowledge the player already owns and risks double-fetch / stale
key on rapid track switches.
- **C. A dedicated `WaveformProfileViewModel`** (MVVM convention in `CLAUDE.md`) scoped in DI,
fetches and caches by `EntryKey`, injected into the component. Cleanest separation, an extra
moving part. Reasonable if profiles get reused across views (e.g. mini-waveforms on track
cards later — see §10).
**Recommend A for the first cut** (profile as player-service state — matches the established
binding pattern and the "one source, multiple views" instinct: the seeker is just another view
over current-track state). Promote to C later if profiles need to be consumed outside the
player (track-card waveforms).
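A sketch of option A's state handling, including the guard against a stale result on rapid track switches. `WaveformProfileState` is a hypothetical standalone class used only to keep the example self-contained; in option A this logic would sit on the player service itself, and `fetchProfile` stands in for `TrackMediaClient.GetWaveformProfileAsync` plus the base64 decode.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Hypothetical holder for "current track" profile state; not an existing type.
public sealed class WaveformProfileState
{
    private readonly Func<string, Task<double[]?>> _fetchProfile;
    private string? _currentKey;

    public double[]? Profile { get; private set; }
    public event Action? StateChanged;

    public WaveformProfileState(Func<string, Task<double[]?>> fetchProfile) => _fetchProfile = fetchProfile;

    public async Task SelectTrackAsync(string entryKey)
    {
        _currentKey = entryKey;
        Profile = null;                        // flat fallback until the fetch lands
        StateChanged?.Invoke();

        double[]? fetched;
        try { fetched = await _fetchProfile(entryKey); }
        catch (Exception) { fetched = null; }  // fetch failure degrades to the flat fallback

        if (_currentKey != entryKey) return;   // a newer selection won the race; drop this result
        Profile = fetched;
        StateChanged?.Invoke();
    }

    public void Unload()
    {
        _currentKey = null;
        Profile = null;
        StateChanged?.Invoke();
    }
}
```

The key comparison after the `await` is what keeps a slow response for track A from overwriting the profile of track B when the listener switches quickly.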
`CurrentTime` / `Duration` for the playhead come from the player service exactly as
`PlayerSeekZone` reads them today — no change.
---
## 9. Component & file inventory
New:
- `DeepDrftPublic.Client/Controls/AudioPlayerBar/WaveformSeeker.razor` (+ `.razor.cs`, `.razor.css`)
- `DeepDrftModels/DTOs/WaveformProfileDto.cs`
- `DeepDrftContent/Processors/WaveformProfileService.cs` — owns the PCM walk, bucketing,
normalization, storage; takes an `ILoudnessAlgorithm`.
- `DeepDrftContent/Processors/ILoudnessAlgorithm.cs` — the swappable loudness strategy (§5a).
- `DeepDrftContent/Processors/RmsLoudnessAlgorithm.cs` — first implementation (RMS). LUFS is a
future sibling implementation, not built now.
- `WaveformProfileOptions` (config-bound) — carries `BucketCount` (default 512) and any future
algorithm-selection knob.
- DeepDrftAPI public read route `GET api/track/{trackId}/waveform` in `TrackController.cs` +
proxy in `TrackProxyController.cs`.
- DeepDrftAPI CMS routes (ApiKey) for the PreProcessing panel: query missing-profile tracks +
trigger generation (§12 Phase 5).
- `TrackMediaClient.GetWaveformProfileAsync`
- (storage) new `profiles` vault constant in `VaultConstants`.
- (CMS) PreProcessing panel surface in `DeepDrftManager` — see Phase 5 for the component/service
inventory (it lands with that phase, not the first cut).
Changed:
- `PlayerSeekZone.razor` — swap `MudSlider` block for `<WaveformSeeker/>`; drop the
`<SpectrumVisualizer/>` (moves to volume).
- `VolumeControls.razor` → renamed **`VolumeZone.razor`** (decided) — stack `<SpectrumVisualizer/>`
above the volume slider.
- `AudioPlayerBar.razor.css` — adjust volume cluster to host the spectrum; seeker sizing.
- `SpectrumVisualizer` — set `BucketCount=24` for the narrow volume slot (§3c).
- `AudioPlayerBar.razor.cs` — minimal; seek callbacks already abstract. Possibly hold/clear
`WaveformProfile` if §8-A.
- `StreamingAudioPlayerService` / `AudioPlayerService` — add `WaveformProfile` state + fetch (§8-A).
- `UnifiedTrackService.UploadAsync` — compute + persist profile on upload via `WaveformProfileService`.
Untouched (important): the entire TS audio bundle, the seek-beyond-buffer offset path,
`WavOffsetService`, the streaming decode pipeline.
---
## 10. Future options this unlocks (don't build now, leave room for)
- **LUFS (or other perceptual) loudness profile.** The `ILoudnessAlgorithm` seam (§5a) exists
precisely so this drops in as a second strategy without touching the component, wire format, or
storage. The cheapest of the future moves because the abstraction is built up front.
- **Track-card mini-waveforms.** Once profiles exist as a reusable resource, `TrackCard` could
show a tiny loudness sparkline. This is the argument for the §8-C `WaveformProfileViewModel`
eventually, and for storing profiles where non-player surfaces can fetch them cheaply (favours
the vault sidecar + endpoint, §5d-3).
- **Loudness-normalized playback / waveform colouring by energy.** The same profile data could
drive auto-gain or heat-coloured bars.
- **Live-computed profiles** for the no-profile case (§7 deferred TS).
- **Higher-res zoomed scrub** on long tracks (re-fetch a denser profile for a time window) —
why a generous, configurable stored N and client-side downsampling is worth it now.
Keep the component's profile input origin-agnostic and the stored resolution generous so these
stay cheap to add.
---
## 11. Decisions (resolved 2026-06-05)
All seven forks below are **decided**. Recorded here so the rationale travels with the spec.
1. **Storage location (§5d): vault sidecar + dedicated endpoint — decided ✓.** Profile is derived
binary; it lives in the vault, `TrackEntity` stays pure metadata, the paged list stays lean.
SQL-column-on-`meta/{id}` is the recorded fallback only if the vault type hits friction.
2. **Names (§1): component `WaveformSeeker`; data `WaveformProfile` (`WaveformProfileDto`,
`waveformBuckets`, `profile`) — decided ✓.** Honest naming; the data is named for the concept,
not the algorithm, so RMS→LUFS never forces a rename.
3. **Live-spectrum bucket count (§3c): 24 buckets, parameterized — decided ✓.** Set via
`BucketCount` on `SpectrumVisualizer` so it can be tuned without a code change.
4. **Stored resolution + wire format (§4a/§5b): N configurable (default 512) via
`WaveformProfileOptions`; quantized `byte[]` base64 — decided ✓.** Front end derives its
rendered bar count from available width regardless of N.
5. **Backfill (§5c): CMS PreProcessing panel, not a CLI — decided ✓.** The CMS track grid shows
missing-profile tracks and offers 1-click generation per track (and/or bulk); compute runs
server-side via `WaveformProfileService`. See §12 Phase 5.
6. **Normalization (§5a): peak-normalize — decided ✓.** Per-track shape over cross-track absolute
loudness; a future LUFS algorithm can normalize differently behind the same interface.
7. **`VolumeControls` → `VolumeZone` rename (§3b) — decided ✓.** Symmetry with the transport and
seek zones.
**Cross-cutting decision (§5a):** the loudness measure is a swappable `ILoudnessAlgorithm`, RMS
first, LUFS the named future alternative — not hardwired to RMS.
---
## 12. Implementation phases (ordered, delegable)
Sequenced so each phase has a shippable deliverable and the UI can land before existing tracks
are all preprocessed. Phases 1–2 (backend) and phase 3 (layout move) are **parallelizable**:
they touch disjoint files and meet only at the client fetch in phase 4. §11 decisions are all
resolved, so there is no decisions-gate phase.
**Phase 1 — Loudness computation + storage (backend).** `WaveformProfileService` in
`DeepDrftContent` (extend the existing PCM walk) with an `ILoudnessAlgorithm` strategy and the
`RmsLoudnessAlgorithm` first implementation. Wire into `UnifiedTrackService.UploadAsync` to
compute + persist on upload (vault sidecar, §5d). Add `WaveformProfileDto` to `DeepDrftModels` and
`WaveformProfileOptions` (default N=512).
*Deliverable:* new uploads get a stored profile; unit-test the RMS math against a known WAV, and
unit-test that a second `ILoudnessAlgorithm` swaps in cleanly (guards the abstraction).
**Phase 2 — Public read API + proxy + client (backend/transport).** Add
`GET api/track/{trackId}/waveform`, the proxy forward, and `TrackMediaClient.GetWaveformProfileAsync`.
*Deliverable:* a track's profile is fetchable end-to-end over HTTP. Can be tested with curl
before any UI.
**Phase 3 — Layout move (frontend, parallel with 1–2).** Move `SpectrumVisualizer` from
`PlayerSeekZone` into the volume cluster (renamed `VolumeZone`); adjust CSS (§3c); set
`BucketCount=24`.
*Deliverable:* live spectrum sits above the volume slider; seek zone temporarily keeps the
MudSlider (or a placeholder). Player still fully works. This de-risks the layout independently
of the new component.
**Phase 4 — WaveformSeeker component (frontend, needs 2 + 3).** Build `WaveformSeeker.razor`:
DOM bars, played/unplayed split via clip overlay (§4d), pointer-capture seek (§4c), flat
fallback (§4e), rendered bar count derived from width. Wire profile via player-service state
(§8-A). Replace the MudSlider in `PlayerSeekZone` with it.
*Deliverable:* the new seekbar is live for tracks that have a profile; flat-but-seekable for
those that don't.
**Phase 5 — CMS PreProcessing panel (CMS, after 1).** In `DeepDrftManager`, add a PreProcessing
feature to the CMS track grid: a column/indicator showing which tracks **lack a waveform profile**
and a per-track **Generate** action (and/or a bulk "generate all missing" action). The grid
queries missing-profile state and triggers generation through authenticated CMS API routes on
`DeepDrftAPI`; the compute runs server-side via the same `WaveformProfileService` (no CLI). New
surface roughly: a CMS service method on `ICmsTrackService`/`CmsTrackService` for
list-missing + generate, the backing `DeepDrftAPI` routes (ApiKey), and the grid column/action in
the Tracks CMS page.
*Deliverable:* a CMS admin can see and one-click-fill any missing profile; the no-profile fallback
becomes rare/never as the backlog is worked off in-app.
**Deferred (not scheduled):** live client-side loudness compute (§7), track-card mini-waveforms
(§10), a LUFS `ILoudnessAlgorithm` (§5a/§10). Tracked here so the component contract stays
origin-agnostic and the algorithm stays swappable.
---
## 13. What this plan deliberately does NOT do
- Does not touch the streaming decode pipeline, seek-beyond-buffer, or `WavOffsetService`.
- Does not add an audio-processing dependency (NAudio etc.) for the RMS path — the existing PCM
parser suffices. (A future LUFS `ILoudnessAlgorithm` may revisit this, on its own merits.)
- Does not compute the profile on the playback path — preprocessed only (the whole point).
- Does not change `TrackEntity`'s metadata contract — the profile lives in the vault sidecar.
- Does not add a CLI; existing-track preprocessing is the in-CMS PreProcessing panel (§12 Phase 5).
- Does not require TS bundle changes in the first cut.