Loudness-waveform seekbar replacing MudSlider; ILoudnessAlgorithm abstraction (RMS first, LUFS future); vault sidecar storage; CMS PreProcessing panel for backfill; VolumeZone rename. All decisions resolved 2026-06-05.
WaveformSeeker — loudness-waveform seekbar to replace the MudSlider
Status: approved. Decisions resolved 2026-06-05. Author: product-designer. Date: 2026-06-05. Plan only — no code edits made by this doc.
1. Summary
Replace the MudSlider-based scrub bar in PlayerSeekZone.razor with a new
<WaveformSeeker/> component that renders the track's loudness profile as a
high-density vertical bar chart and serves as the seek surface (click / drag to seek).
The point is to make the seekbar informative: instead of a featureless line, the listener sees the track's energy shape — the quiet intro, the drop, the breakdown, the outro — and can scrub against that shape. This is the established "waveform scrubber" idiom from SoundCloud, Overcast, and most DAW transport bars. We are borrowing it deliberately; the novel part for us is only that the profile is preprocessed server-side and shipped as a small quantized array, so the visual paints the instant a track loads rather than waiting for the audio to decode.
The loudness measure is not hardcoded to RMS. The first implementation computes RMS, but
the compute path is built around a swappable ILoudnessAlgorithm abstraction (§5a) so a
different perceptual loudness profile (e.g. LUFS) can be substituted later without touching the
component, the wire format, or the storage. The component and the data are named for the
concept (waveform / loudness profile), not the algorithm.
Two visualizations currently coexist in the seek zone. They are being separated by kind:
- Real-time spectrum (FFT frequency bars, SpectrumVisualizer.razor): a live readout of "what is sounding right now." This moves up, above the volume slider.
- Static loudness-over-time (the new WaveformSeeker): a whole-track readout of "how loud is each moment." This takes over the seek area.
This is a clean conceptual split: live-frequency lives with the output level (volume), whole-track-amplitude lives with the transport position (seek). The current arrangement (real-time spectrum behind the seek slider) conflates the two.
Naming (decided)
"Spectrum" properly means frequency content; what this component shows is amplitude over
time, not spectrum. The component is named honestly: WaveformSeeker (decided), which
reads correctly against the live SpectrumVisualizer (frequency) without implying FFT data is
in the payload. The data is named for the concept, not the algorithm: WaveformProfile /
WaveformProfileDto / waveformBuckets / a profile field — so substituting the loudness
algorithm (RMS → LUFS, §5a) never forces a rename of the type that carries it.
2. Current state (what we're changing)
The seek zone today (PlayerSeekZone.razor):
```razor
<MudStack Row="false" Spacing="0" Class="@Class">
    <SpectrumVisualizer/>                           @* live FFT bars, sits on top *@
    <div class="mx-3" @onpointerdown/up/leave>
        <MudSlider .../>                            @* the scrub bar *@
    </div>
    <TimestampLabel CurrentTime=... Duration=.../>  @* time text *@
</MudStack>
```
Relevant mechanics already in place that the new component must preserve:
- Seek gesture plumbing lives in PlayerSeekZone.razor.cs: the OnSeekStart / OnSeekChange / OnSeekEnd callbacks bubble to AudioPlayerBar.razor.cs, which sets _isSeeking, tracks _seekPosition, and calls PlayerService.Seek(position) on release. DisplayTime shows the drag position while seeking, the real CurrentTime otherwise.
- CanSeek = IsLoaded && Duration.HasValue && Duration > 0.
- Seek is allowed during streaming, including beyond the buffer (the offset-refetch path in StreamingAudioPlayerService / AudioPlayer.ts seekBeyondBuffer). The new component does not touch that path; it only produces a target time and hands it to the same Seek(double) call.
- SpectrumVisualizer is driven entirely by AudioInteropService.StartSpectrumAnimationAsync, which subscribes a callback to the TS SpectrumAnalyzer (live FFT, ~30fps). It already self-manages its animation lifecycle off PlayerService.StateChanged. Moving it is a pure layout move, with no logic change.
- Player layout (AudioPlayerBar.razor.css) is pure-CSS responsive: at ≥600px the row is [transport] [seek grows] [volume]; at <600px it's [transport][volume] then full-width seek below. Wherever the spectrum lands, it must respect this.
3. UI layout changes
3a. What moves
| Element | Today | After |
|---|---|---|
| Live FFT spectrum (SpectrumVisualizer) | Inside PlayerSeekZone, above the slider | Inside the volume cluster, above the volume slider |
| Scrub bar (MudSlider) | PlayerSeekZone | Replaced by WaveformSeeker (loudness bars + playhead) |
| Timestamp (TimestampLabel) | Below the slider in PlayerSeekZone | Stays with the seeker (below or overlaid on the bars) |
| Volume slider (VolumeControls) | Right cluster | Unchanged position; now has the live spectrum stacked above it |
3b. Resulting zones
- Transport zone: unchanged (play/pause/stop + load spinner).
- Volume zone: becomes a small vertical stack, with the live FFT spectrum on top and the volume slider below. This is a natural pairing ("here's the live output, here's how loud"). VolumeControls.razor gets the <SpectrumVisualizer/> stacked above its existing MudStack. The wrapper is renamed VolumeZone (decided) for symmetry with the other two zones.
- Seek zone: becomes the WaveformSeeker, a wide loudness bar chart that grows to fill the available width (it inherits the flex-grow:1 the seek zone has today), with the timestamp beneath.
3c. Layout risk
The live spectrum is currently a wide element. Stacking it above the volume slider
constrains it to the narrow right cluster — at ≥600px the volume cluster is only as wide as
the slider (the CSS halves and flex-start-pins it per commit 78c6803). A 32-bucket FFT bar
chart squeezed into ~120px will look cramped.
Decided: 24 buckets in the volume cluster, parameterized. The live spectrum renders 24
buckets in the narrow volume slot, set via the existing BucketCount parameter on
SpectrumVisualizer so the count can be tuned without a code change to the component. 24 reads
denser than 16 while still fitting the ~120px cluster comfortably.
4. WaveformSeeker component design
4a. Data → geometry
The component receives a normalized loudness profile: double[] profile, each value in [0,1],
representing the loudness measure of a contiguous time slice. Profile length is N buckets
covering the whole track regardless of duration (fixed bucket count, variable bucket
duration). Each bucket renders as one vertical bar; bar height = profile[i] scaled to the
component height (with a small floor, ~2%, so silence is still visible as a hairline — mirrors
SpectrumVisualizer.GetBarHeight).
Bar count. Two regimes:
- Preprocessed resolution (N): how many buckets the backend computes and stores. N is configurable (e.g. via WaveformProfileOptions bound from DI/config), default 512. A high source resolution lets the front end downsample to whatever fits the rendered width without re-fetching. Storage is tiny regardless of N (see §5).
- Rendered resolution: how many bars actually draw, = pixels-available / (bar + gap). The front end derives its rendered bar count from the available width, regardless of N; it does not assume the stored N is the bar count. At a typical ~600px seek zone with 2px bars + 1px gaps that's ~200 bars. The component downsamples N → rendered count by max-or-mean over each rendered bucket's source range. Use max (peak) for the visual: peak-per-bucket gives the punchy DAW look, while mean flattens transients.
Decided: N configurable, default 512; rendered count derived from width; downsample by peak.
512 is a clean power of two, downsamples evenly to 256/128/64, and is 512 bytes raw (~684 characters once base64-encoded) as
quantized bytes (§5b). The wire format is the quantized byte[] base64 either way; N being
configurable does not change the format.
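The two regimes above can be sketched as one small pure helper. This is a minimal sketch, assuming the stored profile has already been decoded to double[]; the class WaveformGeometry and its methods are hypothetical names, and the 2% floor mirrors the one described for SpectrumVisualizer.GetBarHeight rather than reusing it.

```csharp
using System;
#nullable enable

// Hypothetical helper for WaveformSeeker: stored profile (length N) -> rendered bars.
// Names and signatures are illustrative, not existing members of the codebase.
public static class WaveformGeometry
{
    public const double FloorFraction = 0.02; // ~2% floor so silence shows as a hairline

    // Rendered bars = available pixels / (bar + gap), at least one bar.
    public static int RenderedBarCount(double widthPx, double barPx, double gapPx) =>
        Math.Max(1, (int)Math.Floor(widthPx / (barPx + gapPx)));

    // Peak-downsample N source buckets into `count` bars: each bar takes the max over
    // its source range so transients survive. A null profile (nothing stored yet)
    // yields all zeros, which BarHeight turns into floor-height fallback bars.
    public static double[] DownsamplePeak(double[]? profile, int count)
    {
        var bars = new double[count];
        if (profile is null || profile.Length == 0) return bars;

        for (int i = 0; i < count; i++)
        {
            // Source range [start, end) for bar i; always covers at least one bucket.
            int start = (int)((long)i * profile.Length / count);
            int end = (int)((long)(i + 1) * profile.Length / count);
            if (end <= start) end = start + 1;

            double peak = 0;
            for (int j = start; j < end && j < profile.Length; j++)
                peak = Math.Max(peak, profile[j]);
            bars[i] = peak;
        }
        return bars;
    }

    // Bar height as a fraction of the component height, clamped to [floor, 1].
    public static double BarHeight(double value) =>
        Math.Max(FloorFraction, Math.Clamp(value, 0.0, 1.0));
}
```

With the ~600px / 2px / 1px example, RenderedBarCount(600, 2, 1) gives 200, and each of those bars covers roughly 2.5 of the 512 stored buckets. Passing null produces a flat row of floor-height bars, which matches the no-profile fallback described in §4e.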
4b. Playhead / progress indication
The current position is shown two ways simultaneously (both cheap, both standard):
- Played/unplayed split: bars left of the playhead render in the played colour (moss green --deepdrft-green-accent, matching the house waveform identity called out in track-card-theming.md), bars right render muted. The split point = CurrentTime / Duration.
- Playhead line: a 1–2px vertical rule at the split, for precision.
While dragging, the split/line follow the pointer (DisplayTime), not playback — same
_isSeeking discipline as today.
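The split point and its drag behaviour reduce to two expressions. A minimal sketch, assuming times in seconds; PlayheadMath is a hypothetical name, and its parameters stand in for AudioPlayerBar's _isSeeking / _seekPosition and the player's CurrentTime / Duration.

```csharp
using System;

// Hypothetical sketch: where the played/unplayed split and playhead line sit.
public static class PlayheadMath
{
    // While dragging, follow the pointer (seek position); otherwise follow playback.
    public static double DisplayTime(bool isSeeking, double seekPosition, double currentTime) =>
        isSeeking ? seekPosition : currentTime;

    // DisplayTime / Duration, clamped to [0,1]; 0 while the duration is unknown.
    public static double Progress(double displayTime, double? duration) =>
        duration.HasValue && duration.Value > 0
            ? Math.Clamp(displayTime / duration.Value, 0.0, 1.0)
            : 0.0;
}
```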
4c. Interaction model
Pointer-based, reusing the existing callback contract so AudioPlayerBar.razor.cs is barely
touched:
- Hover → a faint preview line at the cursor + a tooltip/label showing the time under the cursor (hoverTime = (cursorX / width) * Duration). Preview only; no seek. (New affordance; the MudSlider had none. Borrowed from SoundCloud/YouTube scrubbers.)
- Click → seek to clickX / width * Duration. Fires OnSeekStart then immediately OnSeekEnd(clickTime).
- Drag → pointerdown starts seeking (OnSeekStart), pointermove updates the preview position and fires OnSeekChange(t) (so DisplayTime and the played/unplayed split track the drag live), pointerup commits (OnSeekEnd(t) → PlayerService.Seek(t)). pointerleave while dragging commits at the last position (matches the current HandlePointerLeave behaviour); or, better, use pointer capture (setPointerCapture) so a drag that leaves the element keeps tracking until release. Recommend pointer capture; it's the more forgiving gesture and avoids the "lost the drag" feel.
Position math needs the element's pixel width and the pointer's offset. Two implementations:
- Pure Blazor: use @onpointermove / @onpointerdown with PointerEventArgs.OffsetX and a cached bounding width (one JS getBoundingClientRect call on resize). Simple, no per-frame interop.
- Thin JS helper: a tiny interop that does hit-testing and returns a normalized [0,1] fraction. Only worth it if OffsetX proves unreliable across the responsive reflows.
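Recommend pure-Blazor pointer events first, with OffsetX/cached width; fall back to a JS
helper only if hit-testing is flaky. Keeps the new surface out of the TS bundle (see §7).
Two caveats for the pure-Blazor path: OffsetX is measured against the event target, so the bar
children need pointer-events: none for it to be relative to the seeker container rather than to
whichever bar is under the cursor; and setPointerCapture is a DOM call, so (like
getBoundingClientRect) it needs a one-line interop call.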
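Hover, click, and drag all share the same position-to-time mapping. A sketch under the pure-Blazor assumptions: offsetX comes from PointerEventArgs.OffsetX measured against the seeker container, and widthPx is the cached bounding width. SeekMath is a hypothetical helper name.

```csharp
using System;

// Hypothetical sketch of the seek-surface math (not an existing type).
public static class SeekMath
{
    // Normalized [0,1] position under the pointer. Clamped because a captured drag
    // keeps reporting offsets past either edge of the element.
    public static double Fraction(double offsetX, double widthPx) =>
        widthPx > 0 ? Math.Clamp(offsetX / widthPx, 0.0, 1.0) : 0.0;

    // hoverTime / clickTime / drag target = fraction * Duration.
    public static double TimeAt(double offsetX, double widthPx, double duration) =>
        Fraction(offsetX, widthPx) * duration;
}
```

A click would then fire OnSeekStart followed by OnSeekEnd(SeekMath.TimeAt(e.OffsetX, width, duration)), and a drag would pass the same value to OnSeekChange on each pointermove.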
4d. Rendering approach
- DOM bars (one <div> per rendered bar, CSS --bar-height): exactly how SpectrumVisualizer works today, so it's consistent and themeable via the existing deepdrft- tokens. At ~200 bars this is fine; Blazor diffing over 200 static divs that only change a CSS var on seek is cheap.
- Canvas: one <canvas>, drawn via a small JS interop on load + on playhead move. Scales to thousands of bars and avoids 200-node diffs, but pulls the component into the JS interop layer and complicates theming (canvas can't read CSS vars without plumbing).
Recommend DOM bars to match the existing visualizer and stay in pure Blazor/CSS. Revisit
canvas only if profiling shows the seek-time re-render (recolouring the split) janks. The
played/unplayed split can be done without re-rendering every bar by overlaying a clipped
coloured layer — render the bars once in the played colour, lay a muted-colour copy clipped to
width * (1 - progress) from the right on top. Then a seek only moves one clip rect, not 200
divs. This is the key perf trick; call it out for the implementer.
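The clip-overlay trick reduces to one inline style on the muted layer. A minimal sketch, assuming the muted copy is absolutely positioned over the played-colour bars and progress is the [0,1] split point; SplitOverlay is a hypothetical name. CSS clip-path: inset() takes top, right, bottom, left offsets, so clipping progress% off the left edge leaves exactly the unplayed width * (1 - progress) visible on the right.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch: the only thing that changes on seek is this style string.
public static class SplitOverlay
{
    public static string MutedLayerStyle(double progress)
    {
        double leftInsetPct = Math.Clamp(progress, 0.0, 1.0) * 100.0;
        // inset(top right bottom left): hide the played portion of the muted copy.
        return "clip-path: inset(0 0 0 " +
               leftInsetPct.ToString("0.###", CultureInfo.InvariantCulture) + "%);";
    }
}
```

Formatting with the invariant culture keeps the decimal separator a dot, so the style stays valid CSS under locales that would otherwise produce "25,5%".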
4e. No-profile-yet state (important)
A track may have no stored loudness profile (legacy tracks uploaded before this feature; profile fetch failed; profile still computing). The component must degrade, not break:
- Fallback bars: render a flat row of floor-height bars (or a gentle idle shimmer) so the control still reads as a seekbar and remains fully seekable (geometry is just time/width; it needs no profile data to seek). Seek must never depend on the profile being present.
- Optional client-side compute: once audio is decoded, the front end could compute a
loudness profile from the decoded
AudioBuffers and fill the bars live (progressive reveal as the stream decodes). This is a real fallback but adds a TS path (§7); treat as a later enhancement, not part of the first cut. First cut: preprocessed profile or flat fallback.
Recommend: first cut ships preprocessed-or-flat. Seekability is never gated on the profile.
5. Backend loudness preprocessing
This is the load-bearing design decision. Three sub-questions: how to compute, when to compute, where to store.
5a. How to compute (swappable loudness algorithm)
The loudness measure is an abstraction, not a hardwired RMS pass (decided). WaveformProfileService
in DeepDrftContent owns the PCM walk, bucketing, normalization, and storage; the per-bucket
loudness calculation is delegated to an injected ILoudnessAlgorithm strategy. The first
implementation is RMS (RmsLoudnessAlgorithm); LUFS (or another perceptual profile) is the
named future alternative, droppable in as a second ILoudnessAlgorithm without touching the
service, the wire format, the storage, or the component.
Sketch of the seam (illustrative, not prescriptive):
```csharp
using System;

public interface ILoudnessAlgorithm
{
    // Given the mono samples for one time slice, return a non-negative loudness value;
    // the service peak-normalizes the buckets to [0,1] afterwards.
    double Measure(ReadOnlySpan<float> sliceSamples);
}
// First impl: RMS, sqrt(mean(sample²)). Future: LUFS (K-weighting + gating).
```
We already own a PCM-WAV parser: AudioProcessor in DeepDrftContent parses RIFF/WAVE/fmt/
data, validates PCM, and knows channels / sampleRate / bitsPerSample / blockAlign / dataSize.
Computing the profile is a straightforward extension of that same buffer walk — no new audio
library needed for RMS. The stack is PCM-only WAV today (AudioProcessor rejects non-PCM), so
WaveformProfileService can read samples directly:
- Locate the data chunk (already done in ValidateWavStructure / FindChunk).
- Walk the PCM samples, decode per bitsPerSample (16/24/32-bit signed; 8-bit unsigned), average channels to mono.
- Partition the sample stream into N equal time slices (N from WaveformProfileOptions, default 512); hand each slice to ILoudnessAlgorithm.Measure to get bucket[i].
- Normalize: divide by the max bucket (peak-normalize to [0,1], decided) so quiet tracks still show shape. (Trade-off: peak-normalizing loses the absolute-loudness comparison between tracks. Acceptable: the seeker is about this track's shape, not cross-track loudness. A future LUFS algorithm that wants absolute units can normalize differently behind the same interface.)
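The decode step in the list above can be sketched as follows. It assumes the data-chunk bytes and the fmt fields (channels, bitsPerSample) have already been parsed by AudioProcessor; PcmDecoder and DecodeToMono are hypothetical names, not existing members. Every bit depth is scaled to roughly [-1, 1] so the loudness algorithm always sees the same range.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical sketch of the PCM walk: integer PCM, little-endian, channels averaged.
public static class PcmDecoder
{
    public static float[] DecodeToMono(ReadOnlySpan<byte> data, int channels, int bitsPerSample)
    {
        if (bitsPerSample is not (8 or 16 or 24 or 32))
            throw new NotSupportedException($"{bitsPerSample}-bit PCM is not handled here.");

        int bytesPerSample = bitsPerSample / 8;
        int blockAlign = bytesPerSample * channels; // bytes per frame
        int frames = data.Length / blockAlign;
        var mono = new float[frames];

        for (int f = 0; f < frames; f++)
        {
            double sum = 0;
            for (int c = 0; c < channels; c++)
            {
                var s = data.Slice(f * blockAlign + c * bytesPerSample, bytesPerSample);
                sum += bitsPerSample switch
                {
                    8 => (s[0] - 128) / 128.0,                                  // unsigned, centred on 128
                    16 => BinaryPrimitives.ReadInt16LittleEndian(s) / 32768.0,
                    24 => ((s[2] << 24 | s[1] << 16 | s[0] << 8) >> 8) / 8388608.0, // shift sign-extends
                    _ => BinaryPrimitives.ReadInt32LittleEndian(s) / 2147483648.0,  // 32-bit (validated above)
                };
            }
            mono[f] = (float)(sum / channels);
        }
        return mono;
    }
}
```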
The PCM walk + bucketing + RMS Measure is ~40 lines. No external dependency for the RMS path.
Do not pull in NAudio or similar for RMS — the existing parser already does the hard part. A
future LUFS implementation may justify a dep; if so, that decision rides with that algorithm,
not the service.
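The bucketing and normalization steps, with RMS as the injected strategy, might look like this. It is a sketch, assuming mono samples from the decode step described above; ProfileBuilder is a hypothetical stand-in for the bucketing part of WaveformProfileService, and the interface repeats the illustrative ILoudnessAlgorithm seam so the block compiles on its own.

```csharp
using System;

// Repeats the illustrative seam from §5a.
public interface ILoudnessAlgorithm
{
    double Measure(ReadOnlySpan<float> sliceSamples);
}

// RMS: sqrt(mean(sample²)); an empty slice measures as silence.
public sealed class RmsLoudnessAlgorithm : ILoudnessAlgorithm
{
    public double Measure(ReadOnlySpan<float> sliceSamples)
    {
        if (sliceSamples.IsEmpty) return 0;
        double sumSquares = 0;
        foreach (float s in sliceSamples) sumSquares += (double)s * s;
        return Math.Sqrt(sumSquares / sliceSamples.Length);
    }
}

// Hypothetical: partition into bucketCount equal slices, measure, peak-normalize.
public static class ProfileBuilder
{
    public static double[] Build(float[] mono, int bucketCount, ILoudnessAlgorithm algorithm)
    {
        var buckets = new double[bucketCount];
        double peak = 0;
        for (int i = 0; i < bucketCount; i++)
        {
            int start = (int)((long)i * mono.Length / bucketCount);
            int end = (int)((long)(i + 1) * mono.Length / bucketCount);
            buckets[i] = algorithm.Measure(mono.AsSpan(start, end - start));
            peak = Math.Max(peak, buckets[i]);
        }
        if (peak > 0) // an all-silent track stays all-zero instead of dividing by zero
            for (int i = 0; i < bucketCount; i++) buckets[i] /= peak;
        return buckets;
    }
}
```

Swapping the strategy is a one-argument change to Build, which is also the shape of the Phase 1 test that a second ILoudnessAlgorithm swaps in cleanly.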
Cost: one linear pass over the PCM buffer. For a 100MB 16-bit stereo WAV that's ~25M sample frames (4 bytes each), a few hundred ms, done once at upload, never on the playback path.
5b. Data format on the wire
Front end needs double[] profile length N, each [0,1]. Decided: quantized byte[] (each
bucket 0–255), base64 in JSON, decoded to [0,1] client-side (b/255.0). 8-bit quantization
is visually lossless for a bar chart; at N=512 that's 512 bytes raw / ~684 chars base64 —
negligible to store and ship, and it keeps the profile from bloating the metadata payload if it
ever rides along with TrackDto (see §5d). The format is independent of N and of the loudness
algorithm — both RMS and a future LUFS profile quantize to the same [0,1]→byte wire shape.
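Both halves of the wire format fit in a few lines. A sketch, assuming the [0,1] profile from the service and the BucketCount / Data pair carried by WaveformProfileDto; WaveformWire is a hypothetical name.

```csharp
using System;

// Hypothetical sketch: [0,1] -> byte -> base64 on the server, and back on the client.
public static class WaveformWire
{
    public static string Encode(double[] profile)
    {
        var bytes = new byte[profile.Length];
        for (int i = 0; i < profile.Length; i++)
            bytes[i] = (byte)Math.Round(Math.Clamp(profile[i], 0.0, 1.0) * 255.0);
        return Convert.ToBase64String(bytes);
    }

    // Client side: b / 255.0 per bucket, after checking the length against BucketCount.
    public static double[] Decode(string data, int bucketCount)
    {
        byte[] bytes = Convert.FromBase64String(data);
        if (bytes.Length != bucketCount)
            throw new FormatException($"Expected {bucketCount} buckets, got {bytes.Length}.");
        var profile = new double[bytes.Length];
        for (int i = 0; i < bytes.Length; i++) profile[i] = bytes[i] / 255.0;
        return profile;
    }
}
```

The quantization error is at most half a step (1/510), about 0.2px on a 100px-tall seeker, which is why 8 bits is visually lossless here. At N=512, Encode returns a 684-character string, matching the figure above.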
5c. When to compute
- On upload (decided, for new tracks): UnifiedTrackService.UploadAsync already processes the WAV (AddTrackFromWavAsync → AudioProcessor). Add the WaveformProfileService pass there, in the same read, and persist the profile alongside the track. Cost is paid once, by the uploader (CMS admin), off the listener's path. This is the natural seam.
- CMS PreProcessing panel (decided, for existing tracks): not a CLI command. Existing vault tracks predate the feature, so they need an explicit generation path, surfaced in the CMS (DeepDrftManager) rather than as an offline job. The CMS track grid shows which tracks are missing a profile and offers 1-click generation per track (and/or a bulk action). The compute runs server-side via the same WaveformProfileService. See Phase 5 (§12) for the panel design.
- On demand + cache (rejected): computing lazily on the first profile request pushes the cost onto the first listen and needs a cache layer plus a cold-start penalty. Not worth its complexity given that upload is the only ingest and the CMS panel covers the backlog explicitly.
Decided: compute on upload for new tracks; CMS PreProcessing panel for existing ones. The no-profile fallback (§4e) carries the UI in the meantime, so the seeker can ship before every existing track has been processed. (Memory note: Daniel favours designing the seam now even when deferring the feature — the no-profile fallback is that seam.)
5d. Where the data lives — vault sidecar (decided)
Decided: option 3 — a sidecar in the FileDatabase vault + a dedicated endpoint (§6). Store
the profile as its own vault entry (e.g. a profiles vault keyed by EntryKey, or a
.profile/.wfp companion next to the audio). The candidates and the reasoning for the choice:
1. New column on TrackEntity / track table (WaveformProfile byte[] or text). Profile rides with metadata. Pro: one fetch (GET api/track/page or meta/{id} already returns TrackDto). Con: bloats every paged list response by ~512B raw (~684B once base64-encoded in JSON) × pageSize (20 → ~10-14KB/page) even when the player isn't open; TrackEntity is described in CLAUDE.md as "a join, only metadata", and a binary blob stretches that contract.
2. New column, but only returned by meta/{id} / a dedicated fetch, not by page. Keeps the list lean; the player fetches the profile when a track is selected. Needs the profile field to be omittable from the paged DTO. (Second choice if a vault type is unwelcome; see below.)
3. A sidecar in the FileDatabase vault (chosen): store the profile as its own vault entry keyed by EntryKey. Pro: keeps it out of SQL entirely, near the binary it describes, consistent with "binary content lives in the vault." Con: a second vault round-trip to serve it; new endpoint.
4. Computed into the audio stream's header response: no separate storage; return the profile as a response header / preamble on GET api/track/{id}. Couples profile delivery to the audio fetch. Awkward (headers for 512B, or a framing change to the WAV stream). Rejected.
Rationale for the vault sidecar:
- It honours the architectural line CLAUDE.md draws: TrackEntity stays pure metadata, and the vault owns "binary stuff about the audio." A loudness profile is derived binary content; it belongs with the binary.
- It keeps the paged list response unchanged (no regression to TracksView load weight).
- It parallels the existing audio path exactly: the player already does a separate content fetch (TrackMediaClient → api/track/{id}) distinct from the metadata fetch (TrackClient → api/track/page). The profile is one more content fetch on track-select.
Fallback if the vault type proves unwelcome: option 2 (SQL column, served only on meta/{id}
or a dedicated route). Simpler to migrate (one EF column) but puts derived binary in SQL. Not the
chosen path; recorded so the alternative is on the table if the vault sidecar hits friction.
The dual-database split here is real: metadata (SQL) vs derived-binary (vault). The profile is derived binary. The vault sidecar keeps the split clean.
6. New API surface
Per the vault-sidecar storage (§5d, decided), add one unauthenticated GET that mirrors the existing audio route's shape and proxy path:
GET api/track/{trackId}/waveform (DeepDrftAPI, unauthenticated)
- Route param trackId (string) = EntryKey, same as GET api/track/{trackId}.
- Loads the stored profile for that entry (from the profiles vault / sidecar).
- Returns 200 with WaveformProfileDto { int BucketCount; string Data; } (base64 quantized bytes), or 404 if no profile exists for that track (the front end then renders the flat fallback, §4e).
- Unauthenticated, like audio streaming: it's public listener data.
Proxy: add the matching forward in DeepDrftPublic/Controllers/TrackProxyController.cs
(currently forwards page and {trackId}); add {trackId}/waveform. Same thin-proxy pattern,
no logic.
Client: a method on TrackMediaClient (it owns the DeepDrft.Content client and the
content base address) — GetWaveformProfileAsync(trackId) → ApiResult<WaveformProfileDto>. Keeps
the profile fetch on the content client, consistent with §5d's "profile is content."
The CMS PreProcessing panel (§12 Phase 5) also needs server-side endpoints: a way to query which
tracks lack a profile and a way to trigger generation. Those are authenticated CMS routes on
DeepDrftAPI (ApiKey), distinct from this public read — see Phase 5 for their shape.
New model
WaveformProfileDto in DeepDrftModels (new DTOs/WaveformProfileDto.cs):
```csharp
public class WaveformProfileDto
{
    public int BucketCount { get; set; }
    public string Data { get; set; } = string.Empty; // base64 of byte[BucketCount], each 0..255
}
```
DeepDrftModels is referenced by every project (CLAUDE.md), so both API and client see it. The
DTO carries no algorithm tag — it is loudness-in-[0,1] regardless of how it was computed.
7. TypeScript seam
First cut: no TS changes required. The preprocessed profile arrives as data over HTTP,
is decoded in C# (WaveformProfileDto.Data → double[]), and rendered by the Blazor component
with pure pointer events (§4c/§4d). The TS audio bundle (DeepDrftPublic/Interop/audio/) is
untouched. The live SpectrumVisualizer keeps using the existing
startSpectrumAnimation/SpectrumAnalyzer path verbatim — only its position in the markup
changes.
Deliberately deferred TS work (later enhancement, see §4e): client-side loudness computation
from decoded AudioBuffers for the no-profile fallback. That would need a new TS module
(e.g. WaveformProfiler.ts) reading scheduler's decoded buffers and bucketing amplitude, plus
an interop method to stream buckets to Blazor as they fill. It mirrors SpectrumAnalyzer's
callback pattern. Not in the first cut — the flat fallback covers the gap, and the CMS
PreProcessing panel removes most no-profile cases. Keep this seam in mind so the component's data
input is an abstract double[] that could later be fed by either source.
This matters for the component contract: WaveformSeeker should take its profile as a
parameter/observable whose origin it is indifferent to (preprocessed today, possibly
live-computed later). Don't hard-wire it to the HTTP fetch.
8. Frontend data flow
```
Track selected (TracksView.PlayTrack → PlayerService.SelectTrackStreaming)
  │
  ├── (existing) audio: TrackMediaClient.GetTrackMedia(entryKey) → stream → TS decode → playback
  │
  └── (new) profile: TrackMediaClient.GetWaveformProfileAsync(entryKey) → WaveformProfileDto
        │
        └── decode base64 → double[] profile → WaveformSeeker.Profile
```
Wiring options for who fetches the profile and holds it:
- A. Player service holds it. StreamingAudioPlayerService (or the base AudioPlayerService) gains a WaveformProfile property, fetched when a track is selected and exposed like Duration / CurrentTime. WaveformSeeker reads it off the cascaded IStreamingPlayerService, re-rendering on StateChanged, the same pattern SpectrumVisualizer and AudioPlayerBar already use. Recommended: the profile is part of "current track state," and the player service is already the single source the seek zone binds to. One place fetches, one place caches per track, cleared on Unload.
- B. WaveformSeeker fetches its own. The component takes EntryKey + TrackMediaClient and fetches in OnParametersSet when the key changes. Simpler to reason about in isolation, but it duplicates "current track" knowledge the player already owns and risks a double fetch / stale key on rapid track switches.
- C. A dedicated WaveformProfileViewModel (MVVM convention in CLAUDE.md) scoped in DI, which fetches and caches by EntryKey and is injected into the component. Cleanest separation, but an extra moving part. Reasonable if profiles get reused across views (e.g. mini-waveforms on track cards later; see §10).
Recommend A for the first cut (profile as player-service state — matches the established binding pattern and the "one source, multiple views" instinct: the seeker is just another view over current-track state). Promote to C later if profiles need to be consumed outside the player (track-card waveforms).
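Option A's "one place fetches, one place caches per track, cleared on Unload" can be reduced to a small state holder that also defuses the stale-key risk raised under option B. This is a hypothetical sketch (ProfileSlot, Select, and Complete are illustrative names, not existing members of StreamingAudioPlayerService), assuming the fetch itself runs elsewhere and reports back with the EntryKey it was started for.

```csharp
using System;
#nullable enable

// Hypothetical sketch: current-track profile state as the player service might hold it.
public sealed class ProfileSlot
{
    private string? _selectedKey;
    public double[]? WaveformProfile { get; private set; }
    public event Action? StateChanged;

    // On track select, before the fetch starts: the seeker shows the flat fallback meanwhile.
    public void Select(string entryKey)
    {
        _selectedKey = entryKey;
        WaveformProfile = null;
        StateChanged?.Invoke();
    }

    // When a fetch completes; profile is null on 404 (no profile stored yet).
    public void Complete(string entryKey, double[]? profile)
    {
        if (entryKey != _selectedKey) return; // stale result from a previously selected track
        WaveformProfile = profile;
        StateChanged?.Invoke();
    }

    public void Unload()
    {
        _selectedKey = null;
        WaveformProfile = null;
        StateChanged?.Invoke();
    }
}
```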
CurrentTime / Duration for the playhead come from the player service exactly as
PlayerSeekZone reads them today — no change.
9. Component & file inventory
New:
- DeepDrftPublic.Client/Controls/AudioPlayerBar/WaveformSeeker.razor (+ .razor.cs, .razor.css)
- DeepDrftModels/DTOs/WaveformProfileDto.cs
- DeepDrftContent/Processors/WaveformProfileService.cs: owns the PCM walk, bucketing, normalization, storage; takes an ILoudnessAlgorithm.
- DeepDrftContent/Processors/ILoudnessAlgorithm.cs: the swappable loudness strategy (§5a).
- DeepDrftContent/Processors/RmsLoudnessAlgorithm.cs: first implementation (RMS). LUFS is a future sibling implementation, not built now.
- WaveformProfileOptions (config-bound): carries BucketCount (default 512) and any future algorithm-selection knob.
- DeepDrftAPI public read route GET api/track/{trackId}/waveform in TrackController.cs + proxy in TrackProxyController.cs.
- DeepDrftAPI CMS routes (ApiKey) for the PreProcessing panel: query missing-profile tracks + trigger generation (§12 Phase 5).
- TrackMediaClient.GetWaveformProfileAsync
- (storage) new profiles vault constant in VaultConstants.
- (CMS) PreProcessing panel surface in DeepDrftManager; see Phase 5 for the component/service inventory (it lands with that phase, not the first cut).
Changed:
- PlayerSeekZone.razor: swap the MudSlider block for <WaveformSeeker/>; drop the <SpectrumVisualizer/> (moves to volume).
- VolumeControls.razor → renamed VolumeZone.razor (decided): stack <SpectrumVisualizer/> above the volume slider.
- AudioPlayerBar.razor.css: adjust the volume cluster to host the spectrum; seeker sizing.
- SpectrumVisualizer: set BucketCount=24 for the narrow volume slot (§3c).
- AudioPlayerBar.razor.cs: minimal; the seek callbacks are already abstract. Possibly hold/clear WaveformProfile if §8-A.
- StreamingAudioPlayerService / AudioPlayerService: add WaveformProfile state + fetch (§8-A).
- UnifiedTrackService.UploadAsync: compute + persist the profile on upload via WaveformProfileService.
Untouched (important): the entire TS audio bundle, the seek-beyond-buffer offset path,
WavOffsetService, the streaming decode pipeline.
10. Future options this unlocks (don't build now, leave room for)
- LUFS (or other perceptual) loudness profile. The ILoudnessAlgorithm seam (§5a) exists precisely so this drops in as a second strategy without touching the component, wire format, or storage. The cheapest of the future moves because the abstraction is built up front.
- Track-card mini-waveforms. Once profiles exist as a reusable resource, TrackCard could show a tiny loudness sparkline. This is the argument for the §8-C WaveformProfileViewModel eventually, and for storing profiles where non-player surfaces can fetch them cheaply (favours the vault sidecar + endpoint, §5d-3).
- Loudness-normalized playback / waveform colouring by energy. The same profile data could drive heat-coloured bars; auto-gain would additionally need an absolute-units profile, since the peak-normalized one discards cross-track loudness (§5a).
- Live-computed profiles for the no-profile case (§7 deferred TS).
- Higher-res zoomed scrub on long tracks (re-fetch a denser profile for a time window) — why a generous, configurable stored N and client-side downsampling is worth it now.
Keep the component's profile input origin-agnostic and the stored resolution generous so these stay cheap to add.
11. Decisions (resolved 2026-06-05)
All seven forks below are decided. Recorded here so the rationale travels with the spec.
- Storage location (§5d): vault sidecar + dedicated endpoint. Decided ✓. Profile is derived binary; it lives in the vault, TrackEntity stays pure metadata, the paged list stays lean. An SQL column served on meta/{id} is the recorded fallback only if the vault type hits friction.
- Names (§1): component WaveformSeeker; data WaveformProfile (WaveformProfileDto, waveformBuckets, profile). Decided ✓. Honest naming; the data is named for the concept, not the algorithm, so RMS→LUFS never forces a rename.
- Live-spectrum bucket count (§3c): 24 buckets, parameterized. Decided ✓. Set via BucketCount on SpectrumVisualizer so it can be tuned without a code change.
- Stored resolution + wire format (§4a/§5b): N configurable (default 512) via WaveformProfileOptions; quantized byte[] base64. Decided ✓. The front end derives its rendered bar count from available width regardless of N.
- Backfill (§5c): CMS PreProcessing panel, not a CLI. Decided ✓. The CMS track grid shows missing-profile tracks and offers 1-click generation per track (and/or bulk); compute runs server-side via WaveformProfileService. See §12 Phase 5.
- Normalization (§5a): peak-normalize. Decided ✓. Per-track shape over cross-track absolute loudness; a future LUFS algorithm can normalize differently behind the same interface.
- VolumeControls → VolumeZone rename (§3b). Decided ✓. Symmetry with the transport and seek zones.
Cross-cutting decision (§5a): the loudness measure is a swappable ILoudnessAlgorithm, RMS
first, LUFS the named future alternative — not hardwired to RMS.
12. Implementation phases (ordered, delegable)
Sequenced so each phase has a shippable deliverable and the UI can land before existing tracks are all preprocessed. Phases 1–2 (backend) and phase 3 (layout move) are parallelizable — they touch disjoint files and meet only at the client fetch in phase 4. §11 decisions are all resolved, so there is no decisions-gate phase.
Phase 1 — Loudness computation + storage (backend). WaveformProfileService in
DeepDrftContent (extend the existing PCM walk) with an ILoudnessAlgorithm strategy and the
RmsLoudnessAlgorithm first implementation. Wire into UnifiedTrackService.UploadAsync to
compute + persist on upload (vault sidecar, §5d). Add WaveformProfileDto to DeepDrftModels and
WaveformProfileOptions (default N=512).
Deliverable: new uploads get a stored profile; unit-test the RMS math against a known WAV, and
unit-test that a second ILoudnessAlgorithm swaps in cleanly (guards the abstraction).
Phase 2 — Public read API + proxy + client (backend/transport). Add
GET api/track/{trackId}/waveform, the proxy forward, and TrackMediaClient.GetWaveformProfileAsync.
Deliverable: a track's profile is fetchable end-to-end over HTTP. Can be tested with curl
before any UI.
Phase 3 — Layout move (frontend, parallel with 1–2). Move SpectrumVisualizer from
PlayerSeekZone into the volume cluster (renamed VolumeZone); adjust CSS (§3c); set
BucketCount=24.
Deliverable: live spectrum sits above the volume slider; seek zone temporarily keeps the
MudSlider (or a placeholder). Player still fully works. This de-risks the layout independently
of the new component.
Phase 4 — WaveformSeeker component (frontend, needs 2 + 3). Build WaveformSeeker.razor:
DOM bars, played/unplayed split via clip overlay (§4d), pointer-capture seek (§4c), flat
fallback (§4e), rendered bar count derived from width. Wire profile via player-service state
(§8-A). Replace the MudSlider in PlayerSeekZone with it.
Deliverable: the new seekbar is live for tracks that have a profile; flat-but-seekable for
those that don't.
Phase 5 — CMS PreProcessing panel (CMS, after 1). In DeepDrftManager, add a PreProcessing
feature to the CMS track grid: a column/indicator showing which tracks lack a waveform profile
and a per-track Generate action (and/or a bulk "generate all missing" action). The grid
queries missing-profile state and triggers generation through authenticated CMS API routes on
DeepDrftAPI; the compute runs server-side via the same WaveformProfileService (no CLI). New
surface roughly: a CMS service method on ICmsTrackService/CmsTrackService for
list-missing + generate, the backing DeepDrftAPI routes (ApiKey), and the grid column/action in
the Tracks CMS page.
Deliverable: a CMS admin can see and one-click-fill any missing profile; the no-profile fallback
becomes rare/never as the backlog is worked off in-app.
Deferred (not scheduled): live client-side loudness compute (§7), track-card mini-waveforms
(§10), a LUFS ILoudnessAlgorithm (§5a/§10). Tracked here so the component contract stays
origin-agnostic and the algorithm stays swappable.
13. What this plan deliberately does NOT do
- Does not touch the streaming decode pipeline, seek-beyond-buffer, or WavOffsetService.
- Does not add an audio-processing dependency (NAudio etc.) for the RMS path; the existing PCM parser suffices. (A future LUFS ILoudnessAlgorithm may revisit this, on its own merits.)
- Does not compute the profile on the playback path: preprocessed only (the whole point).
- Does not change TrackEntity's metadata contract; the profile lives in the vault sidecar.
- Does not add a CLI; existing-track preprocessing is the in-CMS PreProcessing panel (§12 Phase 5).
- Does not require TS bundle changes in the first cut.