daniel-c-harvey c10d315a7b docs(product): add approved WaveformSeeker spec

Loudness-waveform seekbar replacing MudSlider; ILoudnessAlgorithm
abstraction (RMS first, LUFS future); vault sidecar storage; CMS
PreProcessing panel for backfill; VolumeZone rename. All decisions
resolved 2026-06-05.

2026-06-05 15:44:40 -04:00


WaveformSeeker — loudness-waveform seekbar to replace the MudSlider

Status: approved. Decisions resolved 2026-06-05. Author: product-designer. Date: 2026-06-05. Plan only — no code edits made by this doc.

1. Summary

Replace the MudSlider-based scrub bar in PlayerSeekZone.razor with a new <WaveformSeeker/> component that renders the track's loudness profile as a high-density vertical bar chart and serves as the seek surface (click / drag to seek).

The point is to make the seekbar informative: instead of a featureless line, the listener sees the track's energy shape — the quiet intro, the drop, the breakdown, the outro — and can scrub against that shape. This is the established "waveform scrubber" idiom from SoundCloud, Overcast, and most DAW transport bars. We are borrowing it deliberately; the novel part for us is only that the profile is preprocessed server-side and shipped as a small quantized array, so the visual paints the instant a track loads rather than waiting for the audio to decode.

The loudness measure is not hardcoded to RMS. The first implementation computes RMS, but the compute path is built around a swappable ILoudnessAlgorithm abstraction (§5a) so a different perceptual loudness profile (e.g. LUFS) can be substituted later without touching the component, the wire format, or the storage. The component and the data are named for the concept (waveform / loudness profile), not the algorithm.

After this change the player carries two visualizations. They are separated by kind:

Real-time spectrum (FFT frequency bars, SpectrumVisualizer.razor) — a live readout of "what is sounding right now." This moves up, above the volume slider.
Static loudness-over-time (the new WaveformSeeker) — a whole-track readout of "how loud is each moment." This takes over the seek area.

This is a clean conceptual split: live-frequency lives with the output level (volume), whole-track-amplitude lives with the transport position (seek). The current arrangement (real-time spectrum behind the seek slider) conflates the two.

Naming (decided)

"Spectrum" properly means frequency content; what this component shows is amplitude over time, not spectrum. The component is named honestly: WaveformSeeker (decided), which reads correctly against the live SpectrumVisualizer (frequency) without implying FFT data is in the payload. The data is named for the concept, not the algorithm: WaveformProfile / WaveformProfileDto / waveformBuckets / a profile field — so substituting the loudness algorithm (RMS → LUFS, §5a) never forces a rename of the type that carries it.

2. Current state (what we're changing)

The seek zone today (PlayerSeekZone.razor):

<MudStack Row="false" Spacing="0" Class="@Class">
    <SpectrumVisualizer/>                          @* live FFT bars, sits on top *@
    <div class="mx-3" @onpointerdown/up/leave>
        <MudSlider .../>                            @* the scrub bar *@
    </div>
    <TimestampLabel CurrentTime=... Duration=.../>  @* time text *@
</MudStack>

Relevant mechanics already in place that the new component must preserve:

Seek gesture plumbing lives in PlayerSeekZone.razor.cs: OnSeekStart / OnSeekChange / OnSeekEnd callbacks bubble to AudioPlayerBar.razor.cs, which sets _isSeeking, tracks _seekPosition, and calls PlayerService.Seek(position) on release. DisplayTime shows the drag position while seeking, real CurrentTime otherwise.
CanSeek = IsLoaded && Duration.HasValue && Duration > 0. Seek is allowed during streaming, including beyond the buffer (the offset-refetch path in StreamingAudioPlayerService / AudioPlayer.ts.seekBeyondBuffer). The new component does not touch that path — it only produces a target time and hands it to the same Seek(double) call.
SpectrumVisualizer is driven entirely by AudioInteropService.StartSpectrumAnimationAsync, which subscribes a callback to the TS SpectrumAnalyzer (live FFT, ~30fps). It already self-manages animation lifecycle off PlayerService.StateChanged. Moving it is a pure layout move — no logic change.
Player layout (AudioPlayerBar.razor.css) is pure-CSS responsive: at ≥600px the row is [transport] [seek grows] [volume]; at <600px it's [transport][volume] then full-width seek below. Wherever the spectrum lands, it must respect this.

3. UI layout changes

3a. What moves

| Element | Today | After |
|---|---|---|
| Live FFT spectrum (`SpectrumVisualizer`) | Inside `PlayerSeekZone`, above the slider | Inside the volume cluster, above the volume slider |
| Scrub bar (`MudSlider`) | `PlayerSeekZone` | Replaced by `WaveformSeeker` (loudness bars + playhead) |
| Timestamp (`TimestampLabel`) | Below the slider in `PlayerSeekZone` | Stays with the seeker (below or overlaid on the bars) |
| Volume slider (`VolumeControls`) | Right cluster | Unchanged position; now has the live spectrum stacked above it |

3b. Resulting zones

Transport zone — unchanged (play/pause/stop + load spinner).
Volume zone — becomes a small vertical stack: live FFT spectrum on top, volume slider below. This is a natural pairing ("here's the live output, here's how loud"). VolumeControls.razor gets the <SpectrumVisualizer/> stacked above its existing MudStack. The wrapper is renamed VolumeZone (decided) for symmetry with the other two zones.
Seek zone — becomes the WaveformSeeker: a wide loudness bar chart that grows to fill the available width (it inherits the flex-grow:1 the seek zone has today), with the timestamp beneath.

3c. Layout risk

The live spectrum is currently a wide element. Stacking it above the volume slider constrains it to the narrow right cluster — at ≥600px the volume cluster is only as wide as the slider (the CSS halves and flex-start-pins it per commit 78c6803). A 32-bucket FFT bar chart squeezed into ~120px will look cramped.

Decided: 24 buckets in the volume cluster, parameterized. The live spectrum renders 24 buckets in the narrow volume slot, set via the existing BucketCount parameter on SpectrumVisualizer so the count can be tuned without a code change to the component. 24 reads denser than 16 while still fitting the ~120px cluster comfortably.

4. WaveformSeeker component design

4a. Data → geometry

The component receives a normalized loudness profile: double[] profile, each value in [0,1], representing the loudness measure of a contiguous time slice. Profile length is N buckets covering the whole track regardless of duration (fixed bucket count, variable bucket duration). Each bucket renders as one vertical bar; bar height = profile[i] scaled to the component height (with a small floor, ~2%, so silence is still visible as a hairline — mirrors SpectrumVisualizer.GetBarHeight).

Bar count. Two regimes:

Preprocessed resolution (N): how many buckets the backend computes and stores. N is configurable (e.g. via WaveformProfileOptions bound from DI/config), default 512. A high source resolution lets the front end downsample to whatever fits the rendered width without re-fetching. Storage is tiny regardless of N (see §5).
Rendered resolution: how many bars actually draw, = pixels-available / (bar + gap). The front end derives its rendered bar count from the available width, regardless of N — it does not assume the stored N is the bar count. At a typical ~600px seek zone with 2px bars + 1px gaps that's ~200 bars. The component downsamples N → rendered count by max-or-mean over each rendered bucket's source range. Use max (peak) for the visual — peak-per-bucket gives the punchy DAW look; mean flattens transients.

Decided: N configurable, default 512; rendered count derived from width; downsample by peak. 512 is a clean power of two, downsamples evenly to 256/128/64, and is 512 bytes as quantized bytes (684 base64 characters on the wire, §5b). The wire format is the quantized byte[] base64 either way; N being configurable does not change the format.
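The bar-count and peak-downsampling rules above can be sketched as follows. This is a minimal illustration, not the component's code: the WaveformGeometry class and its method names are hypothetical, and the 2px bar, 1px gap, and 2% floor defaults are simply the example figures from this section.

```csharp
using System;

// Hypothetical helper; names are illustrative, not part of the spec.
public static class WaveformGeometry
{
    // Rendered bar count = available width / (bar + gap), never less than 1.
    public static int RenderedBarCount(double widthPx, double barPx = 2, double gapPx = 1)
        => Math.Max(1, (int)Math.Floor(widthPx / (barPx + gapPx)));

    // Peak downsample: each rendered bar takes the max of its source bucket range.
    public static double[] DownsamplePeak(ReadOnlySpan<double> profile, int barCount)
    {
        if (barCount <= 0) throw new ArgumentOutOfRangeException(nameof(barCount));
        var bars = new double[barCount];
        if (profile.IsEmpty) return bars;

        for (int i = 0; i < barCount; i++)
        {
            // Proportional source range [start, end); always covers at least one bucket,
            // so asking for more bars than buckets repeats values instead of failing.
            int start = (int)((long)i * profile.Length / barCount);
            int end = (int)((long)(i + 1) * profile.Length / barCount);
            end = Math.Max(end, start + 1);

            double peak = 0;
            for (int j = start; j < end; j++) peak = Math.Max(peak, profile[j]);
            bars[i] = peak;
        }
        return bars;
    }

    // Bar height as a percentage of component height, with a small floor so
    // silence still draws as a hairline.
    public static double BarHeightPercent(double value, double floorPercent = 2)
        => Math.Max(floorPercent, Math.Clamp(value, 0, 1) * 100);
}
```

With these defaults a 600px seek zone yields 200 bars, matching the estimate above, and a stored N of 512 is reduced to those 200 bars without another fetch.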

4b. Playhead / progress indication

The current position is shown two ways simultaneously (both cheap, both standard):

Played/unplayed split — bars left of the playhead render in the played colour (moss green --deepdrft-green-accent, matching the house waveform identity called out in track-card-theming.md), bars right render muted. The split point = CurrentTime / Duration.
Playhead line — a 1–2px vertical rule at the split, for precision.

While dragging, the split/line follow the pointer (DisplayTime), not playback — same _isSeeking discipline as today.

4c. Interaction model

Pointer-based, reusing the existing callback contract so AudioPlayerBar.razor.cs is barely touched:

Hover → a faint preview line at the cursor + a tooltip/label showing the time under the cursor (hoverTime = (cursorX / width) * Duration). Preview only; no seek. (New affordance; the MudSlider had none. Borrowed from SoundCloud/YouTube scrubbers.)
Click → seek to clickX / width * Duration. Fires OnSeekStart then immediately OnSeekEnd(clickTime).
Drag → pointerdown starts seeking (OnSeekStart); pointermove updates the preview position and fires OnSeekChange(t), so DisplayTime and the played/unplayed split track the drag live; pointerup commits (OnSeekEnd(t) → PlayerService.Seek(t)). Without pointer capture, pointerleave while dragging commits at the last position (matches current HandlePointerLeave behaviour). Recommended instead: pointer capture (setPointerCapture), so a drag that leaves the element keeps tracking until release. It is the more forgiving gesture and avoids the "lost the drag" feel.

Position math needs the element's pixel width and the pointer's offset. Two implementations:

Pure Blazor: use @onpointermove/@onpointerdown with PointerEventArgs.OffsetX and a cached bounding width (one JS getBoundingClientRect call on resize). Simple, no per-frame interop.
Thin JS helper: a tiny interop that does hit-testing and returns a normalized [0,1] fraction. Only worth it if OffsetX proves unreliable across the responsive reflows.

Recommend pure-Blazor pointer events first, with OffsetX/cached width; fall back to a JS helper only if hit-testing is flaky. Keeps the new surface out of the TS bundle (see §7).
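The position math is small enough to show in full. The sketch below assumes a hypothetical SeekMath helper fed with PointerEventArgs.OffsetX and the cached element width; clamping is an added assumption so that a captured drag outside the element still maps to a valid time.

```csharp
using System;

// Hypothetical helper; the spec only fixes the formula (offset / width) * Duration.
public static class SeekMath
{
    // Normalized [0,1] position of the pointer across the seeker.
    public static double FractionFromOffset(double offsetX, double widthPx)
    {
        if (widthPx <= 0 || double.IsNaN(offsetX)) return 0;
        return Math.Clamp(offsetX / widthPx, 0, 1);
    }

    // Target time in seconds for hover preview, click, or drag.
    public static double TimeFromOffset(double offsetX, double widthPx, double durationSeconds)
        => FractionFromOffset(offsetX, widthPx) * Math.Max(0, durationSeconds);
}
```

The same function serves hover (preview only), click, and drag; only the callback that receives the result differs.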

4d. Rendering approach

DOM bars (one <div> per rendered bar, CSS --bar-height) — exactly how SpectrumVisualizer works today, so it's consistent and themeable via existing deepdrft- tokens. At ~200 bars this is fine; Blazor diffing over 200 static divs that only change a CSS var on seek is cheap.
Canvas — one <canvas>, drawn via a small JS interop on load + on playhead move. Scales to thousands of bars and avoids 200-node diffs, but pulls the component into the JS interop layer and complicates theming (canvas can't read CSS vars without plumbing).

Recommend DOM bars to match the existing visualizer and stay in pure Blazor/CSS. Revisit canvas only if profiling shows the seek-time re-render (recolouring the split) janks. The played/unplayed split can be done without re-rendering every bar by overlaying a clipped coloured layer — render the bars once in the played colour, lay a muted-colour copy clipped to width * (1 - progress) from the right on top. Then a seek only moves one clip rect, not 200 divs. This is the key perf trick; call it out for the implementer.
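One way to express the clip-overlay trick is to compute a single inline style for the muted layer. This is a sketch under assumptions: the SeekerOverlay helper is hypothetical, and using CSS clip-path: inset() to clip the layer is one possible choice, not something the spec fixes.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for the muted (unplayed) overlay layer.
public static class SeekerOverlay
{
    // The muted copy sits on top of the played-colour bars and is clipped from
    // the left by the played fraction, so only the unplayed right part shows.
    public static string MutedLayerStyle(double currentTime, double duration)
    {
        double progress = duration > 0 ? Math.Clamp(currentTime / duration, 0, 1) : 0;
        // Invariant culture keeps the decimal separator a '.', as CSS requires.
        string percent = (progress * 100).ToString("0.##", CultureInfo.InvariantCulture);
        return $"clip-path: inset(0 0 0 {percent}%);";
    }
}
```

A seek or playback tick then changes one style string on one element; the bar divs themselves are not re-rendered.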

4e. No-profile-yet state (important)

A track may have no stored loudness profile (legacy tracks uploaded before this feature; profile fetch failed; profile still computing). The component must degrade, not break:

Fallback bars: render a flat row of floor-height bars (or a gentle idle shimmer) so the control still reads as a seekbar and remains fully seekable (geometry is just time/width; it needs no profile data to seek). Seek must never depend on the profile being present.
Optional client-side compute: once audio is decoded, the front end could compute a loudness profile from the decoded AudioBuffers and fill the bars live (progressive reveal as the stream decodes). This is a real fallback but adds a TS path (§7); treat as a later enhancement, not part of the first cut. First cut: preprocessed profile or flat fallback.

Recommend: first cut ships preprocessed-or-flat. Seekability is never gated on the profile.

5. Backend loudness preprocessing

This is the load-bearing design decision. Three sub-questions: how to compute, when to compute, where to store.

5a. How to compute (swappable loudness algorithm)

The loudness measure is an abstraction, not a hardwired RMS pass (decided). WaveformProfileService in DeepDrftContent owns the PCM walk, bucketing, normalization, and storage; the per-bucket loudness calculation is delegated to an injected ILoudnessAlgorithm strategy. The first implementation is RMS (RmsLoudnessAlgorithm); LUFS (or another perceptual profile) is the named future alternative, droppable in as a second ILoudnessAlgorithm without touching the service, the wire format, the storage, or the component.

Sketch of the seam (illustrative, not prescriptive):

public interface ILoudnessAlgorithm
{
    // Given the mono samples for one time slice, return its loudness as a
    // non-negative value in units the service can normalize to [0,1].
    double Measure(ReadOnlySpan<float> sliceSamples);
}
// First impl: RMS = sqrt(mean(sample²)). Future: LUFS (K-weighting + gating).

We already own a PCM-WAV parser: AudioProcessor in DeepDrftContent parses RIFF/WAVE/fmt/ data, validates PCM, and knows channels / sampleRate / bitsPerSample / blockAlign / dataSize. Computing the profile is a straightforward extension of that same buffer walk — no new audio library needed for RMS. The stack is PCM-only WAV today (AudioProcessor rejects non-PCM), so WaveformProfileService can read samples directly:

1. Locate the data chunk (already done in ValidateWavStructure / FindChunk).
2. Walk the PCM samples, decode per bitsPerSample (16/24/32-bit signed; 8-bit unsigned), average channels to mono.
3. Partition the sample stream into N equal time slices (N from WaveformProfileOptions, default 512); hand each slice to ILoudnessAlgorithm.Measure to get bucket[i].
4. Normalize: divide by the max bucket (peak-normalize to [0,1], decided) so quiet tracks still show shape; if the max is 0 (an all-silent track), leave the buckets at 0 rather than dividing. (Trade-off: peak-normalizing loses absolute-loudness comparison between tracks. Acceptable, because the seeker is about this track's shape, not cross-track loudness. A future LUFS algorithm that wants absolute units can normalize differently behind the same interface.)

The PCM walk + bucketing + RMS Measure is ~40 lines. No external dependency for the RMS path. Do not pull in NAudio or similar for RMS — the existing parser already does the hard part. A future LUFS implementation may justify a dep; if so, that decision rides with that algorithm, not the service.
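The steps above might look roughly like this for the 16-bit case. It is a sketch, not the service: WaveformBucketing and its method names are hypothetical, the interface is restated only so the block stands alone, and the real code would reuse AudioProcessor's chunk parsing and handle the other bit depths.

```csharp
using System;
using System.Buffers.Binary;

public interface ILoudnessAlgorithm
{
    double Measure(ReadOnlySpan<float> sliceSamples);
}

// RMS: sqrt(mean(sample^2)). An empty slice measures as silence.
public sealed class RmsLoudnessAlgorithm : ILoudnessAlgorithm
{
    public double Measure(ReadOnlySpan<float> sliceSamples)
    {
        if (sliceSamples.IsEmpty) return 0;
        double sumSquares = 0;
        foreach (float s in sliceSamples) sumSquares += (double)s * s;
        return Math.Sqrt(sumSquares / sliceSamples.Length);
    }
}

// Hypothetical helper covering decode, bucketing, and peak-normalization.
public static class WaveformBucketing
{
    // 16-bit little-endian interleaved PCM -> mono floats in [-1, 1).
    public static float[] DecodePcm16ToMono(ReadOnlySpan<byte> data, int channels)
    {
        if (channels <= 0) throw new ArgumentOutOfRangeException(nameof(channels));
        int frameBytes = channels * 2;
        var mono = new float[data.Length / frameBytes];
        for (int f = 0; f < mono.Length; f++)
        {
            int sum = 0;
            for (int c = 0; c < channels; c++)
                sum += BinaryPrimitives.ReadInt16LittleEndian(data.Slice(f * frameBytes + c * 2, 2));
            mono[f] = sum / (channels * 32768f);   // average channels, scale to unit range
        }
        return mono;
    }

    // N equal slices -> Measure each -> divide by the max bucket.
    public static double[] ComputeProfile(ReadOnlySpan<float> mono, int bucketCount, ILoudnessAlgorithm algorithm)
    {
        if (bucketCount <= 0) throw new ArgumentOutOfRangeException(nameof(bucketCount));
        var buckets = new double[bucketCount];
        double max = 0;
        for (int i = 0; i < bucketCount; i++)
        {
            int start = (int)((long)i * mono.Length / bucketCount);
            int end = (int)((long)(i + 1) * mono.Length / bucketCount);
            buckets[i] = algorithm.Measure(mono.Slice(start, end - start));
            max = Math.Max(max, buckets[i]);
        }
        if (max > 0)   // an all-silent track stays at zero instead of dividing by 0
            for (int i = 0; i < bucketCount; i++) buckets[i] /= max;
        return buckets;
    }
}
```

Because ComputeProfile only sees the interface, a later LUFS strategy would be passed in the same way; any change to normalization for absolute units would be a separate decision.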

Cost: one linear pass over the PCM buffer. For a 100MB WAV that's ~25M stereo sample frames (assuming 16-bit stereo, 4 bytes per frame), on the order of a few hundred ms; it is done once at upload, never on the playback path.

5b. Data format on the wire

The front end needs double[] profile of length N, each value in [0,1]. Decided: quantized byte[] (each bucket 0–255), base64 in JSON, decoded to [0,1] client-side (b/255.0). 8-bit quantization is visually lossless for a bar chart; at N=512 that's 512 bytes raw / 684 chars base64 (with padding), negligible to store and ship, and it keeps the profile from bloating the metadata payload if it ever rides along with TrackDto (see §5d). The format is independent of N and of the loudness algorithm: both RMS and a future LUFS profile quantize to the same [0,1]→byte wire shape.
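A minimal sketch of the quantize, base64, and decode round trip, using only Convert from the base class library. The WaveformWire class name is hypothetical, and rounding to the nearest byte is an assumption; the spec only fixes the 0–255 range and the b/255.0 decode.

```csharp
using System;

// Hypothetical helper for the wire shape described in this section.
public static class WaveformWire
{
    // [0,1] doubles -> one byte per bucket -> base64 string.
    public static string Encode(ReadOnlySpan<double> profile)
    {
        var bytes = new byte[profile.Length];
        for (int i = 0; i < profile.Length; i++)
        {
            double v = double.IsNaN(profile[i]) ? 0 : Math.Clamp(profile[i], 0, 1);
            bytes[i] = (byte)Math.Round(v * 255);
        }
        return Convert.ToBase64String(bytes);
    }

    // base64 string -> bytes -> [0,1] doubles (b / 255.0).
    public static double[] Decode(string base64)
    {
        byte[] bytes = Convert.FromBase64String(base64);
        var profile = new double[bytes.Length];
        for (int i = 0; i < bytes.Length; i++) profile[i] = bytes[i] / 255.0;
        return profile;
    }
}
```

Encoding 512 buckets produces a 684-character string, and a decoded value differs from the original by at most half a quantization step (1/510), well below what a bar a few dozen pixels tall can show.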

5c. When to compute

On upload (decided, for new tracks): UnifiedTrackService.UploadAsync already processes the WAV (AddTrackFromWavAsync → AudioProcessor). Add the WaveformProfileService pass there, in the same read, and persist the profile alongside the track. Cost is paid once, by the uploader (CMS admin), off the listener's path. This is the natural seam.
CMS PreProcessing panel (decided, for existing tracks): not a CLI command. Existing vault tracks predate the feature, so they need an explicit generation path — surfaced in the CMS (DeepDrftManager) rather than as an offline job. The CMS track grid shows which tracks are missing a profile and offers 1-click generation per track (and/or a bulk action). The compute runs server-side via the same WaveformProfileService. See Phase 5 (§12) for the panel design.
On demand + cache (rejected): computing lazily on first profile request spreads cost to first-listen and needs a cache layer + cold-start penalty. Not worth its complexity given upload is the only ingest and the CMS panel covers the backlog explicitly.

Decided: compute on upload for new tracks; CMS PreProcessing panel for existing ones. The no-profile fallback (§4e) carries the UI in the meantime, so the seeker can ship before every existing track has been processed. (Memory note: Daniel favours designing the seam now even when deferring the feature — the no-profile fallback is that seam.)

5d. Where the data lives — vault sidecar (decided)

Decided: option 3 — a sidecar in the FileDatabase vault + a dedicated endpoint (§6). Store the profile as its own vault entry (e.g. a profiles vault keyed by EntryKey, or a .profile/.wfp companion next to the audio). The candidates and the reasoning for the choice:

1. New column on TrackEntity / track table (WaveformProfile byte[] or text). Profile rides with metadata. Pro: one fetch (GET api/track/page or meta/{id} already returns TrackDto). Con: bloats every paged list response by ~512B × pageSize (20 → ~10KB/page) even when the player isn't open; TrackEntity is described in CLAUDE.md as "a join, only metadata", and a binary blob stretches that contract.
2. New column, but only returned by meta/{id} / a dedicated fetch, not by page. Keeps the list lean; the player fetches the profile when a track is selected. Needs the profile field to be omittable from the paged DTO. (Second choice if a vault type is unwelcome; see below.)
3. A sidecar in the FileDatabase vault (chosen): store the profile as its own vault entry keyed by EntryKey. Pro: keeps it out of SQL entirely, near the binary it describes, consistent with "binary content lives in the vault." Con: a second vault round-trip to serve it; new endpoint.
4. Computed into the audio stream's header response: no separate storage; return the profile as a response header / preamble on GET api/track/{id}. Couples profile delivery to the audio fetch. Awkward (headers for 512B, or a framing change to the WAV stream). Rejected.

Rationale for the vault sidecar:

It honours the architectural line CLAUDE.md draws — TrackEntity stays pure metadata, the vault owns "binary stuff about the audio." A loudness profile is derived binary content; it belongs with the binary.
It keeps the paged list response unchanged (no regression to TracksView load weight).
It parallels the existing audio path exactly: the player already does a separate content fetch (TrackMediaClient → api/track/{id}) distinct from the metadata fetch (TrackClient → api/track/page). The profile is one more content fetch on track-select.

Fallback if the vault type proves unwelcome: option 2 (SQL column, served only on meta/{id} or a dedicated route). Simpler to migrate (one EF column) but puts derived binary in SQL. Not the chosen path; recorded so the alternative is on the table if the vault sidecar hits friction.

The dual-database split here is real: metadata (SQL) vs derived-binary (vault). The profile is derived binary. The vault sidecar keeps the split clean.

6. New API surface

Per the vault-sidecar storage (§5d, decided), add one unauthenticated GET that mirrors the existing audio route's shape and proxy path:

`GET api/track/{trackId}/waveform` (DeepDrftAPI, unauthenticated)

Route param trackId (string) = EntryKey, same as GET api/track/{trackId}.
Loads the stored profile for that entry (from the profiles vault / sidecar).
Returns 200 with WaveformProfileDto { int BucketCount; string Data; } (base64 quantized bytes), or 404 if no profile exists for that track (front end then renders the flat fallback, §4e).
Unauthenticated, like audio streaming — it's public listener data.

Proxy: add the matching forward in DeepDrftPublic/Controllers/TrackProxyController.cs (currently forwards page and {trackId}); add {trackId}/waveform. Same thin-proxy pattern, no logic.

Client: a method on TrackMediaClient (it owns the DeepDrft.Content client and the content base address) — GetWaveformProfileAsync(trackId) → ApiResult<WaveformProfileDto>. Keeps the profile fetch on the content client, consistent with §5d's "profile is content."

The CMS PreProcessing panel (§12 Phase 5) also needs server-side endpoints: a way to query which tracks lack a profile and a way to trigger generation. Those are authenticated CMS routes on DeepDrftAPI (ApiKey), distinct from this public read — see Phase 5 for their shape.

New model

WaveformProfileDto in DeepDrftModels (new DTOs/WaveformProfileDto.cs):

public class WaveformProfileDto {
    public int BucketCount { get; set; }
    public string Data { get; set; } = string.Empty;   // base64 of byte[BucketCount], each 0..255
}

DeepDrftModels is referenced by every project (CLAUDE.md), so both API and client see it. The DTO carries no algorithm tag — it is loudness-in-[0,1] regardless of how it was computed.

7. TypeScript seam

First cut: no TS changes required. The preprocessed profile arrives as data over HTTP, is decoded in C# (WaveformProfileDto.Data → double[]), and rendered by the Blazor component with pure pointer events (§4c/§4d). The TS audio bundle (DeepDrftPublic/Interop/audio/) is untouched. The live SpectrumVisualizer keeps using the existing startSpectrumAnimation/SpectrumAnalyzer path verbatim — only its position in the markup changes.

Deliberately deferred TS work (later enhancement, see §4e): client-side loudness computation from decoded AudioBuffers for the no-profile fallback. That would need a new TS module (e.g. WaveformProfiler.ts) that reads the scheduler's decoded buffers and buckets amplitude, plus an interop method to stream buckets to Blazor as they fill. It mirrors SpectrumAnalyzer's callback pattern. Not in the first cut: the flat fallback covers the gap, and the CMS PreProcessing panel removes most no-profile cases. Keep this seam in mind so the component's data input is an abstract double[] that could later be fed by either source.

This matters for the component contract: WaveformSeeker should take its profile as a parameter or observable whose origin it does not care about (preprocessed today, possibly live-computed later). Don't hard-wire it to the HTTP fetch.

8. Frontend data flow

Track selected (TracksView.PlayTrack → PlayerService.SelectTrackStreaming)
        │
        ├── (existing) audio: TrackMediaClient.GetTrackMedia(entryKey) → stream → TS decode → playback
        │
        └── (new) profile: TrackMediaClient.GetWaveformProfileAsync(entryKey) → WaveformProfileDto
                    │
                    └── decode base64 → double[] profile  →  WaveformSeeker.Profile

Wiring options for who fetches the profile and holds it:

A. Player service holds it. StreamingAudioPlayerService (or the base AudioPlayerService) gains a WaveformProfile property, fetched when a track is selected, exposed like Duration/CurrentTime. WaveformSeeker reads it off the cascaded IStreamingPlayerService, re-rendering on StateChanged — the same pattern SpectrumVisualizer and AudioPlayerBar already use. Recommended: the profile is part of "current track state," and the player service is already the single source the seek zone binds to. One place fetches, one place caches per track, cleared on Unload.
B. WaveformSeeker fetches its own. Component takes EntryKey + TrackMediaClient, fetches in OnParametersSet when the key changes. Simpler to reason about in isolation but duplicates "current track" knowledge the player already owns and risks double-fetch / stale key on rapid track switches.
C. A dedicated WaveformProfileViewModel (MVVM convention in CLAUDE.md) scoped in DI, fetches and caches by EntryKey, injected into the component. Cleanest separation, an extra moving part. Reasonable if profiles get reused across views (e.g. mini-waveforms on track cards later — see §10).

Recommend A for the first cut (profile as player-service state — matches the established binding pattern and the "one source, multiple views" instinct: the seeker is just another view over current-track state). Promote to C later if profiles need to be consumed outside the player (track-card waveforms).
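Option A's state handling could be sketched as below. Everything here is illustrative: WaveformProfileState is a hypothetical stand-in for the relevant slice of the player service, and the fetch delegate stands in for TrackMediaClient.GetWaveformProfileAsync plus base64 decoding, assumed to return null when no profile exists. The key check guards against the stale result that rapid track switches could otherwise produce.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of "profile as current-track state".
public sealed class WaveformProfileState
{
    private readonly Func<string, CancellationToken, Task<double[]?>> _fetch;
    private string? _currentKey;

    public WaveformProfileState(Func<string, CancellationToken, Task<double[]?>> fetch)
        => _fetch = fetch;

    // null means "no profile": the seeker renders the flat fallback and stays seekable.
    public double[]? Profile { get; private set; }

    public event Action? StateChanged;

    public async Task SelectTrackAsync(string entryKey, CancellationToken ct = default)
    {
        _currentKey = entryKey;
        Profile = null;                 // show the fallback while the fetch is in flight
        StateChanged?.Invoke();

        double[]? fetched = await _fetch(entryKey, ct);

        // A newer selection (or Unload) happened while awaiting: drop this result.
        if (_currentKey != entryKey) return;

        Profile = fetched;
        StateChanged?.Invoke();
    }

    public void Unload()
    {
        _currentKey = null;
        Profile = null;
        StateChanged?.Invoke();
    }
}
```

WaveformSeeker would then bind to Profile and re-render on StateChanged, without knowing whether the array came from HTTP or, later, from a live-computed source.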

CurrentTime / Duration for the playhead come from the player service exactly as PlayerSeekZone reads them today — no change.

9. Component & file inventory

New:

DeepDrftPublic.Client/Controls/AudioPlayerBar/WaveformSeeker.razor (+ .razor.cs, .razor.css)
DeepDrftModels/DTOs/WaveformProfileDto.cs
DeepDrftContent/Processors/WaveformProfileService.cs — owns the PCM walk, bucketing, normalization, storage; takes an ILoudnessAlgorithm.
DeepDrftContent/Processors/ILoudnessAlgorithm.cs — the swappable loudness strategy (§5a).
DeepDrftContent/Processors/RmsLoudnessAlgorithm.cs — first implementation (RMS). LUFS is a future sibling implementation, not built now.
WaveformProfileOptions (config-bound) — carries BucketCount (default 512) and any future algorithm-selection knob.
DeepDrftAPI public read route GET api/track/{trackId}/waveform in TrackController.cs + proxy in TrackProxyController.cs.
DeepDrftAPI CMS routes (ApiKey) for the PreProcessing panel: query missing-profile tracks + trigger generation (§12 Phase 5).
TrackMediaClient.GetWaveformProfileAsync
(storage) new profiles vault constant in VaultConstants.
(CMS) PreProcessing panel surface in DeepDrftManager — see Phase 5 for the component/service inventory (it lands with that phase, not the first cut).

Changed:

PlayerSeekZone.razor — swap MudSlider block for <WaveformSeeker/>; drop the <SpectrumVisualizer/> (moves to volume).
VolumeControls.razor → renamed VolumeZone.razor (decided) — stack <SpectrumVisualizer/> above the volume slider.
AudioPlayerBar.razor.css — adjust volume cluster to host the spectrum; seeker sizing.
SpectrumVisualizer — set BucketCount=24 for the narrow volume slot (§3c).
AudioPlayerBar.razor.cs — minimal; seek callbacks already abstract. Possibly hold/clear WaveformProfile if §8-A.
StreamingAudioPlayerService / AudioPlayerService — add WaveformProfile state + fetch (§8-A).
UnifiedTrackService.UploadAsync — compute + persist profile on upload via WaveformProfileService.

Untouched (important): the entire TS audio bundle, the seek-beyond-buffer offset path, WavOffsetService, the streaming decode pipeline.

10. Future options this unlocks (don't build now, leave room for)

LUFS (or other perceptual) loudness profile. The ILoudnessAlgorithm seam (§5a) exists precisely so this drops in as a second strategy without touching the component, wire format, or storage. The cheapest of the future moves because the abstraction is built up front.
Track-card mini-waveforms. Once profiles exist as a reusable resource, TrackCard could show a tiny loudness sparkline. This is the argument for the §8-C WaveformProfileViewModel eventually, and for storing profiles where non-player surfaces can fetch them cheaply (favours the vault sidecar + endpoint, §5d-3).
Loudness-normalized playback / waveform colouring by energy. The same profile data could drive auto-gain or heat-coloured bars.
Live-computed profiles for the no-profile case (§7 deferred TS).
Higher-res zoomed scrub on long tracks (re-fetch a denser profile for a time window). This is why a generous, configurable stored N and client-side downsampling are worth it now.

Keep the component's profile input origin-agnostic and the stored resolution generous so these stay cheap to add.

11. Decisions (resolved 2026-06-05)

All seven forks below are decided. Recorded here so the rationale travels with the spec.

Storage location (§5d): vault sidecar + dedicated endpoint — decided ✓. Profile is derived binary; it lives in the vault, TrackEntity stays pure metadata, the paged list stays lean. SQL-column-on-meta/{id} is the recorded fallback only if the vault type hits friction.
Names (§1): component WaveformSeeker; data WaveformProfile (WaveformProfileDto, waveformBuckets, profile) — decided ✓. Honest naming; the data is named for the concept, not the algorithm, so RMS→LUFS never forces a rename.
Live-spectrum bucket count (§3c): 24 buckets, parameterized — decided ✓. Set via BucketCount on SpectrumVisualizer so it can be tuned without a code change.
Stored resolution + wire format (§4a/§5b): N configurable (default 512) via WaveformProfileOptions; quantized byte[] base64 — decided ✓. Front end derives its rendered bar count from available width regardless of N.
Backfill (§5c): CMS PreProcessing panel, not a CLI — decided ✓. The CMS track grid shows missing-profile tracks and offers 1-click generation per track (and/or bulk); compute runs server-side via WaveformProfileService. See §12 Phase 5.
Normalization (§5a): peak-normalize — decided ✓. Per-track shape over cross-track absolute loudness; a future LUFS algorithm can normalize differently behind the same interface.
VolumeControls → VolumeZone rename (§3b) — decided ✓. Symmetry with the transport and seek zones.

Cross-cutting decision (§5a): the loudness measure is a swappable ILoudnessAlgorithm, RMS first, LUFS the named future alternative — not hardwired to RMS.

12. Implementation phases (ordered, delegable)

Sequenced so each phase has a shippable deliverable and the UI can land before existing tracks are all preprocessed. Phases 1–2 (backend) and phase 3 (layout move) are parallelizable — they touch disjoint files and meet only at the client fetch in phase 4. §11 decisions are all resolved, so there is no decisions-gate phase.

Phase 1 — Loudness computation + storage (backend). WaveformProfileService in DeepDrftContent (extend the existing PCM walk) with an ILoudnessAlgorithm strategy and the RmsLoudnessAlgorithm first implementation. Wire into UnifiedTrackService.UploadAsync to compute + persist on upload (vault sidecar, §5d). Add WaveformProfileDto to DeepDrftModels and WaveformProfileOptions (default N=512). Deliverable: new uploads get a stored profile; unit-test the RMS math against a known WAV, and unit-test that a second ILoudnessAlgorithm swaps in cleanly (guards the abstraction).
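For the Phase 1 tests, a synthetic signal can stand in for a known WAV: a full-scale sine over whole cycles has an RMS of 1/√2. The sketch below is an assumption-laden illustration, not the project's test suite: PeakLoudnessAlgorithm is a hypothetical second strategy used only to show the swap, and the interface and RMS class are restated so the block stands alone.

```csharp
using System;

public interface ILoudnessAlgorithm
{
    double Measure(ReadOnlySpan<float> sliceSamples);
}

public sealed class RmsLoudnessAlgorithm : ILoudnessAlgorithm
{
    public double Measure(ReadOnlySpan<float> sliceSamples)
    {
        if (sliceSamples.IsEmpty) return 0;
        double sumSquares = 0;
        foreach (float s in sliceSamples) sumSquares += (double)s * s;
        return Math.Sqrt(sumSquares / sliceSamples.Length);
    }
}

// Hypothetical stand-in for "a second algorithm": largest absolute sample.
public sealed class PeakLoudnessAlgorithm : ILoudnessAlgorithm
{
    public double Measure(ReadOnlySpan<float> sliceSamples)
    {
        double peak = 0;
        foreach (float s in sliceSamples) peak = Math.Max(peak, Math.Abs(s));
        return peak;
    }
}

public static class LoudnessChecks
{
    // Synthetic test signal: amplitude * sin(2*pi*f*t), sampled at sampleRate.
    public static float[] Sine(double amplitude, double frequencyHz, int sampleRate, int sampleCount)
    {
        var samples = new float[sampleCount];
        for (int i = 0; i < sampleCount; i++)
            samples[i] = (float)(amplitude * Math.Sin(2 * Math.PI * frequencyHz * i / sampleRate));
        return samples;
    }

    // The caller depends only on the interface, which is what the swap test guards.
    public static double Measure(ILoudnessAlgorithm algorithm, float[] samples)
        => algorithm.Measure(samples);
}
```

One second of a 440 Hz sine at 48 kHz covers exactly 440 cycles, so the RMS strategy should land within a small tolerance of 0.7071, while the stand-in peak strategy should report close to 1.0 through the same call site.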

Phase 2 — Public read API + proxy + client (backend/transport). Add GET api/track/{trackId}/waveform, the proxy forward, and TrackMediaClient.GetWaveformProfileAsync. Deliverable: a track's profile is fetchable end-to-end over HTTP. Can be tested with curl before any UI.

Phase 3 — Layout move (frontend, parallel with 1–2). Move SpectrumVisualizer from PlayerSeekZone into the volume cluster (renamed VolumeZone); adjust CSS (§3c); set BucketCount=24. Deliverable: live spectrum sits above the volume slider; seek zone temporarily keeps the MudSlider (or a placeholder). Player still fully works. This de-risks the layout independently of the new component.

Phase 4 — WaveformSeeker component (frontend, needs 2 + 3). Build WaveformSeeker.razor: DOM bars, played/unplayed split via clip overlay (§4d), pointer-capture seek (§4c), flat fallback (§4e), rendered bar count derived from width. Wire profile via player-service state (§8-A). Replace the MudSlider in PlayerSeekZone with it. Deliverable: the new seekbar is live for tracks that have a profile; flat-but-seekable for those that don't.

Phase 5 — CMS PreProcessing panel (CMS, after 1). In DeepDrftManager, add a PreProcessing feature to the CMS track grid: a column/indicator showing which tracks lack a waveform profile and a per-track Generate action (and/or a bulk "generate all missing" action). The grid queries missing-profile state and triggers generation through authenticated CMS API routes on DeepDrftAPI; the compute runs server-side via the same WaveformProfileService (no CLI). New surface roughly: a CMS service method on ICmsTrackService/CmsTrackService for list-missing + generate, the backing DeepDrftAPI routes (ApiKey), and the grid column/action in the Tracks CMS page. Deliverable: a CMS admin can see and one-click-fill any missing profile; the no-profile fallback becomes rare/never as the backlog is worked off in-app.

Deferred (not scheduled): live client-side loudness compute (§7), track-card mini-waveforms (§10), a LUFS ILoudnessAlgorithm (§5a/§10). Tracked here so the component contract stays origin-agnostic and the algorithm stays swappable.

13. What this plan deliberately does NOT do

Does not touch the streaming decode pipeline, seek-beyond-buffer, or WavOffsetService.
Does not add an audio-processing dependency (NAudio etc.) for the RMS path — the existing PCM parser suffices. (A future LUFS ILoudnessAlgorithm may revisit this, on its own merits.)
Does not compute the profile on the playback path — preprocessed only (the whole point).
Does not change TrackEntity's metadata contract — the profile lives in the vault sidecar.
Does not add a CLI; existing-track preprocessing is the in-CMS PreProcessing panel (§12 Phase 5).
Does not require TS bundle changes in the first cut.
