Phase 21 — Windowed Streaming Buffer (bounded client memory for long streams)
Product spec. Status: design / framing — implementation-ready pending Daniel's open-question calls.
Author: product-designer. Date: 2026-06-23. No code has been written by this doc.
Surface: public listener site only (DeepDrftPublic.Client player stack + DeepDrftPublic
TypeScript audio interop). No CMS (DeepDrftManager) change. No data-model or schema change. The one
server touch is reuse, not new surface: the existing DeepDrftAPI HTTP Range: bytes=X-
partial-content primitive (Phase 4, landed) is the load-bearing dependency; this phase adds no new API
endpoint.
Sequencing dependency (Daniel, 2026-06-23): Phase 18 (Opus Low-Data Streaming) comes BEFORE this phase. Format support — specifically the derived Ogg Opus fullband 320 low-data delivery path (product-notes/phase-18-opus-low-data-streaming.md) — is a prerequisite that sequences ahead of windowing. Phase 21's windowing must work across both delivery formats (lossless WAV and Opus). Its C5 invariant below already anticipated this ("must not foreclose MP3/FLAC"); Opus is now the concrete VBR/containerized driver of C5. Windowing an Opus stream uses the decoder's approximate byte↔time mapping (OpusFormatDecoder.calculateByteOffset — Ogg-page interpolation), exactly the C5 case — not the exact CBR-WAV byteRate math. Build the window machinery format-agnostically (§2 C3/C5) so it inherits Opus for free.
1. Goal
Bound the client memory a playing track consumes to a small, configurable forward window —
independent of total stream length — so a 1 GB+ DJ MIX (Phase 9 Mix medium: a single long track)
plays without the whole decoded PCM accumulating in the browser.
The defect, stated precisely. The network path already streams in adaptive 16–64 KB chunks
(StreamingAudioPlayerService.StreamAudioWithEarlyPlayback) — that part is fine. The accumulation is on
the decode side: PlaybackScheduler holds private buffers: AudioBuffer[] and never evicts
("Supports pause/resume/seek by retaining all buffers" — its own doc comment). Every 64 KB segment
the StreamDecoder decodes is pushed via addBuffer() and kept for the life of the track. Decoded PCM
is larger than the compressed-or-raw source in memory (Web Audio AudioBuffer is 32-bit float per
sample per channel — a 16-bit stereo WAV roughly doubles in size once decoded), so a 1 GB WAV becomes
~2 GB of retained AudioBuffer float data. That is the OOM.
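The arithmetic behind the ~2 GB figure can be checked with a short sketch. decodedPcmBytes is a hypothetical helper, not a function in the codebase; it assumes the source is uncompressed PCM and ignores the WAV header and per-AudioBuffer object overhead.

```typescript
// Hypothetical helper (not in the codebase): estimates how many bytes of
// 32-bit float sample data a PCM source occupies once decoded for Web Audio.
const FLOAT32_BYTES_PER_SAMPLE = 4;

function decodedPcmBytes(sourcePcmBytes: number, sourceBitsPerSample: number): number {
  const sourceBytesPerSample = sourceBitsPerSample / 8;
  // Sample count across all channels. The channel count cancels out because
  // both the source and the decoded form store one value per channel per frame.
  const sampleCount = sourcePcmBytes / sourceBytesPerSample;
  return sampleCount * FLOAT32_BYTES_PER_SAMPLE;
}

const ONE_GIB = 1024 * 1024 * 1024;
const decodedFrom16Bit = decodedPcmBytes(ONE_GIB, 16); // twice the source size
const decodedFrom24Bit = decodedPcmBytes(ONE_GIB, 24); // four thirds of the source size
```

The ratio depends only on the source bit depth: 16-bit sources double, 24-bit sources grow by a third. Channel count and sample rate affect how many seconds a gigabyte holds, not the expansion factor.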
One-line framing: today the player decodes the whole track into memory and keeps it; Phase 21 makes it keep only a sliding forward window and discard what has already played, refilling on demand from the Range primitive it already uses for seek.
2. Constraints / invariants (the contract that must hold)
These are non-negotiable. The §3.5 streaming seam (root CLAUDE.md "Streaming-first audio playback";
CONTEXT.md §3.5) is called the most architecturally load-bearing part of the playback path by both
docs. This phase modifies that seam — so the contract it must preserve is spelled out here.
- C1 — The seek-beyond-buffer Range path is the substrate, kept intact. Phase 4 landed HTTP Range: bytes={offset}- → 206 Partial Content end to end (client TrackMediaClient → DeepDrftPublic proxy → DeepDrftAPI), and StreamDecoder.reinitializeForRangeContinuation retains the parsed format header on a continuation body (no re-parse). Windowed refill is a generalization of this exact path (§3.1) — it must not require a second, divergent fetch mechanism.
- C2 — Playback start latency unchanged. Today playback starts as soon as a configurable minimum buffer count is queued (header-derived duration, not full-file). The window model must keep first-audio latency at parity — bounding memory must not reintroduce a fetch-then-play stall.
- C3 — The format-decoder abstraction is untouched. IFormatDecoder owns all format-specific byte math; AudioPlayer.createFormatDecoder already dispatches on Content-Type (WAV/MP3/FLAC decoders all wired today — verified 2026-06-23; an OpusFormatDecoder joins them in Phase 18). Windowing lives in the format-agnostic layer (PlaybackScheduler eviction + StreamDecoder/player refill orchestration); it must add no format-specific branches. A future wired MP3/FLAC decoder inherits windowing for free.
- C4 — Read-only playback only. This is a memory-management change, not a UX change. No new user-visible control, no change to seek/transport semantics beyond what the listener already experiences. Seek must still feel identical.
- C5 — Must window both delivery formats (WAV lossless AND Opus low-data). Byte↔time mapping for refill is exact and cheap for WAV (CBR: byteRate from the header). For VBR/containerized formats it is approximate (the decoders carry TOC/SEEKTABLE/Ogg-page seek math). Phase 18 (Opus) is sequenced before this phase and is the concrete driver here: an Ogg Opus 320 stream is VBR and page-paged, so its calculateByteOffset is an approximate page-interpolation, not exact-offset. The window machinery must express refill purely in terms of the decoder's existing calculateByteOffset, so the same code windows WAV exactly and Opus approximately — no WAV-special-cased offset math in the window layer. (MP3/FLAC decoders are already wired in the registry too — the registry dispatches on content-type today; an OpusFormatDecoder joins them in Phase 18.)
- C6 — No regression to the single-instance JS decoder concurrency guarantees. The current code is careful that only one streaming loop touches the single JS StreamDecoder at a time (DrainActiveStreamingTaskAsync, the _streamingCancellation identity dance). Windowed refill introduces more mid-stream fetches; it must route through the same drain/cancellation discipline, not around it.
- C7 — The Mix visualizer's data source is independent and must stay that way. The Phase 10/12 WebGL2 lava visualizer renders from a preprocessed high-res waveform datum fetched per-track (GET api/track/{entryKey}/waveform/high-res), not from live decoded PCM. Confirmed: evicting played AudioBuffers cannot starve the visualizer — it never read them. The window model is invisible to the visualizer. (This is the canonical 1 GB case and the case that proves the eviction is safe.)
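To make C3/C5 concrete, here is a minimal sketch of a window layer that computes a refill offset without knowing the format. ByteOffsetMapper, createCbrMapper, createInterpolatingMapper and refillRangeHeader are all hypothetical names introduced for illustration; the real IFormatDecoder signature and the real Opus page math may differ.

```typescript
// Hypothetical stand-in for the one method the window layer needs from a
// format decoder. The real IFormatDecoder signature may differ.
interface ByteOffsetMapper {
  calculateByteOffset(timeSeconds: number): number;
}

// Exact mapping for CBR PCM: header size plus time * byteRate, aligned down
// to a whole sample frame (blockAlign) so a refill never starts mid-frame.
function createCbrMapper(headerBytes: number, byteRate: number, blockAlign: number): ByteOffsetMapper {
  return {
    calculateByteOffset(timeSeconds: number): number {
      const raw = Math.floor(timeSeconds * byteRate);
      return headerBytes + (raw - (raw % blockAlign));
    },
  };
}

interface SeekPoint { timeSeconds: number; byteOffset: number; }

// Approximate mapping for VBR/containerized streams: linear interpolation
// between known (time, byte) points such as page boundaries. Points must be
// sorted by time and contain at least two entries.
function createInterpolatingMapper(points: SeekPoint[]): ByteOffsetMapper {
  return {
    calculateByteOffset(timeSeconds: number): number {
      let i = 0;
      while (i < points.length - 2 && points[i + 1].timeSeconds <= timeSeconds) i++;
      const a = points[i];
      const b = points[i + 1];
      const span = b.timeSeconds - a.timeSeconds;
      const clamped = Math.min(Math.max(timeSeconds, a.timeSeconds), b.timeSeconds);
      const fraction = span === 0 ? 0 : (clamped - a.timeSeconds) / span;
      return Math.floor(a.byteOffset + fraction * (b.byteOffset - a.byteOffset));
    },
  };
}

// The window layer: no format branches, only the mapper call.
function refillRangeHeader(mapper: ByteOffsetMapper, timeSeconds: number): string {
  return "bytes=" + mapper.calculateByteOffset(timeSeconds) + "-";
}
```

The point of the sketch is the last function: it is identical for both mappers, so exactness or approximation is entirely the decoder's property.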
3. Architectural shape
3.0 The mental model
A track's audio is a byte range [0, fileLength) on disk. At any moment the listener is at playback
position P (seconds → byte offset via the format decoder). The player should hold decoded
AudioBuffers only for a bounded window roughly [P - back, P + ahead]:
- forward fill (ahead) — enough decoded lookahead that playback never starves (covers the existing 500 ms scheduler lookahead plus network jitter headroom);
- back-retain (back) — a small amount of already-played audio kept so a short seek-back does not trigger a network refetch;
- evict — anything older than P - back is dropped (AudioBuffer references released → GC reclaims the float data);
- refill — when forward decoded lookahead drops below a low-water mark, fetch+decode more from the current byte position; when the window's tail is evicted and the listener seeks back past it, refetch that region via the Range primitive (the seek-beyond-buffer path, run backwards).
This is a ring/sliding-window buffer keyed on playback position, driven by high/low-water marks — the standard bounded-producer/bounded-consumer pattern, transplanted onto the decode→schedule seam.
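The mental model above reduces to a small pure function of playback position. The sketch below is illustrative only: WindowPolicy, DecodedSpan, WindowPlan and planWindow are hypothetical names, and the numbers in the usage note are examples, not decided values.

```typescript
// Hypothetical policy and state shapes; names are illustrative only.
interface WindowPolicy {
  aheadSeconds: number;    // forward fill target
  backSeconds: number;     // back-retain
  lowWaterSeconds: number; // refill trigger for forward lookahead
}

interface DecodedSpan {
  startSeconds: number; // absolute time of the oldest retained audio
  endSeconds: number;   // absolute time of the newest decoded audio
}

interface WindowPlan {
  evictBeforeSeconds: number; // drop audio that ends before this time
  needsRefill: boolean;       // forward lookahead is below low water
  refillUntilSeconds: number; // decode target when refilling
}

function planWindow(positionSeconds: number, span: DecodedSpan, policy: WindowPolicy): WindowPlan {
  const lookahead = Math.max(0, span.endSeconds - positionSeconds);
  return {
    evictBeforeSeconds: Math.max(0, positionSeconds - policy.backSeconds),
    needsRefill: lookahead < policy.lowWaterSeconds,
    refillUntilSeconds: positionSeconds + policy.aheadSeconds,
  };
}
```

With a policy of 30 s ahead, 10 s back and a 10 s low-water mark, a listener at P = 100 s holding decoded audio for [80, 105] gets a plan that evicts everything before 90 s and refills toward 130 s.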
3.1 Why this is a generalization of seek-beyond-buffer, not a new mechanism
The seek-beyond-buffer path already does every primitive the window needs, just triggered manually and one-shot:
| Window operation | Existing seek-beyond-buffer machinery it reuses |
|---|---|
| Discard buffers, keep offset | PlaybackScheduler.clearForSeek() + setPlaybackOffset() (clears buffers, retains the absolute-time anchor) |
| Fetch from a byte offset | TrackMediaClient.GetTrackMedia(key, byteOffset) → Range: bytes=X- → 206 |
| Decode a header-less body | StreamDecoder.reinitializeForRangeContinuation(remainingByteLength) |
| Map time → byte offset | StreamDecoder.calculateByteOffset() → IFormatDecoder.calculateByteOffset() |
| Single-loop safety on refetch | _streamingCancellation swap + DrainActiveStreamingTaskAsync() |
The difference is eviction does not exist yet (the scheduler only ever clear()s wholesale) and
refill is one-shot (a seek, not a continuous low-water-triggered loop). So the new work is two
seams: a partial-evict on the scheduler, and a position-driven refill controller on the player. The
fetch/decode/offset plumbing is reused verbatim.
3.2 The three candidate directions
Per file convention the alternatives are recorded; the recommendation follows.
Direction A — Sliding window on the existing single forward stream (recommended).
Keep the current model where the C# loop reads one forward HTTP stream and pumps chunks into the JS
decoder. Add two things: (1) PlaybackScheduler gains partial eviction — drop buffers whose
absolute-time end is older than P - back, adjusting its index bookkeeping so getCurrentPosition()
and scheduling stay correct against a buffer array that no longer starts at index 0; (2) a
back-pressure signal — when forward decoded lookahead exceeds the high-water mark, the C# loop
pauses reading the HTTP stream (stops calling ReadAsync) until playback drains it below low-water,
then resumes. Memory is bounded by high-water + back-retain. Seek-back beyond the retained window falls
through to the existing seek-beyond-buffer path unchanged.
Why recommended: smallest change to the load-bearing seam; reuses the live forward stream (no extra
connections in the common case); eviction and back-pressure are the only genuinely new mechanisms, and
both are local (one to the scheduler, one to the read loop). Back-pressure via "stop reading the socket"
is exactly how TCP flow control already wants to behave — pausing ReadAsync lets the kernel window
close; we are not fighting the transport.
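The back-pressure half of Direction A is a hysteresis gate between the two water marks. The sketch below shows only that decision logic in TypeScript for illustration; in the actual player the read loop is the C# StreamAudioWithEarlyPlayback, and BackPressureGate with its thresholds is a hypothetical name, not an existing class.

```typescript
// Hypothetical gate; names and thresholds are assumptions for illustration.
class BackPressureGate {
  private reading = true;
  private readonly highWaterSeconds: number;
  private readonly lowWaterSeconds: number;

  constructor(highWaterSeconds: number, lowWaterSeconds: number) {
    if (lowWaterSeconds >= highWaterSeconds) {
      throw new Error("lowWaterSeconds must be below highWaterSeconds");
    }
    this.highWaterSeconds = highWaterSeconds;
    this.lowWaterSeconds = lowWaterSeconds;
  }

  // Returns true when the loop should issue its next read.
  shouldRead(lookaheadSeconds: number): boolean {
    if (this.reading && lookaheadSeconds >= this.highWaterSeconds) {
      this.reading = false; // window is full: stop pulling from the socket
    } else if (!this.reading && lookaheadSeconds <= this.lowWaterSeconds) {
      this.reading = true; // playback drained the window: resume
    }
    return this.reading;
  }
}
```

The gap between the two marks matters: with a single threshold the loop would flap between paused and reading on every scheduled buffer. Two marks let the loop read in bursts and then stay idle while playback drains the window.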
Direction B — Discrete window segments, each its own Range fetch.
Treat the file as fixed-size byte segments (e.g. 4 MB). Hold N decoded segments around P; fetch the
next/previous segment via a fresh Range request as the window slides; discard the far segment. No live
long-lived forward stream — every window is an independent 206.
Why not (default): turns one connection into many short Range requests (more proxy hops through
DeepDrftPublic, more server-side WavOffsetService-style header synthesis, more places a fetch can
fail mid-stream — worsening the Phase 1.6 error surface), and the byte↔time segment math must be exact at
every boundary. It is the cleaner model for true random-access (and the better base if seeking-heavy
usage dominates), so keep it as the fallback if Direction A's back-pressure proves leaky in practice.
Borrowed prior art: HLS/DASH segment windows and the MSE SourceBuffer.remove() eviction model — this
is how every production HTML5 adaptive player bounds memory. We are doing the hand-rolled equivalent
because the stack is a bespoke Web Audio graph, not <media> + MSE.
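For comparison, the byte side of Direction B's segment model is simple to sketch. segmentFor and segmentRangeHeader are hypothetical names, and the 4 MB segment size is the example figure from the text, not a decided value. The sketch covers only the byte arithmetic; the byte↔time mapping at each boundary, which is the hard part noted above, is not modeled.

```typescript
// Hypothetical segment math for Direction B.
interface SegmentRange {
  index: number;
  firstByte: number;
  lastByte: number; // inclusive, as in an HTTP Range header
}

function segmentFor(byteOffset: number, segmentBytes: number, fileLength: number): SegmentRange {
  if (byteOffset < 0 || byteOffset >= fileLength) {
    throw new RangeError("byteOffset outside the file");
  }
  const index = Math.floor(byteOffset / segmentBytes);
  const firstByte = index * segmentBytes;
  // The final segment is clipped to the end of the file.
  const lastByte = Math.min(firstByte + segmentBytes, fileLength) - 1;
  return { index, firstByte, lastByte };
}

function segmentRangeHeader(segment: SegmentRange): string {
  return "bytes=" + segment.firstByte + "-" + segment.lastByte;
}
```

Each slide of the window would issue one such closed-range request, which is where the extra proxy hops and failure points in the "why not" come from.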
Direction C — Adopt MediaSource Extensions (MSE) and let the browser manage the buffer.
Stop hand-rolling the decode→schedule graph for long tracks; feed the Range stream into a SourceBuffer
and let the browser evict via its built-in quota + remove(). Memory management becomes the platform's
problem.
Why not — RESOLVED, rejected (Daniel, 2026-06-23; see OQ5): MSE does not accept raw WAV/PCM — it
wants containerized formats (fragmented MP4/WebM, or MP3/AAC elementary streams). The entire bespoke
visualizer/spectrum graph is wired to the Web Audio AudioContext, not a <media> element. Adopting
MSE is a rewrite of the playback substrate, not a windowing change. It looked like the real
long-term answer once compressed delivery arrived — but Daniel has decided compressed delivery
(Phase 18 Opus) will feed the same bespoke graph via the IFormatDecoder seam, so the
compressed-delivery move that would have justified MSE happens without surrendering the graph. The
bespoke graph is a deliberate long-term commitment; MSE is rejected. Direction A is therefore the
permanent destination, not a stopgap that MSE will retire. Recorded as considered-and-declined.
3.3 Recommended direction: A, with B held as the documented fallback
Direction A is the smallest coherent change that hits the headline (bounded memory under a 1 GB stream) while honoring C1–C7. It keeps the live forward stream, reuses the seek-beyond-buffer path for the only genuinely random-access case (seek-back past the retained tail), and isolates the two new mechanisms. The final architecture and the exact eviction/back-pressure API are staff-engineer's call at implementation (per file convention); this spec fixes the shape and the invariants, not the method signatures.
3.4 SOLID / road-not-taken rationale
- SRP, preserved. Eviction is a PlaybackScheduler concern (it already owns buffer storage); refill orchestration is a player-service/StreamDecoder concern (they already own the fetch loop); byte↔time math stays in IFormatDecoder. No responsibility crosses a boundary it does not already own.
- OCP, via C3/C5. Windowing added in the format-agnostic layer means wiring MP3/FLAC later changes zero window code. The window expresses refill through calculateByteOffset — the one seam the decoders already implement.
- The seam stays single-writer (C6). Every new refetch routes through the existing cancellation/drain discipline, so "only one loop touches the JS decoder" remains true. This is the rule most likely to be violated by a naive implementation and is called out as a hard invariant.
- Road not taken — eager full decode with a memory cap that just stops decoding. Tempting (decode until you hit a byte budget, then stop) but it breaks playback of long tracks past the cap entirely — it bounds memory by refusing to play the rest, not by sliding. Rejected: it is a degradation, not a feature.
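The single-writer rule (C6) can be illustrated with a token check. This is a deliberately reduced sketch: the real discipline is the C# _streamingCancellation swap plus DrainActiveStreamingTaskAsync, and the sketch models only the identity check, not the asynchronous drain. SingleWriterGuard and processChunkGuarded are hypothetical names.

```typescript
// Hypothetical guard illustrating the single-writer rule.
class SingleWriterGuard {
  private activeToken = 0;

  // Called when a new streaming loop (initial load, seek, or refill) starts.
  // Any previously issued token becomes stale.
  beginLoop(): number {
    this.activeToken += 1;
    return this.activeToken;
  }

  // A loop checks this before handing a chunk to the decoder.
  mayWrite(token: number): boolean {
    return token === this.activeToken;
  }
}

// Stand-in for the decoder hand-off: the decoded array represents the single
// JS decoder's input, and chunks are plain numbers to keep the sketch small.
function processChunkGuarded(
  guard: SingleWriterGuard,
  token: number,
  decoded: number[],
  chunk: number,
): boolean {
  if (!guard.mayWrite(token)) return false; // stale loop: drop the chunk
  decoded.push(chunk);
  return true;
}
```

A refill that bypassed the guard would be exactly the naive implementation the third bullet warns about: two loops interleaving writes into one decoder.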
4. Use cases
- UC1 — Play a 1 GB+ DJ MIX start to finish (the headline). Memory stays bounded throughout; the listener experiences continuous playback identical to a short track.
- UC2 — Seek forward within a long track. Already handled by seek-beyond-buffer; under windowing the forward seek clears the window and refills at the target — no behavior change, now with eviction so the pre-seek region does not linger.
- UC3 — Seek back a few seconds. Served from the back-retain window with no network refetch (the reason back exists).
- UC4 — Seek back far, past the evicted tail. Falls through to the existing seek-beyond-buffer Range fetch, run toward an earlier offset. (Open question OQ2 — see §6.)
- UC5 — Pause a long track for a long time. Memory stays at the bounded window size while paused (no continued decode). On resume, forward fill restarts from the low-water trigger.
- UC6 — Mix detail page with the lava visualizer running. Visualizer reads its preprocessed datum (C7); windowing is invisible to it. Confirmed non-interaction.
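The seek cases UC2 to UC4 come down to one question: does the target fall inside the retained window? A minimal sketch follows, with routeSeek and its types as hypothetical names. It assumes a target inside the decoded span is served locally in both directions, which the use cases state for seek-back (UC3) and which is assumed here for short forward seeks.

```typescript
type SeekRoute = "serve-from-window" | "refetch-via-range";

interface RetainedSpan {
  startSeconds: number; // oldest retained audio (P - back after eviction)
  endSeconds: number;   // newest decoded audio
}

// Hypothetical classifier for UC2 to UC4.
function routeSeek(targetSeconds: number, retained: RetainedSpan): SeekRoute {
  const inside =
    targetSeconds >= retained.startSeconds && targetSeconds < retained.endSeconds;
  return inside ? "serve-from-window" : "refetch-via-range";
}
```

With audio retained for [90, 130], a seek to 95 s is UC3 (no network), a seek to 60 s is UC4, and a seek to 500 s is UC2; the last two take the same Range path.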
5. Interaction with the deferred Phase 1 streaming features
This phase touches the same decoder/scheduler seam as the deferred Phase 1.3/1.4/1.5 items and the 1.6/1.7 robustness items. The interactions, explicitly:
- 1.3 Preload / prefetch (deferred; preload half). Shares machinery, does not conflict — and should be sequenced after. Preload stages the next track into a second decoder instance during the current track's tail; windowing bounds the current track's forward buffer. They are orthogonal axes (next-track vs. current-track-window), but they compound the memory question: a naive preload of a second 1 GB mix would reintroduce the OOM this phase fixes. Recommendation: land windowing first, so that when preload arrives, the staged next-track decoder is also windowed by construction (it inherits the bounded scheduler). Windowing makes preload safe for long tracks; without it, preload of mixes is a memory hazard.
- 1.4 Crossfade (deferred). Needs two simultaneous PlaybackScheduler instances briefly overlapping. Both would be windowed instances — the overlap doubles the window size momentarily, not the whole track. Windowing makes crossfade between two long mixes affordable. No reordering needed; 1.4 still gates on 1.3.
- 1.5 Gapless (deferred). Sample-accurate hand-off of the next track's first buffer at the current track's last buffer. Windowing changes which buffers are retained but not the hand-off mechanism; the only care point is that the current track's final window must not be evicted before the gapless boundary is scheduled. A minor invariant for whoever builds 1.5, not a blocker. Note 1.5's existing WAV-only caveat stands.
- 1.6 Track-skip on error (deferred). Windowing enlarges the error surface — call this out. Today a fetch failure happens at load (one fetch) or at a user seek (one fetch). Windowed refill issues mid-stream fetches the listener did not initiate; one of those can fail at byte 700 M of a 1 GB mix. So Phase 21 should ship with at least the cheap half of 1.6: a mid-stream refill failure must surface a clear error and not wedge the player (it must not leave playback "running" with a starved scheduler — mirror the playFromPosition end-of-buffer recovery already in PlaybackScheduler). The rich half (byte-scan to next valid frame) stays deferred. Recommendation: fold the minimal refill-failure handling into Phase 21's acceptance criteria (AC6) rather than leaving it entirely to 1.6 — it is created by this phase.
- 1.7 Safari compatibility (deferred). Windowing adds no new Safari-specific surface beyond what the streaming path already has. The one adjacency: more frequent AudioContext activity during refill should be checked against the older-Safari webkitAudioContext quirks when 1.7 is addressed — note it, do not block on it.
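The "cheap half" of 1.6 can be pictured as a state transition: a failed refill moves the player out of "playing" and records a message, instead of leaving it nominally running against a starved scheduler. The sketch below is a hypothetical reducer, not the existing player state model; PlayerSnapshot, RefillResult and onRefillResult are illustrative names, and the choice that a later successful refill clears the error is an assumption.

```typescript
type PlayerState = "playing" | "error";

interface PlayerSnapshot {
  state: PlayerState;
  errorMessage: string | null;
}

type RefillResult =
  | { kind: "ok" }
  | { kind: "failed"; message: string };

// Hypothetical reducer: maps a refill outcome onto the player snapshot.
function onRefillResult(current: PlayerSnapshot, result: RefillResult): PlayerSnapshot {
  if (result.kind === "failed") {
    // Surface the failure; never keep reporting "playing" with nothing to play.
    return { state: "error", errorMessage: result.message };
  }
  // Assumption: a successful refill (for example after a retry) recovers
  // from the error state; otherwise the snapshot is unchanged.
  return current.state === "error" ? { state: "playing", errorMessage: null } : current;
}
```

The property AC6 asks for is visible in the first branch: there is no path where a failed refill returns a "playing" snapshot.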
6. Open questions for Daniel (genuine product decisions, not implementation detail)
These are policy calls with user-visible or resource trade-offs — flagged rather than decided here.
- OQ1 — Window size policy. What bounds the window — a fixed byte/time budget (e.g. "hold at most ~30 s decoded ahead + ~10 s behind"), or a configurable memory budget (e.g. "≤ N MB of decoded PCM") that derives the time window from the stream's byte rate? Recommend a time-based forward window + small time-based back-retain as the primary knob (intuitive, format-portable), with a hard memory ceiling as a secondary guard. The exact numbers are tunable post-landing; Daniel picks the policy axis. [Daniel decision]
- OQ2 — Seek-back past the evicted window. When the listener seeks back earlier than the retained tail, we must refetch (the audio is gone). Acceptable to take the same brief re-buffer the forward seek-beyond-buffer takes today? (Recommend yes — it is the symmetric case and listeners already accept it forward.) Or should back-retain be generous enough that this is rare? [Daniel decision]
- OQ3 — Configurable total in-flight memory cap. Should there be a single hard byte ceiling on total decoded audio held by the player (a safety net independent of the window-size policy), exposed as a config value? Recommend yes, as a guard rail even if the window policy is time-based — it is the backstop that makes "1 GB stream never OOMs" a guarantee rather than a tuning hope. [Daniel decision]
- OQ4 — Apply windowing to all tracks, or only long ones? A 3-minute Cut decoded whole is ~30–60 MB — harmless today. Windowing everything is simpler (one code path) but adds refill machinery to short tracks that never needed it. Recommend window everything (one path, C6-safe, and short tracks simply never hit a refill because they fit inside the forward window) — but Daniel may prefer a size threshold. [Daniel decision]
- OQ5 — Is MSE (Direction C) the real destination? — RESOLVED: NO (Daniel, 2026-06-23). Do not adopt MSE. The bespoke Web Audio decode→schedule graph stays — it is bespoke by deliberate choice, a long-term commitment, not a stopgap. Daniel's rationale: the player is intentionally a custom graph, not an HTML <media> element; the compressed-delivery move that would have made MSE tempting is being met instead by Phase 18 (Opus low-data path) feeding the same bespoke graph through the IFormatDecoder seam — so compressed delivery arrives without surrendering the graph. Consequence for this phase: Direction A (the hand-rolled sliding window) is the destination, not a placeholder; invest in it as permanent machinery. It will window both the WAV and the Opus path (the sequencing note at the top). Direction C is recorded as considered and declined per file convention; kept visible so a future reader sees the road not taken and why. [RESOLVED — bespoke graph retained; MSE rejected]
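To give OQ1 and OQ3 some scale, the sketch below converts a time-based window into decoded bytes and applies a hard ceiling. deriveWindow and its types are hypothetical, and the rule that forward fill is kept ahead of back-retain when the ceiling bites is an assumption made for the example, not a decided policy.

```typescript
const DECODED_BYTES_PER_SAMPLE = 4; // 32-bit float per sample per channel

interface StreamShape { sampleRate: number; channels: number; }
interface WindowRequest { aheadSeconds: number; backSeconds: number; ceilingBytes: number; }
interface WindowBudget { aheadSeconds: number; backSeconds: number; decodedBytes: number; }

function deriveWindow(shape: StreamShape, request: WindowRequest): WindowBudget {
  const bytesPerSecond = shape.sampleRate * shape.channels * DECODED_BYTES_PER_SAMPLE;
  const maxSeconds = request.ceilingBytes / bytesPerSecond;
  // Assumption: when the ceiling bites, forward fill is kept first (it
  // prevents starvation) and back-retain gets whatever remains.
  const aheadSeconds = Math.min(request.aheadSeconds, maxSeconds);
  const backSeconds = Math.min(request.backSeconds, maxSeconds - aheadSeconds);
  return {
    aheadSeconds,
    backSeconds,
    decodedBytes: (aheadSeconds + backSeconds) * bytesPerSecond,
  };
}
```

For 44.1 kHz stereo, the example "30 s ahead + 10 s behind" window is 40 s × 352,800 bytes/s = 14,112,000 bytes of decoded PCM, roughly 13.5 MiB, regardless of whether the track is three minutes or three hours long.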
7. Acceptance criteria
- AC1 (headline) — Bounded memory under a 1 GB stream. Playing a 1 GB+ WAV mix start to finish, the browser tab's retained decoded-audio memory stays bounded to the configured window (not growing toward ~2 GB). Verifiable via browser memory tooling: peak decoded-audio footprint is independent of track length and tracks the window-size policy, not the file size.
- AC2 — Playback-start latency at parity (C2). First-audio latency for a track is unchanged from pre-windowing (within noise). Windowing does not introduce a fetch-then-play stall.
- AC3 — Continuous playback, no starvation. A long mix plays edge to edge with no audible gaps, underruns, or stalls under normal network conditions — the forward fill stays ahead of the playhead.
- AC4 — Seek-back within the window is instant (UC3). A short backward seek into retained audio produces no network request.
- AC5 — Seek (forward, and back past the window) still works (UC2/UC4). Both resolve via the existing Range path with the same behavior the listener sees today; the pre-seek region is evicted, not retained.
- AC6 — A mid-stream refill failure degrades cleanly (the 1.6 adjacency). A failed refill fetch surfaces a clear user-visible error and leaves the player in a recoverable state (not a wedged "playing" with a starved scheduler). It must not silently hang.
- AC7 — The Mix visualizer is unaffected (C7). With the lava visualizer running on a long mix, the visualizer renders identically (it reads the preprocessed datum, never the evicted buffers).
- AC8 — Single-decoder concurrency invariant holds (C6). Under rapid seek + refill activity, no interleaved ProcessStreamingChunk calls corrupt the single JS decoder (the existing drain/cancel discipline still governs every fetch).
8. Wave decomposition
Dependency shape: 21.1 → 21.2 → 21.3, with 21.4 validating the whole. 21.1 is the cold-start
prerequisite and the load-bearing change; the rest layer on it.
- 21.1 — Partial eviction in PlaybackScheduler (cold-start; the load-bearing change). Give the scheduler the ability to drop already-played buffers and keep its position/index bookkeeping correct against a buffer array that no longer begins at absolute time 0 (today getCurrentPosition, playFromPosition, and the scheduling loop all assume buffers[0] is the track start). This is the hardest correctness work in the phase — the time-anchor math must stay exact through eviction. No refill yet; with eviction alone and the forward read loop unchanged, this is provably memory-bounded for the played region. Independent of the §6 open questions — it can begin immediately; the window sizes (OQ1/OQ3) are parameters fed in later. Settled and cold-start.
- 21.2 — Back-pressure on the forward read loop (the bound on the unplayed region). Make the C# StreamAudioWithEarlyPlayback loop stop calling ReadAsync when forward decoded lookahead exceeds the high-water mark, and resume below low-water. Together with 21.1, this bounds both the played and unplayed sides — the full memory guarantee (AC1). Must route resume/pause through the existing cancellation-safe single-loop discipline (C6). Depends on 21.1 (eviction must exist so the drained region is reclaimed, not merely un-read).
- 21.3 — Seek-back-past-window refill (close the random-access case). Wire UC4 — when a backward seek lands earlier than the retained tail, refetch via the existing seek-beyond-buffer Range path pointed at the earlier offset, and the minimal AC6 refill-failure handling. Mostly reuse of the landed seek path; the new work is the trigger (window-miss detection) and the clean-failure path. Depends on 21.1 + 21.2 (needs the window boundaries they define).
- 21.4 — Validation pass against the 1 GB target (acceptance). Exercise AC1–AC8 against a real 1 GB+ mix: memory profiling (AC1), latency parity (AC2), edge-to-edge playback (AC3), the seek matrix (AC4/AC5), induced refill failure (AC6), visualizer-running (AC7), and rapid-seek concurrency (AC8). Largely test/measurement; any break is likely a tuning fix in the 21.1 anchor math or the 21.2 water-marks. Depends on 21.1–21.3.
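The time-anchor bookkeeping that 21.1 calls the hardest correctness work can be sketched in isolation. WindowedBufferStore below is a hypothetical stand-in, not the PlaybackScheduler API: segments carry only a duration in place of a real AudioBuffer, and the anchor is kept in seconds for readability. A real implementation would more likely anchor in sample frames so accumulated floating-point error cannot drift the position.

```typescript
// Hypothetical stand-in for an AudioBuffer: only its duration matters here.
interface Segment { durationSeconds: number; }

class WindowedBufferStore {
  private segments: Segment[] = [];
  // Absolute track time at which segments[0] begins. Without eviction this is
  // always 0; with eviction it advances as played segments are dropped.
  private baseSeconds = 0;

  add(segment: Segment): void {
    this.segments.push(segment);
  }

  // Drops every segment that ends at or before cutoffSeconds and returns how
  // many were dropped. The anchor advances by exactly the dropped duration.
  evictBefore(cutoffSeconds: number): number {
    let dropped = 0;
    while (
      this.segments.length > 0 &&
      this.baseSeconds + this.segments[0].durationSeconds <= cutoffSeconds
    ) {
      this.baseSeconds += this.segments[0].durationSeconds;
      this.segments.shift();
      dropped++;
    }
    return dropped;
  }

  // Maps an absolute time to (index, offset) in the retained array, or null
  // when the time is outside the window (evicted or not yet decoded).
  locate(timeSeconds: number): { index: number; offsetSeconds: number } | null {
    let start = this.baseSeconds;
    if (timeSeconds < start) return null;
    for (let index = 0; index < this.segments.length; index++) {
      const end = start + this.segments[index].durationSeconds;
      if (timeSeconds < end) return { index, offsetSeconds: timeSeconds - start };
      start = end;
    }
    return null;
  }

  retainedStartSeconds(): number { return this.baseSeconds; }
  retainedCount(): number { return this.segments.length; }
}
```

The invariant to hold through eviction is that an absolute time resolves to the same audio before and after: only the array index changes, and a null result is the window-miss signal that 21.3 would turn into a Range refetch.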
9. Cross-references (read before implementing)
- Root CLAUDE.md "Streaming-first audio playback" / CONTEXT.md §3.5 — the seam this phase modifies; the §2 invariants here restate its contract. Both flag it as the most load-bearing path.
- PLAN.md Phase 4 (landed) / COMPLETED.md — the HTTP Range bytes=X- primitive this generalizes.
- PLAN.md Phase 1.3 / 1.4 / 1.5 / 1.6 / 1.7 — the deferred decoder/scheduler-seam features; §5 above reconciles each.
- PLAN.md Phase 9 — defines the Mix medium (single long track), the canonical 1 GB case.
- PLAN.md Phase 10 / product-notes/phase-10-mix-visualizer-lava-reframe.md / product-notes/phase-12-waveform-visualizer-generalization.md — establishes the preprocessed per-track high-res waveform datum; the basis for C7 (visualizer does not read live PCM).
- DeepDrftPublic/Interop/audio/PlaybackScheduler.ts — owns the unbounded buffers: AudioBuffer[]; 21.1 lives here.
- DeepDrftPublic/Interop/audio/StreamDecoder.ts — reinitializeForRangeContinuation, calculateByteOffset; the refill substrate.
- DeepDrftPublic.Client/Services/StreamingAudioPlayerService.cs — the C# forward read loop (StreamAudioWithEarlyPlayback), the seek-beyond-buffer path (SeekBeyondBuffer), and the cancellation/drain discipline (C6); 21.2/21.3 live here.
- DeepDrftPublic.Client/Clients/TrackMediaClient.cs — the Range-capable media fetch reused by refill.