deepdrft/product-notes/phase-21-windowed-streaming-buffer.md
daniel-c-harvey 036ee1f78e docs: record Phase 21 (windowed streaming) as landed; note Direction A to B pivot
Move Phase 21 from PLAN to COMPLETED with the as-built record, and annotate
the spec that Direction B shipped after WASM fetch buffering defeated A.
2026-06-24 16:05:30 -04:00


Phase 21 — Windowed Streaming Buffer (bounded client memory for long streams)

Product spec. Status: LANDED 2026-06-24 on streaming-overhaul. See COMPLETED.md for the full as-built record. Author: product-designer. Date: 2026-06-23 (reconciliation pass after Phase 18 landed).

AS-BUILT NOTE — Direction A→B pivot (2026-06-24). This spec recommended Direction A (sliding window on one open-ended forward stream, pausing ReadAsync/the segment loop to backpressure the socket) and held Direction B (discrete bounded Range: bytes=start-end segments, §3.2) as the documented fallback. 21.4 browser validation proved Direction A insufficient for Blazor WASM: the browser fetch API buffers the entire HTTP response body regardless of read pace — pausing reads bounded the decode but not the network download, so the whole ~970 MB body accumulated in browser memory even with the application decoding only a window of it. We shipped Direction B. The forward stream now issues sequential 4 MB bounded Range requests (SegmentSizeBytes = 4 MB), fetched via RunSegmentedStreamAsync in StreamingAudioPlayerService, each issued only after PlaybackScheduler.evaluateProductionPause() clears below low-water. Browser holds ~one 4 MB segment of raw bytes; 21.4 confirmed network-memory bounding in Daniel's browser run. The decode-side windowing (21.1/21.2) is unchanged and pairs with Direction B; seek/refill converge on the same segmented loop via RecoverFromFailedRefill. Direction A is recorded as tried-in-validation and found insufficient for the WASM fetch runtime. Sections §3.2–§3.3 below retain the original A vs. B vs. C analysis as the decision record; Direction B is what shipped.
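The byte arithmetic behind the as-built segmented fetch can be pictured as a plan of bounded Range headers. This is an illustrative sketch only. planSegments and SegmentRange are hypothetical names, not RunSegmentedStreamAsync (which is C#), and the sketch assumes "4 MB" means 4 × 1024 × 1024 bytes. End offsets in bytes=start-end are inclusive, so the last segment is clipped to fileLength - 1. The real loop additionally waits for the production-pause check to clear before issuing each request, which is not modeled here.

```typescript
// Sketch only: plan bounded Range requests for a sequential segmented fetch.
// Assumption: "4 MB" is read as 4 MiB (4 * 1024 * 1024 bytes).
const SEGMENT_SIZE_BYTES = 4 * 1024 * 1024;

interface SegmentRange {
  start: number;  // first byte of the segment (inclusive)
  end: number;    // last byte of the segment (inclusive, as in HTTP Range)
  header: string; // value for the Range request header
}

function planSegments(fileLength: number, segmentSize: number = SEGMENT_SIZE_BYTES): SegmentRange[] {
  if (!Number.isInteger(fileLength) || fileLength < 0) {
    throw new RangeError("fileLength must be a non-negative integer");
  }
  if (!Number.isInteger(segmentSize) || segmentSize <= 0) {
    throw new RangeError("segmentSize must be a positive integer");
  }
  const segments: SegmentRange[] = [];
  for (let start = 0; start < fileLength; start += segmentSize) {
    // The final segment is clipped to the file length.
    const end = Math.min(start + segmentSize, fileLength) - 1;
    segments.push({ start, end, header: `bytes=${start}-${end}` });
  }
  return segments;
}

// A 10 MiB file needs three bounded requests; the last one is short.
const plan = planSegments(10 * 1024 * 1024);
console.log(plan.map((s) => s.header));
```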

Surface: public listener site only (DeepDrftPublic.Client player stack + DeepDrftPublic TypeScript audio interop). No CMS (DeepDrftManager) change. No data-model or schema change. The one server touch is reuse, not new surface: the existing DeepDrftAPI HTTP Range: bytes=X- partial-content primitive (Phase 4, landed) is the load-bearing dependency; this phase adds no new API endpoint.

Phase 18 (Opus Low-Data Streaming) has LANDED (2026-06-23, COMPLETED.md). This spec is reconciled to the as-built reality. Phase 18 changed the landscape in two ways that reshape this phase:

  1. There are now TWO decode paths feeding the one PlaybackScheduler, not one. (a) The original WAV/MP3/FLAC path: StreamDecoder → IFormatDecoder (wrap-each-segment + decodeAudioData). (b) A new Opus path: OggDemuxer → OpusStreamDecoder (the IStreamingDecoder seam, a stateful WebCodecs AudioDecoder pipeline). The §3.1 unbounded-memory root cause (the scheduler's push-only AudioBuffer[]) applies to both. The Opus path, however, adds a second accumulation locus upstream of the scheduler (the WebCodecs decode queue + decodedQueue: AudioData[]), so windowing it is not the same mechanism as windowing WAV. See §3.1.
  2. The accurate index-driven Opus seek the original spec assumed Phase 21 would build is ALREADY LIVE. Phase 18 ships resolveOpusByteOffset (binary-search the precomputed seek index in OpusSeekData) → Range fetch → OpusStreamDecoder.reinitializeForRangeContinuation(landingTime, target) with frame-accurate lead-trim. Opus seek is accurate, not approximate — and already shipping. Phase 21 does not build Opus seek; it reuses that live seek for window-miss refills.
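The index-driven Opus seek reduces to a binary search: find the last index entry at or before the target time, then report how much decoded audio to trim. This is a hypothetical sketch, not the Phase 18 resolveOpusByteOffset. The SeekIndexEntry shape (seconds rather than granule positions) and the leadTrimSeconds field are assumptions. It also ignores the 80 ms decoder pre-roll that RFC 7845 recommends before a seek target.

```typescript
// Hypothetical seek-index entry: start time of an Ogg page and its byte offset.
interface SeekIndexEntry {
  timeSeconds: number;
  byteOffset: number;
}

interface SeekResolution {
  byteOffset: number;         // where the Range request starts (a page boundary)
  landingTimeSeconds: number; // time at which decoding will actually begin
  leadTrimSeconds: number;    // decoded audio to discard to reach the target
}

// Binary search for the last entry whose time is <= target.
// Assumes a non-empty index sorted by timeSeconds, ascending.
function resolveSeek(index: readonly SeekIndexEntry[], targetSeconds: number): SeekResolution {
  if (index.length === 0) throw new Error("empty seek index");
  let lo = 0;
  let hi = index.length - 1;
  let best = 0; // targets before the first entry land on the first page
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].timeSeconds <= targetSeconds) {
      best = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  const entry = index[best];
  return {
    byteOffset: entry.byteOffset,
    landingTimeSeconds: entry.timeSeconds,
    leadTrimSeconds: Math.max(0, targetSeconds - entry.timeSeconds),
  };
}

const index: SeekIndexEntry[] = [
  { timeSeconds: 0, byteOffset: 0 },
  { timeSeconds: 10, byteOffset: 120000 },
  { timeSeconds: 20, byteOffset: 241500 },
];
const r = resolveSeek(index, 15); // lands on the 10 s page and trims 5 s
```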

Correction of stale spec language. The original draft described Opus as a future-wired OpusFormatDecoder.calculateByteOffset joining the IFormatDecoder registry, with seek as "approximate vs accurate." All of that is now wrong against the landed code: Opus does not use IFormatDecoder (it diverged to the IStreamingDecoder/WebCodecs seam precisely because per-segment decodeAudioData is architecturally wrong for Opus — see IStreamingDecoder.ts), and its seek is accurate and shipping. The body below is rewritten to the two-path reality. The headline is unchanged: bound client memory to a sliding window regardless of stream length, for the canonical 1 GB mix, across both delivery formats.


1. Goal

Bound the client memory a playing track consumes to a small, configurable forward window — independent of total stream length — so a 1 GB+ DJ MIX (Phase 9 Mix medium: a single long track) plays without the whole decoded PCM accumulating in the browser.

The defect, stated precisely, now has two faces, one of them shared. The network path already streams in adaptive 16–64 KB chunks (StreamingAudioPlayerService.StreamAudioWithEarlyPlayback); that part is fine. The accumulation is on the decode side, and Phase 18 split the decode side into two pipelines that both terminate at the same sink:

  • The shared sink (both paths) — the unbounded scheduler. PlaybackScheduler holds private buffers: AudioBuffer[] and never evicts ("Supports pause/resume/seek by retaining all buffers" — its own doc comment). Both decode paths call scheduler.addBuffer() (via AudioPlayer.processFormatChunk for WAV/MP3/FLAC and processOpusChunk for Opus); nothing is ever removed. Decoded PCM is larger than the source in memory (Web Audio AudioBuffer is 32-bit float per sample per channel — a 16-bit stereo WAV roughly doubles once decoded; Opus decodes to the same 48 kHz float PCM regardless of how few bytes the compressed stream was). So a 1 GB WAV becomes ~2 GB of retained float, and a low-data Opus mix becomes the same ~2 GB of decoded float once played — the compressed transfer is small, but the decoded footprint is identical. The scheduler is the OOM for both. This is the §3.1 root cause, unchanged from the original spec — it just now afflicts two producers.
  • The Opus-only second locus — upstream decode-ahead. The Opus path accumulates before the scheduler too: the WebCodecs AudioDecoder work queue (decodeQueueSize), the decodedQueue: AudioData[] awaiting conversion, and the OggDemuxer's partial-page state. Bounding the scheduler alone does not bound these — they fill from the same C# ReadAsync loop, so they need their own back-pressure (on the demuxer/decoder feed), not only the read loop's. WAV has no equivalent upstream queue (its StreamDecoder decodes synchronously into the scheduler), so this is genuinely Opus-specific.
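The footprint claim in the first bullet is plain arithmetic. Decoded size depends only on duration, sample rate and channel count, never on the source codec. This is a minimal sketch with hypothetical function names. It assumes a PCM WAV with no extra chunks, and assumes the decoded audio is not resampled by the AudioContext.

```typescript
// Web Audio AudioBuffers store 32-bit float samples.
const FLOAT32_BYTES = 4;

// Decoded float PCM size: depends on duration, rate and channels only,
// not on how compact the source encoding was.
function decodedFloatBytes(durationSeconds: number, sampleRate: number, channels: number): number {
  const framesPerChannel = Math.round(durationSeconds * sampleRate);
  return framesPerChannel * channels * FLOAT32_BYTES;
}

// Duration of a PCM WAV data chunk (header bytes excluded).
function wavDurationSeconds(dataBytes: number, sampleRate: number, channels: number, bitsPerSample: number): number {
  const byteRate = sampleRate * channels * (bitsPerSample / 8);
  return dataBytes / byteRate;
}

const GiB = 1024 ** 3;
const mixSeconds = wavDurationSeconds(GiB, 44100, 2, 16); // about 6087 s (~101 min)
const wavDecoded = decodedFloatBytes(mixSeconds, 44100, 2); // 2 GiB: double the source
const opusDecoded = decodedFloatBytes(mixSeconds, 48000, 2); // larger still, if kept at 48 kHz
```

The same 101-minute mix occupies the same order of decoded memory whether it arrived as a 1 GiB WAV or a far smaller Opus file.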

One-line framing: today the player decodes the whole track into memory and keeps it — true for both formats; Phase 21 makes it keep only a sliding forward window and discard what has already played, refilling on demand from the Range primitive both paths already use for seek (WAV via IFormatDecoder, Opus via the live index-driven resolveOpusByteOffset).


2. Constraints / invariants (the contract that must hold)

These are non-negotiable. The §3.5 streaming seam (root CLAUDE.md "Streaming-first audio playback"; CONTEXT.md §3.5) is called the most architecturally load-bearing part of the playback path by both docs. This phase modifies that seam — so the contract it must preserve is spelled out here.

  • C1 — The seek-beyond-buffer Range path is the substrate, kept intact. Phase 4 landed HTTP Range: bytes={offset}- → 206 Partial Content end to end (client TrackMediaClient → DeepDrftPublic proxy → DeepDrftAPI), and StreamDecoder.reinitializeForRangeContinuation retains the parsed format header on a continuation body (no re-parse). Windowed refill is a generalization of this exact path (§3.1); it must not require a second, divergent fetch mechanism.
  • C2 — Playback start latency unchanged. Today playback starts as soon as a configurable minimum buffer count is queued (header-derived duration, not full-file). The window model must keep first-audio latency at parity — bounding memory must not reintroduce a fetch-then-play stall.
  • C3 — Neither decoder seam's contract is forked; windowing lives in the shared layer plus a thin per-seam hook. There are two decoder seams as of Phase 18: IFormatDecoder (WAV/MP3/FLAC, owns format byte math; AudioPlayer.createFormatDecoder dispatches on Content-Type) and IStreamingDecoder (Opus, the WebCodecs pipeline; selected in initializeStreaming when the content type is audio/ogg/audio/opus and a sidecar is present). The eviction half of windowing is fully shared — it lives in PlaybackScheduler, which both seams feed identically via addBuffer, so eviction adds zero format branches. The back-pressure / decode-ahead half is necessarily seam-aware — the WAV path back-pressures the C# ReadAsync loop; the Opus path must additionally bound the WebCodecs decode-ahead and the decodedQueue (§3.1). Express that as a small uniform signal ("the scheduler is full, stop producing") that each decode path honors in its own way, rather than a windowing controller that reaches into either decoder's internals. The goal the original C3 stated still holds — no format-specific logic leaking into the scheduler — but the spec now acknowledges the producer side has two shapes, not one.
  • C4 — Read-only playback only. This is a memory-management change, not a UX change. No new user-visible control, no change to seek/transport semantics beyond what the listener already experiences. Seek must still feel identical.
  • C5 — Window both decode paths without forking the scheduler/seam, reusing the live index-driven seek for refill. Both delivery formats must be windowed, and the byte↔time mapping each refill needs is already accurate and already shipping for both:
    • WAV/MP3/FLAC: IFormatDecoder.calculateByteOffset (CBR byteRate for WAV; the MP3/FLAC seek accelerators for those), reached through StreamDecoder.calculateByteOffset / AudioPlayer.seekBeyondBuffer.
    • Opus: resolveOpusByteOffset(activeOpusSidecar, t) (binary search the precomputed granule→byte seek index in OpusSeekData), returning an exact page-start offset and a landingTimeSeconds for the decoder's frame-accurate lead-trim. This is accurate, not approximate, and landed in Phase 18. Phase 21 does not build either mapping. The window's refill trigger calls whichever resolver the active path already uses. For Opus, that is the same resolveOpusByteOffset an explicit listener seek calls (the live path in AudioPlayer.seekBeyondBuffer), so windowed refill is literally "a seek the listener didn't initiate." A window opening away from byte 0 decodes correctly on the Opus path because the setup header (OpusHead/OpusTags) is already cached from the sidecar and re-applied by reinitializeForRangeContinuation (Phase 18 §3.4a B). The WAV path re-applies its retained header the same way. No new offset math, no approximation, no header re-fetch: all reused. The invariant is therefore not "make refill format-agnostic" (the two paths legitimately resolve offsets through different code). It is "reuse the live seek of each path verbatim; add only the eviction and the refill trigger, never a second seek mechanism."
  • C6 — No regression to the single-writer decoder concurrency guarantee — now covering both decoders. The C# loop is careful that only one streaming task feeds the active JS decoder at a time (DrainActiveStreamingTaskAsync, the _streamingCancellation identity dance in StreamingAudioPlayerService). This matters more for Opus: the WebCodecs AudioDecoder is stateful and async — a reset()+configure() on a range-continuation (reinitializeForRangeContinuation) racing a still-draining push() from a stale loop would corrupt inter-frame state, not merely deliver a wrong buffer. Windowed refill introduces more mid-stream fetches against whichever decoder is active; every one must route through the same drain/cancellation discipline, not around it. The discipline is already decoder-agnostic at the C# layer (it cancels the loop, not the decoder), so this is a "keep using it" invariant — but it is the rule most likely to be violated by a naive Opus refill, and is the hardest failure to diagnose, so it is called out as a hard invariant for both paths.
  • C7 — The Mix visualizer's data source is independent and must stay that way. The Phase 10/12 WebGL2 lava visualizer renders from a preprocessed high-res waveform datum fetched per-track (GET api/track/{entryKey}/waveform/high-res), not from live decoded PCM. Confirmed: evicting played AudioBuffers cannot starve the visualizer — it never read them. The window model is invisible to the visualizer. (This is the canonical 1 GB case and the case that proves the eviction is safe.)
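C6 has a direct analogue on the TypeScript side: a generation token that each new producer loop takes and checks before every push. This is a hypothetical illustration of the identity-swap idea behind _streamingCancellation, not the C# implementation. SingleWriterGate, DecoderSink and guardedPush are invented names.

```typescript
// Hypothetical single-writer gate: only the most recently started producer
// loop may feed the active decoder.
class SingleWriterGate {
  private generation = 0;

  // Called when a new loop starts (initial load, seek, refill).
  // Returns the new loop's token and invalidates every older token.
  begin(): number {
    this.generation += 1;
    return this.generation;
  }

  isCurrent(token: number): boolean {
    return token === this.generation;
  }
}

interface DecoderSink {
  push(chunk: Uint8Array): void;
}

// Pushes only if the caller still owns the decoder; a stale loop gets false
// and should exit instead of retrying.
function guardedPush(gate: SingleWriterGate, token: number, sink: DecoderSink, chunk: Uint8Array): boolean {
  if (!gate.isCurrent(token)) return false;
  sink.push(chunk);
  return true;
}
```

A token check alone is not enough for the stateful Opus decoder. A push that passed the check may still be in flight when a new loop starts. That is why the C# side also awaits the old task (DrainActiveStreamingTaskAsync) before any reset()+configure().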

3. Architectural shape

3.0 The mental model

A track's audio is a byte range [0, fileLength) on disk. At any moment the listener is at playback position P (seconds → byte offset via the active path's resolver — IFormatDecoder.calculateByteOffset for WAV/MP3/FLAC, resolveOpusByteOffset over the seek index for Opus). The player should hold decoded AudioBuffers only for a bounded window roughly [P - back, P + ahead] — and, on the Opus path, keep the upstream WebCodecs decode queue near-empty too (§3.1):

  • forward fill (ahead) — enough decoded lookahead that playback never starves (covers the existing 500 ms scheduler lookahead plus network jitter headroom);
  • back-retain (back) — a small amount of already-played audio kept so a short seek-back does not trigger a network refetch;
  • evict — anything older than P - back is dropped (AudioBuffer references released → GC reclaims the float data);
  • refill — when forward decoded lookahead drops below a low-water mark, fetch+decode more from the current byte position; when the window's tail is evicted and the listener seeks back past it, refetch that region via the Range primitive (the seek-beyond-buffer path, run backwards).

This is a ring/sliding-window buffer keyed on playback position, driven by high/low-water marks — the standard bounded-producer/bounded-consumer pattern, transplanted onto the decode→schedule seam.
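At any tick, the window geometry of this mental model reduces to two numbers: the eviction cutoff P - back, and the decoded lookahead compared against low-water. This is a minimal sketch under assumed names (WindowPolicy, WindowState); times are absolute track seconds. The high-water stop is left out because it is stateful: it needs hysteresis, so it belongs in whatever controller owns the water-marks.

```typescript
interface WindowPolicy {
  backSeconds: number;     // already-played audio retained for short seek-backs
  lowWaterSeconds: number; // refill once decoded lookahead falls below this
}

interface WindowState {
  positionSeconds: number;   // P: current playback position
  decodedEndSeconds: number; // end time of the newest decoded buffer
}

// Buffers ending at or before this time can be evicted.
function evictionCutoff(state: WindowState, policy: WindowPolicy): number {
  return Math.max(0, state.positionSeconds - policy.backSeconds);
}

// Decoded audio still ahead of the playhead.
function lookaheadSeconds(state: WindowState): number {
  return Math.max(0, state.decodedEndSeconds - state.positionSeconds);
}

// Refill trigger: forward lookahead has dropped below low-water.
function needsRefill(state: WindowState, policy: WindowPolicy): boolean {
  return lookaheadSeconds(state) < policy.lowWaterSeconds;
}
```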

3.1 Why refill is a generalization of seek-beyond-buffer, not a new mechanism — for both paths

The seek-beyond-buffer path already does every refill primitive the window needs, just triggered manually and one-shot. As of Phase 18 each primitive has a WAV branch and an Opus branch, both live:

| Window operation | WAV/MP3/FLAC machinery reused | Opus machinery reused (Phase 18, landed) |
|---|---|---|
| Discard buffers, keep offset | PlaybackScheduler.clearForSeek() + setPlaybackOffset() | Same (the scheduler is shared) |
| Fetch from a byte offset | TrackMediaClient → Range: bytes=X- → 206 | Same, with ?format=opus (the Range path is shared) |
| Map time → byte offset | StreamDecoder.calculateByteOffset() → IFormatDecoder | resolveOpusByteOffset(activeOpusSidecar, t) (index binary search → exact page) |
| Decode a header-less body | StreamDecoder.reinitializeForRangeContinuation(len) | OpusStreamDecoder.reinitializeForRangeContinuation(landingTime, target) (demux/codec reset + lead-trim) |
| Single-loop safety on refetch | _streamingCancellation swap + DrainActiveStreamingTaskAsync() | Same (the C# discipline is decoder-agnostic) |
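The WAV side of the "map time → byte offset" row is simple CBR arithmetic. This sketch is not the project's IFormatDecoder.calculateByteOffset. It assumes a canonical PCM WAV whose samples start at a known dataOffset: 44 bytes for the minimal header, though real files can carry extra chunks. The offset is rounded down to a whole sample frame, so a continuation body never starts mid-sample.

```typescript
interface WavFormat {
  dataOffset: number;    // byte offset of the first sample (44 for a minimal header)
  sampleRate: number;
  channels: number;
  bitsPerSample: number;
}

// CBR time -> byte mapping, aligned down to a whole sample frame.
function wavByteOffset(fmt: WavFormat, seconds: number): number {
  const blockAlign = fmt.channels * (fmt.bitsPerSample / 8); // bytes per frame
  const frame = Math.floor(Math.max(0, seconds) * fmt.sampleRate);
  return fmt.dataOffset + frame * blockAlign;
}

const cdQuality: WavFormat = { dataOffset: 44, sampleRate: 44100, channels: 2, bitsPerSample: 16 };
const offset = wavByteOffset(cdQuality, 1); // 44 + 176400 (byteRate = 44100 * 4)
const rangeHeader = `bytes=${offset}-`;    // open-ended continuation request
```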

The genuinely-new work, by path:

  • Shared (both paths): partial eviction on PlaybackScheduler (today it only ever clear()s wholesale), and a position-driven refill trigger (a continuous low-water loop, not a one-shot seek).
  • WAV path: back-pressure on the C# ReadAsync loop — stop reading the socket above the high-water mark, resume below low-water. WAV's StreamDecoder decodes synchronously into the scheduler, so the read loop is the only producer to throttle; pausing ReadAsync bounds it fully.
  • Opus path: the same C# back-pressure, plus bounding the WebCodecs decode-ahead. Throttling ReadAsync alone is not sufficient for Opus, because OpusStreamDecoder.push() is async and the WebCodecs AudioDecoder keeps its own internal work queue (decodeQueueSize) plus a decodedQueue: AudioData[] of decoded-but-not-yet-converted frames. The Opus producer must also stop feeding the decoder (stop demuxing/decoding new packets) when the scheduler is full, and resume below low-water — back-pressure on the demuxer/decoder feed, not only on the socket read. This is the one place the two paths' windowing genuinely diverges.

Everything else (the fetch, the offset resolution, the header-carry continuation, the single-loop cancellation safety) is reused verbatim on both paths. Phase 21 builds eviction + the refill trigger + the (per-path) back-pressure; it builds no new fetch, offset, or seek mechanism.
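The Opus path's extra back-pressure (stop feeding the decoder when the scheduler is full) can be sketched as a small gate between the demuxer and the decoder. GatedOpusFeed, PacketDecoder and the isSchedulerFull callback are all hypothetical. The point of the gate is that packets wait here in compressed form instead of becoming decoded AudioData behind a full scheduler.

```typescript
// Hypothetical stand-in for the WebCodecs decoder's input side.
interface PacketDecoder {
  decode(packet: Uint8Array): void;
}

// Hypothetical gate between the Ogg demuxer and the decoder. Demuxed packets
// wait here, still compressed, while the scheduler reports it is full.
class GatedOpusFeed {
  private readonly pending: Uint8Array[] = [];
  private readonly decoder: PacketDecoder;
  private readonly isSchedulerFull: () => boolean;

  constructor(decoder: PacketDecoder, isSchedulerFull: () => boolean) {
    this.decoder = decoder;
    this.isSchedulerFull = isSchedulerFull;
  }

  enqueue(packet: Uint8Array): void {
    this.pending.push(packet);
    this.pump();
  }

  // Feeds packets while the scheduler has room; call again on "drained".
  pump(): number {
    let fed = 0;
    while (this.pending.length > 0 && !this.isSchedulerFull()) {
      this.decoder.decode(this.pending.shift()!);
      fed += 1;
    }
    return fed;
  }

  get backlog(): number {
    return this.pending.length;
  }
}
```

In a real WebCodecs pipeline the scheduler's fill level lags decode(), because decoded output arrives asynchronously. A production hook would therefore probably also cap in-flight work; AudioDecoder exposes decodeQueueSize for this. The pending queue here stays bounded only because the C# read loop pauses as well.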

3.2 The three candidate directions

Per file convention the alternatives are recorded; the recommendation follows.

Direction A — Sliding window on the existing single forward stream (recommended). Keep the current model where the C# loop reads one forward HTTP stream and pumps chunks into the active JS decoder. Add three things: (1) PlaybackScheduler gains partial eviction: drop buffers whose absolute-time end is older than P - back, adjusting its index bookkeeping so getCurrentPosition() and scheduling stay correct against a buffer array that no longer starts at index 0 (shared by both paths, since the scheduler is the common sink); (2) back-pressure on the C# read loop: when forward decoded lookahead exceeds the high-water mark, the C# loop pauses reading the HTTP stream (stops calling ReadAsync) until playback drains it below low-water, then resumes; (3) for the Opus path only, back-pressure on the WebCodecs decode-ahead: the producer also stops demuxing/decoding new packets when the scheduler is full, so the AudioDecoder work queue and decodedQueue do not balloon behind a throttled socket. Memory is bounded by high-water + back-retain on both paths. Seek-back beyond the retained window falls through to the existing seek-beyond-buffer path (the right one per format) unchanged. Why recommended: smallest change to the load-bearing seam; reuses the live forward stream (no extra connections in the common case); eviction and back-pressure are the only genuinely new mechanisms, all local (the scheduler; the read loop; for Opus, the demux/decode feed). Back-pressure via "stop reading the socket" is exactly how TCP flow control already wants to behave: pausing ReadAsync lets the kernel window close, so we are not fighting the transport. The Opus decode-ahead bound is the one addition Phase 18 forces, and it is local to the Opus producer. Open question it raises (OQ6, new): whether the two paths' back-pressure is driven by one shared window controller that exposes a "scheduler full / drained" signal both producers poll, or by two parallel implementations sharing only the eviction code. Recommend the shared signal (see §6 OQ6).
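Item (1), partial eviction with index bookkeeping, is the shared piece. One way to keep indices valid after front eviction is to hand out absolute indices and track the absolute index of the first retained buffer. This is a sketch with hypothetical names (WindowedBufferStore, TimedBuffer), not PlaybackScheduler's actual internals. It assumes buffers are appended in time order.

```typescript
interface TimedBuffer {
  startSeconds: number;    // absolute track time of the first sample
  durationSeconds: number;
}

// Hypothetical windowed store: front eviction without shifting the absolute
// indices that scheduling code may be holding.
class WindowedBufferStore<T extends TimedBuffer> {
  private buffers: T[] = [];
  private baseIndex = 0; // absolute index of buffers[0]

  // Appends a buffer and returns its absolute index.
  add(buffer: T): number {
    this.buffers.push(buffer);
    return this.baseIndex + this.buffers.length - 1;
  }

  // Drops every leading buffer that ends at or before cutoffSeconds and
  // returns how many were evicted; their references are released for GC.
  evictBefore(cutoffSeconds: number): number {
    let count = 0;
    while (count < this.buffers.length) {
      const b = this.buffers[count];
      if (b.startSeconds + b.durationSeconds > cutoffSeconds) break;
      count += 1;
    }
    if (count > 0) {
      this.buffers = this.buffers.slice(count);
      this.baseIndex += count;
    }
    return count;
  }

  // Looks up by absolute index; evicted or future indices return undefined.
  get(absoluteIndex: number): T | undefined {
    const i = absoluteIndex - this.baseIndex;
    return i >= 0 && i < this.buffers.length ? this.buffers[i] : undefined;
  }

  get firstIndex(): number {
    return this.baseIndex;
  }
}
```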

Direction B — Discrete window segments, each its own Range fetch. Treat the file as fixed-size byte segments (e.g. 4 MB). Hold N decoded segments around P; fetch the next/previous segment via a fresh Range request as the window slides; discard the far segment. No live long-lived forward stream — every window is an independent 206. Why not (default): turns one connection into many short Range requests (more proxy hops through DeepDrftPublic, more server-side WavOffsetService-style header synthesis, more places a fetch can fail mid-stream — worsening the §1.6 error surface), and the byte↔time segment math must be exact at every boundary. It is the cleaner model for true random-access (and the better base if seeking-heavy usage dominates), so keep it as the fallback if Direction A's back-pressure proves leaky in practice. Borrowed prior art: HLS/DASH segment windows and the MSE SourceBuffer.remove() eviction model — this is how every production HTML5 adaptive player bounds memory. We are doing the hand-rolled equivalent because the stack is a bespoke Web Audio graph, not <media> + MSE.
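Direction B's window can be expressed in byte-segment indices. This sketch (hypothetical helper names) computes the segments to hold around the playback byte offset. It also derives the eviction set and the fetch set when the window slides. It covers only the byte side. The exact byte↔time alignment at every boundary, flagged above as the hard part, is out of scope.

```typescript
interface SegmentWindow {
  first: number; // first segment index to hold (inclusive)
  last: number;  // last segment index to hold (inclusive)
}

// Segments to hold around the one containing the playback byte offset.
function segmentWindow(
  playbackByte: number,
  fileLength: number,
  segmentSize: number,
  behind: number,
  ahead: number,
): SegmentWindow {
  const lastSegment = Math.max(0, Math.ceil(fileLength / segmentSize) - 1);
  const current = Math.min(Math.floor(playbackByte / segmentSize), lastSegment);
  return {
    first: Math.max(0, current - behind),
    last: Math.min(lastSegment, current + ahead),
  };
}

// Held segments that fall outside the window are evicted.
function segmentsToEvict(held: readonly number[], win: SegmentWindow): number[] {
  return held.filter((i) => i < win.first || i > win.last);
}

// Window segments not yet held each need their own bounded Range fetch.
function segmentsToFetch(held: readonly number[], win: SegmentWindow): number[] {
  const missing: number[] = [];
  for (let i = win.first; i <= win.last; i++) {
    if (held.indexOf(i) === -1) missing.push(i);
  }
  return missing;
}
```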

Direction C — Adopt MediaSource Extensions (MSE) and let the browser manage the buffer. Stop hand-rolling the decode→schedule graph for long tracks; feed the Range stream into a SourceBuffer and let the browser evict via its built-in quota + remove(). Memory management becomes the platform's problem. Why not — RESOLVED, rejected (Daniel, 2026-06-23; see OQ5): MSE does not accept raw WAV/PCM — it wants containerized formats (fragmented MP4/WebM, or MP3/AAC elementary streams). The entire bespoke visualizer/spectrum graph is wired to the Web Audio AudioContext, not a <media> element. Adopting MSE is a rewrite of the playback substrate, not a windowing change. It looked like the real long-term answer once compressed delivery arrived — but compressed delivery (Phase 18 Opus, now landed) feeds the same bespoke graph via the WebCodecs IStreamingDecoder seam (parallel to the WAV IFormatDecoder seam, both terminating at the shared PlaybackScheduler), so the compressed-delivery move that would have justified MSE happened without surrendering the graph. Notably, Phase 18 chose a WebCodecs AudioDecoder for Opus rather than decodeAudioData — which is itself the "use the platform codec, keep the bespoke graph" move, but at the decoder granularity, not the media-element granularity MSE would impose. The bespoke graph is a deliberate long-term commitment; MSE is rejected. Direction A is therefore the permanent destination, not a stopgap that MSE will retire. Recorded as considered-and-declined.

Direction A is the smallest coherent change that hits the headline (bounded memory under a 1 GB stream) while honoring C1 through C7. It keeps the live forward stream, reuses each path's seek-beyond-buffer machinery for the only genuinely random-access case (seek-back past the retained tail), and isolates the new mechanisms (eviction shared; back-pressure per path). The final architecture and the exact eviction/back-pressure API are staff-engineer's call at implementation (per file convention); this spec fixes the shape and the invariants, not the method signatures.

3.4 SOLID / road-not-taken rationale

  • SRP, preserved. Eviction is a PlaybackScheduler concern (it already owns buffer storage, and is the single shared sink both decode paths feed); refill orchestration is a player-service concern (it already owns the C# fetch loop and the seek dispatch); byte↔time math stays where each path already keeps it — IFormatDecoder.calculateByteOffset for WAV/MP3/FLAC, resolveOpusByteOffset (over OpusSeekData) for Opus. No responsibility crosses a boundary it does not already own.
  • OCP, via the shared sink + the live per-path seek. Eviction added at the scheduler changes zero decoder code on either path. Refill reuses each path's already-implemented offset resolver — Phase 21 adds no offset math to either seam. The one place windowing is not purely additive is the Opus decode-ahead bound (§3.1), which lives inside the Opus producer, not in the shared layer.
  • The seam stays single-writer (C6) — for both decoders. Every new refetch routes through the existing C# cancellation/drain discipline, so "only one loop feeds the active decoder" remains true for the WAV StreamDecoder and the stateful Opus AudioDecoder alike. This is the rule most likely to be violated by a naive Opus refill (a stale push() racing a reset()+configure()), and is called out as a hard invariant.
  • Road not taken — eager full decode with a memory cap that just stops decoding. Tempting (decode until you hit a byte budget, then stop) but it breaks playback of long tracks past the cap entirely — it bounds memory by refusing to play the rest, not by sliding. Rejected: it is a degradation, not a feature.

4. Use cases

  • UC1 — Play a 1 GB+ DJ MIX start to finish (the headline). Memory stays bounded throughout; the listener experiences continuous playback identical to a short track. Holds in both formats — the lossless WAV mix (~2 GB decoded if unbounded) and the low-data Opus mix (small transfer, but the same ~2 GB decoded float once played, so it needs windowing just as much; see §1).
  • UC1-Opus — The same mix streamed as Opus, windowed. The low-data win (Phase 18) shrinks the transfer; Phase 21 shrinks the decoded footprint. The two compound: a metered-connection listener on Opus gets both the small download and the bounded memory. Windowing the Opus path additionally bounds the WebCodecs decode-ahead and decodedQueue, not only the scheduler (§3.1).
  • UC2 — Seek forward within a long track. Already handled by seek-beyond-buffer (the right resolver per format — IFormatDecoder for WAV, the live resolveOpusByteOffset for Opus); under windowing the forward seek clears the window and refills at the target — no behavior change, now with eviction so the pre-seek region does not linger.
  • UC3 — Seek back a few seconds. Served from the back-retain window with no network refetch (the reason back exists).
  • UC4 — Seek back far, past the evicted tail. Falls through to the existing seek-beyond-buffer Range fetch, run toward an earlier offset. (Open question OQ2 — see §6.)
  • UC5 — Pause a long track for a long time. Memory stays at the bounded window size while paused (no continued decode). On resume, forward fill restarts from the low-water trigger.
  • UC6 — Mix detail page with the lava visualizer running. Visualizer reads its preprocessed datum (C7); windowing is invisible to it. Confirmed non-interaction.
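UC2 through UC4 reduce to one routing decision: is the seek target inside what is still decoded and retained? This is a minimal sketch with hypothetical names. retainedStartSeconds is the start of the back-retain window and decodedEndSeconds is the end of decoded lookahead.

```typescript
type SeekRoute =
  | { kind: "in-window" }                            // served from retained buffers, no network
  | { kind: "range-refill"; targetSeconds: number }; // clear, then seek-beyond-buffer refetch

// A seek inside [retainedStart, decodedEnd) needs no fetch. Anything earlier
// (evicted tail, UC4) or later (not yet decoded, UC2) goes to the Range path.
function routeSeek(targetSeconds: number, retainedStartSeconds: number, decodedEndSeconds: number): SeekRoute {
  if (targetSeconds >= retainedStartSeconds && targetSeconds < decodedEndSeconds) {
    return { kind: "in-window" };
  }
  return { kind: "range-refill", targetSeconds };
}
```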

5. Interaction with the deferred Phase 1 streaming features

This phase touches the same decoder/scheduler seam as the deferred Phase 1.3/1.4/1.5 items and the 1.6/1.7 robustness items. The interactions, explicitly:

  • 1.3 Preload / prefetch (deferred; preload half). Shares machinery, does not conflict — and should be sequenced after. Preload stages the next track into a second decoder instance during the current track's tail; windowing bounds the current track's forward buffer. They are orthogonal axes (next-track vs. current-track-window), but they compound the memory question: a naive preload of a second 1 GB mix would reintroduce the OOM this phase fixes. Recommendation: land windowing first, so that when preload arrives, the staged next-track decoder is also windowed by construction (it inherits the bounded scheduler). Windowing makes preload safe for long tracks; without it, preload of mixes is a memory hazard.
  • 1.4 Crossfade (deferred). Needs two simultaneous PlaybackScheduler instances briefly overlapping. Both would be windowed instances — the overlap doubles the window size momentarily, not the whole track. Windowing makes crossfade between two long mixes affordable. No reordering needed; 1.4 still gates on 1.3.
  • 1.5 Gapless (deferred). Sample-accurate hand-off of the next track's first buffer at the current track's last buffer. Windowing changes which buffers are retained but not the hand-off mechanism; the only care point is that the current track's final window must not be evicted before the gapless boundary is scheduled. A minor invariant for whoever builds 1.5, not a blocker. Phase 18 note: the former "1.5 is WAV-only" caveat is superseded — Opus is live, and it has its own encoder pre-skip/priming (handled once by the WebCodecs decoder, see OpusStreamDecoder.ts), so a gapless Opus hand-off must respect the end-trim against the sidecar's authoritative total length. That is 1.5's problem to absorb, not Phase 21's; flagged so 1.5 inherits it.
  • 1.6 Track-skip on error (deferred). Windowing enlarges the error surface; call this out. Today a fetch failure happens at load (one fetch) or at a user seek (one fetch). Windowed refill issues mid-stream fetches the listener did not initiate, and one of those can fail at byte 700 M of a 1 GB mix. So Phase 21 should ship with at least the cheap half of 1.6: a mid-stream refill failure must surface a clear error and not wedge the player. It must not leave playback "running" with a starved scheduler; mirror the playFromPosition end-of-buffer recovery already in PlaybackScheduler. The rich half (byte-scan to next valid frame) stays deferred. Recommendation: fold the minimal refill-failure handling into Phase 21's acceptance criteria (AC6) rather than leaving it entirely to 1.6, because this phase creates it.
  • 1.7 Safari compatibility (deferred). Windowing adds no new Safari-specific surface beyond what the streaming path already has. Two adjacencies, both Phase-18-introduced: (a) more frequent AudioContext activity during refill should be checked against older-Safari webkitAudioContext quirks; (b) the Opus path depends on WebCodecs AudioDecoder, whose Safari availability is narrower than decodeAudioData Ogg-Opus support — Phase 18's capability gate already falls a non-WebCodecs browser back to the lossless WAV path, so a Safari that can't run the Opus pipeline windows the WAV path (which has no decode-ahead locus, only the scheduler), i.e. the simpler windowing case. Note it; do not block on it.
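The cheap half of 1.6 can be framed as a state transition that never leaves the player "playing" on a starved scheduler. This sketch is hypothetical. The PlayerStatus shape and the bounded retry count are assumptions; the spec only requires a clear error, not retries, and maxRetries = 0 gives exactly that.

```typescript
type PlayerStatus =
  | { kind: "playing" }
  | { kind: "refilling"; failures: number }
  | { kind: "error"; message: string };

// Transition on a failed mid-stream refill: retry up to maxRetries times,
// then stop in an explicit error state instead of "playing" with a starved
// scheduler. An existing error state is left as-is.
function onRefillFailed(status: PlayerStatus, maxRetries: number, reason: string): PlayerStatus {
  if (status.kind === "error") return status;
  const failures = status.kind === "refilling" ? status.failures + 1 : 1;
  if (failures > maxRetries) {
    return { kind: "error", message: `Stream refill failed after ${failures} attempt(s): ${reason}` };
  }
  return { kind: "refilling", failures };
}
```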

6. Open questions for Daniel (genuine product decisions, not implementation detail)

These are policy calls with user-visible or resource trade-offs — flagged rather than decided here.

  • OQ1 — Window size policy. What bounds the window — a fixed byte/time budget (e.g. "hold at most ~30 s decoded ahead + ~10 s behind"), or a configurable memory budget (e.g. "≤ N MB of decoded PCM") that derives the time window from the stream's byte rate? Recommend a time-based forward window + small time-based back-retain as the primary knob (intuitive, format-portable), with a hard memory ceiling as a secondary guard. The exact numbers are tunable post-landing; Daniel picks the policy axis. [Daniel decision]
  • OQ2 — Seek-back past the evicted window. When the listener seeks back earlier than the retained tail, we must refetch (the audio is gone). Acceptable to take the same brief re-buffer the forward seek-beyond-buffer takes today? (Recommend yes — it is the symmetric case and listeners already accept it forward.) Or should back-retain be generous enough that this is rare? [Daniel decision]
  • OQ3 — Configurable total in-flight memory cap. Should there be a single hard byte ceiling on total decoded audio held by the player (a safety net independent of the window-size policy), exposed as a config value? Recommend yes, as a guard rail even if the window policy is time-based — it is the backstop that makes "1 GB stream never OOMs" a guarantee rather than a tuning hope. [Daniel decision]
  • OQ4 — Apply windowing to all tracks, or only long ones? A 3-minute Cut decoded whole is ~30–60 MB, which is harmless today. Windowing everything is simpler (one code path) but adds refill machinery to short tracks that never needed it. Recommend windowing everything (one path, C6-safe, and short tracks simply never hit a refill because they fit inside the forward window), but Daniel may prefer a size threshold. [Daniel decision]
  • OQ5 — Is MSE (Direction C) the real destination? — RESOLVED: NO (Daniel, 2026-06-23). Do not adopt MSE. The bespoke Web Audio decode→schedule graph stays — it is bespoke by deliberate choice, a long-term commitment, not a stopgap. Daniel's rationale: the player is intentionally a custom graph, not an HTML <media> element; the compressed-delivery move that would have made MSE tempting was met instead by Phase 18 (Opus low-data path, now landed) feeding the same bespoke graph through the WebCodecs IStreamingDecoder seam (parallel to the WAV IFormatDecoder seam) — so compressed delivery arrived without surrendering the graph. Consequence for this phase: Direction A (the hand-rolled sliding window) is the destination, not a placeholder; invest in it as permanent machinery. It windows both the WAV and the Opus path (the header note). Direction C is recorded as considered and declined per file convention; kept visible so a future reader sees the road not taken and why. [RESOLVED — bespoke graph retained; MSE rejected]
  • OQ6 — One window controller for both decode paths, or two? (NEW; raised by the Phase 18 two-path reality.) Eviction is unambiguously shared (the scheduler is the one sink). Back-pressure is not: the WAV path throttles the C# ReadAsync loop; the Opus path must also throttle the WebCodecs decode-ahead (§3.1). There are two options. The first is one window controller exposing a uniform "scheduler full / drained" signal that both producers honor in their own way. It is recommended because it keeps the policy (window sizes, water-marks, OQ1/OQ3) in one place, with two thin per-path back-pressure hooks. The second is two parallel windowing implementations sharing only the eviction code; this is simpler per path, but duplicates the water-mark logic and risks the two drifting. Recommend the shared controller + per-path hook. This is more an architecture call than a product call; it is flagged for staff-engineer at implementation, with the recommendation as the default. [staff-engineer call; recommendation: shared controller]
  • OQ7 — How does the Opus WebCodecs decode-ahead bound interact with scheduler eviction? (NEW; technical, for staff-engineer.) The Opus producer has two queues to bound, in addition to the shared scheduler: the AudioDecoder work queue and decodedQueue: AudioData[]. The clean rule is "stop feeding the decoder when decoded-lookahead-in-the-scheduler exceeds high-water." Under this rule the scheduler's fill level is the single back-pressure signal, and the upstream Opus queues stay near-empty simply because nothing is demuxed ahead. The alternative is to let the decoder run ahead into decodedQueue and bound that separately, which adds a second budget to tune and a second eviction point. Recommend the former: one fill signal (scheduler decoded-lookahead) driving both the read-loop pause and the demux/decode pause. At implementation, confirm that the WebCodecs decoder tolerates being starved of input mid-stream and resumes cleanly. It should, since it is fed packet-by-packet via decode(). Also confirm that decodedQueue is drained promptly so it never holds more than one push() worth. [staff-engineer call; recommendation: single scheduler-fill signal]
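The OQ6 recommendation (one shared controller with thin per-path hooks) and the OQ7 rule (one scheduler-fill signal) can be combined into a single hysteresis gate that fans out to each throttle site. What follows is a minimal TypeScript sketch, not the shipped code. `WindowController`, `BackPressureHook`, and the 30 s / 10 s water-marks are hypothetical placeholders, since OQ1/OQ3 own the real sizes. The as-built header names `PlaybackScheduler.evaluateProductionPause()` as the real gate, and its signature is not reproduced here. In the real system the read-loop hook would cross the JS/C# interop boundary; the sketch assumes it can be modeled as a plain callback.

```typescript
// Hypothetical sketch of the OQ6/OQ7 shared window controller. WindowController,
// BackPressureHook and the 30 s / 10 s marks are illustrative, not shipped names.

/** One per-path throttle site (e.g. 21.2a read loop, 21.2b Opus demux/decode feed). */
interface BackPressureHook {
  readonly name: string;
  pause(): void;
  resume(): void;
}

/**
 * One fill signal (seconds of decoded lookahead in the scheduler) gated by
 * high/low water-marks and fanned out to every registered per-path hook.
 */
class WindowController {
  private readonly highWaterSec: number;
  private readonly lowWaterSec: number;
  private readonly hooks: BackPressureHook[] = [];
  private paused = false;

  constructor(highWaterSec: number, lowWaterSec: number) {
    if (lowWaterSec >= highWaterSec) {
      throw new Error("lowWaterSec must be below highWaterSec");
    }
    this.highWaterSec = highWaterSec;
    this.lowWaterSec = lowWaterSec;
  }

  register(hook: BackPressureHook): void {
    this.hooks.push(hook);
    if (this.paused) hook.pause(); // a late-registered hook joins the current state
  }

  /** Call whenever the scheduler's decoded lookahead changes (buffer added or played out). */
  update(lookaheadSec: number): void {
    if (!this.paused && lookaheadSec >= this.highWaterSec) {
      this.paused = true;
      for (const h of this.hooks) h.pause();
    } else if (this.paused && lookaheadSec <= this.lowWaterSec) {
      this.paused = false;
      for (const h of this.hooks) h.resume();
    }
    // Between the two marks the state is left unchanged (hysteresis), so a
    // lookahead hovering near one threshold does not flap pause/resume.
  }

  get isPaused(): boolean {
    return this.paused;
  }
}

// Usage: both producers honor the same signal through their own hook.
const events: string[] = [];
const makeHook = (name: string): BackPressureHook => ({
  name,
  pause: () => { events.push(`${name}:pause`); },
  resume: () => { events.push(`${name}:resume`); },
});

const controller = new WindowController(30, 10);
controller.register(makeHook("readLoop"));  // 21.2a site
controller.register(makeHook("opusDemux")); // 21.2b site

controller.update(20); // below high-water: no change
controller.update(31); // crosses high-water: both hooks pause
controller.update(15); // between the marks: stays paused
controller.update(9);  // below low-water: both hooks resume
```

With this shape, the policy (the two marks) lives in exactly one place. Each path contributes only the act of stopping and restarting its own producer, which is the drift risk OQ6 is trying to avoid.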

7. Acceptance criteria

  • AC1 (headline) — Bounded memory under a 1 GB stream, in BOTH formats. Play a 1 GB+ mix start to finish, as lossless WAV and as low-data Opus. The browser tab's retained decoded-audio memory must stay bounded to the configured window rather than growing toward ~2 GB (a 16-bit WAV decodes to 32-bit float PCM, roughly doubling in size). Verifiable via browser memory tooling: peak decoded-audio footprint is independent of track length and tracks the window-size policy, not the file size. The Opus case must be verified explicitly. Its small transfer does not imply a small decoded footprint (§1), so "Opus already streams small" is not sufficient.
  • AC1-Opus — The Opus upstream decode-ahead is bounded too (§3.1 / OQ7). Under a long Opus stream, the WebCodecs decode queue and decodedQueue do not grow unboundedly behind the scheduler — back-pressure reaches the demux/decode feed, not only the scheduler. Verifiable: the upstream queues stay near-empty (one push() worth) regardless of stream length.
  • AC2 — Playback-start latency at parity (C2). First-audio latency for a track is unchanged from pre-windowing (within noise). Windowing does not introduce a fetch-then-play stall.
  • AC3 — Continuous playback, no starvation. A long mix plays edge to edge with no audible gaps, underruns, or stalls under normal network conditions — the forward fill stays ahead of the playhead.
  • AC4 — Seek-back within the window is instant (UC3). A short backward seek into retained audio produces no network request.
  • AC5 — Seek (forward, and back past the window) still works (UC2/UC4). Both resolve via the existing Range path with the same behavior the listener sees today; the pre-seek region is evicted, not retained.
  • AC6 — A mid-stream refill failure degrades cleanly (the 1.6 adjacency). A failed refill fetch surfaces a clear user-visible error and leaves the player in a recoverable state (not a wedged "playing" with a starved scheduler). It must not silently hang.
  • AC7 — The Mix visualizer is unaffected (C7). With the lava visualizer running on a long mix, the visualizer renders identically (it reads the preprocessed datum, never the evicted buffers).
  • AC8 — Single-writer decoder concurrency invariant holds (C6) — both decoders. Under rapid seek + refill activity, no interleaved ProcessStreamingChunk / push calls corrupt the active decoder — the existing drain/cancel discipline still governs every fetch. For Opus this is stricter: no stale push() may land against the WebCodecs AudioDecoder across a reinitializeForRangeContinuation reset+reconfigure (which would corrupt inter-frame state, not just a buffer). Verify under a rapid seek-storm on an Opus mix specifically.
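The "~2 GB" figure and the Opus caveat in AC1 both follow from how decoded audio is stored. Here is a minimal TypeScript sketch of the arithmetic. It assumes a 44.1 kHz stereo 16-bit WAV source, 48 kHz stereo decoded output for Opus, and a hypothetical 120 s retained window (the real window is the OQ1/OQ3 policy). `decodedBytes` and `wavDurationSec` are illustrative helpers, not code from the repository.

```typescript
// Hypothetical helpers illustrating the AC1 arithmetic. Web Audio AudioBuffers
// store 32-bit float samples, so decoded size depends on duration, sample rate
// and channel count only, not on the codec or the bytes transferred.
const BYTES_PER_FLOAT32_SAMPLE = 4;
const BYTES_PER_INT16_SAMPLE = 2;
const GB = 1_000_000_000;

function decodedBytes(durationSec: number, sampleRate: number, channels: number): number {
  return durationSec * sampleRate * channels * BYTES_PER_FLOAT32_SAMPLE;
}

/** Duration of a 16-bit PCM WAV data chunk of the given size (header ignored). */
function wavDurationSec(dataBytes: number, sampleRate: number, channels: number): number {
  return dataBytes / (sampleRate * channels * BYTES_PER_INT16_SAMPLE);
}

// A 1 GB, 44.1 kHz, stereo, 16-bit WAV mix runs about 94.5 minutes.
const mixSec = wavDurationSec(1 * GB, 44_100, 2);

// Decoded in full at 44.1 kHz it needs about 2 GB: the "~2 GB" in AC1.
const fullWavDecode = decodedBytes(mixSec, 44_100, 2);

// The same 94.5 minutes as Opus transfers far less, but decoded to 48 kHz
// stereo PCM it needs about 2.18 GB: the small-transfer trap AC1 warns about.
const fullOpusDecode = decodedBytes(mixSec, 48_000, 2);

// A bounded window (placeholder: 120 s retained in total) is about 46 MB at
// 48 kHz stereo, whatever the track length.
const windowDecode = decodedBytes(120, 48_000, 2);
```

For AC1 verification, the measured footprint should sit near the window figure and stay flat as track length grows, in both formats.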

8. Wave decomposition

Decomposition choice: split by concern (eviction → back-pressure → seek-back refill → validate), not by path (WAV-track vs Opus-track). Rationale: the eviction concern (21.1) is genuinely shared — the scheduler is the one sink both paths feed — so a path-split would duplicate the hardest correctness work or arbitrarily assign it to one track. The concern spine keeps that shared work as a single cold-start wave and lets the one genuinely path-divergent concern (back-pressure, 21.2) carry an explicit two-track split inside the wave rather than fracturing the whole phase. This also matches how the seek-back refill (21.3) reuses each path's already-live seek — it is one concern (window-miss → refetch) with a per-path resolver underneath, not two features. The spine is unchanged from the original spec; the mechanisms inside 21.2 and 21.3 are made correct for both paths.

Dependency shape: 21.1 → 21.2 → 21.3, with 21.4 validating the whole. 21.1 is the cold-start prerequisite and the load-bearing change; the rest layer on it.

  • 21.1 — Partial eviction in PlaybackScheduler (cold-start; the load-bearing change; SHARED by both paths). Give the scheduler the ability to drop already-played buffers and keep its position/index bookkeeping correct against a buffer array that no longer begins at absolute time 0 (today getCurrentPosition, playFromPosition, and the scheduling loop all assume buffers[0] is the track start). This is the hardest correctness work in the phase — the time-anchor math must stay exact through eviction. Because both decode paths feed the scheduler identically via addBuffer, eviction is written once and serves both — no per-path branch. No refill yet; with eviction alone and the forward producers unchanged, this is provably memory-bounded for the played region on both paths. Independent of the §6 open questions — it can begin immediately; the window sizes (OQ1/OQ3) are parameters fed in later. Settled and cold-start.
  • 21.2 — Back-pressure (the bound on the unplayed region) — two tracks, one signal. Bound the not-yet-played decoded audio by stopping production above a high-water mark and resuming below low-water, driven by the scheduler's decoded-lookahead fill (OQ7). The fill signal is shared; the throttle has two sites because Phase 18 gave the two paths different producers:
    • 21.2a — C# read-loop back-pressure (serves both paths). Make StreamAudioWithEarlyPlayback stop calling ReadAsync above high-water and resume below low-water. Route pause/resume through the existing cancellation-safe single-loop discipline (C6). For the WAV path this is sufficient, because its StreamDecoder decodes synchronously into the scheduler.
    • 21.2b — Opus decode-ahead back-pressure (Opus path only). Additionally stop demuxing/decoding new packets when the same fill signal is over high-water, so the WebCodecs decode queue and decodedQueue do not balloon behind a throttled socket (§3.1, OQ7). This is the one mechanism with no WAV analogue. Confirm the WebCodecs decoder resumes cleanly after being starved of input mid-stream. Together with 21.1 this bounds both the played and unplayed sides on both formats — the full memory guarantee (AC1 + AC1-Opus). Depends on 21.1 (eviction must exist so the drained region is reclaimed, not merely un-read). Per OQ6, 21.2a and 21.2b ideally share one window controller exposing the fill signal; the recommendation is the shared controller + two thin hooks.
  • 21.3 — Seek-back-past-window refill (close the random-access case; one concern, per-path resolver). Wire UC4 — when a backward seek lands earlier than the retained tail, refetch via the existing seek-beyond-buffer path pointed at the earlier offset, using whichever resolver the active path already ships (IFormatDecoder/StreamDecoder.calculateByteOffset for WAV; the live resolveOpusByteOffset + OpusStreamDecoder.reinitializeForRangeContinuation for Opus) — plus the minimal AC6 refill-failure handling. Mostly reuse of the landed seek paths; the new work is the trigger (window-miss detection) and the clean-failure path, both format-agnostic. Depends on 21.1 + 21.2 (needs the window boundaries they define).
  • 21.4 — Validation pass against the 1 GB target, BOTH formats (acceptance). Exercise AC1–AC8 against a real 1 GB+ mix streamed as WAV and as Opus: memory profiling (AC1 both formats + AC1-Opus upstream queues), latency parity (AC2), edge-to-edge playback (AC3), the seek matrix (AC4/AC5), induced refill failure (AC6), visualizer running (AC7), and rapid-seek concurrency (AC8, including the Opus seek-storm). Largely test/measurement; any break is likely a tuning fix in the 21.1 anchor math, the 21.2 water-marks, or the 21.2b Opus decode-ahead bound. Depends on 21.1–21.3.
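Two pieces of work hang off a single base-time anchor. One is the bookkeeping 21.1 asks for: positions that stay exact once buffers[0] is no longer the track start. The other is the window-miss trigger 21.3 needs. The TypeScript sketch that follows is hypothetical. `RetainedWindow`, `DecodedChunk`, `evictBefore`, and `locate` are illustrative names, and chunk durations stand in for `AudioBuffer.duration`. The keep-behind amount is a placeholder for the OQ1/OQ3 policy. The sketch covers only the buffer-index/time mapping, not the AudioContext-time anchors the real scheduler also keeps for already-scheduled sources.

```typescript
// Hypothetical sketch of 21.1 eviction + the 21.3 window-miss test. Names are
// illustrative; the real PlaybackScheduler stores Web Audio AudioBuffers.

/** Minimal stand-in for an AudioBuffer: only the duration matters here. */
interface DecodedChunk {
  readonly durationSec: number;
}

class RetainedWindow {
  private readonly chunks: DecodedChunk[] = [];
  // Absolute track time at which chunks[0] begins: 0 (or a seek landing time)
  // before any eviction; each eviction advances it so positions stay exact.
  private baseTimeSec: number;

  constructor(startTimeSec = 0) {
    this.baseTimeSec = startTimeSec;
  }

  add(chunk: DecodedChunk): void {
    this.chunks.push(chunk);
  }

  /** Absolute track time of the earliest retained sample. */
  get retainedStartSec(): number {
    return this.baseTimeSec;
  }

  /** Absolute track time just past the last retained sample. */
  get retainedEndSec(): number {
    return this.chunks.reduce((t, c) => t + c.durationSec, this.baseTimeSec);
  }

  /**
   * Drop whole chunks that end at or before (playheadSec - keepBehindSec).
   * With keepBehindSec >= 0 the chunk containing the playhead is never dropped.
   */
  evictBefore(playheadSec: number, keepBehindSec: number): number {
    const cutoff = playheadSec - keepBehindSec;
    let dropped = 0;
    while (this.chunks.length > 0 && this.baseTimeSec + this.chunks[0].durationSec <= cutoff) {
      this.baseTimeSec += this.chunks[0].durationSec;
      this.chunks.shift();
      dropped++;
    }
    return dropped;
  }

  /** Map absolute track time to (chunk index, offset in chunk), or null on a miss. */
  locate(timeSec: number): { index: number; offsetSec: number } | null {
    if (timeSec < this.baseTimeSec) return null; // seek-back past the window (UC4): refill
    let t = this.baseTimeSec;
    for (let i = 0; i < this.chunks.length; i++) {
      const d = this.chunks[i].durationSec;
      if (timeSec < t + d) return { index: i, offsetSec: timeSec - t };
      t += d;
    }
    return null; // ahead of decoded audio: the existing seek-beyond-buffer path
  }
}

// Usage: ten 2 s chunks cover 0..20 s; keep 4 s behind a playhead at 11 s.
const win = new RetainedWindow();
for (let i = 0; i < 10; i++) win.add({ durationSec: 2 });
const dropped = win.evictBefore(11, 4); // chunks ending at or before 7 s go
const hit = win.locate(9.5);            // inside the retained window
const miss = win.locate(5);             // evicted region: triggers the 21.3 refill
```

Evicting whole chunks and advancing one anchor keeps the time math additive and exact, which is the property 21.1 calls load-bearing. The same anchor then doubles as the window-miss test, so 21.3 needs no separate bookkeeping.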

9. Cross-references (read before implementing)

  • Root CLAUDE.md "Streaming-first audio playback" / CONTEXT.md §3.5 — the seam this phase modifies; the §2 invariants here restate its contract. Both flag it as the most load-bearing path.
  • COMPLETED.md Phase 18 — Opus Low-Data Streaming (landed 2026-06-23) — read this first. The "as-built divergence" note records why Opus uses a WebCodecs AudioDecoder streaming pipeline (IStreamingDecoder) rather than the spec'd-and-replaced per-segment decodeAudioData/IFormatDecoder model. This is the two-path reality this phase reconciles to. product-notes/phase-18-opus-low-data-streaming.md is the design memo (note: its §3.4 OpusFormatDecoder framing predates the WebCodecs divergence — the seek-index/sidecar design in §3.4a is accurate and landed; the decoder-shape discussion was superseded by IStreamingDecoder).
  • PLAN.md Phase 4 (landed) / COMPLETED.md — the HTTP Range bytes=X- primitive this generalizes (now serving both ?format=lossless and ?format=opus).
  • PLAN.md Phase 1.3 / 1.4 / 1.5 / 1.6 / 1.7 — the deferred decoder/scheduler-seam features; §5 above reconciles each (1.5 and 1.7 updated for the Opus path).
  • PLAN.md Phase 9 — defines the Mix medium (single long track), the canonical 1 GB case.
  • PLAN.md Phase 10 / product-notes/phase-10-mix-visualizer-lava-reframe.md / product-notes/phase-12-waveform-visualizer-generalization.md — establishes the preprocessed per-track high-res waveform datum; the basis for C7 (visualizer does not read live PCM).
  • DeepDrftPublic/Interop/audio/PlaybackScheduler.ts — owns the unbounded buffers: AudioBuffer[], the shared sink for both decode paths; 21.1 (eviction) lives here.
  • DeepDrftPublic/Interop/audio/AudioPlayer.ts — the dispatch: processFormatChunk (WAV/MP3/FLAC) vs processOpusChunk (Opus), both calling scheduler.addBuffer; seekBeyondBuffer/reinitializeFromOffset branch per path; the place the refill trigger (21.3) and the fill-signal wiring (21.2) hook.
  • DeepDrftPublic/Interop/audio/StreamDecoder.ts + IFormatDecoder.ts — the WAV/MP3/FLAC refill substrate (reinitializeForRangeContinuation, calculateByteOffset).
  • DeepDrftPublic/Interop/audio/IStreamingDecoder.ts + OpusStreamDecoder.ts + OggDemuxer.ts + OpusSidecar.ts — the Opus path: the WebCodecs decode pipeline, the decodeQueueSize/decodedQueue upstream accumulation 21.2b must bound, and the live resolveOpusByteOffset / reinitializeForRangeContinuation(landingTime, target) seek 21.3 reuses. IStreamingDecoder.ts is the seam the Opus windowing hooks into (push/complete/reinitialize lifecycle).
  • DeepDrftPublic.Client/Services/StreamingAudioPlayerService.cs — the C# forward read loop (StreamAudioWithEarlyPlayback, feeding both decoders), the seek-beyond-buffer path (SeekBeyondBuffer), and the cancellation/drain discipline (C6); 21.2a/21.3 live here.
  • DeepDrftPublic.Client/Clients/TrackMediaClient.cs — the Range-capable media fetch (with the ?format= param) reused by refill on both paths.