deepdrft/product-notes/phase-21-windowed-streaming-buffer.md
daniel-c-harvey ccf7d3dbe3 docs: reconcile Phase 21 spec with as-built Phase 18 (two decode paths)
Window both the WAV StreamDecoder and Opus WebCodecs paths feeding one PlaybackScheduler — shared eviction, per-path back-pressure; reuse the now-live index-driven Opus seek for refill. Drops stale approximate-seek language; adds OQ6/OQ7.
2026-06-23 22:01:49 -04:00

# Phase 21 — Windowed Streaming Buffer (bounded client memory for long streams)
Product spec. Status: **design / framing — reconciled to as-built Phase 18 (two decode paths);
implementation-ready pending Daniel's open-question calls (OQ1–OQ4 product; OQ6–OQ7 staff-engineer
architecture).** Author: product-designer. Date: 2026-06-23 (reconciliation pass after Phase 18 landed).
**No code has been written by this doc.**
Surface: **public listener site only** (`DeepDrftPublic.Client` player stack + `DeepDrftPublic`
TypeScript audio interop). No CMS (`DeepDrftManager`) change. No data-model or schema change. The one
server touch is **reuse, not new surface**: the existing `DeepDrftAPI` HTTP `Range: bytes=X-`
partial-content primitive (Phase 4, landed) is the load-bearing dependency; this phase adds no new API
endpoint.
> **Phase 18 (Opus Low-Data Streaming) has LANDED (2026-06-23, `COMPLETED.md`). This spec is reconciled
> to the as-built reality.** Phase 18 changed the landscape in two ways that reshape this phase:
>
> 1. **There are now TWO decode paths feeding the one `PlaybackScheduler`, not one.** (a) The original
> **WAV/MP3/FLAC** path — `StreamDecoder` → `IFormatDecoder` (wrap-each-segment + `decodeAudioData`).
> (b) A new **Opus** path — `OggDemuxer` → `OpusStreamDecoder` (the `IStreamingDecoder` seam, a stateful
> **WebCodecs `AudioDecoder`** pipeline). The §3.1 unbounded-memory root cause (the scheduler's
> push-only `AudioBuffer[]`) applies to **both** — but the Opus path adds a *second* accumulation locus
> upstream of the scheduler (the WebCodecs decode queue + `decodedQueue: AudioData[]`), so windowing it
> is not the same mechanism as windowing WAV. See §3.1.
> 2. **The accurate index-driven Opus seek the original spec assumed Phase 21 would build is ALREADY
> LIVE.** Phase 18 ships `resolveOpusByteOffset` (binary-search the precomputed seek index in
> `OpusSeekData`) → Range fetch → `OpusStreamDecoder.reinitializeForRangeContinuation(landingTime,
> target)` with frame-accurate lead-trim. Opus seek is **accurate, not approximate** — and **already
> shipping**. Phase 21 does **not** build Opus seek; it **reuses** that live seek for window-miss
> refills.
>
> **Correction of stale spec language.** The original draft described Opus as a future-wired
> `OpusFormatDecoder.calculateByteOffset` joining the `IFormatDecoder` registry, with seek as "approximate
> vs accurate." All of that is now wrong against the landed code: Opus does **not** use `IFormatDecoder`
> (it diverged to the `IStreamingDecoder`/WebCodecs seam precisely because per-segment `decodeAudioData` is
> architecturally wrong for Opus — see `IStreamingDecoder.ts`), and its seek is accurate and shipping. The
> body below is rewritten to the two-path reality. **The headline is unchanged:** bound client memory to a
> sliding window regardless of stream length, for the canonical 1 GB mix, across both delivery formats.
---
## 1. Goal
Bound the **client memory** a playing track consumes to a small, configurable forward window —
**independent of total stream length** — so a 1 GB+ DJ MIX (Phase 9 `Mix` medium: a single long track)
plays without the whole decoded PCM accumulating in the browser.
**The defect, stated precisely — and it now has two faces, one shared.** The network path already
streams in adaptive 16–64 KB chunks (`StreamingAudioPlayerService.StreamAudioWithEarlyPlayback`), so that
part is fine. The accumulation is on the **decode side**, and Phase 18 split the decode side into two
pipelines that both terminate at the same sink:
- **The shared sink (both paths) — the unbounded scheduler.** `PlaybackScheduler` holds
`private buffers: AudioBuffer[]` and **never evicts** ("Supports pause/resume/seek by **retaining all
buffers**" — its own doc comment). Both decode paths call `scheduler.addBuffer()` (via
`AudioPlayer.processFormatChunk` for WAV/MP3/FLAC and `processOpusChunk` for Opus); nothing is ever
removed. Decoded PCM is **larger than the source** in memory (Web Audio `AudioBuffer` is 32-bit float
per sample per channel — a 16-bit stereo WAV roughly **doubles** once decoded; Opus decodes to the same
48 kHz float PCM regardless of how few bytes the *compressed* stream was). So a 1 GB WAV becomes ~2 GB
of retained float, **and a low-data Opus mix becomes the same ~2 GB of decoded float once played**:
the compressed transfer is small, but the *decoded* footprint is identical. The scheduler is the OOM for
both. **This is the §3.1 root cause, unchanged from the original spec — it just now afflicts two
producers.**
- **The Opus-only second locus — upstream decode-ahead.** The Opus path accumulates *before* the
scheduler too: the WebCodecs `AudioDecoder` work queue (`decodeQueueSize`), the `decodedQueue:
AudioData[]` awaiting conversion, and the `OggDemuxer`'s partial-page state. Bounding the scheduler
alone does not bound these — they fill from the same C# `ReadAsync` loop, so they need their own
back-pressure (on the *demuxer/decoder feed*), not only the read loop's. WAV has no equivalent
upstream queue (its `StreamDecoder` decodes synchronously into the scheduler), so this is genuinely
Opus-specific.
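
The footprint arithmetic above can be checked with a short sketch. The function names are hypothetical and not part of the codebase; the only facts used are the ones stated above (32-bit float per sample per channel, 48 kHz Opus output).

```typescript
// Web Audio AudioBuffer stores 32-bit float samples: 4 bytes per sample per channel.
const FLOAT32_BYTES = 4;

/** Estimated decoded size of integer-PCM source data once held as float AudioBuffers. */
function decodedBytesFromPcm(sourceDataBytes: number, sourceBitsPerSample: number): number {
  const sourceBytesPerSample = sourceBitsPerSample / 8;
  const sampleCount = sourceDataBytes / sourceBytesPerSample; // samples across all channels
  return sampleCount * FLOAT32_BYTES;
}

/** Decoded size for a duration; independent of how small the compressed transfer was. */
function decodedBytesForDuration(seconds: number, sampleRate: number, channels: number): number {
  return seconds * sampleRate * channels * FLOAT32_BYTES;
}
```

Under these assumptions a 1 GB 16-bit WAV decodes to about 2 GB, while a 40-second window at 48 kHz stereo holds about 15.4 MB regardless of track length.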
**One-line framing:** today the player decodes the whole track into memory and keeps it — true for both
formats; Phase 21 makes it keep only a sliding forward window and discard what has already played,
refilling on demand from the Range primitive both paths already use for seek (WAV via `IFormatDecoder`,
Opus via the live index-driven `resolveOpusByteOffset`).
---
## 2. Constraints / invariants (the contract that must hold)
These are non-negotiable. The §3.5 streaming seam (root `CLAUDE.md` "Streaming-first audio playback";
`CONTEXT.md §3.5`) is called *the most architecturally load-bearing part of the playback path* by both
docs. This phase **modifies that seam** — so the contract it must preserve is spelled out here.
- **C1 — The seek-beyond-buffer Range path is the substrate, kept intact.** Phase 4 landed HTTP
`Range: bytes={offset}-` → `206 Partial Content` end to end (client `TrackMediaClient` →
`DeepDrftPublic` proxy → `DeepDrftAPI`), and `StreamDecoder.reinitializeForRangeContinuation` retains
the parsed format header on a continuation body (no re-parse). Windowed refill is a **generalization of
this exact path** (§3.1) — it must not require a second, divergent fetch mechanism.
- **C2 — Playback start latency unchanged.** Today playback starts as soon as a configurable minimum
buffer count is queued (header-derived duration, not full-file). The window model must keep first-audio
latency at parity — bounding memory must not reintroduce a fetch-then-play stall.
- **C3 — Neither decoder seam's contract is forked; windowing lives in the shared layer plus a thin
per-seam hook.** There are two decoder seams as of Phase 18: `IFormatDecoder` (WAV/MP3/FLAC, owns
format byte math; `AudioPlayer.createFormatDecoder` dispatches on `Content-Type`) and `IStreamingDecoder`
(Opus, the WebCodecs pipeline; selected in `initializeStreaming` when the content type is
`audio/ogg`/`audio/opus` and a sidecar is present). **The eviction half of windowing is fully shared**:
it lives in `PlaybackScheduler`, which both seams feed identically via `addBuffer`, so eviction adds
**zero** format branches. **The back-pressure / decode-ahead half is necessarily seam-aware** — the WAV
path back-pressures the C# `ReadAsync` loop; the Opus path must additionally bound the WebCodecs
decode-ahead and the `decodedQueue` (§3.1). Express that as a **small uniform signal** ("the scheduler is
full, stop producing") that each decode path honors in its own way, rather than a windowing controller
that reaches into either decoder's internals. The goal the original C3 stated still holds — no
format-specific logic leaking into the *scheduler* — but the spec now acknowledges the producer side has
two shapes, not one.
- **C4 — Read-only playback only.** This is a memory-management change, not a UX change. No new
user-visible control, no change to seek/transport semantics beyond what the listener already
experiences. Seek must still feel identical.
- **C5 — Window both decode paths without forking the scheduler/seam, reusing the live index-driven
seek for refill.** Both delivery formats must be windowed, and the byte↔time mapping each refill needs is
**already accurate and already shipping** for both:
- **WAV/MP3/FLAC** — `IFormatDecoder.calculateByteOffset` (CBR `byteRate` for WAV; the MP3/FLAC seek
accelerators for those), reached through `StreamDecoder.calculateByteOffset` / `AudioPlayer.seekBeyondBuffer`.
- **Opus** — `resolveOpusByteOffset(activeOpusSidecar, t)` (binary search the precomputed granule→byte
seek index in `OpusSeekData`), returning an exact page-start offset **and** a `landingTimeSeconds` for
the decoder's frame-accurate lead-trim. This is **accurate, not approximate, and landed in Phase 18.**
Phase 21 does **not** build either mapping. The window's refill trigger calls *whichever resolver the
active path already uses* — for Opus, the **same** `resolveOpusByteOffset` an explicit listener seek
calls (the live path in `AudioPlayer.seekBeyondBuffer`), so windowed refill is literally "a seek the
listener didn't initiate." A window opening away from byte 0 decodes correctly on the Opus path because
the setup header (`OpusHead`/`OpusTags`) is already cached from the sidecar and re-applied by
`reinitializeForRangeContinuation` (Phase 18 §3.4a B); the WAV path re-applies its retained header the
same way. **No new offset math, no approximation, no header re-fetch — all reused.** The invariant is
therefore *not* "make refill format-agnostic" (the two paths legitimately resolve offsets through
different code); it is **"reuse the live seek of each path verbatim; add only the eviction and the
refill *trigger*, never a second seek mechanism."**
- **C6 — No regression to the single-writer decoder concurrency guarantee — now covering both decoders.**
The C# loop is careful that only one streaming task feeds the active JS decoder at a time
(`DrainActiveStreamingTaskAsync`, the `_streamingCancellation` identity dance in
`StreamingAudioPlayerService`). This matters *more* for Opus: the WebCodecs `AudioDecoder` is stateful
and async — a `reset()`+`configure()` on a range-continuation (`reinitializeForRangeContinuation`) racing
a still-draining `push()` from a stale loop would corrupt inter-frame state, not merely deliver a wrong
buffer. Windowed refill introduces *more* mid-stream fetches against whichever decoder is active; every
one must route through the **same** drain/cancellation discipline, not around it. The discipline is
already decoder-agnostic at the C# layer (it cancels the loop, not the decoder), so this is a "keep using
it" invariant — but it is the rule most likely to be violated by a naive Opus refill, and is the hardest
failure to diagnose, so it is called out as a hard invariant for both paths.
- **C7 — The Mix visualizer's data source is independent and must stay that way.** The Phase 10/12
WebGL2 lava visualizer renders from a **preprocessed high-res waveform datum** fetched per-track
(`GET api/track/{entryKey}/waveform/high-res`), **not** from live decoded PCM. Confirmed: evicting
played `AudioBuffer`s cannot starve the visualizer — it never read them. The window model is invisible
to the visualizer. (This is the canonical 1 GB case *and* the case that proves the eviction is safe.)
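
The index lookup that C5 relies on (binary search over a precomputed seek index, returning a page-start offset and a landing time) can be sketched as follows. This is an illustration only: the `SeekPoint` shape and the function name are assumptions, and the real `OpusSeekData` layout and `resolveOpusByteOffset` signature may differ.

```typescript
// Hypothetical index entry: the time at which a page starts and its byte offset.
interface SeekPoint { timeSeconds: number; byteOffset: number; }
interface SeekResult { byteOffset: number; landingTimeSeconds: number; }

/** Finds the last index entry at or before the target time (entries sorted by time). */
function resolveOffsetSketch(index: readonly SeekPoint[], targetSeconds: number): SeekResult {
  let lo = 0;
  let hi = index.length - 1;
  let best: SeekPoint | undefined = index[0];
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const point = index[mid];
    if (point !== undefined && point.timeSeconds <= targetSeconds) {
      best = point;   // candidate landing page; look for a later one
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  if (best === undefined) return { byteOffset: 0, landingTimeSeconds: 0 }; // empty index
  return { byteOffset: best.byteOffset, landingTimeSeconds: best.timeSeconds };
}
```

The gap between the target and `landingTimeSeconds` is what the decoder's lead-trim would discard, so a window refill and a listener seek can share the same lookup.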
---
## 3. Architectural shape
### 3.0 The mental model
A track's audio is a byte range `[0, fileLength)` on disk. At any moment the listener is at playback
position `P` (seconds → byte offset via the active path's resolver — `IFormatDecoder.calculateByteOffset`
for WAV/MP3/FLAC, `resolveOpusByteOffset` over the seek index for Opus). The player should hold decoded
`AudioBuffer`s only for a bounded window roughly `[P - back, P + ahead]` — and, on the Opus path, keep the
upstream WebCodecs decode queue near-empty too (§3.1):
- **forward fill (`ahead`)** — enough decoded lookahead that playback never starves (covers the existing
500 ms scheduler lookahead plus network jitter headroom);
- **back-retain (`back`)** — a small amount of *already-played* audio kept so a short seek-back does not
trigger a network refetch;
- **evict** — anything older than `P - back` is dropped (`AudioBuffer` references released → GC reclaims
the float data);
- **refill** — when forward decoded lookahead drops below a low-water mark, fetch+decode more from the
current byte position; when the window's tail is evicted and the listener seeks back past it, refetch
that region via the Range primitive (the seek-beyond-buffer path, run *backwards*).
This is a **ring/sliding-window buffer keyed on playback position**, driven by high/low-water marks —
the standard bounded-producer/bounded-consumer pattern, transplanted onto the decode→schedule seam.
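
As a rough illustration of the mental model, the per-tick decision (what to evict, whether to refill) can be written as a pure function of the playback position. All names below are hypothetical; the real policy values are the subject of OQ1.

```typescript
interface BufferSpan { startSeconds: number; endSeconds: number; }
interface WindowPolicy { backSeconds: number; lowWaterSeconds: number; }
interface WindowPlan { evictCount: number; lookaheadSeconds: number; needsRefill: boolean; }

/** Plans one window step. Assumes `buffers` is contiguous and sorted by start time. */
function planWindow(
  buffers: readonly BufferSpan[],
  positionSeconds: number,
  policy: WindowPolicy,
): WindowPlan {
  const evictBefore = positionSeconds - policy.backSeconds;
  let evictCount = 0;
  for (const span of buffers) {
    if (span.endSeconds < evictBefore) evictCount += 1; // fully older than P - back
    else break;                                         // sorted, so nothing later qualifies
  }
  const last = buffers[buffers.length - 1];
  const decodedUntil = last !== undefined ? last.endSeconds : positionSeconds;
  const lookaheadSeconds = Math.max(0, decodedUntil - positionSeconds);
  return { evictCount, lookaheadSeconds, needsRefill: lookaheadSeconds < policy.lowWaterSeconds };
}
```

Keeping the plan pure makes the window logic testable without an `AudioContext`; the scheduler would apply `evictCount` and the player service would act on `needsRefill`.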
### 3.1 Why refill is a generalization of seek-beyond-buffer, not a new mechanism — for both paths
The seek-beyond-buffer path already does **every refill primitive** the window needs, just triggered
manually and one-shot. As of Phase 18 each primitive has a WAV branch and an Opus branch, both live:
| Window operation | WAV/MP3/FLAC machinery reused | Opus machinery reused (Phase 18, landed) |
|-------------------------------|--------------------------------------------------------------------|--------------------------------------------------------------------------------------|
| Discard buffers, keep offset | `PlaybackScheduler.clearForSeek()` + `setPlaybackOffset()` | *same* — the scheduler is shared |
| Fetch from a byte offset      | `TrackMediaClient` → `Range: bytes=X-` → 206                       | *same* (with `?format=opus`); the Range path is shared                               |
| Map time → byte offset        | `StreamDecoder.calculateByteOffset()` → `IFormatDecoder`           | `resolveOpusByteOffset(activeOpusSidecar, t)` (index binary search → exact page)     |
| Decode a header-less body | `StreamDecoder.reinitializeForRangeContinuation(len)` | `OpusStreamDecoder.reinitializeForRangeContinuation(landingTime, target)` (demux/codec reset + lead-trim) |
| Single-loop safety on refetch | `_streamingCancellation` swap + `DrainActiveStreamingTaskAsync()` | *same* — the C# discipline is decoder-agnostic |
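
For the "map time → byte offset" row on the WAV side, constant-bit-rate PCM makes the mapping a direct multiplication. The sketch below is an assumption about the general technique, not a copy of `IFormatDecoder.calculateByteOffset`; the `PcmFormat` field names are hypothetical, and it rounds down to a whole frame so a continuation body never starts mid-sample.

```typescript
// Hypothetical subset of a parsed WAV header.
interface PcmFormat {
  dataStartByte: number; // offset of the first sample byte in the file
  sampleRate: number;    // frames per second
  blockAlign: number;    // bytes per frame (all channels)
}

/** Frame-aligned byte offset for a time position in CBR PCM. */
function pcmByteOffset(format: PcmFormat, seconds: number): number {
  const frame = Math.floor(Math.max(0, seconds) * format.sampleRate);
  return format.dataStartByte + frame * format.blockAlign;
}

/** Open-ended Range header value of the form the spec describes. */
function rangeHeaderFrom(offset: number): string {
  return `bytes=${offset}-`;
}
```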
The genuinely-new work, by path:
- **Shared (both paths):** *partial eviction* on `PlaybackScheduler` (today it only ever `clear()`s
wholesale), and a *position-driven refill trigger* (a continuous low-water loop, not a one-shot seek).
- **WAV path:** *back-pressure on the C# `ReadAsync` loop* — stop reading the socket above the high-water
mark, resume below low-water. WAV's `StreamDecoder` decodes synchronously into the scheduler, so the
read loop is the *only* producer to throttle; pausing `ReadAsync` bounds it fully.
- **Opus path:** *the same C# back-pressure, plus bounding the WebCodecs decode-ahead.* Throttling
`ReadAsync` alone is **not sufficient** for Opus, because `OpusStreamDecoder.push()` is async and the
WebCodecs `AudioDecoder` keeps its own internal work queue (`decodeQueueSize`) plus a `decodedQueue:
AudioData[]` of decoded-but-not-yet-converted frames. The Opus producer must also stop *feeding the
decoder* (stop demuxing/decoding new packets) when the scheduler is full, and resume below low-water —
back-pressure on the **demuxer/decoder feed**, not only on the socket read. This is the one place the
two paths' windowing genuinely diverges.
Everything else — the fetch, the offset resolution, the header-carry continuation, the single-loop
cancellation safety — is **reused verbatim** on both paths. Phase 21 builds eviction + the refill trigger
+ the (per-path) back-pressure; it builds **no** new fetch, offset, or seek mechanism.
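
The high/low-water back-pressure described above is a hysteresis gate: stop producing at or above high-water, and resume only once playback has drained the lookahead to low-water. A minimal sketch, with a hypothetical class name and second-based thresholds chosen purely for illustration:

```typescript
/** Hysteresis gate driven by decoded lookahead in the scheduler. */
class FillGate {
  private readonly lowWaterSeconds: number;
  private readonly highWaterSeconds: number;
  private paused = false;

  constructor(lowWaterSeconds: number, highWaterSeconds: number) {
    if (!(lowWaterSeconds < highWaterSeconds)) {
      throw new Error("low-water must be below high-water");
    }
    this.lowWaterSeconds = lowWaterSeconds;
    this.highWaterSeconds = highWaterSeconds;
  }

  /** Returns true when producers may keep producing, false when they should stop. */
  update(lookaheadSeconds: number): boolean {
    if (lookaheadSeconds >= this.highWaterSeconds) this.paused = true;
    else if (lookaheadSeconds <= this.lowWaterSeconds) this.paused = false;
    // Between the marks the previous decision holds, which prevents rapid toggling.
    return !this.paused;
  }
}
```

Both producers could consult the same gate: the WAV path by pausing its read loop, the Opus path by also pausing its demux/decode feed.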
### 3.2 The three candidate directions
Per file convention the alternatives are recorded; the recommendation follows.
**Direction A — Sliding window on the existing single forward stream (recommended).**
Keep the current model where the C# loop reads one forward HTTP stream and pumps chunks into the active
JS decoder. Add three things: (1) `PlaybackScheduler` gains *partial eviction* — drop buffers whose
absolute-time end is older than `P - back`, adjusting its index bookkeeping so `getCurrentPosition()`
and scheduling stay correct against a buffer array that no longer starts at index 0 (**shared by both
paths** — the scheduler is the common sink); (2) *back-pressure on the C# read loop* — when forward
decoded lookahead exceeds the high-water mark, the C# loop **pauses reading** the HTTP stream (stops
calling `ReadAsync`) until playback drains it below low-water, then resumes; (3) **for the Opus path
only, back-pressure on the WebCodecs decode-ahead** — the producer also stops demuxing/decoding new
packets when the scheduler is full, so the `AudioDecoder` work queue and `decodedQueue` do not balloon
behind a throttled socket. Memory is bounded by high-water + back-retain on both paths. Seek-back beyond
the retained window falls through to the **existing** seek-beyond-buffer path (the right one per format)
unchanged.
*Why recommended:* smallest change to the load-bearing seam; reuses the live forward stream (no extra
connections in the common case); eviction and back-pressure are the only genuinely new mechanisms, all
local (the scheduler; the read loop; for Opus, the demux/decode feed). Back-pressure via "stop reading
the socket" is exactly how TCP flow control already wants to behave — pausing `ReadAsync` lets the kernel
window close; we are not fighting the transport. The Opus decode-ahead bound is the one addition Phase 18
forces, and it is local to the Opus producer.
*Open question it raises (OQ6, new):* whether the two paths' back-pressure is driven by **one shared
window controller** that exposes a "scheduler full / drained" signal both producers poll, or by **two
parallel implementations** sharing only the eviction code. Recommend the **shared signal** — see §6 OQ6.
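
Item (1) of Direction A, partial eviction with index bookkeeping, might look like the sketch below. It is not the `PlaybackScheduler` implementation; the class and field names are hypothetical. The idea is to track the logical index and absolute start time of the first retained buffer, so positions stay correct after the array no longer starts at index 0.

```typescript
interface TimedBuffer { durationSeconds: number; }

class EvictingBufferList {
  private buffers: TimedBuffer[] = [];
  private baseIndex = 0;       // logical index of buffers[0]
  private baseTimeSeconds = 0; // absolute start time of buffers[0]

  /** Appends a buffer and returns its stable logical index. */
  add(buffer: TimedBuffer): number {
    this.buffers.push(buffer);
    return this.baseIndex + this.buffers.length - 1;
  }

  /** Drops leading buffers whose absolute end time is older than the cutoff. */
  evictBefore(cutoffSeconds: number): number {
    let evicted = 0;
    for (;;) {
      const head = this.buffers[0];
      if (head === undefined) break;
      if (this.baseTimeSeconds + head.durationSeconds >= cutoffSeconds) break;
      this.buffers.shift(); // releasing the reference lets GC reclaim the samples
      this.baseTimeSeconds += head.durationSeconds;
      this.baseIndex += 1;
      evicted += 1;
    }
    return evicted;
  }

  /** Absolute start time of a logical index, or undefined if evicted or not yet added. */
  startTimeOf(logicalIndex: number): number | undefined {
    const local = logicalIndex - this.baseIndex;
    if (local < 0 || local >= this.buffers.length) return undefined;
    let time = this.baseTimeSeconds;
    for (const buffer of this.buffers.slice(0, local)) time += buffer.durationSeconds;
    return time;
  }

  get retainedCount(): number {
    return this.buffers.length;
  }
}
```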
**Direction B — Discrete window segments, each its own Range fetch.**
Treat the file as fixed-size byte segments (e.g. 4 MB). Hold N decoded segments around `P`; fetch the
next/previous segment via a fresh Range request as the window slides; discard the far segment. No live
long-lived forward stream — every window is an independent 206.
*Why not (default):* turns one connection into many short Range requests (more proxy hops through
`DeepDrftPublic`, more server-side `WavOffsetService`-style header synthesis, more places a fetch can
fail mid-stream — worsening the §1.6 error surface), and the byte↔time segment math must be exact at
every boundary. It *is* the cleaner model for true random-access (and the better base if seeking-heavy
usage dominates), so keep it as the fallback if Direction A's back-pressure proves leaky in practice.
Borrowed prior art: HLS/DASH segment windows and the MSE `SourceBuffer.remove()` eviction model — this
is how every production HTML5 adaptive player bounds memory. We are doing the hand-rolled equivalent
because the stack is a bespoke Web Audio graph, not `<media>` + MSE.
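
For comparison, the segment arithmetic Direction B would need is sketched below. The function name and the `behind`/`ahead` parameters are hypothetical, and the sketch works on raw byte offsets only: it ignores the frame alignment and header handling a real implementation would have to get right at each boundary.

```typescript
interface SegmentRequest { index: number; rangeHeader: string; }

/** Lists the fixed-size byte segments to hold around a byte position. */
function segmentsAround(
  byteOffset: number,
  segmentBytes: number,
  fileLength: number,
  behind: number,
  ahead: number,
): SegmentRequest[] {
  if (fileLength <= 0 || segmentBytes <= 0) return [];
  const lastIndex = Math.ceil(fileLength / segmentBytes) - 1;
  const current = Math.min(lastIndex, Math.max(0, Math.floor(byteOffset / segmentBytes)));
  const first = Math.max(0, current - behind);
  const last = Math.min(lastIndex, current + ahead);
  const requests: SegmentRequest[] = [];
  for (let i = first; i <= last; i++) {
    const start = i * segmentBytes;
    const end = Math.min(fileLength, start + segmentBytes) - 1; // Range end is inclusive
    requests.push({ index: i, rangeHeader: `bytes=${start}-${end}` });
  }
  return requests;
}
```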
**Direction C — Adopt MediaSource Extensions (MSE) and let the browser manage the buffer.**
Stop hand-rolling the decode→schedule graph for long tracks; feed the Range stream into a `SourceBuffer`
and let the browser evict via its built-in quota + `remove()`. Memory management becomes the platform's
problem.
*Why not — RESOLVED, rejected (Daniel, 2026-06-23; see OQ5):* MSE does not accept raw WAV/PCM — it
wants containerized formats (fragmented MP4/WebM, or MP3/AAC elementary streams). The entire bespoke
visualizer/spectrum graph is wired to the Web Audio `AudioContext`, not a `<media>` element. Adopting
MSE is a **rewrite of the playback substrate**, not a windowing change. It *looked* like the real
long-term answer once compressed delivery arrived — but compressed delivery (**Phase 18 Opus, now
landed**) feeds the **same bespoke graph** via the WebCodecs `IStreamingDecoder` seam (parallel to the
WAV `IFormatDecoder` seam, both terminating at the shared `PlaybackScheduler`), so the compressed-delivery
move that would have justified MSE happened *without* surrendering the graph. Notably, Phase 18 chose a
**WebCodecs `AudioDecoder`** for Opus rather than `decodeAudioData` — which is itself the "use the platform
codec, keep the bespoke graph" move, but at the *decoder* granularity, not the *media-element* granularity
MSE would impose. **The bespoke graph is a deliberate long-term commitment; MSE is rejected.** Direction A
is therefore the permanent destination, not a stopgap that MSE will retire. Recorded as
considered-and-declined.
### 3.3 Recommended direction: A, with B held as the documented fallback
Direction A is the smallest coherent change that hits the headline (bounded memory under a 1 GB stream)
while honoring C1–C7. It keeps the live forward stream, reuses each path's seek-beyond-buffer machinery
for the only genuinely random-access case (seek-back past the retained tail), and isolates the new
mechanisms (eviction shared; back-pressure per path). **The final architecture and the exact
eviction/back-pressure API are staff-engineer's call at implementation** (per file convention); this spec
fixes the *shape* and the invariants, not the method signatures.
### 3.4 SOLID / road-not-taken rationale
- **SRP, preserved.** Eviction is a `PlaybackScheduler` concern (it already owns buffer storage, and is
the single shared sink both decode paths feed); refill orchestration is a player-service concern (it
already owns the C# fetch loop and the seek dispatch); byte↔time math stays where each path already keeps
it — `IFormatDecoder.calculateByteOffset` for WAV/MP3/FLAC, `resolveOpusByteOffset` (over `OpusSeekData`)
for Opus. No responsibility crosses a boundary it does not already own.
- **OCP, via the shared sink + the live per-path seek.** Eviction added at the scheduler changes zero
decoder code on either path. Refill reuses each path's *already-implemented* offset resolver — Phase 21
adds no offset math to either seam. The one place windowing is not purely additive is the Opus
decode-ahead bound (§3.1), which lives inside the Opus producer, not in the shared layer.
- **The seam stays single-writer (C6) — for both decoders.** Every new refetch routes through the existing
C# cancellation/drain discipline, so "only one loop feeds the active decoder" remains true for the WAV
`StreamDecoder` and the stateful Opus `AudioDecoder` alike. This is the rule most likely to be violated
by a naive Opus refill (a stale `push()` racing a `reset()`+`configure()`), and is called out as a hard
invariant.
- **Road not taken — eager full decode with a memory cap that just stops decoding.** Tempting (decode
until you hit a byte budget, then stop) but it breaks playback of long tracks past the cap entirely —
it bounds memory by *refusing to play the rest*, not by sliding. Rejected: it is a degradation, not a
feature.
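
The single-writer rule can be illustrated with a generation token. This is a hypothetical TypeScript illustration of the invariant only; the actual discipline lives in C# (`DrainActiveStreamingTaskAsync` and the `_streamingCancellation` swap) and is not reproduced here.

```typescript
/** Hands out one token per streaming loop; only the newest token may write. */
class SingleWriterGuard {
  private generation = 0;

  /** Called when a seek or refill starts a new loop; invalidates all older tokens. */
  begin(): number {
    this.generation += 1;
    return this.generation;
  }

  isCurrent(token: number): boolean {
    return token === this.generation;
  }
}

/** Feeds a chunk only if the caller's loop is still the active one. */
function feedIfCurrent(
  guard: SingleWriterGuard,
  token: number,
  chunk: Uint8Array,
  sink: Uint8Array[],
): boolean {
  if (!guard.isCurrent(token)) return false; // stale loop: drop the chunk, leave state alone
  sink.push(chunk);
  return true;
}
```

A stale `push()` from a superseded loop is dropped rather than interleaved with the new loop's data, which is the failure C6 is written to prevent.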
---
## 4. Use cases
- **UC1 — Play a 1 GB+ DJ MIX start to finish (the headline).** Memory stays bounded throughout; the
listener experiences continuous playback identical to a short track. **Holds in both formats** — the
lossless WAV mix (~2 GB decoded if unbounded) and the low-data Opus mix (small transfer, but the *same*
~2 GB decoded float once played, so it needs windowing just as much; see §1).
- **UC1-Opus — The same mix streamed as Opus, windowed.** The low-data win (Phase 18) shrinks the
*transfer*; Phase 21 shrinks the *decoded footprint*. The two compound: a metered-connection listener on
Opus gets both the small download and the bounded memory. Windowing the Opus path additionally bounds the
WebCodecs decode-ahead and `decodedQueue`, not only the scheduler (§3.1).
- **UC2 — Seek forward within a long track.** Already handled by seek-beyond-buffer (the right resolver per
format — `IFormatDecoder` for WAV, the live `resolveOpusByteOffset` for Opus); under windowing the
forward seek clears the window and refills at the target — no behavior change, now with eviction so the
pre-seek region does not linger.
- **UC3 — Seek back a few seconds.** Served from the back-retain window with **no** network refetch
(the reason `back` exists).
- **UC4 — Seek back far, past the evicted tail.** Falls through to the existing seek-beyond-buffer Range
fetch, run toward an earlier offset. (Open question OQ2 — see §6.)
- **UC5 — Pause a long track for a long time.** Memory stays at the bounded window size while paused (no
continued decode). On resume, forward fill restarts from the low-water trigger.
- **UC6 — Mix detail page with the lava visualizer running.** Visualizer reads its preprocessed datum
(C7); windowing is invisible to it. Confirmed non-interaction.
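
UC2 through UC4 differ only in where the seek target falls relative to the retained window. A small routing sketch, with hypothetical names, makes the three outcomes explicit:

```typescript
interface RetainedWindow { startSeconds: number; endSeconds: number; }
type SeekRoute = "serve-from-window" | "refetch-forward" | "refetch-backward";

/** Decides whether a seek can be served locally or needs a Range refetch. */
function routeSeek(targetSeconds: number, retained: RetainedWindow): SeekRoute {
  if (targetSeconds < retained.startSeconds) return "refetch-backward"; // UC4: past the evicted tail
  if (targetSeconds >= retained.endSeconds) return "refetch-forward";   // UC2: beyond decoded lookahead
  return "serve-from-window";                                           // UC3: inside the retained window
}
```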
---
## 5. Interaction with the deferred Phase 1 streaming features
This phase touches the **same decoder/scheduler seam** as the deferred Phase 1.3/1.4/1.5 items and the
1.6/1.7 robustness items. The interactions, explicitly:
- **1.3 Preload / prefetch (deferred; preload half).** *Shares machinery, does not conflict — and should
be sequenced after.* Preload stages the **next track** into a second decoder instance during the
current track's tail; windowing bounds the **current track's** forward buffer. They are orthogonal
axes (next-track vs. current-track-window), but they compound the memory question: a naive preload of a
second 1 GB mix would reintroduce the OOM this phase fixes. **Recommendation: land windowing first**,
so that when preload arrives, the staged next-track decoder is *also* windowed by construction (it
inherits the bounded scheduler). Windowing makes preload *safe for long tracks*; without it, preload of
mixes is a memory hazard.
- **1.4 Crossfade (deferred).** Needs two simultaneous `PlaybackScheduler` instances briefly overlapping.
Both would be windowed instances — the overlap doubles the *window* size momentarily, not the whole
track. Windowing makes crossfade between two long mixes affordable. No reordering needed; 1.4 still
gates on 1.3.
- **1.5 Gapless (deferred).** Sample-accurate hand-off of the next track's first buffer at the current
track's last buffer. Windowing changes *which* buffers are retained but not the hand-off mechanism;
the only care point is that the current track's **final** window must not be evicted before the gapless
boundary is scheduled. A minor invariant for whoever builds 1.5, not a blocker. **Phase 18 note:** the
former "1.5 is WAV-only" caveat is superseded — Opus is live, and it has its own encoder pre-skip/priming
(handled once by the WebCodecs decoder, see `OpusStreamDecoder.ts`), so a gapless Opus hand-off must
respect the end-trim against the sidecar's authoritative total length. That is 1.5's problem to absorb,
not Phase 21's; flagged so 1.5 inherits it.
- **1.6 Track-skip on error (deferred).** *Windowing enlarges the error surface — call this out.* Today
a fetch failure happens at load (one fetch) or at a user seek (one fetch). Windowed refill issues
**mid-stream** fetches the listener did not initiate; one of those can fail at byte 700 M of a 1 GB
mix. So Phase 21 should ship with at least the *cheap* half of 1.6: a mid-stream refill failure must
**surface a clear error and not wedge the player** (it must not leave playback "running" with a starved
scheduler — mirror the `playFromPosition` end-of-buffer recovery already in `PlaybackScheduler`). The
rich half (byte-scan to next valid frame) stays deferred. **Recommendation: fold the minimal refill-
failure handling into Phase 21's acceptance criteria** (AC6) rather than leaving it entirely to 1.6 —
it is created by this phase.
- **1.7 Safari compatibility (deferred).** Windowing adds no new Safari-specific surface beyond what the
streaming path already has. Two adjacencies, both Phase-18-introduced: (a) more frequent `AudioContext`
activity during refill should be checked against older-Safari `webkitAudioContext` quirks; (b) the Opus
path depends on **WebCodecs `AudioDecoder`**, whose Safari availability is narrower than `decodeAudioData`
Ogg-Opus support — Phase 18's capability gate already falls a non-WebCodecs browser back to the lossless
WAV path, so a Safari that can't run the Opus pipeline windows the *WAV* path (which has no decode-ahead
locus, only the scheduler), i.e. the simpler windowing case. Note it; do not block on it.
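
The minimal refill-failure handling recommended under 1.6 amounts to a state transition: a failed mid-stream refill moves the player into an explicit error state instead of leaving it nominally playing. The types and function below are hypothetical and only sketch that shape.

```typescript
type PlaybackState =
  | { kind: "playing" }
  | { kind: "error"; message: string };

type RefillResult =
  | { ok: true }
  | { ok: false; byteOffset: number; reason: string };

/** Applies a refill outcome; a failure always surfaces as an error state. */
function applyRefillResult(state: PlaybackState, result: RefillResult): PlaybackState {
  if (state.kind === "error") return state; // already surfaced; keep the first error
  if (result.ok) return state;
  return {
    kind: "error",
    message: `Refill failed at byte ${result.byteOffset}: ${result.reason}`,
  };
}
```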
---
## 6. Open questions for Daniel (genuine product decisions, not implementation detail)
These are policy calls with user-visible or resource trade-offs — flagged rather than decided here.
- **OQ1 — Window size policy.** What bounds the window — a **fixed byte/time budget** (e.g. "hold at
most ~30 s decoded ahead + ~10 s behind"), or a **configurable memory budget** (e.g. "≤ N MB of
decoded PCM") that derives the time window from the stream's byte rate? Recommend a **time-based
forward window + small time-based back-retain** as the primary knob (intuitive, format-portable), with
a hard **memory ceiling** as a secondary guard. The exact numbers are tunable post-landing; Daniel
picks the *policy axis*. `[Daniel decision]`
- **OQ2 — Seek-back past the evicted window.** When the listener seeks back earlier than the retained
tail, we must refetch (the audio is gone). Acceptable to take the same brief re-buffer the forward
seek-beyond-buffer takes today? (Recommend yes — it is the symmetric case and listeners already accept
it forward.) Or should back-retain be generous enough that this is rare? `[Daniel decision]`
- **OQ3 — Configurable total in-flight memory cap.** Should there be a single hard byte ceiling on total
decoded audio held by the player (a safety net independent of the window-size policy), exposed as a
config value? Recommend **yes, as a guard rail** even if the window policy is time-based — it is the
backstop that makes "1 GB stream never OOMs" a guarantee rather than a tuning hope. `[Daniel
decision]`
- **OQ4 — Apply windowing to all tracks, or only long ones?** A 3-minute Cut decoded whole is ~30–60 MB
— harmless today. Windowing everything is simpler (one code path) but adds refill machinery to short
tracks that never needed it. Recommend **window everything** (one path, C6-safe, and short tracks
simply never hit a refill because they fit inside the forward window) — but Daniel may prefer a
size threshold. `[Daniel decision]`
- **OQ5 — Is MSE (Direction C) the real destination? — RESOLVED: NO (Daniel, 2026-06-23).** **Do not
adopt MSE. The bespoke Web Audio decode→schedule graph stays — it is bespoke by deliberate choice, a
long-term commitment, not a stopgap.** Daniel's rationale: the player is intentionally a custom
graph, not an HTML `<media>` element; the compressed-delivery move that *would* have made MSE
tempting was met instead by **Phase 18 (Opus low-data path, now landed)** feeding the **same bespoke
graph** through the WebCodecs `IStreamingDecoder` seam (parallel to the WAV `IFormatDecoder` seam) — so
compressed delivery arrived *without* surrendering the graph. Consequence for this phase: Direction A
(the hand-rolled sliding window) is the destination, not a placeholder; invest in it as permanent
machinery. It windows both the WAV and the Opus path (the header note). Direction C is recorded as
**considered and declined** per file convention; kept visible so a future reader sees the road not taken
and why. `[RESOLVED — bespoke graph retained; MSE rejected]`
- **OQ6 — One window controller for both decode paths, or two? (NEW — raised by the Phase 18 two-path
reality.)** Eviction is unambiguously shared (the scheduler is the one sink). Back-pressure is not: the
WAV path throttles the C# `ReadAsync` loop; the Opus path must *also* throttle the WebCodecs
decode-ahead (§3.1). Should there be **one window controller** exposing a uniform "scheduler full /
drained" signal that both producers honor in their own way (recommended — keeps the *policy* — window
sizes, water-marks, OQ1/OQ3 — in one place, with two thin per-path back-pressure hooks), or **two
parallel windowing implementations** sharing only the eviction code (simpler per-path, but duplicates the
water-mark logic and risks the two drifting)? Recommend the **shared controller + per-path hook**. This
is more an architecture call than a product call — flagged for staff-engineer at implementation, with the
recommendation as the default. `[staff-engineer call; recommendation: shared controller]`
- **OQ7 — How does the Opus WebCodecs decode-ahead bound interact with scheduler eviction? (NEW; technical,
for staff-engineer.)** The Opus producer has two queues to bound (the `AudioDecoder` work queue and
`decodedQueue: AudioData[]`) *plus* the shared scheduler. The clean rule is "stop feeding the decoder when
decoded-lookahead-in-the-scheduler exceeds high-water" — i.e. the **scheduler's** fill level is the
single back-pressure signal, and the upstream Opus queues are kept near-empty by simply not demuxing
ahead. The alternative (let the decoder run ahead into `decodedQueue` and bound *that* separately) adds a
second budget to tune and a second eviction point. Recommend the former: **one fill signal (scheduler
decoded-lookahead), drive both the read-loop pause and the demux/decode pause from it.** Confirm at
implementation that the WebCodecs decoder tolerates being starved of input mid-stream and resumes cleanly
(it should — it is fed packet-by-packet via `decode()`), and that `decodedQueue` is drained promptly so
  it never holds more than one `push()`'s worth. `[staff-engineer call; recommendation: single
scheduler-fill signal]`
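
As an illustration of the OQ6/OQ7 recommendation (one fill signal, two thin per-path hooks), the following is a minimal sketch only, not the implementation. Every name in it (`WindowController`, `reportLookahead`, `onChange`) is hypothetical and does not exist in the codebase, and the 30 s / 10 s water-marks are placeholder values pending OQ1/OQ3. It assumes the scheduler can report its decoded lookahead in seconds; the two listeners stand in for the C# read-loop pause and the Opus demux/decode pause.

```typescript
// Hypothetical sketch: none of these names exist in the codebase.
type FillListener = (paused: boolean) => void;

class WindowController {
  private paused = false;
  private readonly listeners: FillListener[] = [];
  private readonly highWaterSec: number; // pause production at or above this lookahead
  private readonly lowWaterSec: number;  // resume production at or below this lookahead

  constructor(highWaterSec: number, lowWaterSec: number) {
    if (lowWaterSec >= highWaterSec) {
      throw new Error("lowWaterSec must be below highWaterSec");
    }
    this.highWaterSec = highWaterSec;
    this.lowWaterSec = lowWaterSec;
  }

  // Each producer registers one thin hook (read loop, Opus demux/decode feed).
  onChange(listener: FillListener): void {
    this.listeners.push(listener);
  }

  // Called whenever the scheduler's decoded lookahead changes.
  // Hysteresis: once paused, stay paused until the lookahead drains to low-water.
  reportLookahead(lookaheadSec: number): void {
    const next = this.paused
      ? lookaheadSec > this.lowWaterSec
      : lookaheadSec >= this.highWaterSec;
    if (next === this.paused) return;
    this.paused = next;
    for (const listener of this.listeners) listener(next);
  }

  get isPaused(): boolean {
    return this.paused;
  }
}

// Usage: both producers honor the same signal in their own way.
const controller = new WindowController(30, 10); // placeholder water-marks
const events: string[] = [];
controller.onChange((p) => events.push(`readLoop:${p ? "pause" : "resume"}`));
controller.onChange((p) => events.push(`opusFeed:${p ? "pause" : "resume"}`));

// Lookahead climbs past high-water, then drains below low-water.
for (const sec of [5, 20, 31, 25, 12, 9]) controller.reportLookahead(sec);
console.log(events);
```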
---
## 7. Acceptance criteria
- **AC1 (headline) — Bounded memory under a 1 GB stream, in BOTH formats.** Playing a 1 GB+ mix start to
finish — **as lossless WAV and as low-data Opus** — the browser tab's retained decoded-audio memory
stays bounded to the configured window (not growing toward ~2 GB). Verifiable via browser memory tooling:
peak decoded-audio footprint is independent of track length and tracks the window-size policy, not the
file size. The Opus case must be verified explicitly — its small *transfer* does not imply a small
*decoded* footprint (§1), so "Opus already streams small" is **not** sufficient.
- **AC1-Opus — The Opus upstream decode-ahead is bounded too (§3.1 / OQ7).** Under a long Opus stream, the
WebCodecs decode queue and `decodedQueue` do not grow unboundedly behind the scheduler — back-pressure
reaches the demux/decode feed, not only the scheduler. Verifiable: the upstream queues stay near-empty
  (one `push()`'s worth) regardless of stream length.
- **AC2 — Playback-start latency at parity (C2).** First-audio latency for a track is unchanged from
pre-windowing (within noise). Windowing does not introduce a fetch-then-play stall.
- **AC3 — Continuous playback, no starvation.** A long mix plays edge to edge with no audible gaps,
underruns, or stalls under normal network conditions — the forward fill stays ahead of the playhead.
- **AC4 — Seek-back within the window is instant (UC3).** A short backward seek into retained audio
produces no network request.
- **AC5 — Seek (forward, and back past the window) still works (UC2/UC4).** Both resolve via the
existing Range path with the same behavior the listener sees today; the pre-seek region is evicted, not
retained.
- **AC6 — A mid-stream refill failure degrades cleanly (the Phase 1.6 adjacency).** A failed refill fetch
surfaces a clear user-visible error and leaves the player in a recoverable state (not a wedged
"playing" with a starved scheduler). It must not silently hang.
- **AC7 — The Mix visualizer is unaffected (C7).** With the lava visualizer running on a long mix, the
visualizer renders identically (it reads the preprocessed datum, never the evicted buffers).
- **AC8 — Single-writer decoder concurrency invariant holds (C6) — both decoders.** Under rapid seek +
refill activity, no interleaved `ProcessStreamingChunk` / `push` calls corrupt the active decoder — the
existing drain/cancel discipline still governs every fetch. **For Opus this is stricter:** no stale
`push()` may land against the WebCodecs `AudioDecoder` across a `reinitializeForRangeContinuation`
reset+reconfigure (which would corrupt inter-frame state, not just a buffer). Verify under a rapid
seek-storm on an Opus mix specifically.
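
To make the AC1 expectation concrete, here is a back-of-envelope sketch of the decoded footprint. It assumes decoded audio is held as 32-bit float PCM (the nominal Web Audio `AudioBuffer` representation), 48 kHz stereo, a one-hour mix, and a 120-second window. The mix length and window size are illustrative assumptions, not decided values (OQ1/OQ3), and `decodedBytes` is a hypothetical helper.

```typescript
// Assumption: decoded audio is held as 32-bit float samples.
const BYTES_PER_SAMPLE = 4;

// Decoded size depends only on duration, sample rate, and channel count,
// never on how small the compressed transfer was.
function decodedBytes(durationSec: number, sampleRate: number, channels: number): number {
  return durationSec * sampleRate * channels * BYTES_PER_SAMPLE;
}

const MIB = 1024 * 1024;

// A one-hour stereo mix at 48 kHz, retained whole: 1,382,400,000 bytes.
const wholeMixBytes = decodedBytes(60 * 60, 48000, 2);

// The same stream held to a hypothetical 120 s window: 46,080,000 bytes.
const windowBytes = decodedBytes(120, 48000, 2);

console.log(`whole mix: ${(wholeMixBytes / MIB).toFixed(1)} MiB`);
console.log(`windowed:  ${(windowBytes / MIB).toFixed(1)} MiB`);
```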
---
## 8. Wave decomposition
**Decomposition choice: split by *concern* (eviction → back-pressure → seek-back refill → validate), not
by *path* (WAV-track vs Opus-track).** Rationale: the eviction concern (21.1) is genuinely shared — the
scheduler is the one sink both paths feed — so a path-split would duplicate the hardest correctness work or
arbitrarily assign it to one track. The concern spine keeps that shared work as a single cold-start wave
and lets the *one* genuinely path-divergent concern (back-pressure, 21.2) carry an explicit two-track
split *inside* the wave rather than fracturing the whole phase. This also matches how the seek-back refill
(21.3) reuses each path's already-live seek — it is one concern (window-miss → refetch) with a per-path
resolver underneath, not two features. The spine is unchanged from the original spec; the mechanisms
inside 21.2 and 21.3 are made correct for both paths.
Dependency shape: `21.1 → 21.2 → 21.3`, with `21.4` validating the whole. 21.1 is the cold-start
prerequisite and the load-bearing change; the rest layer on it.
- **21.1 — Partial eviction in `PlaybackScheduler` (cold-start; the load-bearing change; SHARED by both
paths).** Give the scheduler the ability to drop already-played buffers and keep its position/index
bookkeeping correct against a buffer array that no longer begins at absolute time 0 (today
`getCurrentPosition`, `playFromPosition`, and the scheduling loop all assume `buffers[0]` is the track
start). This is the hardest correctness work in the phase — the time-anchor math must stay exact through
eviction. Because both decode paths feed the scheduler identically via `addBuffer`, **eviction is written
once and serves both** — no per-path branch. No refill yet; with eviction alone and the forward producers
unchanged, this is provably memory-bounded for the *played* region on both paths. **Independent of the §6
open questions** — it can begin immediately; the window *sizes* (OQ1/OQ3) are parameters fed in later.
Settled and cold-start.
- **21.2 — Back-pressure (the bound on the *unplayed* region) — two tracks, one signal.** Bound the
not-yet-played decoded audio by stopping production above a high-water mark and resuming below low-water,
driven by the scheduler's decoded-lookahead fill (OQ7). The fill *signal* is shared; the *throttle* has
two sites because Phase 18 gave the two paths different producers:
- **21.2a — C# read-loop back-pressure (serves both paths).** Make `StreamAudioWithEarlyPlayback` stop
    calling `ReadAsync` above high-water and resume below low-water. Route pause/resume through the
existing cancellation-safe single-loop discipline (C6). For the WAV path this is *sufficient* (its
`StreamDecoder` decodes synchronously into the scheduler).
- **21.2b — Opus decode-ahead back-pressure (Opus path only).** Additionally stop demuxing/decoding new
packets when the same fill signal is over high-water, so the WebCodecs decode queue and `decodedQueue`
do not balloon behind a throttled socket (§3.1, OQ7). This is the one mechanism with no WAV analogue.
Confirm the WebCodecs decoder resumes cleanly after being starved of input mid-stream.
Together with 21.1 this bounds *both* the played and unplayed sides on *both* formats — the full memory
guarantee (AC1 + AC1-Opus). **Depends on 21.1** (eviction must exist so the drained region is reclaimed,
not merely un-read). Per OQ6, 21.2a and 21.2b ideally share one window controller exposing the fill
signal; the recommendation is the shared controller + two thin hooks.
- **21.3 — Seek-back-past-window refill (close the random-access case; one concern, per-path resolver).**
Wire UC4 — when a backward seek lands earlier than the retained tail, refetch via the existing
seek-beyond-buffer path pointed at the earlier offset, **using whichever resolver the active path already
ships** (`IFormatDecoder`/`StreamDecoder.calculateByteOffset` for WAV; the live
`resolveOpusByteOffset` + `OpusStreamDecoder.reinitializeForRangeContinuation` for Opus) — plus the
minimal AC6 refill-failure handling. Mostly **reuse** of the landed seek paths; the new work is the
trigger (window-miss detection) and the clean-failure path, both format-agnostic. **Depends on 21.1 +
21.2** (needs the window boundaries they define).
- **21.4 — Validation pass against the 1 GB target, BOTH formats (acceptance).** Exercise AC1 through AC8 against a
real 1 GB+ mix **streamed as WAV and as Opus**: memory profiling (AC1 both formats + AC1-Opus upstream
queues), latency parity (AC2), edge-to-edge playback (AC3), the seek matrix (AC4/AC5), induced refill
failure (AC6), visualizer-running (AC7), and rapid-seek concurrency (AC8 — including the Opus
seek-storm). Largely test/measurement; any break is likely a tuning fix in the 21.1 anchor math, the
  21.2 water-marks, or the 21.2b Opus decode-ahead bound. **Depends on 21.1 through 21.3.**
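
The 21.1 bookkeeping and the 21.3 window-miss trigger can be sketched together. This is an illustration under stated assumptions, not the `PlaybackScheduler` code: `WindowedBufferList`, `DecodedChunk`, `evictBefore`, and `locate` are hypothetical names, and `DecodedChunk` stands in for an `AudioBuffer` (only its duration matters here). The idea shown is a base-time offset that records the absolute track time of the first retained buffer, so position math stays exact once `buffers[0]` no longer sits at time 0. A `null` from `locate` is the window miss that would trigger a Range refetch.

```typescript
// Hypothetical stand-in for an AudioBuffer: only duration matters here.
interface DecodedChunk {
  durationSec: number;
}

class WindowedBufferList {
  private chunks: DecodedChunk[] = [];
  private baseTimeSec = 0; // absolute track time at which chunks[0] begins

  addBuffer(chunk: DecodedChunk): void {
    this.chunks.push(chunk);
  }

  get retainedStartSec(): number {
    return this.baseTimeSec;
  }

  get retainedEndSec(): number {
    return this.chunks.reduce((end, c) => end + c.durationSec, this.baseTimeSec);
  }

  // Drop whole chunks that end at or before (playhead - backRetain),
  // advancing the base time so absolute positions stay exact.
  evictBefore(playheadSec: number, backRetainSec: number): number {
    const cutoff = playheadSec - backRetainSec;
    let dropped = 0;
    for (;;) {
      const head = this.chunks[0];
      if (head === undefined || this.baseTimeSec + head.durationSec > cutoff) break;
      this.baseTimeSec += head.durationSec;
      this.chunks.shift();
      dropped++;
    }
    return dropped;
  }

  // Map an absolute track time to (chunk index, offset within the chunk).
  // Returns null on a window miss: the caller would refetch via Range.
  locate(absoluteSec: number): { index: number; offsetSec: number } | null {
    if (absoluteSec < this.baseTimeSec) return null;
    let start = this.baseTimeSec;
    for (let i = 0; i < this.chunks.length; i++) {
      const chunk = this.chunks[i];
      if (chunk === undefined) break;
      const end = start + chunk.durationSec;
      if (absoluteSec < end) return { index: i, offsetSec: absoluteSec - start };
      start = end;
    }
    return null;
  }
}

// Usage: ten 2-second chunks cover 0..20 s; playhead at 13 s, keep 4 s behind it.
const list = new WindowedBufferList();
for (let i = 0; i < 10; i++) list.addBuffer({ durationSec: 2 });
const dropped = list.evictBefore(13, 4); // evicts chunks ending at or before 9 s
console.log(dropped, list.retainedStartSec, list.retainedEndSec);
console.log(list.locate(13)); // inside the window: served from retained audio
console.log(list.locate(3));  // before the retained tail: window miss
```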
---
## 9. Cross-references (read before implementing)
- Root `CLAUDE.md` "Streaming-first audio playback" / `CONTEXT.md §3.5` — the seam this phase modifies;
the §2 invariants here restate its contract. Both flag it as the most load-bearing path.
- **`COMPLETED.md` Phase 18 — Opus Low-Data Streaming (landed 2026-06-23) — read this first.** The
"as-built divergence" note records why Opus uses a **WebCodecs `AudioDecoder`** streaming pipeline
(`IStreamingDecoder`) rather than the spec'd-and-replaced per-segment `decodeAudioData`/`IFormatDecoder`
model. This is the two-path reality this phase reconciles to. `product-notes/phase-18-opus-low-data-streaming.md`
is the design memo (note: its §3.4 `OpusFormatDecoder` framing predates the WebCodecs divergence — the
*seek-index/sidecar* design in §3.4a is accurate and landed; the *decoder-shape* discussion was superseded
by `IStreamingDecoder`).
- `PLAN.md` Phase 4 (landed) / `COMPLETED.md` — the HTTP Range `bytes=X-` primitive this generalizes
(now serving both `?format=lossless` and `?format=opus`).
- `PLAN.md` Phase 1.3 / 1.4 / 1.5 / 1.6 / 1.7 — the deferred decoder/scheduler-seam features; §5 above
reconciles each (1.5 and 1.7 updated for the Opus path).
- `PLAN.md` Phase 9 — defines the `Mix` medium (single long track), the canonical 1 GB case.
- `PLAN.md` Phase 10 / `product-notes/phase-10-mix-visualizer-lava-reframe.md` /
`product-notes/phase-12-waveform-visualizer-generalization.md` — establishes the preprocessed
per-track high-res waveform datum; the basis for C7 (visualizer does not read live PCM).
- `DeepDrftPublic/Interop/audio/PlaybackScheduler.ts` — owns the unbounded `buffers: AudioBuffer[]`, the
**shared sink for both decode paths**; 21.1 (eviction) lives here.
- `DeepDrftPublic/Interop/audio/AudioPlayer.ts` — the dispatch: `processFormatChunk` (WAV/MP3/FLAC) vs
`processOpusChunk` (Opus), both calling `scheduler.addBuffer`; `seekBeyondBuffer`/`reinitializeFromOffset`
branch per path; the place the refill trigger (21.3) and the fill-signal wiring (21.2) hook.
- `DeepDrftPublic/Interop/audio/StreamDecoder.ts` + `IFormatDecoder.ts` — the WAV/MP3/FLAC refill substrate
(`reinitializeForRangeContinuation`, `calculateByteOffset`).
- `DeepDrftPublic/Interop/audio/IStreamingDecoder.ts` + `OpusStreamDecoder.ts` + `OggDemuxer.ts` +
`OpusSidecar.ts` — the **Opus** path: the WebCodecs decode pipeline, the `decodeQueueSize`/`decodedQueue`
upstream accumulation 21.2b must bound, and the live `resolveOpusByteOffset` /
`reinitializeForRangeContinuation(landingTime, target)` seek 21.3 reuses. **`IStreamingDecoder.ts` is the
seam the Opus windowing hooks into** (push/complete/reinitialize lifecycle).
- `DeepDrftPublic.Client/Services/StreamingAudioPlayerService.cs` — the C# forward read loop
(`StreamAudioWithEarlyPlayback`, feeding *both* decoders), the seek-beyond-buffer path (`SeekBeyondBuffer`),
and the cancellation/drain discipline (C6); 21.2a/21.3 live here.
- `DeepDrftPublic.Client/Clients/TrackMediaClient.cs` — the Range-capable media fetch (with the `?format=`
param) reused by refill on both paths.