# Phase 21 — Windowed Streaming Buffer (bounded client memory for long streams)

Product spec. Status: **LANDED 2026-06-24 on `streaming-overhaul`.** See `COMPLETED.md` for the full
as-built record. Author: product-designer. Date: 2026-06-23 (reconciliation pass after Phase 18 landed).

> **AS-BUILT NOTE — Direction A→B pivot (2026-06-24).** This spec recommended **Direction A** (sliding
> window on one open-ended forward stream, pausing `ReadAsync`/the segment loop to backpressure the
> socket) and held **Direction B** (discrete bounded `Range: bytes=start-end` segments, §3.2) as the
> documented fallback. **21.4 browser validation proved Direction A insufficient for Blazor WASM:** the
> browser `fetch` API buffers the entire HTTP response body regardless of read pace — pausing reads
> bounded the *decode* but not the *network download*, so the whole ~970 MB body accumulated in browser
> memory even with the application decoding only a window of it. **We shipped Direction B.** The forward
> stream now issues sequential 4 MB bounded Range requests (`SegmentSizeBytes = 4 MB`), fetched via
> `RunSegmentedStreamAsync` in `StreamingAudioPlayerService`, each issued only after
> `PlaybackScheduler.evaluateProductionPause()` clears below low-water. The browser holds roughly one 4 MB
> segment of raw bytes; 21.4 confirmed network-memory bounding in Daniel's browser run. The decode-side windowing
> (21.1/21.2) is unchanged and pairs with Direction B; seek/refill converge on the same segmented loop
> via `RecoverFromFailedRefill`. Direction A is recorded as tried-in-validation and found insufficient
> for the WASM `fetch` runtime. Sections §3.2–§3.3 below retain the original A vs. B vs. C analysis as
> the decision record; **Direction B is what shipped.**

Surface: **public listener site only** (`DeepDrftPublic.Client` player stack + `DeepDrftPublic`
TypeScript audio interop). No CMS (`DeepDrftManager`) change. No data-model or schema change. The one
server touch is **reuse, not new surface**: the existing `DeepDrftAPI` HTTP `Range: bytes=X-`
partial-content primitive (Phase 4, landed) is the load-bearing dependency; this phase adds no new API
endpoint.

> **Phase 18 (Opus Low-Data Streaming) has LANDED (2026-06-23, `COMPLETED.md`). This spec is reconciled
> to the as-built reality.** Phase 18 changed the landscape in two ways that reshape this phase:
>
> 1. **There are now TWO decode paths feeding the one `PlaybackScheduler`, not one.** (a) The original
> **WAV/MP3/FLAC** path — `StreamDecoder` → `IFormatDecoder` (wrap-each-segment + `decodeAudioData`).
> (b) A new **Opus** path — `OggDemuxer` → `OpusStreamDecoder` (the `IStreamingDecoder` seam, a stateful
> **WebCodecs `AudioDecoder`** pipeline). The §3.1 unbounded-memory root cause (the scheduler's
> push-only `AudioBuffer[]`) applies to **both** — but the Opus path adds a *second* accumulation locus
> upstream of the scheduler (the WebCodecs decode queue + `decodedQueue: AudioData[]`), so windowing it
> is not the same mechanism as windowing WAV. See §3.1.
> 2. **The accurate index-driven Opus seek the original spec assumed Phase 21 would build is ALREADY
> LIVE.** Phase 18 ships `resolveOpusByteOffset` (binary-search the precomputed seek index in
> `OpusSeekData`) → Range fetch → `OpusStreamDecoder.reinitializeForRangeContinuation(landingTime, target)`
> with frame-accurate lead-trim. Opus seek is **accurate, not approximate** — and **already
> shipping**. Phase 21 does **not** build Opus seek; it **reuses** that live seek for window-miss
> refills.
>
> **Correction of stale spec language.** The original draft described Opus as a future-wired
> `OpusFormatDecoder.calculateByteOffset` joining the `IFormatDecoder` registry, with seek as "approximate
> vs accurate." All of that is now wrong against the landed code: Opus does **not** use `IFormatDecoder`
> (it diverged to the `IStreamingDecoder`/WebCodecs seam precisely because per-segment `decodeAudioData` is
> architecturally wrong for Opus — see `IStreamingDecoder.ts`), and its seek is accurate and shipping. The
> body below is rewritten to the two-path reality. **The headline is unchanged:** bound client memory to a
> sliding window regardless of stream length, for the canonical 1 GB mix, across both delivery formats.

---

## 1. Goal

Bound the **client memory** a playing track consumes to a small, configurable forward window —
**independent of total stream length** — so a 1 GB+ DJ MIX (Phase 9 `Mix` medium: a single long track)
plays without the whole decoded PCM accumulating in the browser.

**The defect, stated precisely — and it now has two faces, one shared.** The network path already
streams in adaptive 16–64 KB chunks (`StreamingAudioPlayerService.StreamAudioWithEarlyPlayback`) — that
part is fine. The accumulation is on the **decode side**, and Phase 18 split the decode side into two
pipelines that both terminate at the same sink:

- **The shared sink (both paths) — the unbounded scheduler.** `PlaybackScheduler` holds
`private buffers: AudioBuffer[]` and **never evicts** ("Supports pause/resume/seek by **retaining all
buffers**" — its own doc comment). Both decode paths call `scheduler.addBuffer()` (via
`AudioPlayer.processFormatChunk` for WAV/MP3/FLAC and `processOpusChunk` for Opus); nothing is ever
removed. Decoded PCM is **larger than the source** in memory (Web Audio `AudioBuffer` is 32-bit float
per sample per channel — a 16-bit stereo WAV roughly **doubles** once decoded; Opus decodes to 48 kHz
float PCM no matter how few bytes the *compressed* stream used). Decoded size depends on duration, not on
transfer size: a 1 GB WAV becomes ~2 GB of retained float, **and a low-data Opus mix of the same length
becomes roughly the same ~2 GB of decoded float once played** — the compressed transfer is small, but the
*decoded* footprint is essentially the same. The scheduler is the OOM for both. **This is the §3.1 root
cause, unchanged from the original spec — it just now afflicts two producers.**
- **The Opus-only second locus — upstream decode-ahead.** The Opus path accumulates *before* the
scheduler too: the WebCodecs `AudioDecoder` work queue (`decodeQueueSize`), the
`decodedQueue: AudioData[]` awaiting conversion, and the `OggDemuxer`'s partial-page state. Bounding the scheduler
alone does not bound these — they fill from the same C# `ReadAsync` loop, so they need their own
back-pressure (on the *demuxer/decoder feed*), not only the read loop's. WAV has no equivalent
upstream queue (its `StreamDecoder` decodes synchronously into the scheduler), so this is genuinely
Opus-specific.
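The size claims above follow from simple arithmetic: decoded size depends on duration, sample rate and channel count, never on how many bytes were transferred. A minimal sketch, assuming a 44.1 kHz, 16-bit stereo WAV source and stereo Opus decoded at 48 kHz (the helper names are hypothetical, not part of the codebase):

```typescript
// Web Audio AudioBuffer stores samples as 32-bit float: 4 bytes per sample per channel.
const FLOAT32_BYTES = 4;

interface PcmFormat {
  sampleRate: number;    // frames per second
  channels: number;      // 2 for stereo
  bitsPerSample: number; // source bit depth, e.g. 16
}

/** Seconds of audio in `bytes` of raw PCM (header ignored for simplicity). */
function pcmDurationSeconds(bytes: number, fmt: PcmFormat): number {
  const byteRate = fmt.sampleRate * fmt.channels * (fmt.bitsPerSample / 8);
  return bytes / byteRate;
}

/** Decoded float32 footprint of `seconds` of audio at the decoder's output rate. */
function decodedFloatBytes(seconds: number, sampleRate: number, channels: number): number {
  return seconds * sampleRate * channels * FLOAT32_BYTES;
}

// Assumed 1 GB lossless mix: 44.1 kHz, 16-bit, stereo.
const wav: PcmFormat = { sampleRate: 44_100, channels: 2, bitsPerSample: 16 };
const seconds = pcmDurationSeconds(1e9, wav);              // ~5669 s, about 94.5 minutes
const wavDecoded = decodedFloatBytes(seconds, 44_100, 2);  // ~2.0e9 bytes: twice the source
const opusDecoded = decodedFloatBytes(seconds, 48_000, 2); // ~2.18e9 bytes, whatever the Opus file size
console.log((wavDecoded / 1e9).toFixed(2), (opusDecoded / 1e9).toFixed(2)); // 2.00 2.18
```

The Opus figure is slightly larger only because WebCodecs emits 48 kHz PCM; the compressed byte count never enters the formula.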

**One-line framing:** today the player decodes the whole track into memory and keeps it — true for both
formats; Phase 21 makes it keep only a sliding forward window and discard what has already played,
refilling on demand from the Range primitive both paths already use for seek (WAV via `IFormatDecoder`,
Opus via the live index-driven `resolveOpusByteOffset`).

---

## 2. Constraints / invariants (the contract that must hold)

These are non-negotiable. The §3.5 streaming seam (root `CLAUDE.md` "Streaming-first audio playback";
`CONTEXT.md §3.5`) is called *the most architecturally load-bearing part of the playback path* by both
docs. This phase **modifies that seam** — so the contract it must preserve is spelled out here.

- **C1 — The seek-beyond-buffer Range path is the substrate, kept intact.** Phase 4 landed HTTP
`Range: bytes={offset}-` → `206 Partial Content` end to end (client `TrackMediaClient` →
`DeepDrftPublic` proxy → `DeepDrftAPI`), and `StreamDecoder.reinitializeForRangeContinuation` retains
the parsed format header on a continuation body (no re-parse). Windowed refill is a **generalization of
this exact path** (§3.1) — it must not require a second, divergent fetch mechanism.
- **C2 — Playback start latency unchanged.** Today playback starts as soon as a configurable minimum
buffer count is queued (header-derived duration, not full-file). The window model must keep first-audio
latency at parity — bounding memory must not reintroduce a fetch-then-play stall.
- **C3 — Neither decoder seam's contract is forked; windowing lives in the shared layer plus a thin
per-seam hook.** There are two decoder seams as of Phase 18: `IFormatDecoder` (WAV/MP3/FLAC, owns
format byte math; `AudioPlayer.createFormatDecoder` dispatches on `Content-Type`) and `IStreamingDecoder`
(Opus, the WebCodecs pipeline; selected in `initializeStreaming` when the content type is
`audio/ogg`/`audio/opus` and a sidecar is present). **The eviction half of windowing is fully shared** —
it lives in `PlaybackScheduler`, which both seams feed identically via `addBuffer`, so eviction adds
**zero** format branches. **The back-pressure / decode-ahead half is necessarily seam-aware** — the WAV
path back-pressures the C# `ReadAsync` loop; the Opus path must additionally bound the WebCodecs
decode-ahead and the `decodedQueue` (§3.1). Express that as a **small uniform signal** ("the scheduler is
full, stop producing") that each decode path honors in its own way, rather than a windowing controller
that reaches into either decoder's internals. The goal the original C3 stated still holds — no
format-specific logic leaking into the *scheduler* — but the spec now acknowledges the producer side has
two shapes, not one.
- **C4 — Read-only playback only.** This is a memory-management change, not a UX change. No new
user-visible control, no change to seek/transport semantics beyond what the listener already
experiences. Seek must still feel identical.
- **C5 — Window both decode paths without forking the scheduler/seam, reusing the live index-driven
seek for refill.** Both delivery formats must be windowed, and the byte↔time mapping each refill needs is
**already accurate and already shipping** for both:
  - **WAV/MP3/FLAC** — `IFormatDecoder.calculateByteOffset` (CBR `byteRate` for WAV; the MP3/FLAC seek
    accelerators for those), reached through `StreamDecoder.calculateByteOffset` / `AudioPlayer.seekBeyondBuffer`.
  - **Opus** — `resolveOpusByteOffset(activeOpusSidecar, t)` (binary search the precomputed granule→byte
    seek index in `OpusSeekData`), returning an exact page-start offset **and** a `landingTimeSeconds` for
    the decoder's frame-accurate lead-trim. This is **accurate, not approximate, and landed in Phase 18.**

  Phase 21 does **not** build either mapping. The window's refill trigger calls *whichever resolver the
  active path already uses* — for Opus, the **same** `resolveOpusByteOffset` an explicit listener seek
  calls (the live path in `AudioPlayer.seekBeyondBuffer`), so windowed refill is literally "a seek the
  listener didn't initiate." A window opening away from byte 0 decodes correctly on the Opus path because
  the setup header (`OpusHead`/`OpusTags`) is already cached from the sidecar and re-applied by
  `reinitializeForRangeContinuation` (Phase 18 §3.4a B); the WAV path re-applies its retained header the
  same way. **No new offset math, no approximation, no header re-fetch — all reused.** The invariant is
  therefore *not* "make refill format-agnostic" (the two paths legitimately resolve offsets through
  different code); it is **"reuse the live seek of each path verbatim; add only the eviction and the
  refill *trigger*, never a second seek mechanism."**
- **C6 — No regression to the single-writer decoder concurrency guarantee — now covering both decoders.**
The C# loop is careful that only one streaming task feeds the active JS decoder at a time
(`DrainActiveStreamingTaskAsync`, the `_streamingCancellation` identity dance in
`StreamingAudioPlayerService`). This matters *more* for Opus: the WebCodecs `AudioDecoder` is stateful
and async — a `reset()`+`configure()` on a range-continuation (`reinitializeForRangeContinuation`) racing
a still-draining `push()` from a stale loop would corrupt inter-frame state, not merely deliver a wrong
buffer. Windowed refill introduces *more* mid-stream fetches against whichever decoder is active; every
one must route through the **same** drain/cancellation discipline, not around it. The discipline is
already decoder-agnostic at the C# layer (it cancels the loop, not the decoder), so this is a "keep using
it" invariant — but it is the rule most likely to be violated by a naive Opus refill, and is the hardest
failure to diagnose, so it is called out as a hard invariant for both paths.
- **C7 — The Mix visualizer's data source is independent and must stay that way.** The Phase 10/12
WebGL2 lava visualizer renders from a **preprocessed high-res waveform datum** fetched per-track
(`GET api/track/{entryKey}/waveform/high-res`), **not** from live decoded PCM. Confirmed: evicting
played `AudioBuffer`s cannot starve the visualizer — it never read them. The window model is invisible
to the visualizer. (This is the canonical 1 GB case *and* the case that proves the eviction is safe.)
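C3's "small uniform signal" can be pictured as a hysteresis latch over the scheduler's decoded lookahead: production pauses when lookahead reaches a high-water mark and resumes only after it falls below a lower low-water mark, so producers do not flap around a single threshold. A minimal sketch, assuming lookahead is measured in seconds; `ProductionGate` and the 30 s / 10 s marks are illustrative, not the shipped `PlaybackScheduler.evaluateProductionPause()`:

```typescript
/**
 * Hysteresis latch for the shared "scheduler is full, stop producing" signal.
 * Hypothetical sketch; thresholds are seconds of decoded audio ahead of the playhead.
 */
class ProductionGate {
  private paused = false;
  private readonly highWaterSec: number;
  private readonly lowWaterSec: number;

  constructor(highWaterSec: number, lowWaterSec: number) {
    if (lowWaterSec >= highWaterSec) {
      throw new RangeError("lowWaterSec must be below highWaterSec");
    }
    this.highWaterSec = highWaterSec; // pause once lookahead reaches this
    this.lowWaterSec = lowWaterSec;   // resume only after lookahead drops below this
  }

  /**
   * @param bufferedEndSec absolute time at which decoded audio currently ends
   * @param playheadSec    current playback position P
   * @returns true while producers should stay paused
   */
  evaluate(bufferedEndSec: number, playheadSec: number): boolean {
    const aheadSec = Math.max(0, bufferedEndSec - playheadSec);
    if (!this.paused && aheadSec >= this.highWaterSec) {
      this.paused = true;
    } else if (this.paused && aheadSec < this.lowWaterSec) {
      this.paused = false;
    }
    return this.paused;
  }
}

// Example: 30 s high-water, 10 s low-water.
const gate = new ProductionGate(30, 10);
console.log(gate.evaluate(25, 0));  // false: 25 s ahead, still filling
console.log(gate.evaluate(31, 0));  // true: high-water reached, pause
console.log(gate.evaluate(31, 10)); // true: 21 s ahead, inside the band, stay paused
console.log(gate.evaluate(31, 22)); // false: 9 s ahead, below low-water, resume
```

Each producer then honors the same boolean in its own way: the WAV path stops reading (in the shipped Direction B, it withholds the next segment request), and the Opus path additionally stops feeding its demuxer and decoder.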

---

## 3. Architectural shape

### 3.0 The mental model

A track's audio is a byte range `[0, fileLength)` on disk. At any moment the listener is at playback
position `P` (seconds → byte offset via the active path's resolver — `IFormatDecoder.calculateByteOffset`
for WAV/MP3/FLAC, `resolveOpusByteOffset` over the seek index for Opus). The player should hold decoded
`AudioBuffer`s only for a bounded window roughly `[P - back, P + ahead]` — and, on the Opus path, keep the
upstream WebCodecs decode queue near-empty too (§3.1):

- **forward fill (`ahead`)** — enough decoded lookahead that playback never starves (covers the existing
500 ms scheduler lookahead plus network jitter headroom);
- **back-retain (`back`)** — a small amount of *already-played* audio kept so a short seek-back does not
trigger a network refetch;
- **evict** — anything older than `P - back` is dropped (`AudioBuffer` references released → GC reclaims
the float data);
- **refill** — when forward decoded lookahead drops below a low-water mark, fetch+decode more from the
current byte position; when the window's tail is evicted and the listener seeks back past it, refetch
that region via the Range primitive (the seek-beyond-buffer path, run *backwards*).

This is a **ring/sliding-window buffer keyed on playback position**, driven by high/low-water marks —
the standard bounded-producer/bounded-consumer pattern, transplanted onto the decode→schedule seam.
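The evict and seek-back rules above reduce to a few comparisons against `P`. A minimal sketch, assuming each retained decoded buffer is tracked by absolute start/end time and that retained buffers are contiguous; `TimedBuffer`, `evictCount` and `classifySeek` are hypothetical names, not `PlaybackScheduler` members:

```typescript
/** A decoded buffer tracked by absolute track time in seconds (hypothetical shape). */
interface TimedBuffer {
  startSec: number; // time of the first frame
  endSec: number;   // time just past the last frame
}

/** How many leading buffers end at or before P - back and can be released. */
function evictCount(buffers: readonly TimedBuffer[], playheadSec: number, backSec: number): number {
  const cutoff = playheadSec - backSec;
  let n = 0;
  // Buffers are held in playback order, so the evictable ones always form a prefix.
  while (n < buffers.length && buffers[n].endSec <= cutoff) n++;
  return n;
}

type SeekPlan = "in-window" | "refetch";

/**
 * A seek inside the retained span replays decoded buffers (UC3); anything outside it,
 * behind the evicted tail (UC4) or past the decoded head (UC2), goes through the Range path.
 */
function classifySeek(retained: readonly TimedBuffer[], targetSec: number): SeekPlan {
  if (retained.length === 0) return "refetch";
  const first = retained[0].startSec;
  const last = retained[retained.length - 1].endSec;
  return targetSec >= first && targetSec < last ? "in-window" : "refetch";
}

// Example: twelve 5 s buffers covering 0-60 s, playhead at 40 s, back-retain 10 s.
const buffers: TimedBuffer[] = Array.from({ length: 12 }, (_, i) => ({ startSec: i * 5, endSec: (i + 1) * 5 }));
const drop = evictCount(buffers, 40, 10); // 6: everything ending at or before 30 s
const retained = buffers.slice(drop);
console.log(classifySeek(retained, 35)); // in-window
console.log(classifySeek(retained, 10)); // refetch
```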

### 3.1 Why refill is a generalization of seek-beyond-buffer, not a new mechanism — for both paths

The seek-beyond-buffer path already does **every refill primitive** the window needs, just triggered
manually and one-shot. As of Phase 18 each primitive has a WAV branch and an Opus branch, both live:

| Window operation | WAV/MP3/FLAC machinery reused | Opus machinery reused (Phase 18, landed) |
|---|---|---|
| Discard buffers, keep offset | `PlaybackScheduler.clearForSeek()` + `setPlaybackOffset()` | *same* — the scheduler is shared |
| Fetch from a byte offset | `TrackMediaClient` → `Range: bytes=X-` → 206 | *same* (with `?format=opus`) — the Range path is shared |
| Map time → byte offset | `StreamDecoder.calculateByteOffset()` → `IFormatDecoder` | `resolveOpusByteOffset(activeOpusSidecar, t)` (index binary search → exact page) |
| Decode a header-less body | `StreamDecoder.reinitializeForRangeContinuation(len)` | `OpusStreamDecoder.reinitializeForRangeContinuation(landingTime, target)` (demux/codec reset + lead-trim) |
| Single-loop safety on refetch | `_streamingCancellation` swap + `DrainActiveStreamingTaskAsync()` | *same* — the C# discipline is decoder-agnostic |
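Both entries in the "Map time → byte offset" row are small. A minimal sketch under stated assumptions: the WAV function assumes CBR PCM with a known data offset and block alignment, and the Opus function assumes the seek index is an array of `{ timeSeconds, byteOffset }` page entries sorted by time. Neither reproduces the shipped `calculateByteOffset` or `resolveOpusByteOffset`, whose signatures and sidecar layout are not shown here:

```typescript
/** CBR WAV layout as parsed from the header. Field names are illustrative. */
interface WavLayout {
  dataOffset: number; // byte offset of the first PCM frame (just past the "data" chunk header)
  byteRate: number;   // sampleRate * blockAlign
  blockAlign: number; // channels * bytesPerSample: one frame across all channels
}

/** WAV: time -> byte offset, snapped down to a frame boundary so channels stay aligned. */
function wavByteOffset(layout: WavLayout, timeSec: number): number {
  const rawBytes = Math.floor(timeSec * layout.byteRate);
  return layout.dataOffset + rawBytes - (rawBytes % layout.blockAlign);
}

/** One seek-index entry: an Ogg page start and the presentation time it begins at. */
interface SeekEntry {
  timeSeconds: number;
  byteOffset: number;
}

/**
 * Opus: binary search for the last indexed page starting at or before timeSec.
 * The caller fetches from byteOffset and the decoder trims (timeSec - landingTimeSeconds)
 * of leading audio. Falls back to the first entry if timeSec precedes it.
 */
function opusSeekTarget(
  index: readonly SeekEntry[],
  timeSec: number,
): { byteOffset: number; landingTimeSeconds: number } {
  if (index.length === 0) throw new Error("empty seek index");
  let lo = 0;
  let hi = index.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1; // round up so lo = mid always makes progress
    if (index[mid].timeSeconds <= timeSec) lo = mid;
    else hi = mid - 1;
  }
  return { byteOffset: index[lo].byteOffset, landingTimeSeconds: index[lo].timeSeconds };
}

// Examples
const cdWav: WavLayout = { dataOffset: 44, byteRate: 176_400, blockAlign: 4 };
console.log(wavByteOffset(cdWav, 1.5)); // 264644

const seekIndex: SeekEntry[] = [
  { timeSeconds: 0, byteOffset: 0 },
  { timeSeconds: 10, byteOffset: 50_000 },
  { timeSeconds: 20, byteOffset: 98_000 },
];
console.log(opusSeekTarget(seekIndex, 25)); // { byteOffset: 98000, landingTimeSeconds: 20 }
```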

The genuinely-new work, by path:

- **Shared (both paths):** *partial eviction* on `PlaybackScheduler` (today it only ever `clear()`s
wholesale), and a *position-driven refill trigger* (a continuous low-water loop, not a one-shot seek).
- **WAV path:** *back-pressure on the C# `ReadAsync` loop* — stop reading the socket above the high-water
mark, resume below low-water. WAV's `StreamDecoder` decodes synchronously into the scheduler, so the
read loop is the *only* producer to throttle; pausing `ReadAsync` bounds it fully.
- **Opus path:** *the same C# back-pressure, plus bounding the WebCodecs decode-ahead.* Throttling
`ReadAsync` alone is **not sufficient** for Opus, because `OpusStreamDecoder.push()` is async and the
WebCodecs `AudioDecoder` keeps its own internal work queue (`decodeQueueSize`) plus a
`decodedQueue: AudioData[]` of decoded-but-not-yet-converted frames. The Opus producer must also stop *feeding the
decoder* (stop demuxing/decoding new packets) when the scheduler is full, and resume below low-water —
back-pressure on the **demuxer/decoder feed**, not only on the socket read. This is the one place the
two paths' windowing genuinely diverges.
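The Opus-only bound can be expressed as a single predicate checked before each demuxed packet is handed to the decoder. WebCodecs exposes the decoder's pending work as `AudioDecoder.decodeQueueSize` and fires a `dequeue` event when that count drops; the sketch below takes the counts as plain numbers so it runs outside a browser. `canFeedDecoder`, its state shape and the limits are hypothetical, not `OpusStreamDecoder` internals:

```typescript
/** Snapshot of the Opus pipeline's backlog (hypothetical shape). */
interface OpusFeedState {
  decodeQueueSize: number;    // AudioDecoder.decodeQueueSize: packets submitted, not yet decoded
  decodedQueueLength: number; // AudioData frames decoded but not yet converted to AudioBuffers
  schedulerPaused: boolean;   // the shared "scheduler is full" signal
}

interface OpusFeedLimits {
  maxDecodeQueue: number;  // cap on packets in flight inside WebCodecs
  maxDecodedQueue: number; // cap on unconverted AudioData frames
}

/** Feed the next demuxed packet only if no stage of the Opus pipeline is backed up. */
function canFeedDecoder(state: OpusFeedState, limits: OpusFeedLimits): boolean {
  if (state.schedulerPaused) return false;                              // shared back-pressure
  if (state.decodeQueueSize >= limits.maxDecodeQueue) return false;     // WebCodecs work queue full
  if (state.decodedQueueLength >= limits.maxDecodedQueue) return false; // conversion backlog full
  return true;
}

// Example limits (illustrative values only).
const limits: OpusFeedLimits = { maxDecodeQueue: 4, maxDecodedQueue: 8 };
console.log(canFeedDecoder({ decodeQueueSize: 3, decodedQueueLength: 2, schedulerPaused: false }, limits)); // true
console.log(canFeedDecoder({ decodeQueueSize: 4, decodedQueueLength: 2, schedulerPaused: false }, limits)); // false
```

In a browser the producer would re-check this predicate from the decoder's `dequeue` event and its output callback rather than by polling.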

Everything else — the fetch, the offset resolution, the header-carry continuation, the single-loop
cancellation safety — is **reused verbatim** on both paths. Phase 21 builds eviction, the refill trigger,
and the (per-path) back-pressure; it builds **no** new fetch, offset, or seek mechanism.

### 3.2 The three candidate directions

Per file convention the alternatives are recorded; the recommendation follows.

**Direction A — Sliding window on the existing single forward stream (recommended).**
Keep the current model where the C# loop reads one forward HTTP stream and pumps chunks into the active
JS decoder. Add three things: (1) `PlaybackScheduler` gains *partial eviction* — drop buffers whose
absolute-time end is older than `P - back`, adjusting its index bookkeeping so `getCurrentPosition()`
and scheduling stay correct against a buffer array that no longer starts at index 0 (**shared by both
paths** — the scheduler is the common sink); (2) *back-pressure on the C# read loop* — when forward
decoded lookahead exceeds the high-water mark, the C# loop **pauses reading** the HTTP stream (stops
calling `ReadAsync`) until playback drains it below low-water, then resumes; (3) **for the Opus path
only, back-pressure on the WebCodecs decode-ahead** — the producer also stops demuxing/decoding new
packets when the scheduler is full, so the `AudioDecoder` work queue and `decodedQueue` do not balloon
behind a throttled socket. Memory is bounded by high-water + back-retain on both paths. Seek-back beyond
the retained window falls through to the **existing** seek-beyond-buffer path (the right one per format)
unchanged.

*Why recommended:* smallest change to the load-bearing seam; reuses the live forward stream (no extra
connections in the common case); eviction and back-pressure are the only genuinely new mechanisms, all
local (the scheduler; the read loop; for Opus, the demux/decode feed). Back-pressure via "stop reading
the socket" is exactly how TCP flow control already wants to behave — pausing `ReadAsync` lets the kernel
window close; we are not fighting the transport. The Opus decode-ahead bound is the one addition Phase 18
forces, and it is local to the Opus producer.

*Open question it raises (OQ6, new):* whether the two paths' back-pressure is driven by **one shared
window controller** that exposes a "scheduler full / drained" signal both producers poll, or by **two
parallel implementations** sharing only the eviction code. Recommend the **shared signal** — see §6 OQ6.
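The delicate part of Direction A's item (1) is the index bookkeeping: anything that refers to a buffer by position must keep working once the head of the array is dropped. One common approach, sketched here under the assumption that callers address buffers by an absolute sequence number, keeps a base offset beside the array; `WindowedBufferStore` is a hypothetical name, not `PlaybackScheduler`'s API:

```typescript
/**
 * Buffer list with head eviction whose absolute indices survive eviction.
 * Hypothetical sketch; T would be AudioBuffer in the browser.
 */
class WindowedBufferStore<T> {
  private items: T[] = [];
  private base = 0; // absolute index of items[0]

  /** Append a buffer and return its absolute index. */
  push(item: T): number {
    this.items.push(item);
    return this.base + this.items.length - 1;
  }

  /** Release every buffer whose absolute index is below absIndex; returns the count released. */
  evictBefore(absIndex: number): number {
    const n = Math.min(Math.max(0, absIndex - this.base), this.items.length);
    this.items.splice(0, n); // dropped references become eligible for garbage collection
    this.base += n;
    return n;
  }

  /** Look up by absolute index; undefined if already evicted or not yet produced. */
  get(absIndex: number): T | undefined {
    const i = absIndex - this.base;
    return i >= 0 && i < this.items.length ? this.items[i] : undefined;
  }

  /** Absolute index of the oldest retained buffer. */
  get firstIndex(): number {
    return this.base;
  }

  /** One past the absolute index of the newest retained buffer. */
  get endIndex(): number {
    return this.base + this.items.length;
  }
}

// Example
const store = new WindowedBufferStore<string>();
for (const b of ["b0", "b1", "b2", "b3"]) store.push(b);
console.log(store.evictBefore(2)); // 2: b0 and b1 released
console.log(store.get(3));         // b3: its absolute index is unchanged
console.log(store.get(1));         // undefined: evicted, so a seek here must refetch
```

`splice` shifts the remaining elements, which is cheap for a window of a few hundred buffers; a fixed-capacity ring buffer would avoid the shift if the window grew large.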

**Direction B — Discrete window segments, each its own Range fetch.**
Treat the file as fixed-size byte segments (e.g. 4 MB). Hold N decoded segments around `P`; fetch the
next/previous segment via a fresh Range request as the window slides; discard the far segment. No live
long-lived forward stream — every window is an independent 206.

*Why not (default):* turns one connection into many short Range requests (more proxy hops through
`DeepDrftPublic`, more server-side `WavOffsetService`-style header synthesis, more places a fetch can
fail mid-stream — worsening the error surface of the deferred 1.6 item, see §5), and the byte↔time segment
math must be exact at every boundary. It *is* the cleaner model for true random access (and the better base if seeking-heavy
usage dominates), so keep it as the fallback if Direction A's back-pressure proves leaky in practice.

Borrowed prior art: HLS/DASH segment windows and the MSE `SourceBuffer.remove()` eviction model — this
is how production HTML5 adaptive players typically bound memory. We are doing the hand-rolled equivalent
because the stack is a bespoke Web Audio graph, not an `<audio>`/`<video>` media element + MSE.
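Direction B's "exact at every boundary" requirement mostly comes down to HTTP byte ranges being inclusive at both ends (RFC 9110), so a segment `[start, start + size)` is requested as `bytes=start-(start + size - 1)`, with the final segment clamped to the file length. A minimal sketch; `segmentRange` and `segmentIndexFor` are hypothetical helpers, and reading "4 MB" as 4 MiB is an assumption about `SegmentSizeBytes`, not taken from the shipped `RunSegmentedStreamAsync`:

```typescript
// Assumption: "4 MB" means 4 MiB; the shipped SegmentSizeBytes value may differ.
const SEGMENT_SIZE_BYTES = 4 * 1024 * 1024;

interface SegmentRange {
  start: number;  // first byte, inclusive
  end: number;    // last byte, inclusive
  header: string; // value for the HTTP Range request header
}

/** Range for segment `index` of a `fileLength`-byte file, or null once past end of file. */
function segmentRange(index: number, fileLength: number, segmentSize = SEGMENT_SIZE_BYTES): SegmentRange | null {
  const start = index * segmentSize;
  if (start >= fileLength) return null;
  const end = Math.min(start + segmentSize, fileLength) - 1; // HTTP byte ranges are inclusive
  return { start, end, header: `bytes=${start}-${end}` };
}

/** Segment containing a byte offset, e.g. one produced by a time -> byte resolver. */
function segmentIndexFor(byteOffset: number, segmentSize = SEGMENT_SIZE_BYTES): number {
  return Math.floor(byteOffset / segmentSize);
}

// Example: a 10,000,000-byte file splits into two full segments and a short tail.
console.log(segmentRange(0, 10_000_000)?.header); // bytes=0-4194303
console.log(segmentRange(2, 10_000_000)?.header); // bytes=8388608-9999999
console.log(segmentRange(3, 10_000_000));         // null
```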

**Direction C — Adopt MediaSource Extensions (MSE) and let the browser manage the buffer.**
Stop hand-rolling the decode→schedule graph for long tracks; feed the Range stream into a `SourceBuffer`
and let the browser evict via its built-in quota + `remove()`. Memory management becomes the platform's
problem.

*Why not — RESOLVED, rejected (Daniel, 2026-06-23; see OQ5):* MSE does not accept raw WAV/PCM — it
wants containerized formats (fragmented MP4/WebM, or MP3/AAC elementary streams). The entire bespoke
visualizer/spectrum graph is wired to the Web Audio `AudioContext`, not to an `<audio>`/`<video>` media
element. Adopting MSE is a **rewrite of the playback substrate**, not a windowing change. It *looked* like the real
long-term answer once compressed delivery arrived — but compressed delivery (**Phase 18 Opus, now
landed**) feeds the **same bespoke graph** via the WebCodecs `IStreamingDecoder` seam (parallel to the
WAV `IFormatDecoder` seam, both terminating at the shared `PlaybackScheduler`), so the compressed-delivery
move that would have justified MSE happened *without* surrendering the graph. Notably, Phase 18 chose a
**WebCodecs `AudioDecoder`** for Opus rather than `decodeAudioData` — which is itself the "use the platform
codec, keep the bespoke graph" move, but at the *decoder* granularity, not the *media-element* granularity
MSE would impose. **The bespoke graph is a deliberate long-term commitment; MSE is rejected.** Direction A
is therefore the permanent destination, not a stopgap that MSE will retire. Recorded as
considered-and-declined.

### 3.3 Recommended direction: A, with B held as the documented fallback

Direction A is the smallest coherent change that hits the headline (bounded memory under a 1 GB stream)
while honoring C1–C7. It keeps the live forward stream, reuses each path's seek-beyond-buffer machinery
for the only genuinely random-access case (seek-back past the retained tail), and isolates the new
mechanisms (eviction shared; back-pressure per path). **The final architecture and the exact
eviction/back-pressure API are the staff engineer's call at implementation** (per file convention); this spec
fixes the *shape* and the invariants, not the method signatures.

### 3.4 SOLID / road-not-taken rationale

- **SRP, preserved.** Eviction is a `PlaybackScheduler` concern (it already owns buffer storage, and is
the single shared sink both decode paths feed); refill orchestration is a player-service concern (it
already owns the C# fetch loop and the seek dispatch); byte↔time math stays where each path already keeps
it — `IFormatDecoder.calculateByteOffset` for WAV/MP3/FLAC, `resolveOpusByteOffset` (over `OpusSeekData`)
for Opus. No responsibility crosses a boundary it does not already own.
- **OCP, via the shared sink + the live per-path seek.** Eviction added at the scheduler changes zero
decoder code on either path. Refill reuses each path's *already-implemented* offset resolver — Phase 21
adds no offset math to either seam. The one place windowing is not purely additive is the Opus
decode-ahead bound (§3.1), which lives inside the Opus producer, not in the shared layer.
- **The seam stays single-writer (C6) — for both decoders.** Every new refetch routes through the existing
C# cancellation/drain discipline, so "only one loop feeds the active decoder" remains true for the WAV
`StreamDecoder` and the stateful Opus `AudioDecoder` alike. This is the rule most likely to be violated
by a naive Opus refill (a stale `push()` racing a `reset()`+`configure()`), and is called out as a hard
invariant.
- **Road not taken — eager full decode with a memory cap that just stops decoding.** Tempting (decode
until you hit a byte budget, then stop) but it breaks playback of long tracks past the cap entirely —
it bounds memory by *refusing to play the rest*, not by sliding. Rejected: it is a degradation, not a
feature.
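C6 is enforced in C# by cancelling and draining the previous loop. A complementary JS-side guard, shown purely as an illustration (the spec neither requires nor describes it), is a generation counter: each refill or seek bumps the generation before resetting the decoder, and any chunk stamped with an older generation is dropped instead of reaching the freshly configured decoder. `DecoderFeed` and `ChunkSink` are hypothetical:

```typescript
/** Minimal decoder surface for the sketch; not the real OpusStreamDecoder API. */
interface ChunkSink {
  reset(): void; // e.g. reset() + configure() for a range continuation
  push(chunk: Uint8Array): void;
}

/** Drops chunks from any feed loop older than the most recent begin(). */
class DecoderFeed {
  private generation = 0;
  private readonly sink: ChunkSink;

  constructor(sink: ChunkSink) {
    this.sink = sink;
  }

  /** Start a new feed (refill or seek): invalidate earlier loops, then reset the decoder. */
  begin(): number {
    this.generation++;
    this.sink.reset();
    return this.generation;
  }

  /** Forward a chunk only if its loop is still current; returns whether it was accepted. */
  push(generation: number, chunk: Uint8Array): boolean {
    if (generation !== this.generation) return false; // stale loop: never touch decoder state
    this.sink.push(chunk);
    return true;
  }
}

// Example: a refill starts while the previous loop still has a chunk in hand.
const log: string[] = [];
const feed = new DecoderFeed({
  reset: () => { log.push("reset"); },
  push: (c) => { log.push(`push ${c.length}`); },
});
const first = feed.begin();
feed.push(first, new Uint8Array(8));  // accepted
const second = feed.begin();          // refill: `first` is now stale
feed.push(first, new Uint8Array(8));  // rejected instead of racing the reset
feed.push(second, new Uint8Array(4)); // accepted
console.log(log); // [ 'reset', 'push 8', 'reset', 'push 4' ]
```

This does not replace the C# drain: it only guarantees that a straggling `push()` from a cancelled loop is ignored rather than applied to the new decoder state.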

---

## 4. Use cases

- **UC1 — Play a 1 GB+ DJ MIX start to finish (the headline).** Memory stays bounded throughout; the
listener experiences continuous playback identical to a short track. **Holds in both formats** — the
lossless WAV mix (~2 GB decoded if unbounded) and the low-data Opus mix (small transfer, but the *same*
~2 GB decoded float once played, so it needs windowing just as much; see §1).
- **UC1-Opus — The same mix streamed as Opus, windowed.** The low-data win (Phase 18) shrinks the
*transfer*; Phase 21 shrinks the *decoded footprint*. The two compound: a metered-connection listener on
Opus gets both the small download and the bounded memory. Windowing the Opus path additionally bounds the
WebCodecs decode-ahead and `decodedQueue`, not only the scheduler (§3.1).
- **UC2 — Seek forward within a long track.** Already handled by seek-beyond-buffer (the right resolver per
format — `IFormatDecoder` for WAV, the live `resolveOpusByteOffset` for Opus); under windowing the
forward seek clears the window and refills at the target — no behavior change, now with eviction so the
pre-seek region does not linger.
- **UC3 — Seek back a few seconds.** Served from the back-retain window with **no** network refetch
(the reason `back` exists).
- **UC4 — Seek back far, past the evicted tail.** Falls through to the existing seek-beyond-buffer Range
fetch, run toward an earlier offset. (Open question OQ2 — see §6.)
- **UC5 — Pause a long track for a long time.** Memory stays at the bounded window size while paused (no
continued decode). On resume, forward fill restarts from the low-water trigger.
- **UC6 — Mix detail page with the lava visualizer running.** Visualizer reads its preprocessed datum
(C7); windowing is invisible to it. Confirmed non-interaction.

---

## 5. Interaction with the deferred Phase 1 streaming features

This phase touches the **same decoder/scheduler seam** as the deferred Phase 1.3/1.4/1.5 items and the
1.6/1.7 robustness items. The interactions, explicitly:

- **1.3 Preload / prefetch (deferred; preload half).** *Shares machinery, does not conflict — and should
be sequenced after.* Preload stages the **next track** into a second decoder instance during the
current track's tail; windowing bounds the **current track's** forward buffer. They are orthogonal
axes (next-track vs. current-track-window), but they compound the memory question: a naive preload of a
second 1 GB mix would reintroduce the OOM this phase fixes. **Recommendation: land windowing first**,
so that when preload arrives, the staged next-track decoder is *also* windowed by construction (it
inherits the bounded scheduler). Windowing makes preload *safe for long tracks*; without it, preload of
mixes is a memory hazard.
- **1.4 Crossfade (deferred).** Needs two simultaneous `PlaybackScheduler` instances briefly overlapping.
Both would be windowed instances — the overlap doubles the *window* size momentarily, not the whole
track. Windowing makes crossfade between two long mixes affordable. No reordering needed; 1.4 still
gates on 1.3.
- **1.5 Gapless (deferred).** Sample-accurate hand-off of the next track's first buffer at the current
track's last buffer. Windowing changes *which* buffers are retained but not the hand-off mechanism;
the only care point is that the current track's **final** window must not be evicted before the gapless
boundary is scheduled. A minor invariant for whoever builds 1.5, not a blocker. **Phase 18 note:** the
former "1.5 is WAV-only" caveat is superseded — Opus is live, and it has its own encoder pre-skip/priming
(handled once by the WebCodecs decoder, see `OpusStreamDecoder.ts`), so a gapless Opus hand-off must
respect the end-trim against the sidecar's authoritative total length. That is 1.5's problem to absorb,
not Phase 21's; flagged so 1.5 inherits it.
- **1.6 Track-skip on error (deferred).** *Windowing enlarges the error surface — call this out.* Today
a fetch failure happens at load (one fetch) or at a user seek (one fetch). Windowed refill issues
**mid-stream** fetches the listener did not initiate; one of those can fail at byte 700 M of a 1 GB
mix. So Phase 21 should ship with at least the *cheap* half of 1.6: a mid-stream refill failure must
**surface a clear error and not wedge the player** (it must not leave playback "running" with a starved
scheduler — mirror the `playFromPosition` end-of-buffer recovery already in `PlaybackScheduler`). The
rich half (byte-scan to next valid frame) stays deferred. **Recommendation: fold the minimal refill-failure
handling into Phase 21's acceptance criteria** (AC6) rather than leaving it entirely to 1.6 —
it is created by this phase.
- **1.7 Safari compatibility (deferred).** Windowing adds no new Safari-specific surface beyond what the
streaming path already has. Two adjacencies, both Phase-18-introduced: (a) more frequent `AudioContext`
activity during refill should be checked against older-Safari `webkitAudioContext` quirks; (b) the Opus
path depends on **WebCodecs `AudioDecoder`**, whose Safari availability is narrower than `decodeAudioData`
Ogg-Opus support — Phase 18's capability gate already falls a non-WebCodecs browser back to the lossless
WAV path, so a Safari that can't run the Opus pipeline windows the *WAV* path (which has no decode-ahead
locus, only the scheduler), i.e. the simpler windowing case. Note it; do not block on it.
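The "cheap half" of 1.6 that the 1.6 item above folds into AC6 amounts to: retry a failed mid-stream refill a bounded number of times, then report an explicit error state rather than leaving the transport running over a starved scheduler. A minimal sketch; the attempt count, the injected `fetchSegment` callback and the state names are assumptions, not the behavior of the shipped `RecoverFromFailedRefill`:

```typescript
type RefillState = "playing" | "error";

interface RefillResult {
  state: RefillState;
  bytes?: Uint8Array;
  error?: string;
}

/**
 * Try a mid-stream refill up to maxAttempts times. On final failure return "error" so the
 * caller stops the transport and surfaces a message instead of running on an empty scheduler.
 */
async function refillWithRecovery(
  fetchSegment: () => Promise<Uint8Array>,
  maxAttempts = 3,
): Promise<RefillResult> {
  let lastError = "unknown error";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const bytes = await fetchSegment();
      return { state: "playing", bytes };
    } catch (e) {
      lastError = e instanceof Error ? e.message : String(e);
    }
  }
  return { state: "error", error: `refill failed after ${maxAttempts} attempts: ${lastError}` };
}

// Example: a flaky fetch that succeeds on the third try.
let calls = 0;
const flaky = async (): Promise<Uint8Array> => {
  calls++;
  if (calls < 3) throw new Error("HTTP 503");
  return new Uint8Array(16);
};
void refillWithRecovery(flaky).then((r) => console.log(r.state, calls)); // playing 3
```

A production version would likely add backoff between attempts and distinguish retryable failures (timeouts, 5xx) from permanent ones (for example a 416 for an unsatisfiable range).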
|
||
|
||
---
|
||
|
||
## 6. Open questions for Daniel (genuine product decisions, not implementation detail)
|
||
|
||
These are policy calls with user-visible or resource trade-offs — flagged rather than decided here.
|
||
|
||
- **OQ1 — Window size policy.** What bounds the window — a **fixed byte/time budget** (e.g. "hold at
most ~30 s decoded ahead + ~10 s behind"), or a **configurable memory budget** (e.g. "≤ N MB of
decoded PCM") that derives the time window from the decoded byte rate (sample rate × channels × 4 bytes
for Float32 PCM)? Recommend a **time-based forward window + small time-based back-retain** as the primary
knob (intuitive, format-portable), with a hard **memory ceiling** as a secondary guard. The exact numbers
are tunable post-landing; Daniel picks the *policy axis*. `[Daniel decision]`
- **OQ2 — Seek-back past the evicted window.** When the listener seeks back earlier than the retained
tail, we must refetch (the audio is gone). Acceptable to take the same brief re-buffer the forward
seek-beyond-buffer takes today? (Recommend yes — it is the symmetric case and listeners already accept
it forward.) Or should back-retain be generous enough that this is rare? `[Daniel decision]`
- **OQ3 — Configurable total in-flight memory cap.** Should there be a single hard byte ceiling on total
decoded audio held by the player (a safety net independent of the window-size policy), exposed as a
config value? Recommend **yes, as a guard rail** even if the window policy is time-based — it is the
backstop that makes "1 GB stream never OOMs" a guarantee rather than a tuning hope. `[Daniel
decision]`
- **OQ4 — Apply windowing to all tracks, or only long ones?** A 3-minute Cut decoded whole is ~30–60 MB
— harmless today. Windowing everything is simpler (one code path) but adds refill machinery to short
tracks that never needed it. Recommend **window everything** (one path, C6-safe, and short tracks
simply never hit a refill because they fit inside the forward window) — but Daniel may prefer a
size threshold. `[Daniel decision]`
- **OQ5 — Is MSE (Direction C) the real destination? — RESOLVED: NO (Daniel, 2026-06-23).** **Do not
adopt MSE. The bespoke Web Audio decode→schedule graph stays — it is bespoke by deliberate choice, a
long-term commitment, not a stopgap.** Daniel's rationale: the player is intentionally a custom
graph, not an HTML `<media>` element; the compressed-delivery move that *would* have made MSE
tempting was met instead by **Phase 18 (Opus low-data path, now landed)** feeding the **same bespoke
graph** through the WebCodecs `IStreamingDecoder` seam (parallel to the WAV `IFormatDecoder` seam) — so
compressed delivery arrived *without* surrendering the graph. Consequence for this phase: Direction A
(the hand-rolled sliding window) is the destination, not a placeholder; invest in it as permanent
machinery. It windows both the WAV and the Opus path (the header note). Direction C is recorded as
**considered and declined** per file convention; kept visible so a future reader sees the road not taken
and why. `[RESOLVED — bespoke graph retained; MSE rejected]`
- **OQ6 — One window controller for both decode paths, or two? (NEW — raised by the Phase 18 two-path
reality.)** Eviction is unambiguously shared (the scheduler is the one sink). Back-pressure is not: the
WAV path throttles the C# `ReadAsync` loop; the Opus path must *also* throttle the WebCodecs
decode-ahead (§3.1). Should there be **one window controller** exposing a uniform "scheduler full /
drained" signal that both producers honor in their own way? That is the recommended option: it keeps the
*policy* (window sizes, water-marks, OQ1/OQ3) in one place, with two thin per-path back-pressure hooks.
The alternative is **two parallel windowing implementations** sharing only the eviction code (simpler
per-path, but duplicates the water-mark logic and risks the two drifting). Recommend the **shared
controller + per-path hook**. This is more an architecture call than a product call — flagged for
staff-engineer at implementation, with the recommendation as the default. `[staff-engineer call;
recommendation: shared controller]`
- **OQ7 — How does the Opus WebCodecs decode-ahead bound interact with scheduler eviction? (NEW; technical,
for staff-engineer.)** The Opus producer has two queues to bound (the `AudioDecoder` work queue and
`decodedQueue: AudioData[]`) *plus* the shared scheduler. The clean rule is "stop feeding the decoder when
decoded-lookahead-in-the-scheduler exceeds high-water" — i.e. the **scheduler's** fill level is the
single back-pressure signal, and the upstream Opus queues are kept near-empty by simply not demuxing
ahead. The alternative (let the decoder run ahead into `decodedQueue` and bound *that* separately) adds a
second budget to tune and a second eviction point. Recommend the former: **one fill signal (scheduler
decoded-lookahead), drive both the read-loop pause and the demux/decode pause from it.** Confirm at
implementation that the WebCodecs decoder tolerates being starved of input mid-stream and resumes cleanly
(it should — it is fed packet-by-packet via `decode()`), and that `decodedQueue` is drained promptly so
it never holds more than one `push()`'s worth of output. `[staff-engineer call; recommendation: single
scheduler-fill signal]`
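OQ1, OQ3, OQ6 and OQ7 together recommend one design. There is a single scheduler-fill signal with hysteresis between high-water and low-water, a hard memory ceiling as backstop, and two thin per-path hooks. That design can be sketched as a small controller. This TypeScript sketch is hypothetical: `WindowController`, `ProducerHook`, `decodedBytesPerSecond`, and the policy field names are assumptions, and real threshold values remain OQ1's open decision. The gap between the two water-marks keeps production from toggling on every buffer. The ceiling pauses production regardless of lookahead. For scale, `decodedBytesPerSecond(48000, 2)` is 384,000 bytes, so a 30 s forward window holds about 11.5 MB of decoded PCM.

```typescript
// Hypothetical sketch of a shared window controller (OQ1/OQ3/OQ6/OQ7).
// Names and thresholds are illustrative assumptions, not the shipped code.
interface WindowPolicy {
  highWaterSeconds: number;   // pause production at or above this decoded lookahead
  lowWaterSeconds: number;    // resume production at or below this decoded lookahead
  memoryCeilingBytes: number; // hard backstop on total decoded audio held (OQ3)
}

// One thin hook per producer: the C# read loop (via interop) and the Opus demux/decode feed.
interface ProducerHook {
  pause(): void;
  resume(): void;
}

// Web Audio AudioBuffers store 32-bit float samples: 4 bytes per sample per channel.
function decodedBytesPerSecond(sampleRate: number, channels: number): number {
  return sampleRate * channels * 4;
}

class WindowController {
  private readonly policy: WindowPolicy;
  private readonly hooks: ProducerHook[] = [];
  private paused = false;

  constructor(policy: WindowPolicy) {
    if (policy.lowWaterSeconds >= policy.highWaterSeconds) {
      throw new Error("low-water must sit below high-water to avoid pause/resume thrashing");
    }
    this.policy = policy;
  }

  register(hook: ProducerHook): void {
    this.hooks.push(hook);
  }

  get isPaused(): boolean {
    return this.paused;
  }

  // The single fill signal: call whenever the scheduler adds, plays out, or evicts audio.
  update(lookaheadSeconds: number, heldBytes: number): void {
    const overCeiling = heldBytes >= this.policy.memoryCeilingBytes;
    if (!this.paused && (lookaheadSeconds >= this.policy.highWaterSeconds || overCeiling)) {
      this.paused = true;
      for (const hook of this.hooks) hook.pause();
    } else if (this.paused && lookaheadSeconds <= this.policy.lowWaterSeconds && !overCeiling) {
      this.paused = false;
      for (const hook of this.hooks) hook.resume();
    }
  }
}
```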

---

## 7. Acceptance criteria

- **AC1 (headline) — Bounded memory under a 1 GB stream, in BOTH formats.** When a 1 GB+ mix plays start
to finish, **as lossless WAV and as low-data Opus**, the browser tab's retained decoded-audio memory
stays bounded to the configured window (not growing toward ~2 GB). Verifiable via browser memory tooling:
peak decoded-audio footprint is independent of track length and tracks the window-size policy, not the
file size. The Opus case must be verified explicitly — its small *transfer* does not imply a small
*decoded* footprint (§1), so "Opus already streams small" is **not** sufficient.
- **AC1-Opus — The Opus upstream decode-ahead is bounded too (§3.1 / OQ7).** Under a long Opus stream, the
WebCodecs decode queue and `decodedQueue` do not grow unboundedly behind the scheduler — back-pressure
reaches the demux/decode feed, not only the scheduler. Verifiable: the upstream queues stay near-empty
(at most one `push()`'s worth) regardless of stream length.
- **AC2 — Playback-start latency at parity (C2).** First-audio latency for a track is unchanged from
pre-windowing (within noise). Windowing does not introduce a fetch-then-play stall.
- **AC3 — Continuous playback, no starvation.** A long mix plays edge to edge with no audible gaps,
underruns, or stalls under normal network conditions — the forward fill stays ahead of the playhead.
- **AC4 — Seek-back within the window is instant (UC3).** A short backward seek into retained audio
produces no network request.
- **AC5 — Seek (forward, and back past the window) still works (UC2/UC4).** Both resolve via the
existing Range path with the same behavior the listener sees today; the pre-seek region is evicted, not
retained.
- **AC6 — A mid-stream refill failure degrades cleanly (the 1.6 adjacency).** A failed refill fetch
surfaces a clear user-visible error and leaves the player in a recoverable state (not wedged in
"playing" with a starved scheduler). It must not silently hang.
- **AC7 — The Mix visualizer is unaffected (C7).** With the lava visualizer running on a long mix, the
visualizer renders identically (it reads the preprocessed datum, never the evicted buffers).
- **AC8 — Single-writer decoder concurrency invariant holds (C6) — both decoders.** Under rapid seek +
refill activity, no interleaved `ProcessStreamingChunk` / `push` calls corrupt the active decoder — the
existing drain/cancel discipline still governs every fetch. **For Opus this is stricter:** no stale
`push()` may land against the WebCodecs `AudioDecoder` across a `reinitializeForRangeContinuation`
reset+reconfigure (which would corrupt inter-frame state, not just a buffer). Verify under a rapid
seek-storm on an Opus mix specifically.
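AC4 and AC5 both hinge on one decision. If the seek target lies inside the retained window, playback serves it locally with no request. If it lies outside, playback refetches through the Range path. Below is a hypothetical sketch for the uncompressed WAV case. `planSeek`, `RetainedWindow`, and `WavLayout` are illustrative names. The sketch assumes PCM WAV, where the byte offset is the `data` chunk start plus whole sample frames. It stands in for the WAV resolver (`calculateByteOffset`), whose real implementation may differ. The Opus path would instead resolve through `resolveOpusByteOffset` and the sidecar index. The window is treated as half-open, so a target exactly at the buffered end counts as a miss.

```typescript
// Hypothetical sketch: classify a seek (AC4/AC5) and build the Range header on a miss.
// Assumes uncompressed PCM WAV; names are illustrative, not the project's API.
interface RetainedWindow {
  startSeconds: number; // earliest audio still held after eviction
  endSeconds: number;   // end of decoded audio currently buffered (exclusive)
}

interface WavLayout {
  dataOffsetBytes: number; // start of the "data" chunk payload, read from the header
  sampleRate: number;
  blockAlign: number;      // bytes per sample frame = channels * bytes per sample
}

type SeekPlan =
  | { kind: "local"; targetSeconds: number }
  | { kind: "refetch"; targetSeconds: number; rangeHeader: string };

function planSeek(targetSeconds: number, retained: RetainedWindow, wav: WavLayout): SeekPlan {
  if (targetSeconds >= retained.startSeconds && targetSeconds < retained.endSeconds) {
    // AC4: the audio is still retained, so no network request is made.
    return { kind: "local", targetSeconds };
  }
  // AC5: outside the window (evicted or not yet fetched). Snap to a whole sample
  // frame so the continuation never starts mid-frame, then request bytes from there on.
  const frame = Math.floor(Math.max(0, targetSeconds) * wav.sampleRate);
  const byteOffset = wav.dataOffsetBytes + frame * wav.blockAlign;
  return {
    kind: "refetch",
    targetSeconds: frame / wav.sampleRate,
    rangeHeader: `bytes=${byteOffset}-`,
  };
}
```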

---

## 8. Wave decomposition

**Decomposition choice: split by *concern* (eviction → back-pressure → seek-back refill → validate), not
by *path* (WAV-track vs Opus-track).** Rationale: the eviction concern (21.1) is genuinely shared — the
scheduler is the one sink both paths feed — so a path-split would duplicate the hardest correctness work or
arbitrarily assign it to one track. The concern spine keeps that shared work as a single cold-start wave
and lets the *one* genuinely path-divergent concern (back-pressure, 21.2) carry an explicit two-track
split *inside* the wave rather than fracturing the whole phase. This also matches how the seek-back refill
(21.3) reuses each path's already-live seek — it is one concern (window-miss → refetch) with a per-path
resolver underneath, not two features. The spine is unchanged from the original spec; the mechanisms
inside 21.2 and 21.3 are made correct for both paths.

Dependency shape: `21.1 → 21.2 → 21.3`, with `21.4` validating the whole. 21.1 is the cold-start
prerequisite and the load-bearing change; the rest layer on it.

- **21.1 — Partial eviction in `PlaybackScheduler` (cold-start; the load-bearing change; SHARED by both
paths).** Give the scheduler the ability to drop already-played buffers and keep its position/index
bookkeeping correct against a buffer array that no longer begins at absolute time 0 (today
`getCurrentPosition`, `playFromPosition`, and the scheduling loop all assume `buffers[0]` is the track
start). This is the hardest correctness work in the phase — the time-anchor math must stay exact through
eviction. Because both decode paths feed the scheduler identically via `addBuffer`, **eviction is written
once and serves both** — no per-path branch. No refill yet; with eviction alone and the forward producers
unchanged, this is provably memory-bounded for the *played* region on both paths. **Independent of the §6
open questions** — it can begin immediately; the window *sizes* (OQ1/OQ3) are parameters fed in later.
Settled and cold-start.
- **21.2 — Back-pressure (the bound on the *unplayed* region) — two tracks, one signal.** Bound the
not-yet-played decoded audio by stopping production above a high-water mark and resuming below low-water,
driven by the scheduler's decoded-lookahead fill (OQ7). The fill *signal* is shared; the *throttle* has
two sites because Phase 18 gave the two paths different producers:
  - **21.2a — C# read-loop back-pressure (serves both paths).** Make `StreamAudioWithEarlyPlayback` stop
calling `ReadAsync` above high-water and resume below low-water. Routes resume/pause through the
existing cancellation-safe single-loop discipline (C6). For the WAV path this is *sufficient* (its
`StreamDecoder` decodes synchronously into the scheduler).
  - **21.2b — Opus decode-ahead back-pressure (Opus path only).** Additionally stop demuxing/decoding new
packets when the same fill signal is over high-water, so the WebCodecs decode queue and `decodedQueue`
do not balloon behind a throttled socket (§3.1, OQ7). This is the one mechanism with no WAV analogue.
Confirm the WebCodecs decoder resumes cleanly after being starved of input mid-stream.
Together with 21.1 this bounds *both* the played and unplayed sides on *both* formats — the full memory
guarantee (AC1 + AC1-Opus). **Depends on 21.1** (eviction must exist so the drained region is reclaimed,
not merely un-read). Per OQ6, 21.2a and 21.2b ideally share one window controller exposing the fill
signal; the recommendation is the shared controller + two thin hooks.
- **21.3 — Seek-back-past-window refill (close the random-access case; one concern, per-path resolver).**
Wire UC4 — when a backward seek lands earlier than the retained tail, refetch via the existing
seek-beyond-buffer path pointed at the earlier offset, **using whichever resolver the active path already
ships** (`IFormatDecoder`/`StreamDecoder.calculateByteOffset` for WAV; the live
`resolveOpusByteOffset` + `OpusStreamDecoder.reinitializeForRangeContinuation` for Opus) — plus the
minimal AC6 refill-failure handling. Mostly **reuse** of the landed seek paths; the new work is the
trigger (window-miss detection) and the clean-failure path, both format-agnostic. **Depends on 21.1 +
21.2** (needs the window boundaries they define).
- **21.4 — Validation pass against the 1 GB target, BOTH formats (acceptance).** Exercise AC1–AC8 against a
real 1 GB+ mix **streamed as WAV and as Opus**: memory profiling (AC1 both formats + AC1-Opus upstream
queues), latency parity (AC2), edge-to-edge playback (AC3), the seek matrix (AC4/AC5), induced refill
failure (AC6), visualizer-running (AC7), and rapid-seek concurrency (AC8 — including the Opus
seek-storm). Largely test/measurement; any break is likely a tuning fix in the 21.1 anchor math, the
21.2 water-marks, or the 21.2b Opus decode-ahead bound. **Depends on 21.1–21.3.**
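The anchor math that 21.1 calls for can stay exact if time is tracked in integer sample frames. The scheduler also needs to remember the absolute frame at which `buffers[0]` now begins. A hypothetical sketch follows. `EvictingTimeline`, `TimedChunk`, and `backRetainSeconds` are illustrative names. The real `PlaybackScheduler` holds `AudioBuffer`s and Web Audio start times rather than plain frame counts. Counting frames avoids the drift that summing floating-point seconds can build up across many evictions on a long mix. Only whole chunks that ended before the back-retain edge are dropped. A lookup into evicted time returns `null`, which is the window-miss that 21.3 turns into a refetch.

```typescript
// Hypothetical sketch of 21.1: evict played chunks while keeping absolute-time math exact.
// Names are illustrative; the real scheduler stores AudioBuffers, not plain records.
interface TimedChunk {
  frames: number;   // sample frames in this chunk
  byteSize: number; // decoded bytes held for this chunk
}

class EvictingTimeline {
  private readonly sampleRate: number;
  private readonly chunks: TimedChunk[] = [];
  // Absolute frame where chunks[0] begins; 0 until something is evicted.
  private baseFrame = 0;

  constructor(sampleRate: number) {
    this.sampleRate = sampleRate;
  }

  add(chunk: TimedChunk): void {
    this.chunks.push(chunk);
  }

  get retainedStartSeconds(): number {
    return this.baseFrame / this.sampleRate;
  }

  get heldBytes(): number {
    return this.chunks.reduce((sum, c) => sum + c.byteSize, 0);
  }

  // Map absolute track time to a chunk index and offset, or null if it was evicted
  // or has not been buffered yet (the refill case).
  locate(absSeconds: number): { index: number; offsetSeconds: number } | null {
    const target = Math.floor(absSeconds * this.sampleRate);
    let start = this.baseFrame;
    for (let i = 0; i < this.chunks.length; i++) {
      const end = start + this.chunks[i].frames;
      if (target >= start && target < end) {
        return { index: i, offsetSeconds: (target - start) / this.sampleRate };
      }
      start = end;
    }
    return null;
  }

  // Drop whole chunks that end at or before (playhead - backRetain); returns how many.
  evictBefore(playheadSeconds: number, backRetainSeconds: number): number {
    const keepFrom = Math.floor((playheadSeconds - backRetainSeconds) * this.sampleRate);
    let evicted = 0;
    while (this.chunks.length > 0 && this.baseFrame + this.chunks[0].frames <= keepFrom) {
      this.baseFrame += this.chunks[0].frames;
      this.chunks.shift();
      evicted++;
    }
    return evicted;
  }
}
```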

---

## 9. Cross-references (read before implementing)

- Root `CLAUDE.md` "Streaming-first audio playback" / `CONTEXT.md §3.5` — the seam this phase modifies;
the §2 invariants here restate its contract. Both flag it as the most load-bearing path.
- **`COMPLETED.md` Phase 18 — Opus Low-Data Streaming (landed 2026-06-23) — read this first.** The
"as-built divergence" note records why Opus uses a **WebCodecs `AudioDecoder`** streaming pipeline
(`IStreamingDecoder`) rather than the spec'd-and-replaced per-segment `decodeAudioData`/`IFormatDecoder`
model. This is the two-path reality this phase reconciles to. `product-notes/phase-18-opus-low-data-streaming.md`
is the design memo (note: its §3.4 `OpusFormatDecoder` framing predates the WebCodecs divergence — the
*seek-index/sidecar* design in §3.4a is accurate and landed; the *decoder-shape* discussion was superseded
by `IStreamingDecoder`).
- `PLAN.md` Phase 4 (landed) / `COMPLETED.md` — the HTTP Range `bytes=X-` primitive this generalizes
(now serving both `?format=lossless` and `?format=opus`).
- `PLAN.md` Phase 1.3 / 1.4 / 1.5 / 1.6 / 1.7 — the deferred decoder/scheduler-seam features; §5 above
reconciles each (1.5 and 1.7 updated for the Opus path).
- `PLAN.md` Phase 9 — defines the `Mix` medium (single long track), the canonical 1 GB case.
- `PLAN.md` Phase 10 / `product-notes/phase-10-mix-visualizer-lava-reframe.md` /
`product-notes/phase-12-waveform-visualizer-generalization.md` — establishes the preprocessed
per-track high-res waveform datum; the basis for C7 (visualizer does not read live PCM).
- `DeepDrftPublic/Interop/audio/PlaybackScheduler.ts` — owns the unbounded `buffers: AudioBuffer[]`, the
**shared sink for both decode paths**; 21.1 (eviction) lives here.
- `DeepDrftPublic/Interop/audio/AudioPlayer.ts` — the dispatch: `processFormatChunk` (WAV/MP3/FLAC) vs
`processOpusChunk` (Opus), both calling `scheduler.addBuffer`; `seekBeyondBuffer`/`reinitializeFromOffset`
branch per path; this is where the refill trigger (21.3) and the fill-signal wiring (21.2) hook in.
- `DeepDrftPublic/Interop/audio/StreamDecoder.ts` + `IFormatDecoder.ts` — the WAV/MP3/FLAC refill substrate
(`reinitializeForRangeContinuation`, `calculateByteOffset`).
- `DeepDrftPublic/Interop/audio/IStreamingDecoder.ts` + `OpusStreamDecoder.ts` + `OggDemuxer.ts` +
`OpusSidecar.ts` — the **Opus** path: the WebCodecs decode pipeline, the `decodeQueueSize`/`decodedQueue`
upstream accumulation 21.2b must bound, and the live `resolveOpusByteOffset` /
`reinitializeForRangeContinuation(landingTime, target)` seek 21.3 reuses. **`IStreamingDecoder.ts` is the
seam the Opus windowing hooks into** (push/complete/reinitialize lifecycle).
- `DeepDrftPublic.Client/Services/StreamingAudioPlayerService.cs` — the C# forward read loop
(`StreamAudioWithEarlyPlayback`, feeding *both* decoders), the seek-beyond-buffer path (`SeekBeyondBuffer`),
and the cancellation/drain discipline (C6); 21.2a/21.3 live here.
- `DeepDrftPublic.Client/Clients/TrackMediaClient.cs` — the Range-capable media fetch (with the `?format=`
param) reused by refill on both paths.