docs: spec Phase 21 — windowed streaming buffer for bounded client memory
@@ -443,6 +443,85 @@ not the same work; this phase does not satisfy or depend on that one.
---

## Phase 21 — Windowed Streaming Buffer (bounded client memory for long streams)

Bound the **client memory** a playing track consumes to a small, configurable forward window —
**independent of total stream length** — so a 1 GB+ DJ MIX (Phase 9 `Mix` medium: a single long track)
plays without the whole decoded PCM accumulating in the browser. **Public listener site only**
(`DeepDrftPublic.Client` player stack + `DeepDrftPublic` TypeScript audio interop); no CMS, no API
endpoint, no schema change.

The network path already streams in adaptive 16–64 KB chunks. The accumulation is on the **decode
side**: `PlaybackScheduler` holds an `AudioBuffer[]` it **never evicts** ("Supports pause/resume/seek by
retaining all buffers" — its own doc comment). Decoded PCM is larger than the source (Web Audio is
32-bit float per sample/channel — a 16-bit stereo WAV roughly doubles once decoded), so a 1 GB WAV
becomes ~2 GB of retained float data. That is the OOM. The fix: hold only a sliding forward window plus a
small back-retain, discard already-played buffers, and refill on demand.

**Architectural spine — a sliding window keyed on playback position, built as a generalization of the
landed seek-beyond-buffer path.** The Phase 4 HTTP `Range: bytes=X-` → 206 path already provides every
plumbing primitive the window needs (discard-buffers-keep-offset via `clearForSeek`/`setPlaybackOffset`;
fetch-from-offset via `TrackMediaClient`; decode-header-less-body via
`StreamDecoder.reinitializeForRangeContinuation`; time→byte via `IFormatDecoder.calculateByteOffset`),
just triggered manually and one-shot. The only genuinely new mechanisms are **partial eviction** on the
scheduler and **back-pressure** on the forward read loop (stop calling `ReadAsync` above a high-water
mark, resume below low-water). Recommended **Direction A** (sliding window on the existing single forward
stream); **Direction B** (discrete Range-fetched segments — the HLS/DASH/MSE-eviction analogue) held as
the documented fallback; **Direction C** (adopt MSE and let the browser manage the buffer) flagged as the
real long-term answer but out of scope — it is a playback-substrate rewrite entangled with non-WAV
formats (Phase 1.2), surfaced as OQ5.

**Invariants that must hold (the §3.5 seam contract).** Reuse the Range path, don't fork it;
playback-start latency at parity; the `IFormatDecoder` abstraction untouched (windowing is
format-agnostic, so wiring MP3/FLAC later inherits it free); read-only playback (no new control); the
single-instance JS decoder stays single-writer (every refill routes through the existing
cancellation/drain discipline). The **Mix visualizer is provably unaffected** — it renders from the
preprocessed per-track high-res datum (Phase 10/12), never from live decoded PCM, so evicting played
buffers cannot starve it. The 1 GB mix is both the canonical case *and* the proof the eviction is safe.

**Interaction with deferred Phase 1 features (same seam):** windowing should land **before** preload
(1.3) — it makes preload of long tracks memory-safe by construction (a staged next-track decoder inherits
the bounded scheduler); it makes crossfade (1.4) between two long mixes affordable (the overlap doubles
the *window*, not the track); it adds a minor "don't evict the final window before the gapless boundary"
care point for 1.5. It **enlarges the error surface** (1.6): windowed refill issues mid-stream fetches
the listener didn't initiate, one of which can fail deep into a 1 GB mix — so the *cheap* half of 1.6
(clean refill-failure handling, no wedged player) is folded into this phase's acceptance criteria, not
left fully to 1.6.

Full design, the three directions with SOLID/road-not-taken rationale, use cases, acceptance criteria,
the open-question set, and the wave decomposition: `product-notes/phase-21-windowed-streaming-buffer.md`.

Sequenced as four waves: `21.1 → 21.2 → 21.3`, with `21.4` validating the whole. **21.1 is the cold-start
prerequisite and the load-bearing change** — independent of the open questions (window *sizes* are
parameters fed in later).

- **21.1 — Partial eviction in `PlaybackScheduler` (cold-start; load-bearing).** Drop already-played
buffers while keeping the position/index/time-anchor bookkeeping exact against a buffer array that no
longer begins at absolute time 0 (today `getCurrentPosition`/`playFromPosition`/the schedule loop all
assume `buffers[0]` is the track start). The hardest correctness work in the phase. No refill yet.
**Independent of the open questions — can begin immediately.**
- **21.2 — Back-pressure on the forward read loop.** Stop `ReadAsync` above the high-water mark, resume
below low-water; together with 21.1 this bounds *both* the played and unplayed regions (the AC1
guarantee). Routes resume/pause through the existing single-loop cancellation discipline. **Depends on
21.1.**
- **21.3 — Seek-back-past-window refill.** When a backward seek lands earlier than the retained tail,
refetch via the existing seek-beyond-buffer Range path pointed at the earlier offset; plus the minimal
clean refill-failure handling (the 1.6 adjacency). Mostly reuse of the landed seek path. **Depends on
21.1 + 21.2.**
- **21.4 — Validation against the 1 GB target (acceptance).** Memory profiling (bounded memory under a
1 GB stream is the headline), latency parity, edge-to-edge playback, the seek matrix, induced refill
failure, visualizer-running, rapid-seek concurrency. Largely measurement; breaks are tuning fixes in
21.1's anchor math or 21.2's water-marks. **Depends on 21.1–21.3.**

**Dependency shape:** `21.1 → 21.2 → 21.3 → 21.4`; 21.1 is the only cold-start wave. **Open questions for
Daniel (spec §6):** window-size policy axis (time-based window + memory guard — recommended);
seek-back-past-window re-buffer acceptable (recommend yes, symmetric to forward); a hard total in-flight
memory cap as a guard rail (recommend yes); window everything vs. only long tracks (recommend
everything — one path, short tracks never hit a refill); and whether MSE is the real destination (steer
informing scope, not a blocker). None block 21.1.

---

---

## Working with this file

@@ -0,0 +1,349 @@
# Phase 21 — Windowed Streaming Buffer (bounded client memory for long streams)

Product spec. Status: **design / framing — implementation-ready pending Daniel's open-question calls.**
Author: product-designer. Date: 2026-06-23. **No code has been written by this doc.**
Surface: **public listener site only** (`DeepDrftPublic.Client` player stack + `DeepDrftPublic`
TypeScript audio interop). No CMS (`DeepDrftManager`) change. No data-model or schema change. The one
server touch is **reuse, not new surface**: the existing `DeepDrftAPI` HTTP `Range: bytes=X-`
partial-content primitive (Phase 4, landed) is the load-bearing dependency; this phase adds no new API
endpoint.

---

## 1. Goal

Bound the **client memory** a playing track consumes to a small, configurable forward window —
**independent of total stream length** — so a 1 GB+ DJ MIX (Phase 9 `Mix` medium: a single long track)
plays without the whole decoded PCM accumulating in the browser.

**The defect, stated precisely.** The network path already streams in adaptive 16–64 KB chunks
(`StreamingAudioPlayerService.StreamAudioWithEarlyPlayback`) — that part is fine. The accumulation is on
the **decode side**: `PlaybackScheduler` holds `private buffers: AudioBuffer[]` and **never evicts**
("Supports pause/resume/seek by **retaining all buffers**" — its own doc comment). Every 64 KB segment
the `StreamDecoder` decodes is pushed via `addBuffer()` and kept for the life of the track. Decoded PCM
is **larger than the compressed-or-raw source** in memory (Web Audio `AudioBuffer` is 32-bit float per
sample per channel — a 16-bit stereo WAV roughly **doubles** in size once decoded), so a 1 GB WAV becomes
~2 GB of retained `AudioBuffer` float data. That is the OOM.

**One-line framing:** today the player decodes the whole track into memory and keeps it; Phase 21 makes
it keep only a sliding forward window and discard what has already played, refilling on demand from the
Range primitive it already uses for seek.
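The "roughly doubles" figure in the defect statement follows from sample width alone: decoding keeps the sample count and widens each sample to 32 bits. A minimal sketch of that arithmetic, assuming uncompressed integer PCM and ignoring header and per-`AudioBuffer` overhead (`estimateDecodedBytes` is a hypothetical helper, not part of the codebase):

```typescript
// Hypothetical helper: estimates the decoded Web Audio footprint of integer PCM audio.
// Assumes uncompressed PCM decoded to 32-bit float samples, with no container overhead.
const FLOAT32_BITS = 32;

function estimateDecodedBytes(sourcePcmBytes: number, sourceBitsPerSample: number): number {
  if (sourceBitsPerSample <= 0) {
    throw new RangeError("sourceBitsPerSample must be positive");
  }
  // Decoding keeps the number of samples; only the width of each sample changes.
  return sourcePcmBytes * (FLOAT32_BITS / sourceBitsPerSample);
}
```

Under these assumptions a 1 GB 16-bit source maps to about 2 GB decoded, while a 24-bit source would grow by about a third.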

---

## 2. Constraints / invariants (the contract that must hold)

These are non-negotiable. The §3.5 streaming seam (root `CLAUDE.md` "Streaming-first audio playback";
`CONTEXT.md §3.5`) is called *the most architecturally load-bearing part of the playback path* by both
docs. This phase **modifies that seam** — so the contract it must preserve is spelled out here.

- **C1 — The seek-beyond-buffer Range path is the substrate, kept intact.** Phase 4 landed HTTP
`Range: bytes={offset}-` → `206 Partial Content` end to end (client `TrackMediaClient` →
`DeepDrftPublic` proxy → `DeepDrftAPI`), and `StreamDecoder.reinitializeForRangeContinuation` retains
the parsed format header on a continuation body (no re-parse). Windowed refill is a **generalization of
this exact path** (§3.1) — it must not require a second, divergent fetch mechanism.
- **C2 — Playback start latency unchanged.** Today playback starts as soon as a configurable minimum
buffer count is queued (header-derived duration, not full-file). The window model must keep first-audio
latency at parity — bounding memory must not reintroduce a fetch-then-play stall.
- **C3 — The format-decoder abstraction is untouched.** `IFormatDecoder` (WAV active; MP3/FLAC
implemented, not yet wired) owns all format-specific byte math. Windowing lives in the
**format-agnostic** layer (`PlaybackScheduler` eviction + `StreamDecoder`/player refill
orchestration); it must add **no** format-specific branches. A future wired MP3/FLAC decoder inherits
windowing for free.
- **C4 — Read-only playback only.** This is a memory-management change, not a UX change. No new
user-visible control, no change to seek/transport semantics beyond what the listener already
experiences. Seek must still feel identical.
- **C5 — WAV-only is the shipping target; the design must not foreclose MP3/FLAC.** Byte↔time mapping
for refill is exact and cheap for WAV (CBR: `byteRate` from the header). For VBR formats the mapping is
approximate (the decoders already carry TOC/SEEKTABLE seek math). The window machinery must express
refill in terms of the decoder's existing `calculateByteOffset`, so the same code works when those
formats are wired — **no WAV-special-cased offset math in the window layer.**
- **C6 — No regression to the single-instance JS decoder concurrency guarantees.** The current code is
careful that only one streaming loop touches the single JS `StreamDecoder` at a time
(`DrainActiveStreamingTaskAsync`, the `_streamingCancellation` identity dance). Windowed refill
introduces *more* mid-stream fetches; it must route through the **same** drain/cancellation discipline,
not around it.
- **C7 — The Mix visualizer's data source is independent and must stay that way.** The Phase 10/12
WebGL2 lava visualizer renders from a **preprocessed high-res waveform datum** fetched per-track
(`GET api/track/{entryKey}/waveform/high-res`), **not** from live decoded PCM. Confirmed: evicting
played `AudioBuffer`s cannot starve the visualizer — it never read them. The window model is invisible
to the visualizer. (This is the canonical 1 GB case *and* the case that proves the eviction is safe.)
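To make C5 concrete, here is a sketch of the kind of CBR math that would sit *behind* `calculateByteOffset` for WAV. The `CbrHeaderInfo` shape and the function name are assumptions for illustration; the real `IFormatDecoder` may expose these fields differently. The point of C5 is that this arithmetic stays inside the decoder and never appears in the window layer.

```typescript
// Hypothetical header fields a CBR (PCM WAV) time-to-byte mapping needs.
interface CbrHeaderInfo {
  byteRate: number;        // audio bytes per second (sampleRate * channels * bytesPerSample)
  blockAlign: number;      // bytes per sample frame (channels * bytesPerSample)
  dataStartOffset: number; // file offset of the first audio sample
  dataLength: number;      // length of the audio data in bytes
}

function cbrByteOffsetForTime(seconds: number, header: CbrHeaderInfo): number {
  const clampedSeconds = Math.max(0, seconds);
  const rawOffset = Math.floor(clampedSeconds * header.byteRate);
  // Snap down to a frame boundary so a Range fetch never starts mid-sample.
  const alignedOffset = rawOffset - (rawOffset % header.blockAlign);
  // Never point past the end of the audio data.
  const boundedOffset = Math.min(alignedOffset, header.dataLength);
  return header.dataStartOffset + boundedOffset;
}
```

A VBR decoder would replace the body with its TOC/SEEKTABLE lookup while keeping the same call shape, which is why the window layer can stay format-agnostic.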

---

## 3. Architectural shape

### 3.0 The mental model

A track's audio is a byte range `[0, fileLength)` on disk. At any moment the listener is at playback
position `P` (seconds → byte offset via the format decoder). The player should hold decoded
`AudioBuffer`s only for a bounded window roughly `[P - back, P + ahead]`:

- **forward fill (`ahead`)** — enough decoded lookahead that playback never starves (covers the existing
500 ms scheduler lookahead plus network jitter headroom);
- **back-retain (`back`)** — a small amount of *already-played* audio kept so a short seek-back does not
trigger a network refetch;
- **evict** — anything older than `P - back` is dropped (`AudioBuffer` references released → GC reclaims
the float data);
- **refill** — when forward decoded lookahead drops below a low-water mark, fetch+decode more from the
current byte position; when the window's tail is evicted and the listener seeks back past it, refetch
that region via the Range primitive (the seek-beyond-buffer path, run *backwards*).

This is a **ring/sliding-window buffer keyed on playback position**, driven by high/low-water marks —
the standard bounded-buffer producer/consumer pattern, transplanted onto the decode→schedule seam.
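The four window operations above can be sketched as a single pure planning function. This is an illustration under stated assumptions, not the scheduler's API: `BufferSpan`, `WindowPolicy`, `WindowPlan`, and `planWindow` are hypothetical names, and buffers are assumed contiguous and ordered by start time.

```typescript
// Hypothetical description of one decoded buffer on the absolute track timeline.
interface BufferSpan {
  startTime: number; // seconds from track start
  duration: number;  // seconds
}

interface WindowPolicy {
  back: number;      // seconds of already-played audio to retain
  lowWater: number;  // refill when decoded lookahead drops below this many seconds
  highWater: number; // pause filling when decoded lookahead exceeds this many seconds
}

interface WindowPlan {
  evictCount: number;       // leading buffers that end at or before P - back
  lookaheadSeconds: number; // decoded audio available ahead of P
  shouldRefill: boolean;
  shouldPauseFill: boolean;
}

function planWindow(buffers: BufferSpan[], position: number, policy: WindowPolicy): WindowPlan {
  const evictBefore = position - policy.back;
  let evictCount = 0;
  // Buffers are sorted, so everything evictable forms a prefix of the array.
  for (const span of buffers) {
    if (span.startTime + span.duration > evictBefore) break;
    evictCount++;
  }
  const last = buffers.length > 0 ? buffers[buffers.length - 1] : undefined;
  const decodedEnd = last ? last.startTime + last.duration : position;
  const lookaheadSeconds = Math.max(0, decodedEnd - position);
  return {
    evictCount,
    lookaheadSeconds,
    shouldRefill: lookaheadSeconds < policy.lowWater,
    shouldPauseFill: lookaheadSeconds > policy.highWater,
  };
}
```

Keeping the decision pure (inputs in, plan out) is one possible way to test the window math separately from the Web Audio graph.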

### 3.1 Why this is a generalization of seek-beyond-buffer, not a new mechanism

The seek-beyond-buffer path already provides **every primitive** the window needs, just triggered
manually and one-shot:

| Window operation              | Existing seek-beyond-buffer machinery it reuses |
|-------------------------------|--------------------------------------------------|
| Discard buffers, keep offset  | `PlaybackScheduler.clearForSeek()` + `setPlaybackOffset()` (clears buffers, retains the absolute-time anchor) |
| Fetch from a byte offset      | `TrackMediaClient.GetTrackMedia(key, byteOffset)` → `Range: bytes=X-` → 206 |
| Decode a header-less body     | `StreamDecoder.reinitializeForRangeContinuation(remainingByteLength)` |
| Map time → byte offset        | `StreamDecoder.calculateByteOffset()` → `IFormatDecoder.calculateByteOffset()` |
| Single-loop safety on refetch | `_streamingCancellation` swap + `DrainActiveStreamingTaskAsync()` |

The difference is **eviction does not exist yet** (the scheduler only ever `clear()`s wholesale) and
**refill is one-shot** (a seek, not a continuous low-water-triggered loop). So the new work is two
seams: a *partial-evict* on the scheduler, and a *position-driven refill controller* on the player. The
fetch/decode/offset plumbing is reused verbatim.
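As a rough illustration of how the table's primitives compose, the sketch below derives the values a refill would hand to each one. Everything here is an assumption for illustration: `planRefill` and `RefillRequest` are hypothetical, the time-to-byte mapping is injected as a plain function standing in for `calculateByteOffset`, and the real signatures may differ.

```typescript
// Hypothetical stand-in for the decoder's time-to-byte mapping.
type TimeToByteOffset = (seconds: number) => number;

interface RefillRequest {
  byteOffset: number;            // where the continuation body starts in the file
  rangeHeader: string;           // value for the HTTP Range request header
  remainingByteLength: number;   // bytes the continuation body is expected to carry
  playbackOffsetSeconds: number; // absolute-time anchor for the scheduler after the clear
}

function planRefill(
  targetSeconds: number,
  fileLength: number,
  toByteOffset: TimeToByteOffset,
): RefillRequest {
  // Clamp into the file so a bad mapping can never produce an invalid range.
  const byteOffset = Math.min(Math.max(0, toByteOffset(targetSeconds)), fileLength);
  return {
    byteOffset,
    rangeHeader: `bytes=${byteOffset}-`,
    remainingByteLength: fileLength - byteOffset,
    // Simplification: assumes the mapped offset corresponds exactly to targetSeconds.
    playbackOffsetSeconds: targetSeconds,
  };
}
```

The same plan serves a forward seek, a backward seek past the window, or any other refetch, which is the sense in which windowed refill generalizes the existing path instead of forking it.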

### 3.2 The three candidate directions

Per file convention the alternatives are recorded; the recommendation follows.

**Direction A — Sliding window on the existing single forward stream (recommended).**
Keep the current model where the C# loop reads one forward HTTP stream and pumps chunks into the JS
decoder. Add two things: (1) `PlaybackScheduler` gains *partial eviction* — drop buffers whose
absolute-time end is older than `P - back`, adjusting its index bookkeeping so `getCurrentPosition()`
and scheduling stay correct against a buffer array that no longer starts at index 0; (2) a
*back-pressure* signal — when forward decoded lookahead exceeds the high-water mark, the C# loop
**pauses reading** the HTTP stream (stops calling `ReadAsync`) until playback drains it below low-water,
then resumes. Memory is bounded by high-water + back-retain. Seek-back beyond the retained window falls
through to the **existing** seek-beyond-buffer path unchanged.
*Why recommended:* smallest change to the load-bearing seam; reuses the live forward stream (no extra
connections in the common case); eviction and back-pressure are the only genuinely new mechanisms, and
both are local (one to the scheduler, one to the read loop). Back-pressure via "stop reading the socket"
is exactly how TCP flow control already wants to behave — pausing `ReadAsync` lets the TCP receive
window close; we are not fighting the transport.
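Direction A's back-pressure signal is a hysteresis gate: stop at high-water, resume only at low-water, and hold the previous decision in between. A minimal sketch, written in TypeScript for consistency with the interop layer even though the real read loop is C#; `BackPressureGate` and its thresholds are hypothetical:

```typescript
// Hypothetical hysteresis gate for the forward read loop; names are illustrative only.
class BackPressureGate {
  private reading = true;
  private readonly lowWaterSeconds: number;
  private readonly highWaterSeconds: number;

  constructor(lowWaterSeconds: number, highWaterSeconds: number) {
    if (lowWaterSeconds >= highWaterSeconds) {
      throw new RangeError("lowWaterSeconds must be below highWaterSeconds");
    }
    this.lowWaterSeconds = lowWaterSeconds;
    this.highWaterSeconds = highWaterSeconds;
  }

  // Returns whether the read loop should issue its next read.
  update(lookaheadSeconds: number): boolean {
    if (this.reading && lookaheadSeconds >= this.highWaterSeconds) {
      this.reading = false; // enough decoded ahead: stop pulling from the stream
    } else if (!this.reading && lookaheadSeconds <= this.lowWaterSeconds) {
      this.reading = true; // playback drained the window: resume pulling
    }
    return this.reading;
  }
}
```

The gap between the two marks is what prevents the loop from flapping between paused and reading on every scheduled buffer.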

**Direction B — Discrete window segments, each its own Range fetch.**
Treat the file as fixed-size byte segments (e.g. 4 MB). Hold N decoded segments around `P`; fetch the
next/previous segment via a fresh Range request as the window slides; discard the far segment. No live
long-lived forward stream — every window is an independent 206.
*Why not (default):* turns one connection into many short Range requests (more proxy hops through
`DeepDrftPublic`, more server-side `WavOffsetService`-style header synthesis, more places a fetch can
fail mid-stream — worsening the Phase 1.6 error surface), and the byte↔time segment math must be exact at
every boundary. It *is* the cleaner model for true random-access (and the better base if seeking-heavy
usage dominates), so keep it as the fallback if Direction A's back-pressure proves leaky in practice.
Borrowed prior art: HLS/DASH segment windows and the MSE `SourceBuffer.remove()` eviction model — this
is how every production HTML5 adaptive player bounds memory. We are doing the hand-rolled equivalent
because the stack is a bespoke Web Audio graph, not `<media>` + MSE.

**Direction C — Adopt MediaSource Extensions (MSE) and let the browser manage the buffer.**
Stop hand-rolling the decode→schedule graph for long tracks; feed the Range stream into a `SourceBuffer`
and let the browser evict via its built-in quota + `remove()`. Memory management becomes the platform's
problem.
*Why not (now, but flag for Daniel):* MSE does not accept raw WAV/PCM — it wants containerized formats
(fragmented MP4/WebM, or MP3/AAC elementary streams). The current producer is WAV-only, and the entire
bespoke visualizer/spectrum graph is wired to the Web Audio `AudioContext`, not a `<media>` element.
Adopting MSE is a **rewrite of the playback substrate**, not a windowing change — out of scope for this
phase. But it is the *real* long-term answer and is entangled with Phase 1.2 (non-WAV formats): if
DeepDrft moves to a compressed delivery format, MSE becomes viable and could retire the hand-rolled
decoder, the seek-beyond-buffer path, *and* this phase's window machinery in one move. **Surfaced as
open question OQ5** — not to decide now, but so this phase is built knowing it may be superseded.

### 3.3 Recommended direction: A, with B held as the documented fallback

Direction A is the smallest coherent change that hits the headline (bounded memory under a 1 GB stream)
while honoring C1–C7. It keeps the live forward stream, reuses the seek-beyond-buffer path for the only
genuinely random-access case (seek-back past the retained tail), and isolates the two new mechanisms.
**The final architecture and the exact eviction/back-pressure API are staff-engineer's call at
implementation** (per file convention); this spec fixes the *shape* and the invariants, not the method
signatures.

### 3.4 SOLID / road-not-taken rationale

- **SRP, preserved.** Eviction is a `PlaybackScheduler` concern (it already owns buffer storage); refill
orchestration is a player-service/`StreamDecoder` concern (they already own the fetch loop); byte↔time
math stays in `IFormatDecoder`. No responsibility crosses a boundary it does not already own.
- **OCP, via C3/C5.** Windowing added in the format-agnostic layer means wiring MP3/FLAC later changes
zero window code. The window expresses refill through `calculateByteOffset` — the one seam the
decoders already implement.
- **The seam stays single-writer (C6).** Every new refetch routes through the existing
cancellation/drain discipline, so "only one loop touches the JS decoder" remains true. This is the
rule most likely to be violated by a naive implementation and is called out as a hard invariant.
- **Road not taken — eager full decode with a memory cap that just stops decoding.** Tempting (decode
until you hit a byte budget, then stop) but it breaks playback of long tracks past the cap entirely —
it bounds memory by *refusing to play the rest*, not by sliding. Rejected: it is a degradation, not a
feature.

---

## 4. Use cases

- **UC1 — Play a 1 GB+ DJ MIX start to finish (the headline).** Memory stays bounded throughout; the
listener experiences continuous playback identical to a short track.
- **UC2 — Seek forward within a long track.** Already handled by seek-beyond-buffer; under windowing the
forward seek clears the window and refills at the target — no behavior change, now with eviction so the
pre-seek region does not linger.
- **UC3 — Seek back a few seconds.** Served from the back-retain window with **no** network refetch
(the reason `back` exists).
- **UC4 — Seek back far, past the evicted tail.** Falls through to the existing seek-beyond-buffer Range
fetch, run toward an earlier offset. (Open question OQ2 — see §6.)
- **UC5 — Pause a long track for a long time.** Memory stays at the bounded window size while paused (no
continued decode). On resume, forward fill restarts from the low-water trigger.
- **UC6 — Mix detail page with the lava visualizer running.** Visualizer reads its preprocessed datum
(C7); windowing is invisible to it. Confirmed non-interaction.

---

## 5. Interaction with the deferred Phase 1 streaming features

This phase touches the **same decoder/scheduler seam** as the deferred Phase 1.3/1.4/1.5 items and the
1.6/1.7 robustness items. The interactions, explicitly:

- **1.3 Preload / prefetch (deferred; preload half).** *Shares machinery, does not conflict — and should
be sequenced after.* Preload stages the **next track** into a second decoder instance during the
current track's tail; windowing bounds the **current track's** forward buffer. They are orthogonal
axes (next-track vs. current-track-window), but they compound the memory question: a naive preload of a
second 1 GB mix would reintroduce the OOM this phase fixes. **Recommendation: land windowing first**,
so that when preload arrives, the staged next-track decoder is *also* windowed by construction (it
inherits the bounded scheduler). Windowing makes preload *safe for long tracks*; without it, preload of
mixes is a memory hazard.
- **1.4 Crossfade (deferred).** Needs two simultaneous `PlaybackScheduler` instances briefly overlapping.
Both would be windowed instances — the overlap doubles the *window* size momentarily, not the whole
track. Windowing makes crossfade between two long mixes affordable. No reordering needed; 1.4 still
gates on 1.3.
- **1.5 Gapless (deferred).** Sample-accurate hand-off of the next track's first buffer at the current
track's last buffer. Windowing changes *which* buffers are retained but not the hand-off mechanism;
the only care point is that the current track's **final** window must not be evicted before the gapless
boundary is scheduled. A minor invariant for whoever builds 1.5, not a blocker. Note 1.5's existing
WAV-only caveat stands.
- **1.6 Track-skip on error (deferred).** *Windowing enlarges the error surface — call this out.* Today
a fetch failure happens at load (one fetch) or at a user seek (one fetch). Windowed refill issues
**mid-stream** fetches the listener did not initiate; one of those can fail at byte 700 M of a 1 GB
mix. So Phase 21 should ship with at least the *cheap* half of 1.6: a mid-stream refill failure must
**surface a clear error and not wedge the player** (it must not leave playback "running" with a starved
scheduler — mirror the `playFromPosition` end-of-buffer recovery already in `PlaybackScheduler`). The
rich half (byte-scan to next valid frame) stays deferred. **Recommendation: fold the minimal
refill-failure handling into Phase 21's acceptance criteria** (AC6) rather than leaving it entirely to
1.6 — it is created by this phase.
- **1.7 Safari compatibility (deferred).** Windowing adds no new Safari-specific surface beyond what the
streaming path already has. The one adjacency: more frequent `AudioContext` activity during refill
should be checked against the older-Safari `webkitAudioContext` quirks when 1.7 is addressed — note it,
do not block on it.
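The "surface a clear error and do not wedge" requirement from the 1.6 item can be pictured as a state transition in which a failed refill can never leave the player reporting that it is playing. This is a sketch only: the state names, `PlayerSnapshot`, and `applyRefillOutcome` are hypothetical and do not describe the existing player service.

```typescript
// Hypothetical player states; the real service may model these differently.
type PlayerState = "playing" | "paused" | "rebuffering" | "error";

type RefillOutcome = { ok: true } | { ok: false; reason: string };

interface PlayerSnapshot {
  state: PlayerState;
  errorMessage: string | null;
}

function applyRefillOutcome(current: PlayerSnapshot, outcome: RefillOutcome): PlayerSnapshot {
  if (outcome.ok) {
    // A successful refill only changes state if playback was waiting on it.
    return current.state === "rebuffering"
      ? { state: "playing", errorMessage: null }
      : current;
  }
  // Any failed refill lands in an explicit error state with a user-visible message,
  // so playback is never left "playing" against a starved scheduler.
  return { state: "error", errorMessage: `Refill failed: ${outcome.reason}` };
}
```

An explicit error state is one way to keep the failure recoverable: a retry or a user seek can leave it through the normal fetch path.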

---

## 6. Open questions for Daniel (genuine product decisions, not implementation detail)

These are policy calls with user-visible or resource trade-offs — flagged rather than decided here.

- **OQ1 — Window size policy.** What bounds the window — a **fixed byte/time budget** (e.g. "hold at
most ~30 s decoded ahead + ~10 s behind"), or a **configurable memory budget** (e.g. "≤ N MB of
decoded PCM") that derives the time window from the stream's byte rate? Recommend a **time-based
forward window + small time-based back-retain** as the primary knob (intuitive, format-portable), with
a hard **memory ceiling** as a secondary guard. The exact numbers are tunable post-landing; Daniel
picks the *policy axis*. `[Daniel decision]`
- **OQ2 — Seek-back past the evicted window.** When the listener seeks back earlier than the retained
tail, we must refetch (the audio is gone). Acceptable to take the same brief re-buffer the forward
seek-beyond-buffer takes today? (Recommend yes — it is the symmetric case and listeners already accept
it forward.) Or should back-retain be generous enough that this is rare? `[Daniel decision]`
- **OQ3 — Configurable total in-flight memory cap.** Should there be a single hard byte ceiling on total
decoded audio held by the player (a safety net independent of the window-size policy), exposed as a
config value? Recommend **yes, as a guard rail** even if the window policy is time-based — it is the
backstop that makes "1 GB stream never OOMs" a guarantee rather than a tuning hope. `[Daniel
decision]`
- **OQ4 — Apply windowing to all tracks, or only long ones?** A 3-minute Cut decoded whole is ~30–60 MB
— harmless today. Windowing everything is simpler (one code path) but adds refill machinery to short
tracks that never needed it. Recommend **window everything** (one path, C6-safe, and short tracks
simply never hit a refill because they fit inside the forward window) — but Daniel may prefer a
size threshold. `[Daniel decision]`
- **OQ5 — Is MSE (Direction C) the real destination?** Not for this phase, but it bears on how much to
invest here. If DeepDrft will move to compressed delivery (Phase 1.2) and MSE within about a year,
Phase 21 should be the *minimal* Direction-A change (don't gold-plate machinery MSE would retire). If
WAV + bespoke graph is the long-term commitment, a more thorough windowing investment is justified.
`[Daniel steer — informs scope, not a blocker]`
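To give OQ1 and OQ3 a sense of scale, the sketch below converts a time-based window into decoded bytes and clamps it under a hard ceiling. The function names and the proportional-shrink rule are assumptions for illustration; the actual policy is Daniel's call.

```typescript
const BYTES_PER_FLOAT32_SAMPLE = 4;

// Decoded Web Audio data rate: one 32-bit float per sample per channel.
function decodedBytesPerSecond(sampleRate: number, channels: number): number {
  return sampleRate * channels * BYTES_PER_FLOAT32_SAMPLE;
}

// Hypothetical policy resolution: a time-based window (OQ1) clamped by a byte ceiling (OQ3).
function resolveWindowSeconds(
  aheadSeconds: number,
  backSeconds: number,
  sampleRate: number,
  channels: number,
  memoryCeilingBytes: number,
): { ahead: number; back: number } {
  const maxTotalSeconds = memoryCeilingBytes / decodedBytesPerSecond(sampleRate, channels);
  const requestedSeconds = aheadSeconds + backSeconds;
  if (requestedSeconds <= maxTotalSeconds) {
    return { ahead: aheadSeconds, back: backSeconds };
  }
  // Over budget: shrink both sides proportionally so the ceiling always wins.
  const scale = maxTotalSeconds / requestedSeconds;
  return { ahead: aheadSeconds * scale, back: backSeconds * scale };
}
```

At 44.1 kHz stereo the decoded rate is 352,800 bytes per second, so the example window of about 30 s ahead plus 10 s behind comes to roughly 14 MB of decoded audio.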

---

## 7. Acceptance criteria

- **AC1 (headline) — Bounded memory under a 1 GB stream.** Playing a 1 GB+ WAV mix start to finish, the
browser tab's retained decoded-audio memory stays bounded to the configured window (not growing toward
~2 GB). Verifiable via browser memory tooling: peak decoded-audio footprint is independent of track
length and tracks the window-size policy, not the file size.
- **AC2 — Playback-start latency at parity (C2).** First-audio latency for a track is unchanged from
pre-windowing (within noise). Windowing does not introduce a fetch-then-play stall.
- **AC3 — Continuous playback, no starvation.** A long mix plays edge to edge with no audible gaps,
underruns, or stalls under normal network conditions — the forward fill stays ahead of the playhead.
- **AC4 — Seek-back within the window is instant (UC3).** A short backward seek into retained audio
produces no network request.
- **AC5 — Seek (forward, and back past the window) still works (UC2/UC4).** Both resolve via the
existing Range path with the same behavior the listener sees today; the pre-seek region is evicted, not
retained.
- **AC6 — A mid-stream refill failure degrades cleanly (the 1.6 adjacency).** A failed refill fetch
surfaces a clear user-visible error and leaves the player in a recoverable state (not a wedged
"playing" with a starved scheduler). It must not silently hang.
- **AC7 — The Mix visualizer is unaffected (C7).** With the lava visualizer running on a long mix, the
visualizer renders identically (it reads the preprocessed datum, never the evicted buffers).
- **AC8 — Single-decoder concurrency invariant holds (C6).** Under rapid seek + refill activity, no
interleaved `ProcessStreamingChunk` calls corrupt the single JS decoder (the existing drain/cancel
discipline still governs every fetch).

---

## 8. Wave decomposition

Dependency shape: `21.1 → 21.2 → 21.3`, with `21.4` validating the whole. 21.1 is the cold-start
prerequisite and the load-bearing change; the rest layer on it.

- **21.1 — Partial eviction in `PlaybackScheduler` (cold-start; the load-bearing change).** Give the
scheduler the ability to drop already-played buffers and keep its position/index bookkeeping correct
against a buffer array that no longer begins at absolute time 0 (today `getCurrentPosition`,
`playFromPosition`, and the scheduling loop all assume `buffers[0]` is the track start). This is the
hardest correctness work in the phase — the time-anchor math must stay exact through eviction. No
refill yet; with eviction alone and the forward read loop unchanged, this is provably memory-bounded
for the *played* region. **Independent of the §6 open questions** — it can begin immediately; the
window *sizes* (OQ1/OQ3) are parameters fed in later. Settled and cold-start.
- **21.2 — Back-pressure on the forward read loop (the bound on the *unplayed* region).** Make the C#
`StreamAudioWithEarlyPlayback` loop stop calling `ReadAsync` when forward decoded lookahead exceeds the
high-water mark, and resume below low-water. Together with 21.1, this bounds *both* the played and
unplayed sides — the full memory guarantee (AC1). Must route resume/pause through the existing
cancellation-safe single-loop discipline (C6). **Depends on 21.1** (eviction must exist so the drained
region is reclaimed, not merely un-read).
- **21.3 — Seek-back-past-window refill (close the random-access case).** Wire UC4 — when a backward
seek lands earlier than the retained tail, refetch via the existing seek-beyond-buffer Range path
pointed at the earlier offset, and the minimal AC6 refill-failure handling. Mostly **reuse** of the
landed seek path; the new work is the trigger (window-miss detection) and the clean-failure path.
**Depends on 21.1 + 21.2** (needs the window boundaries they define).
- **21.4 — Validation pass against the 1 GB target (acceptance).** Exercise AC1–AC8 against a real 1 GB+
mix: memory profiling (AC1), latency parity (AC2), edge-to-edge playback (AC3), the seek matrix
(AC4/AC5), induced refill failure (AC6), visualizer-running (AC7), and rapid-seek concurrency (AC8).
Largely test/measurement; any break is likely a tuning fix in the 21.1 anchor math or the 21.2
water-marks. **Depends on 21.1–21.3.**
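The bookkeeping problem in 21.1 (a buffer array that no longer begins at absolute time 0) can be illustrated with a small model that carries an explicit base-time anchor. This is a sketch under stated assumptions, not the scheduler's design: `WindowedBufferList` is hypothetical, and plain durations stand in for `AudioBuffer`s.

```typescript
// Hypothetical minimal model of windowed storage; durations stand in for AudioBuffers.
class WindowedBufferList {
  private durations: number[] = [];
  private baseTime = 0; // absolute start time (seconds) of durations[0]

  addBuffer(durationSeconds: number): void {
    this.durations.push(durationSeconds);
  }

  // Drops leading buffers that end at or before cutoffSeconds and advances the anchor.
  evictBefore(cutoffSeconds: number): number {
    let evicted = 0;
    for (const duration of this.durations) {
      if (this.baseTime + duration > cutoffSeconds) break;
      this.baseTime += duration;
      evicted++;
    }
    this.durations.splice(0, evicted);
    return evicted;
  }

  // Maps an absolute position to an index and offset in the retained array.
  // Returns null on a window miss (evicted region or beyond decoded audio).
  locate(positionSeconds: number): { index: number; offset: number } | null {
    if (positionSeconds < this.baseTime) return null; // evicted: needs a Range refetch
    let start = this.baseTime;
    let index = 0;
    for (const duration of this.durations) {
      const end = start + duration;
      if (positionSeconds < end) return { index, offset: positionSeconds - start };
      start = end;
      index++;
    }
    return null;
  }

  getWindowStart(): number {
    return this.baseTime;
  }
}
```

The `null` result from `locate` for a position before the anchor is the window-miss signal that 21.3 would route to the seek-beyond-buffer path.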

---

## 9. Cross-references (read before implementing)

- Root `CLAUDE.md` "Streaming-first audio playback" / `CONTEXT.md §3.5` — the seam this phase modifies;
the §2 invariants here restate its contract. Both flag it as the most load-bearing path.
- `PLAN.md` Phase 4 (landed) / `COMPLETED.md` — the HTTP Range `bytes=X-` primitive this generalizes.
- `PLAN.md` Phase 1.3 / 1.4 / 1.5 / 1.6 / 1.7 — the deferred decoder/scheduler-seam features; §5 above
reconciles each.
- `PLAN.md` Phase 9 — defines the `Mix` medium (single long track), the canonical 1 GB case.
- `PLAN.md` Phase 10 / `product-notes/phase-10-mix-visualizer-lava-reframe.md` /
`product-notes/phase-12-waveform-visualizer-generalization.md` — establishes the preprocessed
per-track high-res waveform datum; the basis for C7 (visualizer does not read live PCM).
- `DeepDrftPublic/Interop/audio/PlaybackScheduler.ts` — owns the unbounded `buffers: AudioBuffer[]`;
21.1 lives here.
- `DeepDrftPublic/Interop/audio/StreamDecoder.ts` — `reinitializeForRangeContinuation`,
`calculateByteOffset`; the refill substrate.
- `DeepDrftPublic.Client/Services/StreamingAudioPlayerService.cs` — the C# forward read loop
(`StreamAudioWithEarlyPlayback`), the seek-beyond-buffer path (`SeekBeyondBuffer`), and the
cancellation/drain discipline (C6); 21.2/21.3 live here.
- `DeepDrftPublic.Client/Clients/TrackMediaClient.cs` — the Range-capable media fetch reused by refill.