docs(plan): add Phase 18 Opus low-data streaming; resolve Phase 21 OQ5 (no MSE)

daniel-c-harvey
2026-06-23 04:58:21 -04:00
parent a84a99c309
commit 1bdaeaa164
3 changed files with 610 additions and 29 deletions
@@ -8,6 +8,16 @@ server touch is **reuse, not new surface**: the existing `DeepDrftAPI` HTTP `Ran
partial-content primitive (Phase 4, landed) is the load-bearing dependency; this phase adds no new API
endpoint.
> **Sequencing dependency (Daniel, 2026-06-23): Phase 18 (Opus Low-Data Streaming) comes BEFORE this
> phase.** Format support — specifically the derived **Ogg Opus fullband 320** low-data delivery path
> (`product-notes/phase-18-opus-low-data-streaming.md`) — is a prerequisite that sequences ahead of
> windowing. Phase 21's windowing must work across **both** delivery formats (lossless WAV and Opus).
> Its C5 invariant below already anticipated this ("must not foreclose MP3/FLAC"); **Opus is now the
> concrete VBR/containerized driver of C5.** Windowing an Opus stream uses the decoder's *approximate*
> byte↔time mapping (`OpusFormatDecoder.calculateByteOffset` — Ogg-page interpolation), exactly the C5
> case — not the exact CBR-WAV `byteRate` math. Build the window machinery format-agnostically
> (§2 C3/C5) so it inherits Opus for free.
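
A minimal sketch of the exact-versus-approximate distinction in the note above, under stated assumptions: `FormatDecoderSketch`, `WavDecoderSketch`, `PageInterpolatingDecoderSketch`, and `refillRange` are hypothetical names for illustration, not the project's real types, and the linear interpolation between known page positions is only one plausible reading of "Ogg-page interpolation". The point it illustrates is that the window layer calls `calculateByteOffset` and never does format-specific math itself.

```typescript
// Hypothetical, reduced view of the decoder seam: only the byte<->time call
// the window layer needs. The real IFormatDecoder has more members.
interface FormatDecoderSketch {
  calculateByteOffset(seconds: number): number;
}

// WAV (CBR PCM): the mapping is exact, derived from the header's byteRate.
class WavDecoderSketch implements FormatDecoderSketch {
  private readonly dataStart: number;
  private readonly byteRate: number;
  private readonly blockAlign: number;

  constructor(dataStart: number, byteRate: number, blockAlign: number) {
    this.dataStart = dataStart;
    this.byteRate = byteRate;
    this.blockAlign = blockAlign;
  }

  calculateByteOffset(seconds: number): number {
    const raw = Math.floor(seconds * this.byteRate);
    // Snap down to a whole sample frame so a refill never starts mid-frame.
    return this.dataStart + (raw - (raw % this.blockAlign));
  }
}

// Assumed shape of a known (time, byte) point, e.g. the start of an Ogg page.
interface PagePoint {
  seconds: number;
  byteOffset: number;
}

// VBR/paged stream: approximate mapping by interpolating between known points.
class PageInterpolatingDecoderSketch implements FormatDecoderSketch {
  private readonly pages: PagePoint[]; // must be sorted by ascending seconds

  constructor(pages: PagePoint[]) {
    this.pages = pages;
  }

  calculateByteOffset(seconds: number): number {
    let prev: PagePoint | undefined;
    for (const page of this.pages) {
      if (seconds <= page.seconds) {
        if (prev === undefined) return page.byteOffset; // before the first point
        const t = (seconds - prev.seconds) / (page.seconds - prev.seconds);
        return Math.floor(prev.byteOffset + t * (page.byteOffset - prev.byteOffset));
      }
      prev = page;
    }
    // Past the last known point (or no points at all): clamp.
    return prev === undefined ? 0 : prev.byteOffset;
  }
}

// Window layer: format-agnostic. It only asks the decoder for byte offsets.
function refillRange(
  decoder: FormatDecoderSketch,
  fromSeconds: number,
  toSeconds: number,
): { start: number; end: number } {
  return {
    start: decoder.calculateByteOffset(fromSeconds),
    end: decoder.calculateByteOffset(toSeconds) - 1, // inclusive end, as in HTTP Range
  };
}
```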
---
## 1. Goal
@@ -45,19 +55,25 @@ docs. This phase **modifies that seam** — so the contract it must preserve is
- **C2 — Playback start latency unchanged.** Today playback starts as soon as a configurable minimum
buffer count is queued (header-derived duration, not full-file). The window model must keep first-audio
latency at parity — bounding memory must not reintroduce a fetch-then-play stall.
- **C3 — The format-decoder abstraction is untouched.** `IFormatDecoder` (WAV active; MP3/FLAC
implemented, not yet wired) owns all format-specific byte math. Windowing lives in the
- **C3 — The format-decoder abstraction is untouched.** `IFormatDecoder` owns all format-specific
byte math; `AudioPlayer.createFormatDecoder` already dispatches on `Content-Type` (WAV/MP3/FLAC
decoders all wired today — verified 2026-06-23; an `OpusFormatDecoder` joins them in Phase 18).
Windowing lives in the
**format-agnostic** layer (`PlaybackScheduler` eviction + `StreamDecoder`/player refill
orchestration); it must add **no** format-specific branches. A future wired MP3/FLAC decoder inherits
windowing for free.
- **C4 — Read-only playback only.** This is a memory-management change, not a UX change. No new
user-visible control, no change to seek/transport semantics beyond what the listener already
experiences. Seek must still feel identical.
- **C5 — WAV-only is the shipping target; the design must not foreclose MP3/FLAC.** Byte↔time mapping
for refill is exact and cheap for WAV (CBR: `byteRate` from the header). For VBR formats the mapping is
approximate (the decoders already carry TOC/SEEKTABLE seek math). The window machinery must express
refill in terms of the decoder's existing `calculateByteOffset`, so the same code works when those
formats are wired — **no WAV-special-cased offset math in the window layer.**
- **C5 — Must window both delivery formats (WAV lossless AND Opus low-data).** Byte↔time mapping for
refill is exact and cheap for WAV (CBR: `byteRate` from the header). For VBR/containerized formats it
is approximate (the decoders carry TOC/SEEKTABLE/Ogg-page seek math). **Phase 18 (Opus) is sequenced
before this phase and is the concrete driver here:** an Ogg Opus 320 stream is VBR and split into Ogg pages, so
its `calculateByteOffset` is an *approximate* page interpolation, not an exact offset. The window
machinery must express refill purely in terms of the decoder's existing `calculateByteOffset`, so the
same code windows WAV exactly and Opus approximately — **no WAV-special-cased offset math in the
window layer.** (MP3/FLAC decoders are already wired in the registry too — the registry dispatches on
content-type today; an `OpusFormatDecoder` joins them in Phase 18.)
- **C6 — No regression to the single-instance JS decoder concurrency guarantees.** The current code is
careful that only one streaming loop touches the single JS `StreamDecoder` at a time
(`DrainActiveStreamingTaskAsync`, the `_streamingCancellation` identity dance). Windowed refill
@@ -146,14 +162,15 @@ because the stack is a bespoke Web Audio graph, not `<media>` + MSE.
Stop hand-rolling the decode→schedule graph for long tracks; feed the Range stream into a `SourceBuffer`
and let the browser evict via its built-in quota + `remove()`. Memory management becomes the platform's
problem.
*Why not (now, but flag for Daniel):* MSE does not accept raw WAV/PCM — it wants containerized formats
(fragmented MP4/WebM, or MP3/AAC elementary streams). The current producer is WAV-only, and the entire
bespoke visualizer/spectrum graph is wired to the Web Audio `AudioContext`, not a `<media>` element.
Adopting MSE is a **rewrite of the playback substrate**, not a windowing change — out of scope for this
phase. But it is the *real* long-term answer and is entangled with Phase 1.2 (non-WAV formats): if
DeepDrft moves to a compressed delivery format, MSE becomes viable and could retire the hand-rolled
decoder, the seek-beyond-buffer path, *and* this phase's window machinery in one move. **Surfaced as
open question OQ5** — not to decide now, but so this phase is built knowing it may be superseded.
*Why not — RESOLVED, rejected (Daniel, 2026-06-23; see OQ5):* MSE does not accept raw WAV/PCM — it
wants containerized formats (fragmented MP4/WebM, or MP3/AAC elementary streams). The entire bespoke
visualizer/spectrum graph is wired to the Web Audio `AudioContext`, not a `<media>` element. Adopting
MSE is a **rewrite of the playback substrate**, not a windowing change. It *looked* like the real
long-term answer once compressed delivery arrived — but Daniel has decided compressed delivery
(**Phase 18 Opus**) will feed the **same bespoke graph** via the `IFormatDecoder` seam, so the
compressed-delivery move that would have justified MSE happens *without* surrendering the graph. **The
bespoke graph is a deliberate long-term commitment; MSE is rejected.** Direction A is therefore the
permanent destination, not a stopgap that MSE will retire. Recorded as considered-and-declined.
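
As a rough illustration of why compressed delivery can reach the same graph through the decoder seam, the sketch below shows content-type dispatch in miniature. Everything in it is an assumption for illustration: `FormatDecoderStub`, `decoderFactories`, and `createFormatDecoderSketch` are hypothetical names (the real entry point is `AudioPlayer.createFormatDecoder`), and the MIME strings are guesses, since the plan does not state which `Content-Type` values the server sends.

```typescript
// Hypothetical stand-in for the decoder seam; the real IFormatDecoder has more members.
interface FormatDecoderStub {
  readonly format: string;
}

// Assumed MIME types, keyed in lower case without parameters.
const decoderFactories = new Map<string, () => FormatDecoderStub>([
  ["audio/wav", () => ({ format: "wav" })],
  ["audio/mpeg", () => ({ format: "mp3" })],
  ["audio/flac", () => ({ format: "flac" })],
  ["audio/ogg", () => ({ format: "opus" })],
]);

function createFormatDecoderSketch(contentType: string): FormatDecoderStub {
  // Drop parameters such as "; codecs=opus" and normalize case before lookup.
  const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  const factory = decoderFactories.get(mime);
  if (factory === undefined) {
    throw new Error(`No decoder registered for content type: ${contentType}`);
  }
  return factory();
}
```

In a layout like this, adding a format touches only the registry; the scheduling and visualizer code downstream sees the interface and nothing else.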
### 3.3 Recommended direction: A, with B held as the documented fallback
@@ -262,11 +279,17 @@ These are policy calls with user-visible or resource trade-offs — flagged rath
tracks that never needed it. Recommend **window everything** (one path, C6-safe, and short tracks
simply never hit a refill because they fit inside the forward window) — but Daniel may prefer a
size threshold. `[Daniel decision]`
- **OQ5 — Is MSE (Direction C) the real destination?** Not for this phase, but it bears on how much to
invest here. If DeepDrft will move to compressed delivery (Phase 1.2) and MSE within ~a year, Phase 21
should be the *minimal* Direction-A change (don't gold-plate machinery MSE would retire). If WAV +
bespoke graph is the long-term commitment, a more thorough windowing investment is justified.
`[Daniel steer — informs scope, not a blocker]`
- **OQ5 — Is MSE (Direction C) the real destination? — RESOLVED: NO (Daniel, 2026-06-23).** **Do not
adopt MSE. The bespoke Web Audio decode→schedule graph stays — it is bespoke by deliberate choice, a
long-term commitment, not a stopgap.** Daniel's rationale: the player is intentionally a custom
graph, not an HTML `<media>` element; the compressed-delivery move that *would* have made MSE
tempting is being met instead by **Phase 18 (Opus low-data path)** feeding the **same bespoke graph**
through the `IFormatDecoder` seam — so compressed delivery arrives *without* surrendering the graph.
Consequence for this phase: Direction A (the hand-rolled sliding window) is the destination, not a
placeholder; invest in it as permanent machinery. It will window both the WAV and the Opus paths
(see the sequencing note at the top). Direction C is recorded as **considered and declined** per file
convention; kept visible so a future reader sees the road not taken and why.
`[RESOLVED — bespoke graph retained; MSE rejected]`
---