docs(plan): add Phase 18 Opus low-data streaming; resolve Phase 21 OQ5 (no MSE)
@@ -0,0 +1,462 @@
# Phase 18 — Opus Low-Data Streaming (dual-format lossless + Opus delivery)

Product spec. Status: **design / framing — implementation-ready pending Daniel's open-question calls.**
Author: product-designer. Date: 2026-06-23. **No code has been written by this doc.**

This phase is the concrete realization of the long-deferred **"Non-WAV formats"** intent
(`CONTEXT.md §5`, the "1.2" the streaming-feature items reference). It supersedes the abstract "a
processor per format + a decoder strategy" framing with a specific, Daniel-directed product: **two
delivery formats per track — the existing lossless WAV path and a new low-data Ogg Opus path — so the
listener gets a choice, with Opus the bandwidth-friendly default-candidate.**

Surfaces (named precisely):

- **Ingest / preprocessing:** `DeepDrftContent` (`AudioProcessor` / `AudioProcessorRouter` /
`TrackContentService` / `WaveformProfileService`) + `DeepDrftAPI` (upload/persist —
`UnifiedTrackService.UploadAsync`, replace-audio) + `DeepDrftManager` (CMS upload form, only if a
per-upload control is wanted — see OQ4).
- **Delivery / decode:** `DeepDrftAPI` (the track stream endpoint + `Range` handler) +
`DeepDrftPublic` proxy (`TrackProxyController`) + `DeepDrftPublic.Client` player stack
(`StreamingAudioPlayerService`, `TrackMediaClient`) + `DeepDrftPublic/Interop/audio` TS decoders
(`AudioPlayer.createFormatDecoder` registry, a new `OpusFormatDecoder`).

**Sequencing headline: Phase 18 comes BEFORE Phase 21 (Windowed Streaming Buffer).** Phase 21's
windowing must work across both formats — its C5 invariant already anticipated this ("must not
foreclose MP3/FLAC"); Opus is now the concrete VBR/containerized driver of that invariant. See §6 and
the Phase 21 cross-reference.

---

## 0. State of the world (what already exists — verified 2026-06-23)

This phase is **much further along than the "Non-WAV formats" backlog line implies**, on both sides.
Two prior efforts already built most of the multi-format substrate; what is *missing* is specifically
the **derived-Opus-artifact** idea, not generic format support.

**Producer side is already multi-format (router landed):**
- `AudioProcessorRouter.ProcessAudioFileAsync(filePath)` routes by extension — `.wav` →
`AudioProcessor`, `.mp3` → `Mp3AudioProcessor`, `.flac` → `FlacAudioProcessor`
(`DeepDrftContent/CLAUDE.md`).
- `TrackContentService.AddTrackAsync(filePath, mimeType)` is **format-agnostic**: it selects the
processor, generates an entry GUID, and **stores the original bytes** with correct extension/MIME
in the `tracks` vault.
- So today the system can *ingest and store* WAV/MP3/FLAC. It **does not transcode** — it keeps the
original. There is no derived artifact and no second format per track.

**Decoder side is a wired strategy registry (not "implemented-not-wired" anymore):**
- `AudioPlayer.createFormatDecoder(contentType)` (`AudioPlayer.ts:117`) dispatches on `Content-Type`:
`audio/mpeg|audio/mp3` → `Mp3FormatDecoder`, `audio/flac|audio/x-flac` → `FlacFormatDecoder`,
default → `WavFormatDecoder`. All three decoders exist and implement `IFormatDecoder`.
- `IFormatDecoder` (`IFormatDecoder.ts`) is a clean per-format strategy: `tryParseHeader`,
`getAlignedSegmentSize`, `wrapSegment`, `calculateByteOffset`, plus a `FormatInfo` carrying
`byteRate`, `blockAlign`, `audioDataOffset`, and a `seekData` accelerator slot (already polymorphic:
`Mp3VbrSeekData | FlacSeekData`). **This is the seam an `OpusFormatDecoder` slots into.**
- **Correction to the Phase 21 spec's §2 C3 note** ("MP3/FLAC implemented, not yet wired"): the
registry *is* wired and dispatches on content-type today. Phase 21's invariant still holds; the
parenthetical is stale and is corrected by this phase's reconciliation.

**What this means for the gap.** Daniel's direction is **not** "add format support" — that substrate
exists. It is "**derive a second, low-data artifact (Opus fullband 320) at ingest and let the listener
pick which to stream.**" That is two genuinely new things: (1) a **transcode-at-ingest** step that
produces a derived artifact per track (the router stores originals; nothing derives Opus), and (2) a
**per-format delivery selection** so the same track can be served as either WAV or Opus on request.

---

## 1. Goal

**Dual-format delivery.** Every track is streamable in two formats:

- **Lossless** — the existing WAV path, unchanged. The archival / audiophile option.
- **Low-data** — a derived **Ogg Opus, fullband, 320 kbps** artifact. The bandwidth-friendly
default-candidate.

The listener chooses; Opus is the recommended default. The bespoke Web Audio decode→schedule graph is
**retained by deliberate choice** (Daniel) — Opus is fed through the same `IFormatDecoder` strategy
seam, not through an HTML `<media>` element or MSE.

**Why Opus fullband 320.** Opus is the modern, royalty-free, best-in-class lossy codec; "fullband"
(48 kHz, full 20 kHz audio bandwidth) at 320 kbps is transparent-to-most-listeners quality at roughly
**1/4 to 1/5 the bytes of 16-bit/44.1 kHz stereo WAV** (~1411 kbps). For a 1 GB DJ mix (Phase 9 `Mix`
medium), that is the difference between a ~1 GB transfer and a ~220 MB transfer — the headline
low-data win, and directly relevant to the Phase 21 long-stream case.
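The size claim can be checked with a few lines of arithmetic. The sketch below is illustrative only: `pcmKbps` and `estimateOpusBytes` are hypothetical names, and the estimate assumes the encoder averages its 320 kbps target and ignores container overhead (the WAV header and Ogg page headers).

```typescript
// Uncompressed PCM bitrate in kbps: sample rate x bit depth x channel count.
function pcmKbps(sampleRateHz: number, bitDepth: number, channels: number): number {
  return (sampleRateHz * bitDepth * channels) / 1000;
}

// Estimated size of the Opus rendering of a WAV file, scaling by the bitrate ratio.
// Container overhead is ignored; it is small next to the audio payload.
function estimateOpusBytes(wavBytes: number, wavKbps: number, opusKbps: number): number {
  return wavBytes * (opusKbps / wavKbps);
}

const wavKbps = pcmKbps(44100, 16, 2);                            // 1411.2 kbps
const ratio = 320 / wavKbps;                                      // about 0.227
const mixOpusBytes = estimateOpusBytes(1_000_000_000, wavKbps, 320); // about 227 MB
```

Under these assumptions the ratio sits between 1/4 and 1/5, and a 1 GB mix comes out at roughly 227 MB, in line with the rough "~220 MB" figure above.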

**Non-goals.** This phase does not retire WAV (it stays as the lossless option), does not change the
bespoke graph for MSE (explicitly rejected — see §2 / Phase 21 OQ5), and does not add new transport
mechanisms beyond the existing stream + `Range` primitive.

---

## 2. Constraints / invariants (the contract that must hold)

- **C1 — Keep the bespoke Web Audio graph. MSE is rejected (Daniel, deliberate).** The custom
decode→schedule graph is a long-term commitment, not a stopgap. Opus is fed through the existing
`IFormatDecoder` → `StreamDecoder` → `PlaybackScheduler` pipeline. (This is the same decision
recorded as **Phase 21 OQ5 = NO**; the two phases share it.)
- **C2 — Preprocessing is additive; the WAV path is untouched.** The Opus artifact is a **second
derived artifact per track**, not a replacement. The existing WAV in the `tracks` vault stays
byte-for-byte as it is today; the lossless stream path is unchanged. A track with no Opus artifact
(legacy rows, or a transcode that hasn't run yet) must still play losslessly — Opus is strictly
additive.
- **C3 — Reuse the landed `Range`/offset seek path; do not fork it.** Phase 4's
`Range: bytes=X-` → `206` primitive (client `TrackMediaClient` → `DeepDrftPublic` proxy →
`DeepDrftAPI`) is the substrate for Opus seek too. Opus seek math differs from WAV (VBR /
container-paged, see §3.4) but it is expressed through the **same** `IFormatDecoder.calculateByteOffset`
seam the MP3/FLAC decoders already use — no second seek mechanism.
- **C4 — Opus slots into the `IFormatDecoder` registry; no format branches leak elsewhere.** The new
`OpusFormatDecoder` is selected by `AudioPlayer.createFormatDecoder` on `Content-Type:
audio/ogg`/`audio/opus`. The rest of the player stack stays format-agnostic. No `if (opus)` outside
the decoder and the one selection point.
- **C5 — Format selection is a delivery-time decision, resolved server-side from a listener
signal.** The same `TrackEntity` / `EntryKey` addresses both artifacts; the *format* is a parameter
on the stream request (query param or `Accept` negotiation — see §3.3), not a different track id and
not a different vault entry key. One track, two renderings (the standing "one source, multiple
views" preference applied to delivery).
- **C6 — Transcode failure must not block ingest.** If the Opus transcode fails or is slow, the
track still persists with its lossless artifact and is playable. Opus is generated best-effort and
can be (re)generated later — mirror the **waveform-datum** model (`WaveformProfileService`: compute
on upload, regenerate on demand via a CMS action), which is exactly the "derived artifact, generated
at ingest, regenerable" pattern this needs.
- **C7 — The vault model holds: derived artifact is a new entry, not a mutation.** The Opus bytes
live in the FileDatabase under the track's `EntryKey` — either in the existing `tracks` vault under
a derived key, or in a new sibling vault (see §3.2 options). Either way it is `AudioBinary` with the
`.opus`/`.ogg` extension and correct MIME, registered like any other vault resource.

---

## 3. Architectural shape

### 3.0 The mental model

A track has one **source artifact** (the uploaded WAV/MP3/FLAC, stored as-is today) and gains one
**derived low-data artifact** (Ogg Opus fullband 320, produced at ingest). The stream endpoint serves
*either*, selected per request. The player picks a decoder by the response `Content-Type` exactly as
it does today. Seeking uses the same `Range` primitive; the byte↔time math is the decoder's job.
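As a rough illustration of that single selection point, the sketch below maps a response `Content-Type` to a decoder label. It is not the real `AudioPlayer.createFormatDecoder`, whose signature this document does not reproduce: `selectDecoderKind` and `DecoderKind` are hypothetical, and a label is returned instead of constructing the decoder classes so the example stays self-contained.

```typescript
type DecoderKind = "opus" | "mp3" | "flac" | "wav";

// Maps a response Content-Type to the decoder that should handle it.
// WAV stays the default, matching the dispatch described in section 0.
function selectDecoderKind(contentType: string | null): DecoderKind {
  // Drop parameters such as "; codecs=opus" and normalise case.
  const [first = ""] = (contentType ?? "").split(";");
  const mime = first.trim().toLowerCase();
  switch (mime) {
    case "audio/ogg":
    case "audio/opus":
      return "opus"; // the one new arm Phase 18 would add
    case "audio/mpeg":
    case "audio/mp3":
      return "mp3";
    case "audio/flac":
    case "audio/x-flac":
      return "flac";
    default:
      return "wav";
  }
}
```

Keeping the Opus case inside this one switch is what lets the rest of the player stay format-agnostic (C4).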

```
INGEST (DeepDrftContent + DeepDrftAPI)
  upload → AudioProcessorRouter (existing) → store SOURCE artifact in vault [unchanged]
         → TRANSCODE to Opus 320 → store DERIVED artifact [NEW]
         → WaveformProfileService (existing, unchanged)

DELIVERY (DeepDrftAPI → DeepDrftPublic proxy → DeepDrftPublic.Client → Interop/audio)
  GET api/track/{id}?format=opus|lossless → serve the chosen artifact's bytes (+ Range) [NEW param]
  player: createFormatDecoder(Content-Type) → OpusFormatDecoder | Wav | Mp3 | Flac [+1 decoder]
```

### 3.1 Where the transcode lives (relative to existing processing)

The transcode is a **new processor sibling** to the existing format processors, invoked **after** the
source is stored, in the same orchestration that already calls `WaveformProfileService`:

- It belongs in `DeepDrftContent` (the binary-content domain library) as e.g. an
`OpusTranscodeService` / `OpusProcessor`, **not** in a host and **not** in a controller (per the
`*.Services`-owns-domain-logic convention).
- It is invoked from `UnifiedTrackService.UploadAsync` (the same place `WaveformProfileService`
computes the high-res datum on every new track) and from the **replace-audio** path (which already
regenerates both waveform datums — Opus is the third derived thing to regenerate there).
- Like the waveform datum, it gets a **regenerate trigger**: a CMS per-track / bulk action and an
ApiKey-gated endpoint, so existing tracks can be backfilled. This mirrors the landed
"Generate All Profiles / Backfill High-res" bulk actions on `Releases.razor` — **Backfill Opus**
is the natural third bulk action.

**The transcode engine itself is staff-engineer's call** (FFmpeg/libopus via a process invocation, a
managed binding, or a libopus P/Invoke). The spec fixes the *artifact* (Ogg Opus, fullband, 320 kbps)
and the *seam* (a derived artifact produced post-store, regenerable, failure-tolerant), not the tool.
Note a real operational constraint to flag for implementation: transcoding a 1 GB WAV is **CPU- and
time-expensive** and must not block the upload response — it wants the same off-the-hot-path treatment
the upload body staging already gets (`Upload:StagingPath`), likely a background/queued step. This is
the single biggest implementation risk and is called out as such.
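If the FFmpeg process route were the one chosen (the spec leaves that open), the artifact definition would translate into an argument list along these lines. This is a sketch under that assumption, not a decided implementation: `buildOpusTranscodeArgs` is a hypothetical helper, the real service would live in `DeepDrftContent` rather than in TypeScript, and the flags should be checked against the FFmpeg build actually deployed.

```typescript
// Hypothetical helper: FFmpeg arguments for an Ogg Opus, fullband, 320 kbps artifact.
// Assumes an FFmpeg build with the libopus encoder enabled.
function buildOpusTranscodeArgs(inputPath: string, outputPath: string): string[] {
  return [
    "-i", inputPath,
    "-vn",                   // skip embedded cover art / video streams in the source
    "-c:a", "libopus",       // encode audio with libopus
    "-b:a", "320k",          // target bitrate
    "-vbr", "on",            // variable bitrate around the target
    "-cutoff", "20000",      // 20 kHz cutoff, i.e. fullband
    "-application", "audio", // tune for music rather than speech
    "-f", "ogg",             // Ogg container
    outputPath,
  ];
}
```

Whatever engine is picked, the call would run from the background step described above, so that a slow or failed transcode never touches the upload response (C6).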

### 3.2 Where the Opus artifact is stored (two options)

**Option S1 — derived key in the existing `tracks` vault (recommended).** Store the Opus bytes under
a derived entry key alongside the source, e.g. `{entryKey}` for source and `{entryKey}.opus` (or a
parallel key convention) in the same `tracks` vault. *Pro:* no new vault type, co-located with the
source, simplest lookup. *Con:* mixes two artifacts per logical track in one vault's index.

**Option S2 — a new sibling vault (e.g. `track-opus`).** Mirror the `track-waveforms` precedent
(Phase 12 added a dedicated vault for the derived high-res datum). Opus bytes keyed by the same
`EntryKey` in a `track-opus` vault. *Pro:* clean separation of source vs. derived, matches the
established "derived artifacts get their own vault" pattern (`track-waveforms`), easy to enumerate /
backfill / purge independently. *Con:* one more vault to register.

**Recommendation: S2** — it is the pattern the codebase already chose for the *other* derived
per-track artifact (the high-res waveform datum), so it is the least surprising and keeps the source
`tracks` vault meaning exactly one thing. **Final call is staff-engineer's**; both are viable.
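The difference between the two options comes down to where the same `EntryKey` resolves. A minimal sketch, using the example key and vault names from the text above; `opusLocation` and `VaultLocation` are hypothetical and do not describe the FileDatabase API.

```typescript
type StorageOption = "S1" | "S2";

interface VaultLocation {
  vault: string; // vault name
  key: string;   // entry key inside that vault
}

// Where the derived Opus artifact would live for a given track entry key.
function opusLocation(entryKey: string, option: StorageOption): VaultLocation {
  return option === "S1"
    ? { vault: "tracks", key: `${entryKey}.opus` } // derived key beside the source
    : { vault: "track-opus", key: entryKey };      // same key, sibling vault
}
```

With S2 the source lookup is untouched and "does this track have Opus?" becomes a plain existence check on the same key in the sibling vault.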

### 3.3 How a listener's format choice reaches the bytes

The stream endpoint gains a **format selector**. Two candidate mechanisms:

- **D-a — explicit query param** `GET api/track/{id}?format=opus|lossless` (recommended). Mirrors the
existing `offset` query param the proxy already forwards (`TrackProxyController`). Explicit,
cache-friendly (distinct URLs), trivial to thread through the proxy, and the player already knows
which it asked for. Server resolves the param → the right artifact → sets the right `Content-Type`,
which the player's existing `createFormatDecoder` then dispatches on. **No new decoder-selection
mechanism** — the response content-type does the work it already does.
- **D-b — HTTP content negotiation** (`Accept: audio/ogg` vs `audio/wav`). More "correct" REST, but
the proxy + WASM client wiring is fussier and caches are content-type-varied. Not worth it here.

**Recommended: D-a.** The selection *policy* (which format a given listener gets by default, and how
they switch) is a genuine **product call — see OQ1/OQ2**, deliberately not decided here. The
*mechanism* (a query param resolved server-side to an artifact + content-type) is settled.
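On the client, D-a amounts to one more query parameter next to the existing `offset`. The sketch below shows the shape of the request path only; `buildStreamPath` is a hypothetical helper, not a member of `TrackMediaClient`, and the route is the one quoted above.

```typescript
type StreamFormat = "opus" | "lossless";

// Builds the stream request path with the format selector and an optional offset.
function buildStreamPath(trackId: string, format: StreamFormat, offset?: number): string {
  let path = `api/track/${encodeURIComponent(trackId)}?format=${format}`;
  if (offset !== undefined && offset > 0) {
    path += `&offset=${Math.floor(offset)}`; // byte offsets are whole numbers
  }
  return path;
}
```

Because the format is part of the URL, the two renderings of a track are distinct cache entries, which is the cache-friendliness argument for D-a.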

Server-side fallback rule (C2): if `format=opus` is requested but no Opus artifact exists for that
track (not yet transcoded / backfilled), the endpoint **falls back to lossless** rather than 404ing —
Opus is additive, so its absence degrades to "you get the lossless one," never to "no audio."
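The fallback rule is small enough to state as code. This is a language-neutral sketch written in TypeScript for consistency with the player code; the real resolution would sit in `DeepDrftAPI`, and `ArtifactStore`, `Artifact`, and `resolveArtifact` are hypothetical names.

```typescript
type RequestedFormat = "opus" | "lossless";

interface Artifact {
  bytes: Uint8Array;
  contentType: string;
}

// Hypothetical lookup surface over the vault(s).
interface ArtifactStore {
  getSource(entryKey: string): Artifact | undefined;
  getOpus(entryKey: string): Artifact | undefined;
}

// Opus is served only when it was requested and exists; otherwise the source is served.
// An undefined result means the track itself is unknown, the only case that should 404.
function resolveArtifact(
  store: ArtifactStore,
  entryKey: string,
  requested: RequestedFormat,
): Artifact | undefined {
  if (requested === "opus") {
    const opus = store.getOpus(entryKey);
    if (opus !== undefined) return opus;
  }
  return store.getSource(entryKey);
}
```

The response `Content-Type` comes from whichever artifact was actually returned, so a fallen-back request is decoded as lossless without the player needing to know a fallback happened.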

### 3.4 The Opus decoder + seek math (the genuinely new decode work)

`OpusFormatDecoder implements IFormatDecoder` is the new code on the delivery side. Two things make it
harder than the WAV decoder and need to be flagged:

- **Containerized, paged format — not raw-frame-sliceable.** WAV's `wrapSegment` prepends a 44-byte
PCM header to any PCM-aligned byte run; the current model assumes you can wrap an arbitrary aligned
raw-audio slice and hand it to `decodeAudioData`. **Ogg Opus is page-structured** (Ogg pages
carrying Opus packets, plus mandatory `OpusHead`/`OpusTags` setup pages at the start). A mid-stream
byte slice is not independently decodable without the setup header and without landing on Ogg page
boundaries. So `OpusFormatDecoder`'s `getAlignedSegmentSize` must align to **Ogg page boundaries**
(scan for the `OggS` capture pattern — analogous to FLAC's frame-sync scan, for which the
`IFormatDecoder` interface already passes `rawData` to `getAlignedSegmentSize`), and
`wrapSegment`/the continuation path must carry the `OpusHead` setup (analogous to FLAC's
`streamInfoBytes` in `FlacSeekData`). **The `IFormatDecoder` abstraction already has the shape for
this** — a format-specific `seekData` accelerator and a setup-bytes carry — because FLAC needed the
same kind of thing. A new `OpusSeekData` variant joins `Mp3VbrSeekData | FlacSeekData`.
- **VBR byte↔time mapping is approximate (the Phase 21 C5 case, concretely).** Opus at "320 kbps" is
effectively VBR; there is no exact `byteRate` for offset math the way CBR WAV has. Seek-by-offset
uses an **approximate** mapping (granule-position/Ogg-page interpolation, the Opus analogue of MP3's
Xing TOC or FLAC's SEEKTABLE). `calculateByteOffset` returns a best-effort page-aligned offset; the
decoder then re-syncs to the next Ogg page. This is exactly the "VBR formats: the mapping is
approximate" case Phase 21's C5 invariant anticipated — **Opus is the format that makes that
invariant load-bearing rather than hypothetical.**
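The two points above can be sketched as free-standing helpers. These are not the real `IFormatDecoder` members (their signatures are not given in this document): the function names and the `OpusSeekPoint` index shape are assumptions made for illustration. The page layout used is the standard Ogg one, a 27-byte fixed header followed by a segment table, and Ogg Opus granule positions count samples at 48 kHz.

```typescript
const OGG_PAGE_HEADER_BYTES = 27; // fixed part of an Ogg page header
const OPUS_GRANULE_RATE = 48000;  // Ogg Opus granule positions count 48 kHz samples

// True when the 4-byte capture pattern "OggS" starts at index i.
function isOggSync(data: Uint8Array, i: number): boolean {
  return (
    i + 4 <= data.length &&
    data[i] === 0x4f && data[i + 1] === 0x67 &&
    data[i + 2] === 0x67 && data[i + 3] === 0x53
  );
}

// Re-sync after an approximate seek: index of the next capture pattern, or -1.
function findOggSync(data: Uint8Array, from: number = 0): number {
  for (let i = Math.max(0, from); i + 4 <= data.length; i++) {
    if (isOggSync(data, i)) return i;
  }
  return -1;
}

// Length of the longest prefix made of whole pages, assuming data starts on a
// page boundary. Page size = 27 + segment count + sum of the lacing values.
function completePagesLength(data: Uint8Array): number {
  let offset = 0;
  while (isOggSync(data, offset) && offset + OGG_PAGE_HEADER_BYTES <= data.length) {
    const segmentCount = data[offset + 26];
    const tableEnd = offset + OGG_PAGE_HEADER_BYTES + segmentCount;
    if (tableEnd > data.length) break; // segment table is cut off
    let payloadBytes = 0;
    for (let s = offset + OGG_PAGE_HEADER_BYTES; s < tableEnd; s++) payloadBytes += data[s];
    if (tableEnd + payloadBytes > data.length) break; // page body is cut off
    offset = tableEnd + payloadBytes;
  }
  return offset;
}

// Hypothetical seek index: (time, byte offset) pairs sampled at page starts,
// sorted by time. One possible shape for an OpusSeekData accelerator.
interface OpusSeekPoint {
  timeSec: number;
  byteOffset: number;
}

// Playback time of a granule position, after removing the OpusHead pre-skip.
function granuleToSeconds(granule: number, preSkip: number): number {
  return Math.max(0, granule - preSkip) / OPUS_GRANULE_RATE;
}

// Best-effort byte offset for a target time: linear interpolation between the
// two bracketing index points. The caller still re-syncs with findOggSync.
function approximateByteOffset(points: OpusSeekPoint[], targetSec: number): number {
  if (points.length === 0) return 0;
  const first = points[0];
  const last = points[points.length - 1];
  if (targetSec <= first.timeSec) return first.byteOffset;
  if (targetSec >= last.timeSec) return last.byteOffset;
  let lo = 0;
  let hi = points.length - 1;
  while (hi - lo > 1) {
    // Invariant: points[lo].timeSec <= targetSec < points[hi].timeSec
    const mid = (lo + hi) >> 1;
    if (points[mid].timeSec <= targetSec) lo = mid;
    else hi = mid;
  }
  const a = points[lo];
  const b = points[hi];
  const fraction = (targetSec - a.timeSec) / (b.timeSec - a.timeSec);
  return Math.floor(a.byteOffset + fraction * (b.byteOffset - a.byteOffset));
}
```

Walking the segment tables is stricter than a bare `OggS` scan, since the four capture bytes can also occur by chance inside packet data; the scan is kept for the re-sync step after a seek, where no known page boundary is available. With only two index points (first and last page) the interpolation degrades to a whole-file linear estimate, which is the coarsest form of the approximate mapping described above.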

**Browser decode-support constraint (real, must be designed around).** The bespoke graph decodes
segments via `AudioContext.decodeAudioData`. Ogg-Opus support in `decodeAudioData` is long-standing in
Chrome and Firefox but arrived in **Safari only at 18.4 (macOS 15.4 / iOS 18.4, March 2025)**; older
Safari decodes Opus only in a CAF container, not Ogg. iOS Safari is a primary music-listening surface,
so this is not a corner case. Implications: (1) the **lossless WAV path is the universal fallback** for
listeners whose browser can't decode Ogg Opus — which C2's additive design already provides for free;
(2) format-default policy (OQ2) should consider capability detection — don't hand Ogg Opus to a Safari
that can't decode it. This intersects Phase 1.7 (Safari compatibility) and is flagged there too.
([Browser support: caniuse / WebKit 18.4 release notes — see Sources.])
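One possible shape for the capability gate is sketched below. It rests on an assumption worth stating loudly: `canPlayType` reports on the media-element pipeline, so it is only a proxy for what `decodeAudioData` accepts. The probe is injected as a function so the logic is testable outside a browser; `supportsOggOpus` and `chooseFormat` are hypothetical names.

```typescript
// A probe with the same contract as HTMLMediaElement.canPlayType:
// it returns "probably", "maybe", or "" for a given MIME type.
type CanPlayProbe = (mimeType: string) => string;

// Treats "probably" and "maybe" as support; an empty string means no support.
function supportsOggOpus(probe: CanPlayProbe): boolean {
  const answer = probe('audio/ogg; codecs="opus"');
  return answer === "probably" || answer === "maybe";
}

// Opus is used only when it is preferred and the browser looks able to decode it.
function chooseFormat(preferred: "opus" | "lossless", probe: CanPlayProbe): "opus" | "lossless" {
  return preferred === "opus" && supportsOggOpus(probe) ? "opus" : "lossless";
}
```

In a browser the probe would be something like `(m) => document.createElement("audio").canPlayType(m)`. A stricter check, if the proxy proves unreliable, would be to run a tiny known-good Ogg Opus sample through `decodeAudioData` once and cache the outcome.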

### 3.5 The three candidate directions (shape-level)

Per file convention the alternatives are recorded; the recommendation follows.

**Direction A — Derived Opus artifact at ingest + format param on delivery (recommended).** What §3.1
–3.4 describe: transcode to Opus 320 post-store, store as a derived artifact (S2 vault), serve via a
`?format=` param resolved server-side to bytes + content-type, decode via a new `OpusFormatDecoder` in
the existing registry. *Why recommended:* additive (C2), reuses every existing seam (the processor
orchestration, the waveform-datum derived-artifact pattern, the `Range` path, the decoder registry),
and the only genuinely new code is one transcode step + one decoder. Two derived artifacts per track,
both regenerable.

**Direction B — On-the-fly transcode at delivery (no stored Opus artifact).** Transcode WAV→Opus per
request in the stream endpoint, streaming the Opus out as it encodes. *Why not (default):* moves
expensive CPU onto the **hot request path** (a 1 GB mix transcoded per play is untenable), breaks
`Range`/seek (you can't byte-offset into a stream you're encoding live), and defeats caching. It *is*
storage-cheaper (no second artifact on disk), so it is the fallback only if disk cost ever dominates —
but for a music site where the same tracks are played repeatedly, precompute-once wins decisively.
Rejected as the primary.

**Direction C — Replace WAV ingest with Opus-only (transcode and discard the lossless source).** Make
Opus *the* stored format; drop WAV. *Why not:* violates Daniel's explicit "lossless streaming
*optional* — two delivery formats, listener gets a choice." Lossless is a kept option, not a thing to
transcode away. Also irreversibly lossy at ingest (you can never recover the WAV). Rejected outright;
recorded only because "just store Opus" is the tempting simplification and the spec should say why not.

### 3.6 SOLID / road-not-taken rationale

- **OCP, via the existing seams.** The transcode is a new processor sibling (the router pattern is
already open for extension); the decoder is a new `IFormatDecoder` (the registry is already open for
extension); the artifact is a new derived vault resource (the `track-waveforms` precedent is exactly
this). Phase 18 adds **three new leaf implementations** and **zero changes to existing format code**
— the strongest possible OCP signal that the seams were designed right.
- **SRP, preserved.** Transcoding is a content-domain processor concern (`DeepDrftContent`); delivery
selection is a thin endpoint concern (`DeepDrftAPI` resolves a param to an artifact); decode is the
`OpusFormatDecoder`'s concern; byte↔time math stays inside that decoder via `calculateByteOffset`.
No responsibility crosses a boundary it doesn't already own.
- **DIP / "one source, multiple views."** One `TrackEntity`/`EntryKey` is the single source; "lossless
WAV" and "low-data Opus" are two *views* (renderings) of it, diverging only at the delivery/decode
layer — the same discipline the dark-mode and track-browse surfaces follow.
- **Road not taken — a separate `TrackEntity` row (or a new track id) per format.** Tempting (one row
= one streamable file) but it fractures the track identity: shares, queues, play-counts (Phase 16),
release membership, and waveform data all key on one track, and doubling rows to carry a format
would force every one of those surfaces to dedupe. Format is a *delivery attribute of one track*,
not a *second track*. Rejected — keep one identity, two artifacts.

---

## 4. Format selection — the product surface (deliberately under-specified; see OQ1/OQ2)

Daniel has **not** specified the selection UX. What is settled by his direction: there are two formats,
Opus is the bandwidth-friendly **default-candidate**, lossless is the kept option. What is open: how a
listener expresses the choice, whether it is remembered, and whether the default is global or adapts.
These are genuine product calls — see §6. The *mechanism* (a `?format=` param the player sends; §3.3)
supports any of the policies, so the policy can be decided after the substrate lands.

---

## 5. Use cases

- **UC1 — Listener streams the low-data Opus of a long mix (the headline win).** A ~1 GB lossless mix
transfers as ~220 MB of Opus; playback through the bespoke graph is identical in feel, far cheaper
on bandwidth. (Compounds with Phase 21 windowing for the memory side.)
- **UC2 — Listener prefers lossless and switches to it.** The same track served as WAV via
`?format=lossless`; the bespoke graph decodes it exactly as today.
- **UC3 — Legacy / not-yet-transcoded track.** `?format=opus` requested, no Opus artifact yet →
server falls back to lossless (C2); the listener still hears the track. A later Backfill-Opus pass
produces the artifact.
- **UC4 — Admin backfills Opus for the existing catalogue.** A bulk "Backfill Opus" CMS action (the
third sibling to the existing Generate-Profiles / Backfill-High-res actions) transcodes every track
lacking an Opus artifact.
- **UC5 — Replace-audio regenerates Opus.** The existing replace-audio path (which already regenerates
both waveform datums and re-derives duration) also regenerates the Opus artifact from the new
source.
- **UC6 — Seek within an Opus stream.** Backward/forward seek resolves via the existing `Range` path;
the offset is the `OpusFormatDecoder`'s approximate page-aligned mapping (§3.4), re-syncing to the
next Ogg page — the VBR analogue of the WAV exact-offset seek.
- **UC7 — Safari that can't decode Ogg Opus.** Capability-gated to the lossless path (§3.4), so the
listener still plays audio. (Ties to OQ2 + Phase 1.7.)

---

## 6. Open questions for Daniel (genuine product decisions, not implementation detail)

- **OQ1 — Selection UX: how does a listener choose lossless vs. low-data?** Candidates: a global
toggle in the player bar / settings ("Stream quality: Low-data / Lossless"); a per-track control; an
automatic default with a manual override. Recommend a **single global quality toggle** (player bar
or a settings affordance) — it is the Spotify/Bandcamp/SoundCloud idiom (one account/session-level
"streaming quality" setting), low-friction, and matches a small-sharp-tool posture better than
per-track choosers. `[Daniel decision]`
- **OQ2 — Default policy: what does a listener get before they choose?** Opus is the
*default-candidate* per Daniel — confirm Opus-by-default. Sub-questions: should the default be
**capability-aware** (don't serve Ogg Opus to a browser that can't decode it — §3.4 Safari < 18.4)?
Should it be **network-aware** (Opus on cellular, lossless on wifi)? Recommend **Opus by default,
capability-gated** (fall back to lossless when the browser can't decode Ogg Opus), and **defer
network-awareness** as gold-plating for v1. `[Daniel decision]`
- **OQ3 — Is the choice remembered, and at what scope?** Per-session (resets each visit) vs.
persisted (cookie/`localStorage`, like the `darkMode` cookie) vs. (future) per-account once identity
exists. Recommend **persisted via a cookie/`localStorage` setting**, mirroring the dark-mode
precedent — one truth, seeded at prerender, carried to WASM. `[Daniel decision]`
- **OQ4 — Per-upload Opus control in the CMS, or always-on?** Should the CMS upload form let an admin
opt a track *out* of Opus generation (e.g. a track meant to be lossless-only), or is Opus always
generated for every track? Recommend **always-on** (simpler; Opus is additive and cheap to serve;
the listener's format choice already covers "I want lossless"). A per-track opt-out is a later
refinement if a real need appears. `[Daniel decision]`
- **OQ5 — Opus container/extension specifics.** Ogg Opus (`.opus` / `audio/ogg`) is the assumption
(broadest `decodeAudioData` support; Daniel said "Ogg Opus"). Confirm — vs. CAF-wrapped Opus (older
Safari) or WebM-Opus. Recommend **Ogg Opus** as Daniel directed; CAF-fallback for old Safari is not
worth it given the lossless fallback already covers those browsers (§3.4). `[Daniel steer — confirms
§3.4, not a blocker]`
- **OQ6 — Transcode execution model (flag, leans implementation).** Synchronous-at-upload is a
non-starter for 1 GB mixes (§3.1); the realistic options are a background/queued transcode after the
source is stored. This is largely staff-engineer's call, but it has a **product-visible
consequence**: a freshly uploaded track may be lossless-only for a short window until its Opus
artifact finishes. Confirm that "Opus appears shortly after upload, lossless available immediately"
is acceptable (it is the waveform-datum model already in place). `[Daniel steer]`

---

## 7. Acceptance criteria

- **AC1 (headline) — Dual-format delivery works.** A track can be streamed as either lossless WAV or
Ogg Opus 320 from the same `EntryKey`, selected per request; both play correctly through the bespoke
Web Audio graph.
- **AC2 — Opus is the low-data win.** The Opus artifact of a representative track is materially smaller
than its lossless source (target ~1/4–1/5 the bytes); a long mix's Opus transfer is correspondingly
smaller.
- **AC3 — Additive, non-breaking (C2).** The existing lossless WAV path is byte-for-byte unchanged; a
track with no Opus artifact still plays losslessly; `?format=opus` on such a track falls back to
lossless (no 404, no silence).
- **AC4 — Transcode at ingest, regenerable (C6).** A new upload produces an Opus artifact best-effort
after the source is stored; a transcode failure does not block the upload or break playback; a
Backfill-Opus action (re)generates artifacts for existing tracks; replace-audio regenerates the
Opus artifact from the new source.
- **AC5 — Opus seek via the existing `Range` path (C3).** Forward and backward seek in an Opus stream
resolve through the landed `Range: bytes=X-` primitive, with the offset coming from
`OpusFormatDecoder.calculateByteOffset`; no new seek mechanism is introduced.
- **AC6 — No format branches leak (C4).** The only Opus-specific code is `OpusFormatDecoder`, its
`OpusSeekData`, the one `createFormatDecoder` selection arm, and the transcode processor + delivery
param resolution. The format-agnostic player/scheduler code is unchanged.
- **AC7 — Capability-safe default (OQ2).** A browser that cannot decode Ogg Opus is served (or falls
back to) the lossless path and plays audio; no listener gets silence because of codec support.
- **AC8 — Windowing-ready (the Phase 21 handshake).** The `OpusFormatDecoder`'s approximate byte↔time
mapping is the one Phase 21's windowed refill will call; Opus playback must be windowable by the
same machinery (verified jointly when Phase 21 lands on top — see §8 / Phase 21 cross-ref).

---
|
||||
|
||||
## 8. Wave decomposition

Dependency shape: `18.1 → 18.2 → {18.3, 18.4}`, with `18.5` validating end-to-end. **18.1 (the
transcode/derived-artifact ingest) is the cold-start prerequisite** — until an Opus artifact exists,
nothing downstream has bytes to serve or decode. 18.3 (delivery param) and 18.4 (the decoder) are
largely parallel once 18.2 (storage/lookup) settles, but both need an artifact to test against.

- **18.1 — Ingest transcode: derive + store the Opus artifact (cold-start; load-bearing).** New
  `OpusTranscodeService`/processor in `DeepDrftContent`, invoked post-store from
  `UnifiedTrackService.UploadAsync` alongside `WaveformProfileService`; produces Ogg Opus fullband
  320; stores it as a derived artifact (S2 vault recommended). Failure-tolerant (C6) and off the hot
  path (background/queued — OQ6). **Independent of the delivery/decoder waves; can begin immediately.**
- **18.2 — Storage + lookup contract.** The derived-artifact key/vault convention and the server-side
  resolution "given `EntryKey` + format, return the right `AudioBinary` + content-type," including the
  C2 fallback (no Opus → lossless). **Depends on 18.1** (an artifact must exist to resolve to).
- **18.3 — Delivery: format param + proxy threading.** `?format=opus|lossless` on the
  `DeepDrftAPI` track stream endpoint (resolves via 18.2), forwarded through the `DeepDrftPublic`
  `TrackProxyController` (mirror the existing `offset` param threading), and the `Range` handler
  serving the chosen artifact's bytes. The player sends the param via `TrackMediaClient`. **Depends on
  18.2.** Parallel-ok with 18.4.
- **18.4 — `OpusFormatDecoder` in the player stack.** New `IFormatDecoder` implementation
  (Ogg-page-aligned `getAlignedSegmentSize` via `OggS` scan, `OpusHead` setup carry in
  `wrapSegment`/continuation, approximate page-interpolation `calculateByteOffset` with an
  `OpusSeekData` accelerator); one new arm in `AudioPlayer.createFormatDecoder` on
  `audio/ogg`/`audio/opus`. Capability detection for the lossless fallback (§3.4, OQ2). **Depends on
  18.2** (needs Opus bytes to decode). Parallel-ok with 18.3; they meet at 18.5.
- **18.5 — Backfill + selection UX + end-to-end validation.** The Backfill-Opus CMS bulk action (third
  sibling to Generate-Profiles / Backfill-High-res) and replace-audio Opus regeneration; the listener
  selection control per OQ1/OQ3 (global persisted quality toggle, recommended); and the AC1–AC8
  acceptance pass — including AC8's confirmation that Opus is windowable so Phase 21 can build on it.
  **Depends on 18.1–18.4.** (Selection UX can be split out if Daniel wants the substrate proven before
  the control lands — flag at planning time.)
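
The page-aligned segmenting that 18.4 describes can be sketched roughly as follows. This is an illustration only, not the project's `OpusFormatDecoder`: the function name `alignedSegmentSize` and its signature are hypothetical, and it walks page headers from the start of the buffer instead of scanning for the capture pattern. Only the Ogg page layout it relies on (the 4-byte `OggS` capture pattern, a 27-byte fixed header, a segment-count byte at offset 26 followed by the lacing table) comes from the Ogg specification (RFC 3533).

```typescript
// Hypothetical helper, for illustration: returns how many bytes at the start
// of `buf` form whole Ogg pages, so only complete pages reach the decoder.
const OGG_FIXED_HEADER = 27;    // fixed part of an Ogg page header
const OGG_CAPTURE = 0x4f676753; // "OggS" read as a big-endian uint32

function alignedSegmentSize(buf: Uint8Array): number {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  let pos = 0;
  while (pos + OGG_FIXED_HEADER <= buf.length) {
    // Assumption: the buffer starts on a page boundary. If sync is lost,
    // stop and let the caller rescan for the next capture pattern.
    if (view.getUint32(pos, false) !== OGG_CAPTURE) break;

    // Byte 26 holds the number of lacing values that follow the header.
    const segmentCount = view.getUint8(pos + 26);
    const tableEnd = pos + OGG_FIXED_HEADER + segmentCount;
    if (tableEnd > buf.length) break; // lacing table not fully received yet

    // The page body length is the sum of the lacing values.
    let bodySize = 0;
    for (let i = pos + OGG_FIXED_HEADER; i < tableEnd; i++) {
      bodySize += view.getUint8(i);
    }

    const pageEnd = tableEnd + bodySize;
    if (pageEnd > buf.length) break; // body incomplete; keep it for the next chunk
    pos = pageEnd;
  }
  return pos; // bytes [0, pos) are whole pages; the remainder is carried over
}
```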

---

## 9. Cross-references (read before implementing)

- `CONTEXT.md §5` "Non-WAV formats" — the deferred intent this phase realizes (now concrete: derived
  Opus low-data path, not generic format support).
- `PLAN.md` Phase 21 / `product-notes/phase-21-windowed-streaming-buffer.md` — **sequenced AFTER this
  phase.** Phase 21's C5 invariant ("WAV-only shipping target; must not foreclose MP3/FLAC") is now
  driven by Opus's VBR/paged seek math; Phase 21 OQ5 (adopt MSE) is resolved **NO** — the bespoke
  graph stays (the same C1 decision recorded here). Windowing a VBR/Opus stream uses
  `OpusFormatDecoder.calculateByteOffset`'s approximate mapping — exactly the C5 case.
- `PLAN.md` Phase 4 (landed) / `COMPLETED.md` — the HTTP `Range: bytes=X-` primitive Opus seek reuses.
- `PLAN.md` Phase 1.5 (gapless) / 1.6 (track-skip on error) / 1.7 (Safari) — 1.5's "encoder
  padding/priming" caveat applies to Opus (it has pre-skip samples in `OpusHead`); 1.6's
  byte-scan-to-next-frame is the Ogg-page-sync analogue; 1.7's Safari floor intersects §3.4's Ogg-Opus
  `decodeAudioData` support (Safari < 18.4).
- `PLAN.md` Phase 12 / `product-notes/phase-12-waveform-visualizer-generalization.md` — the
  `WaveformProfileService` derived-artifact-at-ingest + regenerate pattern this transcode mirrors
  (compute on upload, regenerate via CMS action / endpoint, its own `track-waveforms` vault → the S2
  precedent).
- `PLAN.md` Phase 9 — defines the `Mix` medium (single long track), the canonical low-data case.
- `PLAN.md` Phase 16 — play/share telemetry keys on one track identity; the §3.6 road-not-taken
  (one-row-per-format) would have fractured this — kept to one identity, two artifacts.
- `DeepDrftContent/Processors/AudioProcessor.cs` + `AudioProcessorRouter` + `DeepDrftContent/CLAUDE.md`
  — the existing format-router and the `WaveformProfileService` derived-artifact seam; 18.1 lives here.
- `DeepDrftPublic/Interop/audio/IFormatDecoder.ts` — the strategy interface `OpusFormatDecoder`
  implements; `FlacFormatDecoder.ts` is the nearest prior art (setup-bytes carry + frame-sync scan).
- `DeepDrftPublic/Interop/audio/AudioPlayer.ts` (`createFormatDecoder`, lines 117–125) — the decoder
  registry gaining the Opus arm.
- `DeepDrftPublic.Client/Clients/TrackMediaClient.cs` + `DeepDrftPublic/Controllers/TrackProxyController.cs`
  — the media fetch + proxy that thread the new `?format=` param (mirroring `offset`).
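
The pre-skip caveat in the Phase 1.5 cross-reference above can be made concrete with a small parse of the `OpusHead` identification packet. The sketch below is a hedged illustration, not code from the repository: `parseOpusHead`, `preSkipSeconds`, and the `OpusHeadInfo` shape are hypothetical names. The field offsets follow the Ogg Opus mapping (RFC 7845), where pre-skip is a little-endian 16-bit count of 48 kHz samples to discard from the start of decoded output.

```typescript
// Hypothetical types and helpers, for illustration only.
interface OpusHeadInfo {
  channels: number;
  preSkip: number;         // samples at 48 kHz to drop from the start of playback
  inputSampleRate: number; // informational; Opus itself decodes at 48 kHz
}

// `packet` is the OpusHead packet payload, not the Ogg page that wraps it.
function parseOpusHead(packet: Uint8Array): OpusHeadInfo | null {
  const magic = "OpusHead";
  if (packet.length < 19) return null; // minimum size of the identification header
  for (let i = 0; i < magic.length; i++) {
    if (packet[i] !== magic.charCodeAt(i)) return null;
  }
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  return {
    channels: view.getUint8(9),               // byte 8 is the version
    preSkip: view.getUint16(10, true),        // little-endian
    inputSampleRate: view.getUint32(12, true) // little-endian
  };
}

// Pre-skip is always counted at 48 kHz, whatever the original input rate was.
function preSkipSeconds(info: OpusHeadInfo): number {
  return info.preSkip / 48000;
}
```

A gapless scheduler would trim `preSkipSeconds` of audio from the head of the first decoded buffer; how that trim is wired into the existing player is outside what this sketch assumes.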

## Sources

- Ogg Opus support in `decodeAudioData`: Chrome/Firefox long-standing; Safari added Ogg-Opus at 18.4
  (macOS 15.4 / iOS 18.4, March 2025) — prior Safari decoded Opus only in CAF.
  https://chromestatus.com/feature/5649634416394240 ;
  https://www.testmuai.com/learning-hub/opus-audio-codec-browser-support/

@@ -8,6 +8,16 @@ server touch is **reuse, not new surface**: the existing `DeepDrftAPI` HTTP `Ran
partial-content primitive (Phase 4, landed) is the load-bearing dependency; this phase adds no new API
endpoint.

> **Sequencing dependency (Daniel, 2026-06-23): Phase 18 (Opus Low-Data Streaming) comes BEFORE this
> phase.** Format support — specifically the derived **Ogg Opus fullband 320** low-data delivery path
> (`product-notes/phase-18-opus-low-data-streaming.md`) — is a prerequisite that sequences ahead of
> windowing. Phase 21's windowing must work across **both** delivery formats (lossless WAV and Opus).
> Its C5 invariant below already anticipated this ("must not foreclose MP3/FLAC"); **Opus is now the
> concrete VBR/containerized driver of C5.** Windowing an Opus stream uses the decoder's *approximate*
> byte↔time mapping (`OpusFormatDecoder.calculateByteOffset` — Ogg-page interpolation), exactly the C5
> case — not the exact CBR-WAV `byteRate` math. Build the window machinery format-agnostically
> (§2 C3/C5) so it inherits Opus for free.
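
The contrast the note above draws, exact byte↔time math for CBR WAV versus an approximate mapping for VBR Ogg Opus, both hidden behind one `calculateByteOffset`, might look roughly like the sketch below. Everything here is assumed for illustration: `OffsetMapper` is a trimmed-down stand-in for the real `IFormatDecoder` (whose actual members and signatures are not shown in this document), `wavMapper`, `interpolatingMapper`, and `refillRangeHeader` are hypothetical names, and whole-file linear interpolation is swapped in as the simplest possible approximation in place of the Ogg-page interpolation the spec calls for.

```typescript
// Hypothetical stand-in for the decoder seam; the real interface is larger.
interface OffsetMapper {
  calculateByteOffset(seconds: number): number;
}

// WAV/PCM is constant bitrate, so the mapping is exact: byteRate comes from
// the header, and the result is snapped down to a whole sample frame.
function wavMapper(dataStart: number, byteRate: number, blockAlign: number): OffsetMapper {
  return {
    calculateByteOffset(seconds: number): number {
      const raw = Math.floor(seconds * byteRate);
      return dataStart + raw - (raw % blockAlign);
    },
  };
}

// VBR stream: a linear guess across the audio bytes. The result is only
// approximate, so a decoder would still resync to a page boundary after it.
function interpolatingMapper(audioStart: number, totalBytes: number, durationSeconds: number): OffsetMapper {
  return {
    calculateByteOffset(seconds: number): number {
      const fraction = Math.min(Math.max(seconds / durationSeconds, 0), 1);
      return audioStart + Math.floor(fraction * (totalBytes - audioStart));
    },
  };
}

// The window layer sees only the interface: no format-specific branches.
function refillRangeHeader(mapper: OffsetMapper, fromSeconds: number): string {
  return `bytes=${mapper.calculateByteOffset(fromSeconds)}-`;
}
```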

---

## 1. Goal

@@ -45,19 +55,25 @@ docs. This phase **modifies that seam** — so the contract it must preserve is
- **C2 — Playback start latency unchanged.** Today playback starts as soon as a configurable minimum
  buffer count is queued (header-derived duration, not full-file). The window model must keep first-audio
  latency at parity — bounding memory must not reintroduce a fetch-then-play stall.
- **C3 — The format-decoder abstraction is untouched.** `IFormatDecoder` (WAV active; MP3/FLAC
  implemented, not yet wired) owns all format-specific byte math. Windowing lives in the
- **C3 — The format-decoder abstraction is untouched.** `IFormatDecoder` owns all format-specific
  byte math; `AudioPlayer.createFormatDecoder` already dispatches on `Content-Type` (WAV/MP3/FLAC
  decoders all wired today — verified 2026-06-23; an `OpusFormatDecoder` joins them in Phase 18).
  Windowing lives in the
  **format-agnostic** layer (`PlaybackScheduler` eviction + `StreamDecoder`/player refill
  orchestration); it must add **no** format-specific branches. A future wired MP3/FLAC decoder inherits
  windowing for free.
- **C4 — Read-only playback only.** This is a memory-management change, not a UX change. No new
  user-visible control, no change to seek/transport semantics beyond what the listener already
  experiences. Seek must still feel identical.
- **C5 — WAV-only is the shipping target; the design must not foreclose MP3/FLAC.** Byte↔time mapping
  for refill is exact and cheap for WAV (CBR: `byteRate` from the header). For VBR formats the mapping is
  approximate (the decoders already carry TOC/SEEKTABLE seek math). The window machinery must express
  refill in terms of the decoder's existing `calculateByteOffset`, so the same code works when those
  formats are wired — **no WAV-special-cased offset math in the window layer.**
- **C5 — Must window both delivery formats (WAV lossless AND Opus low-data).** Byte↔time mapping for
  refill is exact and cheap for WAV (CBR: `byteRate` from the header). For VBR/containerized formats it
  is approximate (the decoders carry TOC/SEEKTABLE/Ogg-page seek math). **Phase 18 (Opus) is sequenced
  before this phase and is the concrete driver here:** an Ogg Opus 320 stream is VBR and page-paged, so
  its `calculateByteOffset` is an *approximate* page-interpolation, not exact-offset. The window
  machinery must express refill purely in terms of the decoder's existing `calculateByteOffset`, so the
  same code windows WAV exactly and Opus approximately — **no WAV-special-cased offset math in the
  window layer.** (MP3/FLAC decoders are already wired in the registry too — the registry dispatches on
  content-type today; an `OpusFormatDecoder` joins them in Phase 18.)
- **C6 — No regression to the single-instance JS decoder concurrency guarantees.** The current code is
  careful that only one streaming loop touches the single JS `StreamDecoder` at a time
  (`DrainActiveStreamingTaskAsync`, the `_streamingCancellation` identity dance). Windowed refill

@@ -146,14 +162,15 @@ because the stack is a bespoke Web Audio graph, not `<media>` + MSE.
Stop hand-rolling the decode→schedule graph for long tracks; feed the Range stream into a `SourceBuffer`
and let the browser evict via its built-in quota + `remove()`. Memory management becomes the platform's
problem.
*Why not (now, but flag for Daniel):* MSE does not accept raw WAV/PCM — it wants containerized formats
(fragmented MP4/WebM, or MP3/AAC elementary streams). The current producer is WAV-only, and the entire
bespoke visualizer/spectrum graph is wired to the Web Audio `AudioContext`, not a `<media>` element.
Adopting MSE is a **rewrite of the playback substrate**, not a windowing change — out of scope for this
phase. But it is the *real* long-term answer and is entangled with Phase 1.2 (non-WAV formats): if
DeepDrft moves to a compressed delivery format, MSE becomes viable and could retire the hand-rolled
decoder, the seek-beyond-buffer path, *and* this phase's window machinery in one move. **Surfaced as
open question OQ5** — not to decide now, but so this phase is built knowing it may be superseded.
*Why not — RESOLVED, rejected (Daniel, 2026-06-23; see OQ5):* MSE does not accept raw WAV/PCM — it
wants containerized formats (fragmented MP4/WebM, or MP3/AAC elementary streams). The entire bespoke
visualizer/spectrum graph is wired to the Web Audio `AudioContext`, not a `<media>` element. Adopting
MSE is a **rewrite of the playback substrate**, not a windowing change. It *looked* like the real
long-term answer once compressed delivery arrived — but Daniel has decided compressed delivery
(**Phase 18 Opus**) will feed the **same bespoke graph** via the `IFormatDecoder` seam, so the
compressed-delivery move that would have justified MSE happens *without* surrendering the graph. **The
bespoke graph is a deliberate long-term commitment; MSE is rejected.** Direction A is therefore the
permanent destination, not a stopgap that MSE will retire. Recorded as considered-and-declined.

### 3.3 Recommended direction: A, with B held as the documented fallback

@@ -262,11 +279,17 @@ These are policy calls with user-visible or resource trade-offs — flagged rath
  tracks that never needed it. Recommend **window everything** (one path, C6-safe, and short tracks
  simply never hit a refill because they fit inside the forward window) — but Daniel may prefer a
  size threshold. `[Daniel decision]`
- **OQ5 — Is MSE (Direction C) the real destination?** Not for this phase, but it bears on how much to
  invest here. If DeepDrft will move to compressed delivery (Phase 1.2) and MSE within ~a year, Phase 21
  should be the *minimal* Direction-A change (don't gold-plate machinery MSE would retire). If WAV +
  bespoke graph is the long-term commitment, a more thorough windowing investment is justified.
  `[Daniel steer — informs scope, not a blocker]`
- **OQ5 — Is MSE (Direction C) the real destination? — RESOLVED: NO (Daniel, 2026-06-23).** **Do not
  adopt MSE. The bespoke Web Audio decode→schedule graph stays — it is bespoke by deliberate choice, a
  long-term commitment, not a stopgap.** Daniel's rationale: the player is intentionally a custom
  graph, not an HTML `<media>` element; the compressed-delivery move that *would* have made MSE
  tempting is being met instead by **Phase 18 (Opus low-data path)** feeding the **same bespoke graph**
  through the `IFormatDecoder` seam — so compressed delivery arrives *without* surrendering the graph.
  Consequence for this phase: Direction A (the hand-rolled sliding window) is the destination, not a
  placeholder; invest in it as permanent machinery. It will window both the WAV and the Opus path
  (the sequencing note at the top). Direction C is recorded as **considered and declined** per file
  convention; kept visible so a future reader sees the road not taken and why.
  `[RESOLVED — bespoke graph retained; MSE rejected]`

---