docs: spec Phase 23 — SEO crawl directives (sitemap.xml, robots.txt, CMS noindex)

2026-06-23 07:10:20 -04:00
parent 33383cd675
commit 9a4b79d377
2 changed files with 390 additions and 0 deletions
@@ -653,6 +653,26 @@ convention.** None block 21.1.
 ---
 ## Phase 23 — SEO Crawl Directives (sitemap.xml, robots.txt, CMS noindex)
 The endpoint/file-shaped follow-on to Phase 22's per-page `SeoHead` component (landed 2026-06-23, `COMPLETED.md §22`). Phase 22 flagged these three as "adjacent but separate concerns" (`product-notes/phase-22-seo-metadata-component.md §7`): they are a different *unit of work* — server-side endpoints and static files that tell crawlers **which** pages exist and **whether** to crawl at all, vs. the per-page head surface that says **what each page is**. Phase 22 is the *content* of discoverability; Phase 23 is the *directives* layer above it. Full design, contracts, acceptance criteria, and open questions: `product-notes/phase-23-seo-crawl-directives.md`.
 **The environment gate is the through-line.** Phase 22 established the rule that **every non-production environment must be uncrawlable** (beta/staging must not be indexed). Phase 22 expressed this for WASM-rendered page robots-meta via the `SeoEnvironment` `[PersistentState]` bridge. **Phase 23's three items all run server-side only** (endpoints + static files, never the WASM render tree), so they read the gate the simplest way: **`IWebHostEnvironment.IsProduction()` injected directly** — the same predicate `App.razor` seeds `SeoEnvironment` from, no PersistentState bridge needed because nothing crosses the server→WASM seam. Invariant E1 (fail-safe closed): in any non-production environment, `robots.txt` is `Disallow: /` and the sitemap is not served (or empty).
 **Architecture seam (per project convention).** Generated XML/text belongs in a **thin endpoint on `DeepDrftPublic`**, with list logic **reusing the existing release read** — no new `DeepDrftAPI` endpoint, no schema change (Phase 22 C5 holds). The sitemap endpoint *enumerates + transforms* (it is NOT a verbatim proxy like `ReleaseProxyController`): it walks `GET api/release` paged (server-to-server via the existing `"DeepDrft.API"` named client) and emits XML, absolutizing each `<loc>` via `SeoOptions.BaseUrl` (`https://deepdrft.com`) + `ReleaseRoutes.DetailHref(entryKey, medium)` — so every sitemap URL equals the page's `SeoHead` canonical by construction. The CMS item is the **one** deliberate, minimal exception to Phase 22 C1 ("zero CMS changes"): admin-chrome-only, no functional/service/API/data change.
 Sequenced as three largely-independent waves; the only coupling is a shared env-gate + `BaseUrl` wiring between the two public items.
 - **23.1 — Public env-gate primitives + `robots.txt` endpoint (cold-start, shared seam).** Stand up the `IWebHostEnvironment`-gated server-side endpoint pattern on `DeepDrftPublic` and ship `GET /robots.txt` (Production: `Allow: /` + `Sitemap:` pointer; non-prod: `Disallow: /`). Smallest item; establishes the **shared gate + BaseUrl wiring** 23.2 reuses, so it de-risks the seam. Resolves the static-vs-endpoint call (recommend **endpoint** — single testable gate; a static file can't express the per-environment branch). **Cold-start.**
 - **23.2 — `sitemap.xml` endpoint.** The release-enumeration walk over `GET api/release` (paginate until `PageNumber * PageSize >= TotalCount`) + sitemaps.org `urlset` emission + `ReleaseRoutes`/`BaseUrl` absolutization + the env gate (404 in non-prod). Static roots: `/`, `/about`, `/cuts`, `/sessions`, `/mixes`, `/archive`; plus one `<url>` per release (`/cuts|sessions|mixes/{key}`), optional `<lastmod>` from `ReleaseDate`. Resilient — a partial/empty release set yields a well-formed doc, never a 500. **Shares the gate + BaseUrl wiring with 23.1** (do 23.1 first or co-develop; same controller area); the production `robots.txt`'s `Sitemap:` line points here (harmless if 23.2 lands slightly later).
 - **23.3 — CMS `noindex` (the one CMS-touching item; fully parallel).** Static `robots.txt` (`Disallow: /` — no env branch; the CMS is *always* uncrawlable, including in production) in the `DeepDrftManager` `wwwroot/`, **plus** a blanket `<meta name="robots" content="noindex,nofollow">` in the CMS host `<head>` (defense in depth: robots-disallow prevents crawling but on-page `noindex` is what de-indexes a URL discovered via an external link). The CMS does **not** get Phase 22's `SeoHead` — one blanket directive, not a parameterized component. **Fully independent — touches only `DeepDrftManager`, can run start-to-finish from day one.**
 **Dependency shape:** `23.1 → 23.2` (shared gate/BaseUrl wiring + the `Sitemap:` pointer); **23.3 ∥** (parallel, independent, different app). Cold-start is **23.1**. A single end-of-phase production-vs-beta matrix check (Search Console / `curl` both hosts + sitemaps.org validator) is folded into the waves' ACs rather than a separate validation wave.
 **Open questions for Daniel (spec §7) — recommendations stated, none block 23.1:** OQ-S1 sitemap lists canonical browse roots only, **not** filtered/paginated variants (recommend: roots only — variants are views, not content); OQ-S2 `<lastmod>` from `ReleaseDate` (recommend: include it, accepting that it is the release date, not a content-modified date — a true modified timestamp would need a schema column, violating C5); OQ-S3 static-root list hardcoded vs. derived from nav (recommend: explicit list — indexable-roots ≠ nav set, e.g. `/FramePlayer` must stay out); OQ-R1 robots endpoint vs. static+nginx (recommend: endpoint); OQ-R2 also `Disallow: /FramePlayer` (recommend: yes) and `/api/` (optional) in Production; OQ-C1 CMS both layers vs. robots-only (recommend: both); OQ-X1 confirm `https://deepdrft.com` is the final canonical origin (likely closed — shipped with Phase 22).
 ---
 ## Working with this file
@@ -0,0 +1,370 @@
 # Phase 23 — SEO Crawl Directives (sitemap.xml, robots.txt, CMS noindex)
 Product spec. Status: **design / framing — implementation-ready pending Daniel's open-question calls.**
 Author: product-designer. Date: 2026-06-23. **No code has been written by this doc.**
 Phase 23 is the **endpoint/file-shaped follow-on** to Phase 22's per-page `SeoHead` component. Phase 22 flagged
 these three as "adjacent but separate concerns" (`product-notes/phase-22-seo-metadata-component.md §7`): they
 are a different *unit of work* — server-side endpoints and static files that tell crawlers **which** pages exist
 and **whether** to crawl them at all, as opposed to the per-page head surface that tells crawlers **what each
 page is**. Phase 22 is the *content* of discoverability; Phase 23 is the *directives* layer above it.
 Three items, each independently shippable:
 1. **`sitemap.xml`** on the public host — a generated sitemap enumerating every indexable public URL.
 2. **`robots.txt`** on the public host — allow + sitemap pointer in Production, `Disallow: /` everywhere else.
 3. **CMS `noindex`** on `DeepDrftManager` — the admin app must never be indexed. The **one** item touching the CMS.
 ---
 ## 1. The environment gate is the through-line (read this first)
 Phase 22 established the rule that **every non-production environment must be uncrawlable** — the beta/staging
 host must not appear in search results, and a stray crawl of staging must not dilute or duplicate the production
 site. Phase 22 expressed this for *page-level robots meta* via `SeoEnvironment` (a `[PersistentState]` bridge
 seeded from `IWebHostEnvironment.IsProduction()`, because `SeoHead` renders in the **WASM** component graph and
 WASM has no `IWebHostEnvironment`).
 **Phase 23's three items all run server-side only** (endpoints and static files, never the WASM render tree), so
 they read the gate the simplest possible way: **`IWebHostEnvironment.IsProduction()` injected directly.** They do
 **not** need the `SeoEnvironment` PersistentState bridge — that bridge exists *solely* to ferry the flag across
 the server→WASM seam, which these never cross. This is the correct reuse: same source of truth
 (`IWebHostEnvironment.IsProduction()`, the exact predicate `App.razor` already seeds `SeoEnvironment` from), no
 parallel gate invented, and no PersistentState plumbing where it isn't needed.
 | Concern | Renders where | Gate mechanism |
 |---|---|---|
 | Phase 22 `SeoHead` robots meta | WASM component graph | `SeoEnvironment` `[PersistentState]` bridge (server seed → WASM read) |
 | Phase 23 sitemap / robots / CMS | server-side endpoint or static file | `IWebHostEnvironment.IsProduction()` injected directly |
 **Invariant E1 (the non-negotiable):** in any non-production environment, `robots.txt` is `Disallow: /` and the
 sitemap is either not served or empty. A crawler must see a closed door on beta before it sees a single URL.
 The fail-safe default (matching Phase 22's `SeoEnvironment` fail-safe-to-`noindex`) is **closed**: if environment
 resolution is ever ambiguous, behave as non-production (disallow).
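The fail-closed rule can be stated as a tiny predicate. This is an illustrative sketch in Python, not the host's code: the real gate is `IWebHostEnvironment.IsProduction()`, and the function name and its string input are assumptions made for the example.

```python
from typing import Optional


def is_production(environment_name: Optional[str]) -> bool:
    """Fail-closed gate: only an explicit 'Production' name opens the door.

    Hypothetical helper for illustration; a missing or unrecognized
    environment name is treated as non-production (disallow).
    """
    if environment_name is None:
        return False
    return environment_name.lower() == "production"
```

Anything the predicate cannot positively identify as Production (a missing name, an empty string, a typo such as `Prod`) falls on the closed side, which is the behavior E1 asks for.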
 ---
 ## 2. The architecture seam (where this code lives, and what it must not become)
 Per the project convention (root `CLAUDE.md`; `DeepDrftPublic/CLAUDE.md`): **the public host owns thin HTTP
 boundaries; domain logic lives in `*.Services` libraries or `DeepDrftAPI`.** Generated XML/text is a *rendering*
 of data the host already has access to — it belongs in a **thin endpoint on `DeepDrftPublic`**, and any list
 logic it needs must **reuse the existing release read**, not re-implement enumeration.
 - **`sitemap.xml`** is *not* a pass-through proxy like `ReleaseProxyController` (which relays JSON verbatim). It
  **enumerates** releases and **transforms** them into a different media type (XML). So it is a new endpoint that
  *calls* the upstream `GET api/release` paged read (server-to-server via the existing `"DeepDrft.API"` named
  `HttpClient`, the same client SSR prerender already uses — no proxy hop, no new data-layer code, no schema
  change) and walks the pages to build the URL set. **C5 from Phase 22 holds:** no new API endpoint on
  `DeepDrftAPI`, no schema change — the existing `PagedResult<ReleaseDto>` read is sufficient (it carries
  `EntryKey`, `Medium`, and `ReleaseDate` — everything a `<url>` entry needs).
 - **The URL composition reuses Phase 22's seams, not new ones:** absolute origin from `SeoOptions.BaseUrl`
  (`https://deepdrft.com` — config, because the origin can't be derived behind the nginx proxy), and per-release
  detail paths from `ReleaseRoutes.DetailHref(entryKey, medium)` (the single source of truth the Cut/Session/Mix
  pages, the player bar, and `SharePopover` all already use). The sitemap thereby lists the *exact* canonical
  URLs `SeoHead` emits as `<link rel="canonical">` — by construction, not by coincidence.
 > **Seam note for staff-engineer.** `SeoOptions` and `ReleaseRoutes` currently live in `DeepDrftPublic.Client`
 > (`Common/`). A server-side endpoint on `DeepDrftPublic` (the host) references the client assembly already (it
 > loads `DeepDrftPublic.Client._Imports` as an additional WASM assembly and shares the static `Startup`), so the
 > host can read these types. Confirm the reference direction at implementation; if `SeoOptions.BaseUrl` is not
 > cleanly reachable from a host controller, the minimal move is to source `BaseUrl` from the same config the
 > client `SeoOptions` is seeded from (it is a non-secret brand constant — `appsettings.json`, per Phase 22 §4.1),
 > **not** to duplicate the constant. This is a wiring detail, not a design fork.
 ---
 ## 3. Item 1 — `sitemap.xml`
 ### 3.1 Mechanism and location
 A new thin endpoint on `DeepDrftPublic` serving `GET /sitemap.xml` with content-type `application/xml`. It is an
 endpoint (not a static file and not a Razor component) because the URL set is **dynamic** — it must include every
 release detail URL, which changes as releases are added. A static file would go stale the moment a release lands.
 Recommended placement: a small `SitemapController` (or a minimal-API endpoint in `Program.cs`) alongside the
 existing proxy controllers in `DeepDrftPublic/Controllers/`. It is a host concern (HTTP surface + rendering),
 exactly the layer the proxy controllers occupy. It injects `IWebHostEnvironment` (the gate) and
 `IHttpClientFactory` (to call `"DeepDrft.API"`), mirroring `ReleaseProxyController`'s constructor shape.
 ### 3.2 What it enumerates
 The indexable public URL set, all absolutized against `SeoOptions.BaseUrl`:
 - **Static roots:** `/` (home), `/about`, and the four browse surfaces `/cuts`, `/sessions`, `/mixes`,
  `/archive`. These are a fixed list: a small in-endpoint constant array is the recommendation, with derivation
  from the site's nav index as the alternative weighed in OQ-S3.
 - **Every release detail URL:** walk `GET api/release?page=N&pageSize=…` until `PageNumber * PageSize >=
  TotalCount`, and for each `ReleaseDto` emit `BaseUrl + ReleaseRoutes.DetailHref(dto.EntryKey, dto.Medium)` —
  i.e. `/cuts/{key}`, `/sessions/{key}`, `/mixes/{key}`. No `medium` filter on the query (we want all media in
  one pass); a generous `pageSize` (e.g. 100–200) keeps the walk to a handful of round-trips even for a large
  catalogue.
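The walk described above can be sketched as follows. This is a language-neutral illustration in Python, not the endpoint itself: `Page`, `walk_releases`, and the `fetch_page` callable are hypothetical names standing in for `PagedResult<ReleaseDto>` and the `"DeepDrft.API"` call, and 1-based page numbers are an assumption (the stated stop condition only terminates correctly under 1-based numbering).

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class Page:
    """Stand-in for PagedResult<ReleaseDto> (field names assumed)."""
    items: List[dict]
    total_count: int
    page_number: int
    page_size: int


def walk_releases(
    fetch_page: Callable[[int, int], Page], page_size: int = 100
) -> Iterator[dict]:
    """Yield every release by walking pages until the total is covered."""
    page_number = 1  # assumption: the API numbers pages from 1
    while True:
        page = fetch_page(page_number, page_size)
        yield from page.items
        # Stop once the pages seen so far cover TotalCount. The empty-page
        # check is an extra guard against a wrong TotalCount looping forever.
        if page.page_number * page.page_size >= page.total_count or not page.items:
            break
        page_number += 1
```

With 250 releases and a page size of 100 this makes three requests; with zero releases it makes one and yields nothing, which is the AC-S3 edge case.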
 ### 3.3 XML shape
 Standard sitemaps.org `urlset`:
 ```xml
 <?xml version="1.0" encoding="UTF-8"?>
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://deepdrft.com/</loc></url>
  <url><loc>https://deepdrft.com/about</loc></url>
  <url><loc>https://deepdrft.com/cuts</loc></url>
  <!-- … browse roots … -->
  <url>
    <loc>https://deepdrft.com/mixes/3f2a9c…</loc>
    <lastmod>2026-05-12</lastmod>   <!-- optional; from ReleaseDate — see OQ-S2 -->
  </url>
  <!-- … one <url> per release … -->
 </urlset>
 ```
 - `<loc>` is required and must be a fully-qualified absolute URL (the reason `BaseUrl` is mandatory).
 - `<lastmod>` is **optional** and recommended from `ReleaseDto.ReleaseDate` (W3C Datetime, date-only form
  `YYYY-MM-DD`) **for release URLs only**; static roots have no natural lastmod and omit it. See **OQ-S2**:
  `ReleaseDate` is the *release* date, not a content-modified date, so it is a reasonable proxy but not strictly
  correct. The recommendation is to include it, since crawlers treat lastmod as a hint and a plausible value is
  more useful than none.
 - **No** `<changefreq>` / `<priority>` — both are widely ignored by Google and add noise. Omit them.
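A minimal sketch of emitting that shape, assuming the caller has already absolutized each URL. It is written in Python for illustration only (the real endpoint would be C# on `DeepDrftPublic`), and `build_sitemap` and its tuple input are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: Iterable[Tuple[str, Optional[str]]]) -> bytes:
    """Serialize (absolute loc, optional YYYY-MM-DD lastmod) pairs as a urlset."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # text is XML-escaped on output
        if lastmod:  # static roots pass None and get no <lastmod>
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
```

Building the document through an XML library rather than string concatenation means characters such as `&` in a URL are escaped automatically, which the sitemaps.org protocol requires.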
 ### 3.4 Failure posture
 The endpoint must degrade gracefully — a sitemap that 500s trains crawlers to stop fetching it. If the upstream
 `api/release` walk fails partway, **emit what was gathered** (static roots are always available; partial release
 set is better than none) and log the failure. Never 500 the sitemap. (Mirrors `ReleaseProxyController`'s
 philosophy of not collapsing valid-but-partial states, adapted to "always return a well-formed document.")
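The "emit what was gathered" posture can be sketched like this. It is an illustrative Python sketch under assumptions: `gather_paths` and the logger name are hypothetical, and the release paths are assumed to arrive as a lazy sequence that may raise partway through (for example, a paged upstream walk).

```python
import logging
from typing import Iterable, List

log = logging.getLogger("sitemap")  # hypothetical logger name

STATIC_ROOTS = ["/", "/about", "/cuts", "/sessions", "/mixes", "/archive"]


def gather_paths(release_paths: Iterable[str]) -> List[str]:
    """Return static roots plus every release path seen before any failure."""
    paths = list(STATIC_ROOTS)  # always available, independent of upstream
    try:
        for path in release_paths:
            paths.append(path)
    except Exception:
        # Log and fall through: a partial sitemap is served, never a 500.
        log.exception("release walk failed; serving partial sitemap")
    return paths
```

The static roots are added before the upstream is touched, so even a failure on the first request still produces a well-formed document (AC-S5).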
 ### 3.5 Acceptance criteria (sitemap)
 - **AC-S1 — Valid + complete.** `GET /sitemap.xml` (in Production) returns well-formed `urlset` XML that
  validates against the sitemaps.org schema and contains: the 6 static roots **and** exactly one `<url>` per
  non-deleted release, addressed by `ReleaseRoutes.DetailHref` (so every `<loc>` equals the page's canonical).
 - **AC-S2 — Absolute URLs.** Every `<loc>` is `https://deepdrft.com/…` (config origin, not a relative path, not
  a proxy-derived host).
 - **AC-S3 — Pagination walk is exhaustive.** A catalogue larger than one page is fully enumerated (no releases
  dropped at a page boundary); a catalogue of zero releases yields a valid sitemap of just the static roots.
 - **AC-S4 — Environment-gated.** In a non-production environment, `/sitemap.xml` is either not served (404) or
  served as an empty `urlset`, consistent with the non-prod `Disallow: /`; it must never advertise beta release
  URLs to a crawler (E1). Recommend **404 in non-production** (simplest; nothing references it because the
  non-prod `robots.txt` carries no `Sitemap:` line; see Item 2).
 - **AC-S5 — Resilient.** An upstream `api/release` failure yields a well-formed sitemap of the static roots (and
  any releases gathered before the failure), logged — never a 500.
 ---
 ## 4. Item 2 — `robots.txt`
 ### 4.1 Mechanism and location — the static-vs-endpoint tradeoff (flagged)
 `robots.txt` must express the environment gate (`Disallow: /` on beta, allow + sitemap pointer in Production). A
 **static file** in `wwwroot/` **cannot** do this — it serves identical bytes in every environment. So the
 content is environment-dependent and wants a **tiny endpoint** (`GET /robots.txt`, content-type `text/plain`),
 injecting `IWebHostEnvironment` for the gate.
 Three options, with the recommendation:
 - **(a) Endpoint `GET /robots.txt` [RECOMMENDED].** A few lines of code in the same place as the sitemap
  endpoint; reads `IWebHostEnvironment.IsProduction()`; emits the production or non-production body. Single source
  of truth for the gate, co-located with the sitemap, no infra dependency. The body is trivial.
 - **(b) Static file + reverse-proxy rule.** Ship a production `robots.txt` in `wwwroot/` and have nginx serve a
  `Disallow: /` variant (or block the file) on the beta host. **Cons:** splits the gate across app + nginx config
  (two places to reason about, two places to get wrong); the beta protection lives in infra the app can't test;
  Daniel would maintain an nginx rule per environment. Rejected unless Daniel specifically wants robots managed at
  the proxy layer.
 - **(c) Static file only.** Cannot express the gate at all — would either crawl-allow beta (violates E1) or
  disallow production. **Rejected outright.**
 The endpoint (a) is the natural sibling to the sitemap endpoint and keeps E1 in one testable place. Note the
 ordering subtlety from `DeepDrftPublic/CLAUDE.md`: static-file middleware runs before component/controller
 mapping, so a literal `robots.txt` in the public host's `wwwroot/` would shadow the endpoint and serve the same
 bytes in every environment. The endpoint approach therefore requires that no static `robots.txt` ships in
 `DeepDrftPublic` (the CMS's static file in Item 3 lives in a different app and is unaffected).
 ### 4.2 Content
 **Production:**
 ```
 User-agent: *
 Allow: /
 Sitemap: https://deepdrft.com/sitemap.xml
 ```
 **Every non-production environment (beta/staging):**
 ```
 User-agent: *
 Disallow: /
 ```
 - The `Sitemap:` line uses the absolute `SeoOptions.BaseUrl` origin (same config source as the sitemap's
  `<loc>`s) — it is the one documented way to point crawlers at the sitemap without submitting it manually.
 - The non-production body carries **no** `Sitemap:` line (consistent with AC-S4's "don't advertise beta URLs").
 - Consider whether to additionally `Disallow: /FramePlayer` and the `api/*` proxy paths in Production (OQ-R2) —
  the embed iframe and the JSON/stream proxy endpoints are not pages worth crawling.
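The two bodies above reduce to one branch on the gate. A minimal Python sketch for illustration, with `robots_body` as a hypothetical name; in the real endpoint the boolean would come from `IWebHostEnvironment.IsProduction()` and the origin from `SeoOptions.BaseUrl`. The optional OQ-R2 disallows are left out because they are undecided.

```python
def robots_body(is_production: bool, base_url: str) -> str:
    """Build the robots.txt body for the public host."""
    lines = ["User-agent: *"]
    if is_production:
        lines.append("Allow: /")
        # Absolute sitemap URL from the configured origin, not the request host.
        lines.append(f"Sitemap: {base_url.rstrip('/')}/sitemap.xml")
    else:
        # Non-production: closed door, and no Sitemap line to advertise URLs.
        lines.append("Disallow: /")
    return "\n".join(lines) + "\n"
```

Keeping the body in one pure function makes AC-R1 and AC-R2 checkable without spinning up either host.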
 ### 4.3 Acceptance criteria (robots)
 - **AC-R1 — Production allows + points.** `GET /robots.txt` on the production host returns `Allow: /` and a
  `Sitemap: https://deepdrft.com/sitemap.xml` line.
 - **AC-R2 — Beta disallows everything.** `GET /robots.txt` on any non-production host returns `User-agent: *` +
  `Disallow: /` and **no** `Sitemap:` line (E1).
 - **AC-R3 — Single gate.** The Production-vs-beta distinction is driven by `IWebHostEnvironment.IsProduction()` —
  the same predicate as the sitemap and as Phase 22's `SeoEnvironment` seed — not a second config flag.
 - **AC-R4 — `text/plain`.** Correct content-type; no BOM/HTML wrapper.
 ---
 ## 5. Item 3 — CMS `noindex` (the one CMS-touching item)
 **This is the only Phase 23 item that touches `DeepDrftManager`.** Scoped, minimal, admin-chrome-only — **no
 functional change** to any CMS page, no service/API/data change. `DeepDrftManager` is an authenticated admin app
 that must never appear in any search index, in any environment (it has no "production is fine to index" case —
 the CMS is *always* `noindex`, unlike the public site whose gate flips per environment).
 ### 5.1 Mechanism — defense in depth, cheapest-robust
 Two layers; recommend **both** because they fail independently and the cost is trivial:
 - **(a) `robots.txt` on the CMS host [primary].** A `Disallow: /` `robots.txt` served at the CMS root. Because the
  CMS is *always* uncrawlable (no environment gate), this can be the **simplest possible static file** in the CMS
  `wwwroot/` — no endpoint, no environment logic:
  ```
  User-agent: *
  Disallow: /
  ```
  This is the cleanest single move and differs from the public `robots.txt` precisely because there is no
  per-environment branch to express.
 - **(b) Blanket `<meta name="robots" content="noindex,nofollow">` in the CMS layout `<head>` [belt-and-braces].**
  A static meta tag in the CMS app's root `App.razor`/host `<head>` (the CMS's analogue of the public
  `App.razor`'s static head block). This covers the gap `robots.txt` leaves: a disallow prevents *crawling*, but
  a URL linked from elsewhere can still be *indexed* without being crawled, and only an on-page `noindex` keeps
  a fetched page out of the index. Caveat: a crawler that honors the `Disallow: /` in (a) never fetches the page
  and so never reads this tag; the tag takes effect for crawlers that fetch anyway, or if the robots file is ever
  missing or misserved. It is a single static line in the CMS host head: no per-page wiring, no component, no
  `SeoHead` port (the CMS does **not** get Phase 22's component; this is one blanket tag).
 Layer (a) is the floor; layer (b) is the robust ceiling. Together they cost a static file plus one `<head>` line.
 ### 5.2 Why the CMS does *not* reuse Phase 22's `SeoHead` / `SeoEnvironment`
 Phase 22 C1/C9 explicitly kept the CMS out of scope ("Zero changes to `DeepDrftManager`"). Phase 23 makes the
 **one** deliberate, minimal exception — but it does **not** drag the public component graph into the CMS. The CMS
 need is a single constant directive ("never index"), not a parameterized per-page head surface; porting `SeoHead`
 (a `DeepDrftPublic.Client` WASM component) into the server-rendered CMS would be wildly disproportionate. The
 blanket meta + static robots is the right-sized answer. (And `SeoEnvironment`'s per-environment flip is
 irrelevant here — the CMS is `noindex` in *all* environments, including production.)
 ### 5.3 Acceptance criteria (CMS noindex)
 - **AC-C1 — CMS robots disallows.** `GET /robots.txt` on the CMS host returns `User-agent: *` + `Disallow: /`.
 - **AC-C2 — Every CMS page carries `noindex`.** Any CMS page's prerendered `<head>` contains
  `<meta name="robots" content="noindex,nofollow">` (the blanket layout tag), including the public-facing
  `/account/login` and `/account/register` routes (which render in the lean `CmsHomeLayout`) and the home splash.
  Confirm the meta lands in whichever head block both layouts inherit (the CMS host `App.razor`), so a
  layout-specific head doesn't leave a route uncovered.
 - **AC-C3 — No functional change.** No CMS page's behavior, auth gate, layout, or data path changes — the diff is
  a static `robots.txt` and a static `<meta>` line. (Aligns with Phase 22 AC9's spirit, now scoped as the
  intentional CMS exception.)
 - **AC-C4 — Always-on (no env gate).** The CMS `noindex` holds in production too — it is unconditional, unlike the
  public site.
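AC-C2 lends itself to an automated check over prerendered HTML. The sketch below is a hypothetical Python helper (`RobotsMetaFinder` and `has_noindex` are names invented for the example) using only the standard-library HTML parser; how the page HTML is fetched is left out.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Split "noindex,nofollow" into individual lowercase directives.
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def has_noindex(html: str) -> bool:
    """True if the document carries a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives
```

Running it against the login, register, and home splash pages as well as an authenticated page would confirm the tag sits in the head block both layouts inherit.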
 ---
 ## 6. Wave decomposition
 These are **largely independent** — three separate surfaces with one shared concept (the env gate) and one shared
 config value (`BaseUrl`). The dependency graph is shallow.
 - **23.1 — Public env-gate primitives + `robots.txt` endpoint (cold-start, shared seam).** Stand up the
  server-side `IWebHostEnvironment`-gated endpoint pattern on `DeepDrftPublic` and ship `GET /robots.txt`
  (Production allow+sitemap-pointer / non-prod `Disallow: /`). This is the smallest item and it establishes the
  **shared gate + BaseUrl wiring** that 23.2 also uses, so doing it first de-risks the seam. Resolves the
  static-vs-endpoint call (OQ-R1). **Cold-start; nothing depends on it being done first except that 23.2 reuses
  the same gate wiring.**
 - **23.2 — `sitemap.xml` endpoint.** The release-enumeration walk over `GET api/release` + XML emission +
  `ReleaseRoutes`/`BaseUrl` absolutization + the env gate (404 in non-prod). The largest item. **Shares the gate
  + BaseUrl wiring with 23.1** (do 23.1 first or co-develop; they touch the same controller area). The
  `Sitemap:` line in 23.1's production `robots.txt` points at this — so 23.1's production body assumes 23.2 exists
  (harmless if 23.2 lands slightly later: a `Sitemap:` pointer to a not-yet-built URL just 404s until it does).
 - **23.3 — CMS `noindex` (the CMS-side item).** Static `robots.txt` (`Disallow: /`) in the `DeepDrftManager`
  `wwwroot/` + blanket `<meta name="robots" content="noindex,nofollow">` in the CMS host `<head>`. **Fully
  independent — touches only `DeepDrftManager`, shares nothing with 23.1/23.2, can run in parallel from day one.**
 **Dependency shape:** `23.1 → 23.2` (shared gate/BaseUrl wiring + the `Sitemap:` pointer relationship); **23.3 ∥**
 (parallel, independent, different app). The cold-start item is **23.1** (it proves the gate seam the public side
 leans on); **23.3** can run start-to-finish alongside either.
 **Validation (folded into each wave's ACs, not a separate wave):** the items are small enough that a dedicated
 validation wave is overkill — each wave carries its own ACs (S/R/C above). A single end-of-phase check that
 exercises the production-vs-beta matrix for all three (Google Search Console / a `curl` against both hosts, plus
 the sitemaps.org validator) is worth doing once 23.1–23.3 land.
 ---
 ## 7. Open questions for Daniel (product/infra calls, not implementation detail)
 ### Sitemap
 - **OQ-S1 — Browse variants vs. canonical roots.** The sitemap lists the **canonical** browse roots (`/cuts`,
  `/sessions`, `/mixes`, `/archive`). Phase 11 put Archive filters in the URL (`/archive?q=&medium=&genre=`).
  **Recommend: do NOT enumerate filtered/paginated variants** — they are filtered *views* of the same release set,
  not distinct content, and listing them invites duplicate-content dilution. The per-release detail URLs carry the
  indexable content; the browse roots are navigational. `[Daniel decision — recommendation: canonical roots only]`
 - **OQ-S2 — `lastmod` source.** Use `ReleaseDto.ReleaseDate` as the release URLs' `<lastmod>`? It is the *release*
  date, not a content-last-modified date (a re-edited description or replaced cover would not bump it). **Recommend:
  include it** — a plausible-but-imperfect lastmod is a useful crawl hint and strictly better than omitting it; the
  alternative (a true content-modified timestamp) would need a schema column that doesn't exist (would violate
  C5/no-schema-change). Static roots omit `lastmod`. `[Daniel decision — recommendation: ReleaseDate, accept the
  imprecision]`
 - **OQ-S3 — Static-root list source.** Hardcode the 6 static roots in the endpoint, or derive from the site's nav
  index (`DeepDrftPublic.Client/Layout/Pages.cs` `AllPages`)? **Recommend: hardcode for v1** (the indexable-roots
  set is *not* the same as the nav set — e.g. `/FramePlayer` is a nav-absent route that must stay out, and a new
  nav entry isn't automatically sitemap-worthy), with a code comment to revisit if the set grows. Deriving couples
  the sitemap to nav decisions in a way that can silently leak or drop URLs. `[Daniel decision — recommendation:
  explicit list]`
 ### robots
 - **OQ-R1 — Endpoint vs. static + nginx (§4.1).** **Recommend the endpoint** (single testable gate, co-located
  with the sitemap). Confirm, or — if Daniel prefers robots managed at the reverse-proxy layer — the static +
  nginx-rule variant (b), accepting the split gate. `[Daniel decision — recommendation: endpoint]`
 - **OQ-R2 — Disallow non-page routes in Production?** Should the production `robots.txt` additionally
  `Disallow: /FramePlayer` (the embed iframe) and/or `Disallow: /api/` (the proxy JSON/stream paths)? **Recommend:
  yes for `/FramePlayer`** (an embed shell is not a destination page and would be thin/duplicate content if
  crawled), **optional for `/api/`** (proxy paths return JSON/bytes, not HTML — crawlers mostly self-skip, but an
  explicit disallow is tidy). `[Daniel decision — low stakes]`
 ### CMS
 - **OQ-C1 — Both layers or just robots? (§5.1)** **Recommend both** (static `Disallow: /` robots **and** the
  blanket `noindex` meta) — they fail independently and the combined cost is a file + one line; robots-disallow
  alone does not de-index a URL discovered via an external link, which is exactly what the on-page `noindex`
  closes. Confirm, or accept robots-only if the meta line is judged not worth the one CMS `<head>` touch. `[Daniel
  decision — recommendation: both]`
 ### Cross-cutting
 - **OQ-X1 — Is `https://deepdrft.com` the confirmed canonical origin?** This is Phase 22's OQ1, still load-bearing
  here: every `<loc>` and the `Sitemap:` line assume `SeoOptions.BaseUrl = https://deepdrft.com`. If that value
  was confirmed when Phase 22 landed (COMPLETED.md §22 shows it shipped as `https://deepdrft.com`), this is
  closed; it is flagged only so the dependency is explicit. `[Likely closed — confirm BaseUrl is final]`
 ---
 ## 8. Cross-references (read before implementing)
 - `product-notes/phase-22-seo-metadata-component.md` — the parent spec; §7 "Adjacent but separate concerns"
  flagged all three Phase 23 items; the `SeoOptions.BaseUrl` / `ReleaseRoutes` / `SeoEnvironment` seams Phase 23
  reuses are defined here.
 - `COMPLETED.md §22` — what Phase 22 actually landed (the `SeoEnvironment` env gate, `SeoOptions.BaseUrl =
  https://deepdrft.com`, the `ReleaseRoutes`-based canonical the sitemap must match).
 - `DeepDrftPublic/Controllers/ReleaseProxyController.cs` — the thin-proxy shape and the `"DeepDrft.API"` named
  client the sitemap endpoint reuses to walk releases (server-to-server, no proxy hop). **Note the distinction:**
  the sitemap endpoint *enumerates + transforms*, it does not relay verbatim like this proxy.
 - `DeepDrftPublic/CLAUDE.md` — the host's "thin HTTP boundary, no domain logic" contract; the middleware ordering
  (static files before controller mapping — relevant to the robots endpoint-vs-static-file shadowing note); the
  `IWebHostEnvironment` availability server-side.
 - `DeepDrftPublic.Client/Common/ReleaseRoutes.cs` — `DetailHref(entryKey, medium)`, the single source of truth for
  per-release detail URLs; every sitemap `<loc>` for a release goes through it.
 - `DeepDrftPublic/Components/App.razor` — where `SeoEnvironment.IsProduction` is seeded from
  `IWebHostEnvironment.IsProduction()` (lines 38–48); the Phase 23 endpoints read the **same** predicate directly.
 - `DeepDrftAPI/Controllers/ReleaseController.cs` `GET api/release` — the paged `PagedResult<ReleaseDto>` read the
  sitemap walks (returns `Items`, `TotalCount`, `PageNumber`, `PageSize`; `ReleaseDto` carries `EntryKey`,
  `Medium`, `ReleaseDate`). No change to this endpoint (C5).
 - `DeepDrftManager` host `App.razor` / `wwwroot/` — where Item 3's CMS robots file and blanket `noindex` meta land
  (the one CMS-touching surface).
 - sitemaps.org `0.9` schema + Google's "Manage your sitemaps" / robots.txt docs — the validation targets (AC-S1,
  AC-R*).