ADR-0041: Ingest durability semantics — cursor advance, lake sink defaults, drop visibility

Closes: CS-028 (cursor advances on enqueue), the "substrate off

by default" trap, and the LiveSink silent-drop blind spot (audit 2026-06-30 / cold-read 2026-07-02).

Context

Three coupled facts about the live ingest path:

The ledgerstream cursor advances on ENQUEUE, not durable write

(processAndPersistCursor): events sit in the PersistEvents channel, the soroban_events AsyncSink, and the CH LiveSink when the cursor moves past their ledger. A hard crash loses whatever was buffered — silently, for any source the census doesn't cover.

**The ClickHouse dual-sink and the projector's CH feed-switch were

false by default**, while the "100% coverage" claim rests on the CH lake capturing everything. r1 — the only deployment — runs both true; the defaults described a deployment shape we don't run and would hand R2/R3 bring-up a certified-lake claim with the lake sink off.

**clickhouse.LiveSink drops whole ledger extracts on buffer

pressure by design (non-blocking; `DroppedCount++`), healed by the `ch-live-catchup` timer — but the `stellarindex_ch_live_sink_ledgers_total{outcome="dropped"}` counter had no alert in either rule tree**: sustained drops (wedged CH, undersized buffer) would surface only as a later completeness-verdict failure, hours after the fact.

Decision

1. Cursor semantics: keep enqueue-advance; the lake + reconcile IS the durability story

We considered advancing the cursor only past the minimum durable watermark of all sinks. Rejected for now:

It couples live throughput to the SLOWEST sink (the 20× throughput

win of the batched channel drain came precisely from decoupling).

The loss window is one crash × ≤ buffer depth, and every lost row

is recoverable mechanically: raw events from the CH lake (ch-live-catchup heals the lake's own tail; projector-replay / re-derive heals the served tier), and the ADR-0033 verdict — since CS-084, strict per-ledger — DETECTS any residue.

The invariant we commit to instead: **the cursor is a resume hint,

not a durability claim.** The completeness verdict is the durability claim. Anything that weakens the verdict's ability to see a post-crash hole (e.g. disabling the daily timer, aggregate reconciles) is a regression against THIS ADR, not just ADR-0033.

Preconditions this rests on (now enforced/monitored):

the CH sink is on (Decision 2),
the verdict runs on a timer and is watchdogged (data-freshness),
drops are alerted (Decision 3).

2. `clickhouse_live_sink` + `clickhouse_projector_source` default to `true`

The defaults now match the only production topology and the coverage claim's substrate. Deployments that genuinely cannot run ClickHouse (dev laptops, minimal self-hosters) opt OUT explicitly — the config docs say what they give up: the certified-lake substrate, the CH completeness path, and lake-derived supply. r1 is unaffected (both already explicitly true).

3. Sustained LiveSink drops page

New alert stellarindex_ingestion_ch_live_sink_drops (both rule trees): any drop increase over 10m, for: 10m, severity ticket — drops are self-healing via ch-live-catchup, so this is "the heal path is being exercised abnormally", not a page. A companion severity: page threshold fires when drops persist for 1h (the heal path is losing the race — lake tail integrity at risk).

4. CH current-state reads get a staleness signal (spec)

ledger_entries_current and supply_flows reads serve whatever the lake holds, with no indication when ch-live-catchup is behind. The committed follow-up: readers surface the lake's contiguous watermark (ContiguousWatermark, already computed for the projector clamp) as an as_of_ledger on supply/state responses, and the API's flags.stale fires when tip - as_of_ledger exceeds the freshness window. Implementation rides with the reconciliation harness work (board #14) — the same watermark plumbing serves both.

Consequences

Fresh deployments capture the lake by default; the "certified raw

history" claim stops being config-conditional.

A crash's loss window remains, but is bounded, detected (strict

per-ledger verdict), healable (lake re-derive), and now *visible within minutes* (drop alert) instead of at the next verdict run.

Operators explicitly opting out of ClickHouse own the consequence

in their config file, in writing.