release: integration/all-fixes → main (v0.7.2 → v0.8.0, 178 PRs) #345
Pre-merge review caught two findings on PR #179. Both fixed here: H1 — bundleCid forensic ambiguity. The synthetic `local-rescan-{addr}-{detectedAt}` marker collided when two distinct tokens were probed in the same millisecond (the storage key still differs because it's (addr, tokenId, observedTokenContentHash), so no data-loss path — but the `bundleCidsObserved` accumulator on each record's AUDIT entry would carry the same marker for both, reducing forensic fidelity at operator replay). Fix: include the local token id slice in the bundleCid → `local-rescan-{addr}-{tokenId12}-{detectedAt}`. Test strengthened to assert the tokenId portion is present. H3 — narrow null-writer window. The reviewer noted that `installSpentStateAuditWriter` is called AFTER `payments.initialize()` in Sphere's bootstrap, and `initialize()` starts the rescan worker. In theory, a probe could fire before the writer is installed and silently skip the AUDIT write. In practice this window is zero because (a) the worker's first probe fires at `intervalMs` = 5 min after `start()`, well after Sphere bootstrap completes; (b) the closure reads the field lazily AT PROBE TIME, not at bind time, so any install-order works as long as the writer is in place when the first off-record-spend is detected. Fix: documented the lazy-read pattern in a code comment at the `_spentStateAuditWriter` lookup site so future readers don't repeat the same race-analysis question. Behavior unchanged. All 14 tests in spent-state-rescan-default-closure.test.ts still pass; typecheck clean.
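The two fixes above can be illustrated with a minimal sketch. Only the marker format `local-rescan-{addr}-{tokenId12}-{detectedAt}` comes from the commit message; `buildLocalRescanMarker`, `RescanHost`, and `makeProbe` are hypothetical names, and reading `tokenId12` as "the first 12 characters of the token id" is an assumption.

```typescript
// Hypothetical helper: only the marker format comes from the commit message.
// Assumption: "tokenId12" means the first 12 characters of the token id.
function buildLocalRescanMarker(addr: string, tokenId: string, detectedAt: number): string {
  return `local-rescan-${addr}-${tokenId.slice(0, 12)}-${detectedAt}`;
}

type AuditWriter = (marker: string) => void;

// Hypothetical host object standing in for the module that owns the writer field.
class RescanHost {
  auditWriter: AuditWriter | undefined; // installed later during bootstrap

  // The returned closure reads `auditWriter` when it runs, not when it is created,
  // so installing the writer after makeProbe() still works.
  makeProbe(): (marker: string) => boolean {
    return (marker: string): boolean => {
      const writer = this.auditWriter;
      if (writer === undefined) return false; // no writer yet: the AUDIT write is skipped
      writer(marker);
      return true;
    };
  }
}
```

Two tokens probed in the same millisecond now produce different markers because the token id slice differs, and a probe created before the writer is installed still sees the writer once it is in place.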
…isposition-writer feat(payments)(#174): DispositionWriter wiring for spent-state-rescan AUDIT route
#4/#7/#10/#11/#12) Audit on 2026-05-20 confirmed these items have been LANDED in code but were still marked OPEN in the follow-up tracker. Pure documentation hygiene — adds a "Status (2026-05-20)" banner block at the top of each section that points to the as-implemented code location and the existing tests / cross-references. - #1 Aggregator cross-check before orphan recovery: code at `PaymentsModule.defaultOrphanRecovery` (~3725-3800) cross-checks `oracle.isSpent` before flipping status, with the three branches the acceptance criteria required (UNSPENT → restore, SPENT → manual, throw → manual). - #3 SentLedgerWriter.contains() in-memory index: lazy tokenIndex + entryTokenIds maps populated on first call via ensureIndex(); cost-contract test updated. O(1) miss, O(b) hit. - #4 Storage GC for tombstones: gcExpiredTombstones on both writers + TombstoneGcWorker module; also wired into the snapshot builder via the gcExpiredTombstones hook (Item #15 Phase F, commit `0f530eb`). - #7 lamport:0 synthetic placeholder: writeSentEntryFromOutbox refactored to accept OutboxCreateInput (no _schemaVersion/lamport fields); the foot-gun is gone, the type system prevents reintroduction. - #10 Vector vs per-entry-key design: resolved by Item #15 (per- entry-key wins; Lamport+tombstone is the load-bearing JOIN merge function at snapshot-pull time). - #11 Operator runbooks: docs/uxf/RUNBOOK-SEND-PIPELINE.md covers all 5 #166 events + the 3 post-#166 additions (#2, #16). - #12 Consumer-facing event API docs: CLAUDE.md Key Events table + SphereEventMap JSDoc covers all 8 events. Original section bodies preserved as historical context. The status banner uses the same "> Status (date): LANDED ..." pattern already established in Item #16 (which itself documented the spent-state rescan landing across PRs #176/177/178/179). Note: `docs/uxf/ISSUE-174-PROMPT.md` and `ITEM-15-OPERATIONAL-CLOSURE-PROMPT.md` are untracked (existed before this branch). Left alone for the next cleanup pass.
…t-ON
Item #5 partial: the second of the four soak-gated flags now flips to default-ON. Item #1's aggregator cross-check prerequisite (`defaultOrphanRecovery` queries `oracle.isSpent(sourceStateHash)` before flipping status) is satisfied, so the safety contract the original default-OFF gate was waiting on is in place.
Without this flip a crashed send (process death between `commitSources` returning and the OUTBOX entry persisting) leaves the source token at `'transferring'` indefinitely. The load-tail orphan sweeper emits `transfer:orphan-spending-detected` but takes no action; the operator must intervene manually.
With the flip, the cross-checked recovery hook runs automatically:
- aggregator UNSPENT → safe to restore (flips `'transferring'` → `'confirmed'`, persists, fires `transfer:orphan-recovered`).
- aggregator SPENT → escalates to `'manual'` (commit DID land on-chain; local restore would diverge; operator triage required).
- aggregator RPC throws OR state hash unparseable → fail-closed to `'manual'`.
Test update: `detect-orphan-spending-wrapper.test.ts` had one test that exercised the legacy "default-OFF → no attemptRecovery wired" behavior by OMITTING the flag. That test now explicitly sets `orphanAutoRecovery: false` (renamed from "default-OFF" to "explicit OFF" in the test title). All 14 tests in that file pass, plus the broader 1,514-test regression sweep across payments/transfer/integration suites.
OUTBOX-SEND-FOLLOWUPS Item #5 status updated to PARTIAL with the two flipped flags called out and the two still-deferred flags (`tombstoneGcWorker`, `nostrPersistenceVerifier`) flagged as non-blocking optimisation surfaces.
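The three recovery branches can be sketched as a small fail-closed decision function. This is an illustration under assumptions, not the code in `defaultOrphanRecovery`: the `SpentOracle` interface, the `recoverOrphan` name, and the `"restored"`/`"manual"` return values are hypothetical; only `oracle.isSpent` and the branch outcomes are taken from the commit message.

```typescript
type RecoveryOutcome = "restored" | "manual";

// Assumed shape: the commit message only says `oracle.isSpent(sourceStateHash)` is queried.
interface SpentOracle {
  isSpent(stateHash: string): Promise<boolean>;
}

// Hypothetical sketch of the cross-checked recovery decision.
async function recoverOrphan(oracle: SpentOracle, stateHash: string): Promise<RecoveryOutcome> {
  try {
    const spent = await oracle.isSpent(stateHash);
    // SPENT: the commit landed on-chain, so a local restore would diverge.
    // UNSPENT: safe to flip 'transferring' back to 'confirmed'.
    return spent ? "manual" : "restored";
  } catch {
    // RPC failure or unparseable hash: fail closed and leave it to the operator.
    return "manual";
  }
}
```

The point of the shape is that every path that cannot prove the state is unspent ends at `"manual"`; only a positive UNSPENT answer restores the token.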
…-default-on feat(payments): flip features.orphanAutoRecovery default-OFF → default-ON
Closes OUTBOX-SEND-FOLLOWUPS Item #14 Phase 2 work item 5 (JOIN→local- Token correction). User-visible bug: a loser device's `unconfirmedAmount` was inflated indefinitely by the in-flight `'transferring'` token from a failed multi-device double-spend send, until manual operator intervention. Root cause: `loadFromStorageData` preserves in-memory tokens that storage doesn't overwrite (the NEVER-WIPE invariant). When the preserved snapshot has the SAME genesisTokenId as a storage token but a DIFFERENT current state hash, the previous code restored BOTH (the "different state, fork or finalization race" branch). For the multi-device double-spend scenario where the L3 aggregator has ALREADY arbitrated against the local in-flight send, "preserving the loser" is wrong — the value is gone from the wallet's perspective. Fix: tighten the restore branch in loadFromStorageData. When the snapshot token is at status='transferring' AND a storage token with the same genesisTokenId at a DIFFERENT stateHash exists, drop the snapshot (don't restore) and emit `transfer:double-spend-detected`. Reuses Item #14 Phase 1's reactive surface event — operators see the same event from EITHER the submit-time STATE_ALREADY_SPENT_BY_OTHER throw OR this JOIN-time discovery. For non-'transferring' snapshot statuses (`'confirmed'`) the legacy dual-state restore is preserved — Item #16's spent-state rescan worker (default-ON post-soak) catches the stale state on its next ~5min `oracle.isSpent` probe and routes through `defaultSpentStateTransition` (archive + tombstone + map-delete via removeToken). The orthogonal coverage is intentional: load-time JOIN divergence handles the active-send race; periodic rescan handles the sibling-device passive case. Tests: 2 new tests in `PaymentsModule.never-wipe.test.ts`: - drops `'transferring'` snapshot with divergent state + emits the event with the loser's tokenId/stateHash/empty-recipient. 
- preserves `'confirmed'` snapshot with divergent state (legacy dual-state behavior; spent-state rescan handles the cleanup). Plus the existing 5 NEVER-WIPE tests unchanged. Broader sweep: 1,587 tests pass across payments/transfer/modules suites — no regressions. OUTBOX-SEND-FOLLOWUPS Item #14 Phase 2 status updated: work item 5 LANDED; work items 7 (orphan sweeper disambiguation) and 8 (`getAssets`/balance regression test) remain open as forensic / observability surfaces — not correctness paths now that work item 5 closes the `unconfirmedAmount` inflation.
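The tightened restore branch described above can be sketched as a pure decision over a preserved snapshot token and the tokens loaded from storage. The `TokenView` shape, the `JoinDecision` values, and `decideSnapshotRestore` are hypothetical names for illustration; the sketch assumes it is only called for snapshots that storage did not overwrite.

```typescript
type TokenStatus = "confirmed" | "transferring";

// Assumed minimal view of a token; the real Token type carries much more.
interface TokenView {
  genesisTokenId: string;
  stateHash: string;
  status: TokenStatus;
}

type JoinDecision = "restore" | "drop-and-emit";

// Hypothetical sketch: drop an in-flight snapshot only when storage holds the
// same genesis token at a different state hash.
function decideSnapshotRestore(snapshot: TokenView, storageTokens: TokenView[]): JoinDecision {
  const divergent = storageTokens.some(
    (t) => t.genesisTokenId === snapshot.genesisTokenId && t.stateHash !== snapshot.stateHash,
  );
  // 'confirmed' snapshots keep the legacy dual-state restore; the periodic
  // spent-state rescan is expected to clean those up later.
  return snapshot.status === "transferring" && divergent ? "drop-and-emit" : "restore";
}
```

In the real branch, `"drop-and-emit"` corresponds to skipping the restore and emitting `transfer:double-spend-detected`.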
…sdkData-stability invariant Pre-merge review caught two findings on the JOIN-divergent loser detection. Both fixed here: H1 — Tombstone the dropped loser. When the JOIN-divergent branch fires, the previous implementation silently evicted the loser 'transferring' token with no durable record. A process restart between drop and event-consume would leave no recoverable audit trail, AND a stale remote storage source could re-sync the dead state back into the active pool on a future load. Fix: create a tombstone for the (tokenId, stateHash) pair BEFORE the event emit, using the same `createTombstoneFromToken` + `tombstoneKeySet` pattern that `removeToken` (~line 9512) uses. Test strengthened to assert the tombstone is present after the drop. H2 — Document the sdkData-stability invariant. The correctness of the JOIN-divergent loser discriminator depends on `Token.sdkData` NOT being mutated between the `'confirmed' → 'transferring'` flip and the subsequent `save()`. Today no code mutates sdkData on outgoing source tokens during the send flow (incoming tokens get synthetic pending-tx appended in `addToken`, but outgoing source tokens stay at their last-finalization stateHash through the transition). A future refactor that appends synthetic pending-tx to outgoing sources would silently break the discriminator and cause legitimate in-flight sends to be DROPPED as false-positive multi-device race losers. Fix: add a load-bearing JSDoc comment at the primary `'transferring'` flip site (~line 5275) explaining the invariant and pointing at the JOIN-divergent loser branch. Future refactors will see the warning before they trip it. All 7 tests in `PaymentsModule.never-wipe.test.ts` pass (1 strengthened); typecheck clean.
…-status feat(payments): JOIN-divergent loser detection in loadFromStorageData (Item #14 Phase 2 work item 5)
… 14, 15 Self-contained next-wave resumption guide for the remaining open items in OUTBOX-SEND-FOLLOWUPS.md after the 2026-05-20 production-readiness wave landed (PRs #176–#182). Structure (mirrors the pattern established by ISSUE-174-PROMPT.md and ITEM-15-OPERATIONAL-CLOSURE-PROMPT.md): - Audience, branch baseline, scope statement. - Recently-shipped context so the next agent doesn't re-do landed work. - Dependency graph showing item ordering (e.g. Item #6.a unblocks Item #2 closure; Item #14 work item 7 builds on Item #5). - Recommended ordering by time budget (1h, 3h, 1d, 2d, 4d, 1wk+). - Workflow conventions (branch off integration/all-fixes, run typecheck + eslint + vitest per phase, adversarial review before merge, Conventional Commits with scope, update OUTBOX-SEND- FOLLOWUPS status banner on ship). - Per-item scope/files/acceptance/test-plan/gotchas/review-checklist for items 5, 6.a, 2 (final closure), 8, 9 residual, 14 Phase 2/3 residual (split into Phase 3 stale-comment, Phase 2 work item 8 balance test, Phase 2 work item 7 orphan-sweeper disambiguation), and 15 B.4 manifest (phased: OrbitDB migration → JOIN primitive extension → snapshot dispatcher wiring). - Risk register cross-cutting all items. - Adversarial review pattern template (Agent + code-reviewer subagent) that worked across the recent wave. - "After clearing context — first steps" runbook for the next agent. All anchors (line numbers, files, function names) are verified against the current `integration/all-fixes` HEAD (`309477d`). This doc enables the next agent to pick up cold without re-deriving the dependency graph or re-reading every PR in the wave.
…ps-handoff docs: handoff prompt for OUTBOX-SEND-FOLLOWUPS items 2, 5, 6.a, 8, 9, 14, 15
OUTBOX-SEND-FOLLOWUPS item #5: third flag in the soak-gated wave to flip default-ON, after PR #178 (spentStateRescan) and PR #181 (orphanAutoRecovery).
The TombstoneGcWorker reclaims OrbitDB log bytes by replacing tombstone markers older than `retentionMs` (default 30 days) with `db.del()` calls. The 30-day default is conservative (longer than any realistic concurrent-replica pre-sync window per Issue #166 P1 #2 safety contract), so swept slots cannot be resurrected by a stale replica. The worker self-skips when no OUTBOX/SENT writer is installed, so default-ON is a safe no-op for legacy-only wallets.
Changes:
- `modules/payments/PaymentsModule.ts`: flip `?? false` → `?? true` at the default-feature block; update inline comment and the type JSDoc to reflect default-ON (mirrors PR #178's spentStateRescan pattern). Also updates the auto-install path's inline comment to stop describing the gate as default-OFF.
- `modules/payments/transfer/tombstone-gc-worker.ts`: update module-level JSDoc to reflect default-ON.
- `docs/uxf/RUNBOOK-SEND-PIPELINE.md`: config-reference table now reads `tombstoneGcWorker: true`. Also fixes stale `orphanAutoRecovery: false` line as adjacent doc hygiene (PR #181 flipped the code default but missed this table row).
- `docs/uxf/OUTBOX-SEND-FOLLOWUPS.md`: Item #5 banner extended.
- `tests/unit/modules/PaymentsModule.tombstone.test.ts`: add `afterEach(destroy())` and explicit `features.tombstoneGcWorker: false` to suppress the now-default 24h setTimeout handle in this read/merge-only suite (caught by adversarial review).
Verified: typecheck clean, eslint clean (pre-existing warnings only), 1423 transfer tests pass, 21 tombstone tests pass.
OUTBOX-SEND-FOLLOWUPS item #5 — fourth and final soak-gated flag to flip default-ON, completing the wave (after PR #178 spentStateRescan, PR #181 orphanAutoRecovery, PR #184 tombstoneGcWorker). The NostrPersistenceVerifier periodically re-queries the relay set for SENT-ledger entries' nostrEventId to detect retention drops (events accepted at publish but later evicted by retention policy, relay restart, or segregation). On 'missing' outcome the verifier re-arms the OUTBOX entry to 'sending' so the recovery worker republishes via Item #2's path. Query traffic is proportional to eligible SENT volume with an LRU- bounded cap and per-entry cooldown (default 5 minutes); the worker self-skips wallets with no `nostrEventId`-tagged SENT entries. Changes: - `modules/payments/PaymentsModule.ts` — flip `?? false` → `?? true` at the default-feature block (line 1626); update inline comment + type JSDoc to reflect default-ON. - `docs/uxf/RUNBOOK-SEND-PIPELINE.md` — config-reference table now reads `nostrPersistenceVerifier: true`. Also fixes adversarial- review H1: two stale `default-OFF` mentions of orphanAutoRecovery in the `transfer:orphan-spending-detected` and `transfer:orphan-recovered` operator sections (PR #181 flipped the code default but missed these in the same RUNBOOK file). - `docs/uxf/OUTBOX-SEND-FOLLOWUPS.md` — Item #5 banner now SHIPPED (all four flags flipped); "How to NOT resume" rewritten to capture the post-flip protocol consequences. - `tests/unit/modules/payments/detect-orphan-spending-wrapper.test.ts` — add `nostrPersistenceVerifier: false, tombstoneGcWorker: false` to every features block (10 createPaymentsModule callsites). This is the H2 follow-up from adversarial review: the file does not use fake timers and the verifier's 5-min setTimeout would otherwise leak as an open handle under CI `--detectOpenHandles`. Verified: typecheck clean, 3103 tests (tests/unit/payments/transfer + tests/unit/modules) pass.
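The `?? false` → `?? true` flip, and the reason the test suites now pass an explicit `false`, can be sketched as follows. The four flag names are the ones named in this wave; the `Features` interface and `resolveFeatures` function are hypothetical stand-ins for the default-feature block in `PaymentsModule.ts`.

```typescript
// Assumed shape: only the four flag names are taken from the commit messages.
interface Features {
  spentStateRescan?: boolean;
  orphanAutoRecovery?: boolean;
  tombstoneGcWorker?: boolean;
  nostrPersistenceVerifier?: boolean;
}

// Hypothetical resolver: `??` falls back only on undefined/null,
// so an explicit `false` from the caller survives the default.
function resolveFeatures(features: Features = {}): Required<Features> {
  return {
    spentStateRescan: features.spentStateRescan ?? true,
    orphanAutoRecovery: features.orphanAutoRecovery ?? true,
    tombstoneGcWorker: features.tombstoneGcWorker ?? true,
    nostrPersistenceVerifier: features.nostrPersistenceVerifier ?? true,
  };
}
```

A suite that does not use fake timers would pass `{ tombstoneGcWorker: false, nostrPersistenceVerifier: false }` to keep the workers (and their timer handles) from starting, while omitting a flag now means ON.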
…nd trigger sources OUTBOX-SEND-FOLLOWUPS Item #14 Phase 3 — pure docs cleanup. No code behavior changes; only references to as-implemented state. Closes work item 6 from the Item #14 punch list (stale comment + CLAUDE.md / RUNBOOK additions referenced by work items 4 and 6). Changes: - `profile/pointer-wiring.ts:30-47` — replace stale per-token JOIN resolver caveat (which claimed Rules 3 + 4 were absent) with a forward reference to the as-implemented surfaces: * `resolveTokenRoot` (`uxf/token-join.ts:210`) — exported, unit- tested, callers in `UxfPackage.merge()` (`~785`) and `conflict-merger.ts` (`~351`). * `PaymentsModule.loadFromStorageData` JOIN-divergent loser branch (`~15112`, PR #182) — drops superseded `'transferring'` snapshots with a tombstone. * `transfer:double-spend-detected` reactive surface (Item #14 Phase 1). - `CLAUDE.md` Key Events table — add two missing rows that the recent waves added but never back-ported: * `transfer:double-spend-detected` — names BOTH trigger sources (reactive submit-time on `STATE_ALREADY_SPENT_BY_OTHER` and JOIN-time on snapshot loser detection). * `transfer:off-record-spent` — Issue #174 / spent-state rescan. - `docs/uxf/RUNBOOK-SEND-PIPELINE.md` "Companion events" block under `transfer:off-record-spent` — refresh stale "fires only when YOU attempt a send" wording to describe both trigger sources. - `docs/uxf/OUTBOX-SEND-FOLLOWUPS.md` Item #14 status — Phase 3 banner flipped to LANDED. Verified: typecheck clean, lint clean for `profile/pointer-wiring.ts`.
OUTBOX-SEND-FOLLOWUPS Item #14 Phase 2 work item 8: pin the post- PR-#182 JOIN-divergent loser drop contract through getAssets(). Pre-PR #182 the loser device's `unconfirmedAmount` was inflated indefinitely by the in-flight `'transferring'` token from a failed multi-device double-spend send (aggregateTokens includes `'transferring'` in the unconfirmed bucket). PR #182 fixed the root cause by dropping the loser from `this.tokens` entirely; the loser should be absent from `confirmedAmount`, `unconfirmedAmount`, `totalAmount`, and every per-bucket count. This test pins that contract end-to-end through the `getAssets()` public API surface — not just the internal token map. A regression that re-introduces the loser (whether by restoring the snapshot or via a future refactor that bypasses the JOIN-divergent branch) will fail at `expect(uct.unconfirmedAmount).toBe('0')`. The test reuses the loser/winner fixture from the existing JOIN- divergent loser tests (added by PR #182), so the input shape stays in lockstep — the only new assertion surface is the `getAssets()` output. Also updates `OUTBOX-SEND-FOLLOWUPS.md` Item #14 status banner: Phase 2 work item 8 LANDED; only work item 7 (orphan sweeper disambiguation) remains open. Verified: 8 tests pass in `PaymentsModule.never-wipe.test.ts` (7 pre-existing + 1 new); typecheck + lint clean.
…get) OUTBOX-SEND-FOLLOWUPS Item #6.a — close the prerequisite for Item #2's final closure. Inline-CAR sends now leave a local IPFS pin on the sender's node so Item #2's retention re-publish closure can downgrade 'car-over-nostr' republishes to CID-shape unconditionally. Design choice: resolver stays a pure decision function. The `DeliveryDecision`'s `inline` shape gains a `shouldPin?: boolean` field set to `true` iff a `publishToIpfs` callback was wired on the resolver call. Orchestrators (conservative-sender, instant-sender) read this flag after `resolveDelivery` returns and fire-and-forget a parallel `publishToIpfs(carBytes)` call. Strictly fire-and-forget: - Pin failure MUST NOT block the send (wire delivery is already inline — recipient is not waiting on the pin). - The pin runs in parallel with the rest of the wire-envelope construction; the orchestrator never awaits it. - Idempotent: re-running publishToIpfs for the same CAR bytes is a no-op at the IPFS layer (content-addressed). - Trampoline through `Promise.resolve().then(...)` so even a non- async publisher that throws SYNCHRONOUSLY is still caught. Three inline-returning branches in `delivery-resolver.ts` set `shouldPin` consistently via the new private `inlineDecision()` helper: - `force-inline` — `shouldPin: true` if publisher wired. - `auto`-within-cap — `shouldPin: true` if publisher wired (coexists with `clampInfo` on the same shape). - `carInlineFallback` — always `shouldPin: undefined` by construction (this branch is reachable ONLY when `publishToIpfs` is absent; documented as a load-bearing invariant). CID branches are untouched — the resolver still calls publishToIpfs internally for them, and `shouldPin: true` on the CID shape preserves its existing contract. Files: - `modules/payments/transfer/delivery-resolver.ts` — type extension + three call sites + new `inlineDecision()` helper. - `modules/payments/transfer/conservative-sender.ts` — new "Step 8.5" block (fire-and-forget pin); logger import. 
- `modules/payments/transfer/instant-sender.ts`: same block after the CID pinned-outbox write; logger import.
- `tests/unit/payments/transfer/delivery-resolver-pin.test.ts`: 11 new tests covering the `shouldPin` contract across all three inline branches plus the resolver-purity guarantee.
Adversarial review pre-merge applied two non-blocking findings in-PR:
- H1: `Promise.resolve().then(() => publish(carBytes)).catch(...)` micro-task trampoline against synchronous-throw publishers.
- H2: load-bearing invariant comment on `carInlineFallback` documenting why `shouldPin` is never set there (always `undefined`, because the branch is reachable only when `publishToIpfs` is absent).
Verified: 3,181 tests pass (tests/unit/payments + tests/unit/modules); typecheck clean; pre-existing lint warnings only.
Next in stack is Item #2 final closure: downgrade the default `republish` closure in PaymentsModule for `'car-over-nostr'` entries to produce CID-shape re-publishes (the pin from this PR makes the CID fetchable).
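The fire-and-forget pin with its micro-task trampoline can be sketched as below. The `Promise.resolve().then(...).catch(...)` chain is the pattern quoted in H1; the `pinInBackground` wrapper, the `PublishToIpfs` type, and the `onError` callback are hypothetical names (the real orchestrators presumably log through their own logger).

```typescript
// Assumed signature: a publisher may be async, or may be a plain function that throws.
type PublishToIpfs = (carBytes: Uint8Array) => unknown;

// Hypothetical wrapper: never awaited by the caller, never throws at the call site.
function pinInBackground(
  publish: PublishToIpfs,
  carBytes: Uint8Array,
  onError: (err: unknown) => void,
): void {
  // Calling publish() inside .then() defers it to a microtask, so a synchronous
  // throw becomes a rejection on the chain instead of escaping to the sender.
  void Promise.resolve()
    .then(() => publish(carBytes))
    .catch(onError);
}
```

Calling `publish(carBytes).catch(...)` directly would not be equivalent: a non-async publisher that throws before returning a promise would throw into the send path.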
…nal closure) OUTBOX-SEND-FOLLOWUPS Item #2 final closure. The default `republish` closure in `PaymentsModule` now produces a `'uxf-cid'` payload for both `'cid-over-nostr'` AND `'car-over-nostr'` OUTBOX entries — Item #6.a (PR #188) flipped inline-CAR sends to also pin the bundle to the sender's local IPFS node, so the CID is fetchable for all new entries. Closes the long-standing throw arc that left CAR-mode retention re-publishes stuck at `'failed-transient'` with no recovery path. Code changes: - `modules/payments/PaymentsModule.ts:~1937-2040` — merge `case 'car-over-nostr':` into the `'cid-over-nostr':` fall-through. Both arms produce the same `'uxf-cid'` payload (advisory `mode` field preserved; `'txf'` → `'instant'` mapping unchanged). `'txf-legacy'` and `default:` retain their throws (single-token legacy wire shape / exhaustiveness sentinel). - `tests/unit/modules/payments/recovery-worker-shim.test.ts` — flip the `'car-over-nostr'` test from `'throws (transport never called)'` to `'downgrades to kind: uxf-cid'`. Asserts payload shape, status transition to `'delivered'`, and SENT-write fires (the recovery arc reaches terminal success). Doc changes: - `docs/uxf/OUTBOX-SEND-FOLLOWUPS.md` — Item #2 banner now SHIPPED; references PRs #188 (#6.a prereq) and #189 (this closure). - `docs/uxf/RUNBOOK-SEND-PIPELINE.md`: * `transfer:retention-republish-rearmed` — new operator-action bullet for the cross-restart livelock signal (legacy pre-#6.a entries lacking a local pin). Provides three intervention paths: re-pin manually, accept-and-close, or install a custom `republish` closure. * Same section — new note clarifying that `'delivered'` after a retention re-publish confirms relay reach, NOT recipient bundle fetch. For pre-#6.a entries this distinction matters; for post- #6.a entries the recipient's CID-fetch should succeed. * `transfer:retention-republish-skipped` `'transition-failed'` row — the historical CAR-throw cause is GONE; other transient causes remain. 
Trade-off analysis (per adversarial review C1 + H1): - Post-#6.a CAR entries: pin exists locally → recipient CID-fetch succeeds → clean recovery (the common case). - Pre-#6.a CAR entries: no local pin → recipient CID-fetch fails → verifier's next-cycle retention probe detects `'missing'` again → re-arm → re-publish loop. Bounded per session by the verifier's `checkedIds`; cross-restart unbounded but SentLedgerWriter is per-id idempotent so no SENT-ledger growth. Operator-invisible without verifier; mitigated via the new RUNBOOK livelock-detection guidance. - Custom `republish` closures (e.g. with richer pin-availability signals) are unaffected — install via `installSendingRecoveryWorker()` to restore strict-throw semantics if needed. Verified: typecheck clean, pre-existing lint warnings only, 3,115 tests pass (tests/unit/payments/transfer + tests/unit/modules), 3 retention-republish-after-snapshot-join integration tests pass.
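The merged switch arm can be sketched like this. The three mode strings and the `'uxf-cid'` kind are taken from the commit message; `buildRepublishPayload`, the `RepublishPayload` shape, and the `bundleCid` field are hypothetical simplifications of the real closure.

```typescript
type DeliveryMode = "cid-over-nostr" | "car-over-nostr" | "txf-legacy";

// Assumed payload shape; the real payload also carries an advisory `mode` field.
interface RepublishPayload {
  kind: "uxf-cid";
  bundleCid: string;
}

// Hypothetical sketch of the default republish decision after the merge.
function buildRepublishPayload(mode: DeliveryMode, bundleCid: string): RepublishPayload {
  switch (mode) {
    case "car-over-nostr": // previously threw; now falls through to the CID arm
    case "cid-over-nostr":
      return { kind: "uxf-cid", bundleCid };
    case "txf-legacy":
      throw new Error("txf-legacy entries use a single-token wire shape");
    default: {
      // Exhaustiveness sentinel: fails to compile if a new mode is added unhandled.
      const unreachable: never = mode;
      throw new Error(`unknown delivery mode: ${String(unreachable)}`);
    }
  }
}
```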
…gc-default-on feat(payments)(#5): flip features.tombstoneGcWorker default-ON
…istence-verifier-default-on feat(payments)(#5): flip features.nostrPersistenceVerifier default-ON
…comments docs(#14): Phase 3 stale-comment cleanup — JOIN resolver + double-spend trigger sources
…balance-regression test(payments)(#14): Phase 2 work item 8 — getAssets balance regression
…fs-pin feat(payments)(#6a): pin inline-CAR sends to local IPFS (fire-and-forget)
The default recipient finalization worker built by buildDefaultFinalizationWorkerRecipient previously wrote a placeholder manifest entry (rootHash = 32 zero bytes, status = 'pending') inside its aggregatorClient.poll callback whenever a proof was returned for the first time on a tokenId. This contradicted the §5.5 step 5 4-step write-order contract: ownership of the manifest entry is assigned to step2ManifestCidRewrite, which CASes on RequestContext.previousCid. The recipient enqueue path populates previousCid as undefined (the genesis case), which step 2 translates to prev = null — asserting "no entry exists" in the manifest store. The placeholder violated that assertion: step 2 read the placeholder, returned cas-mismatch (observed.rootHash != newCid, so the idempotency-skip branch did not fire), and threw ManifestCidRewriteCasError on every received token. Symptom in the wild (issue #195): the escrow swap deposit flow stalls at PARTIAL_DEPOSIT because the deposit token never flips to 'confirmed'. Sequential deposits ~50 s apart still reproduce — every poll independently re-traps the same CAS failure. Casual receives also break, but the bug was masked because the local Token still shows in the UI and users do not see the silent 'pending' stuck-state unless a downstream state machine (swap, invoice attribution) gates on confirmation. Fix: remove the placeholder write. step 2's CAS then runs cleanly against the empty store, accepts prev = null, and inserts the canonical first entry via writeEntry. Companion fix (profile/per-token-mutex.ts): the bounded-hold strategy logged EVERY detached-fn rejection as "detached fn rejected after timeout", even when the rejection arrived in microseconds (long before the timer fired). That wording sent operators chasing imaginary hold-time blowups when the real failure was a synchronous error inside fn (the CAS failure above, for example). 
Gate the warn on the timer actually having fired so the awaiter's own error flow surfaces pre-timeout rejections without log duplication, while post-timeout rejections still surface for disk-full / quota-exceeded observability. Tests: - tests/unit/payments/transfer/issue-195-recipient-manifest-placeholder.test.ts (6 tests) — pins the genesis contract (with vs. without placeholder), the source guard against reintroducing the placeholder, and the pre-vs-post-timeout log gating on PerTokenMutex. All 172 transfer + profile unit test files (3211 tests), 93 module unit test files (1681 tests), and 21 transfer integration test files (74 tests) pass. typecheck clean. lint of touched files clean.
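Why the placeholder broke the genesis case can be seen in a toy compare-and-swap store. This is a sketch under assumptions: `ManifestStore`, `compareAndSwap`, and `writePlaceholder` are hypothetical, and the store maps a token id straight to a root hash, which is simpler than the real manifest entry.

```typescript
type CasResult = "written" | "idempotent-skip" | "cas-mismatch";

// Hypothetical in-memory stand-in for the manifest store.
class ManifestStore {
  private entries = new Map<string, string>(); // tokenId -> rootHash (hex)

  compareAndSwap(tokenId: string, prev: string | null, next: string): CasResult {
    const observed = this.entries.get(tokenId) ?? null;
    if (observed === next) return "idempotent-skip"; // already at the target value
    if (observed !== prev) return "cas-mismatch";    // caller's view of the store is stale
    this.entries.set(tokenId, next);
    return "written";
  }

  // The removed behavior: a 32-zero-byte rootHash written before step 2 runs.
  writePlaceholder(tokenId: string): void {
    this.entries.set(tokenId, "00".repeat(32));
  }
}
```

With `prev = null` (the genesis case), the CAS succeeds against an empty store but reports `"cas-mismatch"` once a placeholder exists, because the placeholder is neither `null` nor equal to the new CID.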
…onWriter After the CAS-mismatch fix unblocked step 5's manifest CID rewrite for inbound deposits, the swap settlement E2E (sphere-cli #16) still hung at PARTIAL_DEPOSIT. Root cause: the recipient finalization worker's dispositionWriter VALID branch flips the local Token to 'confirmed' and persists it, but never emits `transfer:confirmed`. AccountingModule listens for `transfer:confirmed` to mark invoice ledger entries as `confirmed` and re-fire `invoice:covered` with `confirmed: true`. Without that event, `allConfirmed` stays false, the second `invoice:covered` never fires, and any downstream consumer that gates on confirmation (escrow swap orchestrator, payment-request flow, etc.) stalls indefinitely waiting for aggregator confirmation that silently already happened. The send-side `transfer:confirmed` emit (PaymentsModule.ts:5677) and the NOSTR-FIRST inbound path (line 13928) and the V5 resolver (line 8180) all already emit. Only the recipient T.5.C dispositionWriter was missing it — making this a default-recipient-worker gap that has been latent since #151 / Task 151 (Wave 2 bootstrap). Fix: emit `transfer:confirmed` from both the main success path (after `await save()` succeeds + ctx delete) and the stClient/trustBase-missing fallback path (status-flip-only save+ctx delete). Payload shape mirrors the existing emit sites. Test update: widened the Wave 5 source-inspection regex's post-delete window from 400 to 1200 chars to accommodate the new emit block. The structural invariant the test pins (delete is AFTER save(), INSIDE the try block) is unchanged. E2E verification: - sphere-cli PR #16 `E2E_RUN_SWAP_FULL=1` run 6 against locally-built escrow:v0.4-issue195-local: all 3 tests passed. Full settlement: 147s (well under the 600s budget). 
  - Swap announced 23:02:08
  - Bob deposit 23:02:14, Alice deposit 23:02:42
  - invoice:covered confirmed=true → deposit invoice closed → payouts 23:03:15 (33s after second deposit)
  - "Swap completed successfully" 23:03:25 (43s after second deposit)
  - No CAS-mismatch, no PerTokenMutex bounded-hold timeout fired.
Test suites: all 4892 transfer + profile + module unit tests pass.
… sdkData caveat
Addresses two findings from the steelman review of the previous two
commits on this branch:
1. **Test gap**: the new regression file
(issue-195-recipient-manifest-placeholder.test.ts) covered the
placeholder removal but not the `transfer:confirmed` emit added by
the dispositionWriter follow-up. A future regression could silently
remove the emit and the source-inspection regex would not catch it.
Adds 4 unit tests under section I of
PaymentsModule.wave4-regressions.test.ts (which already has the
vi.mock infrastructure for state-transition-sdk):
- Main success path → emits exactly one transfer:confirmed with the
finalized token (sdkData = finalized form).
- stClient/trustBase-missing fallback path → emits exactly one
transfer:confirmed with the pre-finalized token (sdkData stays in
sender-predicate form — pinned explicitly to document the caveat).
- Main path save() throws → NO transfer:confirmed emit (the emit
lives inside `try { await save(); ... emit; }` and a save throw
short-circuits). operator-alert still fires.
- Fallback path save() throws → NO transfer:confirmed emit, same
contract.
2. **Documentation gap**: the fallback-path emit fires with a Token
whose `sdkData` is in SENDER-PREDICATE form (the recipient never
ran `finalizeTransferToken` — stClient/trustBase were missing).
The token is correctly marked 'confirmed' for accounting purposes
but is NOT spendable until the NOSTR-FIRST finalization path
overwrites `sdkData` with the recipient-predicate form. Listeners
that read `sdkData` for spend operations must guard against this
intermediate state.
Adds a CAVEAT block in the in-source comment at the fallback emit
site documenting this contract.
All 4942 transfer + profile + module unit tests pass. typecheck clean.
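The emit-after-save contract that the four new tests pin can be sketched as below. The event names are the ones used in the commit message; `flipToConfirmed` and its callback parameters are hypothetical, and the real dispositionWriter does considerably more (status flip, ctx delete, payload construction).

```typescript
// Hypothetical sketch: the emit lives inside the try, after the awaited save.
async function flipToConfirmed(
  save: () => Promise<void>,
  emit: (event: string) => void,
): Promise<boolean> {
  try {
    await save();
    emit("transfer:confirmed"); // only reached when the save succeeded
    return true;
  } catch {
    emit("transfer:operator-alert"); // a failed save surfaces an alert, never a confirmation
    return false;
  }
}
```

Placing the emit after the awaited `save()` means listeners such as an accounting module never observe a confirmation for state that was not persisted.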
…anifest-placeholder-cas fix(payments)(#195): unblock recipient finalization for inbound deposits
Single SDK routine that walks a Token's `sdkData.transactions` chain and attaches an aggregator inclusion proof to every entry whose `inclusionProof` is null/missing. Idempotent (no-op + same reference when chain is fully finalized). Reused by every wallet path that needs to ensure a token's chain is fully proof-attached before being shipped or verified.
Why a single routine: a local `status === 'confirmed'` flag is INDEPENDENT of `sdkData.transactions[*].inclusionProof` completeness (per Issue #197 root cause). The recipient's `Token.verify(trustBase)` walks EVERY tx; one null proof rejects the whole token. Centralizing the walker + attachment logic prevents per-path drift.
Public surface:
- finalizeSourceTokenChain(token, oracle, opts?): high-level routine, the entry point callers should use.
- extractPendingChainFromSdkData(json): pure helper, also used on the recipient ingest path to detect proofless intermediates in arrived bundles.
- extractPendingSourceChain(token): token-shaped wrapper.
- derivePendingTxDescriptor(pendingTx): SDK requestId derivation mirroring the recipient ingest path; kept in lockstep.
- applyProofToSdkData(json, txIndex, proof): pure patcher.
Internally driven by preflightFinalize: per pending tx the routine derives (requestId, transactionHash) from the tx data, probes the aggregator (no re-submit; the previous owner already anchored the commitment), and patches a working sdkData JSON in lock-step. `SOURCE_CHAIN_HARD_FAIL` propagates verbatim on irrecoverable failure.
26 unit tests cover the pure helpers (extract/apply variants, edge cases including null-data placeholder skip, malformed JSON, all-null, mixed chains, source-order preservation). The SDK-bound derivation path is exercised by the integration tier (tests/integration/transfer/conservative-end-to-end.test.ts).
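The walker's idempotence and same-reference guarantee can be sketched with a simplified, synchronous stand-in. Everything here is an assumption made for illustration: the `ChainTx`/`SdkData` shapes, `pendingIndexes`, `finalizeChain`, and the synchronous `lookupProof` callback (the real routine probes an aggregator asynchronously and derives request ids through the SDK).

```typescript
// Assumed minimal shapes; the real sdkData JSON is richer.
interface ChainTx {
  data: string;
  inclusionProof: string | null;
}
interface SdkData {
  transactions: ChainTx[];
}

// Indexes of transactions that still lack a proof, in source order.
function pendingIndexes(sdk: SdkData): number[] {
  const out: number[] = [];
  sdk.transactions.forEach((tx, i) => {
    if (tx.inclusionProof == null) out.push(i);
  });
  return out;
}

// Hypothetical synchronous stand-in for the finalize routine.
function finalizeChain(sdk: SdkData, lookupProof: (tx: ChainTx) => string | null): SdkData {
  if (pendingIndexes(sdk).length === 0) return sdk; // fully finalized: same reference, no lookups
  const transactions = sdk.transactions.map((tx, i) => {
    if (tx.inclusionProof != null) return tx; // already proven: keep untouched
    const proof = lookupProof(tx);
    if (proof === null) throw new Error(`SOURCE_CHAIN_HARD_FAIL: no proof for tx ${i}`);
    return { ...tx, inclusionProof: proof };
  });
  return { ...sdk, transactions }; // patched copy; the input is not mutated
}
```

Returning the input reference on the happy path lets callers detect "nothing changed" with a plain `===` check, and running the routine a second time on its own output is a no-op.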
Wire the standard finalizeSourceTokenChain routine into two paths: 1. dispatchUxfConservativeSend.selectSources (PRIMARY FIX): finalize every pending tx in every selected source's chain BEFORE marking them transferring and BEFORE bundle construction. Closes the reported escrow → trader payout hang where conservative-mode bundles shipped with proofless intermediate txs, the recipient's SdkToken.fromJSON threw, the throw was silently swallowed by an outer catch, and every subsequent Token.verify(trustBase) rejected. The previous no-op preflight relied on SpendQueue's `status === 'confirmed'` filter as the sole gate. Per Issue #197 root cause this is unsafe: a locally-confirmed token can still carry proofless txs (e.g. via the recipient dispositionWriter fallback flip from Issue #195, or an instant-mode arrival whose deferred worker never ran). The conservative-sender's preflightOptions remains a documented no-op — the finalization happens earlier so we get fresh Token references through directSources/splitSources without fighting the readonly `Token.sdkData` contract. 2. Recipient ingest (SECOND LINE OF DEFENSE): when an arriving bundle's source chain contains proofless intermediates, proactively attempt repair via finalizeSourceTokenChain. On unrecoverable failure surface a structured `transfer:operator-alert` (code='proof-throw') with the offending tx indexes and abort ingestion — replaces the previous silent wedging behaviour. Behavior preserved on the happy path: tokens whose chains are already fully finalized return the same reference (no allocation, no aggregator round-trip) so the per-send overhead is one map lookup plus a JSON parse on the source's sdkData. All 1480 transfer-tier tests pass; full suite 7828 passing / 13 skipped / 0 failures.
…ith real SDK fixtures
Exercises the SDK-bound derivation path (`derivePendingTxDescriptor`
calls `TransferTransactionData.fromJSON` + `PredicateEngineService.createPredicate`
+ `RequestId.create`) that the pure-helpers unit tier skips.
Three scenarios:
1. Token with real SDK transferTxJson + inclusionProof:null → mock
aggregator serves the proof → routine returns a NEW Token with
sdkData patched at the right tx index, updatedAt advanced.
2. Token whose chain is already fully finalized → routine returns
the SAME reference (idempotency contract); aggregator never
called.
3. Aggregator returns null past retry budget → routine raises
`SOURCE_CHAIN_HARD_FAIL`.
Live happy-path coverage (sender → recipient via real Nostr +
aggregator) already exists in tests/e2e/uxf-send-receive.test.ts;
conservative-mode scenario re-run after the fix shows status=completed,
0 failed, Bob's re-spend works.
…d-local-infra-images feat(infra)(#321): ssl-manager-wrapped local-infra images for relay/faucet/aggregator
The send-path gate from PR #312 hard-refused `payments.send()` with SphereError('OFFLINE') whenever the connectivity probe read 'down'. A transient testnet aggregator outage during the CLI soak (§C.2, Bob pays invoice) propagated to this refuse, and the same code path surfaced as POST /goggregator-test → 400 spam in the browser load hang investigation.
The probe layer is a Sphere-SDK invention: state-transition-sdk exposes no health/ping/round API. Its StateTransitionClient public surface is submitMintCommitment / submitTransferCommitment / finalizeTransaction / getInclusionProof / isStateSpent / isTokenStateSpent / isMinted. The standard ST-SDK pattern is "call the real op; transport throws JsonRpcNetworkError on failure", with no preflight probing. Refusing on a probe blocks sends a recovered aggregator would accept (between probe and submit) and re-implements behavior ST-SDK does not have a contract for.
Change:
- modules/payments/PaymentsModule.ts: replace the OFFLINE throw with a logger.warn; gate now logs 'down' for operator visibility but always passes through to the dispatcher.
- core/Sphere.ts: doc comment on `connectivity` getter (advisory semantics, ST-SDK rationale).
- tests/unit/payments/connectivity-gate.test.ts: flip the 'down → throws' assertion to 'down → logs + proceeds', tighten the unwire test to verify the gate is consulted exactly once before unwiring.
Kept for backward compat:
- SphereErrorCode 'OFFLINE' remains in core/errors.ts. No caller emits it now, but removing the code is a separate type-surface break.
- `configureConnectivityGate` / `sphere.connectivity` API are unchanged; only the throw is removed.
Verification:
- npx vitest run tests/unit/payments tests/unit/core/connectivity → 1587 / 1587 pass
- npm run typecheck → 0 errors
- npx eslint on touched files → 0 errors (pre-existing warnings only)
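The advisory semantics can be sketched as follows. This is an illustration only: `consultGateAdvisory` and `sendSketch` are hypothetical helpers, not the module's actual functions, and the logger shape is an assumption.

```typescript
type Connectivity = "up" | "degraded" | "down";
interface Logger { warn(msg: string): void }

// Hypothetical helper: consults the probe, logs when it reads 'down', never throws.
function consultGateAdvisory(read: () => Connectivity, logger: Logger): Connectivity {
  const state = read();
  if (state === "down") {
    logger.warn("connectivity probe reads 'down'; proceeding to dispatcher anyway");
  }
  return state;
}

function sendSketch<T>(read: () => Connectivity, logger: Logger, dispatch: () => T): T {
  consultGateAdvisory(read, logger); // advisory only, result does not gate the send
  return dispatch(); // the real operation surfaces transport errors itself
}
```

The design choice is that the dispatcher, not the probe, decides whether a send fails, so a stale 'down' reading cannot block a send the aggregator would accept.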
…isory fix(payments)(#312): soften connectivity gate to advisory
Browser Profile wallets used Helia's default `MemoryBlockstore` because neither the factory (`createBrowserProfileProviders`) nor the adapter configured a persistent blockstore. OrbitDB's level state IS persisted to IndexedDB, so the head CID pointer survived page reloads — but the blocks it referenced lived only in tab memory. After unload, every subsequent append failed with the same error, forever (the SDK's own JSDoc at `orbitdb-adapter.ts:1228-1232` describes the exact symptom). The companion `helia-blockstore-pin-shim` (#311) calls `pins.add(cid)` after every put — but pinning a memory blockstore is meaningless. The defence only works against a persistent backing store. Fix: in the browser branch (no `directory` configured AND `isBrowserEnvironment()`), dynamically import `blockstore-idb` and install `IDBBlockstore` into `heliaOptions.blockstore`, symmetric to the existing Node-side `FsBlockstore` path. DB name defaults to `'sphere-helia-blocks'` and is overridable via the new `OrbitDbConfig.browserBlockstorePath` for per-wallet isolation. Closes the OpLog-loss leg of #330. Cross-device durability is fixed separately by the inline pointer-publish gate (also #330).
PR #272 moved HEAD-verify of newly-pinned CIDs off the synchronous flush path (fixing an at-least-once Nostr replay loop under contended testnet). The side effect was that the aggregator pointer could be advanced — and `pendingPublishCid` cleared — before the just-pinned snapshot CID was durably fetchable from the operator gateway. A cross-device reader resolving the pointer would 404 on the snapshot CID, which is exactly the symptom in #330. Fix: add an inline HEAD-verify gate inside `publishAggregatorPointerBestEffort`. After a successful aggregator publish, before clearing `pendingPublishCid`, run a bounded HEAD-verify against the configured gateways. On gate timeout the publish is classified as transient (code `IPFS_NOT_YET_DURABLE`), the marker is KEPT for retry, and `storage:pending-publish` is emitted. The flush itself completes — the at-least-once Nostr ack gate is unchanged, only the pending-publish retry marker is held. Tunable via `ProfileConfig.pointerPublishDurabilityGateMs`: - direct-construction default: 0 (off, preserves legacy test behaviour) - factory default (`createProfileProviders`): 5_000 ms (production) Tests in `lifecycle-manager-publish-durability-gate-330.test.ts` cover all three branches (off / verify-succeeds / verify-times-out).
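A rough sketch of the bounded gate described above, assuming a hypothetical `headVerify` probe that resolves once the CID is fetchable. The names and result shape are illustrative; only the `IPFS_NOT_YET_DURABLE` code and the "0 means off" convention come from the commit message. A rejecting probe simply propagates in this sketch, which the source does not specify.

```typescript
type PublishResult =
  | { ok: true }
  | { ok: false; transient: true; code: "IPFS_NOT_YET_DURABLE" };

async function durabilityGateSketch(
  headVerify: () => Promise<void>, // hypothetical probe against the gateways
  gateMs: number,
): Promise<PublishResult> {
  if (gateMs <= 0) return { ok: true }; // gate off (legacy behaviour)
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), gateMs);
  });
  try {
    const outcome = await Promise.race([headVerify().then(() => "ok" as const), timeout]);
    return outcome === "ok"
      ? { ok: true }
      : { ok: false, transient: true, code: "IPFS_NOT_YET_DURABLE" }; // keep the retry marker
  } finally {
    if (timer !== undefined) clearTimeout(timer); // do not leave the timer pending
  }
}
```

A caller would clear its pending-publish marker only on `{ ok: true }` and keep it for retry on the transient result.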
…y on migration
The Profile token-storage provider had no fallback when the primary
(OrbitDB) read returned empty or threw. After memory-blockstore
eviction + gateway 404, the wallet silently reported "0 tokens" —
losing tokens that WERE durable in the legacy IndexedDB token storage
from before the wallet migrated to Profile mode. The companion
identity-side `fallbackStorage` only covers ~8 named identity keys
(`MASTER_KEY`, `CHAIN_CODE`, ...); tokens had no analogous safety net.
Worse, migration step 5c actively wiped the legacy token IDB
(`legacyTokenStorage.clear()`), so even when the legacy DB still
contained tokens at migration time, those bytes were destroyed before
the new fragile Profile path took over.
Two-part fix:
1. **Preserve legacy on migration.** `profile/migration.ts` step 5c
no longer wipes the legacy token storage. Instead it writes a
`migration.migratedAt` marker into the legacy KV store. Token data
stays in place, available as a read-only fallback. The marker is
added to step 5b's preserve list so re-runs do not nuke it.
2. **Wire a runtime fallback path.** Add `fallbackTokenStorage` to
`SphereInitOptions` / `SphereLoadOptions` (the token-side analogue
of `fallbackStorage`). `Sphere.load()` propagates it via the new
`ProfileTokenStorageProvider.setFallbackTokenStorage(legacy)`
method. The provider consults the fallback at three sites inside
`load()`:
- `activeBundles.size === 0` AND no in-memory seed (would
otherwise return empty).
- All bundle fetches failed despite bundle refs existing.
- Outer catch (e.g. `LoadBlockFailedError` from missing OpLog
block — the `bafyreihmwunmk75i3h…` symptom in #330).
Strictly read-only: the fallback is never written to.
`addTokenStorageProvider` also propagates the fallback to newly-added
providers via duck-typed `setFallbackTokenStorage`.
Tests in `profile-token-storage-fallback-330.test.ts` cover the
positive path, no-fallback regression, read-only invariant, and the
null-fallback clear path. `migration.test.ts` updated to assert
"clear NOT called + marker IS written".
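The three fallback sites can be sketched as below. This is a simplified, synchronous illustration: all names are hypothetical, the in-memory seed check from site 1 is omitted, and returning an empty map when no fallback is wired is an assumption of the sketch.

```typescript
type TokenMap = Map<string, string>;
type LoadResult = { tokens: TokenMap; source: "primary" | "fallback" | "empty" };

interface ReadOnlyTokenSource { load(): TokenMap } // hypothetical; never written to
interface PrimarySketch {
  listBundles(): string[];
  fetchBundle(cid: string): TokenMap | null; // null models a failed fetch
}

function loadWithFallbackSketch(
  primary: PrimarySketch,
  fallback: ReadOnlyTokenSource | null,
): LoadResult {
  const useFallback = (): LoadResult =>
    fallback
      ? { tokens: fallback.load(), source: "fallback" }
      : { tokens: new Map(), source: "empty" };
  try {
    const refs = primary.listBundles();
    if (refs.length === 0) return useFallback(); // site 1: no active bundles
    const merged: TokenMap = new Map();
    let fetched = 0;
    for (const cid of refs) {
      const bundle = primary.fetchBundle(cid);
      if (!bundle) continue;
      fetched++;
      bundle.forEach((value, key) => merged.set(key, value));
    }
    if (fetched === 0) return useFallback(); // site 2: refs exist, every fetch failed
    return { tokens: merged, source: "primary" };
  } catch (err) {
    return useFallback(); // site 3: outer catch (e.g. missing OpLog block)
  }
}
```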
Adversarial code review of PR #331 surfaced three merge blockers and five real bugs in the original three commits. All addressed here.
## Blockers
1. **Static `blockstore-idb` import.** The original `await import('blockstore-idb' as string)` form left bundlers unable to statically discover the dependency. Consumer browser bundles (Vite/Webpack/esbuild) could silently omit blockstore-idb, making fix (a) a no-op in production. Removed the `as string` cast; the built browser entry now emits the literal `import("blockstore-idb")` which consumer bundlers will resolve and include. Also added a 5s timeout on `open()` so a stuck IDB upgrade (sibling tab holding an older version) doesn't hang init forever, and a `console.warn` on every failure path so operators see degraded mode.
2. **`Sphere.clear()` didn't wipe the fallback.** Without this, a user who called `clear()` to start over with the same mnemonic would see pre-clear tokens resurrected via the fallback wiring, a real data-integrity hazard. `clear()` options now accept `fallbackTokenStorage` and wipe it alongside the primary.
3. **No auto-wiring of fallback from migration marker.** The `migration.migratedAt` marker was written but no consumer code read it, making the migration change a pure storage leak. Added `createBrowserProfileProvidersAuto` (async sister of the sync factory) that probes the marker and constructs a legacy `IndexedDBTokenStorageProvider` as `fallbackTokenStorage` on the returned providers. `Sphere.load()` also warns loudly when the marker is present but no fallback was passed.
## Bugs
- **Same-CID double leg.** `lifecycle-manager.ts:1407` called `verifyFlushDurability(cidString, cidString, ...)`, pushing two identical HEAD probes to the same gateway. Changed to `verifyFlushDurability(cidString, null, ...)` per the documented contract (bundle leg only).
- **`setLastDiscoveredPointerCid` stamp order.** It ran BEFORE the inline gate, so on gate failure the downstream no-data flush short-circuit (`flush-scheduler.ts:700`) saw the unverified CID as the discovered pointer. Moved the stamp to AFTER the gate succeeds.
- **`isMissError` didn't match `PutFailedError`.** `blockstore-idb@4.0.1` `dist/src/index.js:88` wraps GET errors as `PutFailedError` (an upstream misclassification). The helia-shim threw on transient IDB read errors instead of falling back to the HTTP block broker, breaking the exact OrbitDB replay path #330 sought to fix. Widened the matcher.
- **`lastTokenManifest` not set on fallback returns.** All three fallback sites in `ProfileTokenStorageProvider.load()` bypassed the merge pipeline that normally populates `lastTokenManifest`, leaving a stale value visible via `getTokenManifest()`. Set to `new Map()` defensively on every fallback return.
- **Double `storage:pending-publish` emit.** The inline gate emitted the event AND the FlushScheduler emitted it on the returned transient result. Suppressed the in-gate emit; the FlushScheduler remains the single source.
## Test additions
- Added test for fallback site #2 (`no-bundles-fetched`): monkey-patches `bundleIndex.listActiveBundles` to return a CID that fails to fetch, verifying the fallback path engages.
- Added test for fallback site #3 (`load-error`, outer catch): makes `listActiveBundles` throw and verifies the fallback rescues.
## Verification
- `npx tsc --noEmit`: clean
- `npm run build`: success; `dist/profile/browser.js` confirmed to contain literal `import("blockstore-idb")`.
- `npx vitest run tests/unit/`: 8061 passed, 2 skipped, 0 failures.
…ability fix(profile)(#330): browser durability — IDBBlockstore + pointer-publish gate + token fallback
After PR #331 landed the IDBBlockstore in browser, the wallet finally boots correctly across reloads. But the user experience was sabotaged by a separate latent issue in the pin-shim: every block touched during an OrbitDB OpLog replay triggered `helia.pins.add` again, which Helia rejects with "Already pinned" — that's the EXPECTED outcome for a CID we already pinned, NOT an error. The shim was logging it at WARN level, producing hundreds of warns per second during boot which froze the page in DevTools (each `console.warn` is a synchronous stringify + render). Two fixes inside `schedulePin`: 1. **In-memory pre-check.** When the in-session `pinnedCids` Set already tracks the CID, skip the `pinsApi.add` round-trip entirely. Cheap, prevents the call from being made in the first place. 2. **Catch-side classification.** If `pins.add` rejects with "Already pinned" anyway (e.g. the Set was reset by a destroy / reconnect but Helia's pin records survive in the datastore), treat it as success: stamp the in-memory tracker so subsequent puts of the same CID short-circuit, and return without a warn. Together these eliminate the warn-spam completely: the first put of a CID hits `pins.add` and on success populates the Set; every subsequent put of that CID short-circuits at the pre-check. User-visible effect: the periodic ~10s freezes during wallet boot in browser disappear. The fix is also a pure performance + cleanliness win — the pin contract is unchanged, the warnings were already non-fatal. Tests added in `helia-blockstore-pin-shim.test.ts`: - Three identical-CID puts produce ONE pins.add call + three successes recorded (pre-check verified). - A pinsApi that throws "Already pinned" produces a success + no warn (catch-side verified).
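The two fixes inside `schedulePin` might look roughly like this. The sketch assumes a simplified, hypothetical `PinsApi` with a promise-returning `add`; it is not Helia's actual pins interface, and matching on the "Already pinned" message text is taken from the description above.

```typescript
interface PinsApi { add(cid: string): Promise<void> } // hypothetical, simplified

function makeSchedulePin(pins: PinsApi, warn: (msg: string) => void) {
  const pinnedCids = new Set<string>(); // in-session tracker
  return async function schedulePin(cid: string): Promise<"pinned" | "skipped" | "failed"> {
    if (pinnedCids.has(cid)) return "skipped"; // fix 1: pre-check, no round-trip
    try {
      await pins.add(cid);
      pinnedCids.add(cid);
      return "pinned";
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      if (message.includes("Already pinned")) {
        pinnedCids.add(cid); // fix 2: expected outcome, treat as success, no warn
        return "pinned";
      }
      warn(`pin failed for ${cid}: ${message}`);
      return "failed";
    }
  };
}
```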
…ed-noise fix(profile): silence "Already pinned" pin-shim noise (#330 follow-up)
…essage fan-out Addresses Sections B and C of the page-freeze review-agent findings (2026-05-29). Combines the WIP work originally split across two commits. == C: degrade CID_REF_UNREADABLE — replace 4 fatal throws with logger.warn + skip == modules/groupchat/GroupChatModule.ts had four sites that fatal-threw `ProfileError(CID_REF_UNREADABLE)` when a stored value carried a CID ref but the wallet was opened without a cidRefStore (legacy factory path). The throw propagated through Sphere.load's Promise.allSettled as "Module load failed", leaving the rest of the wallet alive but unable to recover GroupChat state. After the fix, each site logs a `[CID_REF_DEGRADE]` warn and starts with empty state for that key; relay re-delivery rehydrates via idempotent event handlers. Sites converted in load(): 1. groups (~line 356) 2. messages:<id> (~line 427) 3. members:<id> (~line 638) 4. processedEvents (~line 779) == B: per-message CID fan-out — bounded concurrency == The Pattern B index branch (~line 478) was `fetched.items.map(async (item) => cidRefStoreRef.fetchJson(item))` followed by `Promise.all(fetches)` — N parallel HTTP requests per group with no concurrency limit, no in-flight dedup, and a 30 s per-request fetch timeout. With multiple groups × thousands of message-CIDs, this fan-out was the primary driver of the `/sidecar/blob?cid=… 404` storm observed in the browser console on 2026-05-29 and exhausted the daemon socket pool under a degraded testnet. Replaced with `mapWithConcurrency(fetched.items, LOAD_FETCH_CONCURRENCY, fn)` capped at 4 concurrent fetches. Preserves the parallel-not-serial speed-up while leaving headroom for the rest of the page (transport, payments, profile-storage) on a shared gateway. 
== Tests == Two existing tests asserted the throw behavior and were updated to assert the degrade path: tests/unit/modules/GroupChatModule.cidref.test.ts:470 (groups key) tests/unit/modules/GroupChatModule.cidref.test.ts:1053 (processedEvents) 50/50 GroupChatModule tests pass. tsc clean.
…erate transient lines in §D.5 byte-compare Two false positives that surfaced in the 2026-05-29 page-freeze soak comparison between c1f2ac0 (integration/all-fixes) and 7a12ac8 (PR #327): == CBOR "simple values are not supported" — NOT a regression == `SentLedgerWriter` writes ciphertext directly via `db.put(key, bytes)` rather than through `putEntry`. AES-GCM ciphertext has a random 12-byte IV; ~5/256 ≈ 2 % of IVs start with a byte in [0xf0–0xf3] or 0xf8 which cborg rejects as an unsupported CBOR major-type-7 simple value. The read-side `getEnvelopePayload` catches the throw and falls through to the raw-bytes path — that path is CORRECT for pre-#247 raw writers — but `handleEnvelopeFallback` was logging a WARN with text claiming "live envelope corruption when seen on freshly-written entries", which is flat-out wrong for the cbor-decode case. 7a12ac8 happened to roll IVs that all CBOR-decoded fine (0 hits); c1f2ac0 rolled 6 unsupported-simple-value IVs (6 hits) and tripped the WARN six times. Same code, different luck. Fix: thread the cborError flag through GetEnvelopePayloadFallbackHook into handleEnvelopeFallback. cborError === true ⇒ legacy raw-bytes path, demoted to DEBUG. cborError === false (decode succeeded but the resulting shape is not a valid envelope) IS real corruption — keep WARN. Notifier still fires in both cases so consumers retain the typed-event signal. == §D.5 `alice-peer1-before-vs-after` — false-positive byte-compare == `sphere balance` snapshots include transient output that varies between runs without reflecting wallet state: - " IPFS: +N added, -M removed" — fires when balance notices a background IPFS sync; depends on race between the CLI invocation and the durability gate. - "Syncing..." / " Ready." — wallet-load banner. - "[YYYY-MM-DDThh:mm:ss.sssZ] [LEVEL] [Component] ..." — debug timestamps + monotonic counters when the CLI runs verbose. Pure noise for state comparison. 
`assert_diff_empty` now normalizes snapshots through a sed filter before diff, leaving the originals untouched for forensics and writing the normalized copies as `${label}.{a,b}.norm` next to the diff. Real state divergences (e.g. the `bob-peer1-vs-peer2-after` 199.99-vs-99.99 UCT mismatch from the 7a12ac8 soak run today) still trip the assert — the normalizer only strips logging/sync noise, not balance rows. Builds: `npx tsc --noEmit` clean; `npx tsup` clean.
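For the CBOR part of this commit, the arithmetic and the log-level split can be sketched as follows. The byte set mirrors the ranges quoted in the message; `fallbackLogLevel` is a hypothetical name for the decision that `handleEnvelopeFallback` makes from the `cborError` flag.

```typescript
// First bytes the commit message says cborg rejects as unsupported simple values.
const REJECTED_FIRST_BYTES = new Set<number>([0xf0, 0xf1, 0xf2, 0xf3, 0xf8]);

// Chance that a uniformly random IV starts with one of those bytes: 5 / 256.
const rejectProbability = REJECTED_FIRST_BYTES.size / 256;

type LogLevel = "debug" | "warn";

// cborError === true: the decode threw, so this is the legacy raw-bytes path (DEBUG).
// cborError === false: the decode succeeded but the shape is not an envelope (WARN).
function fallbackLogLevel(cborError: boolean): LogLevel {
  return cborError ? "debug" : "warn";
}
```

At roughly 2 % per write, six hits in one soak run and zero in another is ordinary variance, which matches the "same code, different luck" reading.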
…nvoice visibility) Two consecutive c1f2ac0+fix soak runs (2026-05-29 11:12 and 11:24) aborted at §C.4 with `sphere invoice status "$INV"` returning "No invoice found matching prefix: ...". Root cause: cross-device invoice visibility requires three legs — Bob's profile-token IPFS publish landing durably, Bob's Nostr at-least-once mux acking (60s cooldown on retry), and peer2-alice's OrbitDB replicating the accounting key. When `unicity-ipfs1.dyndns.org` returned HTTP 500 on Bob's publish (observed in run 2), the at-least-once retry was scheduled 60s out but the script proceeded to §C.4 immediately, finding no invoice and tripping `set -euo pipefail`. `wait_for_invoice_visible` polls every 15s for up to 150s (10 attempts), treating "No invoice found" as transient. Other CLI failures (e.g. "Database is not open") still propagate immediately so the soak surfaces real breakage rather than masking it. The 150s budget exceeds the 60s at-least-once cooldown by 2.5× plus the OrbitDB replication window, so transient gateway flaps no longer fail the run.
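The retry budget of `wait_for_invoice_visible` is a shell helper; the same logic is sketched here in TypeScript for illustration. `waitForVisibleSketch`, `CheckResult`, and the injected `sleep` are hypothetical, and sleeping after every miss (so the budget is attempts × interval) is an assumption about the real helper.

```typescript
type CheckResult =
  | { kind: "found" }
  | { kind: "not-found" } // transient: "No invoice found"
  | { kind: "error"; message: string }; // e.g. "Database is not open"

function waitForVisibleSketch(
  check: () => CheckResult,
  sleep: (ms: number) => void, // injected so the sketch stays synchronous
  attempts = 10,
  intervalMs = 15_000,
): boolean {
  for (let i = 0; i < attempts; i++) {
    const result = check();
    if (result.kind === "found") return true;
    if (result.kind === "error") throw new Error(result.message); // real breakage propagates
    sleep(intervalMs); // transient miss: wait and retry
  }
  return false; // budget exhausted: attempts * intervalMs = 150 s by default
}
```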
…05-29) Adds wall-clock anchor + elapsed-total + elapsed-previous-section to every section banner so soak runs are self-timing without needing external date-anchors or log post-processing. Output format: [2026-05-29T12:31:42+02:00] +127s (prev section §C.3 ... took 12s) ================================================================ §C.4 Peer2 view (NO manual sync) ================================================================ Motivation: the page-freeze investigation soak runs were taking 6-17 minutes with no way to attribute time to specific sections. The 17-min run (PID 975830, 2026-05-29 11:54+) hit ~7 min stuck in §D.4 sphere-init retrying a 30s mutex timeout, but without per-section timing the slow section had to be identified by tail-and-eyeball. Now it's quantitative.
…r on recoverLatest
Two related changes that reduce the cumulative work the pointer-poll path
does under a degraded testnet — the situation observed on
unicity-ipfs1.dyndns.org 2026-05-29 that produced the
`/sidecar/blob?cid=… 404` flood in the browser console and pinned the
daemon CPU. Together they implement Section D from the review-agent
freeze findings.
== inspectSnapshotEpoch — closure-scope memo ==
profile/pointer-wiring.ts (around line 722)
Previously, every pointer-poll cycle's discovery walkback called
`inspectSnapshotEpoch(v)` for every version it inspected, which did the
full `resolveRemoteCid(v) → fetchFromIpfs(cid)` round-trip regardless of
whether the previous poll had already learned (a) the version's epoch
or (b) that the version's CID was unfetchable (404 / decode / shape
error). When a flap of the gateway returned 404 for a single CID, the
SAME CID was re-fetched on every subsequent poll — the dominant source
of the redundant work the user saw in the browser console.
The memo is a closure-scope `Map<version, { value?, error?, expiresAt }>`
keyed by version (a number). TTL = 30 s, set to be ≥ POINTER_POLL_MIN_MS
so a single full poll cycle dedups; values older than that are recomputed
in case a stuck gateway has recovered. Both positive results (the epoch
itself) and negative results (the error to re-throw) are cached, so the
loop sees the same failure signature without re-walking it.
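A minimal sketch of such a memo, assuming a synchronous `compute` for brevity (the real `inspectSnapshotEpoch` performs async fetches). `memoizeWithTtl` and the injectable `now` clock are hypothetical names introduced for illustration.

```typescript
interface MemoEntry<T> { value?: T; error?: Error; expiresAt: number }

function memoizeWithTtl<T>(
  compute: (version: number) => T,
  ttlMs: number,
  now: () => number = Date.now,
): (version: number) => T {
  const memo = new Map<number, MemoEntry<T>>(); // closure-scope, keyed by version
  return (version) => {
    const hit = memo.get(version);
    if (hit && hit.expiresAt > now()) {
      if (hit.error) throw hit.error; // negative result: replay the cached failure
      return hit.value as T; // positive result
    }
    const expiresAt = now() + ttlMs;
    try {
      const value = compute(version);
      memo.set(version, { value, expiresAt });
      return value;
    } catch (err) {
      const error = err instanceof Error ? err : new Error(String(err));
      memo.set(version, { error, expiresAt });
      throw error;
    }
  };
}
```

Expired entries are recomputed on the next call, which is what lets a recovered gateway be noticed after the TTL.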
== AbortController on recoverLatest ==
profile/profile-token-storage/lifecycle-manager.ts (5 callsites)
`ProfilePointerLayer.recoverLatest({ abortSignal })` has supported an
abort signal since #311, but only the two callsites that already had a
caller-supplied `signal` parameter used it (lines 904, 1154). The three
callsites in transient-handler / cold-start / pointer-poll paths
(lines ~1686, ~1882, ~2199) passed no signal — so `Sphere.destroy()`
during a slow IPFS round-trip left those calls in flight for tens of
seconds, accumulating work AFTER the user had reloaded / navigated /
unmounted the provider. That contributed to the dual-instance leak
observed in the review-agent flame graph.
Adds a `destroyController: AbortController | null` field, lazily allocated
on first use (so test harnesses that never poll don't pay the cost).
`shutdown()` aborts it next to the existing `pointerPollTimer` clear.
The three previously-bare recoverLatest calls now pass
`{ abortSignal: this.getDestroySignal() }`. The two callsites with
caller-supplied signals are unchanged.
Build: tsc clean. Tests: 2213/2213 pass.
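The lazily allocated controller from the AbortController section could be sketched like this. `LifecycleSketch` and `hasController` are hypothetical; only `destroyController`, `getDestroySignal()`, and `shutdown()` are names from the commit message, and their bodies here are assumptions.

```typescript
class LifecycleSketch {
  private destroyController: AbortController | null = null;

  // Allocates on first use, so instances that never poll pay nothing.
  getDestroySignal(): AbortSignal {
    if (this.destroyController === null) {
      this.destroyController = new AbortController();
    }
    return this.destroyController.signal;
  }

  // Aborts any in-flight call that was handed the destroy signal.
  shutdown(): void {
    this.destroyController?.abort();
  }

  // Test-only helper to observe the lazy allocation.
  hasController(): boolean {
    return this.destroyController !== null;
  }
}
```

A previously bare call would then pass `{ abortSignal: lifecycle.getDestroySignal() }` so that shutdown cancels it.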
…hrow (pre-merge review) Addresses pre-merge review feedback on #334. The helper's doc-comment previously claimed it matched `Promise.all(items.map(fn))` semantics, but the loop kept pulling new items off the cursor after the first worker's `fn` threw — exactly the fan-out leak the helper exists to prevent. The callsites in this file are safe today (every `fn` wraps errors in try/catch and returns `null`), but the helper is otherwise a reusable utility that could mislead a future caller. Adds an `aborted` flag set inside the worker's try/catch. Subsequent loop iterations short-circuit. In-flight `fn` calls already dispatched still settle (cancelling them would require an AbortSignal, which the current callsite does not provide). The aggregate promise rejects with the first thrown error via `Promise.all` as before. Updated the doc-comment to be honest about the strictness vs `Promise.all` and to call out the page-freeze 2026-05-29 motivation explicitly. Build: tsc clean. Tests: 50/50 GroupChat tests pass.
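A sketch of a bounded-concurrency map with the `aborted` short-circuit described above. This is an illustration written from the description, not the helper's actual source; `mapWithConcurrencySketch` is a hypothetical name.

```typescript
async function mapWithConcurrencySketch<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let cursor = 0;
  let aborted = false;
  const worker = async (): Promise<void> => {
    while (!aborted && cursor < items.length) {
      const index = cursor++; // claim the next item
      try {
        results[index] = await fn(items[index], index);
      } catch (err) {
        aborted = true; // stop every worker from pulling further items
        throw err;
      }
    }
  };
  const workerCount = Math.max(1, Math.min(limit, items.length));
  // Rejects with the first thrown error; already-dispatched calls still settle.
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Unlike `Promise.all(items.map(fn))`, items that have not been claimed when the first error occurs are never started.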
…6-05-29 fix(page-freeze): GroupChat fan-out + CBOR + snapshot-epoch + soak hardening
… + invalidatedNametags
Cross-device snapshot apply silently dropped the single-blob OrbitDB
keys `${addressId}.tombstones` and `${addressId}.invalidatedNametags`
because the lean-snapshot dispatcher had no writer registered for them.
Every other per-address key flows through a per-entry prefix writer
(`OutboxWriter`, `SentLedgerWriter`, `PrefixSyncWriter`-based
disposition / finalization-queue / recipient-context); these two
single-blob holdouts were missing from `factory.ts:writersFor()`.
Concretely, this caused the `bob-peer1-vs-peer2-after` soak failure:
peer2 recovered from the aggregator-pointer snapshot without the
spent-token tombstones, then re-ingested a CAR bundle that contained
the spent source token as if live — doubling the post-recovery
balance (199.99 UCT instead of 99.99 UCT). The companion
`alice-peer1-vs-peer2-after` passed only because Alice never spent
during the soak and therefore had no tombstones to lose.
Adds `SingleBlobSyncWriter<T>` with set-CRDT union semantics: decrypt
local + remote blobs, dedup by a caller-supplied key, write back the
merged blob re-encrypted through the canonical envelope path. Matches
`PaymentsModule.mergeTombstones` and the RMW union loop in
`ProfileTokenStorageProvider.writeOrbitOperationalState`. Idempotent
and monotone — re-running the JOIN with the same remote is a no-op
once the first pass converges.
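The set-CRDT union at the core of the writer can be sketched as a pure function. Decryption and the envelope write are left out; `unionByKey` and the tombstone shape used in the usage note are hypothetical.

```typescript
// Merge two decoded blobs, deduplicating by a caller-supplied key.
function unionByKey<T>(
  local: readonly T[],
  remote: readonly T[],
  keyOf: (item: T) => string,
): T[] {
  const merged = new Map<string, T>();
  for (const item of [...local, ...remote]) {
    const key = keyOf(item);
    if (!merged.has(key)) merged.set(key, item); // first occurrence wins
  }
  return Array.from(merged.values());
}
```

For example, with an assumed tombstone shape `{ tokenId, stateHash }`, `unionByKey(local, remote, (t) => t.tokenId + ":" + t.stateHash)` only ever grows the set, and merging the result with the same remote again returns the same entries.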
Registers two writer instances in `profile/factory.ts:writersFor()`:
- `${addressId}.tombstones` (TxfTombstone[])
- `${addressId}.invalidatedNametags` (string[])
Both keys are written as JSON arrays via `writeProfileKey()` at
`profile/profile-token-storage-provider.ts:2275-2306`. The
RCA Phase 1 explicitly flagged `invalidatedNametags` as a likely
companion gap (§8.2); bundling avoids a second round-trip.
A related-but-separate latent defect (the per-device
`PROFILE_SNAPSHOT_BLOB_<addressId>` cache surviving `sphere clear`)
is tracked separately as issue #339 and is NOT fixed here.
…eys in dispatcher (Phase 2.5)
Phase 2 wired SingleBlobSyncWriter into factory.ts:writersFor() for
${addressId}.tombstones and ${addressId}.invalidatedNametags. The soak
confirmed §C.4 (Nostr-driven cross-device daemon sync) passes but §D.5
(`--no-nostr` IPFS-only mnemonic recovery) still fails with the original
bob-peer1-vs-peer2-after divergence (UCT 199 in 2 tokens vs UCT 99 in
1 token).
Root cause: the lean-snapshot publisher emits single-blob per-address
keys in their LEGACY form (${addr}_tombstones, ${addr}_invalidatedNametags)
because ProfileStorageProvider.keys() funnels them through
reverseMapProfileKey() — the static PROFILE_KEY_MAPPING per-address
suffix table converts profile-form (${addr}.tombstones) back to legacy
form. The Phase 2 SingleBlobSyncWriter is wired with profile-form
keyPrefix, so the dispatcher's pre-filter
entries.filter(e.key.startsWith(keyPrefix)) slices an empty array and
the writer never fires. The wiring is correct; entries simply never
reach it. The soak log signal was `addresses=0` on every applySnapshot
call.
Per-entry writers (outbox/sent/dispositions/finalization/recipient-context)
are unaffected — their keys flow as ${addr}.${prefix}.${id} which
doesn't match the static suffix-match (the suffix is .${prefix}.${id},
not .${prefix}), so they stay in profile form end-to-end. Bundle keys
(tokens.bundle.*) also pass through unchanged.
Fix: add normalizeEntryKey() to the dispatcher (PER_ADDRESS_LEGACY_SUFFIX_MAP
table for tombstones + invalidatedNametags) and apply it in the
runProfileSnapshotJoin entry point BEFORE address extraction and writer
pre-filter. Strict DIRECT_[0-9a-f]{6}_[0-9a-f]{6} prefix check guards
against malformed addressIds. Fail-closed on unlisted suffixes — adding
a new single-blob writer in the future requires extending BOTH the
table AND factory.ts:writersFor(), mirroring the existing contract.
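A minimal sketch of the normalizer, written from the description above rather than from the dispatcher source. The table and regex names are hypothetical, and "fail-closed" is assumed here to mean that unlisted or malformed keys pass through unchanged.

```typescript
// Legacy suffix → profile suffix, for the two single-blob keys only.
const LEGACY_SUFFIX_MAP_SKETCH: Record<string, string> = {
  _tombstones: ".tombstones",
  _invalidatedNametags: ".invalidatedNametags",
};

// Strict addressId prefix check, followed by a legacy underscore suffix.
const LEGACY_KEY_SKETCH = /^(DIRECT_[0-9a-f]{6}_[0-9a-f]{6})(_[A-Za-z]+)$/;

function normalizeEntryKeySketch(key: string): string {
  const match = LEGACY_KEY_SKETCH.exec(key);
  if (!match) return key; // profile-form, global, bundle, or malformed keys
  const profileSuffix = LEGACY_SUFFIX_MAP_SKETCH[match[2]];
  return profileSuffix === undefined ? key : match[1] + profileSuffix;
}
```

Because profile-form keys never match the legacy pattern, applying the function twice gives the same result as applying it once.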
Tests: 14 new tests in tests/unit/profile/snapshot-apply-ipfs-no-nostr.test.ts
covering the pure normalizer (idempotence, profile/legacy/global/bundle
keys, malformed addressIds, fail-closed unlisted suffixes), end-to-end
peer-A-to-peer-B dispatch for both tombstones and invalidatedNametags
via the production publisher's legacy-key path, and a mixed snapshot
(legacy-form single-blob + profile-form per-entry outbox in the same
JOIN). REGRESSION suite at the bottom pins the pre-Phase-2.5 behaviour
by asserting the address-extraction regex misses underscore-form keys.
Verified the new end-to-end tests fail when normalizeEntryKey is
temporarily replaced with the identity function — confirms the
normalizer is on the critical path. Restored on commit.
- profile/profile-snapshot-dispatcher.ts — +103 lines (normalizer +
table + dispatch hook + __internal exposure for tests)
- tests/unit/profile/snapshot-apply-ipfs-no-nostr.test.ts — new (14 tests)
npx vitest run tests/unit/profile/ → 138 files, 2250/2250 pass
(Phase 2's 2236 baseline + 14 new tests)
npx vitest run tests/unit/ → 440 files, 8100/8100 pass + 2 skipped
npx tsup → build success
npx eslint . → unchanged (7 pre-existing errors in unrelated files,
no new warnings on changed files)
npm run typecheck → success
The Sphere page top bar surfaced "Aggregator service unavailable"
while the aggregator was actually live. Root cause was a mode
inconsistency in `AggregatorPinger`:
- Provider mode required `round > 0` to count as `'up'`.
- URL mode (fallback) accepts ANY finite numeric result as `'up'`.
- The reference infra-probe at `unicity-infra-probe/src/probes/
aggregator.mjs` treats any structured JSON-RPC response as alive.
Fresh shards / between-batch states can legitimately return a `0`
block height — provider mode demoted these to `'degraded'`, which
the UI surfaced as unavailable.
The previous code used `0` as a sentinel for the legacy "no
aggregator client" stub path in `UnicityAggregatorProvider.
getCurrentRound()` (returns `0` when `aggregatorClient` is null,
i.e. `initialize()` was never called). But that sentinel collided
with legitimate `0` round responses.
Fix:
- `UnicityAggregatorProvider.getCurrentRound()` now throws when
`aggregatorClient` is null instead of returning `0`. This routes
the uninitialized path through the pinger's catch arm to `'down'`,
matching the semantics URL mode already has.
- `AggregatorPinger` provider mode now treats any finite `round >= 0`
as `'up'`. Non-finite / negative values still fall through to
`'degraded'`.
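The resulting provider-mode classification can be sketched as below. `classifyRoundSketch` is a hypothetical name and the round getter is synchronous here for brevity; the three outcomes follow the Fix list above.

```typescript
type PingState = "up" | "degraded" | "down";

function classifyRoundSketch(getCurrentRound: () => number): PingState {
  try {
    const round = getCurrentRound();
    // Any finite round >= 0 counts as alive, including a legitimate 0.
    return Number.isFinite(round) && round >= 0 ? "up" : "degraded";
  } catch (err) {
    return "down"; // e.g. the provider throws because no client is initialized
  }
}
```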
Tests:
- `tests/unit/core/connectivity.test.ts`: added coverage for round=0
(now up), MAX_SAFE_INTEGER (up), NaN/Infinity/negative (degraded),
and the stub-throw path (down).
- `tests/unit/oracle/UnicityAggregatorProvider.rpc-methods.test.ts`:
new `describe('getCurrentRound()')` block covering uninitialized
(throws), wired client (returns number), and shard-returns-0
(returns 0, not the stub sentinel).
Both new failure cases verified to fail against the unfixed code
via `git stash` round-trip.
…mbstones-writer fix(profile/snapshot)(issue-335): SingleBlobSyncWriter for tombstones + invalidatedNametags
…or-false-negative fix(connectivity): aggregator pinger false-negative when round=0
Release-scoped pre-merge review. Release-promotion review at the meta-level — not a 348k-line re-review of individual PRs. Each PR had its own gate when merged to Verdict:
| Flag | PR-mentioned | Actually default-on |
|---|---|---|
| `orphanAutoRecovery` | yes | yes |
| `spentStateRescan` | yes | yes |
| `tombstoneGcWorker` | yes | yes |
| `nostrPersistenceVerifier` | yes | yes |
| `senderUxf` | no | yes (BREAKING: wire-shape) |
| `recipientUxf` | no | yes |
| `recipientLegacyAdapter` | no | yes (REQUIRED for legacy interop) |
| `recoveryWorker` | no | yes |
| `sentReconciliationWorker` | no | yes |
| `recoveryAggregatorCheck` | no | yes |
| `finalizationWorker` | no | yes |
Opt-out path works: all are gated on `config?.features?.<x> ?? true`, and `PaymentsModuleConfig.features?: UxfTransferFeatures` accepts the partial. Type contract is clean. The escape hatch IS available; it's just not documented in the PR description.
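To illustrate the `?? true` gating, the sketch below resolves a partial feature map into effective booleans. The types here are local stand-ins built only from the flag names listed in this review (the real SDK types may carry more fields), and `resolveFeatures` is a hypothetical helper, not an SDK function.

```typescript
// Local stand-in: flag names taken from this review; the real SDK type may differ.
const FLAG_NAMES = [
  'orphanAutoRecovery', 'spentStateRescan', 'tombstoneGcWorker',
  'nostrPersistenceVerifier', 'senderUxf', 'recipientUxf',
  'recipientLegacyAdapter', 'recoveryWorker', 'sentReconciliationWorker',
  'recoveryAggregatorCheck', 'finalizationWorker',
] as const;

type FlagName = (typeof FLAG_NAMES)[number];
type UxfTransferFeatures = Partial<Record<FlagName, boolean>>;

interface PaymentsModuleConfig {
  features?: UxfTransferFeatures;
}

// Hypothetical helper: an unset flag resolves to true, an explicit false opts out.
function resolveFeatures(config?: PaymentsModuleConfig): Record<FlagName, boolean> {
  const resolved = {} as Record<FlagName, boolean>;
  for (const name of FLAG_NAMES) {
    resolved[name] = config?.features?.[name] ?? true;
  }
  return resolved;
}

// Opting out of the wire-shape change while leaving every other default in place.
const effective = resolveFeatures({ features: { senderUxf: false } });
console.log(effective.senderUxf, effective.recipientUxf); // false true
```

Because `??` only falls back on `null` or `undefined`, an explicit `false` survives the default, which is why consumers have to pass `false` rather than omit the key.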
5. Unsurfaced known concerns
- V6-RECOVER spam — already flagged by PR.
- Hot TODOs in scope:
  - `modules/payments/PaymentsModule.ts:5573` `// TODO(T.2.B/T.2.C/T.5.B/T.7.A): consume the new TransferRequest`: this looks like a half-wired call-site; worth a 5-minute peek before merge.
- `@deprecated` count rose from 23 → 28 across the diff (impl/browser, impl/nodejs, profile aggregator-pointer paths). Notable additions are the legacy `IpfsStorageProvider` flow and the `IpnsSubscriptionClient` config, consistent with the PR's "deprecated-API removals deferred to a future PR" note.
- `features.recoveryWorker` is default-ON but the comment notes the worker no-ops until bootstrap installs a republish hook. If the bootstrap layer hasn't installed the hook in some consumer, this is silent dead-code. Not a release blocker but worth a one-line doc note.
6. Test coverage on new public surfaces
| Surface | Coverage |
|---|---|
| `uxf/*` | present: 22 files under `tests/unit/uxf/` (UxfPackage, deconstruct, ipld, json, hash, instance-chain, transfer-payload, etc.) plus e2e in `tests/e2e/uxf-*.test.ts` and integration in `tests/integration/accounting/uxf-transfer.test.ts`. Real-token-fixture coverage too (issue-295-real-token-deconstruct). |
| `sphere.connectivity` | present: `tests/unit/core/connectivity.test.ts` and `tests/unit/payments/connectivity-gate.test.ts`. |
| Hierarchical addressability (#200/#201) | present: covered indirectly via `tests/integration/tracked-addresses.test.ts` + multi-address profile tests. |
| Lazy snapshot load (#313/#314) | assumed-present: didn't sample; trust the per-PR review. |
| connect-host UXF intent `schemaVersion` | present: `tests/unit/connect/connect-host-uxf-intent-schema.test.ts`. |
Coverage is sufficient for release. Did not sample individual test contents for non-triviality — that was each merge PR's own gate.
7. What I didn't / couldn't check
- Per-PR correctness — out of scope, each was reviewed on merge.
- Test contents — only directory presence + file existence checked; per-test assertions were the merging PR's responsibility.
- Live deploy behaviour: relied on the PR's own statement of 5/5 soak ALL GREEN and 3 h of `sphere-telco-test.dyndns.org` uptime.
- Cross-repo downstream impact: `sphere.telco` is already vendored ahead per #327 (noted by PR), but `agentsphere` and `openclaw-unicity` haven't been audited for `senderUxf` default-on consequences. PR's risk section acknowledges this; a coordinated upgrade plan should be confirmed before merge.
- OrbitDB on-disk format compat: PR claims "No on-disk format changes beyond the additive envelope-vs-raw dual-format coexistence." Did not independently verify against actual `.orbitdb` blob structure; trusting #247 envelope-migration PR's own gate.
Top blocker: bump package.json → 0.8.0 and rev CHANGELOG.md [Unreleased] → [0.8.0] - 2026-05-29 before merging. The rest are PR-description amendments.
… heading
Prep for release: integration/all-fixes -> main (PR #345).
- package.json: 0.7.2 -> 0.8.0
- CHANGELOG.md: add [0.8.0] - 2026-05-29 heading; existing Unreleased content (UXF feature-flag flip BREAKING + the 60-line content block) now sits under [0.8.0]. The new [Unreleased] section is empty, ready for the next development cycle.
Bring docs/readme-overhaul (PR #224) lineage from main into integration:
- README.md: take main's autonomous-economic-agents framing (resolved conflict in favor of main's editorial direction; superseded integration's older Features-bullet-list style).
- ARCHITECTURE.md + 9 new docs/* (DIRECT-MESSAGES, GROUP-CHAT, IDENTITY-CRYPTO, L1-ALPHA, MULTI-ADDRESS, PAYMENT-REQUESTS, PROVIDERS-AND-CONFIG, UNICITY-ID, WALLET-IMPORT-EXPORT): added wholesale from main.
- docs/API.md, CONNECT.md, INTEGRATION.md, NAMETAG-BINDINGS.md, QUICKSTART-*: auto-merged cleanly (terminology sweep + integration's UXF additions both applied).
Reconciles PR #345 (integration/all-fixes -> main) merge state for promotion.
Summary
Promote 178 merged PRs from `integration/all-fixes` to `main` after ~4 months of accumulated profile-layer, UXF, recovery, and connectivity work. Scope: 806 files, +348,528 / −8,860 lines. Last release lineage on `main` was around #128–#130; integration is at #342.
Version bump recommendation: `0.7.2` → `0.8.0`. The branch ships a new public surface (`uxf/*`, `sphere.connectivity`, profile aggregator-pointer layer, lazy-load snapshot) and several behaviour changes consumers may rely on. Maintainer to commit the bump pre-merge or as the merge commit message.
Highlights
- `uxf/*`: full bundle/transfer-payload IPLD/CBOR/JSON path with limits, header validation, instance chains, token-join, SMT verification. Foundation for #207 self-sufficient V5-pending bundles and #226 invoice-delivery via UXF-over-DM. Surface exported from main entry.
- `sphere.connectivity` (#312, #315): per-backend reachability (aggregator / IPFS / Nostr) with subscriber surface, force-probe, and send-path advisory gating (#327). Used by the Sphere top bar.
- `publishedVersion` await (F2), and sentinel-KV republish (F3).
Profile-layer hardening
A sustained ~4-month investment. Highest-impact:
- `IDBBlockstore` for browser Helia, inline durability gate on pointer publish (#330), read-only fallback + legacy preservation on migration.
- `SentLedgerWriter`) and new envelope readers coexist via dual-format compat path. Migration writer + W7 reconcile downward + residuals tidy.
- `manual-test-full-recovery.sh` runs.
- `process.kill(pid, 0)` probe with hostname guard; stops 920 s stale-lock waits in CLI workflows.
Transfer / payments / SENT-pipeline (#166 follow-ups)
- `transfer:double-spend-detected` for snapshot losers.
Bug fixes worth calling out
- #278 OrbitDB FD leak: closed.
Infrastructure / DX
- `SPHERE_DEBUG` env var, hierarchical filter spec).
- `0920b4e`), pre-#247 CBOR false-alarm tolerance.
Migration / breaking-change notes
Most changes are additive. Items downstream consumers should review:
- `@unicitylabs/sphere-sdk/profile/*` subpath exports broadened. Update barrel imports if you've pinned specific paths.
- `sphere.connectivity` surface added (#312). New events: `connectivity:changed` etc. Existing offline-handling code paths now receive advisory `'down'` / `'degraded'` signals; review your error-handling to ensure you're not double-handling.
- `SingleBlobSyncWriter` registered in `factory.ts:writersFor()` for `${addressId}.tombstones` and `${addressId}.invalidatedNametags` (PR #340). Pre-existing wallets gain merge capability automatically, with no on-disk format change.
- `orphanAutoRecovery` (#181), `spentStateRescan` (#178), `tombstoneGcWorker` (#184), `nostrPersistenceVerifier` (#185). Consumers wanting to disable should pass explicit `false`.
No deprecated-API removals in this merge. `IpfsStorageProvider` removal (#337) is gated and deferred to a future PR.
Known issues riding along (not fixed, tracked)
- #339: `PROFILE_SNAPSHOT_BLOB` local cache survives `sphere clear --yes`. Masks future cross-device bugs from soak detection. Filed today.
- `ENVELOPE-FALLBACK` logs WARN on expected legacy CBOR-decode failures. Cosmetic, alarm-fatigue only. Two-line fix filed today.
- `SingleBlobSyncWriter.joinSnapshot` non-atomic read-merge-write window. Currently no race observed in soak; option-1 (doc + assertion) recommended.
- `SingleBlobSyncWriter` silently drops oversized local blob past `MAX_ENTRY_BYTES_RAW`. Slow data-loss vector at high tombstone counts; not exercised today.
- `c1f2ac0` and noted in the freeze-fix pause memo. Soak tolerates it; needs separate diagnosis.
Validation
- `manual-test-full-recovery.sh` against the integration/all-fixes tip): 5/5 ALL GREEN. Wall times 579–1352 s per run. Every `bob-peer1-vs-peer2-after` byte-compare passed.
- `sphere-telco-test.dyndns.org` ran the integration tip (cherry-picked combined build) for ~3 hours of this session with no top-bar regression and HTTP 200 throughout.
- `npx tsup` clean, `npx eslint .` no new warnings, `npx tsc --noEmit` clean.
Risk assessment
- `@unicitylabs/sphere-sdk` consumers (sphere.telco already vendored ahead via #327; agentsphere and openclaw-unicity should be flagged).
Closes / addresses
#178, #179, #180, #181, #182, #183, #184, #185, #197, #195, #200, #202, #206, #207, #210, #217, #218, #220, #221, #222, #223, #225, #226, #227, #229, #230, #231, #232, #234, #235, #236, #237, #238, #239, #240, #241, #242, #243, #244, #245, #246, #247, #248, #249, #250, #251, #252, #253, #254, #255, #256, #257, #258, #259, #260, #261, #262, #264, #265, #266, #267, #268, #269, #270, #271, #272, #273, #274, #275, #276, #277, #278, #279, #280, #281, #282, #283, #285, #286, #287, #288, #289, #292, #294, #295, #296, #300, #301, #302, #303, #305, #309, #310, #311, #312, #313, #314, #315, #316, #317, #319, #320, #321, #322, #327, #330, #331, #332, #334, #335, #336, #342.
Tracked-but-not-closed: #339, #341, #343, #344 (filed today as follow-ups).