fix(profile/storage): plug plaintext-seed window in encrypt() (Audit #333 C1)#346
Conversation
c76576a to
1b63ae1
Compare
profile/migration.ts walked persist → sanity → cleanup without forcing the
TXF bundle to land durably:
- stepPersistToOrbitDb (:585) accepted save()'s cid: 'debounced' return
as success without driving the flush.
- stepSanityCheck (:643) read back via load() which returns in-memory
pendingData (source: 'cache') — passing even if nothing was pinned.
- stepCleanup (:779) then deleted legacy KV keys and unpinned the
legacy CID.
A crash (or later flush failure) between persist and the debounced flush
landing lost both the legacy KV state AND the now-unpinned legacy CID.
Token-DB loss was only partially mitigated by the #330 legacy-token-DB
fallback if wired; legacy KV and the unpinned CID were not.
Fix:
- stepPersistToOrbitDb now calls profileTokenStorage.awaitNextFlush(0)
immediately after save(). awaitNextFlush drives
flushScheduler.forceFlushSerialized() and throws on TIMEOUT or
POINTER_MONOTONICITY_VIOLATION — converting a silent crash-window
data loss into a recoverable MIGRATION_FAILED. timeoutMs=0 disables
the wall-clock deadline (the 4-iteration cap still applies) so large
migrations are not artificially capped.
- A token-storage provider without awaitNextFlush() is refused outright
with MIGRATION_FAILED. The pre-fix silent-loss path was the only
alternative and it is unsafe.
- stepSanityCheck rejects loadResult.source === 'cache' with a hard
error. After awaitNextFlush, pendingData is null on the real provider
and load() walks active bundles in OrbitDB returning source 'remote'.
A 'cache' result here means the durability gate was bypassed —
belt-and-braces over the awaitNextFlush guarantee.
Existing migration test mocks updated to simulate the real flush
contract (save → pendingData/source:'cache', awaitNextFlush → flushed/
source:'remote'). 9 new C2 regression tests cover:
- awaitNextFlush is called exactly once after save() (happy path)
- timeoutMs=0 is passed (no artificial deadline)
- TIMEOUT, POINTER_MONOTONICITY_VIOLATION, and generic flush errors
all become MIGRATION_FAILED before cleanup runs
- Providers omitting awaitNextFlush are refused
- Null-txfData path skips the flush entirely
- sanity check rejects post-persist source='cache' reads
- End-to-end failure-path invariant: any flush/sanity error leaves
legacy KV intact and the legacy CID reclaimable
All 442 unit test files (8127 tests) pass. tsc clean. eslint clean on
touched files.
Refs: #333 (C2). Stacked on top of #346 (C1).
…oplog reset
FlushScheduler.addBundleWithOplogAutoReset transitioned straight from
"extractLostHeadCid matched the error" to resetCorruptedLog() (which
calls db.drop()). The matcher cannot distinguish a permanently-corrupt
head (Helia GC ran on a memory-blockstore wallet) from a transiently-
unreachable one (gateway blip, peer offline, propagation lag). A
momentary fetch failure thus wiped all OUTBOX/SENT/disposition/
finalization entries not yet captured in a pinned bundle — permanent
data loss on a recoverable error.
Fix:
- Probe configured IPFS gateways for the lost head CID with
exponential backoff (verifyCidAccessibleWithRetry, 30 s deadline,
5 s per-attempt HEAD timeout — matches the Issue #239 shutdown gate
convention for testnet propagation) BEFORE the destructive reset.
- On a successful probe, retry addBundle ONCE. If the retry
succeeds the reset is SKIPPED — no operational state is lost. This
is the primary recovery path for the gateway-blip / propagation-
lag class of failure.
- On probe failure (no gateway served the CID within the deadline)
OR a post-probe retry that still hits the same auto-reset signature
(local Helia is the bottleneck, not the network), fall through to
the existing reset path. The freshest lostHeadCid is propagated
into the resetCorruptedLog reason payload and event data.
- On a post-probe retry that hits a DIFFERENT error class (e.g.,
POINTER_MONOTONICITY_VIOLATION), re-throw the new error — resetting
the log would not help.
- With no gateways configured, the probe is skipped entirely and the
pre-fix behaviour is preserved (no recovery surface, destructive
reset is the only forward path).
- When the probe itself throws (e.g., gateway URL validation failed
before any HEAD ran), fall through to reset — we cannot prove the
head is recoverable, so the safe-on-error default is the existing
behaviour.
Worst-case cost: +30 s before destructive teardown on a genuine
corruption. Small price relative to the data-loss exposure on the
false-positive path.
Tests: 7 new C3 regression tests covering each path (transient blip,
permanent loss, probe-ok-but-retry-fails, retry-different-error,
no-gateways, probe-throws, unrelated-errors). Existing 7-test
flush-scheduler-oplog-reset.test.ts suite unchanged and still green.
443 unit test files (8134 tests) pass. tsc clean.
Refs: #333 (C3). Stacked on top of #347 (C2) which is stacked on #346 (C1).
…nservative-sender conservative-sender.ts had zero locking primitives. instant-sender.ts declared a module-local sourceLocks map (Wave 5 #171), so conservative sends and instant-vs-conservative cross-pairs did NOT serialize on shared source tokens: two concurrent sends could both pass selection, both commit on-chain, and only the aggregator caught the duplicate-spend after a source was burned. Fix has three parts: - Extracted the per-source lock registry from instant-sender.ts to a new `modules/payments/transfer/source-locks.ts`. The Wave 4-7 semantics are preserved verbatim (lex-sorted acquisition for deadlock prevention, 60 s liveness floor with force-release, Set dedup, fail-closed __resetSourceLocksForTesting guard). The shared module exports `acquireSourceLocks` and `__resetSourceLocksForTesting`. - instant-sender.ts re-exports `__resetSourceLocksForTesting` so the existing tests/unit/payments/transfer/instant-sender.test.ts import path continues to work. - conservative-sender.ts now acquires per-source locks after source selection completes (Step 2.5, mirroring the instant pattern) and releases them in the outer `finally`. Adds a TEST-ONLY __sourceLockMaxHoldMs override on ConservativeSenderDeps, symmetric with InstantSenderDeps. Cross-pair invariant: because both senders now share the SAME process-global map, an in-flight instant send blocks a concurrent conservative send sharing a source token, and vice versa. Tests: - 9 new direct-module tests on source-locks.ts covering acquire serialization, deadlock prevention via lex-sort, dedup, liveness floor (force-release), back-compat re-export identity, fail-closed guard preserved. - 4 new integration tests on conservative-sender proving the pipeline acquires + releases the lock: serialization on shared source, release on success, release on failure (try/finally), disjoint sources proceed concurrently. - All 36 existing instant-sender tests pass unchanged (the extracted module is byte-equivalent to the original Wave 7 code modulo the new callerLabel parameter, which defaults to 'sender' when omitted). 8147 unit tests pass (2 pre-existing skipped). tsc clean. Refs: #333 (H1). Stacked on #348 (C3) → #347 (C2) → #346 (C1).
…s content.tokenId
The manifest `tokenId → rootHash` map was only regex-shape-checked at the
import boundary (uxf/json.ts:430, uxf/ipld.ts:555) and never asserted
equal to the genesis tokenId encoded in the referenced token-root
element. Pool elements still self-verified (content-hash recomputed),
so a hostile sender could craft `{ "tokenId=A": <hash-of-root-minting-
tokenId-B> }` and pass every existing gate. Downstream consumers that
trust the manifest key (dedup, ownership filtering, balance
computation) would mis-identify the token — an identity-confusion
primitive that the audit specifically called out.
Fix layered defense-in-depth:
- uxf/verify.ts — primary structural-integrity gate. When iterating
manifest entries during the DAG walk, assert that the root
element exists with type ELEMENT_TYPE_TOKEN_ROOT AND that
element.content.tokenId === manifestKey. Mismatches surface as
MANIFEST_TOKENID_MISMATCH errors; wrong element types surface as
MANIFEST_TYPE_MISMATCH. modules/payments/transfer/bundle-verifier
already calls pkg.verify() so the receive pipeline (the audit's
primary concern) is covered.
- uxf/json.ts packageFromJson — fail-fast at the JSON import
boundary with VERIFICATION_FAILED. Belt-and-braces for consumers
that bypass verify() (e.g., direct UxfPackage.fromJson() use
without a downstream bundle-verifier round).
- uxf/ipld.ts importFromCar — same fail-fast at the CAR import
boundary. Mirrors the json-path check for the CAR-bound delivery
flow.
The export side already refuses to write packages with mismatched
manifest keys (the verify() result feeds production callers' export
paths), so this fix focuses on hostile-receive scenarios.
Tests: 8 new H2 regression tests covering verify() positive +
MANIFEST_TOKENID_MISMATCH + MANIFEST_TYPE_MISMATCH, the json import
boundary positive + spoofed-key + wrong-type rejections, and the
CAR import boundary spoofed-key + positive flow. All 443 existing
uxf tests pass unchanged. 8155 total unit tests pass.
Refs: #333 (H2). Stacked on #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…mode
Before this fix, `UxfPackage.merge` (→ `mergePkg`) silently dropped any
tokenId whose per-token resolver threw. A single malformed sibling
element inside a Rule 4 synthetic-rebuild made a legitimately-received
token vanish from the merged manifest with only a `logger.warn`. On the
receive path this manifested as token loss from view: the caller had no
machine-readable signal that anything had gone wrong.
Fix:
- `UxfPackage.merge()` now returns `{ skipped: MergeSkip[] }`. Each
`MergeSkip` records the tokenId, the original error, the target's
prior root (if any), and the source's incoming root that failed
to incorporate. Callers can react: replay, retry, surface to UI,
or quarantine.
- `opts.strict: true` aggregates skipped tokens into a
`UxfError('MERGE_PARTIAL_FAILURE')` thrown BEFORE the atomic apply
phase — target state is unchanged on the throw. Use on receive-
path JOINs where any silent drop is unacceptable; leave default-off
for opportunistic peer JOINs where partial coverage is fine.
- `opts.onSkip` fires once per skipped tokenId for callers that want
telemetry visibility without strict-mode failure. Throws inside
onSkip are caught and logged — observability must not change merge
semantics.
- New error code `MERGE_PARTIAL_FAILURE` on UxfErrorCode. The thrown
error also carries a non-typed `skipped` field with the structured
skip list so callers needing machine-readable details can read it
via `(err as { skipped }).skipped`.
- Back-compat at the internal `mergePkg` boundary: legacy positional
`(target, source, verifiedProofs)` callers keep working via a
Set-vs-opts shape probe. Public callers (4 in tree: PaymentsModule,
profile-token-storage-provider, profile/consolidation,
flush-scheduler) ignore the returned `{ skipped }` and continue to
use merge as void — behavior unchanged for them.
Tests: 8 new H3 regression tests covering empty-skipped baseline,
captured-skip on resolver throw, MergeSkip metadata fields,
strict-mode hard fail, strict-mode atomicity invariant (target
unchanged on throw), strict-mode happy path, onSkip callback fires
once per skip, and onSkip throw does not change merge semantics.
All 451 existing uxf tests pass unchanged. 8163 total unit tests pass.
Refs: #333 (H3). Stacked on #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…e in disposition-engine
modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.
Fix:
- New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
interface on disposition-engine.ts. The hook canonically re-derives
RequestId from (authenticator.publicKey, sourceState) and compares
to the bundle's requestId; production wiring calls RequestId.create
and compares toJSON.
- Optional `assertRequestIdBinding?` dep added to
DispositionEngineInput. When provided, the engine calls it BEFORE
every verifyProof invocation:
* `{ ok: true }` → proof verification proceeds.
* `{ ok: false }` → cryptoInvalid('proof-invalid') — same
§5.3 [C](3) class as a clean PATH_INVALID at the verifier,
because the (proof, requestId) pair does not bind to this tx.
* throw → structuralInvalid('proof-throw').
The binding gate fires BEFORE verifyProof so a forged binding
short-circuits the (potentially expensive) on-chain proof check.
- Optional shape preserves back-compat with the 66 existing engine
tests (which do not set the hook) and with all production callers
that have not yet been updated.
- Plumbed through legacy-shape-adapter.ts: new optional
`assertRequestIdBinding?` field on LegacyShapeAdapterInput is
forwarded to DispositionEngineInput in buildEngineInput. Existing
callers that don't supply the hook degrade to the pre-fix
behaviour; new callers pass the production-grade RequestId.create
comparer.
Tests: 7 new H4 regression tests covering:
- Back-compat (hook absent → proof verification proceeds)
- Hook returns ok=true (binding asserted, proof verified)
- Hook receives both bundle requestId and authenticator unchanged
- Hook returns ok=false (cryptoInvalid('proof-invalid'),
verifyProof NOT invoked)
- Hook throws (structuralInvalid('proof-throw'))
- Binding gate fires BEFORE verifyProof (ordering invariant)
- Null-proof chain entries do NOT trigger the binding gate
All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.
Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
…permanent
modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.
Fix:
- Added optional `sourceTokenIds?: ReadonlyArray<string>` to
`UxfTransferOutboxEntry`. The instant-sender now records the
selected source set at outbox-record construction time via
`OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
on `undefined` for back-compat with entries written before H5.
- Added optional `recoverFailedPermanentSources?(sources, outboxId)`
hook to `FinalizationWorkerSenderOptions`. When wired, the worker
invokes it once at the `failed-permanent` transition with the
entry's recorded source tokenIds. Production wiring flips each
source back to a spendable state (typically `confirmed`) gated
on a same-source-no-other-live-outbox check; the audit's gap
was that the worker provided no signal at all.
- The hook is best-effort: throws are caught and emitted as a
`transfer:failed` event for triage, but the `failed-permanent`
transition itself stands. This preserves the terminal-state
contract — a recovery hook failure must not re-introduce the
locked-pending bug it exists to fix.
- Optional shape preserves back-compat with the 90+ existing
finalization-worker-sender tests (which do not set the hook).
Tests: 5 new H5 regression tests covering:
- Hook fires with the recorded source ids on failed-permanent
- Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
- Hook absent: worker still transitions cleanly
- Hook throws: failed-permanent stands, transfer:failed emitted
- Hook fires AFTER the outbox transition (ordering invariant)
The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.
All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.
Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
…nifestCas
ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.
Fix:
- New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
constructor's `opts.verifyEntryRoot`. When wired, the hook is
invoked AFTER the label CAS passes and BEFORE the write. The hook
fetches the content the entry's `rootHash` references (production:
the UXF pool element), recomputes the hash, and asserts it matches
the declaration.
- New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
with structured `integrityDetail` for triage. The result also
carries `observed.contentHash` so callers can re-read state.
- `ManifestCidRewriteCasError.casReason` widened to surface
`'integrity-failed'`. The worker's outer retry loop already
treats any throw from manifest-cid-rewrite as a hard CAS failure
that bubbles up; the new reason rides through unchanged but
preserves structure for operator triage.
- Skipped on the `prev === null` (asserts-no-entry) path because
there is no observed entry to verify. Skipped on label-CAS-mismatch
because the swap has already been rejected.
- Optional shape preserves back-compat: all 9 existing manifest-cas
tests and the 100% of production callers that do not (yet) supply
the verifier continue to behave identically. Production wiring is
a follow-up that does not block this PR.
Tests: 6 new H7 regression tests covering:
- Back-compat (verifier omitted → label-only CAS as before)
- Happy path (verifier called AFTER label CAS, BEFORE write)
- Integrity failure (returns 'integrity-failed', no write happens,
storage unchanged, integrityDetail propagated)
- Skip on prev=null (no observed entry to verify)
- Skip on label-CAS-mismatch (short-circuit before verifier)
- 'integrity-failed' has its own discriminator distinct from
cas-mismatch / not-found / concurrent-modification
All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.
Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
Fix-up: removed over-restrictive set() gate (force-push)While running the testnet-bound e2e suite against the stack tip, I caught a false-positive of my own C1 fix: 9 of 14 e2e failures were caused by this gate firing in The call sequence in
My original C1 fix had two layers: a hard-block in What changed
What this changes in security termsThe Test updates
Verification
Stack-wide impactThis is a force-push to C1. I cascade-rebased every downstream branch (C2, C3, H1, H2, H3, H4, H5, H7, V6-RECOVER test gap) so they cleanly stack on the updated C1. All rebases applied without conflicts.
|
…etIdentity()
profile/profile-storage-provider.ts:2213 encrypt() returned raw UTF-8 when
profileEncryptionKey === null, even with encryption explicitly enabled.
Under the common ordering of storage.connect() (OrbitDB attaches) →
setIdentity() (key derives), any set() call landing in the gap replicated
plaintext to OrbitDB → public IPFS gateways. The catastrophic case is the
6-step migration writing the wallet seed (mnemonic / master_key /
chain_code / derivation_path / base_path) before the consumer wired
setIdentity.
Fix has two layers:
- encrypt() and decrypt() now throw PROFILE_NOT_INITIALIZED when
encryption is configured but the key has not been derived. The
plaintext fallback only fires for explicit `encrypt: false` (test
mode), where identity-class keys are expected to be plaintext.
- set() hard-blocks IDENTITY_CLASS_KEYS at the top of the method —
before the local-cache write — so the wallet seed never even touches
the local cache layer either. This is belt-and-braces over the
encrypt() backstop and covers the migration's stepPersistToOrbitDb.
Legitimate ordering (setIdentity before any set) is unchanged. Browser /
Node Profile factories already route through attachIdentityToProfileProviders
which calls setIdentity before connect(), so sphere.telco's runProfileMigration
is unaffected. Consumer code that previously relied on the silent plaintext
fallback now gets a clear, recoverable error instead of a permanent leak.
Backward-compat: pre-fix wallets with plaintext identity-class entries in
OrbitDB (from the original leak window) will now throw on decrypt instead
of silently returning UTF-8 garbage. This is intentional — surfacing the
compromise is the right signal.
Tests: 13 new C1 regression tests + 2263 existing profile tests pass.
8118 total unit tests pass.
Refs: #333 (C1)
1b63ae1 to
0349e37
Compare
profile/migration.ts walked persist → sanity → cleanup without forcing the
TXF bundle to land durably:
- stepPersistToOrbitDb (:585) accepted save()'s cid: 'debounced' return
as success without driving the flush.
- stepSanityCheck (:643) read back via load() which returns in-memory
pendingData (source: 'cache') — passing even if nothing was pinned.
- stepCleanup (:779) then deleted legacy KV keys and unpinned the
legacy CID.
A crash (or later flush failure) between persist and the debounced flush
landing lost both the legacy KV state AND the now-unpinned legacy CID.
Token-DB loss was only partially mitigated by the #330 legacy-token-DB
fallback if wired; legacy KV and the unpinned CID were not.
Fix:
- stepPersistToOrbitDb now calls profileTokenStorage.awaitNextFlush(0)
immediately after save(). awaitNextFlush drives
flushScheduler.forceFlushSerialized() and throws on TIMEOUT or
POINTER_MONOTONICITY_VIOLATION — converting a silent crash-window
data loss into a recoverable MIGRATION_FAILED. timeoutMs=0 disables
the wall-clock deadline (the 4-iteration cap still applies) so large
migrations are not artificially capped.
- A token-storage provider without awaitNextFlush() is refused outright
with MIGRATION_FAILED. The pre-fix silent-loss path was the only
alternative and it is unsafe.
- stepSanityCheck rejects loadResult.source === 'cache' with a hard
error. After awaitNextFlush, pendingData is null on the real provider
and load() walks active bundles in OrbitDB returning source 'remote'.
A 'cache' result here means the durability gate was bypassed —
belt-and-braces over the awaitNextFlush guarantee.
Existing migration test mocks updated to simulate the real flush
contract (save → pendingData/source:'cache', awaitNextFlush → flushed/
source:'remote'). 9 new C2 regression tests cover:
- awaitNextFlush is called exactly once after save() (happy path)
- timeoutMs=0 is passed (no artificial deadline)
- TIMEOUT, POINTER_MONOTONICITY_VIOLATION, and generic flush errors
all become MIGRATION_FAILED before cleanup runs
- Providers omitting awaitNextFlush are refused
- Null-txfData path skips the flush entirely
- sanity check rejects post-persist source='cache' reads
- End-to-end failure-path invariant: any flush/sanity error leaves
legacy KV intact and the legacy CID reclaimable
All 442 unit test files (8127 tests) pass. tsc clean. eslint clean on
touched files.
Refs: #333 (C2). Stacked on top of #346 (C1).
…oplog reset
FlushScheduler.addBundleWithOplogAutoReset transitioned straight from
"extractLostHeadCid matched the error" to resetCorruptedLog() (which
calls db.drop()). The matcher cannot distinguish a permanently-corrupt
head (Helia GC ran on a memory-blockstore wallet) from a transiently-
unreachable one (gateway blip, peer offline, propagation lag). A
momentary fetch failure thus wiped all OUTBOX/SENT/disposition/
finalization entries not yet captured in a pinned bundle — permanent
data loss on a recoverable error.
Fix:
- Probe configured IPFS gateways for the lost head CID with
exponential backoff (verifyCidAccessibleWithRetry, 30 s deadline,
5 s per-attempt HEAD timeout — matches the Issue #239 shutdown gate
convention for testnet propagation) BEFORE the destructive reset.
- On a successful probe, retry addBundle ONCE. If the retry
succeeds the reset is SKIPPED — no operational state is lost. This
is the primary recovery path for the gateway-blip / propagation-
lag class of failure.
- On probe failure (no gateway served the CID within the deadline)
OR a post-probe retry that still hits the same auto-reset signature
(local Helia is the bottleneck, not the network), fall through to
the existing reset path. The freshest lostHeadCid is propagated
into the resetCorruptedLog reason payload and event data.
- On a post-probe retry that hits a DIFFERENT error class (e.g.,
POINTER_MONOTONICITY_VIOLATION), re-throw the new error — resetting
the log would not help.
- With no gateways configured, the probe is skipped entirely and the
pre-fix behaviour is preserved (no recovery surface, destructive
reset is the only forward path).
- When the probe itself throws (e.g., gateway URL validation failed
before any HEAD ran), fall through to reset — we cannot prove the
head is recoverable, so the safe-on-error default is the existing
behaviour.
Worst-case cost: +30 s before destructive teardown on a genuine
corruption. Small price relative to the data-loss exposure on the
false-positive path.
Tests: 7 new C3 regression tests covering each path (transient blip,
permanent loss, probe-ok-but-retry-fails, retry-different-error,
no-gateways, probe-throws, unrelated-errors). Existing 7-test
flush-scheduler-oplog-reset.test.ts suite unchanged and still green.
443 unit test files (8134 tests) pass. tsc clean.
Refs: #333 (C3). Stacked on top of #347 (C2) which is stacked on #346 (C1).
…nservative-sender conservative-sender.ts had zero locking primitives. instant-sender.ts declared a module-local sourceLocks map (Wave 5 #171), so conservative sends and instant-vs-conservative cross-pairs did NOT serialize on shared source tokens: two concurrent sends could both pass selection, both commit on-chain, and only the aggregator caught the duplicate-spend after a source was burned. Fix has three parts: - Extracted the per-source lock registry from instant-sender.ts to a new `modules/payments/transfer/source-locks.ts`. The Wave 4-7 semantics are preserved verbatim (lex-sorted acquisition for deadlock prevention, 60 s liveness floor with force-release, Set dedup, fail-closed __resetSourceLocksForTesting guard). The shared module exports `acquireSourceLocks` and `__resetSourceLocksForTesting`. - instant-sender.ts re-exports `__resetSourceLocksForTesting` so the existing tests/unit/payments/transfer/instant-sender.test.ts import path continues to work. - conservative-sender.ts now acquires per-source locks after source selection completes (Step 2.5, mirroring the instant pattern) and releases them in the outer `finally`. Adds a TEST-ONLY __sourceLockMaxHoldMs override on ConservativeSenderDeps, symmetric with InstantSenderDeps. Cross-pair invariant: because both senders now share the SAME process-global map, an in-flight instant send blocks a concurrent conservative send sharing a source token, and vice versa. Tests: - 9 new direct-module tests on source-locks.ts covering acquire serialization, deadlock prevention via lex-sort, dedup, liveness floor (force-release), back-compat re-export identity, fail-closed guard preserved. - 4 new integration tests on conservative-sender proving the pipeline acquires + releases the lock: serialization on shared source, release on success, release on failure (try/finally), disjoint sources proceed concurrently. - All 36 existing instant-sender tests pass unchanged (the extracted module is byte-equivalent to the original Wave 7 code modulo the new callerLabel parameter, which defaults to 'sender' when omitted). 8147 unit tests pass (2 pre-existing skipped). tsc clean. Refs: #333 (H1). Stacked on #348 (C3) → #347 (C2) → #346 (C1).
…s content.tokenId
The manifest `tokenId → rootHash` map was only regex-shape-checked at the
import boundary (uxf/json.ts:430, uxf/ipld.ts:555) and never asserted
equal to the genesis tokenId encoded in the referenced token-root
element. Pool elements still self-verified (content-hash recomputed),
so a hostile sender could craft `{ "tokenId=A": <hash-of-root-minting-
tokenId-B> }` and pass every existing gate. Downstream consumers that
trust the manifest key (dedup, ownership filtering, balance
computation) would mis-identify the token — an identity-confusion
primitive that the audit specifically called out.
Fix layered defense-in-depth:
- uxf/verify.ts — primary structural-integrity gate. When iterating
manifest entries during the DAG walk, assert that the root
element exists with type ELEMENT_TYPE_TOKEN_ROOT AND that
element.content.tokenId === manifestKey. Mismatches surface as
MANIFEST_TOKENID_MISMATCH errors; wrong element types surface as
MANIFEST_TYPE_MISMATCH. modules/payments/transfer/bundle-verifier
already calls pkg.verify() so the receive pipeline (the audit's
primary concern) is covered.
- uxf/json.ts packageFromJson — fail-fast at the JSON import
boundary with VERIFICATION_FAILED. Belt-and-braces for consumers
that bypass verify() (e.g., direct UxfPackage.fromJson() use
without a downstream bundle-verifier round).
- uxf/ipld.ts importFromCar — same fail-fast at the CAR import
boundary. Mirrors the json-path check for the CAR-bound delivery
flow.
The export side already refuses to write packages with mismatched
manifest keys (the verify() result feeds production callers' export
paths), so this fix focuses on hostile-receive scenarios.
Tests: 8 new H2 regression tests covering verify() positive +
MANIFEST_TOKENID_MISMATCH + MANIFEST_TYPE_MISMATCH, the json import
boundary positive + spoofed-key + wrong-type rejections, and the
CAR import boundary spoofed-key + positive flow. All 443 existing
uxf tests pass unchanged. 8155 total unit tests pass.
Refs: #333 (H2). Stacked on #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…nservative-sender conservative-sender.ts had zero locking primitives. instant-sender.ts declared a module-local sourceLocks map (Wave 5 #171), so conservative sends and instant-vs-conservative cross-pairs did NOT serialize on shared source tokens: two concurrent sends could both pass selection, both commit on-chain, and only the aggregator caught the duplicate-spend after a source was burned. Fix has three parts: - Extracted the per-source lock registry from instant-sender.ts to a new `modules/payments/transfer/source-locks.ts`. The Wave 4-7 semantics are preserved verbatim (lex-sorted acquisition for deadlock prevention, 60 s liveness floor with force-release, Set dedup, fail-closed __resetSourceLocksForTesting guard). The shared module exports `acquireSourceLocks` and `__resetSourceLocksForTesting`. - instant-sender.ts re-exports `__resetSourceLocksForTesting` so the existing tests/unit/payments/transfer/instant-sender.test.ts import path continues to work. - conservative-sender.ts now acquires per-source locks after source selection completes (Step 2.5, mirroring the instant pattern) and releases them in the outer `finally`. Adds a TEST-ONLY __sourceLockMaxHoldMs override on ConservativeSenderDeps, symmetric with InstantSenderDeps. Cross-pair invariant: because both senders now share the SAME process-global map, an in-flight instant send blocks a concurrent conservative send sharing a source token, and vice versa. Tests: - 9 new direct-module tests on source-locks.ts covering acquire serialization, deadlock prevention via lex-sort, dedup, liveness floor (force-release), back-compat re-export identity, fail-closed guard preserved. - 4 new integration tests on conservative-sender proving the pipeline acquires + releases the lock: serialization on shared source, release on success, release on failure (try/finally), disjoint sources proceed concurrently. - All 36 existing instant-sender tests pass unchanged (the extracted module is byte-equivalent to the original Wave 7 code modulo the new callerLabel parameter, which defaults to 'sender' when omitted). 8147 unit tests pass (2 pre-existing skipped). tsc clean. Refs: #333 (H1). Stacked on #348 (C3) → #347 (C2) → #346 (C1).
…s content.tokenId
The manifest `tokenId → rootHash` map was only regex-shape-checked at the
import boundary (uxf/json.ts:430, uxf/ipld.ts:555) and never asserted
equal to the genesis tokenId encoded in the referenced token-root
element. Pool elements still self-verified (content-hash recomputed),
so a hostile sender could craft `{ "tokenId=A": <hash-of-root-minting-
tokenId-B> }` and pass every existing gate. Downstream consumers that
trust the manifest key (dedup, ownership filtering, balance
computation) would mis-identify the token — an identity-confusion
primitive that the audit specifically called out.
Fix layered defense-in-depth:
- uxf/verify.ts — primary structural-integrity gate. When iterating
manifest entries during the DAG walk, assert that the root
element exists with type ELEMENT_TYPE_TOKEN_ROOT AND that
element.content.tokenId === manifestKey. Mismatches surface as
MANIFEST_TOKENID_MISMATCH errors; wrong element types surface as
MANIFEST_TYPE_MISMATCH. modules/payments/transfer/bundle-verifier
already calls pkg.verify() so the receive pipeline (the audit's
primary concern) is covered.
- uxf/json.ts packageFromJson — fail-fast at the JSON import
boundary with VERIFICATION_FAILED. Belt-and-braces for consumers
that bypass verify() (e.g., direct UxfPackage.fromJson() use
without a downstream bundle-verifier round).
- uxf/ipld.ts importFromCar — same fail-fast at the CAR import
boundary. Mirrors the json-path check for the CAR-bound delivery
flow.
The export side already refuses to write packages with mismatched
manifest keys (the verify() result feeds production callers' export
paths), so this fix focuses on hostile-receive scenarios.
Tests: 8 new H2 regression tests covering verify() positive +
MANIFEST_TOKENID_MISMATCH + MANIFEST_TYPE_MISMATCH, the json import
boundary positive + spoofed-key + wrong-type rejections, and the
CAR import boundary spoofed-key + positive flow. All 443 existing
uxf tests pass unchanged. 8155 total unit tests pass.
Refs: #333 (H2). Stacked on #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…mode
Before this fix, `UxfPackage.merge` (→ `mergePkg`) silently dropped any
tokenId whose per-token resolver threw. A single malformed sibling
element inside a Rule 4 synthetic-rebuild made a legitimately-received
token vanish from the merged manifest with only a `logger.warn`. On the
receive path this manifested as token loss from view: the caller had no
machine-readable signal that anything had gone wrong.
Fix:
- `UxfPackage.merge()` now returns `{ skipped: MergeSkip[] }`. Each
`MergeSkip` records the tokenId, the original error, the target's
prior root (if any), and the source's incoming root that failed
to incorporate. Callers can react: replay, retry, surface to UI,
or quarantine.
- `opts.strict: true` aggregates skipped tokens into a
`UxfError('MERGE_PARTIAL_FAILURE')` thrown BEFORE the atomic apply
phase — target state is unchanged on the throw. Use on receive-
path JOINs where any silent drop is unacceptable; leave default-off
for opportunistic peer JOINs where partial coverage is fine.
- `opts.onSkip` fires once per skipped tokenId for callers that want
telemetry visibility without strict-mode failure. Throws inside
onSkip are caught and logged — observability must not change merge
semantics.
- New error code `MERGE_PARTIAL_FAILURE` on UxfErrorCode. The thrown
error also carries a non-typed `skipped` field with the structured
skip list so callers needing machine-readable details can read it
via `(err as { skipped }).skipped`.
- Back-compat at the internal `mergePkg` boundary: legacy positional
`(target, source, verifiedProofs)` callers keep working via a
Set-vs-opts shape probe. Public callers (4 in tree: PaymentsModule,
profile-token-storage-provider, profile/consolidation,
flush-scheduler) ignore the returned `{ skipped }` and continue to
use merge as void — behavior unchanged for them.
Tests: 8 new H3 regression tests covering empty-skipped baseline,
captured-skip on resolver throw, MergeSkip metadata fields,
strict-mode hard fail, strict-mode atomicity invariant (target
unchanged on throw), strict-mode happy path, onSkip callback fires
once per skip, and onSkip throw does not change merge semantics.
All 451 existing uxf tests pass unchanged. 8163 total unit tests pass.
Refs: #333 (H3). Stacked on #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…e in disposition-engine
modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.
Fix:
- New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
interface on disposition-engine.ts. The hook canonically re-derives
RequestId from (authenticator.publicKey, sourceState) and compares
to the bundle's requestId; production wiring calls RequestId.create
and compares toJSON.
- Optional `assertRequestIdBinding?` dep added to
DispositionEngineInput. When provided, the engine calls it BEFORE
every verifyProof invocation:
* `{ ok: true }` → proof verification proceeds.
* `{ ok: false }` → cryptoInvalid('proof-invalid') — same
§5.3 [C](3) class as a clean PATH_INVALID at the verifier,
because the (proof, requestId) pair does not bind to this tx.
* throw → structuralInvalid('proof-throw').
The binding gate fires BEFORE verifyProof so a forged binding
short-circuits the (potentially expensive) on-chain proof check.
- Optional shape preserves back-compat with the 66 existing engine
tests (which do not set the hook) and with all production callers
that have not yet been updated.
- Plumbed through legacy-shape-adapter.ts: new optional
`assertRequestIdBinding?` field on LegacyShapeAdapterInput is
forwarded to DispositionEngineInput in buildEngineInput. Existing
callers that don't supply the hook degrade to the pre-fix
behaviour; new callers pass the production-grade RequestId.create
comparer.
Tests: 7 new H4 regression tests covering:
- Back-compat (hook absent → proof verification proceeds)
- Hook returns ok=true (binding asserted, proof verified)
- Hook receives both bundle requestId and authenticator unchanged
- Hook returns ok=false (cryptoInvalid('proof-invalid'),
verifyProof NOT invoked)
- Hook throws (structuralInvalid('proof-throw'))
- Binding gate fires BEFORE verifyProof (ordering invariant)
- Null-proof chain entries do NOT trigger the binding gate
All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.
Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
…permanent
modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.
Fix:
- Added optional `sourceTokenIds?: ReadonlyArray<string>` to
`UxfTransferOutboxEntry`. The instant-sender now records the
selected source set at outbox-record construction time via
`OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
on `undefined` for back-compat with entries written before H5.
- Added optional `recoverFailedPermanentSources?(sources, outboxId)`
hook to `FinalizationWorkerSenderOptions`. When wired, the worker
invokes it once at the `failed-permanent` transition with the
entry's recorded source tokenIds. Production wiring flips each
source back to a spendable state (typically `confirmed`) gated
on a same-source-no-other-live-outbox check; the audit's gap
was that the worker provided no signal at all.
- The hook is best-effort: throws are caught and emitted as a
`transfer:failed` event for triage, but the `failed-permanent`
transition itself stands. This preserves the terminal-state
contract — a recovery hook failure must not re-introduce the
locked-pending bug it exists to fix.
- Optional shape preserves back-compat with the 90+ existing
finalization-worker-sender tests (which do not set the hook).
Tests: 5 new H5 regression tests covering:
- Hook fires with the recorded source ids on failed-permanent
- Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
- Hook absent: worker still transitions cleanly
- Hook throws: failed-permanent stands, transfer:failed emitted
- Hook fires AFTER the outbox transition (ordering invariant)
The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.
All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.
Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
…nifestCas
ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.
Fix:
- New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
constructor's `opts.verifyEntryRoot`. When wired, the hook is
invoked AFTER the label CAS passes and BEFORE the write. The hook
fetches the content the entry's `rootHash` references (production:
the UXF pool element), recomputes the hash, and asserts it matches
the declaration.
- New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
with structured `integrityDetail` for triage. The result also
carries `observed.contentHash` so callers can re-read state.
- `ManifestCidRewriteCasError.casReason` widened to surface
`'integrity-failed'`. The worker's outer retry loop already
treats any throw from manifest-cid-rewrite as a hard CAS failure
that bubbles up; the new reason rides through unchanged but
preserves structure for operator triage.
- Skipped on the `prev === null` (asserts-no-entry) path because
there is no observed entry to verify. Skipped on label-CAS-mismatch
because the swap has already been rejected.
- Optional shape preserves back-compat: all 9 existing manifest-cas
tests and the 100% of production callers that do not (yet) supply
the verifier continue to behave identically. Production wiring is
a follow-up that does not block this PR.
Tests: 6 new H7 regression tests covering:
- Back-compat (verifier omitted → label-only CAS as before)
- Happy path (verifier called AFTER label CAS, BEFORE write)
- Integrity failure (returns 'integrity-failed', no write happens,
storage unchanged, integrityDetail propagated)
- Skip on prev=null (no observed entry to verify)
- Skip on label-CAS-mismatch (short-circuit before verifier)
- 'integrity-failed' has its own discriminator distinct from
cas-mismatch / not-found / concurrent-modification
All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.
Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…s content.tokenId
The manifest `tokenId → rootHash` map was only regex-shape-checked at the
import boundary (uxf/json.ts:430, uxf/ipld.ts:555) and never asserted
equal to the genesis tokenId encoded in the referenced token-root
element. Pool elements still self-verified (content-hash recomputed),
so a hostile sender could craft `{ "tokenId=A": <hash-of-root-minting-
tokenId-B> }` and pass every existing gate. Downstream consumers that
trust the manifest key (dedup, ownership filtering, balance
computation) would mis-identify the token — an identity-confusion
primitive that the audit specifically called out.
Fix layered defense-in-depth:
- uxf/verify.ts — primary structural-integrity gate. When iterating
manifest entries during the DAG walk, assert that the root
element exists with type ELEMENT_TYPE_TOKEN_ROOT AND that
element.content.tokenId === manifestKey. Mismatches surface as
MANIFEST_TOKENID_MISMATCH errors; wrong element types surface as
MANIFEST_TYPE_MISMATCH. modules/payments/transfer/bundle-verifier
already calls pkg.verify() so the receive pipeline (the audit's
primary concern) is covered.
- uxf/json.ts packageFromJson — fail-fast at the JSON import
boundary with VERIFICATION_FAILED. Belt-and-braces for consumers
that bypass verify() (e.g., direct UxfPackage.fromJson() use
without a downstream bundle-verifier round).
- uxf/ipld.ts importFromCar — same fail-fast at the CAR import
boundary. Mirrors the json-path check for the CAR-bound delivery
flow.
The export side already refuses to write packages with mismatched
manifest keys (the verify() result feeds production callers' export
paths), so this fix focuses on hostile-receive scenarios.
Tests: 8 new H2 regression tests covering verify() positive +
MANIFEST_TOKENID_MISMATCH + MANIFEST_TYPE_MISMATCH, the json import
boundary positive + spoofed-key + wrong-type rejections, and the
CAR import boundary spoofed-key + positive flow. All 443 existing
uxf tests pass unchanged. 8155 total unit tests pass.
Refs: #333 (H2). Stacked on #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…mode
Before this fix, `UxfPackage.merge` (→ `mergePkg`) silently dropped any
tokenId whose per-token resolver threw. A single malformed sibling
element inside a Rule 4 synthetic-rebuild made a legitimately-received
token vanish from the merged manifest with only a `logger.warn`. On the
receive path this manifested as token loss from view: the caller had no
machine-readable signal that anything had gone wrong.
Fix:
- `UxfPackage.merge()` now returns `{ skipped: MergeSkip[] }`. Each
`MergeSkip` records the tokenId, the original error, the target's
prior root (if any), and the source's incoming root that failed
to incorporate. Callers can react: replay, retry, surface to UI,
or quarantine.
- `opts.strict: true` aggregates skipped tokens into a
`UxfError('MERGE_PARTIAL_FAILURE')` thrown BEFORE the atomic apply
phase — target state is unchanged on the throw. Use on receive-
path JOINs where any silent drop is unacceptable; leave default-off
for opportunistic peer JOINs where partial coverage is fine.
- `opts.onSkip` fires once per skipped tokenId for callers that want
telemetry visibility without strict-mode failure. Throws inside
onSkip are caught and logged — observability must not change merge
semantics.
- New error code `MERGE_PARTIAL_FAILURE` on UxfErrorCode. The thrown
error also carries a non-typed `skipped` field with the structured
skip list so callers needing machine-readable details can read it
via `(err as { skipped }).skipped`.
- Back-compat at the internal `mergePkg` boundary: legacy positional
`(target, source, verifiedProofs)` callers keep working via a
Set-vs-opts shape probe. Public callers (4 in tree: PaymentsModule,
profile-token-storage-provider, profile/consolidation,
flush-scheduler) ignore the returned `{ skipped }` and continue to
use merge as void — behavior unchanged for them.
Tests: 8 new H3 regression tests covering empty-skipped baseline,
captured-skip on resolver throw, MergeSkip metadata fields,
strict-mode hard fail, strict-mode atomicity invariant (target
unchanged on throw), strict-mode happy path, onSkip callback fires
once per skip, and onSkip throw does not change merge semantics.
All 451 existing uxf tests pass unchanged. 8163 total unit tests pass.
Refs: #333 (H3). Stacked on #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…e in disposition-engine
modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.
Fix:
- New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
interface on disposition-engine.ts. The hook canonically re-derives
RequestId from (authenticator.publicKey, sourceState) and compares
to the bundle's requestId; production wiring calls RequestId.create
and compares toJSON.
- Optional `assertRequestIdBinding?` dep added to
DispositionEngineInput. When provided, the engine calls it BEFORE
every verifyProof invocation:
* `{ ok: true }` → proof verification proceeds.
* `{ ok: false }` → cryptoInvalid('proof-invalid') — same
§5.3 [C](3) class as a clean PATH_INVALID at the verifier,
because the (proof, requestId) pair does not bind to this tx.
* throw → structuralInvalid('proof-throw').
The binding gate fires BEFORE verifyProof so a forged binding
short-circuits the (potentially expensive) on-chain proof check.
- Optional shape preserves back-compat with the 66 existing engine
tests (which do not set the hook) and with all production callers
that have not yet been updated.
- Plumbed through legacy-shape-adapter.ts: new optional
`assertRequestIdBinding?` field on LegacyShapeAdapterInput is
forwarded to DispositionEngineInput in buildEngineInput. Existing
callers that don't supply the hook degrade to the pre-fix
behaviour; new callers pass the production-grade RequestId.create
comparer.
Tests: 7 new H4 regression tests covering:
- Back-compat (hook absent → proof verification proceeds)
- Hook returns ok=true (binding asserted, proof verified)
- Hook receives both bundle requestId and authenticator unchanged
- Hook returns ok=false (cryptoInvalid('proof-invalid'),
verifyProof NOT invoked)
- Hook throws (structuralInvalid('proof-throw'))
- Binding gate fires BEFORE verifyProof (ordering invariant)
- Null-proof chain entries do NOT trigger the binding gate
All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.
Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
…permanent
modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.
Fix:
- Added optional `sourceTokenIds?: ReadonlyArray<string>` to
`UxfTransferOutboxEntry`. The instant-sender now records the
selected source set at outbox-record construction time via
`OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
on `undefined` for back-compat with entries written before H5.
- Added optional `recoverFailedPermanentSources?(sources, outboxId)`
hook to `FinalizationWorkerSenderOptions`. When wired, the worker
invokes it once at the `failed-permanent` transition with the
entry's recorded source tokenIds. Production wiring flips each
source back to a spendable state (typically `confirmed`) gated
on a same-source-no-other-live-outbox check; the audit's gap
was that the worker provided no signal at all.
- The hook is best-effort: throws are caught and emitted as a
`transfer:failed` event for triage, but the `failed-permanent`
transition itself stands. This preserves the terminal-state
contract — a recovery hook failure must not re-introduce the
locked-pending bug it exists to fix.
- Optional shape preserves back-compat with the 90+ existing
finalization-worker-sender tests (which do not set the hook).
Tests: 5 new H5 regression tests covering:
- Hook fires with the recorded source ids on failed-permanent
- Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
- Hook absent: worker still transitions cleanly
- Hook throws: failed-permanent stands, transfer:failed emitted
- Hook fires AFTER the outbox transition (ordering invariant)
The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.
All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.
Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
…nifestCas
ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.
Fix:
- New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
constructor's `opts.verifyEntryRoot`. When wired, the hook is
invoked AFTER the label CAS passes and BEFORE the write. The hook
fetches the content the entry's `rootHash` references (production:
the UXF pool element), recomputes the hash, and asserts it matches
the declaration.
- New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
with structured `integrityDetail` for triage. The result also
carries `observed.contentHash` so callers can re-read state.
- `ManifestCidRewriteCasError.casReason` widened to surface
`'integrity-failed'`. The worker's outer retry loop already
treats any throw from manifest-cid-rewrite as a hard CAS failure
that bubbles up; the new reason rides through unchanged but
preserves structure for operator triage.
- Skipped on the `prev === null` (asserts-no-entry) path because
there is no observed entry to verify. Skipped on label-CAS-mismatch
because the swap has already been rejected.
- Optional shape preserves back-compat: all 9 existing manifest-cas
tests and the 100% of production callers that do not (yet) supply
the verifier continue to behave identically. Production wiring is
a follow-up that does not block this PR.
Tests: 6 new H7 regression tests covering:
- Back-compat (verifier omitted → label-only CAS as before)
- Happy path (verifier called AFTER label CAS, BEFORE write)
- Integrity failure (returns 'integrity-failed', no write happens,
storage unchanged, integrityDetail propagated)
- Skip on prev=null (no observed entry to verify)
- Skip on label-CAS-mismatch (short-circuit before verifier)
- 'integrity-failed' has its own discriminator distinct from
cas-mismatch / not-found / concurrent-modification
All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.
Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…mode
Before this fix, `UxfPackage.merge` (→ `mergePkg`) silently dropped any
tokenId whose per-token resolver threw. A single malformed sibling
element inside a Rule 4 synthetic-rebuild made a legitimately-received
token vanish from the merged manifest with only a `logger.warn`. On the
receive path this manifested as token loss from view: the caller had no
machine-readable signal that anything had gone wrong.
Fix:
- `UxfPackage.merge()` now returns `{ skipped: MergeSkip[] }`. Each
`MergeSkip` records the tokenId, the original error, the target's
prior root (if any), and the source's incoming root that failed
to incorporate. Callers can react: replay, retry, surface to UI,
or quarantine.
- `opts.strict: true` aggregates skipped tokens into a
`UxfError('MERGE_PARTIAL_FAILURE')` thrown BEFORE the atomic apply
phase — target state is unchanged on the throw. Use on receive-
path JOINs where any silent drop is unacceptable; leave default-off
for opportunistic peer JOINs where partial coverage is fine.
- `opts.onSkip` fires once per skipped tokenId for callers that want
telemetry visibility without strict-mode failure. Throws inside
onSkip are caught and logged — observability must not change merge
semantics.
- New error code `MERGE_PARTIAL_FAILURE` on UxfErrorCode. The thrown
error also carries a non-typed `skipped` field with the structured
skip list so callers needing machine-readable details can read it
via `(err as { skipped }).skipped`.
- Back-compat at the internal `mergePkg` boundary: legacy positional
`(target, source, verifiedProofs)` callers keep working via a
Set-vs-opts shape probe. Public callers (4 in tree: PaymentsModule,
profile-token-storage-provider, profile/consolidation,
flush-scheduler) ignore the returned `{ skipped }` and continue to
use merge as void — behavior unchanged for them.
Tests: 8 new H3 regression tests covering empty-skipped baseline,
captured-skip on resolver throw, MergeSkip metadata fields,
strict-mode hard fail, strict-mode atomicity invariant (target
unchanged on throw), strict-mode happy path, onSkip callback fires
once per skip, and onSkip throw does not change merge semantics.
All 451 existing uxf tests pass unchanged. 8163 total unit tests pass.
Refs: #333 (H3). Stacked on #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…e in disposition-engine
modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.
Fix:
- New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
interface on disposition-engine.ts. The hook canonically re-derives
RequestId from (authenticator.publicKey, sourceState) and compares
to the bundle's requestId; production wiring calls RequestId.create
and compares toJSON.
- Optional `assertRequestIdBinding?` dep added to
DispositionEngineInput. When provided, the engine calls it BEFORE
every verifyProof invocation:
* `{ ok: true }` → proof verification proceeds.
* `{ ok: false }` → cryptoInvalid('proof-invalid') — same
§5.3 [C](3) class as a clean PATH_INVALID at the verifier,
because the (proof, requestId) pair does not bind to this tx.
* throw → structuralInvalid('proof-throw').
The binding gate fires BEFORE verifyProof so a forged binding
short-circuits the (potentially expensive) on-chain proof check.
- Optional shape preserves back-compat with the 66 existing engine
tests (which do not set the hook) and with all production callers
that have not yet been updated.
- Plumbed through legacy-shape-adapter.ts: new optional
`assertRequestIdBinding?` field on LegacyShapeAdapterInput is
forwarded to DispositionEngineInput in buildEngineInput. Existing
callers that don't supply the hook degrade to the pre-fix
behaviour; new callers pass the production-grade RequestId.create
comparer.
Tests: 7 new H4 regression tests covering:
- Back-compat (hook absent → proof verification proceeds)
- Hook returns ok=true (binding asserted, proof verified)
- Hook receives both bundle requestId and authenticator unchanged
- Hook returns ok=false (cryptoInvalid('proof-invalid'),
verifyProof NOT invoked)
- Hook throws (structuralInvalid('proof-throw'))
- Binding gate fires BEFORE verifyProof (ordering invariant)
- Null-proof chain entries do NOT trigger the binding gate
All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.
Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
…permanent
modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.
Fix:
- Added optional `sourceTokenIds?: ReadonlyArray<string>` to
`UxfTransferOutboxEntry`. The instant-sender now records the
selected source set at outbox-record construction time via
`OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
on `undefined` for back-compat with entries written before H5.
- Added optional `recoverFailedPermanentSources?(sources, outboxId)`
hook to `FinalizationWorkerSenderOptions`. When wired, the worker
invokes it once at the `failed-permanent` transition with the
entry's recorded source tokenIds. Production wiring flips each
source back to a spendable state (typically `confirmed`) gated
on a same-source-no-other-live-outbox check; the audit's gap
was that the worker provided no signal at all.
- The hook is best-effort: throws are caught and emitted as a
`transfer:failed` event for triage, but the `failed-permanent`
transition itself stands. This preserves the terminal-state
contract — a recovery hook failure must not re-introduce the
locked-pending bug it exists to fix.
- Optional shape preserves back-compat with the 90+ existing
finalization-worker-sender tests (which do not set the hook).
Tests: 5 new H5 regression tests covering:
- Hook fires with the recorded source ids on failed-permanent
- Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
- Hook absent: worker still transitions cleanly
- Hook throws: failed-permanent stands, transfer:failed emitted
- Hook fires AFTER the outbox transition (ordering invariant)
The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.
All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.
Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
…nifestCas
ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.
Fix:
- New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
constructor's `opts.verifyEntryRoot`. When wired, the hook is
invoked AFTER the label CAS passes and BEFORE the write. The hook
fetches the content the entry's `rootHash` references (production:
the UXF pool element), recomputes the hash, and asserts it matches
the declaration.
- New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
with structured `integrityDetail` for triage. The result also
carries `observed.contentHash` so callers can re-read state.
- `ManifestCidRewriteCasError.casReason` widened to surface
`'integrity-failed'`. The worker's outer retry loop already
treats any throw from manifest-cid-rewrite as a hard CAS failure
that bubbles up; the new reason rides through unchanged but
preserves structure for operator triage.
- Skipped on the `prev === null` (asserts-no-entry) path because
there is no observed entry to verify. Skipped on label-CAS-mismatch
because the swap has already been rejected.
- Optional shape preserves back-compat: all 9 existing manifest-cas
tests and the 100% of production callers that do not (yet) supply
the verifier continue to behave identically. Production wiring is
a follow-up that does not block this PR.
Tests: 6 new H7 regression tests covering:
- Back-compat (verifier omitted → label-only CAS as before)
- Happy path (verifier called AFTER label CAS, BEFORE write)
- Integrity failure (returns 'integrity-failed', no write happens,
storage unchanged, integrityDetail propagated)
- Skip on prev=null (no observed entry to verify)
- Skip on label-CAS-mismatch (short-circuit before verifier)
- 'integrity-failed' has its own discriminator distinct from
cas-mismatch / not-found / concurrent-modification
All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.
Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…e in disposition-engine
modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.
Fix:
- New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
interface on disposition-engine.ts. The hook canonically re-derives
RequestId from (authenticator.publicKey, sourceState) and compares
to the bundle's requestId; production wiring calls RequestId.create
and compares toJSON.
- Optional `assertRequestIdBinding?` dep added to
DispositionEngineInput. When provided, the engine calls it BEFORE
every verifyProof invocation:
* `{ ok: true }` → proof verification proceeds.
* `{ ok: false }` → cryptoInvalid('proof-invalid') — same
§5.3 [C](3) class as a clean PATH_INVALID at the verifier,
because the (proof, requestId) pair does not bind to this tx.
* throw → structuralInvalid('proof-throw').
The binding gate fires BEFORE verifyProof so a forged binding
short-circuits the (potentially expensive) on-chain proof check.
- Optional shape preserves back-compat with the 66 existing engine
tests (which do not set the hook) and with all production callers
that have not yet been updated.
- Plumbed through legacy-shape-adapter.ts: new optional
`assertRequestIdBinding?` field on LegacyShapeAdapterInput is
forwarded to DispositionEngineInput in buildEngineInput. Existing
callers that don't supply the hook degrade to the pre-fix
behaviour; new callers pass the production-grade RequestId.create
comparer.
Tests: 7 new H4 regression tests covering:
- Back-compat (hook absent → proof verification proceeds)
- Hook returns ok=true (binding asserted, proof verified)
- Hook receives both bundle requestId and authenticator unchanged
- Hook returns ok=false (cryptoInvalid('proof-invalid'),
verifyProof NOT invoked)
- Hook throws (structuralInvalid('proof-throw'))
- Binding gate fires BEFORE verifyProof (ordering invariant)
- Null-proof chain entries do NOT trigger the binding gate
All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.
Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
…permanent
modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.
Fix:
- Added optional `sourceTokenIds?: ReadonlyArray<string>` to
`UxfTransferOutboxEntry`. The instant-sender now records the
selected source set at outbox-record construction time via
`OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
on `undefined` for back-compat with entries written before H5.
- Added optional `recoverFailedPermanentSources?(sources, outboxId)`
hook to `FinalizationWorkerSenderOptions`. When wired, the worker
invokes it once at the `failed-permanent` transition with the
entry's recorded source tokenIds. Production wiring flips each
source back to a spendable state (typically `confirmed`) gated
on a same-source-no-other-live-outbox check; the audit's gap
was that the worker provided no signal at all.
- The hook is best-effort: throws are caught and emitted as a
`transfer:failed` event for triage, but the `failed-permanent`
transition itself stands. This preserves the terminal-state
contract — a recovery hook failure must not re-introduce the
locked-pending bug it exists to fix.
- Optional shape preserves back-compat with the 90+ existing
finalization-worker-sender tests (which do not set the hook).
Tests: 5 new H5 regression tests covering:
- Hook fires with the recorded source ids on failed-permanent
- Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
- Hook absent: worker still transitions cleanly
- Hook throws: failed-permanent stands, transfer:failed emitted
- Hook fires AFTER the outbox transition (ordering invariant)
The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.
All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.
Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
…nifestCas
ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.
Fix:
- New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
constructor's `opts.verifyEntryRoot`. When wired, the hook is
invoked AFTER the label CAS passes and BEFORE the write. The hook
fetches the content the entry's `rootHash` references (production:
the UXF pool element), recomputes the hash, and asserts it matches
the declaration.
- New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
with structured `integrityDetail` for triage. The result also
carries `observed.contentHash` so callers can re-read state.
- `ManifestCidRewriteCasError.casReason` widened to surface
`'integrity-failed'`. The worker's outer retry loop already
treats any throw from manifest-cid-rewrite as a hard CAS failure
that bubbles up; the new reason rides through unchanged but
preserves structure for operator triage.
- Skipped on the `prev === null` (asserts-no-entry) path because
there is no observed entry to verify. Skipped on label-CAS-mismatch
because the swap has already been rejected.
- Optional shape preserves back-compat: all 9 existing manifest-cas
tests and the 100% of production callers that do not (yet) supply
the verifier continue to behave identically. Production wiring is
a follow-up that does not block this PR.
Tests: 6 new H7 regression tests covering:
- Back-compat (verifier omitted → label-only CAS as before)
- Happy path (verifier called AFTER label CAS, BEFORE write)
- Integrity failure (returns 'integrity-failed', no write happens,
storage unchanged, integrityDetail propagated)
- Skip on prev=null (no observed entry to verify)
- Skip on label-CAS-mismatch (short-circuit before verifier)
- 'integrity-failed' has its own discriminator distinct from
cas-mismatch / not-found / concurrent-modification
All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.
Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…permanent
modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.
Fix:
- Added optional `sourceTokenIds?: ReadonlyArray<string>` to
`UxfTransferOutboxEntry`. The instant-sender now records the
selected source set at outbox-record construction time via
`OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
on `undefined` for back-compat with entries written before H5.
- Added optional `recoverFailedPermanentSources?(sources, outboxId)`
hook to `FinalizationWorkerSenderOptions`. When wired, the worker
invokes it once at the `failed-permanent` transition with the
entry's recorded source tokenIds. Production wiring flips each
source back to a spendable state (typically `confirmed`) gated
on a same-source-no-other-live-outbox check; the audit's gap
was that the worker provided no signal at all.
- The hook is best-effort: throws are caught and emitted as a
`transfer:failed` event for triage, but the `failed-permanent`
transition itself stands. This preserves the terminal-state
contract — a recovery hook failure must not re-introduce the
locked-pending bug it exists to fix.
- Optional shape preserves back-compat with the 90+ existing
finalization-worker-sender tests (which do not set the hook).
Tests: 5 new H5 regression tests covering:
- Hook fires with the recorded source ids on failed-permanent
- Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
- Hook absent: worker still transitions cleanly
- Hook throws: failed-permanent stands, transfer:failed emitted
- Hook fires AFTER the outbox transition (ordering invariant)
The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.
All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.
Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
…nifestCas
ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.
Fix:
- New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
constructor's `opts.verifyEntryRoot`. When wired, the hook is
invoked AFTER the label CAS passes and BEFORE the write. The hook
fetches the content the entry's `rootHash` references (production:
the UXF pool element), recomputes the hash, and asserts it matches
the declaration.
- New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
with structured `integrityDetail` for triage. The result also
carries `observed.contentHash` so callers can re-read state.
- `ManifestCidRewriteCasError.casReason` widened to surface
`'integrity-failed'`. The worker's outer retry loop already
treats any throw from manifest-cid-rewrite as a hard CAS failure
that bubbles up; the new reason rides through unchanged but
preserves structure for operator triage.
- Skipped on the `prev === null` (asserts-no-entry) path because
there is no observed entry to verify. Skipped on label-CAS-mismatch
because the swap has already been rejected.
- Optional shape preserves back-compat: all 9 existing manifest-cas
tests and the 100% of production callers that do not (yet) supply
the verifier continue to behave identically. Production wiring is
a follow-up that does not block this PR.
Tests: 6 new H7 regression tests covering:
- Back-compat (verifier omitted → label-only CAS as before)
- Happy path (verifier called AFTER label CAS, BEFORE write)
- Integrity failure (returns 'integrity-failed', no write happens,
storage unchanged, integrityDetail propagated)
- Skip on prev=null (no observed entry to verify)
- Skip on label-CAS-mismatch (short-circuit before verifier)
- 'integrity-failed' has its own discriminator distinct from
cas-mismatch / not-found / concurrent-modification
All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.
Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…nifestCas
ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.
Fix:
- New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
constructor's `opts.verifyEntryRoot`. When wired, the hook is
invoked AFTER the label CAS passes and BEFORE the write. The hook
fetches the content the entry's `rootHash` references (production:
the UXF pool element), recomputes the hash, and asserts it matches
the declaration.
- New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
with structured `integrityDetail` for triage. The result also
carries `observed.contentHash` so callers can re-read state.
- `ManifestCidRewriteCasError.casReason` widened to surface
`'integrity-failed'`. The worker's outer retry loop already
treats any throw from manifest-cid-rewrite as a hard CAS failure
that bubbles up; the new reason rides through unchanged but
preserves structure for operator triage.
- Skipped on the `prev === null` (asserts-no-entry) path because
there is no observed entry to verify. Skipped on label-CAS-mismatch
because the swap has already been rejected.
- Optional shape preserves back-compat: all 9 existing manifest-cas
tests and the 100% of production callers that do not (yet) supply
the verifier continue to behave identically. Production wiring is
a follow-up that does not block this PR.
Tests: 6 new H7 regression tests covering:
- Back-compat (verifier omitted → label-only CAS as before)
- Happy path (verifier called AFTER label CAS, BEFORE write)
- Integrity failure (returns 'integrity-failed', no write happens,
storage unchanged, integrityDetail propagated)
- Skip on prev=null (no observed entry to verify)
- Skip on label-CAS-mismatch (short-circuit before verifier)
- 'integrity-failed' has its own discriminator distinct from
cas-mismatch / not-found / concurrent-modification
All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.
Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
Summary
Closes the C1 critical finding of audit #333 — the plaintext-seed window in
ProfileStorageProvider.encrypt(). Before this PR,encrypt()returned raw UTF-8 bytes wheneverprofileEncryptionKey === null, even with encryption explicitly enabled. Under the common Sphere ordering ofstorage.connect()(OrbitDB attaches) →setIdentity()(key derives), anyset()call landing in the gap silently replicated plaintext to OrbitDB → public IPFS gateways. The catastrophic case is the 6-stepProfileMigrationwriting the wallet seed (mnemonic,master_key,chain_code,derivation_path,base_path) before the consumer wiredsetIdentity.Fix
Two layers, defense-in-depth:
encrypt()/decrypt()fail closed. When encryption is configured (encryptionEnabled === true) but the key has not been derived, both now throwPROFILE_NOT_INITIALIZED. The plaintext fallback only fires for explicitencrypt: false(test mode), where identity-class keys are expected to be plaintext.set()hard-blocksIDENTITY_CLASS_KEYSat the top of the method — before the local-cache write — so the wallet seed never even touches the local cache layer either. This is belt-and-braces over theencrypt()backstop and specifically protects the migration'sstepPersistToOrbitDb.IDENTITY_CLASS_KEYSis the already-existing set inprofile/profile-export.ts:170:{ mnemonic, master_key, chain_code, derivation_path, base_path }.Why this is safe in production
attachIdentityToProfileProviders(profile/attach-identity.ts:62-66) callsproviders.storage.setIdentity(identity)immediately, thenconnect(). sphere.telco'srunProfileMigrationroutes throughcreateBrowserProfileProvidersFromSphere→ the same helper, so the legitimate path is unchanged.setIdentitybefore anyset(). No fixtures relied on the leak window.Backward-compat note
Pre-fix wallets with plaintext identity-class entries in OrbitDB (from the original leak window) will now throw on decrypt instead of silently returning UTF-8 garbage. This is intentional — surfacing the compromise is the right signal: if a wallet has plaintext identity in OrbitDB, that data is already on public IPFS and the seed should be rotated.
Test plan
tests/unit/profile/profile-storage-provider-c1-plaintext-seed.test.ts:set()refuses identity-class keys beforesetIdentity()(parametrized over all five keys)dbConnected— fires even pre-attachsetIdentity()derives the key, and land encrypted (not plaintext)encrypt: false) bypass worksencrypt()backstopdecrypt()throws symmetricallyconnect → set(mnemonic) → setIdentitysequence fails loudly, leaves zero traces in either layer, then succeeds post-setIdentitytsc --noEmitcleaneslintclean on touched filesAudit traceability
profile/profile-storage-provider.ts:2128encrypt()throw whenencryptionEnabled && key === null; hard-blockIDENTITY_CLASS_KEYSfrom the plaintext path unconditionally. Add a boot-order test proving no seedset()precedes key derivation."Next in the stack
C2 (migration cleanup-before-flush) and C3 (auto-drop on transient block-load) follow as stacked PRs.