fix(uxf): surface mergePkg skipped tokens + add strict mode (Audit #333 H3)#352
Merged
Merged
Conversation
This was referenced May 30, 2026
Closed
eb7a21d to
a0d5bc7
Compare
c827d43 to
38d3208
Compare
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…e in disposition-engine
modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.
Fix:
- New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
interface on disposition-engine.ts. The hook canonically re-derives
RequestId from (authenticator.publicKey, sourceState) and compares
to the bundle's requestId; production wiring calls RequestId.create
and compares toJSON.
- Optional `assertRequestIdBinding?` dep added to
DispositionEngineInput. When provided, the engine calls it BEFORE
every verifyProof invocation:
* `{ ok: true }` → proof verification proceeds.
* `{ ok: false }` → cryptoInvalid('proof-invalid') — same
§5.3 [C](3) class as a clean PATH_INVALID at the verifier,
because the (proof, requestId) pair does not bind to this tx.
* throw → structuralInvalid('proof-throw').
The binding gate fires BEFORE verifyProof so a forged binding
short-circuits the (potentially expensive) on-chain proof check.
- Optional shape preserves back-compat with the 66 existing engine
tests (which do not set the hook) and with all production callers
that have not yet been updated.
- Plumbed through legacy-shape-adapter.ts: new optional
`assertRequestIdBinding?` field on LegacyShapeAdapterInput is
forwarded to DispositionEngineInput in buildEngineInput. Existing
callers that don't supply the hook degrade to the pre-fix
behaviour; new callers pass the production-grade RequestId.create
comparer.
Tests: 7 new H4 regression tests covering:
- Back-compat (hook absent → proof verification proceeds)
- Hook returns ok=true (binding asserted, proof verified)
- Hook receives both bundle requestId and authenticator unchanged
- Hook returns ok=false (cryptoInvalid('proof-invalid'),
verifyProof NOT invoked)
- Hook throws (structuralInvalid('proof-throw'))
- Binding gate fires BEFORE verifyProof (ordering invariant)
- Null-proof chain entries do NOT trigger the binding gate
All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.
Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…permanent
modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.
Fix:
- Added optional `sourceTokenIds?: ReadonlyArray<string>` to
`UxfTransferOutboxEntry`. The instant-sender now records the
selected source set at outbox-record construction time via
`OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
on `undefined` for back-compat with entries written before H5.
- Added optional `recoverFailedPermanentSources?(sources, outboxId)`
hook to `FinalizationWorkerSenderOptions`. When wired, the worker
invokes it once at the `failed-permanent` transition with the
entry's recorded source tokenIds. Production wiring flips each
source back to a spendable state (typically `confirmed`) gated
on a same-source-no-other-live-outbox check; the audit's gap
was that the worker provided no signal at all.
- The hook is best-effort: throws are caught and emitted as a
`transfer:failed` event for triage, but the `failed-permanent`
transition itself stands. This preserves the terminal-state
contract — a recovery hook failure must not re-introduce the
locked-pending bug it exists to fix.
- Optional shape preserves back-compat with the 90+ existing
finalization-worker-sender tests (which do not set the hook).
Tests: 5 new H5 regression tests covering:
- Hook fires with the recorded source ids on failed-permanent
- Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
- Hook absent: worker still transitions cleanly
- Hook throws: failed-permanent stands, transfer:failed emitted
- Hook fires AFTER the outbox transition (ordering invariant)
The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.
All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.
Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…nifestCas
ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.
Fix:
- New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
constructor's `opts.verifyEntryRoot`. When wired, the hook is
invoked AFTER the label CAS passes and BEFORE the write. The hook
fetches the content the entry's `rootHash` references (production:
the UXF pool element), recomputes the hash, and asserts it matches
the declaration.
- New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
with structured `integrityDetail` for triage. The result also
carries `observed.contentHash` so callers can re-read state.
- `ManifestCidRewriteCasError.casReason` widened to surface
`'integrity-failed'`. The worker's outer retry loop already
treats any throw from manifest-cid-rewrite as a hard CAS failure
that bubbles up; the new reason rides through unchanged but
preserves structure for operator triage.
- Skipped on the `prev === null` (asserts-no-entry) path because
there is no observed entry to verify. Skipped on label-CAS-mismatch
because the swap has already been rejected.
- Optional shape preserves back-compat: all 9 existing manifest-cas
tests and the 100% of production callers that do not (yet) supply
the verifier continue to behave identically. Production wiring is
a follow-up that does not block this PR.
Tests: 6 new H7 regression tests covering:
- Back-compat (verifier omitted → label-only CAS as before)
- Happy path (verifier called AFTER label CAS, BEFORE write)
- Integrity failure (returns 'integrity-failed', no write happens,
storage unchanged, integrityDetail propagated)
- Skip on prev=null (no observed entry to verify)
- Skip on label-CAS-mismatch (short-circuit before verifier)
- 'integrity-failed' has its own discriminator distinct from
cas-mismatch / not-found / concurrent-modification
All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.
Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
5 tasks
a0d5bc7 to
4fb7e19
Compare
38d3208 to
6e46117
Compare
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…e in disposition-engine
modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.
Fix:
- New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
interface on disposition-engine.ts. The hook canonically re-derives
RequestId from (authenticator.publicKey, sourceState) and compares
to the bundle's requestId; production wiring calls RequestId.create
and compares toJSON.
- Optional `assertRequestIdBinding?` dep added to
DispositionEngineInput. When provided, the engine calls it BEFORE
every verifyProof invocation:
* `{ ok: true }` → proof verification proceeds.
* `{ ok: false }` → cryptoInvalid('proof-invalid') — same
§5.3 [C](3) class as a clean PATH_INVALID at the verifier,
because the (proof, requestId) pair does not bind to this tx.
* throw → structuralInvalid('proof-throw').
The binding gate fires BEFORE verifyProof so a forged binding
short-circuits the (potentially expensive) on-chain proof check.
- Optional shape preserves back-compat with the 66 existing engine
tests (which do not set the hook) and with all production callers
that have not yet been updated.
- Plumbed through legacy-shape-adapter.ts: new optional
`assertRequestIdBinding?` field on LegacyShapeAdapterInput is
forwarded to DispositionEngineInput in buildEngineInput. Existing
callers that don't supply the hook degrade to the pre-fix
behaviour; new callers pass the production-grade RequestId.create
comparer.
Tests: 7 new H4 regression tests covering:
- Back-compat (hook absent → proof verification proceeds)
- Hook returns ok=true (binding asserted, proof verified)
- Hook receives both bundle requestId and authenticator unchanged
- Hook returns ok=false (cryptoInvalid('proof-invalid'),
verifyProof NOT invoked)
- Hook throws (structuralInvalid('proof-throw'))
- Binding gate fires BEFORE verifyProof (ordering invariant)
- Null-proof chain entries do NOT trigger the binding gate
All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.
Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…permanent
modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.
Fix:
- Added optional `sourceTokenIds?: ReadonlyArray<string>` to
`UxfTransferOutboxEntry`. The instant-sender now records the
selected source set at outbox-record construction time via
`OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
on `undefined` for back-compat with entries written before H5.
- Added optional `recoverFailedPermanentSources?(sources, outboxId)`
hook to `FinalizationWorkerSenderOptions`. When wired, the worker
invokes it once at the `failed-permanent` transition with the
entry's recorded source tokenIds. Production wiring flips each
source back to a spendable state (typically `confirmed`) gated
on a same-source-no-other-live-outbox check; the audit's gap
was that the worker provided no signal at all.
- The hook is best-effort: throws are caught and emitted as a
`transfer:failed` event for triage, but the `failed-permanent`
transition itself stands. This preserves the terminal-state
contract — a recovery hook failure must not re-introduce the
locked-pending bug it exists to fix.
- Optional shape preserves back-compat with the 90+ existing
finalization-worker-sender tests (which do not set the hook).
Tests: 5 new H5 regression tests covering:
- Hook fires with the recorded source ids on failed-permanent
- Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
- Hook absent: worker still transitions cleanly
- Hook throws: failed-permanent stands, transfer:failed emitted
- Hook fires AFTER the outbox transition (ordering invariant)
The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.
All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.
Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…nifestCas
ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.
Fix:
- New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
constructor's `opts.verifyEntryRoot`. When wired, the hook is
invoked AFTER the label CAS passes and BEFORE the write. The hook
fetches the content the entry's `rootHash` references (production:
the UXF pool element), recomputes the hash, and asserts it matches
the declaration.
- New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
with structured `integrityDetail` for triage. The result also
carries `observed.contentHash` so callers can re-read state.
- `ManifestCidRewriteCasError.casReason` widened to surface
`'integrity-failed'`. The worker's outer retry loop already
treats any throw from manifest-cid-rewrite as a hard CAS failure
that bubbles up; the new reason rides through unchanged but
preserves structure for operator triage.
- Skipped on the `prev === null` (asserts-no-entry) path because
there is no observed entry to verify. Skipped on label-CAS-mismatch
because the swap has already been rejected.
- Optional shape preserves back-compat: all 9 existing manifest-cas
tests and the 100% of production callers that do not (yet) supply
the verifier continue to behave identically. Production wiring is
a follow-up that does not block this PR.
Tests: 6 new H7 regression tests covering:
- Back-compat (verifier omitted → label-only CAS as before)
- Happy path (verifier called AFTER label CAS, BEFORE write)
- Integrity failure (returns 'integrity-failed', no write happens,
storage unchanged, integrityDetail propagated)
- Skip on prev=null (no observed entry to verify)
- Skip on label-CAS-mismatch (short-circuit before verifier)
- 'integrity-failed' has its own discriminator distinct from
cas-mismatch / not-found / concurrent-modification
All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.
Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
4fb7e19 to
19fc37f
Compare
6e46117 to
2c8ec79
Compare
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…e in disposition-engine
modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.
Fix:
- New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
interface on disposition-engine.ts. The hook canonically re-derives
RequestId from (authenticator.publicKey, sourceState) and compares
to the bundle's requestId; production wiring calls RequestId.create
and compares toJSON.
- Optional `assertRequestIdBinding?` dep added to
DispositionEngineInput. When provided, the engine calls it BEFORE
every verifyProof invocation:
* `{ ok: true }` → proof verification proceeds.
* `{ ok: false }` → cryptoInvalid('proof-invalid') — same
§5.3 [C](3) class as a clean PATH_INVALID at the verifier,
because the (proof, requestId) pair does not bind to this tx.
* throw → structuralInvalid('proof-throw').
The binding gate fires BEFORE verifyProof so a forged binding
short-circuits the (potentially expensive) on-chain proof check.
- Optional shape preserves back-compat with the 66 existing engine
tests (which do not set the hook) and with all production callers
that have not yet been updated.
- Plumbed through legacy-shape-adapter.ts: new optional
`assertRequestIdBinding?` field on LegacyShapeAdapterInput is
forwarded to DispositionEngineInput in buildEngineInput. Existing
callers that don't supply the hook degrade to the pre-fix
behaviour; new callers pass the production-grade RequestId.create
comparer.
Tests: 7 new H4 regression tests covering:
- Back-compat (hook absent → proof verification proceeds)
- Hook returns ok=true (binding asserted, proof verified)
- Hook receives both bundle requestId and authenticator unchanged
- Hook returns ok=false (cryptoInvalid('proof-invalid'),
verifyProof NOT invoked)
- Hook throws (structuralInvalid('proof-throw'))
- Binding gate fires BEFORE verifyProof (ordering invariant)
- Null-proof chain entries do NOT trigger the binding gate
All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.
Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…permanent
modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.
Fix:
- Added optional `sourceTokenIds?: ReadonlyArray<string>` to
`UxfTransferOutboxEntry`. The instant-sender now records the
selected source set at outbox-record construction time via
`OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
on `undefined` for back-compat with entries written before H5.
- Added optional `recoverFailedPermanentSources?(sources, outboxId)`
hook to `FinalizationWorkerSenderOptions`. When wired, the worker
invokes it once at the `failed-permanent` transition with the
entry's recorded source tokenIds. Production wiring flips each
source back to a spendable state (typically `confirmed`) gated
on a same-source-no-other-live-outbox check; the audit's gap
was that the worker provided no signal at all.
- The hook is best-effort: throws are caught and emitted as a
`transfer:failed` event for triage, but the `failed-permanent`
transition itself stands. This preserves the terminal-state
contract — a recovery hook failure must not re-introduce the
locked-pending bug it exists to fix.
- Optional shape preserves back-compat with the 90+ existing
finalization-worker-sender tests (which do not set the hook).
Tests: 5 new H5 regression tests covering:
- Hook fires with the recorded source ids on failed-permanent
- Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
- Hook absent: worker still transitions cleanly
- Hook throws: failed-permanent stands, transfer:failed emitted
- Hook fires AFTER the outbox transition (ordering invariant)
The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.
All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.
Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…nifestCas
ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.
Fix:
- New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
constructor's `opts.verifyEntryRoot`. When wired, the hook is
invoked AFTER the label CAS passes and BEFORE the write. The hook
fetches the content the entry's `rootHash` references (production:
the UXF pool element), recomputes the hash, and asserts it matches
the declaration.
- New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
with structured `integrityDetail` for triage. The result also
carries `observed.contentHash` so callers can re-read state.
- `ManifestCidRewriteCasError.casReason` widened to surface
`'integrity-failed'`. The worker's outer retry loop already
treats any throw from manifest-cid-rewrite as a hard CAS failure
that bubbles up; the new reason rides through unchanged but
preserves structure for operator triage.
- Skipped on the `prev === null` (asserts-no-entry) path because
there is no observed entry to verify. Skipped on label-CAS-mismatch
because the swap has already been rejected.
- Optional shape preserves back-compat: all 9 existing manifest-cas
tests and the 100% of production callers that do not (yet) supply
the verifier continue to behave identically. Production wiring is
a follow-up that does not block this PR.
Tests: 6 new H7 regression tests covering:
- Back-compat (verifier omitted → label-only CAS as before)
- Happy path (verifier called AFTER label CAS, BEFORE write)
- Integrity failure (returns 'integrity-failed', no write happens,
storage unchanged, integrityDetail propagated)
- Skip on prev=null (no observed entry to verify)
- Skip on label-CAS-mismatch (short-circuit before verifier)
- 'integrity-failed' has its own discriminator distinct from
cas-mismatch / not-found / concurrent-modification
All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.
Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
19fc37f to
6d9f26c
Compare
2c8ec79 to
eb3f577
Compare
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…e in disposition-engine
modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.
Fix:
- New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
interface on disposition-engine.ts. The hook canonically re-derives
RequestId from (authenticator.publicKey, sourceState) and compares
to the bundle's requestId; production wiring calls RequestId.create
and compares toJSON.
- Optional `assertRequestIdBinding?` dep added to
DispositionEngineInput. When provided, the engine calls it BEFORE
every verifyProof invocation:
* `{ ok: true }` → proof verification proceeds.
* `{ ok: false }` → cryptoInvalid('proof-invalid') — same
§5.3 [C](3) class as a clean PATH_INVALID at the verifier,
because the (proof, requestId) pair does not bind to this tx.
* throw → structuralInvalid('proof-throw').
The binding gate fires BEFORE verifyProof so a forged binding
short-circuits the (potentially expensive) on-chain proof check.
- Optional shape preserves back-compat with the 66 existing engine
tests (which do not set the hook) and with all production callers
that have not yet been updated.
- Plumbed through legacy-shape-adapter.ts: new optional
`assertRequestIdBinding?` field on LegacyShapeAdapterInput is
forwarded to DispositionEngineInput in buildEngineInput. Existing
callers that don't supply the hook degrade to the pre-fix
behaviour; new callers pass the production-grade RequestId.create
comparer.
Tests: 7 new H4 regression tests covering:
- Back-compat (hook absent → proof verification proceeds)
- Hook returns ok=true (binding asserted, proof verified)
- Hook receives both bundle requestId and authenticator unchanged
- Hook returns ok=false (cryptoInvalid('proof-invalid'),
verifyProof NOT invoked)
- Hook throws (structuralInvalid('proof-throw'))
- Binding gate fires BEFORE verifyProof (ordering invariant)
- Null-proof chain entries do NOT trigger the binding gate
All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.
Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…permanent
modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.
Fix:
- Added optional `sourceTokenIds?: ReadonlyArray<string>` to
`UxfTransferOutboxEntry`. The instant-sender now records the
selected source set at outbox-record construction time via
`OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
on `undefined` for back-compat with entries written before H5.
- Added optional `recoverFailedPermanentSources?(sources, outboxId)`
hook to `FinalizationWorkerSenderOptions`. When wired, the worker
invokes it once at the `failed-permanent` transition with the
entry's recorded source tokenIds. Production wiring flips each
source back to a spendable state (typically `confirmed`) gated
on a same-source-no-other-live-outbox check; the audit's gap
was that the worker provided no signal at all.
- The hook is best-effort: throws are caught and emitted as a
`transfer:failed` event for triage, but the `failed-permanent`
transition itself stands. This preserves the terminal-state
contract — a recovery hook failure must not re-introduce the
locked-pending bug it exists to fix.
- Optional shape preserves back-compat with the 90+ existing
finalization-worker-sender tests (which do not set the hook).
Tests: 5 new H5 regression tests covering:
- Hook fires with the recorded source ids on failed-permanent
- Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
- Hook absent: worker still transitions cleanly
- Hook throws: failed-permanent stands, transfer:failed emitted
- Hook fires AFTER the outbox transition (ordering invariant)
The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.
All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.
Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…nifestCas
ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.
Fix:
- New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
constructor's `opts.verifyEntryRoot`. When wired, the hook is
invoked AFTER the label CAS passes and BEFORE the write. The hook
fetches the content the entry's `rootHash` references (production:
the UXF pool element), recomputes the hash, and asserts it matches
the declaration.
- New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
with structured `integrityDetail` for triage. The result also
carries `observed.contentHash` so callers can re-read state.
- `ManifestCidRewriteCasError.casReason` widened to surface
`'integrity-failed'`. The worker's outer retry loop already
treats any throw from manifest-cid-rewrite as a hard CAS failure
that bubbles up; the new reason rides through unchanged but
preserves structure for operator triage.
- Skipped on the `prev === null` (asserts-no-entry) path because
there is no observed entry to verify. Skipped on label-CAS-mismatch
because the swap has already been rejected.
- Optional shape preserves back-compat: all 9 existing manifest-cas
tests and the 100% of production callers that do not (yet) supply
the verifier continue to behave identically. Production wiring is
a follow-up that does not block this PR.
Tests: 6 new H7 regression tests covering:
- Back-compat (verifier omitted → label-only CAS as before)
- Happy path (verifier called AFTER label CAS, BEFORE write)
- Integrity failure (returns 'integrity-failed', no write happens,
storage unchanged, integrityDetail propagated)
- Skip on prev=null (no observed entry to verify)
- Skip on label-CAS-mismatch (short-circuit before verifier)
- 'integrity-failed' has its own discriminator distinct from
cas-mismatch / not-found / concurrent-modification
All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.
Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
6d9f26c to
f927a45
Compare
eb3f577 to
4422e84
Compare
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…e in disposition-engine
modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.
Fix:
- New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
interface on disposition-engine.ts. The hook canonically re-derives
RequestId from (authenticator.publicKey, sourceState) and compares
to the bundle's requestId; production wiring calls RequestId.create
and compares toJSON.
- Optional `assertRequestIdBinding?` dep added to
DispositionEngineInput. When provided, the engine calls it BEFORE
every verifyProof invocation:
* `{ ok: true }` → proof verification proceeds.
* `{ ok: false }` → cryptoInvalid('proof-invalid') — same
§5.3 [C](3) class as a clean PATH_INVALID at the verifier,
because the (proof, requestId) pair does not bind to this tx.
* throw → structuralInvalid('proof-throw').
The binding gate fires BEFORE verifyProof so a forged binding
short-circuits the (potentially expensive) on-chain proof check.
- Optional shape preserves back-compat with the 66 existing engine
tests (which do not set the hook) and with all production callers
that have not yet been updated.
- Plumbed through legacy-shape-adapter.ts: new optional
`assertRequestIdBinding?` field on LegacyShapeAdapterInput is
forwarded to DispositionEngineInput in buildEngineInput. Existing
callers that don't supply the hook degrade to the pre-fix
behaviour; new callers pass the production-grade RequestId.create
comparer.
Tests: 7 new H4 regression tests covering:
- Back-compat (hook absent → proof verification proceeds)
- Hook returns ok=true (binding asserted, proof verified)
- Hook receives both bundle requestId and authenticator unchanged
- Hook returns ok=false (cryptoInvalid('proof-invalid'),
verifyProof NOT invoked)
- Hook throws (structuralInvalid('proof-throw'))
- Binding gate fires BEFORE verifyProof (ordering invariant)
- Null-proof chain entries do NOT trigger the binding gate
All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.
Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…permanent
modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.
Fix:
- Added optional `sourceTokenIds?: ReadonlyArray<string>` to
`UxfTransferOutboxEntry`. The instant-sender now records the
selected source set at outbox-record construction time via
`OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
on `undefined` for back-compat with entries written before H5.
- Added optional `recoverFailedPermanentSources?(sources, outboxId)`
hook to `FinalizationWorkerSenderOptions`. When wired, the worker
invokes it once at the `failed-permanent` transition with the
entry's recorded source tokenIds. Production wiring flips each
source back to a spendable state (typically `confirmed`) gated
on a same-source-no-other-live-outbox check; the audit's gap
was that the worker provided no signal at all.
- The hook is best-effort: throws are caught and emitted as a
`transfer:failed` event for triage, but the `failed-permanent`
transition itself stands. This preserves the terminal-state
contract — a recovery hook failure must not re-introduce the
locked-pending bug it exists to fix.
- Optional shape preserves back-compat with the 90+ existing
finalization-worker-sender tests (which do not set the hook).
Tests: 5 new H5 regression tests covering:
- Hook fires with the recorded source ids on failed-permanent
- Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
- Hook absent: worker still transitions cleanly
- Hook throws: failed-permanent stands, transfer:failed emitted
- Hook fires AFTER the outbox transition (ordering invariant)
The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.
All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.
Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…nifestCas
ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.
Fix:
- New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
constructor's `opts.verifyEntryRoot`. When wired, the hook is
invoked AFTER the label CAS passes and BEFORE the write. The hook
fetches the content the entry's `rootHash` references (production:
the UXF pool element), recomputes the hash, and asserts it matches
the declaration.
- New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
with structured `integrityDetail` for triage. The result also
carries `observed.contentHash` so callers can re-read state.
- `ManifestCidRewriteCasError.casReason` widened to surface
`'integrity-failed'`. The worker's outer retry loop already
treats any throw from manifest-cid-rewrite as a hard CAS failure
that bubbles up; the new reason rides through unchanged but
preserves structure for operator triage.
- Skipped on the `prev === null` (asserts-no-entry) path because
there is no observed entry to verify. Skipped on label-CAS-mismatch
because the swap has already been rejected.
- Optional shape preserves back-compat: all 9 existing manifest-cas
tests and the 100% of production callers that do not (yet) supply
the verifier continue to behave identically. Production wiring is
a follow-up that does not block this PR.
Tests: 6 new H7 regression tests covering:
- Back-compat (verifier omitted → label-only CAS as before)
- Happy path (verifier called AFTER label CAS, BEFORE write)
- Integrity failure (returns 'integrity-failed', no write happens,
storage unchanged, integrityDetail propagated)
- Skip on prev=null (no observed entry to verify)
- Skip on label-CAS-mismatch (short-circuit before verifier)
- 'integrity-failed' has its own discriminator distinct from
cas-mismatch / not-found / concurrent-modification
All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.
Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
…mode
Before this fix, `UxfPackage.merge` (→ `mergePkg`) silently dropped any
tokenId whose per-token resolver threw. A single malformed sibling
element inside a Rule 4 synthetic-rebuild made a legitimately-received
token vanish from the merged manifest with only a `logger.warn`. On the
receive path this manifested as token loss from view: the caller had no
machine-readable signal that anything had gone wrong.
Fix:
- `UxfPackage.merge()` now returns `{ skipped: MergeSkip[] }`. Each
`MergeSkip` records the tokenId, the original error, the target's
prior root (if any), and the source's incoming root that failed
to incorporate. Callers can react: replay, retry, surface to UI,
or quarantine.
- `opts.strict: true` aggregates skipped tokens into a
`UxfError('MERGE_PARTIAL_FAILURE')` thrown BEFORE the atomic apply
phase — target state is unchanged on the throw. Use on receive-
path JOINs where any silent drop is unacceptable; leave default-off
for opportunistic peer JOINs where partial coverage is fine.
- `opts.onSkip` fires once per skipped tokenId for callers that want
telemetry visibility without strict-mode failure. Throws inside
onSkip are caught and logged — observability must not change merge
semantics.
- New error code `MERGE_PARTIAL_FAILURE` on UxfErrorCode. The thrown
error also carries a non-typed `skipped` field with the structured
skip list so callers needing machine-readable details can read it
via `(err as { skipped }).skipped`.
- Back-compat at the internal `mergePkg` boundary: legacy positional
`(target, source, verifiedProofs)` callers keep working via a
Set-vs-opts shape probe. Public callers (4 in tree: PaymentsModule,
profile-token-storage-provider, profile/consolidation,
flush-scheduler) ignore the returned `{ skipped }` and continue to
use merge as void — behavior unchanged for them.
Tests: 8 new H3 regression tests covering empty-skipped baseline,
captured-skip on resolver throw, MergeSkip metadata fields,
strict-mode hard fail, strict-mode atomicity invariant (target
unchanged on throw), strict-mode happy path, onSkip callback fires
once per skip, and onSkip throw does not change merge semantics.
All 451 existing uxf tests pass unchanged. 8163 total unit tests pass.
Refs: #333 (H3). Stacked on #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
4422e84 to
768963e
Compare
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…e in disposition-engine
modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.
Fix:
- New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
interface on disposition-engine.ts. The hook canonically re-derives
RequestId from (authenticator.publicKey, sourceState) and compares
to the bundle's requestId; production wiring calls RequestId.create
and compares toJSON.
- Optional `assertRequestIdBinding?` dep added to
DispositionEngineInput. When provided, the engine calls it BEFORE
every verifyProof invocation:
* `{ ok: true }` → proof verification proceeds.
* `{ ok: false }` → cryptoInvalid('proof-invalid') — same
§5.3 [C](3) class as a clean PATH_INVALID at the verifier,
because the (proof, requestId) pair does not bind to this tx.
* throw → structuralInvalid('proof-throw').
The binding gate fires BEFORE verifyProof so a forged binding
short-circuits the (potentially expensive) on-chain proof check.
- Optional shape preserves back-compat with the 66 existing engine
tests (which do not set the hook) and with all production callers
that have not yet been updated.
- Plumbed through legacy-shape-adapter.ts: new optional
`assertRequestIdBinding?` field on LegacyShapeAdapterInput is
forwarded to DispositionEngineInput in buildEngineInput. Existing
callers that don't supply the hook degrade to the pre-fix
behaviour; new callers pass the production-grade RequestId.create
comparer.
Tests: 7 new H4 regression tests covering:
- Back-compat (hook absent → proof verification proceeds)
- Hook returns ok=true (binding asserted, proof verified)
- Hook receives both bundle requestId and authenticator unchanged
- Hook returns ok=false (cryptoInvalid('proof-invalid'),
verifyProof NOT invoked)
- Hook throws (structuralInvalid('proof-throw'))
- Binding gate fires BEFORE verifyProof (ordering invariant)
- Null-proof chain entries do NOT trigger the binding gate
All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.
Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…permanent
modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.
Fix:
- Added optional `sourceTokenIds?: ReadonlyArray<string>` to
`UxfTransferOutboxEntry`. The instant-sender now records the
selected source set at outbox-record construction time via
`OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
on `undefined` for back-compat with entries written before H5.
- Added optional `recoverFailedPermanentSources?(sources, outboxId)`
hook to `FinalizationWorkerSenderOptions`. When wired, the worker
invokes it once at the `failed-permanent` transition with the
entry's recorded source tokenIds. Production wiring flips each
source back to a spendable state (typically `confirmed`) gated
on a same-source-no-other-live-outbox check; the audit's gap
was that the worker provided no signal at all.
- The hook is best-effort: throws are caught and emitted as a
`transfer:failed` event for triage, but the `failed-permanent`
transition itself stands. This preserves the terminal-state
contract — a recovery hook failure must not re-introduce the
locked-pending bug it exists to fix.
- Optional shape preserves back-compat with the 90+ existing
finalization-worker-sender tests (which do not set the hook).
Tests: 5 new H5 regression tests covering:
- Hook fires with the recorded source ids on failed-permanent
- Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
- Hook absent: worker still transitions cleanly
- Hook throws: failed-permanent stands, transfer:failed emitted
- Hook fires AFTER the outbox transition (ordering invariant)
The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.
All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.
Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…nifestCas
ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.
Fix:
- New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
constructor's `opts.verifyEntryRoot`. When wired, the hook is
invoked AFTER the label CAS passes and BEFORE the write. The hook
fetches the content the entry's `rootHash` references (production:
the UXF pool element), recomputes the hash, and asserts it matches
the declaration.
- New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
with structured `integrityDetail` for triage. The result also
carries `observed.contentHash` so callers can re-read state.
- `ManifestCidRewriteCasError.casReason` widened to surface
`'integrity-failed'`. The worker's outer retry loop already
treats any throw from manifest-cid-rewrite as a hard CAS failure
that bubbles up; the new reason rides through unchanged but
preserves structure for operator triage.
- Skipped on the `prev === null` (asserts-no-entry) path because
there is no observed entry to verify. Skipped on label-CAS-mismatch
because the swap has already been rejected.
- Optional shape preserves back-compat: all 9 existing manifest-cas
tests and the 100% of production callers that do not (yet) supply
the verifier continue to behave identically. Production wiring is
a follow-up that does not block this PR.
Tests: 6 new H7 regression tests covering:
- Back-compat (verifier omitted → label-only CAS as before)
- Happy path (verifier called AFTER label CAS, BEFORE write)
- Integrity failure (returns 'integrity-failed', no write happens,
storage unchanged, integrityDetail propagated)
- Skip on prev=null (no observed entry to verify)
- Skip on label-CAS-mismatch (short-circuit before verifier)
- 'integrity-failed' has its own discriminator distinct from
cas-mismatch / not-found / concurrent-modification
All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.
Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…e in disposition-engine
modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.
Fix:
- New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
interface on disposition-engine.ts. The hook canonically re-derives
RequestId from (authenticator.publicKey, sourceState) and compares
to the bundle's requestId; production wiring calls RequestId.create
and compares toJSON.
- Optional `assertRequestIdBinding?` dep added to
DispositionEngineInput. When provided, the engine calls it BEFORE
every verifyProof invocation:
* `{ ok: true }` → proof verification proceeds.
* `{ ok: false }` → cryptoInvalid('proof-invalid') — same
§5.3 [C](3) class as a clean PATH_INVALID at the verifier,
because the (proof, requestId) pair does not bind to this tx.
* throw → structuralInvalid('proof-throw').
The binding gate fires BEFORE verifyProof so a forged binding
short-circuits the (potentially expensive) on-chain proof check.
- Optional shape preserves back-compat with the 66 existing engine
tests (which do not set the hook) and with all production callers
that have not yet been updated.
- Plumbed through legacy-shape-adapter.ts: new optional
`assertRequestIdBinding?` field on LegacyShapeAdapterInput is
forwarded to DispositionEngineInput in buildEngineInput. Existing
callers that don't supply the hook degrade to the pre-fix
behaviour; new callers pass the production-grade RequestId.create
comparer.
Tests: 7 new H4 regression tests covering:
- Back-compat (hook absent → proof verification proceeds)
- Hook returns ok=true (binding asserted, proof verified)
- Hook receives both bundle requestId and authenticator unchanged
- Hook returns ok=false (cryptoInvalid('proof-invalid'),
verifyProof NOT invoked)
- Hook throws (structuralInvalid('proof-throw'))
- Binding gate fires BEFORE verifyProof (ordering invariant)
- Null-proof chain entries do NOT trigger the binding gate
All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.
Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…permanent
modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.
Fix:
- Added optional `sourceTokenIds?: ReadonlyArray<string>` to
`UxfTransferOutboxEntry`. The instant-sender now records the
selected source set at outbox-record construction time via
`OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
on `undefined` for back-compat with entries written before H5.
- Added optional `recoverFailedPermanentSources?(sources, outboxId)`
hook to `FinalizationWorkerSenderOptions`. When wired, the worker
invokes it once at the `failed-permanent` transition with the
entry's recorded source tokenIds. Production wiring flips each
source back to a spendable state (typically `confirmed`) gated
on a same-source-no-other-live-outbox check; the audit's gap
was that the worker provided no signal at all.
- The hook is best-effort: throws are caught and emitted as a
`transfer:failed` event for triage, but the `failed-permanent`
transition itself stands. This preserves the terminal-state
contract — a recovery hook failure must not re-introduce the
locked-pending bug it exists to fix.
- Optional shape preserves back-compat with the 90+ existing
finalization-worker-sender tests (which do not set the hook).
Tests: 5 new H5 regression tests covering:
- Hook fires with the recorded source ids on failed-permanent
- Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
- Hook absent: worker still transitions cleanly
- Hook throws: failed-permanent stands, transfer:failed emitted
- Hook fires AFTER the outbox transition (ordering invariant)
The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.
All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.
Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…nifestCas
ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.
Fix:
- New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
constructor's `opts.verifyEntryRoot`. When wired, the hook is
invoked AFTER the label CAS passes and BEFORE the write. The hook
fetches the content the entry's `rootHash` references (production:
the UXF pool element), recomputes the hash, and asserts it matches
the declaration.
- New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
with structured `integrityDetail` for triage. The result also
carries `observed.contentHash` so callers can re-read state.
- `ManifestCidRewriteCasError.casReason` widened to surface
`'integrity-failed'`. The worker's outer retry loop already
treats any throw from manifest-cid-rewrite as a hard CAS failure
that bubbles up; the new reason rides through unchanged but
preserves structure for operator triage.
- Skipped on the `prev === null` (asserts-no-entry) path because
there is no observed entry to verify. Skipped on label-CAS-mismatch
because the swap has already been rejected.
- Optional shape preserves back-compat: all 9 existing manifest-cas
tests and the 100% of production callers that do not (yet) supply
the verifier continue to behave identically. Production wiring is
a follow-up that does not block this PR.
Tests: 6 new H7 regression tests covering:
- Back-compat (verifier omitted → label-only CAS as before)
- Happy path (verifier called AFTER label CAS, BEFORE write)
- Integrity failure (returns 'integrity-failed', no write happens,
storage unchanged, integrityDetail propagated)
- Skip on prev=null (no observed entry to verify)
- Skip on label-CAS-mismatch (short-circuit before verifier)
- 'integrity-failed' has its own discriminator distinct from
cas-mismatch / not-found / concurrent-modification
All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.
Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…permanent
modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.
Fix:
- Added optional `sourceTokenIds?: ReadonlyArray<string>` to
`UxfTransferOutboxEntry`. The instant-sender now records the
selected source set at outbox-record construction time via
`OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
on `undefined` for back-compat with entries written before H5.
- Added optional `recoverFailedPermanentSources?(sources, outboxId)`
hook to `FinalizationWorkerSenderOptions`. When wired, the worker
invokes it once at the `failed-permanent` transition with the
entry's recorded source tokenIds. Production wiring flips each
source back to a spendable state (typically `confirmed`) gated
on a same-source-no-other-live-outbox check; the audit's gap
was that the worker provided no signal at all.
- The hook is best-effort: throws are caught and emitted as a
`transfer:failed` event for triage, but the `failed-permanent`
transition itself stands. This preserves the terminal-state
contract — a recovery hook failure must not re-introduce the
locked-pending bug it exists to fix.
- Optional shape preserves back-compat with the 90+ existing
finalization-worker-sender tests (which do not set the hook).
Tests: 5 new H5 regression tests covering:
- Hook fires with the recorded source ids on failed-permanent
- Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
- Hook absent: worker still transitions cleanly
- Hook throws: failed-permanent stands, transfer:failed emitted
- Hook fires AFTER the outbox transition (ordering invariant)
The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.
All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.
Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…nifestCas
ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.
Fix:
- New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
constructor's `opts.verifyEntryRoot`. When wired, the hook is
invoked AFTER the label CAS passes and BEFORE the write. The hook
fetches the content the entry's `rootHash` references (production:
the UXF pool element), recomputes the hash, and asserts it matches
the declaration.
- New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
with structured `integrityDetail` for triage. The result also
carries `observed.contentHash` so callers can re-read state.
- `ManifestCidRewriteCasError.casReason` widened to surface
`'integrity-failed'`. The worker's outer retry loop already
treats any throw from manifest-cid-rewrite as a hard CAS failure
that bubbles up; the new reason rides through unchanged but
preserves structure for operator triage.
- Skipped on the `prev === null` (asserts-no-entry) path because
there is no observed entry to verify. Skipped on label-CAS-mismatch
because the swap has already been rejected.
- Optional shape preserves back-compat: all 9 existing manifest-cas
tests and the 100% of production callers that do not (yet) supply
the verifier continue to behave identically. Production wiring is
a follow-up that does not block this PR.
Tests: 6 new H7 regression tests covering:
- Back-compat (verifier omitted → label-only CAS as before)
- Happy path (verifier called AFTER label CAS, BEFORE write)
- Integrity failure (returns 'integrity-failed', no write happens,
storage unchanged, integrityDetail propagated)
- Skip on prev=null (no observed entry to verify)
- Skip on label-CAS-mismatch (short-circuit before verifier)
- 'integrity-failed' has its own discriminator distinct from
cas-mismatch / not-found / concurrent-modification
All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.
Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…nifestCas
ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.
Fix:
- New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
constructor's `opts.verifyEntryRoot`. When wired, the hook is
invoked AFTER the label CAS passes and BEFORE the write. The hook
fetches the content the entry's `rootHash` references (production:
the UXF pool element), recomputes the hash, and asserts it matches
the declaration.
- New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
with structured `integrityDetail` for triage. The result also
carries `observed.contentHash` so callers can re-read state.
- `ManifestCidRewriteCasError.casReason` widened to surface
`'integrity-failed'`. The worker's outer retry loop already
treats any throw from manifest-cid-rewrite as a hard CAS failure
that bubbles up; the new reason rides through unchanged but
preserves structure for operator triage.
- Skipped on the `prev === null` (asserts-no-entry) path because
there is no observed entry to verify. Skipped on label-CAS-mismatch
because the swap has already been rejected.
- Optional shape preserves back-compat: all 9 existing manifest-cas
tests and the 100% of production callers that do not (yet) supply
the verifier continue to behave identically. Production wiring is
a follow-up that does not block this PR.
Tests: 6 new H7 regression tests covering:
- Back-compat (verifier omitted → label-only CAS as before)
- Happy path (verifier called AFTER label CAS, BEFORE write)
- Integrity failure (returns 'integrity-failed', no write happens,
storage unchanged, integrityDetail propagated)
- Skip on prev=null (no observed entry to verify)
- Skip on label-CAS-mismatch (short-circuit before verifier)
- 'integrity-failed' has its own discriminator distinct from
cas-mismatch / not-found / concurrent-modification
All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.
Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
5 tasks
vrogojin
added a commit
that referenced
this pull request
May 30, 2026
…ER test-coverage gap The soak (manual-test-full-recovery.sh) routinely surfaces `[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient- address mismatch (HD-index recovery exhausted) (no retry): VerificationError: Recipient address mismatch` at §C → §D. The existing unit tests cover helpers in isolation (PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts #269), but neither proves the composition: "given a sibling HD index in the recipient's inventory, tryRecover finds it AND the SDK's verifyRecipient agrees on the derived address". This left CI blind to the regression mode where tryRecover returns a false-positive candidate that Token.update later rejects. This commit closes the gap at three layers: L1. Real-SDK integration test (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts): - Replaces the file-level vi.mock for SigningService / UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with REAL SDK imports. - Builds two real HD-derived signing services. - Constructs the sender-targeted DIRECT address via real UnmaskedPredicate.create → reference.toAddress (the EXACT path the SDK's verifyRecipient uses). - Drives `tryRecoverSigningServiceForRecipient` against real TokenId / TokenType / Uint8Array salt. - 4 tests: happy path, no-match negative, deriveAddressInfo throws, and (L2 composition) tryRecover + predicate construction round-trip equals the transferTx target address. L3. CI workflow (.github/workflows/soak-nightly.yml): - Nightly cron + workflow_dispatch trigger. - Probes testnet aggregator + Nostr relay; skip-not-fail when either is unreachable (third-party outages cannot block releases). - Checks out sphere-cli alongside sphere-sdk, links the local build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*. - Captures soak.log + workspace as artifacts (30-day retention on failure, 7-day on success). - Surfaces metrics in the workflow summary: V6-RECOVER count, Stranded receive count, POINTER_MONOTONICITY_VIOLATION count, bcast_pub>0 count, exit code. What's tested by what (audit roll-up): - SDK address-derivation regression in tryRecover → L1 (real SDK). - tryRecover ↔ predicate-construction composition → L1 composition test (proves verifyRecipient cannot throw against the recovered state). - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak. - Helper-level edge cases (mock-friendly) → existing PaymentsModule.recipient-address-mismatch-recovery.test.ts. - Error-classification on hard fail → existing #269 test in PaymentsModule.proof-polling-persistence.test.ts. Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean. Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack). Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the H3 high finding of audit #333 —
UxfPackage.mergePkgsilently dropped any tokenId whose per-token resolver threw. A single malformed sibling element inside a Rule 4 synthetic rebuild made a legitimately-received token vanish from the merged manifest with only alogger.warn. On the receive path this was token loss from view with no machine-readable signal.Fix
The pre-fix design deliberately did per-token-atomic merging ("one poisoned entry must not deny the user their good tokens"). H3 preserves that property but adds observability:
UxfPackage.merge()now returns{ skipped: MergeSkip[] }. EachMergeSkiprecords{ tokenId, error, targetExisting, sourceIncoming }so callers can replay, retry, surface to UI, or quarantine.opts.strict: trueaggregates skipped tokens into aUxfError('MERGE_PARTIAL_FAILURE')thrown before the atomic apply phase — target state is unchanged on the throw. Recipient receive-path JOINs should use strict; opportunistic peer JOINs leave it off.opts.onSkipfires once per skipped tokenId for callers that want telemetry without strict-mode failure. Throws insideonSkipare caught and logged — observability must not change merge semantics.MERGE_PARTIAL_FAILUREerror code onUxfErrorCode. The thrown error also carries a non-typedskippedfield with the structured skip list.mergePkgboundary: legacy positional(target, source, verifiedProofs)callers keep working via a Set-vs-opts shape probe. The 4 public callers in tree (PaymentsModule,profile-token-storage-provider,profile/consolidation,flush-scheduler) ignore the returned{ skipped }and behave exactly as before.Test plan
tests/unit/uxf/h3-mergepkg-skipped-tokens.test.ts:skippedbaseline (no resolver fires)skipped(was silently dropped pre-fix)MergeSkipmetadata fields populated correctly (targetExisting,sourceIncoming)MERGE_PARTIAL_FAILUREwith structuredskippedlistonSkipcallback fires once per skip with correct payloadonSkipthrows do not change merge semantics (merge still completes, skipped reported)tests/unit/uxf/tests pass unchangedtsc --noEmitcleanAudit traceability
uxf/UxfPackage.ts:~800strict: truefor hard failure mode, and the returnedskippedarray as the default explicit-result mode.Next in the stack
H4 (disposition engine must re-derive
requestId) follows, branched off this PR.