Skip to content

fix(uxf): surface mergePkg skipped tokens + add strict mode (Audit #333 H3)#352

Merged
vrogojin merged 1 commit into
mainfrom
fix/issue-333-h3-mergepkg-no-silent-drop
May 30, 2026
Merged

fix(uxf): surface mergePkg skipped tokens + add strict mode (Audit #333 H3)#352
vrogojin merged 1 commit into
mainfrom
fix/issue-333-h3-mergepkg-no-silent-drop

Conversation

@vrogojin
Copy link
Copy Markdown
Contributor

Stacked PR. Built atop #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).

Summary

Closes the H3 high finding of audit #333UxfPackage.mergePkg silently dropped any tokenId whose per-token resolver threw. A single malformed sibling element inside a Rule 4 synthetic rebuild made a legitimately-received token vanish from the merged manifest with only a logger.warn. On the receive path this was token loss from view with no machine-readable signal.

Fix

The pre-fix design deliberately did per-token-atomic merging ("one poisoned entry must not deny the user their good tokens"). H3 preserves that property but adds observability:

  1. UxfPackage.merge() now returns { skipped: MergeSkip[] }. Each MergeSkip records { tokenId, error, targetExisting, sourceIncoming } so callers can replay, retry, surface to UI, or quarantine.
  2. opts.strict: true aggregates skipped tokens into a UxfError('MERGE_PARTIAL_FAILURE') thrown before the atomic apply phase — target state is unchanged on the throw. Recipient receive-path JOINs should use strict; opportunistic peer JOINs leave it off.
  3. opts.onSkip fires once per skipped tokenId for callers that want telemetry without strict-mode failure. Throws inside onSkip are caught and logged — observability must not change merge semantics.
  4. New MERGE_PARTIAL_FAILURE error code on UxfErrorCode. The thrown error also carries a non-typed skipped field with the structured skip list.
  5. Back-compat at the internal mergePkg boundary: legacy positional (target, source, verifiedProofs) callers keep working via a Set-vs-opts shape probe. The 4 public callers in tree (PaymentsModule, profile-token-storage-provider, profile/consolidation, flush-scheduler) ignore the returned { skipped } and behave exactly as before.

Test plan

  • 8 new H3 regression tests in tests/unit/uxf/h3-mergepkg-skipped-tokens.test.ts:
    • Empty skipped baseline (no resolver fires)
    • Per-token throw captured in skipped (was silently dropped pre-fix)
    • MergeSkip metadata fields populated correctly (targetExisting, sourceIncoming)
    • Strict mode throws MERGE_PARTIAL_FAILURE with structured skipped list
    • Strict-mode atomicity: target unchanged on throw (TOKEN_C not acquired when TOKEN_A's resolver fails)
    • Strict mode happy path (no resolver fires → no throw, empty skipped)
    • onSkip callback fires once per skip with correct payload
    • onSkip throws do not change merge semantics (merge still completes, skipped reported)
  • All 451 existing tests/unit/uxf/ tests pass unchanged
  • All 8163 unit tests pass (2 pre-existing skipped)
  • tsc --noEmit clean

Audit traceability

Next in the stack

H4 (disposition engine must re-derive requestId) follows, branched off this PR.

@vrogojin vrogojin force-pushed the fix/issue-333-h2-uxf-manifest-tokenid-binding branch from eb7a21d to a0d5bc7 Compare May 30, 2026 14:39
@vrogojin vrogojin force-pushed the fix/issue-333-h3-mergepkg-no-silent-drop branch from c827d43 to 38d3208 Compare May 30, 2026 14:39
vrogojin added a commit that referenced this pull request May 30, 2026
…e in disposition-engine

modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.

Fix:

  - New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
    interface on disposition-engine.ts. The hook canonically re-derives
    RequestId from (authenticator.publicKey, sourceState) and compares
    to the bundle's requestId; production wiring calls RequestId.create
    and compares toJSON.

  - Optional `assertRequestIdBinding?` dep added to
    DispositionEngineInput. When provided, the engine calls it BEFORE
    every verifyProof invocation:
      * `{ ok: true }`  → proof verification proceeds.
      * `{ ok: false }` → cryptoInvalid('proof-invalid') — same
        §5.3 [C](3) class as a clean PATH_INVALID at the verifier,
        because the (proof, requestId) pair does not bind to this tx.
      * throw → structuralInvalid('proof-throw').
    The binding gate fires BEFORE verifyProof so a forged binding
    short-circuits the (potentially expensive) on-chain proof check.

  - Optional shape preserves back-compat with the 66 existing engine
    tests (which do not set the hook) and with all production callers
    that have not yet been updated.

  - Plumbed through legacy-shape-adapter.ts: new optional
    `assertRequestIdBinding?` field on LegacyShapeAdapterInput is
    forwarded to DispositionEngineInput in buildEngineInput. Existing
    callers that don't supply the hook degrade to the pre-fix
    behaviour; new callers pass the production-grade RequestId.create
    comparer.

Tests: 7 new H4 regression tests covering:
  - Back-compat (hook absent → proof verification proceeds)
  - Hook returns ok=true (binding asserted, proof verified)
  - Hook receives both bundle requestId and authenticator unchanged
  - Hook returns ok=false (cryptoInvalid('proof-invalid'),
    verifyProof NOT invoked)
  - Hook throws (structuralInvalid('proof-throw'))
  - Binding gate fires BEFORE verifyProof (ordering invariant)
  - Null-proof chain entries do NOT trigger the binding gate

All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.

Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…permanent

modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.

Fix:

  - Added optional `sourceTokenIds?: ReadonlyArray<string>` to
    `UxfTransferOutboxEntry`. The instant-sender now records the
    selected source set at outbox-record construction time via
    `OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
    on `undefined` for back-compat with entries written before H5.

  - Added optional `recoverFailedPermanentSources?(sources, outboxId)`
    hook to `FinalizationWorkerSenderOptions`. When wired, the worker
    invokes it once at the `failed-permanent` transition with the
    entry's recorded source tokenIds. Production wiring flips each
    source back to a spendable state (typically `confirmed`) gated
    on a same-source-no-other-live-outbox check; the audit's gap
    was that the worker provided no signal at all.

  - The hook is best-effort: throws are caught and emitted as a
    `transfer:failed` event for triage, but the `failed-permanent`
    transition itself stands. This preserves the terminal-state
    contract — a recovery hook failure must not re-introduce the
    locked-pending bug it exists to fix.

  - Optional shape preserves back-compat with the 90+ existing
    finalization-worker-sender tests (which do not set the hook).

Tests: 5 new H5 regression tests covering:
  - Hook fires with the recorded source ids on failed-permanent
  - Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
  - Hook absent: worker still transitions cleanly
  - Hook throws: failed-permanent stands, transfer:failed emitted
  - Hook fires AFTER the outbox transition (ordering invariant)

The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.

All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.

Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…nifestCas

ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.

Fix:

  - New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
    constructor's `opts.verifyEntryRoot`. When wired, the hook is
    invoked AFTER the label CAS passes and BEFORE the write. The hook
    fetches the content the entry's `rootHash` references (production:
    the UXF pool element), recomputes the hash, and asserts it matches
    the declaration.

  - New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
    with structured `integrityDetail` for triage. The result also
    carries `observed.contentHash` so callers can re-read state.

  - `ManifestCidRewriteCasError.casReason` widened to surface
    `'integrity-failed'`. The worker's outer retry loop already
    treats any throw from manifest-cid-rewrite as a hard CAS failure
    that bubbles up; the new reason rides through unchanged but
    preserves structure for operator triage.

  - Skipped on the `prev === null` (asserts-no-entry) path because
    there is no observed entry to verify. Skipped on label-CAS-mismatch
    because the swap has already been rejected.

  - Optional shape preserves back-compat: all 9 existing manifest-cas
    tests and the 100% of production callers that do not (yet) supply
    the verifier continue to behave identically. Production wiring is
    a follow-up that does not block this PR.

Tests: 6 new H7 regression tests covering:
  - Back-compat (verifier omitted → label-only CAS as before)
  - Happy path (verifier called AFTER label CAS, BEFORE write)
  - Integrity failure (returns 'integrity-failed', no write happens,
    storage unchanged, integrityDetail propagated)
  - Skip on prev=null (no observed entry to verify)
  - Skip on label-CAS-mismatch (short-circuit before verifier)
  - 'integrity-failed' has its own discriminator distinct from
    cas-mismatch / not-found / concurrent-modification

All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.

Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…ER test-coverage gap

The soak (manual-test-full-recovery.sh) routinely surfaces
`[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient-
address mismatch (HD-index recovery exhausted) (no retry):
VerificationError: Recipient address mismatch` at §C → §D. The
existing unit tests cover helpers in isolation
(PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the
error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts
#269), but neither proves the composition: "given a sibling HD index
in the recipient's inventory, tryRecover finds it AND the SDK's
verifyRecipient agrees on the derived address". This left CI blind to
the regression mode where tryRecover returns a false-positive
candidate that Token.update later rejects.

This commit closes the gap at three layers:

  L1. Real-SDK integration test
      (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts):
      - Replaces the file-level vi.mock for SigningService /
        UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with
        REAL SDK imports.
      - Builds two real HD-derived signing services.
      - Constructs the sender-targeted DIRECT address via real
        UnmaskedPredicate.create → reference.toAddress (the EXACT path
        the SDK's verifyRecipient uses).
      - Drives `tryRecoverSigningServiceForRecipient` against real
        TokenId / TokenType / Uint8Array salt.
      - 4 tests: happy path, no-match negative, deriveAddressInfo throws,
        and (L2 composition) tryRecover + predicate construction
        round-trip equals the transferTx target address.

  L3. CI workflow (.github/workflows/soak-nightly.yml):
      - Nightly cron + workflow_dispatch trigger.
      - Probes testnet aggregator + Nostr relay; skip-not-fail when
        either is unreachable (third-party outages cannot block
        releases).
      - Checks out sphere-cli alongside sphere-sdk, links the local
        build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*.
      - Captures soak.log + workspace as artifacts (30-day retention
        on failure, 7-day on success).
      - Surfaces metrics in the workflow summary: V6-RECOVER count,
        Stranded receive count, POINTER_MONOTONICITY_VIOLATION count,
        bcast_pub>0 count, exit code.

What's tested by what (audit roll-up):
  - SDK address-derivation regression in tryRecover → L1 (real SDK).
  - tryRecover ↔ predicate-construction composition → L1 composition
    test (proves verifyRecipient cannot throw against the recovered
    state).
  - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak.
  - Helper-level edge cases (mock-friendly) → existing
    PaymentsModule.recipient-address-mismatch-recovery.test.ts.
  - Error-classification on hard fail → existing #269 test in
    PaymentsModule.proof-polling-persistence.test.ts.

Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean.

Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack).
Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
@vrogojin vrogojin force-pushed the fix/issue-333-h2-uxf-manifest-tokenid-binding branch from a0d5bc7 to 4fb7e19 Compare May 30, 2026 15:14
@vrogojin vrogojin force-pushed the fix/issue-333-h3-mergepkg-no-silent-drop branch from 38d3208 to 6e46117 Compare May 30, 2026 15:14
vrogojin added a commit that referenced this pull request May 30, 2026
…e in disposition-engine

modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.

Fix:

  - New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
    interface on disposition-engine.ts. The hook canonically re-derives
    RequestId from (authenticator.publicKey, sourceState) and compares
    to the bundle's requestId; production wiring calls RequestId.create
    and compares toJSON.

  - Optional `assertRequestIdBinding?` dep added to
    DispositionEngineInput. When provided, the engine calls it BEFORE
    every verifyProof invocation:
      * `{ ok: true }`  → proof verification proceeds.
      * `{ ok: false }` → cryptoInvalid('proof-invalid') — same
        §5.3 [C](3) class as a clean PATH_INVALID at the verifier,
        because the (proof, requestId) pair does not bind to this tx.
      * throw → structuralInvalid('proof-throw').
    The binding gate fires BEFORE verifyProof so a forged binding
    short-circuits the (potentially expensive) on-chain proof check.

  - Optional shape preserves back-compat with the 66 existing engine
    tests (which do not set the hook) and with all production callers
    that have not yet been updated.

  - Plumbed through legacy-shape-adapter.ts: new optional
    `assertRequestIdBinding?` field on LegacyShapeAdapterInput is
    forwarded to DispositionEngineInput in buildEngineInput. Existing
    callers that don't supply the hook degrade to the pre-fix
    behaviour; new callers pass the production-grade RequestId.create
    comparer.

Tests: 7 new H4 regression tests covering:
  - Back-compat (hook absent → proof verification proceeds)
  - Hook returns ok=true (binding asserted, proof verified)
  - Hook receives both bundle requestId and authenticator unchanged
  - Hook returns ok=false (cryptoInvalid('proof-invalid'),
    verifyProof NOT invoked)
  - Hook throws (structuralInvalid('proof-throw'))
  - Binding gate fires BEFORE verifyProof (ordering invariant)
  - Null-proof chain entries do NOT trigger the binding gate

All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.

Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…permanent

modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.

Fix:

  - Added optional `sourceTokenIds?: ReadonlyArray<string>` to
    `UxfTransferOutboxEntry`. The instant-sender now records the
    selected source set at outbox-record construction time via
    `OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
    on `undefined` for back-compat with entries written before H5.

  - Added optional `recoverFailedPermanentSources?(sources, outboxId)`
    hook to `FinalizationWorkerSenderOptions`. When wired, the worker
    invokes it once at the `failed-permanent` transition with the
    entry's recorded source tokenIds. Production wiring flips each
    source back to a spendable state (typically `confirmed`) gated
    on a same-source-no-other-live-outbox check; the audit's gap
    was that the worker provided no signal at all.

  - The hook is best-effort: throws are caught and emitted as a
    `transfer:failed` event for triage, but the `failed-permanent`
    transition itself stands. This preserves the terminal-state
    contract — a recovery hook failure must not re-introduce the
    locked-pending bug it exists to fix.

  - Optional shape preserves back-compat with the 90+ existing
    finalization-worker-sender tests (which do not set the hook).

Tests: 5 new H5 regression tests covering:
  - Hook fires with the recorded source ids on failed-permanent
  - Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
  - Hook absent: worker still transitions cleanly
  - Hook throws: failed-permanent stands, transfer:failed emitted
  - Hook fires AFTER the outbox transition (ordering invariant)

The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.

All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.

Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…nifestCas

ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.

Fix:

  - New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
    constructor's `opts.verifyEntryRoot`. When wired, the hook is
    invoked AFTER the label CAS passes and BEFORE the write. The hook
    fetches the content the entry's `rootHash` references (production:
    the UXF pool element), recomputes the hash, and asserts it matches
    the declaration.

  - New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
    with structured `integrityDetail` for triage. The result also
    carries `observed.contentHash` so callers can re-read state.

  - `ManifestCidRewriteCasError.casReason` widened to surface
    `'integrity-failed'`. The worker's outer retry loop already
    treats any throw from manifest-cid-rewrite as a hard CAS failure
    that bubbles up; the new reason rides through unchanged but
    preserves structure for operator triage.

  - Skipped on the `prev === null` (asserts-no-entry) path because
    there is no observed entry to verify. Skipped on label-CAS-mismatch
    because the swap has already been rejected.

  - Optional shape preserves back-compat: all 9 existing manifest-cas
    tests and the 100% of production callers that do not (yet) supply
    the verifier continue to behave identically. Production wiring is
    a follow-up that does not block this PR.

Tests: 6 new H7 regression tests covering:
  - Back-compat (verifier omitted → label-only CAS as before)
  - Happy path (verifier called AFTER label CAS, BEFORE write)
  - Integrity failure (returns 'integrity-failed', no write happens,
    storage unchanged, integrityDetail propagated)
  - Skip on prev=null (no observed entry to verify)
  - Skip on label-CAS-mismatch (short-circuit before verifier)
  - 'integrity-failed' has its own discriminator distinct from
    cas-mismatch / not-found / concurrent-modification

All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.

Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…ER test-coverage gap

The soak (manual-test-full-recovery.sh) routinely surfaces
`[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient-
address mismatch (HD-index recovery exhausted) (no retry):
VerificationError: Recipient address mismatch` at §C → §D. The
existing unit tests cover helpers in isolation
(PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the
error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts
#269), but neither proves the composition: "given a sibling HD index
in the recipient's inventory, tryRecover finds it AND the SDK's
verifyRecipient agrees on the derived address". This left CI blind to
the regression mode where tryRecover returns a false-positive
candidate that Token.update later rejects.

This commit closes the gap at three layers:

  L1. Real-SDK integration test
      (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts):
      - Replaces the file-level vi.mock for SigningService /
        UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with
        REAL SDK imports.
      - Builds two real HD-derived signing services.
      - Constructs the sender-targeted DIRECT address via real
        UnmaskedPredicate.create → reference.toAddress (the EXACT path
        the SDK's verifyRecipient uses).
      - Drives `tryRecoverSigningServiceForRecipient` against real
        TokenId / TokenType / Uint8Array salt.
      - 4 tests: happy path, no-match negative, deriveAddressInfo throws,
        and (L2 composition) tryRecover + predicate construction
        round-trip equals the transferTx target address.

  L3. CI workflow (.github/workflows/soak-nightly.yml):
      - Nightly cron + workflow_dispatch trigger.
      - Probes testnet aggregator + Nostr relay; skip-not-fail when
        either is unreachable (third-party outages cannot block
        releases).
      - Checks out sphere-cli alongside sphere-sdk, links the local
        build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*.
      - Captures soak.log + workspace as artifacts (30-day retention
        on failure, 7-day on success).
      - Surfaces metrics in the workflow summary: V6-RECOVER count,
        Stranded receive count, POINTER_MONOTONICITY_VIOLATION count,
        bcast_pub>0 count, exit code.

What's tested by what (audit roll-up):
  - SDK address-derivation regression in tryRecover → L1 (real SDK).
  - tryRecover ↔ predicate-construction composition → L1 composition
    test (proves verifyRecipient cannot throw against the recovered
    state).
  - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak.
  - Helper-level edge cases (mock-friendly) → existing
    PaymentsModule.recipient-address-mismatch-recovery.test.ts.
  - Error-classification on hard fail → existing #269 test in
    PaymentsModule.proof-polling-persistence.test.ts.

Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean.

Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack).
Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
@vrogojin vrogojin force-pushed the fix/issue-333-h2-uxf-manifest-tokenid-binding branch from 4fb7e19 to 19fc37f Compare May 30, 2026 15:46
@vrogojin vrogojin force-pushed the fix/issue-333-h3-mergepkg-no-silent-drop branch from 6e46117 to 2c8ec79 Compare May 30, 2026 15:46
vrogojin added a commit that referenced this pull request May 30, 2026
…e in disposition-engine

modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.

Fix:

  - New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
    interface on disposition-engine.ts. The hook canonically re-derives
    RequestId from (authenticator.publicKey, sourceState) and compares
    to the bundle's requestId; production wiring calls RequestId.create
    and compares toJSON.

  - Optional `assertRequestIdBinding?` dep added to
    DispositionEngineInput. When provided, the engine calls it BEFORE
    every verifyProof invocation:
      * `{ ok: true }`  → proof verification proceeds.
      * `{ ok: false }` → cryptoInvalid('proof-invalid') — same
        §5.3 [C](3) class as a clean PATH_INVALID at the verifier,
        because the (proof, requestId) pair does not bind to this tx.
      * throw → structuralInvalid('proof-throw').
    The binding gate fires BEFORE verifyProof so a forged binding
    short-circuits the (potentially expensive) on-chain proof check.

  - Optional shape preserves back-compat with the 66 existing engine
    tests (which do not set the hook) and with all production callers
    that have not yet been updated.

  - Plumbed through legacy-shape-adapter.ts: new optional
    `assertRequestIdBinding?` field on LegacyShapeAdapterInput is
    forwarded to DispositionEngineInput in buildEngineInput. Existing
    callers that don't supply the hook degrade to the pre-fix
    behaviour; new callers pass the production-grade RequestId.create
    comparer.

Tests: 7 new H4 regression tests covering:
  - Back-compat (hook absent → proof verification proceeds)
  - Hook returns ok=true (binding asserted, proof verified)
  - Hook receives both bundle requestId and authenticator unchanged
  - Hook returns ok=false (cryptoInvalid('proof-invalid'),
    verifyProof NOT invoked)
  - Hook throws (structuralInvalid('proof-throw'))
  - Binding gate fires BEFORE verifyProof (ordering invariant)
  - Null-proof chain entries do NOT trigger the binding gate

All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.

Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…permanent

modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.

Fix:

  - Added optional `sourceTokenIds?: ReadonlyArray<string>` to
    `UxfTransferOutboxEntry`. The instant-sender now records the
    selected source set at outbox-record construction time via
    `OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
    on `undefined` for back-compat with entries written before H5.

  - Added optional `recoverFailedPermanentSources?(sources, outboxId)`
    hook to `FinalizationWorkerSenderOptions`. When wired, the worker
    invokes it once at the `failed-permanent` transition with the
    entry's recorded source tokenIds. Production wiring flips each
    source back to a spendable state (typically `confirmed`) gated
    on a same-source-no-other-live-outbox check; the audit's gap
    was that the worker provided no signal at all.

  - The hook is best-effort: throws are caught and emitted as a
    `transfer:failed` event for triage, but the `failed-permanent`
    transition itself stands. This preserves the terminal-state
    contract — a recovery hook failure must not re-introduce the
    locked-pending bug it exists to fix.

  - Optional shape preserves back-compat with the 90+ existing
    finalization-worker-sender tests (which do not set the hook).

Tests: 5 new H5 regression tests covering:
  - Hook fires with the recorded source ids on failed-permanent
  - Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
  - Hook absent: worker still transitions cleanly
  - Hook throws: failed-permanent stands, transfer:failed emitted
  - Hook fires AFTER the outbox transition (ordering invariant)

The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.

All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.

Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…nifestCas

ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.

Fix:

  - New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
    constructor's `opts.verifyEntryRoot`. When wired, the hook is
    invoked AFTER the label CAS passes and BEFORE the write. The hook
    fetches the content the entry's `rootHash` references (production:
    the UXF pool element), recomputes the hash, and asserts it matches
    the declaration.

  - New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
    with structured `integrityDetail` for triage. The result also
    carries `observed.contentHash` so callers can re-read state.

  - `ManifestCidRewriteCasError.casReason` widened to surface
    `'integrity-failed'`. The worker's outer retry loop already
    treats any throw from manifest-cid-rewrite as a hard CAS failure
    that bubbles up; the new reason rides through unchanged but
    preserves structure for operator triage.

  - Skipped on the `prev === null` (asserts-no-entry) path because
    there is no observed entry to verify. Skipped on label-CAS-mismatch
    because the swap has already been rejected.

  - Optional shape preserves back-compat: all 9 existing manifest-cas
    tests and the 100% of production callers that do not (yet) supply
    the verifier continue to behave identically. Production wiring is
    a follow-up that does not block this PR.

Tests: 6 new H7 regression tests covering:
  - Back-compat (verifier omitted → label-only CAS as before)
  - Happy path (verifier called AFTER label CAS, BEFORE write)
  - Integrity failure (returns 'integrity-failed', no write happens,
    storage unchanged, integrityDetail propagated)
  - Skip on prev=null (no observed entry to verify)
  - Skip on label-CAS-mismatch (short-circuit before verifier)
  - 'integrity-failed' has its own discriminator distinct from
    cas-mismatch / not-found / concurrent-modification

All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.

Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…ER test-coverage gap

The soak (manual-test-full-recovery.sh) routinely surfaces
`[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient-
address mismatch (HD-index recovery exhausted) (no retry):
VerificationError: Recipient address mismatch` at §C → §D. The
existing unit tests cover helpers in isolation
(PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the
error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts
#269), but neither proves the composition: "given a sibling HD index
in the recipient's inventory, tryRecover finds it AND the SDK's
verifyRecipient agrees on the derived address". This left CI blind to
the regression mode where tryRecover returns a false-positive
candidate that Token.update later rejects.

This commit closes the gap at three layers:

  L1. Real-SDK integration test
      (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts):
      - Replaces the file-level vi.mock for SigningService /
        UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with
        REAL SDK imports.
      - Builds two real HD-derived signing services.
      - Constructs the sender-targeted DIRECT address via real
        UnmaskedPredicate.create → reference.toAddress (the EXACT path
        the SDK's verifyRecipient uses).
      - Drives `tryRecoverSigningServiceForRecipient` against real
        TokenId / TokenType / Uint8Array salt.
      - 4 tests: happy path, no-match negative, deriveAddressInfo throws,
        and (L2 composition) tryRecover + predicate construction
        round-trip equals the transferTx target address.

  L3. CI workflow (.github/workflows/soak-nightly.yml):
      - Nightly cron + workflow_dispatch trigger.
      - Probes testnet aggregator + Nostr relay; skip-not-fail when
        either is unreachable (third-party outages cannot block
        releases).
      - Checks out sphere-cli alongside sphere-sdk, links the local
        build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*.
      - Captures soak.log + workspace as artifacts (30-day retention
        on failure, 7-day on success).
      - Surfaces metrics in the workflow summary: V6-RECOVER count,
        Stranded receive count, POINTER_MONOTONICITY_VIOLATION count,
        bcast_pub>0 count, exit code.

What's tested by what (audit roll-up):
  - SDK address-derivation regression in tryRecover → L1 (real SDK).
  - tryRecover ↔ predicate-construction composition → L1 composition
    test (proves verifyRecipient cannot throw against the recovered
    state).
  - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak.
  - Helper-level edge cases (mock-friendly) → existing
    PaymentsModule.recipient-address-mismatch-recovery.test.ts.
  - Error-classification on hard fail → existing #269 test in
    PaymentsModule.proof-polling-persistence.test.ts.

Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean.

Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack).
Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
@vrogojin vrogojin force-pushed the fix/issue-333-h2-uxf-manifest-tokenid-binding branch from 19fc37f to 6d9f26c Compare May 30, 2026 15:53
@vrogojin vrogojin force-pushed the fix/issue-333-h3-mergepkg-no-silent-drop branch from 2c8ec79 to eb3f577 Compare May 30, 2026 15:53
vrogojin added a commit that referenced this pull request May 30, 2026
…e in disposition-engine

modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.

Fix:

  - New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
    interface on disposition-engine.ts. The hook canonically re-derives
    RequestId from (authenticator.publicKey, sourceState) and compares
    to the bundle's requestId; production wiring calls RequestId.create
    and compares toJSON.

  - Optional `assertRequestIdBinding?` dep added to
    DispositionEngineInput. When provided, the engine calls it BEFORE
    every verifyProof invocation:
      * `{ ok: true }`  → proof verification proceeds.
      * `{ ok: false }` → cryptoInvalid('proof-invalid') — same
        §5.3 [C](3) class as a clean PATH_INVALID at the verifier,
        because the (proof, requestId) pair does not bind to this tx.
      * throw → structuralInvalid('proof-throw').
    The binding gate fires BEFORE verifyProof so a forged binding
    short-circuits the (potentially expensive) on-chain proof check.

  - Optional shape preserves back-compat with the 66 existing engine
    tests (which do not set the hook) and with all production callers
    that have not yet been updated.

  - Plumbed through legacy-shape-adapter.ts: new optional
    `assertRequestIdBinding?` field on LegacyShapeAdapterInput is
    forwarded to DispositionEngineInput in buildEngineInput. Existing
    callers that don't supply the hook degrade to the pre-fix
    behaviour; new callers pass the production-grade RequestId.create
    comparer.

Tests: 7 new H4 regression tests covering:
  - Back-compat (hook absent → proof verification proceeds)
  - Hook returns ok=true (binding asserted, proof verified)
  - Hook receives both bundle requestId and authenticator unchanged
  - Hook returns ok=false (cryptoInvalid('proof-invalid'),
    verifyProof NOT invoked)
  - Hook throws (structuralInvalid('proof-throw'))
  - Binding gate fires BEFORE verifyProof (ordering invariant)
  - Null-proof chain entries do NOT trigger the binding gate

All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.

Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…permanent

modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.

Fix:

  - Added optional `sourceTokenIds?: ReadonlyArray<string>` to
    `UxfTransferOutboxEntry`. The instant-sender now records the
    selected source set at outbox-record construction time via
    `OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
    on `undefined` for back-compat with entries written before H5.

  - Added optional `recoverFailedPermanentSources?(sources, outboxId)`
    hook to `FinalizationWorkerSenderOptions`. When wired, the worker
    invokes it once at the `failed-permanent` transition with the
    entry's recorded source tokenIds. Production wiring flips each
    source back to a spendable state (typically `confirmed`) gated
    on a same-source-no-other-live-outbox check; the audit's gap
    was that the worker provided no signal at all.

  - The hook is best-effort: throws are caught and emitted as a
    `transfer:failed` event for triage, but the `failed-permanent`
    transition itself stands. This preserves the terminal-state
    contract — a recovery hook failure must not re-introduce the
    locked-pending bug it exists to fix.

  - Optional shape preserves back-compat with the 90+ existing
    finalization-worker-sender tests (which do not set the hook).

Tests: 5 new H5 regression tests covering:
  - Hook fires with the recorded source ids on failed-permanent
  - Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
  - Hook absent: worker still transitions cleanly
  - Hook throws: failed-permanent stands, transfer:failed emitted
  - Hook fires AFTER the outbox transition (ordering invariant)

The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.

All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.

Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…nifestCas

ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.

Fix:

  - New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
    constructor's `opts.verifyEntryRoot`. When wired, the hook is
    invoked AFTER the label CAS passes and BEFORE the write. The hook
    fetches the content the entry's `rootHash` references (production:
    the UXF pool element), recomputes the hash, and asserts it matches
    the declaration.

  - New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
    with structured `integrityDetail` for triage. The result also
    carries `observed.contentHash` so callers can re-read state.

  - `ManifestCidRewriteCasError.casReason` widened to surface
    `'integrity-failed'`. The worker's outer retry loop already
    treats any throw from manifest-cid-rewrite as a hard CAS failure
    that bubbles up; the new reason rides through unchanged but
    preserves structure for operator triage.

  - Skipped on the `prev === null` (asserts-no-entry) path because
    there is no observed entry to verify. Skipped on label-CAS-mismatch
    because the swap has already been rejected.

  - Optional shape preserves back-compat: all 9 existing manifest-cas
    tests and the 100% of production callers that do not (yet) supply
    the verifier continue to behave identically. Production wiring is
    a follow-up that does not block this PR.

Tests: 6 new H7 regression tests covering:
  - Back-compat (verifier omitted → label-only CAS as before)
  - Happy path (verifier called AFTER label CAS, BEFORE write)
  - Integrity failure (returns 'integrity-failed', no write happens,
    storage unchanged, integrityDetail propagated)
  - Skip on prev=null (no observed entry to verify)
  - Skip on label-CAS-mismatch (short-circuit before verifier)
  - 'integrity-failed' has its own discriminator distinct from
    cas-mismatch / not-found / concurrent-modification

All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.

Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…ER test-coverage gap

The soak (manual-test-full-recovery.sh) routinely surfaces
`[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient-
address mismatch (HD-index recovery exhausted) (no retry):
VerificationError: Recipient address mismatch` at §C → §D. The
existing unit tests cover helpers in isolation
(PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the
error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts
#269), but neither proves the composition: "given a sibling HD index
in the recipient's inventory, tryRecover finds it AND the SDK's
verifyRecipient agrees on the derived address". This left CI blind to
the regression mode where tryRecover returns a false-positive
candidate that Token.update later rejects.

This commit closes the gap at three layers:

  L1. Real-SDK integration test
      (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts):
      - Replaces the file-level vi.mock for SigningService /
        UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with
        REAL SDK imports.
      - Builds two real HD-derived signing services.
      - Constructs the sender-targeted DIRECT address via real
        UnmaskedPredicate.create → reference.toAddress (the EXACT path
        the SDK's verifyRecipient uses).
      - Drives `tryRecoverSigningServiceForRecipient` against real
        TokenId / TokenType / Uint8Array salt.
      - 4 tests: happy path, no-match negative, deriveAddressInfo throws,
        and (L2 composition) tryRecover + predicate construction
        round-trip equals the transferTx target address.

  L3. CI workflow (.github/workflows/soak-nightly.yml):
      - Nightly cron + workflow_dispatch trigger.
      - Probes testnet aggregator + Nostr relay; skip-not-fail when
        either is unreachable (third-party outages cannot block
        releases).
      - Checks out sphere-cli alongside sphere-sdk, links the local
        build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*.
      - Captures soak.log + workspace as artifacts (30-day retention
        on failure, 7-day on success).
      - Surfaces metrics in the workflow summary: V6-RECOVER count,
        Stranded receive count, POINTER_MONOTONICITY_VIOLATION count,
        bcast_pub>0 count, exit code.

What's tested by what (audit roll-up):
  - SDK address-derivation regression in tryRecover → L1 (real SDK).
  - tryRecover ↔ predicate-construction composition → L1 composition
    test (proves verifyRecipient cannot throw against the recovered
    state).
  - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak.
  - Helper-level edge cases (mock-friendly) → existing
    PaymentsModule.recipient-address-mismatch-recovery.test.ts.
  - Error-classification on hard fail → existing #269 test in
    PaymentsModule.proof-polling-persistence.test.ts.

Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean.

Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack).
Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
@vrogojin vrogojin force-pushed the fix/issue-333-h2-uxf-manifest-tokenid-binding branch from 6d9f26c to f927a45 Compare May 30, 2026 16:02
@vrogojin vrogojin force-pushed the fix/issue-333-h3-mergepkg-no-silent-drop branch from eb3f577 to 4422e84 Compare May 30, 2026 16:02
vrogojin added a commit that referenced this pull request May 30, 2026
…e in disposition-engine

modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.

Fix:

  - New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
    interface on disposition-engine.ts. The hook canonically re-derives
    RequestId from (authenticator.publicKey, sourceState) and compares
    to the bundle's requestId; production wiring calls RequestId.create
    and compares toJSON.

  - Optional `assertRequestIdBinding?` dep added to
    DispositionEngineInput. When provided, the engine calls it BEFORE
    every verifyProof invocation:
      * `{ ok: true }`  → proof verification proceeds.
      * `{ ok: false }` → cryptoInvalid('proof-invalid') — same
        §5.3 [C](3) class as a clean PATH_INVALID at the verifier,
        because the (proof, requestId) pair does not bind to this tx.
      * throw → structuralInvalid('proof-throw').
    The binding gate fires BEFORE verifyProof so a forged binding
    short-circuits the (potentially expensive) on-chain proof check.

  - Optional shape preserves back-compat with the 66 existing engine
    tests (which do not set the hook) and with all production callers
    that have not yet been updated.

  - Plumbed through legacy-shape-adapter.ts: new optional
    `assertRequestIdBinding?` field on LegacyShapeAdapterInput is
    forwarded to DispositionEngineInput in buildEngineInput. Existing
    callers that don't supply the hook degrade to the pre-fix
    behaviour; new callers pass the production-grade RequestId.create
    comparer.

Tests: 7 new H4 regression tests covering:
  - Back-compat (hook absent → proof verification proceeds)
  - Hook returns ok=true (binding asserted, proof verified)
  - Hook receives both bundle requestId and authenticator unchanged
  - Hook returns ok=false (cryptoInvalid('proof-invalid'),
    verifyProof NOT invoked)
  - Hook throws (structuralInvalid('proof-throw'))
  - Binding gate fires BEFORE verifyProof (ordering invariant)
  - Null-proof chain entries do NOT trigger the binding gate

All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.

Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…permanent

modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.

Fix:

  - Added optional `sourceTokenIds?: ReadonlyArray<string>` to
    `UxfTransferOutboxEntry`. The instant-sender now records the
    selected source set at outbox-record construction time via
    `OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
    on `undefined` for back-compat with entries written before H5.

  - Added optional `recoverFailedPermanentSources?(sources, outboxId)`
    hook to `FinalizationWorkerSenderOptions`. When wired, the worker
    invokes it once at the `failed-permanent` transition with the
    entry's recorded source tokenIds. Production wiring flips each
    source back to a spendable state (typically `confirmed`) gated
    on a same-source-no-other-live-outbox check; the audit's gap
    was that the worker provided no signal at all.

  - The hook is best-effort: throws are caught and emitted as a
    `transfer:failed` event for triage, but the `failed-permanent`
    transition itself stands. This preserves the terminal-state
    contract — a recovery hook failure must not re-introduce the
    locked-pending bug it exists to fix.

  - Optional shape preserves back-compat with the 90+ existing
    finalization-worker-sender tests (which do not set the hook).

Tests: 5 new H5 regression tests covering:
  - Hook fires with the recorded source ids on failed-permanent
  - Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
  - Hook absent: worker still transitions cleanly
  - Hook throws: failed-permanent stands, transfer:failed emitted
  - Hook fires AFTER the outbox transition (ordering invariant)

The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.

All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.

Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…nifestCas

ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.

Fix:

  - New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
    constructor's `opts.verifyEntryRoot`. When wired, the hook is
    invoked AFTER the label CAS passes and BEFORE the write. The hook
    fetches the content the entry's `rootHash` references (production:
    the UXF pool element), recomputes the hash, and asserts it matches
    the declaration.

  - New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
    with structured `integrityDetail` for triage. The result also
    carries `observed.contentHash` so callers can re-read state.

  - `ManifestCidRewriteCasError.casReason` widened to surface
    `'integrity-failed'`. The worker's outer retry loop already
    treats any throw from manifest-cid-rewrite as a hard CAS failure
    that bubbles up; the new reason rides through unchanged but
    preserves structure for operator triage.

  - Skipped on the `prev === null` (asserts-no-entry) path because
    there is no observed entry to verify. Skipped on label-CAS-mismatch
    because the swap has already been rejected.

  - Optional shape preserves back-compat: all 9 existing manifest-cas
    tests and the 100% of production callers that do not (yet) supply
    the verifier continue to behave identically. Production wiring is
    a follow-up that does not block this PR.

Tests: 6 new H7 regression tests covering:
  - Back-compat (verifier omitted → label-only CAS as before)
  - Happy path (verifier called AFTER label CAS, BEFORE write)
  - Integrity failure (returns 'integrity-failed', no write happens,
    storage unchanged, integrityDetail propagated)
  - Skip on prev=null (no observed entry to verify)
  - Skip on label-CAS-mismatch (short-circuit before verifier)
  - 'integrity-failed' has its own discriminator distinct from
    cas-mismatch / not-found / concurrent-modification

All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.

Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…ER test-coverage gap

The soak (manual-test-full-recovery.sh) routinely surfaces
`[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient-
address mismatch (HD-index recovery exhausted) (no retry):
VerificationError: Recipient address mismatch` at §C → §D. The
existing unit tests cover helpers in isolation
(PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the
error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts
#269), but neither proves the composition: "given a sibling HD index
in the recipient's inventory, tryRecover finds it AND the SDK's
verifyRecipient agrees on the derived address". This left CI blind to
the regression mode where tryRecover returns a false-positive
candidate that Token.update later rejects.

This commit closes the gap at three layers:

  L1. Real-SDK integration test
      (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts):
      - Replaces the file-level vi.mock for SigningService /
        UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with
        REAL SDK imports.
      - Builds two real HD-derived signing services.
      - Constructs the sender-targeted DIRECT address via real
        UnmaskedPredicate.create → reference.toAddress (the EXACT path
        the SDK's verifyRecipient uses).
      - Drives `tryRecoverSigningServiceForRecipient` against real
        TokenId / TokenType / Uint8Array salt.
      - 4 tests: happy path, no-match negative, deriveAddressInfo throws,
        and (L2 composition) tryRecover + predicate construction
        round-trip equals the transferTx target address.

  L3. CI workflow (.github/workflows/soak-nightly.yml):
      - Nightly cron + workflow_dispatch trigger.
      - Probes testnet aggregator + Nostr relay; skip-not-fail when
        either is unreachable (third-party outages cannot block
        releases).
      - Checks out sphere-cli alongside sphere-sdk, links the local
        build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*.
      - Captures soak.log + workspace as artifacts (30-day retention
        on failure, 7-day on success).
      - Surfaces metrics in the workflow summary: V6-RECOVER count,
        Stranded receive count, POINTER_MONOTONICITY_VIOLATION count,
        bcast_pub>0 count, exit code.

What's tested by what (audit roll-up):
  - SDK address-derivation regression in tryRecover → L1 (real SDK).
  - tryRecover ↔ predicate-construction composition → L1 composition
    test (proves verifyRecipient cannot throw against the recovered
    state).
  - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak.
  - Helper-level edge cases (mock-friendly) → existing
    PaymentsModule.recipient-address-mismatch-recovery.test.ts.
  - Error-classification on hard fail → existing #269 test in
    PaymentsModule.proof-polling-persistence.test.ts.

Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean.

Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack).
Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
@vrogojin vrogojin changed the base branch from fix/issue-333-h2-uxf-manifest-tokenid-binding to main May 30, 2026 16:09
…mode

Before this fix, `UxfPackage.merge` (→ `mergePkg`) silently dropped any
tokenId whose per-token resolver threw. A single malformed sibling
element inside a Rule 4 synthetic-rebuild made a legitimately-received
token vanish from the merged manifest with only a `logger.warn`. On the
receive path this manifested as token loss from view: the caller had no
machine-readable signal that anything had gone wrong.

Fix:

  - `UxfPackage.merge()` now returns `{ skipped: MergeSkip[] }`. Each
    `MergeSkip` records the tokenId, the original error, the target's
    prior root (if any), and the source's incoming root that failed
    to incorporate. Callers can react: replay, retry, surface to UI,
    or quarantine.

  - `opts.strict: true` aggregates skipped tokens into a
    `UxfError('MERGE_PARTIAL_FAILURE')` thrown BEFORE the atomic apply
    phase — target state is unchanged on the throw. Use on receive-
    path JOINs where any silent drop is unacceptable; leave default-off
    for opportunistic peer JOINs where partial coverage is fine.

  - `opts.onSkip` fires once per skipped tokenId for callers that want
    telemetry visibility without strict-mode failure. Throws inside
    onSkip are caught and logged — observability must not change merge
    semantics.

  - New error code `MERGE_PARTIAL_FAILURE` on UxfErrorCode. The thrown
    error also carries a non-typed `skipped` field with the structured
    skip list so callers needing machine-readable details can read it
    via `(err as { skipped }).skipped`.

  - Back-compat at the internal `mergePkg` boundary: legacy positional
    `(target, source, verifiedProofs)` callers keep working via a
    Set-vs-opts shape probe. Public callers (4 in tree: PaymentsModule,
    profile-token-storage-provider, profile/consolidation,
    flush-scheduler) ignore the returned `{ skipped }` and continue to
    use merge as void — behavior unchanged for them.

Tests: 8 new H3 regression tests covering empty-skipped baseline,
captured-skip on resolver throw, MergeSkip metadata fields,
strict-mode hard fail, strict-mode atomicity invariant (target
unchanged on throw), strict-mode happy path, onSkip callback fires
once per skip, and onSkip throw does not change merge semantics.
All 451 existing uxf tests pass unchanged. 8163 total unit tests pass.

Refs: #333 (H3). Stacked on #351 (H2) → #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
@vrogojin vrogojin force-pushed the fix/issue-333-h3-mergepkg-no-silent-drop branch from 4422e84 to 768963e Compare May 30, 2026 16:09
vrogojin added a commit that referenced this pull request May 30, 2026
…e in disposition-engine

modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.

Fix:

  - New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
    interface on disposition-engine.ts. The hook canonically re-derives
    RequestId from (authenticator.publicKey, sourceState) and compares
    to the bundle's requestId; production wiring calls RequestId.create
    and compares toJSON.

  - Optional `assertRequestIdBinding?` dep added to
    DispositionEngineInput. When provided, the engine calls it BEFORE
    every verifyProof invocation:
      * `{ ok: true }`  → proof verification proceeds.
      * `{ ok: false }` → cryptoInvalid('proof-invalid') — same
        §5.3 [C](3) class as a clean PATH_INVALID at the verifier,
        because the (proof, requestId) pair does not bind to this tx.
      * throw → structuralInvalid('proof-throw').
    The binding gate fires BEFORE verifyProof so a forged binding
    short-circuits the (potentially expensive) on-chain proof check.

  - Optional shape preserves back-compat with the 66 existing engine
    tests (which do not set the hook) and with all production callers
    that have not yet been updated.

  - Plumbed through legacy-shape-adapter.ts: new optional
    `assertRequestIdBinding?` field on LegacyShapeAdapterInput is
    forwarded to DispositionEngineInput in buildEngineInput. Existing
    callers that don't supply the hook degrade to the pre-fix
    behaviour; new callers pass the production-grade RequestId.create
    comparer.

Tests: 7 new H4 regression tests covering:
  - Back-compat (hook absent → proof verification proceeds)
  - Hook returns ok=true (binding asserted, proof verified)
  - Hook receives both bundle requestId and authenticator unchanged
  - Hook returns ok=false (cryptoInvalid('proof-invalid'),
    verifyProof NOT invoked)
  - Hook throws (structuralInvalid('proof-throw'))
  - Binding gate fires BEFORE verifyProof (ordering invariant)
  - Null-proof chain entries do NOT trigger the binding gate

All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.

Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…permanent

modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.

Fix:

  - Added optional `sourceTokenIds?: ReadonlyArray<string>` to
    `UxfTransferOutboxEntry`. The instant-sender now records the
    selected source set at outbox-record construction time via
    `OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
    on `undefined` for back-compat with entries written before H5.

  - Added optional `recoverFailedPermanentSources?(sources, outboxId)`
    hook to `FinalizationWorkerSenderOptions`. When wired, the worker
    invokes it once at the `failed-permanent` transition with the
    entry's recorded source tokenIds. Production wiring flips each
    source back to a spendable state (typically `confirmed`) gated
    on a same-source-no-other-live-outbox check; the audit's gap
    was that the worker provided no signal at all.

  - The hook is best-effort: throws are caught and emitted as a
    `transfer:failed` event for triage, but the `failed-permanent`
    transition itself stands. This preserves the terminal-state
    contract — a recovery hook failure must not re-introduce the
    locked-pending bug it exists to fix.

  - Optional shape preserves back-compat with the 90+ existing
    finalization-worker-sender tests (which do not set the hook).

Tests: 5 new H5 regression tests covering:
  - Hook fires with the recorded source ids on failed-permanent
  - Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
  - Hook absent: worker still transitions cleanly
  - Hook throws: failed-permanent stands, transfer:failed emitted
  - Hook fires AFTER the outbox transition (ordering invariant)

The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.

All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.

Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…nifestCas

ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.

Fix:

  - New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
    constructor's `opts.verifyEntryRoot`. When wired, the hook is
    invoked AFTER the label CAS passes and BEFORE the write. The hook
    fetches the content the entry's `rootHash` references (production:
    the UXF pool element), recomputes the hash, and asserts it matches
    the declaration.

  - New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
    with structured `integrityDetail` for triage. The result also
    carries `observed.contentHash` so callers can re-read state.

  - `ManifestCidRewriteCasError.casReason` widened to surface
    `'integrity-failed'`. The worker's outer retry loop already
    treats any throw from manifest-cid-rewrite as a hard CAS failure
    that bubbles up; the new reason rides through unchanged but
    preserves structure for operator triage.

  - Skipped on the `prev === null` (asserts-no-entry) path because
    there is no observed entry to verify. Skipped on label-CAS-mismatch
    because the swap has already been rejected.

  - Optional shape preserves back-compat: all 9 existing manifest-cas
    tests and the 100% of production callers that do not (yet) supply
    the verifier continue to behave identically. Production wiring is
    a follow-up that does not block this PR.

Tests: 6 new H7 regression tests covering:
  - Back-compat (verifier omitted → label-only CAS as before)
  - Happy path (verifier called AFTER label CAS, BEFORE write)
  - Integrity failure (returns 'integrity-failed', no write happens,
    storage unchanged, integrityDetail propagated)
  - Skip on prev=null (no observed entry to verify)
  - Skip on label-CAS-mismatch (short-circuit before verifier)
  - 'integrity-failed' has its own discriminator distinct from
    cas-mismatch / not-found / concurrent-modification

All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.

Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…ER test-coverage gap

The soak (manual-test-full-recovery.sh) routinely surfaces
`[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient-
address mismatch (HD-index recovery exhausted) (no retry):
VerificationError: Recipient address mismatch` at §C → §D. The
existing unit tests cover helpers in isolation
(PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the
error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts
#269), but neither proves the composition: "given a sibling HD index
in the recipient's inventory, tryRecover finds it AND the SDK's
verifyRecipient agrees on the derived address". This left CI blind to
the regression mode where tryRecover returns a false-positive
candidate that Token.update later rejects.

This commit closes the gap at three layers:

  L1. Real-SDK integration test
      (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts):
      - Replaces the file-level vi.mock for SigningService /
        UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with
        REAL SDK imports.
      - Builds two real HD-derived signing services.
      - Constructs the sender-targeted DIRECT address via real
        UnmaskedPredicate.create → reference.toAddress (the EXACT path
        the SDK's verifyRecipient uses).
      - Drives `tryRecoverSigningServiceForRecipient` against real
        TokenId / TokenType / Uint8Array salt.
      - 4 tests: happy path, no-match negative, deriveAddressInfo throws,
        and (L2 composition) tryRecover + predicate construction
        round-trip equals the transferTx target address.

  L3. CI workflow (.github/workflows/soak-nightly.yml):
      - Nightly cron + workflow_dispatch trigger.
      - Probes testnet aggregator + Nostr relay; skip-not-fail when
        either is unreachable (third-party outages cannot block
        releases).
      - Checks out sphere-cli alongside sphere-sdk, links the local
        build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*.
      - Captures soak.log + workspace as artifacts (30-day retention
        on failure, 7-day on success).
      - Surfaces metrics in the workflow summary: V6-RECOVER count,
        Stranded receive count, POINTER_MONOTONICITY_VIOLATION count,
        bcast_pub>0 count, exit code.

What's tested by what (audit roll-up):
  - SDK address-derivation regression in tryRecover → L1 (real SDK).
  - tryRecover ↔ predicate-construction composition → L1 composition
    test (proves verifyRecipient cannot throw against the recovered
    state).
  - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak.
  - Helper-level edge cases (mock-friendly) → existing
    PaymentsModule.recipient-address-mismatch-recovery.test.ts.
  - Error-classification on hard fail → existing #269 test in
    PaymentsModule.proof-polling-persistence.test.ts.

Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean.

Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack).
Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
@vrogojin vrogojin merged commit 56134c3 into main May 30, 2026
3 checks passed
@vrogojin vrogojin deleted the fix/issue-333-h3-mergepkg-no-silent-drop branch May 30, 2026 16:15
vrogojin added a commit that referenced this pull request May 30, 2026
…e in disposition-engine

modules/payments/transfer/disposition-engine.ts:629 called `verifyProof
(proof, trustBase, requestId)` with the bundle-supplied requestId and
trusted that the un-audited hydrateChain adapter had derived it
canonically. If the adapter erred or a malicious sender hand-crafted
the bundle with a proof anchored to a DIFFERENT transaction's
requestId, the proof would still verify (it IS a genuine on-chain
proof) but it would be incorrectly attributed to this transaction —
a proof-binding forgery.

Fix:

  - New `AssertRequestIdBindingFn` type and `AssertRequestIdBindingResult`
    interface on disposition-engine.ts. The hook canonically re-derives
    RequestId from (authenticator.publicKey, sourceState) and compares
    to the bundle's requestId; production wiring calls RequestId.create
    and compares toJSON.

  - Optional `assertRequestIdBinding?` dep added to
    DispositionEngineInput. When provided, the engine calls it BEFORE
    every verifyProof invocation:
      * `{ ok: true }`  → proof verification proceeds.
      * `{ ok: false }` → cryptoInvalid('proof-invalid') — same
        §5.3 [C](3) class as a clean PATH_INVALID at the verifier,
        because the (proof, requestId) pair does not bind to this tx.
      * throw → structuralInvalid('proof-throw').
    The binding gate fires BEFORE verifyProof so a forged binding
    short-circuits the (potentially expensive) on-chain proof check.

  - Optional shape preserves back-compat with the 66 existing engine
    tests (which do not set the hook) and with all production callers
    that have not yet been updated.

  - Plumbed through legacy-shape-adapter.ts: new optional
    `assertRequestIdBinding?` field on LegacyShapeAdapterInput is
    forwarded to DispositionEngineInput in buildEngineInput. Existing
    callers that don't supply the hook degrade to the pre-fix
    behaviour; new callers pass the production-grade RequestId.create
    comparer.

Tests: 7 new H4 regression tests covering:
  - Back-compat (hook absent → proof verification proceeds)
  - Hook returns ok=true (binding asserted, proof verified)
  - Hook receives both bundle requestId and authenticator unchanged
  - Hook returns ok=false (cryptoInvalid('proof-invalid'),
    verifyProof NOT invoked)
  - Hook throws (structuralInvalid('proof-throw'))
  - Binding gate fires BEFORE verifyProof (ordering invariant)
  - Null-proof chain entries do NOT trigger the binding gate

All 66 existing disposition-engine tests pass unchanged. All 1491
transfer tests pass. 8170 total unit tests pass. tsc clean.

Refs: #333 (H4). Stacked on #352 (H3) → #351 (H2) → #349 (H1) → #348 (C3)
→ #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…permanent

modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.

Fix:

  - Added optional `sourceTokenIds?: ReadonlyArray<string>` to
    `UxfTransferOutboxEntry`. The instant-sender now records the
    selected source set at outbox-record construction time via
    `OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
    on `undefined` for back-compat with entries written before H5.

  - Added optional `recoverFailedPermanentSources?(sources, outboxId)`
    hook to `FinalizationWorkerSenderOptions`. When wired, the worker
    invokes it once at the `failed-permanent` transition with the
    entry's recorded source tokenIds. Production wiring flips each
    source back to a spendable state (typically `confirmed`) gated
    on a same-source-no-other-live-outbox check; the audit's gap
    was that the worker provided no signal at all.

  - The hook is best-effort: throws are caught and emitted as a
    `transfer:failed` event for triage, but the `failed-permanent`
    transition itself stands. This preserves the terminal-state
    contract — a recovery hook failure must not re-introduce the
    locked-pending bug it exists to fix.

  - Optional shape preserves back-compat with the 90+ existing
    finalization-worker-sender tests (which do not set the hook).

Tests: 5 new H5 regression tests covering:
  - Hook fires with the recorded source ids on failed-permanent
  - Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
  - Hook absent: worker still transitions cleanly
  - Hook throws: failed-permanent stands, transfer:failed emitted
  - Hook fires AFTER the outbox transition (ordering invariant)

The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.

All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.

Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…nifestCas

ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.

Fix:

  - New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
    constructor's `opts.verifyEntryRoot`. When wired, the hook is
    invoked AFTER the label CAS passes and BEFORE the write. The hook
    fetches the content the entry's `rootHash` references (production:
    the UXF pool element), recomputes the hash, and asserts it matches
    the declaration.

  - New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
    with structured `integrityDetail` for triage. The result also
    carries `observed.contentHash` so callers can re-read state.

  - `ManifestCidRewriteCasError.casReason` widened to surface
    `'integrity-failed'`. The worker's outer retry loop already
    treats any throw from manifest-cid-rewrite as a hard CAS failure
    that bubbles up; the new reason rides through unchanged but
    preserves structure for operator triage.

  - Skipped on the `prev === null` (asserts-no-entry) path because
    there is no observed entry to verify. Skipped on label-CAS-mismatch
    because the swap has already been rejected.

  - Optional shape preserves back-compat: all 9 existing manifest-cas
    tests and the 100% of production callers that do not (yet) supply
    the verifier continue to behave identically. Production wiring is
    a follow-up that does not block this PR.

Tests: 6 new H7 regression tests covering:
  - Back-compat (verifier omitted → label-only CAS as before)
  - Happy path (verifier called AFTER label CAS, BEFORE write)
  - Integrity failure (returns 'integrity-failed', no write happens,
    storage unchanged, integrityDetail propagated)
  - Skip on prev=null (no observed entry to verify)
  - Skip on label-CAS-mismatch (short-circuit before verifier)
  - 'integrity-failed' has its own discriminator distinct from
    cas-mismatch / not-found / concurrent-modification

All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.

Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…ER test-coverage gap

The soak (manual-test-full-recovery.sh) routinely surfaces
`[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient-
address mismatch (HD-index recovery exhausted) (no retry):
VerificationError: Recipient address mismatch` at §C → §D. The
existing unit tests cover helpers in isolation
(PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the
error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts
#269), but neither proves the composition: "given a sibling HD index
in the recipient's inventory, tryRecover finds it AND the SDK's
verifyRecipient agrees on the derived address". This left CI blind to
the regression mode where tryRecover returns a false-positive
candidate that Token.update later rejects.

This commit closes the gap at three layers:

  L1. Real-SDK integration test
      (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts):
      - Replaces the file-level vi.mock for SigningService /
        UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with
        REAL SDK imports.
      - Builds two real HD-derived signing services.
      - Constructs the sender-targeted DIRECT address via real
        UnmaskedPredicate.create → reference.toAddress (the EXACT path
        the SDK's verifyRecipient uses).
      - Drives `tryRecoverSigningServiceForRecipient` against real
        TokenId / TokenType / Uint8Array salt.
      - 4 tests: happy path, no-match negative, deriveAddressInfo throws,
        and (L2 composition) tryRecover + predicate construction
        round-trip equals the transferTx target address.

  L3. CI workflow (.github/workflows/soak-nightly.yml):
      - Nightly cron + workflow_dispatch trigger.
      - Probes testnet aggregator + Nostr relay; skip-not-fail when
        either is unreachable (third-party outages cannot block
        releases).
      - Checks out sphere-cli alongside sphere-sdk, links the local
        build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*.
      - Captures soak.log + workspace as artifacts (30-day retention
        on failure, 7-day on success).
      - Surfaces metrics in the workflow summary: V6-RECOVER count,
        Stranded receive count, POINTER_MONOTONICITY_VIOLATION count,
        bcast_pub>0 count, exit code.

What's tested by what (audit roll-up):
  - SDK address-derivation regression in tryRecover → L1 (real SDK).
  - tryRecover ↔ predicate-construction composition → L1 composition
    test (proves verifyRecipient cannot throw against the recovered
    state).
  - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak.
  - Helper-level edge cases (mock-friendly) → existing
    PaymentsModule.recipient-address-mismatch-recovery.test.ts.
  - Error-classification on hard fail → existing #269 test in
    PaymentsModule.proof-polling-persistence.test.ts.

Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean.

Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack).
Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…permanent

modules/payments/transfer/finalization-worker-sender.ts transitioned
the outbox entry to `failed-permanent` on any hard-fail and never
touched the source tokens that the instant-sender had marked
`transferring`/`pending` at submit time. With orphan auto-recovery
default-OFF, the spender-side balance was permanently locked as
unspendable.

Fix:

  - Added optional `sourceTokenIds?: ReadonlyArray<string>` to
    `UxfTransferOutboxEntry`. The instant-sender now records the
    selected source set at outbox-record construction time via
    `OutboxBuildArgs.sourceTokenIds`. Optional with `[]` semantics
    on `undefined` for back-compat with entries written before H5.

  - Added optional `recoverFailedPermanentSources?(sources, outboxId)`
    hook to `FinalizationWorkerSenderOptions`. When wired, the worker
    invokes it once at the `failed-permanent` transition with the
    entry's recorded source tokenIds. Production wiring flips each
    source back to a spendable state (typically `confirmed`) gated
    on a same-source-no-other-live-outbox check; the audit's gap
    was that the worker provided no signal at all.

  - The hook is best-effort: throws are caught and emitted as a
    `transfer:failed` event for triage, but the `failed-permanent`
    transition itself stands. This preserves the terminal-state
    contract — a recovery hook failure must not re-introduce the
    locked-pending bug it exists to fix.

  - Optional shape preserves back-compat with the 90+ existing
    finalization-worker-sender tests (which do not set the hook).

Tests: 5 new H5 regression tests covering:
  - Hook fires with the recorded source ids on failed-permanent
  - Pre-H5 entries (no sourceTokenIds field) get `[]` (not skipped)
  - Hook absent: worker still transitions cleanly
  - Hook throws: failed-permanent stands, transfer:failed emitted
  - Hook fires AFTER the outbox transition (ordering invariant)

The "hook does NOT fire on success/in-progress" guarantee is
structurally enforced by code placement (inside the
`if (totalFailure > 0)` branch); a behavioural test would require
duplicating the full §6.1 fake-adapter wiring for a property the
code-review already shows.

All 1498 transfer tests pass. 8175 total unit tests pass. tsc clean.

Refs: #333 (H5). Stacked on #353 (H4) → #352 (H3) → #351 (H2) → #349 (H1)
→ #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…nifestCas

ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.

Fix:

  - New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
    constructor's `opts.verifyEntryRoot`. When wired, the hook is
    invoked AFTER the label CAS passes and BEFORE the write. The hook
    fetches the content the entry's `rootHash` references (production:
    the UXF pool element), recomputes the hash, and asserts it matches
    the declaration.

  - New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
    with structured `integrityDetail` for triage. The result also
    carries `observed.contentHash` so callers can re-read state.

  - `ManifestCidRewriteCasError.casReason` widened to surface
    `'integrity-failed'`. The worker's outer retry loop already
    treats any throw from manifest-cid-rewrite as a hard CAS failure
    that bubbles up; the new reason rides through unchanged but
    preserves structure for operator triage.

  - Skipped on the `prev === null` (asserts-no-entry) path because
    there is no observed entry to verify. Skipped on label-CAS-mismatch
    because the swap has already been rejected.

  - Optional shape preserves back-compat: all 9 existing manifest-cas
    tests and the 100% of production callers that do not (yet) supply
    the verifier continue to behave identically. Production wiring is
    a follow-up that does not block this PR.

Tests: 6 new H7 regression tests covering:
  - Back-compat (verifier omitted → label-only CAS as before)
  - Happy path (verifier called AFTER label CAS, BEFORE write)
  - Integrity failure (returns 'integrity-failed', no write happens,
    storage unchanged, integrityDetail propagated)
  - Skip on prev=null (no observed entry to verify)
  - Skip on label-CAS-mismatch (short-circuit before verifier)
  - 'integrity-failed' has its own discriminator distinct from
    cas-mismatch / not-found / concurrent-modification

All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.

Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…ER test-coverage gap

The soak (manual-test-full-recovery.sh) routinely surfaces
`[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient-
address mismatch (HD-index recovery exhausted) (no retry):
VerificationError: Recipient address mismatch` at §C → §D. The
existing unit tests cover helpers in isolation
(PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the
error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts
#269), but neither proves the composition: "given a sibling HD index
in the recipient's inventory, tryRecover finds it AND the SDK's
verifyRecipient agrees on the derived address". This left CI blind to
the regression mode where tryRecover returns a false-positive
candidate that Token.update later rejects.

This commit closes the gap at three layers:

  L1. Real-SDK integration test
      (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts):
      - Replaces the file-level vi.mock for SigningService /
        UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with
        REAL SDK imports.
      - Builds two real HD-derived signing services.
      - Constructs the sender-targeted DIRECT address via real
        UnmaskedPredicate.create → reference.toAddress (the EXACT path
        the SDK's verifyRecipient uses).
      - Drives `tryRecoverSigningServiceForRecipient` against real
        TokenId / TokenType / Uint8Array salt.
      - 4 tests: happy path, no-match negative, deriveAddressInfo throws,
        and (L2 composition) tryRecover + predicate construction
        round-trip equals the transferTx target address.

  L3. CI workflow (.github/workflows/soak-nightly.yml):
      - Nightly cron + workflow_dispatch trigger.
      - Probes testnet aggregator + Nostr relay; skip-not-fail when
        either is unreachable (third-party outages cannot block
        releases).
      - Checks out sphere-cli alongside sphere-sdk, links the local
        build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*.
      - Captures soak.log + workspace as artifacts (30-day retention
        on failure, 7-day on success).
      - Surfaces metrics in the workflow summary: V6-RECOVER count,
        Stranded receive count, POINTER_MONOTONICITY_VIOLATION count,
        bcast_pub>0 count, exit code.

What's tested by what (audit roll-up):
  - SDK address-derivation regression in tryRecover → L1 (real SDK).
  - tryRecover ↔ predicate-construction composition → L1 composition
    test (proves verifyRecipient cannot throw against the recovered
    state).
  - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak.
  - Helper-level edge cases (mock-friendly) → existing
    PaymentsModule.recipient-address-mismatch-recovery.test.ts.
  - Error-classification on hard fail → existing #269 test in
    PaymentsModule.proof-polling-persistence.test.ts.

Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean.

Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack).
Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…nifestCas

ManifestCas.update's precondition check compared `observed.rootHash`
(a string read from the entry) to `prev.contentHash` (the caller's
expected value). Both sides are arbitrary writer-supplied labels — a
CAS pass meant "the labels agree", not "the content under the label
is the content the caller meant". An attacker writing an entry with
a forged `rootHash` field that doesn't match the actual pool content
slipped through unchanged.

Fix:

  - New optional `VerifyEntryRootFn` hook accepted via the ManifestCas
    constructor's `opts.verifyEntryRoot`. When wired, the hook is
    invoked AFTER the label CAS passes and BEFORE the write. The hook
    fetches the content the entry's `rootHash` references (production:
    the UXF pool element), recomputes the hash, and asserts it matches
    the declaration.

  - New `'integrity-failed'` discriminator on `ManifestCasResult.reason`
    with structured `integrityDetail` for triage. The result also
    carries `observed.contentHash` so callers can re-read state.

  - `ManifestCidRewriteCasError.casReason` widened to surface
    `'integrity-failed'`. The worker's outer retry loop already
    treats any throw from manifest-cid-rewrite as a hard CAS failure
    that bubbles up; the new reason rides through unchanged but
    preserves structure for operator triage.

  - Skipped on the `prev === null` (asserts-no-entry) path because
    there is no observed entry to verify. Skipped on label-CAS-mismatch
    because the swap has already been rejected.

  - Optional shape preserves back-compat: all 9 existing manifest-cas
    tests and the 100% of production callers that do not (yet) supply
    the verifier continue to behave identically. Production wiring is
    a follow-up that does not block this PR.

Tests: 6 new H7 regression tests covering:
  - Back-compat (verifier omitted → label-only CAS as before)
  - Happy path (verifier called AFTER label CAS, BEFORE write)
  - Integrity failure (returns 'integrity-failed', no write happens,
    storage unchanged, integrityDetail propagated)
  - Skip on prev=null (no observed entry to verify)
  - Skip on label-CAS-mismatch (short-circuit before verifier)
  - 'integrity-failed' has its own discriminator distinct from
    cas-mismatch / not-found / concurrent-modification

All 1512 transfer tests pass. 8181 total unit tests pass. tsc clean.

Refs: #333 (H7). Stacked on #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…ER test-coverage gap

The soak (manual-test-full-recovery.sh) routinely surfaces
`[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient-
address mismatch (HD-index recovery exhausted) (no retry):
VerificationError: Recipient address mismatch` at §C → §D. The
existing unit tests cover helpers in isolation
(PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the
error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts
#269), but neither proves the composition: "given a sibling HD index
in the recipient's inventory, tryRecover finds it AND the SDK's
verifyRecipient agrees on the derived address". This left CI blind to
the regression mode where tryRecover returns a false-positive
candidate that Token.update later rejects.

This commit closes the gap at three layers:

  L1. Real-SDK integration test
      (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts):
      - Replaces the file-level vi.mock for SigningService /
        UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with
        REAL SDK imports.
      - Builds two real HD-derived signing services.
      - Constructs the sender-targeted DIRECT address via real
        UnmaskedPredicate.create → reference.toAddress (the EXACT path
        the SDK's verifyRecipient uses).
      - Drives `tryRecoverSigningServiceForRecipient` against real
        TokenId / TokenType / Uint8Array salt.
      - 4 tests: happy path, no-match negative, deriveAddressInfo throws,
        and (L2 composition) tryRecover + predicate construction
        round-trip equals the transferTx target address.

  L3. CI workflow (.github/workflows/soak-nightly.yml):
      - Nightly cron + workflow_dispatch trigger.
      - Probes testnet aggregator + Nostr relay; skip-not-fail when
        either is unreachable (third-party outages cannot block
        releases).
      - Checks out sphere-cli alongside sphere-sdk, links the local
        build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*.
      - Captures soak.log + workspace as artifacts (30-day retention
        on failure, 7-day on success).
      - Surfaces metrics in the workflow summary: V6-RECOVER count,
        Stranded receive count, POINTER_MONOTONICITY_VIOLATION count,
        bcast_pub>0 count, exit code.

What's tested by what (audit roll-up):
  - SDK address-derivation regression in tryRecover → L1 (real SDK).
  - tryRecover ↔ predicate-construction composition → L1 composition
    test (proves verifyRecipient cannot throw against the recovered
    state).
  - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak.
  - Helper-level edge cases (mock-friendly) → existing
    PaymentsModule.recipient-address-mismatch-recovery.test.ts.
  - Error-classification on hard fail → existing #269 test in
    PaymentsModule.proof-polling-persistence.test.ts.

Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean.

Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack).
Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…ER test-coverage gap

The soak (manual-test-full-recovery.sh) routinely surfaces
`[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient-
address mismatch (HD-index recovery exhausted) (no retry):
VerificationError: Recipient address mismatch` at §C → §D. The
existing unit tests cover helpers in isolation
(PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the
error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts
#269), but neither proves the composition: "given a sibling HD index
in the recipient's inventory, tryRecover finds it AND the SDK's
verifyRecipient agrees on the derived address". This left CI blind to
the regression mode where tryRecover returns a false-positive
candidate that Token.update later rejects.

This commit closes the gap at three layers:

  L1. Real-SDK integration test
      (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts):
      - Replaces the file-level vi.mock for SigningService /
        UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with
        REAL SDK imports.
      - Builds two real HD-derived signing services.
      - Constructs the sender-targeted DIRECT address via real
        UnmaskedPredicate.create → reference.toAddress (the EXACT path
        the SDK's verifyRecipient uses).
      - Drives `tryRecoverSigningServiceForRecipient` against real
        TokenId / TokenType / Uint8Array salt.
      - 4 tests: happy path, no-match negative, deriveAddressInfo throws,
        and (L2 composition) tryRecover + predicate construction
        round-trip equals the transferTx target address.

  L3. CI workflow (.github/workflows/soak-nightly.yml):
      - Nightly cron + workflow_dispatch trigger.
      - Probes testnet aggregator + Nostr relay; skip-not-fail when
        either is unreachable (third-party outages cannot block
        releases).
      - Checks out sphere-cli alongside sphere-sdk, links the local
        build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*.
      - Captures soak.log + workspace as artifacts (30-day retention
        on failure, 7-day on success).
      - Surfaces metrics in the workflow summary: V6-RECOVER count,
        Stranded receive count, POINTER_MONOTONICITY_VIOLATION count,
        bcast_pub>0 count, exit code.

What's tested by what (audit roll-up):
  - SDK address-derivation regression in tryRecover → L1 (real SDK).
  - tryRecover ↔ predicate-construction composition → L1 composition
    test (proves verifyRecipient cannot throw against the recovered
    state).
  - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak.
  - Helper-level edge cases (mock-friendly) → existing
    PaymentsModule.recipient-address-mismatch-recovery.test.ts.
  - Error-classification on hard fail → existing #269 test in
    PaymentsModule.proof-polling-persistence.test.ts.

Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean.

Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack).
Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
vrogojin added a commit that referenced this pull request May 30, 2026
…ER test-coverage gap

The soak (manual-test-full-recovery.sh) routinely surfaces
`[ERROR] [V6-RECOVER] Stranded receive <hex> hit permanent recipient-
address mismatch (HD-index recovery exhausted) (no retry):
VerificationError: Recipient address mismatch` at §C → §D. The
existing unit tests cover helpers in isolation
(PaymentsModule.recipient-address-mismatch-recovery.test.ts) and the
error CLASSIFIER (PaymentsModule.proof-polling-persistence.test.ts
#269), but neither proves the composition: "given a sibling HD index
in the recipient's inventory, tryRecover finds it AND the SDK's
verifyRecipient agrees on the derived address". This left CI blind to
the regression mode where tryRecover returns a false-positive
candidate that Token.update later rejects.

This commit closes the gap at three layers:

  L1. Real-SDK integration test
      (tests/integration/payments/v6-recover-real-sdk-recovery.test.ts):
      - Replaces the file-level vi.mock for SigningService /
        UnmaskedPredicate / TokenId / TokenType / HashAlgorithm with
        REAL SDK imports.
      - Builds two real HD-derived signing services.
      - Constructs the sender-targeted DIRECT address via real
        UnmaskedPredicate.create → reference.toAddress (the EXACT path
        the SDK's verifyRecipient uses).
      - Drives `tryRecoverSigningServiceForRecipient` against real
        TokenId / TokenType / Uint8Array salt.
      - 4 tests: happy path, no-match negative, deriveAddressInfo throws,
        and (L2 composition) tryRecover + predicate construction
        round-trip equals the transferTx target address.

  L3. CI workflow (.github/workflows/soak-nightly.yml):
      - Nightly cron + workflow_dispatch trigger.
      - Probes testnet aggregator + Nostr relay; skip-not-fail when
        either is unreachable (third-party outages cannot block
        releases).
      - Checks out sphere-cli alongside sphere-sdk, links the local
        build, runs manual-test-full-recovery.sh with SPHERE_DEBUG=*.
      - Captures soak.log + workspace as artifacts (30-day retention
        on failure, 7-day on success).
      - Surfaces metrics in the workflow summary: V6-RECOVER count,
        Stranded receive count, POINTER_MONOTONICITY_VIOLATION count,
        bcast_pub>0 count, exit code.

What's tested by what (audit roll-up):
  - SDK address-derivation regression in tryRecover → L1 (real SDK).
  - tryRecover ↔ predicate-construction composition → L1 composition
    test (proves verifyRecipient cannot throw against the recovered
    state).
  - Cross-process / Nostr / IPFS / multi-daemon regressions → L3 soak.
  - Helper-level edge cases (mock-friendly) → existing
    PaymentsModule.recipient-address-mismatch-recovery.test.ts.
  - Error-classification on hard fail → existing #269 test in
    PaymentsModule.proof-polling-persistence.test.ts.

Tests pass: 4/4 new integration tests + all 8185 unit tests. tsc clean.

Refs: #333 (V6-RECOVER test-coverage gap, follow-up to H1-H7 stack).
Stacked on #355 (H7) → #354 (H5) → #353 (H4) → #352 (H3) → #351 (H2)
→ #349 (H1) → #348 (C3) → #347 (C2) → #346 (C1).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant