Skip to content

fix(payments): V6-RECOVER #387/#389/#390 — durable invalid + root-cause source-state reconstruction#388

Merged
vrogojin merged 5 commits into
mainfrom
fix/issue-387-v6-recover-invalid-persistence
Jun 4, 2026
Merged

fix(payments): V6-RECOVER #387/#389/#390 — durable invalid + root-cause source-state reconstruction#388
vrogojin merged 5 commits into
mainfrom
fix/issue-387-v6-recover-invalid-persistence

Conversation

@vrogojin
Copy link
Copy Markdown
Contributor

@vrogojin vrogojin commented Jun 3, 2026

Closes #387, closes #389, closes #390.

This PR grew from a single-issue fix for #387 into a 3-issue V6-RECOVER hardening bundle. The original #387 work made the permanent-invalid verdict survive restart; the #389 follow-up tightened ledger discipline; and #390 finally fixed the root cause — the malformed source-token reconstruction that was producing the spurious "Recipient address mismatch" verdicts in the first place. Bundling keeps the chain of fixes deployable as a unit (each builds on the previous).

Commits

SHA Issue Summary
a9e6242 #387 Durable V6-RECOVER 'invalid' status across reload; balance excludes ledger; soak gate.
bc28737 #389 PR #388 review follow-up — V6-RECOVER ledger discipline + decimal-aware soak gate.
e47da61 #390 Manual repro (manual-test-simple-send.sh) — deterministically reproduces the root-cause failure in ~30 s.
2f9a690 #390 Root-cause fix — recoverStrandedReceivedTokens rebuilds source-token state from lastTxJson.data.sourceState (was: kept top-level recipient-predicate state).
b8b526d release Version bump to 0.0.a1 for downstream testing.

#387 — Durable V6-RECOVER invalid status (original PR scope)

Summary

  • v6RecoverPermanent ledger is now authoritative for balance + recovery (was: silently lost on TXF round-trip).
  • aggregateTokens, addToken, loadFromStorageData, restoreV6RecoverPermanent, and the V6-RECOVER stamping site all consult / re-apply the ledger via a canonical-id-first lookup helper.
  • 16 unit tests + 4 integration tests, both files fail without the fix.
  • Soak script gains assert_no_unconfirmed_after_finalize gate + confirmed-only balance diff. Testnet soak: ALL GREEN, 779 s.

Why the bug existed

txfToToken → determineTokenStatus re-derives Token.status from the TXF transactions array only — 'invalid' is never encoded. The in-memory 'invalid' write that finalizeStrandedReceivedToken makes after a V6-RECOVER permanent-fail was silently downgraded back to 'pending' on the next loadFromStorageData. aggregateTokens then included the unspendable token in unconfirmed UCT.

The v6RecoverPermanent ledger added in #378 was the right defense-in-depth, but it was only consulted by recoverStrandedReceivedTokens and the drain predicate — never by balance aggregation or the Nostr at-least-once re-ingest path. So the verdict survived restart on disk but the polluted balance kept resurfacing.

What the fix does

  1. New helper applyV6RecoverPermanentInvalidStatus() patches in-memory tokens to 'invalid' based on the ledger.
  2. Called at end of loadFromStorageData — survives initial load AND every sync() reload.
  3. Called at end of restoreV6RecoverPermanent — handles cold-start ordering (ledger arrives after tokens).
  4. aggregateTokens belt-and-braces filter — balance NEVER includes a ledgered token.
  5. addToken ledger-aware ingest — Nostr at-least-once replay persists incoming token as 'invalid' (still returns true so cursor advances; rejecting would cause perpetual replay).
  6. Canonical-id-first lookup — new isV6RecoverPermanentToken(token, mapKey?) extracts canonical genesis tokenId from sdkData with map-key fallback (handles UUID-keyed addToken entries).
  7. finalizeStrandedReceivedToken keys the ledger by canonical genesis tokenId (was: map key — ambiguous post-replay) and awaits the save (was: fire-and-forget).

#389 — PR #388 review follow-up: ledger discipline + decimal-aware soak gate

Bundled hardening from the review of the #387 commit:

  • Tighter ledger guards on the V6-RECOVER stamping site (no fire-and-forget on the persist).
  • Decimal-aware unconfirmed-residue gate in manual-test-full-recovery.sh so any non-zero fractional UCT residue post-finalize fails the soak — previously the gate's integer-only regex passed 0.0000001 silently.

#390 — Root cause: source-token state-reconstruction bug

Symptom

sphere payments send <amount> UCT @<recipient> on two freshly-created testnet wallets produced a token whose recipient predicate appeared not to bind to the recipient's HD signers. Bob's payments receive --finalize hit VerificationError: Recipient address mismatch and #387 stamped the durable permanent-invalid verdict. End-to-end: alice -10 UCT confirmed, bob +0 UCT — funds effectively lost. The PR #388 soak (manual-test-full-recovery.sh) passed end-to-end because §D exercises the invoice-payment path; the direct-send path (sphere payments send) was broken on every fresh send.

Root cause (NOT what the issue body hypothesised)

The send path was correct — alice resolved @bob to bob's published directAddress and minted a transfer commitment targeting it. The bug was in bob's V6-RECOVER recovery path, specifically in recoverStrandedReceivedTokens (~line 9626 pre-fix):

const sourceTokenJsonObj = {
  ...parsed,
  transactions: txs.slice(0, -1),     // strip the last unfinalized tx
};

This rebuilt the source-at-state-N-1 by spreading the persisted parsed JSON and stripping the last transaction. But it kept the top-level state field unchanged. For instant-mode receives, the ingestion path (saveCommitmentOnlyToken and the UXF receive at line 2807) writes state.predicate = recipient_predicate so the on-disk shape advertises bob as owner once finalize completes. The source token used by Token.update → transaction.verify → verifyRecipient therefore exposed state.predicate.publicKey = bob while genesis.data.recipient = alice's directAddress — the SDK unconditionally returned 'Recipient address mismatch'.

#387/#388/#389 only stamped the durable verdict — they did NOT fix this construction. #390 is the actual root cause.

Fix

Replace state with lastTxJson.data.sourceState (= alice's mint state, the sender's predicate at state N-1):

const sourceTokenJsonObj = {
  ...parsed,
  state: (lastTxJson.data as { sourceState?: unknown }).sourceState,  // ← #390
  transactions: txs.slice(0, -1),
};

Mirrors the correct pattern already in tryLocalFinalizeUnconfirmed (~line 10249) and the UXF assembly path (~line 16284).

Regression coverage

  • Unit: PaymentsModule.proof-polling-persistence.test.ts"recoverStrandedReceivedTokens rewrites sourceTokenJson.state to the transfer sourceState (fresh alice→bob direct send: recipient predicate doesn't bind to recipient HD signers (root cause behind #387/#388/#389) #390)" — pins the invariant by inserting distinct markers in the top-level state vs the last-tx sourceState and asserting the registered proof-polling job's sourceTokenJson carries the SENDER's marker. Fails without the fix.
  • E2E: manual-test-simple-send.sh (committed in e47da61, ~30 s against testnet). Pre-fix reliably ASSERT FAIL (bob-confirmed-rise-10-UCT); post-fix all 4 ASSERT OK.

Test plan

Acceptance criteria (#390)

  • A fresh alice → fresh bob sphere payments send 10 UCT @bob followed by sphere payments receive --finalize on bob's side leaves bob with +10 UCT CONFIRMED and 0 unconfirmed, NO V6-RECOVER permanent-verdict fires.
  • Alice's CONFIRMED balance drops by exactly 10 UCT (smallest-unit integer comparison — no float coercion).
  • Manual reproduction (manual-test-simple-send.sh) gates an end-to-end clean run.
  • Regression unit test pinning the source-state reconstruction invariant.

…des ledger; soak gate

V6-RECOVER permanent-mismatch verdicts were silently downgraded back to
'pending' on every load: `determineTokenStatus` (in txf-serializer) only
encodes `pending`/`confirmed` from the TXF transactions array, so the
application-level `'invalid'` write was lost on the next reload.
`aggregateTokens` then surfaced the unspendable token as unconfirmed
UCT, polluting balance across repeated `payments receive --finalize`
calls. The persistent `v6RecoverPermanent` ledger added in #378 was
correct on disk but never consulted at the balance / re-ingest surfaces.

This commit makes the ledger authoritative without changing the TXF
format:

  * New helper `applyV6RecoverPermanentInvalidStatus()` walks
    `this.tokens` and patches matching entries to `status='invalid'`.
  * Invoked at the end of every `loadFromStorageData` (initial load +
    every `sync()` round-trip) and at the end of `restoreV6RecoverPermanent`
    (handles cold-start ordering where the ledger hydrates AFTER the
    token map).
  * `aggregateTokens` adds a belt-and-braces ledger filter so balance
    NEVER includes a ledgered token, even if a future caller mutates
    status independently.
  * `addToken` honors the ledger on Nostr at-least-once replay —
    incoming tokens whose canonical id matches the verdict are
    persisted as `'invalid'`. We accept the write (return true) so
    the Nostr cursor advances; rejecting would cause perpetual replay.
  * Ledger lookups are canonical-id-first via the new
    `isV6RecoverPermanentToken(token, mapKey?)` helper — resolves via
    `extractTokenIdFromSdkData(sdkData)` with map-key fallback so
    UUID-keyed `addToken` entries still classify correctly.
  * `finalizeStrandedReceivedToken` keys the ledger by canonical
    genesis tokenId (was: map key, ambiguous post-replay) and awaits
    `saveV6RecoverPermanent()` so a process exit immediately after
    the verdict cannot lose the entry.

Tests:
  * tests/unit/modules/PaymentsModule.v6-recover-invalid-status-387.test.ts
    — 16 unit tests covering helper mechanics, getBalance filter,
      restoreV6RecoverPermanent re-application, loadFromStorageData
      round-trip, addToken honoring ledger, and all 4 acceptance
      criteria from #387.
  * tests/integration/payments/v6-recover-invalid-status-387.test.ts
    — 4 e2e tests across the full public `module.load()` lifecycle
      with simulated CLI restarts.
  * Both files fail without the fix (verified via `git stash`) and
    pass with the fix.

Soak hardening (`manual-test-full-recovery.sh`):
  * New `assert_no_unconfirmed_after_finalize` gate invoked at 5
    post-`receive --finalize` sites (§A.1, §A.2, §C.2, §D.4 alice,
    §D.4 bob) — would have caught #387 on the first run.
  * `normalize_snapshot` strips `(+ N unconfirmed)`, `[X+Y tokens]`,
    and `(N tokens)` clauses so existing `assert_diff_empty` calls
    compare CONFIRMED balances only. Unconfirmed pollution is caught
    by the dedicated gate, not masked equally on both sides of a diff.
  * Self-tests T8 + T9 added; all 9 normalize self-tests pass.

Soak run on testnet (779s, ALL GREEN):
  * All 14 ASSERT OK including 5 new #387 gates.
  * Zero address-mismatch / VerificationError / stranded receive
    diagnostics — the fix held end-to-end across `sphere clear` +
    full recovery, the exact scenario from the issue.

Closes #387.
…cipline + decimal soak gate

Addresses all 15 findings from issue #389, the code-review follow-up
to PR #388 (#387 fix). The PR #388 soak was ALL GREEN but the review
identified bypass paths the soak did not exercise and an
instrumentation gap that would silently let future pollution slip past
the new gate (gate used `[1-9][0-9]*` against real CLI output that
emits decimal amounts like `(+ 0.0000001 unconfirmed)`).

Critical (PR #388 gap closure):

  * #1 — `finalizeReceivedToken` now consults `isV6RecoverPermanentToken`
    BEFORE the status write. Without this, a restored proof-polling job
    on cold start (or any stale in-memory job) could overwrite a
    ledgered `'invalid'` with `'confirmed'` on the next proof arrival,
    re-introducing the exact regression #387 closed. The guard also
    cleans up the polling job so subsequent ticks don't re-enter.

  * #2 — `assert_no_unconfirmed_after_finalize` widened to match
    decimal amounts. The CRITICAL constraint (per the integer-only-
    internals review note): the gate MUST NOT coerce the matched amount
    through `awk +0` (IEEE-754 double — silently lossy above 2^53). The
    new check is pure string/character-class matching: pattern
    `\(\+ [0-9.]*[1-9][0-9.]* unconfirmed\)` requires at least one
    non-zero digit anywhere in the amount run, valid at arbitrary
    precision. Mirrors the SDK's bigint-only aggregation in
    `aggregateTokens`.

  * #3 — `normalize_snapshot` sed pattern widened `[0-9]+` → `[0-9.]+`
    so diff-based gates no longer false-pass/fail on decimal-amount
    snapshots.

High:

  * #4 — `addToken` patches the caller's reference IN PLACE
    (`token.status = 'invalid'`) instead of rebinding a local via
    spread-into-new-object. Callers commonly read the incoming token
    after addToken returns to populate event payloads (the
    `emitEvent('transfer:incoming', { tokens: [incoming] })` pattern);
    the spread version left the caller seeing the pre-patch status.

  * #5 — `updateToken` mirrors addToken's ledger consult. A
    finalization worker or future internal caller could otherwise hand
    `updateToken` a `'confirmed'` status for a ledgered token and
    silently overwrite the durable `'invalid'` verdict.

  * #6 — `load()` reorders `restoreV6RecoverPermanent` BEFORE
    `restoreProofPollingJobs`. Re-registered polling jobs for ledgered
    tokens now skip registration entirely; the previous ordering left
    a cold-start race window in which a polling tick firing between the
    two calls would call `finalizeReceivedToken` on a token whose
    ledger entry had not yet hydrated.

Medium:

  * #7 — `validate()` short-circuits ledgered tokens straight to the
    `invalid` array. Eliminates wasted aggregator round-trips and
    prevents the misleading-`valid=true`-on-permanently-rejected-token
    return shape.

  * #8 — `scheduleResolveUnconfirmed` predicate adds the ledger check
    so the periodic retry timer isn't armed during the cold-start
    window for tokens that will be patched once `applyV6RecoverPermanent-
    InvalidStatus` runs. Restores symmetry with `hasUnconfirmedOrInflight`.

  * #9 — `getTokens()` returns a patched view for ledgered tokens
    even before `applyV6RecoverPermanentInvalidStatus` has run on the
    in-memory map. Closes the cold-start window for AccountingModule /
    SwapModule / direct API consumers. Hot path (empty ledger)
    early-exits before the array walk.

Low / cleanup:

  * #10 — `applyV6RecoverPermanentInvalidStatus` adds `'transferring'`
    to the early-exit set. A `'transferring'` token represents an
    in-flight outbound send; flipping it would abort the send mid-way.

  * #11 — `saveV6RecoverPermanent` failure escalates to `logger.error`
    and schedules an exponential-backoff retry (2/4/8/16/32 s, capped
    at 5 attempts; cleared on destroy + address switch). Deliberately
    does NOT emit `transfer:operator-alert` — its `code` field is a
    strict 14-value `DispositionReason` union (snapshot-tested,
    requires ADR for additions) which represents transfer-disposition
    outcomes, not storage failures.

  * #12 — Strengthens the "Non-ledgered token continues to surface"
    integration test with explicit `confirmedAmount === '7'` +
    `confirmedTokenCount === 1` assertions. Also fixes the test
    fixture: pre-#389 the healthy token's masked predicate didn't
    bind to the wallet and was archived by `loadFromStorageData`'s
    balance-model invariant, so the original test couldn't have caught
    an over-broad ledger filter regression. Switched to unmasked
    predicate using the wallet's chainPubkey.

  * #13 — `applyV6RecoverPermanentInvalidStatus` drops the
    `token.updatedAt = Date.now()` bump. The patch runs on every TXF
    reload (because `determineTokenStatus` re-derives status from
    transactions/inclusionProofs only); bumping `updatedAt` polluted
    the "last meaningful change" signal that downstream observers
    (AccountingModule, history projection, UI sort) read.

  * #14 — Diagnostic grep on gate failure uses the same widened
    pattern as the gate itself so operators see exactly the lines that
    tripped, not unrelated 'unconfirmed' log lines.

  * #15 — New integration test that loads a #378-shaped ledger
    payload (canonical-id-keyed) through the #388 + #389 stack and
    asserts status, balance, recovery, AND `getTokens()` all agree.
    Confirms zero migration risk for any pre-existing #378 ledger
    payloads.

Tests:

  * tests/unit/modules/PaymentsModule.v6-recover-invalid-status-387.test.ts
    grows by 7 #389 regression tests (now 23 total). Each pins one of
    the new guards: finalizeReceivedToken ledger short-circuit, addToken
    in-place mutation, updateToken ledger coercion,
    scheduleResolveUnconfirmed predicate, getTokens patched view,
    applyV6RecoverPermanentInvalidStatus skipping 'transferring',
    applyV6RecoverPermanentInvalidStatus NOT bumping `updatedAt`.

  * tests/integration/payments/v6-recover-invalid-status-387.test.ts
    grows from 4 to 5 tests — strengthened positive case (#12) +
    #378 backward-compat (#15).

  * Soak self-tests (`RUN_NORMALIZE_TESTS=1`) gain T10 (decimal-
    amount detection: clean / decimal-pollution / zero-decimal /
    zero-integer) and T11 (normalize_snapshot strips decimal
    `(+ N.M unconfirmed)` clauses). All 11 pass.

Verification:

  * `npx tsc --noEmit` — clean.
  * `npx vitest run tests/unit` — 8286 pass / 2 skipped (pre-existing).
  * `npx vitest run tests/integration/payments/` — 31 pass.
  * `npx eslint` on changed files — no new errors.
  * `RUN_NORMALIZE_TESTS=1 bash manual-test-full-recovery.sh` — 11 pass.

Closes #389.
vrogojin added 3 commits June 3, 2026 16:17
…oot cause

Adds `manual-test-simple-send.sh`, a self-contained reproduction of the
fresh-wallet direct-send failure tracked in issue #390. Two freshly-
created testnet wallets, alice faucets 100 UCT, alice runs
`sphere payments send @bob 10 UCT`, bob runs `payments receive
--finalize`. Alice's CONFIRMED balance correctly drops by 10 UCT but
bob's stays at 0 because the recipient predicate in the minted token
fails to bind to bob's HD signers — V6-RECOVER classifies it as
permanent and stamps the durable invalid ledger.

The script's verification gate is INTEGER-ONLY at arbitrary precision:

  * extracts UCT amount from `sphere balance` decimal output via string
    pad-and-concat (split on `.`, left-pad fractional part to UCT's 8
    decimals, concatenate, strip leading zeros). No `awk +0`, no `bc`,
    no float coercion — matches the SDK's bigint aggregation rule.
  * reuses the decimal-aware unconfirmed-residue pattern from #389 #2:
    `\(\+ [0-9.]*[1-9][0-9.]* unconfirmed\)` — pure
    string/character-class check.
  * compares smallest-unit integer deltas with shell arithmetic; for
    10 UCT (1_000_000_000) this comfortably fits int64, but the
    extraction step itself is precision-safe to any string-length
    amount.

Sections of the repro:

  1. Create alice wallet
  2. Create bob wallet
  3. Top up alice (faucet + sync + finalize)
  4. Baseline bob (expected zero UCT)
  5. alice → @bob (10 UCT, instant)
  6. bob payments receive --finalize
  7. Snapshot final balances
  8. Verify exact ±10 UCT CONFIRMED delta + no unconfirmed residue

The PR #388 soak (`manual-test-full-recovery.sh`) passes because it
exercises the invoice-payment path (which bypasses nametag → predicate
resolution), so the regression suite did not catch this. The direct-
send path requires investigation per #390's "diff success path against
failure path" strategy.

Tracks #390. Symptom-handling work for V6-RECOVER permanent verdicts
is on commit a9e6242 (#387 / PR #388) + bc28737 (#389 review follow-up).
…tate

`recoverStrandedReceivedTokens` rebuilt the source-at-state-N-1 by
spreading the persisted `parsed` JSON and stripping the last
transaction, but kept the top-level `state` field unchanged. For
instant-mode receives, that field carries the RECIPIENT's predicate
(written by `saveCommitmentOnlyToken` so the on-disk shape advertises
the recipient as owner once finalize completes). The source token used
by `Token.update` → `transaction.verify` → `verifyRecipient` therefore
exposed `state.predicate.publicKey = recipient` while
`genesis.data.recipient = sender's directAddress` — the SDK
unconditionally returns `'Recipient address mismatch'`. V6-RECOVER
then stamped the durable permanent-invalid verdict and the recipient
silently lost the value on every fresh alice→bob `sphere payments
send`.

Replace `state` with `lastTxJson.data.sourceState` (the sender's mint
state) so verify on the source-at-state-N-1 sees a consistent
(state.predicate ⇒ genesis.recipient) shape. Mirrors the correct
pattern already in `tryLocalFinalizeUnconfirmed` (~line 10249) and the
UXF assembly path (~line 16284). #387/#388/#389 only stamped the
durable verdict — they did NOT fix this construction; this is the
root cause filed as #390.

Regression coverage:
- Unit: `PaymentsModule.proof-polling-persistence.test.ts` —
  "recoverStrandedReceivedTokens rewrites sourceTokenJson.state to
  the transfer sourceState (#390)" pins the invariant by inserting
  distinct markers in the top-level state vs the last-tx sourceState
  and asserting the registered proof-polling job's sourceTokenJson
  carries the SENDER's marker.
- E2E: `manual-test-simple-send.sh` (committed in e47da61) now passes
  ASSERT OK (bob-confirmed-rise-10-UCT) on a fresh alice→bob send,
  reproducibly failing before the fix.
Marks the V6-RECOVER #390 fix branch (currently
fix/issue-387-v6-recover-invalid-persistence with the bundled
#387/#389/#390 commits) for downstream testing per
maintainer request.
@vrogojin vrogojin changed the title fix(payments)(#387): durable V6-RECOVER invalid status; balance excludes ledger fix(payments): V6-RECOVER #387/#389/#390 — durable invalid + root-cause source-state reconstruction Jun 3, 2026
@vrogojin vrogojin merged commit 2dfd7d7 into main Jun 4, 2026
3 checks passed
@vrogojin vrogojin deleted the fix/issue-387-v6-recover-invalid-persistence branch June 4, 2026 12:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant