fix(payments): V6-RECOVER #387/#389/#390 — durable invalid + root-cause source-state reconstruction#388
Merged
Conversation
…des ledger; soak gate V6-RECOVER permanent-mismatch verdicts were silently downgraded back to 'pending' on every load: `determineTokenStatus` (in txf-serializer) only encodes `pending`/`confirmed` from the TXF transactions array, so the application-level `'invalid'` write was lost on the next reload. `aggregateTokens` then surfaced the unspendable token as unconfirmed UCT, polluting balance across repeated `payments receive --finalize` calls. The persistent `v6RecoverPermanent` ledger added in #378 was correct on disk but never consulted at the balance / re-ingest surfaces. This commit makes the ledger authoritative without changing the TXF format: * New helper `applyV6RecoverPermanentInvalidStatus()` walks `this.tokens` and patches matching entries to `status='invalid'`. * Invoked at the end of every `loadFromStorageData` (initial load + every `sync()` round-trip) and at the end of `restoreV6RecoverPermanent` (handles cold-start ordering where the ledger hydrates AFTER the token map). * `aggregateTokens` adds a belt-and-braces ledger filter so balance NEVER includes a ledgered token, even if a future caller mutates status independently. * `addToken` honors the ledger on Nostr at-least-once replay — incoming tokens whose canonical id matches the verdict are persisted as `'invalid'`. We accept the write (return true) so the Nostr cursor advances; rejecting would cause perpetual replay. * Ledger lookups are canonical-id-first via the new `isV6RecoverPermanentToken(token, mapKey?)` helper — resolves via `extractTokenIdFromSdkData(sdkData)` with map-key fallback so UUID-keyed `addToken` entries still classify correctly. * `finalizeStrandedReceivedToken` keys the ledger by canonical genesis tokenId (was: map key, ambiguous post-replay) and awaits `saveV6RecoverPermanent()` so a process exit immediately after the verdict cannot lose the entry. Tests: * tests/unit/modules/PaymentsModule.v6-recover-invalid-status-387.test.ts — 16 unit tests covering helper mechanics, getBalance filter, restoreV6RecoverPermanent re-application, loadFromStorageData round-trip, addToken honoring ledger, and all 4 acceptance criteria from #387. * tests/integration/payments/v6-recover-invalid-status-387.test.ts — 4 e2e tests across the full public `module.load()` lifecycle with simulated CLI restarts. * Both files fail without the fix (verified via `git stash`) and pass with the fix. Soak hardening (`manual-test-full-recovery.sh`): * New `assert_no_unconfirmed_after_finalize` gate invoked at 5 post-`receive --finalize` sites (§A.1, §A.2, §C.2, §D.4 alice, §D.4 bob) — would have caught #387 on the first run. * `normalize_snapshot` strips `(+ N unconfirmed)`, `[X+Y tokens]`, and `(N tokens)` clauses so existing `assert_diff_empty` calls compare CONFIRMED balances only. Unconfirmed pollution is caught by the dedicated gate, not masked equally on both sides of a diff. * Self-tests T8 + T9 added; all 9 normalize self-tests pass. Soak run on testnet (779s, ALL GREEN): * All 14 ASSERT OK including 5 new #387 gates. * Zero address-mismatch / VerificationError / stranded receive diagnostics — the fix held end-to-end across `sphere clear` + full recovery, the exact scenario from the issue. Closes #387.
Closed
7 tasks
…cipline + decimal soak gate Addresses all 15 findings from issue #389, the code-review follow-up to PR #388 (#387 fix). The PR #388 soak was ALL GREEN but the review identified bypass paths the soak did not exercise and an instrumentation gap that would silently let future pollution slip past the new gate (gate used `[1-9][0-9]*` against real CLI output that emits decimal amounts like `(+ 0.0000001 unconfirmed)`). Critical (PR #388 gap closure): * #1 — `finalizeReceivedToken` now consults `isV6RecoverPermanentToken` BEFORE the status write. Without this, a restored proof-polling job on cold start (or any stale in-memory job) could overwrite a ledgered `'invalid'` with `'confirmed'` on the next proof arrival, re-introducing the exact regression #387 closed. The guard also cleans up the polling job so subsequent ticks don't re-enter. * #2 — `assert_no_unconfirmed_after_finalize` widened to match decimal amounts. The CRITICAL constraint (per the integer-only- internals review note): the gate MUST NOT coerce the matched amount through `awk +0` (IEEE-754 double — silently lossy above 2^53). The new check is pure string/character-class matching: pattern `\(\+ [0-9.]*[1-9][0-9.]* unconfirmed\)` requires at least one non-zero digit anywhere in the amount run, valid at arbitrary precision. Mirrors the SDK's bigint-only aggregation in `aggregateTokens`. * #3 — `normalize_snapshot` sed pattern widened `[0-9]+` → `[0-9.]+` so diff-based gates no longer false-pass/fail on decimal-amount snapshots. High: * #4 — `addToken` patches the caller's reference IN PLACE (`token.status = 'invalid'`) instead of rebinding a local via spread-into-new-object. Callers commonly read the incoming token after addToken returns to populate event payloads (the `emitEvent('transfer:incoming', { tokens: [incoming] })` pattern); the spread version left the caller seeing the pre-patch status. * #5 — `updateToken` mirrors addToken's ledger consult. A finalization worker or future internal caller could otherwise hand `updateToken` a `'confirmed'` status for a ledgered token and silently overwrite the durable `'invalid'` verdict. * #6 — `load()` reorders `restoreV6RecoverPermanent` BEFORE `restoreProofPollingJobs`. Re-registered polling jobs for ledgered tokens now skip registration entirely; the previous ordering left a cold-start race window in which a polling tick firing between the two calls would call `finalizeReceivedToken` on a token whose ledger entry had not yet hydrated. Medium: * #7 — `validate()` short-circuits ledgered tokens straight to the `invalid` array. Eliminates wasted aggregator round-trips and prevents the misleading-`valid=true`-on-permanently-rejected-token return shape. * #8 — `scheduleResolveUnconfirmed` predicate adds the ledger check so the periodic retry timer isn't armed during the cold-start window for tokens that will be patched once `applyV6RecoverPermanent- InvalidStatus` runs. Restores symmetry with `hasUnconfirmedOrInflight`. * #9 — `getTokens()` returns a patched view for ledgered tokens even before `applyV6RecoverPermanentInvalidStatus` has run on the in-memory map. Closes the cold-start window for AccountingModule / SwapModule / direct API consumers. Hot path (empty ledger) early-exits before the array walk. Low / cleanup: * #10 — `applyV6RecoverPermanentInvalidStatus` adds `'transferring'` to the early-exit set. A `'transferring'` token represents an in-flight outbound send; flipping it would abort the send mid-way. * #11 — `saveV6RecoverPermanent` failure escalates to `logger.error` and schedules an exponential-backoff retry (2/4/8/16/32 s, capped at 5 attempts; cleared on destroy + address switch). Deliberately does NOT emit `transfer:operator-alert` — its `code` field is a strict 14-value `DispositionReason` union (snapshot-tested, requires ADR for additions) which represents transfer-disposition outcomes, not storage failures. * #12 — Strengthens the "Non-ledgered token continues to surface" integration test with explicit `confirmedAmount === '7'` + `confirmedTokenCount === 1` assertions. Also fixes the test fixture: pre-#389 the healthy token's masked predicate didn't bind to the wallet and was archived by `loadFromStorageData`'s balance-model invariant, so the original test couldn't have caught an over-broad ledger filter regression. Switched to unmasked predicate using the wallet's chainPubkey. * #13 — `applyV6RecoverPermanentInvalidStatus` drops the `token.updatedAt = Date.now()` bump. The patch runs on every TXF reload (because `determineTokenStatus` re-derives status from transactions/inclusionProofs only); bumping `updatedAt` polluted the "last meaningful change" signal that downstream observers (AccountingModule, history projection, UI sort) read. * #14 — Diagnostic grep on gate failure uses the same widened pattern as the gate itself so operators see exactly the lines that tripped, not unrelated 'unconfirmed' log lines. * #15 — New integration test that loads a #378-shaped ledger payload (canonical-id-keyed) through the #388 + #389 stack and asserts status, balance, recovery, AND `getTokens()` all agree. Confirms zero migration risk for any pre-existing #378 ledger payloads. Tests: * tests/unit/modules/PaymentsModule.v6-recover-invalid-status-387.test.ts grows by 7 #389 regression tests (now 23 total). Each pins one of the new guards: finalizeReceivedToken ledger short-circuit, addToken in-place mutation, updateToken ledger coercion, scheduleResolveUnconfirmed predicate, getTokens patched view, applyV6RecoverPermanentInvalidStatus skipping 'transferring', applyV6RecoverPermanentInvalidStatus NOT bumping `updatedAt`. * tests/integration/payments/v6-recover-invalid-status-387.test.ts grows from 4 to 5 tests — strengthened positive case (#12) + #378 backward-compat (#15). * Soak self-tests (`RUN_NORMALIZE_TESTS=1`) gain T10 (decimal- amount detection: clean / decimal-pollution / zero-decimal / zero-integer) and T11 (normalize_snapshot strips decimal `(+ N.M unconfirmed)` clauses). All 11 pass. Verification: * `npx tsc --noEmit` — clean. * `npx vitest run tests/unit` — 8286 pass / 2 skipped (pre-existing). * `npx vitest run tests/integration/payments/` — 31 pass. * `npx eslint` on changed files — no new errors. * `RUN_NORMALIZE_TESTS=1 bash manual-test-full-recovery.sh` — 11 pass. Closes #389.
7 tasks
…oot cause Adds `manual-test-simple-send.sh`, a self-contained reproduction of the fresh-wallet direct-send failure tracked in issue #390. Two freshly- created testnet wallets, alice faucets 100 UCT, alice runs `sphere payments send @bob 10 UCT`, bob runs `payments receive --finalize`. Alice's CONFIRMED balance correctly drops by 10 UCT but bob's stays at 0 because the recipient predicate in the minted token fails to bind to bob's HD signers — V6-RECOVER classifies it as permanent and stamps the durable invalid ledger. The script's verification gate is INTEGER-ONLY at arbitrary precision: * extracts UCT amount from `sphere balance` decimal output via string pad-and-concat (split on `.`, left-pad fractional part to UCT's 8 decimals, concatenate, strip leading zeros). No `awk +0`, no `bc`, no float coercion — matches the SDK's bigint aggregation rule. * reuses the decimal-aware unconfirmed-residue pattern from #389 #2: `\(\+ [0-9.]*[1-9][0-9.]* unconfirmed\)` — pure string/character-class check. * compares smallest-unit integer deltas with shell arithmetic; for 10 UCT (1_000_000_000) this comfortably fits int64, but the extraction step itself is precision-safe to any string-length amount. Sections of the repro: 1. Create alice wallet 2. Create bob wallet 3. Top up alice (faucet + sync + finalize) 4. Baseline bob (expected zero UCT) 5. alice → @bob (10 UCT, instant) 6. bob payments receive --finalize 7. Snapshot final balances 8. Verify exact ±10 UCT CONFIRMED delta + no unconfirmed residue The PR #388 soak (`manual-test-full-recovery.sh`) passes because it exercises the invoice-payment path (which bypasses nametag → predicate resolution), so the regression suite did not catch this. The direct- send path requires investigation per #390's "diff success path against failure path" strategy. Tracks #390. Symptom-handling work for V6-RECOVER permanent verdicts is on commit a9e6242 (#387 / PR #388) + bc28737 (#389 review follow-up).
…tate `recoverStrandedReceivedTokens` rebuilt the source-at-state-N-1 by spreading the persisted `parsed` JSON and stripping the last transaction, but kept the top-level `state` field unchanged. For instant-mode receives, that field carries the RECIPIENT's predicate (written by `saveCommitmentOnlyToken` so the on-disk shape advertises the recipient as owner once finalize completes). The source token used by `Token.update` → `transaction.verify` → `verifyRecipient` therefore exposed `state.predicate.publicKey = recipient` while `genesis.data.recipient = sender's directAddress` — the SDK unconditionally returns `'Recipient address mismatch'`. V6-RECOVER then stamped the durable permanent-invalid verdict and the recipient silently lost the value on every fresh alice→bob `sphere payments send`. Replace `state` with `lastTxJson.data.sourceState` (the sender's mint state) so verify on the source-at-state-N-1 sees a consistent (state.predicate ⇒ genesis.recipient) shape. Mirrors the correct pattern already in `tryLocalFinalizeUnconfirmed` (~line 10249) and the UXF assembly path (~line 16284). #387/#388/#389 only stamped the durable verdict — they did NOT fix this construction; this is the root cause filed as #390. Regression coverage: - Unit: `PaymentsModule.proof-polling-persistence.test.ts` — "recoverStrandedReceivedTokens rewrites sourceTokenJson.state to the transfer sourceState (#390)" pins the invariant by inserting distinct markers in the top-level state vs the last-tx sourceState and asserting the registered proof-polling job's sourceTokenJson carries the SENDER's marker. - E2E: `manual-test-simple-send.sh` (committed in e47da61) now passes ASSERT OK (bob-confirmed-rise-10-UCT) on a fresh alice→bob send, reproducibly failing before the fix.
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #387, closes #389, closes #390.
This PR grew from a single-issue fix for #387 into a 3-issue V6-RECOVER hardening bundle. The original #387 work made the permanent-invalid verdict survive restart; the #389 follow-up tightened ledger discipline; and #390 finally fixed the root cause — the malformed source-token reconstruction that was producing the spurious "Recipient address mismatch" verdicts in the first place. Bundling keeps the chain of fixes deployable as a unit (each builds on the previous).
Commits
a9e6242'invalid'status across reload; balance excludes ledger; soak gate.bc28737e47da61manual-test-simple-send.sh) — deterministically reproduces the root-cause failure in ~30 s.2f9a690recoverStrandedReceivedTokensrebuilds source-token state fromlastTxJson.data.sourceState(was: kept top-level recipient-predicate state).b8b526d0.0.a1for downstream testing.#387 — Durable V6-RECOVER invalid status (original PR scope)
Summary
v6RecoverPermanentledger is now authoritative for balance + recovery (was: silently lost on TXF round-trip).aggregateTokens,addToken,loadFromStorageData,restoreV6RecoverPermanent, and the V6-RECOVER stamping site all consult / re-apply the ledger via a canonical-id-first lookup helper.assert_no_unconfirmed_after_finalizegate + confirmed-only balance diff. Testnet soak: ALL GREEN, 779 s.Why the bug existed
txfToToken → determineTokenStatusre-derivesToken.statusfrom the TXF transactions array only —'invalid'is never encoded. The in-memory'invalid'write thatfinalizeStrandedReceivedTokenmakes after a V6-RECOVER permanent-fail was silently downgraded back to'pending'on the nextloadFromStorageData.aggregateTokensthen included the unspendable token in unconfirmed UCT.The
v6RecoverPermanentledger added in #378 was the right defense-in-depth, but it was only consulted byrecoverStrandedReceivedTokensand the drain predicate — never by balance aggregation or the Nostr at-least-once re-ingest path. So the verdict survived restart on disk but the polluted balance kept resurfacing.What the fix does
applyV6RecoverPermanentInvalidStatus()patches in-memory tokens to'invalid'based on the ledger.loadFromStorageData— survives initial load AND everysync()reload.restoreV6RecoverPermanent— handles cold-start ordering (ledger arrives after tokens).aggregateTokensbelt-and-braces filter — balance NEVER includes a ledgered token.addTokenledger-aware ingest — Nostr at-least-once replay persists incoming token as'invalid'(still returnstrueso cursor advances; rejecting would cause perpetual replay).isV6RecoverPermanentToken(token, mapKey?)extracts canonical genesis tokenId fromsdkDatawith map-key fallback (handles UUID-keyedaddTokenentries).finalizeStrandedReceivedTokenkeys the ledger by canonical genesis tokenId (was: map key — ambiguous post-replay) and awaits the save (was: fire-and-forget).#389 — PR #388 review follow-up: ledger discipline + decimal-aware soak gate
Bundled hardening from the review of the #387 commit:
manual-test-full-recovery.shso any non-zero fractional UCT residue post-finalize fails the soak — previously the gate's integer-only regex passed0.0000001silently.#390 — Root cause: source-token state-reconstruction bug
Symptom
sphere payments send <amount> UCT @<recipient>on two freshly-created testnet wallets produced a token whose recipient predicate appeared not to bind to the recipient's HD signers. Bob'spayments receive --finalizehitVerificationError: Recipient address mismatchand #387 stamped the durable permanent-invalid verdict. End-to-end: alice -10 UCT confirmed, bob +0 UCT — funds effectively lost. The PR #388 soak (manual-test-full-recovery.sh) passed end-to-end because §D exercises the invoice-payment path; the direct-send path (sphere payments send) was broken on every fresh send.Root cause (NOT what the issue body hypothesised)
The send path was correct — alice resolved
@bobto bob's publisheddirectAddressand minted a transfer commitment targeting it. The bug was in bob's V6-RECOVER recovery path, specifically inrecoverStrandedReceivedTokens(~line 9626 pre-fix):This rebuilt the source-at-state-N-1 by spreading the persisted
parsedJSON and stripping the last transaction. But it kept the top-levelstatefield unchanged. For instant-mode receives, the ingestion path (saveCommitmentOnlyTokenand the UXF receive at line 2807) writesstate.predicate = recipient_predicateso the on-disk shape advertises bob as owner once finalize completes. The source token used byToken.update → transaction.verify → verifyRecipienttherefore exposedstate.predicate.publicKey = bobwhilegenesis.data.recipient = alice's directAddress— the SDK unconditionally returned'Recipient address mismatch'.#387/#388/#389 only stamped the durable verdict — they did NOT fix this construction. #390 is the actual root cause.
Fix
Replace
statewithlastTxJson.data.sourceState(= alice's mint state, the sender's predicate at state N-1):Mirrors the correct pattern already in
tryLocalFinalizeUnconfirmed(~line 10249) and the UXF assembly path (~line 16284).Regression coverage
PaymentsModule.proof-polling-persistence.test.ts— "recoverStrandedReceivedTokens rewrites sourceTokenJson.state to the transfer sourceState (fresh alice→bob direct send: recipient predicate doesn't bind to recipient HD signers (root cause behind #387/#388/#389) #390)" — pins the invariant by inserting distinct markers in the top-level state vs the last-txsourceStateand asserting the registered proof-polling job'ssourceTokenJsoncarries the SENDER's marker. Fails without the fix.manual-test-simple-send.sh(committed ine47da61, ~30 s against testnet). Pre-fix reliablyASSERT FAIL (bob-confirmed-rise-10-UCT); post-fix all 4 ASSERT OK.Test plan
npx vitest run tests/unit/modules/→ 1804 pass (0 fail; pre-existing flake on AccountingModule invoice-payment-attribution reproduced once across two runs, unrelated to this PR)npx vitest run tests/unit/modules/PaymentsModule.proof-polling-persistence.test.ts→ 29 pass (includes new fresh alice→bob direct send: recipient predicate doesn't bind to recipient HD signers (root cause behind #387/#388/#389) #390 regression test)npx vitest run tests/unit/modules/PaymentsModule.v6-recover-invalid-status-387.test.ts→ 23 passnpx vitest run tests/unit/modules/PaymentsModule.v6-recover-permanent-378.test.ts→ 5 passmanual-test-simple-send.sh) → ALL GREEN — alice -10 UCT, bob +10 UCT, both CONFIRMED, no residuemanual-test-full-recovery.shsoak — ALL GREEN at 513 s, 14 ASSERT OK (per PR #388 review follow-up: V6-RECOVER ledger consult gaps + soak gate regex misses decimals #389 verification)Acceptance criteria (#390)
sphere payments send 10 UCT @bobfollowed bysphere payments receive --finalizeon bob's side leaves bob with +10 UCT CONFIRMED and 0 unconfirmed, NO V6-RECOVER permanent-verdict fires.manual-test-simple-send.sh) gates an end-to-end clean run.