Skip to content

fix(job): add wallet sync timeout recovery#46

Draft
dolcalmi wants to merge 2 commits intomainfrom
feat/wallet-sync-timeout-recovery
Draft

fix(job): add wallet sync timeout recovery#46
dolcalmi wants to merge 2 commits intomainfrom
feat/wallet-sync-timeout-recovery

Conversation

@dolcalmi
Copy link
Copy Markdown

@dolcalmi dolcalmi commented Apr 6, 2026

Summary

  • add warning (20m) and hard timeout (30m) guards around the BDK keychain sync stage in sync_wallet so hung sync attempts fail and retry automatically instead of requiring pod restarts
  • add an explicit JobError::BdkSyncTimeout variant with wallet/keychain context to improve incident diagnosis when a sync attempt exceeds the hard timeout
  • stop swallowing spawn_sync_wallet errors inside sync_all_wallets and emit structured error logs with account/wallet identifiers for per-wallet enqueue visibility

Add warning and hard timeout thresholds around BDK sync so hung wallet sync attempts fail and retry instead of requiring pod restarts. Improve observability by logging sync_wallet spawn failures with wallet/account context.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds guardrails around the BDK wallet/keychain sync job execution path to make hung sync attempts observable and to fail/retry instead of requiring pod restarts.

Changes:

  • Wrap keychain_wallet.sync() in a hard timeout and emit a warning when sync duration exceeds a threshold.
  • Introduce JobError::BdkSyncTimeout with wallet/keychain context for easier incident diagnosis.
  • Stop silently ignoring spawn_sync_wallet failures in sync_all_wallets by logging structured errors.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/job/sync_wallet.rs Adds warning + hard timeout handling around BDK keychain sync.
src/job/error.rs Adds a new BdkSyncTimeout error variant with contextual fields.
src/job/mod.rs Logs per-wallet failures when enqueueing/spawning sync_wallet jobs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/job/sync_wallet.rs Outdated
Comment on lines +112 to +124
let sync_start = tokio::time::Instant::now();
let bdk_sync_outcome =
tokio::time::timeout(BDK_SYNC_HARD_TIMEOUT, keychain_wallet.sync(blockchain)).await;
let sync_elapsed = sync_start.elapsed();
if sync_elapsed >= BDK_SYNC_WARN_AFTER {
tracing::warn!(
wallet_id = %data.wallet_id,
keychain_id = %keychain_id,
elapsed_secs = sync_elapsed.as_secs(),
warn_after_secs = BDK_SYNC_WARN_AFTER.as_secs(),
"bdk sync exceeded warning threshold"
);
}
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning threshold is only evaluated after the BDK sync future completes (or times out), so you won't emit a warning at ~20 minutes while the sync is still running/hung; you'll only see it at completion/timeout. If the intent is an in-flight early warning, consider running a separate timer (e.g., a sleep/select) that logs once the warn threshold is crossed while continuing to await the sync until the hard timeout.

Copilot uses AI. Check for mistakes.
Comment thread src/job/sync_wallet.rs
Comment on lines +112 to +135
let sync_start = tokio::time::Instant::now();
let bdk_sync_outcome =
tokio::time::timeout(BDK_SYNC_HARD_TIMEOUT, keychain_wallet.sync(blockchain)).await;
let sync_elapsed = sync_start.elapsed();
if sync_elapsed >= BDK_SYNC_WARN_AFTER {
tracing::warn!(
wallet_id = %data.wallet_id,
keychain_id = %keychain_id,
elapsed_secs = sync_elapsed.as_secs(),
warn_after_secs = BDK_SYNC_WARN_AFTER.as_secs(),
"bdk sync exceeded warning threshold"
);
}
match bdk_sync_outcome {
Ok(Ok(())) => {}
Ok(Err(e)) => return Err(e.into()),
Err(_) => {
return Err(JobError::BdkSyncTimeout {
timeout_secs: BDK_SYNC_HARD_TIMEOUT.as_secs(),
wallet_id: data.wallet_id.to_string(),
keychain_id: keychain_id.to_string(),
})
}
}
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tokio::time::timeout will stop waiting, but it won’t actually stop the underlying work if keychain_wallet.sync() is blocked inside spawn_blocking (see KeychainWallet::sync using tokio::task::spawn_blocking). On timeout, the blocking sync can continue running in the background, potentially piling up stuck blocking threads and/or overlapping keychain syncs, which undermines the “recovery” goal. Consider ensuring the underlying sync is bounded/cancellable (e.g., add an internal cancellation/timeout mechanism inside the blocking section, or restructure so the timed-out work cannot keep consuming blocking threads).

Copilot uses AI. Check for mistakes.
Switch sync timeout handling to threshold logging without early return so spawn_blocking sync work is not orphaned in the background. Keep warning and hard-threshold visibility while awaiting completion of the active keychain sync.
@dolcalmi dolcalmi marked this pull request as draft April 6, 2026 17:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants