Conversation
Add warning and hard timeout thresholds around BDK sync so hung wallet sync attempts fail and retry instead of requiring pod restarts. Improve observability by logging sync_wallet spawn failures with wallet/account context.
There was a problem hiding this comment.
Pull request overview
Adds guardrails around the BDK wallet/keychain sync job execution path to make hung sync attempts observable and to fail/retry instead of requiring pod restarts.
Changes:
- Wrap
keychain_wallet.sync()in a hard timeout and emit a warning when sync duration exceeds a threshold. - Introduce
JobError::BdkSyncTimeoutwith wallet/keychain context for easier incident diagnosis. - Stop silently ignoring
spawn_sync_walletfailures insync_all_walletsby logging structured errors.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/job/sync_wallet.rs |
Adds warning + hard timeout handling around BDK keychain sync. |
src/job/error.rs |
Adds a new BdkSyncTimeout error variant with contextual fields. |
src/job/mod.rs |
Logs per-wallet failures when enqueueing/spawning sync_wallet jobs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let sync_start = tokio::time::Instant::now(); | ||
| let bdk_sync_outcome = | ||
| tokio::time::timeout(BDK_SYNC_HARD_TIMEOUT, keychain_wallet.sync(blockchain)).await; | ||
| let sync_elapsed = sync_start.elapsed(); | ||
| if sync_elapsed >= BDK_SYNC_WARN_AFTER { | ||
| tracing::warn!( | ||
| wallet_id = %data.wallet_id, | ||
| keychain_id = %keychain_id, | ||
| elapsed_secs = sync_elapsed.as_secs(), | ||
| warn_after_secs = BDK_SYNC_WARN_AFTER.as_secs(), | ||
| "bdk sync exceeded warning threshold" | ||
| ); | ||
| } |
There was a problem hiding this comment.
The warning threshold is only evaluated after the BDK sync future completes (or times out), so you won't emit a warning at ~20 minutes while the sync is still running/hung; you'll only see it at completion/timeout. If the intent is an in-flight early warning, consider running a separate timer (e.g., a sleep/select) that logs once the warn threshold is crossed while continuing to await the sync until the hard timeout.
| let sync_start = tokio::time::Instant::now(); | ||
| let bdk_sync_outcome = | ||
| tokio::time::timeout(BDK_SYNC_HARD_TIMEOUT, keychain_wallet.sync(blockchain)).await; | ||
| let sync_elapsed = sync_start.elapsed(); | ||
| if sync_elapsed >= BDK_SYNC_WARN_AFTER { | ||
| tracing::warn!( | ||
| wallet_id = %data.wallet_id, | ||
| keychain_id = %keychain_id, | ||
| elapsed_secs = sync_elapsed.as_secs(), | ||
| warn_after_secs = BDK_SYNC_WARN_AFTER.as_secs(), | ||
| "bdk sync exceeded warning threshold" | ||
| ); | ||
| } | ||
| match bdk_sync_outcome { | ||
| Ok(Ok(())) => {} | ||
| Ok(Err(e)) => return Err(e.into()), | ||
| Err(_) => { | ||
| return Err(JobError::BdkSyncTimeout { | ||
| timeout_secs: BDK_SYNC_HARD_TIMEOUT.as_secs(), | ||
| wallet_id: data.wallet_id.to_string(), | ||
| keychain_id: keychain_id.to_string(), | ||
| }) | ||
| } | ||
| } |
There was a problem hiding this comment.
tokio::time::timeout will stop waiting, but it won’t actually stop the underlying work if keychain_wallet.sync() is blocked inside spawn_blocking (see KeychainWallet::sync using tokio::task::spawn_blocking). On timeout, the blocking sync can continue running in the background, potentially piling up stuck blocking threads and/or overlapping keychain syncs, which undermines the “recovery” goal. Consider ensuring the underlying sync is bounded/cancellable (e.g., add an internal cancellation/timeout mechanism inside the blocking section, or restructure so the timed-out work cannot keep consuming blocking threads).
Switch sync timeout handling to threshold logging without early return so spawn_blocking sync work is not orphaned in the background. Keep warning and hard-threshold visibility while awaiting completion of the active keychain sync.
Summary
sync_walletso hung sync attempts fail and retry automatically instead of requiring pod restartsJobError::BdkSyncTimeoutvariant with wallet/keychain context to improve incident diagnosis when a sync attempt exceeds the hard timeoutspawn_sync_walleterrors insidesync_all_walletsand emit structured error logs with account/wallet identifiers for per-wallet enqueue visibility