fix: graceful shutdown of bg goroutines before DB close by Dunsin-cyber · Pull Request #158 · arkade-os/go-sdk

Dunsin-cyber · 2026-04-29T08:11:52Z

fixes #154

Added a goroutineWg sync.WaitGroup to arkClient to track the 4 background goroutines (listenForArkTxs, listenForOnchainTxs, listenDbEvents, periodicRefreshDb). Add(4) is called before the outer goroutine in Unlock() so the counter is set before any shutdown can race it.

Summary by CodeRabbit

Bug Fixes
- Improved shutdown synchronization to ensure background processes properly terminate before lock/unlock transitions complete, enhancing reliability and stability.

coderabbitai · 2026-04-29T08:12:08Z

Warning

Rate limit exceeded

@Dunsin-cyber has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 34 minutes and 39 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ba6c1107-efa3-4510-86e1-931db4f355cd

📥 Commits

Reviewing files that changed from the base of the PR and between 1106180 and 5307875.

📒 Files selected for processing (1)

init.go

Walkthrough

The PR introduces a sync.WaitGroup to the arkClient struct to coordinate graceful shutdown of background goroutines before database closure. Four long-running goroutines defer Done() calls to signal completion. During lock, the client signals goroutines to stop and waits for completion before proceeding.

Changes

Cohort / File(s)	Summary
Client shutdown synchronization `client.go`, `init.go`	Adds `sync.WaitGroup` to `arkClient` and updates initialization/shutdown logic. Goroutines (`listenForArkTxs`, `listenForOnchainTxs`, `listenDbEvents`, `periodicRefreshDb`) now defer `Done()` calls. `Unlock` signals `Add(4)` before spawning background tasks. `Lock` blocks on `goroutineWg.Wait()` after invoking `stopFn()` to ensure all goroutines complete before database closure.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly describes the main change: implementing graceful shutdown of background goroutines before database closure, which directly addresses the race condition issue.
Linked Issues check	✅ Passed	Changes implement graceful shutdown via WaitGroup to coordinate goroutine completion before DB close, directly addressing issue `#154`'s requirement.
Out of Scope Changes check	✅ Passed	All changes are focused on adding WaitGroup synchronization to background goroutines, directly related to the linked issue objective.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Review rate limit: 0/1 reviews remaining, refill in 34 minutes and 39 seconds.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

arkanaai

Arkana Code Review — #158

The intent is correct: wait for background goroutines to drain before closing the DB. However, the current implementation has a critical deadlock bug and a silent hang risk. Requesting changes.

🔴 CRITICAL: Deadlock when outer goroutine exits early

init.go:101 — a.goroutineWg.Add(4) is called unconditionally before the outer goroutine, but the 4 inner goroutines are only spawned at lines 123-128, after discoverHDWalletKeys, finalizePendingTxs, and refreshDb all succeed.

If discoverHDWalletKeys fails (line 106-110), the outer goroutine returns early. The 4 child goroutines are never launched, so Done() is never called 4 times. Any subsequent call to Stop(), Lock(), or Reset() will call goroutineWg.Wait() and deadlock forever.

// init.go — current code in PR
a.goroutineWg.Add(4)       // counter = 4
go func() {
    // ...
    if _, err := a.discoverHDWalletKeys(ctx); err != nil {
        a.syncCh <- err
        close(a.syncCh)
        return              // ← EARLY EXIT: 4 Done()s never happen → DEADLOCK
    }
    // ... only here are the 4 goroutines spawned
}()

Fix: Move Add(4) to just before the 4 go calls (line 122), or add compensating Done() calls on every early-return path. The cleaner approach:

go func() {
    a.Explorer().Start()
    // ...
    if _, err := a.discoverHDWalletKeys(ctx); err != nil {
        a.syncCh <- err
        close(a.syncCh)
        return
    }
    // ...
    a.goroutineWg.Add(4)  // ← move here, right before spawning
    go a.listenForArkTxs(ctx)
    go a.listenForOnchainTxs(ctx)
    go a.listenDbEvents(ctx)
    go a.periodicRefreshDb(ctx)
}()

🔴 HIGH: Shutdown during init phase also deadlocks

Even if discoverHDWalletKeys succeeds, if Lock()/Stop()/Reset() is called while the outer goroutine is still running Explorer().Start(), discoverHDWalletKeys(), or refreshDb() — i.e. before the 4 goroutines are launched — goroutineWg.Wait() will still deadlock because Add(4) already ran but no Done() has been called yet.

This is the same root cause as above: the Add and the go launches are not atomic and can be interrupted by shutdown.

Moving Add(4) to just before the goroutine launches (as shown above) fixes both issues, because if shutdown happens before Add(4), the WaitGroup is still at 0 and Wait() returns immediately.

🟡 MEDIUM: No timeout / safety net on `Wait()`

If any of the 4 goroutines hangs (e.g. a blocking gRPC stream that doesn't respect context cancellation), goroutineWg.Wait() will block the caller forever. Consider adding a timeout:

done := make(chan struct{})
go func() {
    a.goroutineWg.Wait()
    close(done)
}()
select {
case <-done:
case <-time.After(10 * time.Second):
    log.Warn("timed out waiting for background goroutines to stop")
}

🟡 MEDIUM: Missing test coverage

There are no tests for:

Stop() / Lock() when discoverHDWalletKeys fails (would catch the deadlock)
Stop() called during the init phase (before goroutines are launched)
Unlock() → Lock() → Unlock() cycle (WaitGroup reuse correctness)

Given this is fixing a race condition, at minimum a test that calls Stop() shortly after Unlock() (to exercise the early-shutdown path) would be valuable.

✅ Good

defer a.goroutineWg.Done() at the top of each goroutine is correct — covers early returns (e.g. wallet == nil checks, refreshDbInterval == 0).
All three shutdown paths (Stop, Lock, Reset) are covered.
The stopOnce in Stop() ensures Wait() isn't called concurrently from multiple callers.

Summary: The approach is sound but the placement of Add(4) creates a guaranteed deadlock on any early-exit path in the outer goroutine. Move Add(4) to immediately before the 4 go calls to fix.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

client.go (1)
1089-1142: ⚠️ Potential issue | 🟠 Major

Inner publisher goroutines in listenDbEvents are not tracked by goroutineWg.

The three go func() publishers spawned at lines 1108, 1121, and 1134 execute outside the goroutine tracking mechanism. When Stop() or Reset() call a.goroutineWg.Wait(), they return while these publishers may still be publishing events, allowing shutdown to complete before event publication finishes. Add a.goroutineWg.Add(1) before each publisher goroutine, or publish inline.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client.go` around lines 1089 - 1142, The publisher goroutines inside
listenDbEvents are not tracked by a.goroutineWg, so shutdown via Stop()/Reset()
can finish before those goroutines complete; fix by incrementing the wait group
before spawning each anonymous publisher (call a.goroutineWg.Add(1) immediately
before each go func for the UtxoStore, VtxoStore and TransactionStore event
handlers) and ensuring each goroutine defers a.goroutineWg.Done() at its start,
or alternatively remove the go and publish inline (calling
a.utxoBroadcaster.publish, a.vtxoBroadcaster.publish, a.txBroadcaster.publish
synchronously) so the work is accounted for by the existing goroutine lifecycle.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@init.go`:
- Around line 101-104: The stopFn is published before the wait-group counter is
incremented, which reintroduces a shutdown race; move the a.goroutineWg.Add(4)
call so it happens before assigning a.stopFn = cancel (i.e., call
a.goroutineWg.Add(4) immediately after creating bgCtx/cancel and before setting
a.stopFn) so concurrent Lock()/Reset()/Stop() that call Wait() cannot observe a
non-nil stopFn while the wait-group is still zero.

---

Outside diff comments:
In `@client.go`:
- Around line 1089-1142: The publisher goroutines inside listenDbEvents are not
tracked by a.goroutineWg, so shutdown via Stop()/Reset() can finish before those
goroutines complete; fix by incrementing the wait group before spawning each
anonymous publisher (call a.goroutineWg.Add(1) immediately before each go func
for the UtxoStore, VtxoStore and TransactionStore event handlers) and ensuring
each goroutine defers a.goroutineWg.Done() at its start, or alternatively remove
the go and publish inline (calling a.utxoBroadcaster.publish,
a.vtxoBroadcaster.publish, a.txBroadcaster.publish synchronously) so the work is
accounted for by the existing goroutine lifecycle.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 511ee76b-55d5-45ea-a939-2daef0da77e4

📥 Commits

Reviewing files that changed from the base of the PR and between 2d035b6 and 1106180.

📒 Files selected for processing (2)

client.go
init.go

coderabbitai · 2026-04-29T08:18:30Z

 	bgCtx, cancel := context.WithCancel(context.Background())
 	a.stopFn = cancel

+	a.goroutineWg.Add(4)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, find and examine the init.go file fd -t f "init.go" | head -5

Repository: arkade-os/go-sdk

Length of output: 68

🏁 Script executed:

# Get the full context around the claimed problem area rg -n "bgCtx.*WithCancel|a\.stopFn|a\.goroutineWg\.Add$4$" -B 2 -A 2

Repository: arkade-os/go-sdk

Length of output: 795

🏁 Script executed:

# Find the Lock, Reset, and Stop methods mentioned in the review ast-grep --pattern 'func ($_) Lock() { $$$ }' | head -20

Repository: arkade-os/go-sdk

Length of output: 42

🏁 Script executed:

# Also search for Wait() calls and stopFn checks rg -n "a\.stopFn.*nil|Wait" -B 2 -A 2

Repository: arkade-os/go-sdk

Length of output: 4467

🏁 Script executed:

# Read init.go lines around the problem area sed -n '80,160p' init.go | cat -n

Repository: arkade-os/go-sdk

Length of output: 1798

🏁 Script executed:

# Also check if there are any goroutines already running before line 101 sed -n '1,100p' init.go | cat -n

Repository: arkade-os/go-sdk

Length of output: 3392

🏁 Script executed:

# Find the Lock, Reset, and Stop method implementations rg -n "func $.*$ (Lock|Reset|Stop)\(" init.go client.go -A 10

Repository: arkade-os/go-sdk

Length of output: 1129

Move Add(4) before publishing stopFn.

a.stopFn becomes visible on line 102 before the wait-group counter is incremented on line 104. A concurrent call to Lock(), Reset(), or Stop() can observe a non-nil stopFn, call Wait() while the counter is still 0, and return immediately—allowing teardown to proceed before the startup goroutines are even launched. This reopens the shutdown race this PR is intended to fix.

Suggested fix

bgCtx, cancel := context.WithCancel(context.Background()) - a.stopFn = cancel - a.goroutineWg.Add(4) + a.stopFn = cancel go func() { a.Explorer().Start()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

bgCtx, cancel := context.WithCancel(context.Background())

a.stopFn = cancel

a.goroutineWg.Add(4)

bgCtx, cancel := context.WithCancel(context.Background())

a.goroutineWg.Add(4)

a.stopFn = cancel

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@init.go` around lines 101 - 104, The stopFn is published before the wait-group counter is incremented, which reintroduces a shutdown race; move the a.goroutineWg.Add(4) call so it happens before assigning a.stopFn = cancel (i.e., call a.goroutineWg.Add(4) immediately after creating bgCtx/cancel and before setting a.stopFn) so concurrent Lock()/Reset()/Stop() that call Wait() cannot observe a non-nil stopFn while the wait-group is still zero.

arkanaai

Arkana Code Review — #158 (re-review)

Commit 5307875 addresses both critical findings from my previous review:

🔴 Deadlock on early exit — FIXED. Add(4) moved to right before the 4 goroutine launches, after the refreshDb error check. Early return no longer leaves the WaitGroup stranded.
🔴 Shutdown during init — FIXED. Outer goroutine tracked with Add(1) + defer Done(). Wait() now correctly handles shutdown at any phase — during init, during listeners, or after completion.
Bonus fix: if err != nil { return } prevents launching listener goroutines on refreshDb failure — good defensive addition.

The WaitGroup lifecycle is now correct:

Add(1) before outer goroutine → defer Done() in outer goroutine
Add(4) only reached on success, right before 4 go calls → each defers Done()
All 3 shutdown paths (Lock, Stop, Reset) call stopFn() then Wait()

My two MEDIUM suggestions (timeout on Wait(), test coverage) remain unaddressed but are not blocking. They can be follow-ups.

LGTM. ✅

arkanaai

Arkana Code Review — #158 (re-review)

Commit 5307875 addresses both critical findings from my previous review:

🔴 Deadlock on early exit — FIXED. Add(4) moved to right before the 4 goroutine launches, after the refreshDb error check. Early return no longer leaves the WaitGroup stranded.
🔴 Shutdown during init — FIXED. Outer goroutine tracked with Add(1) + defer Done(). Wait() now correctly handles shutdown at any phase — during init, during listeners, or after completion.
Bonus fix: if err != nil { return } prevents launching listener goroutines on refreshDb failure — good defensive addition.

The WaitGroup lifecycle is now correct:

Add(1) before outer goroutine → defer Done() in outer goroutine
Add(4) only reached on success, right before 4 go calls → each defers Done()
All 3 shutdown paths (Lock, Stop, Reset) call stopFn() then Wait()

My two MEDIUM suggestions (timeout on Wait(), test coverage) remain unaddressed but are not blocking. They can be follow-ups.

LGTM. ✅

altafan · 2026-04-30T09:51:08Z

@Dunsin-cyber in #153 we're gonna move these listeners into a contract watcher, and therefore this code is going to subject to breaking changes (don't look at it now as it requires indeed changes). If we don't solve directly there, would be better to postpone these fixes after that pr gets merged.

Dunsin-cyber · 2026-04-30T20:05:02Z

@Dunsin-cyber in #153 we're gonna move these listeners into a contract watcher, and therefore this code is going to subject to breaking changes (don't look at it now as it requires indeed changes). If we don't solve directly there, would be better to postpone these fixes after that pr gets merged.

alright, got it. thanks @altafan

fix: graceful shutdown of bg goroutines before DB close

1106180

arkanaai Bot requested changes Apr 29, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

fix: remove waitgroup races in goroutine lifecycle

5307875

arkanaai Bot approved these changes Apr 29, 2026

View reviewed changes

Kukks requested a review from altafan April 29, 2026 09:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: graceful shutdown of bg goroutines before DB close#158

fix: graceful shutdown of bg goroutines before DB close#158
Dunsin-cyber wants to merge 2 commits into
arkade-os:masterfrom
Dunsin-cyber:fix/DBClose-race-condition

Dunsin-cyber commented Apr 29, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading

Rate limit exceeded

❌ Failed checks (1 warning)

Uh oh!

arkanaai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 29, 2026

Uh oh!

arkanaai Bot left a comment

Uh oh!

arkanaai Bot left a comment

Uh oh!

altafan commented Apr 30, 2026

Uh oh!

Dunsin-cyber commented Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Dunsin-cyber commented Apr 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

arkanaai Bot left a comment

Choose a reason for hiding this comment

Arkana Code Review — #158

🔴 CRITICAL: Deadlock when outer goroutine exits early

🔴 HIGH: Shutdown during init phase also deadlocks

🟡 MEDIUM: No timeout / safety net on Wait()

🟡 MEDIUM: Missing test coverage

✅ Good

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

arkanaai Bot left a comment

Choose a reason for hiding this comment

Arkana Code Review — #158 (re-review)

Uh oh!

arkanaai Bot left a comment

Choose a reason for hiding this comment

Arkana Code Review — #158 (re-review)

Uh oh!

altafan commented Apr 30, 2026

Uh oh!

Dunsin-cyber commented Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Dunsin-cyber commented Apr 29, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading

🟡 MEDIUM: No timeout / safety net on `Wait()`