-
Notifications
You must be signed in to change notification settings - Fork 86
mcp/stream: fix prevent potential goroutine leak on blocked incoming channel #123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the CL! Can you add a test for this?
0f708af
to
c71f318
Compare
mcp/streamable_test.go
Outdated
|
||
// Now simulate calling Close() and blocking the goroutine | ||
close(conn.done) | ||
time.Sleep(50 * time.Millisecond) // Allow goroutine to exit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a sync.WaitGroup instead of a sleep to remove flakiness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔, I agree that sync.WaitGroup
is a great tool for waiting on goroutines, but in this case, since scanEvents
is launched within handleSSE
and doesn't expose a separate wait condition, sync.WaitGroup
wouldn't directly help in waiting for scanEvents
to exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, we could wrap the call to handleSSE:
go func() {
conn.handleSSE(resp)
wg.Done()
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👋, sorry for my delayed response.
I’ve implemented the proposed approach using a sync.WaitGroup
around handleSSE
, and replaced the time.Sleep
with wg.Wait()
.
However, the test fails intermittently — although the leak has been correctly fixed. My understanding is that the scanEvents
goroutine, which is launched inside handleSSE
, might not observe the closed done
channel and exit before handleSSE
returns and wg.Wait()
completes due to scheduling variance. So the test catches it as a false positive leak, even though the fix is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rebase with upstream and upload the test?
I added the waitgroup and I am not able to reproduce the failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick response!
I actually noticed two days ago that the leak issue had been fixed — great to see that resolved.
I've rebased the branch and pushed the test update using sync.WaitGroup
as suggested. On my local runs, the test sometimes passes, but occasionally fails due to a detected goroutine leak, which I believe is caused by scheduling variance.
This test currently relies on timing in two places to simulate the blocked goroutine scenario, and I personally feel this makes it a bit less elegant and deterministic. Given the potential for flakiness in CI, I'm also happy to close this PR if you'd prefer not to include this regression test.

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why it's failing on your machine, it seems to work on mine after running it 10000 times 🤔 :
:~/w/go-sdk/mcp$ go test -count=10000 -race -run TestStreamableClientConnSSEGoroutineLeak
PASS
ok github.com/modelcontextprotocol/go-sdk/mcp 12.092s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👋, sorry for the delay and thank you so much for your patience and continued feedback!
To help validate the issue further, I added a dedicated step to my GitHub Actions workflow in my personal fork using the following command:
- name: Test
run: go test -v ./...
- name: TestGoroutineLeak
run: go test -count=1000 -v -race -run TestStreamableClientConnSSEGoroutineLeak ./mcp
This directly targets the streamable_test.go
file in the mcp
package, where the leak test is defined. I noticed a failure in this setup using the same environment as upstream CI.
Here's the link to the failing run: link
When you have a moment, you could take a look and confirm whether it aligns with what you’d expect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that!
I understand now that sleep is necessary to allow for scheduling variance. We can keep the sync.waitgroup and add back the small delay to allow for scanEvents to shut down. Then it should be good to submit. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for all the helpful discussion and feedback.
While preparing to update the PR, I noticed that the latest version of the codebase has significantly changed the SSE handling logic( #133 ). The handleSSE
function no longer starts a separate scanEvents
goroutine. Instead, it now processes the stream directly through processStream
in a blocking loop.
As a result, the potential scanEvents
goroutine leak scenario we were testing for no longer exists in this implementation. The regression test now always passes, but it no longer provides meaningful protection or value.
To avoid unnecessary test complexity and maintenance cost, I'm going to go ahead and close this PR and the related issue.Please feel free to reopen if there's still interest in preserving a version of the test for historical reference.
Thanks again for your time and thoughtful review!
mcp/streamable_test.go
Outdated
|
||
// Now simulate calling Close() and blocking the goroutine | ||
close(conn.done) | ||
time.Sleep(50 * time.Millisecond) // Allow goroutine to exit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, we could wrap the call to handleSSE:
go func() {
conn.handleSSE(resp)
wg.Done()
}
Change:
Adds a regression test to ensure the
scanEvents
goroutine exits properly when thedone
channel is closed and theincoming
channel is full.Previously, there was a potential for goroutine leak if
scanEvents
was blocked on writing toincoming
and thedone
channel was closed before it could exit. This issue has already been fixed in prior changes — this test is intended to capture that edge case and prevent regressions.