fix(cuda): ensure streams have observed events before destroying #2242
Pull Request Overview
This PR fixes CUDA stream and event lifecycle issues to ensure events are properly observed before being destroyed. The main purpose is to prevent premature destruction of events and streams that could cause synchronization issues or resource corruption.
Key changes:
- Removed the dedicated stream from `MemoryMerkleTree` and simplified synchronization to use the default stream
- Split event tracking in `MemoryMerkleSubTree` into `created_buffer_event` and `build_completion_event` for clearer lifecycle management
- Added explicit event cleanup in `update_with_touched_blocks` after D2H synchronization
- Implemented `Drop` for `MemoryMerkleTree` to ensure proper cleanup
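The ordering these changes aim for (the stream observes every event first, and only then are the events destroyed) can be illustrated with a small stand-alone model. This is a minimal sketch, not the PR's code: `MockEvent`, `MockStream`, `SubTree`, `Tree`, and `demo` are hypothetical stand-ins that only record a log and make no CUDA calls. Only the field names `created_buffer_event` and `build_completion_event` and the method name `drop_subtrees` are borrowed from the PR description, and their real signatures are assumed.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared log so the order of operations can be inspected afterwards.
type Log = Rc<RefCell<Vec<String>>>;

/// Hypothetical stand-in for a CUDA event; dropping it models destroying the event.
struct MockEvent {
    name: &'static str,
    log: Log,
}

impl Drop for MockEvent {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("destroy {}", self.name));
    }
}

/// Hypothetical stand-in for the default stream.
struct MockStream {
    log: Log,
}

impl MockStream {
    // Models making the stream wait on (observe) an event.
    fn wait_event(&self, event: &MockEvent) {
        self.log.borrow_mut().push(format!("wait {}", event.name));
    }

    // Models blocking the host until the stream has drained.
    fn synchronize(&self) {
        self.log.borrow_mut().push("synchronize".to_string());
    }
}

/// Assumed shape: one event for buffer creation, one for build completion.
struct SubTree {
    created_buffer_event: Option<MockEvent>,
    build_completion_event: Option<MockEvent>,
}

struct Tree {
    default_stream: MockStream,
    subtrees: Vec<SubTree>,
}

impl Tree {
    fn drop_subtrees(&mut self) {
        // 1. Let the stream observe every outstanding event.
        for subtree in &self.subtrees {
            let events = subtree
                .created_buffer_event
                .iter()
                .chain(subtree.build_completion_event.iter());
            for event in events {
                self.default_stream.wait_event(event);
            }
        }
        // 2. Wait for the stream itself to finish.
        self.default_stream.synchronize();
        // 3. Only now drop the subtrees, which destroys their events.
        self.subtrees.clear();
    }
}

impl Drop for Tree {
    fn drop(&mut self) {
        self.drop_subtrees();
    }
}

/// Builds a tree with one subtree, drops it, and returns the recorded order.
fn demo() -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let tree = Tree {
        default_stream: MockStream { log: Rc::clone(&log) },
        subtrees: vec![SubTree {
            created_buffer_event: Some(MockEvent {
                name: "created_buffer_event",
                log: Rc::clone(&log),
            }),
            build_completion_event: Some(MockEvent {
                name: "build_completion_event",
                log: Rc::clone(&log),
            }),
        }],
    };
    drop(tree);
    // Bind before returning so the `Ref` guard is released before `log` goes out of scope.
    let recorded = log.borrow().clone();
    recorded
}
```

Calling `demo()` returns the two `wait` entries, then `synchronize`, then the two `destroy` entries. In this model, putting the wait and synchronize steps inside `drop_subtrees` and calling it from `Drop` means the ordering holds even when the tree is dropped implicitly.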
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| crates/vm/src/system/cuda/merkle_tree/mod.rs | Refactored stream/event handling: removed dedicated tree stream, split subtree events into creation and completion events, improved synchronization in finalize() and drop_subtrees(), added Drop implementation, and optimized event cleanup after D2H transfers |
| crates/vm/src/system/cuda/memory.rs | Updated Drop implementation to use the refactored drop_subtrees() method with clearer documentation about synchronization requirements |
CodSpeed Performance Report
Merging #2242 will improve performance by ×12.
Golovanov399 left a comment:
I think doing everything on the default stream should work, but I wouldn't trust my judgement on this
aae868a to 8f23d8c
8f23d8c to 682e44c
gaxiom left a comment:
LGTM.
Mem Merkle Tree will wait for all subtrees to complete. I think that's fine
Commit: 682e44c
Comparison to show there's no perf regression: https://github.com/axiom-crypto/openvm-reth-benchmark/actions/runs/19352734146