devylongs approved these changes (Apr 2, 2026)
shaaibu7 approved these changes (Apr 2, 2026)
AggregateCommitteeSignatures() held the global fork-choice mutex for the entire duration of the leanmultisig.Aggregate() FFI calls (500ms–11s). This blocked all consensus operations (block processing, attestation handling, and time advances), causing gean to fall behind and enter a sync loop.

Split the function into three phases:
1. Lock: collect inputs (pubkeys, signatures, data) from the gossip cache.
2. Unlock: run the expensive leanmultisig.Aggregate() FFI calls.
3. Lock: store the resulting proofs back into the cache.

Block production (produce.go) retains the synchronous locked path via buildAggregatedAttestationsFromSignedLocked(), since it needs the proofs before publishing the block.

Matches zeam's pattern of separating signatures_mutex from the fork-choice lock (forkchoice.zig:308).
31bc702 to 6da5b96
Summary
- Split AggregateCommitteeSignatures() into three lock/unlock/lock phases so the expensive leanmultisig.Aggregate() FFI call (500ms–11s) runs without holding the fork-choice mutex.
- Block production (produce.go) retains the synchronous locked path via buildAggregatedAttestationsFromSignedLocked(), since proofs are needed before publishing.

Context
When gean runs as an aggregator, AggregateCommitteeSignatures() held the global fork-choice mutex for the entire duration of proof building. With 6 participants this took 11+ seconds, blocking all consensus operations and causing gean to miss 2–3 slots per aggregation cycle. The effect snowballed: missed slots accumulated more attestations, which made the next aggregation even slower.

This was confirmed by devnet-3 logs and docker stats: gean used 322% CPU as an aggregator, while non-aggregator clients used under 2%.

After this fix, gean still occasionally syncs ~1 block when aggregation takes 3+ seconds (the proof time itself is unchanged; that is the cost of the upstream leanmultisig library), but it no longer falls behind or misses multiple slots. Without --is-aggregator, gean does not need to sync at all.

The pattern is adopted from zeam's separation of signatures_mutex from the fork-choice lock (forkchoice.zig:308).

Test Plan
- go build ./... compiles cleanly
- go vet ./... passes
- go test ./chain/forkchoice/... -count=1: forkchoice tests pass
- go test -race ./chain/forkchoice/...: no data races
- With --is-aggregator: gean stays at head, no multi-slot sync gaps
- "aggregation complete" duration unchanged (proof time is not reduced, only unblocked)
- behind_peers=false after initial catch-up

Closes fix(forkchoice): release mutex during aggregation proof building to prevent consensus stall #195