fix: skip pre-finalized attestations instead of aborting block import #692
ch4r10t33r merged 5 commits into main (Mar 25, 2026)
When the SSZ state grows beyond ~3 MB, the server switches from sending a `Content-Length` response to `Transfer-Encoding: chunked`. The previous body-reading loop called `readSliceShort`, which internally goes through `readSliceShort` → `readVec` → `defaultReadVec` → `contentLengthStream`. `contentLengthStream` accesses `reader.state.body_remaining_content_length`, but that union field is not active for chunked responses (the state is `ready`), causing a panic:
thread 1 panic: access of union field 'body_remaining_content_length' while field 'ready' is active
Replace the manual request/response loop with `client.fetch()`, using a `std.Io.Writer.Allocating` as the `response_writer`. `fetch()` calls `response.readerDecompressing()` + `streamRemaining()`, which dispatches through `chunkedStream` or `contentLengthStream` correctly, based on the transfer encoding the server actually used.
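The panic comes from reading a body without first checking which framing the server chose. The Zig sketch below is not `std.http`'s implementation (the actual fix delegates to `client.fetch()`); it only illustrates the idea with a hypothetical `Framing` tagged union and hypothetical helpers `decodeBody`, `copyFixed`, and `decodeChunked`. It assumes the whole raw body is already in memory and does not handle chunk extensions or trailers.

```zig
const std = @import("std");

/// Hypothetical framing tag: a body reader must branch on the transfer
/// encoding the server actually used.
const Framing = union(enum) {
    content_length: usize,
    chunked,
};

/// Decode an in-memory HTTP body into `out`, returning the byte count.
fn decodeBody(framing: Framing, raw: []const u8, out: []u8) !usize {
    // Switching on the tag only exposes the payload of the active field,
    // so a chunked body can never be read through `content_length`.
    switch (framing) {
        .content_length => |len| return copyFixed(raw, out, len),
        .chunked => return decodeChunked(raw, out),
    }
}

/// Content-Length framing: copy exactly `len` bytes.
fn copyFixed(raw: []const u8, out: []u8, len: usize) !usize {
    if (raw.len < len) return error.ShortBody;
    if (out.len < len) return error.BufferTooSmall;
    @memcpy(out[0..len], raw[0..len]);
    return len;
}

/// Minimal chunked decoder: `<hex size>\r\n<data>\r\n` repeated, ending with
/// a zero-size chunk. Chunk extensions and trailers are not supported.
fn decodeChunked(raw: []const u8, out: []u8) !usize {
    var pos: usize = 0;
    var written: usize = 0;
    while (true) {
        const eol = std.mem.indexOfPos(u8, raw, pos, "\r\n") orelse return error.BadChunk;
        const size = try std.fmt.parseInt(usize, raw[pos..eol], 16);
        pos = eol + 2;
        if (size == 0) return written; // last chunk; any trailer section is ignored
        if (pos + size + 2 > raw.len) return error.BadChunk;
        if (written + size > out.len) return error.BufferTooSmall;
        @memcpy(out[written..][0..size], raw[pos..][0..size]);
        written += size;
        pos += size;
        // Each chunk's data must be followed by CRLF.
        if (!std.mem.eql(u8, raw[pos..][0..2], "\r\n")) return error.BadChunk;
        pos += 2;
    }
}
```

A `switch` on a tagged union binds only the active payload, which rules out the class of bug behind the panic: reading a Content-Length counter while the response is actually chunked.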
After checkpoint sync, incoming blocks may carry attestations whose target slots predate the finalized anchor. `IsJustifiableSlot` correctly identifies these as non-justifiable, but the `try` propagated the error fatally, causing the entire block import to fail. This creates a cascading gap: block N fails → blocks N+1..M fail (missing parent) → no epoch-boundary attestations accumulate → the justified checkpoint never advances → forkchoice stays stuck in `initing` indefinitely.
Fix: catch `InvalidJustifiableSlot` and treat it as `false`. The attestation is then silently skipped via the existing `!is_target_justifiable` check, exactly as all other non-viable attestations (unknown source/target/head, stale slot, etc.) are handled. The block imports successfully, the chain catches up, and the node exits the `initing` state.
Update the test that asserted the old (buggy) error-propagation behaviour so that it instead asserts that `process_attestations` succeeds.
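A minimal Zig sketch of the `try` versus `catch false` difference. `isJustifiableSlot`, `Attestation`, `processAttestationsStrict`, and `processAttestations` are hypothetical simplifications, not the project's actual code. In particular, the stand-in treats every slot at or after the finalized slot as justifiable, which is not the real justifiability rule; it exists only to produce the error for pre-finalized targets.

```zig
const std = @import("std");

const StateTransitionError = error{InvalidJustifiableSlot};

/// Hypothetical attestation: only the field this sketch needs.
const Attestation = struct { target_slot: u64 };

/// Simplified stand-in for utils.IsJustifiableSlot: errors on pre-finalized
/// targets and (unlike the real rule) accepts every later slot.
fn isJustifiableSlot(finalized_slot: u64, target_slot: u64) StateTransitionError!bool {
    if (target_slot < finalized_slot) return error.InvalidJustifiableSlot;
    return true;
}

/// Old behaviour: `try` turns one bad attestation into a failed block import.
fn processAttestationsStrict(finalized_slot: u64, atts: []const Attestation) StateTransitionError!usize {
    var applied: usize = 0;
    for (atts) |att| {
        const is_target_justifiable = try isJustifiableSlot(finalized_slot, att.target_slot);
        if (!is_target_justifiable) continue;
        applied += 1;
    }
    return applied;
}

/// Fixed behaviour: `catch false` routes the attestation into the existing skip path.
fn processAttestations(finalized_slot: u64, atts: []const Attestation) usize {
    var applied: usize = 0;
    for (atts) |att| {
        const is_target_justifiable = isJustifiableSlot(finalized_slot, att.target_slot) catch false;
        if (!is_target_justifiable) continue; // skipped like any other non-viable attestation
        applied += 1;
    }
    return applied;
}
```

Because the stand-in's error set contains only `InvalidJustifiableSlot`, `catch false` here is equivalent to catching that one error. If the real function can fail in other ways, switching on the caught error would keep those other failures fatal.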
After checkpoint sync, the forkchoice starts in the `initing` state and waits for a first justified checkpoint before declaring itself ready. The status-response sync handler checked `getSyncStatus()` and treated `fc_initing` the same as `synced`, doing nothing. This created a deadlock: the node never requested blocks from ahead peers because it was in `fc_initing`, and it could never leave `fc_initing` because no blocks were imported.
Fix the deadlock in two places:
1. Status-response handler: add an explicit `fc_initing` branch that requests the peer's head block when the peer is ahead of our anchor slot. This mirrors the `behind_peers` branch but uses `head_slot` for the comparison (`finalized_slot` is not yet meaningful in `fc_initing`).
2. Periodic sync refresh: every `SYNC_STATUS_REFRESH_INTERVAL_SLOTS` (8) slots, re-send our status to all connected peers when not synced. This covers the case where all peers were already connected before the fix was deployed: no new connection event fires, so the status-response handler would otherwise never be re-triggered.
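The two decisions described above can be sketched as pure functions. `SyncStatus`, `PeerStatus`, `shouldRequestHead`, and `shouldRefreshStatus` are hypothetical names, and the real handler sends a block request instead of returning a `bool`. The `behind_peers` comparison on finalized slots is an assumption inferred from the note that `finalized_slot` is not yet meaningful in `fc_initing`.

```zig
const std = @import("std");

// Hypothetical types: the real node.zig definitions are not shown here.
const SyncStatus = enum { synced, fc_initing, behind_peers };

const PeerStatus = struct { head_slot: u64, finalized_slot: u64 };

// Value taken from the commit message.
const SYNC_STATUS_REFRESH_INTERVAL_SLOTS: u64 = 8;

/// Should we request the peer's head block after its status response?
fn shouldRequestHead(status: SyncStatus, anchor_slot: u64, our_finalized_slot: u64, peer: PeerStatus) bool {
    return switch (status) {
        // Previously fc_initing was grouped with synced here, causing the deadlock.
        .synced => false,
        // Assumed shape of the existing branch: compare finalized slots.
        .behind_peers => peer.finalized_slot > our_finalized_slot,
        // New branch: finalized_slot is not meaningful yet, so compare the
        // peer's head slot against our checkpoint anchor.
        .fc_initing => peer.head_slot > anchor_slot,
    };
}

/// Periodic refresh: re-send status every N slots while not synced.
fn shouldRefreshStatus(status: SyncStatus, slot: u64) bool {
    return status != .synced and slot % SYNC_STATUS_REFRESH_INTERVAL_SLOTS == 0;
}
```

The refresh is deliberately independent of connection events: even if every peer connected before the fix, a node still in `fc_initing` re-sends its status every 8 slots, which re-triggers the status-response path.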
g11tech approved these changes (Mar 25, 2026)
Problem
Two bugs prevented a checkpoint-synced node from ever catching up to the network head.
Bug 1: Skip pre-finalized attestations (already merged in previous commit)
`process_attestations` called `try utils.IsJustifiableSlot(...)`, which returned `StateTransitionError.InvalidJustifiableSlot` for attestations referencing slots that are legal but fall outside a justifiable range. This aborted the entire block import.
Fix: use `catch false` so that invalid attestations are skipped rather than failing the block.
Bug 2: `fc_initing` deadlock (this PR)
After checkpoint sync, the forkchoice starts in the `.initing` state and waits for a first justified checkpoint before declaring itself ready. The status-response sync handler called `getSyncStatus()` and treated `fc_initing` the same as `.synced`, doing nothing.
This created a hard deadlock:
- The node would not request blocks because it was in `fc_initing`.
- The node could not import blocks, so it stayed in `fc_initing`.
Fix
1. Status-response handler (`node.zig`): add an explicit `fc_initing` branch that requests the peer's head block when the peer's head slot is ahead of our anchor. This mirrors the `behind_peers` branch and breaks the deadlock.
2. Periodic sync refresh (`node.zig`, `constants.zig`): every `SYNC_STATUS_REFRESH_INTERVAL_SLOTS` (8 slots ≈ 32 s), re-send our status to all connected peers when in `fc_initing` or `behind_peers`. This recovers from the case where all peers were already connected before the fix was deployed; without it, no new connection event fires and the status-response handler is never re-triggered for existing peers.
Test plan
- `Head Slot` advances past the anchor within ~30 s of startup
- `"peer … is ahead during fc init … requesting head block"` appears in the logs
- `forkchoice` transitions from `initing` to `ready` and validator duties activate