fix: skip pre-finalized attestations instead of aborting block import by ch4r10t33r · Pull Request #692 · blockblaz/zeam

ch4r10t33r · 2026-03-24T21:06:57Z

Problem

Two bugs prevented a checkpoint-synced node from ever catching up to the network head.

Bug 1: Skip pre-finalized attestations (already merged in previous commit)

process_attestations called try utils.IsJustifiableSlot(...), which returned StateTransitionError.InvalidJustifiableSlot for attestations referencing slots that are legal but fall outside a justifiable range. This aborted the entire block import. Fix: use catch false so invalid attestations are skipped rather than failing the block.

Bug 2: `fc_initing` deadlock (this PR)

After checkpoint sync the forkchoice starts in the .initing state and waits for a first justified checkpoint before declaring itself ready. The status-response sync handler was calling getSyncStatus() and treating fc_initing the same as .synced — doing nothing:

.synced, .no_peers, .fc_initing => {},

This created a hard deadlock:

The node needed blocks to exit fc_initing.
The sync code refused to fetch blocks while in fc_initing.

Fix

Status-response handler (node.zig): add an explicit fc_initing branch that requests the peer's head block when the peer's head slot is ahead of our anchor. This mirrors the behind_peers branch and breaks the deadlock.

Periodic sync refresh (node.zig, constants.zig): every SYNC_STATUS_REFRESH_INTERVAL_SLOTS (8 slots ≈ 32 s), re-send our status to all connected peers when in fc_initing or behind_peers. This recovers from the case where all peers were already connected before the fix was deployed — without this, no new connection event fires and the status-response handler is never re-triggered for existing peers.

Test plan

Deploy image, checkpoint-sync to a non-genesis slot, verify Head Slot advances past the anchor within ~30 s of startup
Verify the log line "peer … is ahead during fc init … requesting head block" appears in the logs
Verify forkchoice transitions from initing to ready and validator duties activate

When the SSZ state grows beyond ~3 MB the server switches from sending a Content-Length response to Transfer-Encoding: chunked. The previous body-reading loop called readSliceShort which internally goes through: readSliceShort → readVec → defaultReadVec → contentLengthStream contentLengthStream accesses reader.state.body_remaining_content_length but that field is not active for chunked responses (state is 'ready'), causing a panic: thread 1 panic: access of union field 'body_remaining_content_length' while field 'ready' is active Replace the manual request/response loop with client.fetch() using a std.Io.Writer.Allocating as the response_writer. fetch() calls response.readerDecompressing() + streamRemaining() which dispatches through chunkedStream or contentLengthStream correctly based on the actual transfer encoding used by the server.

After checkpoint sync, incoming blocks may carry attestations whose target slots predate the finalized anchor. IsJustifiableSlot correctly identifies these as non-justifiable, but the `try` was propagating the error fatally, causing the entire block import to fail. This creates a cascading gap: block N fails → blocks N+1..M fail (missing parent) → no epoch-boundary attestations accumulate → justified checkpoint never advances → forkchoice stays stuck in `initing` indefinitely. Fix: catch InvalidJustifiableSlot and treat it as `false`. The attestation is then silently skipped via the existing !is_target_justifiable check, exactly as all other non-viable attestations (unknown source/target/head, stale slot, etc.) are handled. The block imports successfully, the chain catches up, and the node exits the initing state. Update the test that was asserting the old (buggy) error-propagation behaviour to instead assert that process_attestations succeeds.

After checkpoint sync the forkchoice starts in the initing state and waits for a first justified checkpoint before declaring itself ready. The status-response sync handler was checking getSyncStatus() and treating fc_initing the same as synced — doing nothing. This created a deadlock: the node never requested blocks from ahead peers because it was in fc_initing, and it could never leave fc_initing because no blocks were imported. Fix the deadlock in two places: 1. Status-response handler: add an explicit fc_initing branch that requests the peer's head block when the peer is ahead of our anchor slot. This mirrors the behind_peers branch but uses head_slot for the comparison (finalized_slot is not yet meaningful in fc_initing). 2. Periodic sync refresh: every SYNC_STATUS_REFRESH_INTERVAL_SLOTS (8) slots, re-send our status to all connected peers when not synced. This recovers from the case where all peers were already connected before the fix was deployed, so no new connection event fires and the status-response handler would never be re-triggered.

ch4r10t33r and others added 3 commits March 23, 2026 15:25

Merge branch 'main' into fix/checkpoint-sync-chunked-body

a47b037

ch4r10t33r marked this pull request as ready for review March 24, 2026 21:38

ch4r10t33r requested review from anshalshukla and g11tech March 24, 2026 21:38

ch4r10t33r changed the title ~~types: skip pre-finalized attestations instead of aborting block import~~ fix: skip pre-finalized attestations instead of aborting block import Mar 24, 2026

g11tech approved these changes Mar 25, 2026

View reviewed changes

Merge branch 'main' into fix/checkpoint-sync-chunked-body

ec599fc

ch4r10t33r merged commit 437256b into main Mar 25, 2026
13 of 22 checks passed

ch4r10t33r deleted the fix/checkpoint-sync-chunked-body branch March 25, 2026 08:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: skip pre-finalized attestations instead of aborting block import#692

fix: skip pre-finalized attestations instead of aborting block import#692
ch4r10t33r merged 5 commits intomainfrom
fix/checkpoint-sync-chunked-body

ch4r10t33r commented Mar 24, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ch4r10t33r commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Bug 1: Skip pre-finalized attestations (already merged in previous commit)

Bug 2: fc_initing deadlock (this PR)

Fix

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ch4r10t33r commented Mar 24, 2026 •

edited

Loading

Bug 2: `fc_initing` deadlock (this PR)