crash: integer overflow in SSZ serializedSize during processCachedDescendants (catch-up sync) #696

@zclawz

Description

Crash: integer overflow in SSZ serializedSize during processCachedDescendants

Severity: Critical — node crashes repeatedly during catch-up sync, preventing participation

Stack Trace

thread 1 panic: integer overflow
ssz-0.0.9/src/lib.zig:45:9: 0x17d5ede in serializedSize__anon_10551 (zeam)
ssz-0.0.9/src/lib.zig:89:43: 0x18e5e50 in serialize__anon_1654536 (zeam)
pkgs/database/src/rocksdb.zig:209:30: 0x18ebff3 in onBlock (zeam)
pkgs/node/src/node.zig:410:57: 0x1a99f30 in processCachedDescendants (zeam)
pkgs/node/src/node.zig:489:42: 0x17f1254 in onInterval (zeam)
pkgs/node/src/utils.zig:42:33: 0x1b6c398 in callback (zeam)
libxev/src/backend/io_uring.zig:806:29: 0x17ba9eb in run (zeam)
pkgs/cli/src/node.zig:426:27: 0x181404a in run (zeam)
pkgs/cli/src/main.zig:780:26: 0x181c8bb in mainInner (zeam)
pkgs/cli/src/main.zig:241:14: 0x17b7fdd in main (zeam)

The crash is consistent: every restart produces the same stack trace and hits the same code path (processCachedDescendants → onBlock → ssz serialize).

Observed Behaviour

  1. Node starts with checkpoint sync from slot 10570.
  2. Connects to peers and begins catch-up block sync.
  3. During processCachedDescendants (triggered by onInterval), the SSZ serializedSize computation overflows an integer when attempting to serialize a block for RocksDB storage.
  4. Process panics and restarts. This repeats on every restart — the problematic block is re-fetched and re-encountered on each run.

Additional Errors (same session)

  • error.UnknownSourceBlock — repeated across many blocks during attestation validation in chain.zig
  • error.OutOfMemory, alongside "Failed to process cached block" warnings, during the same catch-up window

Likely Root Cause

The SSZ library uses u32 offsets, so serializedSize can overflow if a block's serialized size exceeds 2^32 - 1 bytes (just under 4 GiB). The most plausible cause is a block with an abnormally large list field (e.g. attestations, deposits, or BLS changes); the node may be accepting or constructing a malformed or oversized block during sync.

It could also indicate that multiple blocks are being concatenated, or that a slice is not properly bounded before being passed to serialize.
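
To illustrate the failure mode: in Zig's safety-checked build modes (Debug, ReleaseSafe), a plain `+` on a u32 that exceeds the type's range panics with "integer overflow", which matches the message in the trace. The sketch below shows how a size accumulation could report the condition as an error instead. `checkedTotalSize` is a hypothetical helper written for this issue; it is not the ssz library's serializedSize, and whether the real overflow comes from an addition like this is an assumption.

```zig
const std = @import("std");

/// Hypothetical helper (not part of the ssz library): sums per-field
/// sizes into a u32 total. std.math.add returns error.Overflow when the
/// sum no longer fits, instead of triggering the "integer overflow" panic.
fn checkedTotalSize(field_sizes: []const u32) error{Overflow}!u32 {
    var total: u32 = 0;
    for (field_sizes) |size| {
        total = try std.math.add(u32, total, size);
    }
    return total;
}
```

With this shape, a caller sees error.Overflow for an oversized input and can decide how to handle it, rather than the whole process panicking.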

Reproduction

  • Checkpoint sync from https://leanpoint.leanroadmap.org/lean/v0/states/finalized (slot 10570)
  • Devnet3 with 9 peers (ethlambda ×5, gean ×1, nlean ×1, qlean ×1, lantern ×1)
  • Crash occurs reproducibly within ~1 minute of startup

Suggested Fix

  1. Add a bounds check in rocksdb.zig:onBlock before calling ssz.serialize: validate that the block's size fits in the u32 range and, if it does not, log the failure and skip the block.
  2. Investigate why a block in the devnet3 chain has an abnormally large serialized size.
  3. Check if processCachedDescendants is accumulating blocks/data across iterations without bounds.
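
A minimal sketch of the guard proposed in item 1, assuming the lengths of a block's variable-size parts can be collected before serialization. `checkedBlockSize`, `shouldStoreBlock`, and `error.BlockTooLarge` are hypothetical names chosen for illustration; none of them exist in zeam or the ssz library. The total is accumulated in a u64 so the check itself cannot overflow at the u32 boundary.

```zig
const std = @import("std");

const SizeError = error{BlockTooLarge};

/// Hypothetical pre-check: adds up part lengths in a u64 and rejects
/// any total that does not fit in the u32 range.
fn checkedBlockSize(part_lens: []const u64) SizeError!u32 {
    var total: u64 = 0;
    for (part_lens) |len| {
        total = std.math.add(u64, total, len) catch return error.BlockTooLarge;
    }
    // std.math.cast yields null when the value does not fit the target type.
    return std.math.cast(u32, total) orelse return error.BlockTooLarge;
}

/// Hypothetical caller: logs and skips oversized blocks instead of panicking.
fn shouldStoreBlock(part_lens: []const u64) bool {
    _ = checkedBlockSize(part_lens) catch |err| {
        std.log.warn("skipping block: {s}", .{@errorName(err)});
        return false;
    };
    return true;
}
```

Skipping only stops the crash loop. It does not explain why the block is oversized, so items 2 and 3 still need investigation.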

Environment

  • zeam devnet3 (v0.3.3)
  • Checkpoint sync mode
  • attestation-committee-count: not set (defaults to spec value)

Labels: bug