
Bound in-memory growth during long non-finalization periods #176

Closed
pablodeymo wants to merge 4 commits into main from memory-growth-non-finalization


@pablodeymo
Collaborator

Summary

  • Add safety-net pruning to DB storage: when finalization stalls, prune States, Blocks, LiveChain, signatures, and attestation data more than 1024 slots behind head
  • Evict stale pending blocks whose parents never arrive, preventing unbounded growth of pending_blocks / pending_block_parents maps
  • Return Vec directly from Store iterator methods that were already collecting internally, removing the impl Iterator facade and letting callers avoid double-collecting

Closes #103. Partially addresses #126.

Changes in detail

1. Safety-net pruning (crates/storage/src/store.rs)

Cherry-picked from the safety-net-pruning branch. When finalization is healthy this is a no-op (cutoff equals finalized slot). When finalization stalls for more than 1024 slots, it prunes:

| Table | What gets pruned |
| --- | --- |
| States | States for blocks with slot <= cutoff (protected: head, finalized, justified, safe target) |
| BlockHeaders / BlockBodies / BlockSignatures | Block data with slot <= cutoff (same protected set) |
| LiveChain | Slot index entries with slot <= cutoff |
| GossipSignatures | Signatures with slot <= cutoff |
| LatestKnownAttestations / LatestNewAttestations | Attestation data referencing pruned blocks |
| LatestKnownAggregatedPayloads / LatestNewAggregatedPayloads | Aggregated proofs with slot <= cutoff |

Cutoff formula: max(finalized_slot, head_slot - 1024)

Called once per slot at interval 0.
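
As a minimal sketch of the cutoff formula above (the function names `safety_net_cutoff` and `finalization_stalled` are hypothetical and not taken from the PR; the constant value 1024 is the one stated in this description):

```rust
/// Window size stated in the PR description.
const MAX_UNFINALIZED_SLOTS: u64 = 1024;

/// max(finalized_slot, head_slot - 1024), with the subtraction clamped at 0
/// so that a young chain (head_slot < 1024) cannot underflow.
pub fn safety_net_cutoff(finalized_slot: u64, head_slot: u64) -> u64 {
    finalized_slot.max(head_slot.saturating_sub(MAX_UNFINALIZED_SLOTS))
}

/// The safety net only has work to do once the cutoff has moved past the
/// finalized slot; otherwise regular finalization pruning already covers it.
pub fn finalization_stalled(finalized_slot: u64, head_slot: u64) -> bool {
    safety_net_cutoff(finalized_slot, head_slot) > finalized_slot
}
```

With finalized_slot = 100, the cutoff stays at 100 until head_slot exceeds 1124, which matches the "no-op when healthy" behavior described above.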

2. Evict stale pending blocks (crates/blockchain/src/lib.rs)

BlockChainServer holds three in-memory maps for orphan blocks waiting on missing parents:

pending_blocks:        HashMap<H256, HashSet<H256>>   // parent_root → children roots
pending_block_parents: HashMap<H256, H256>            // block_root → missing ancestor
pending_block_slots:   HashMap<H256, u64>             // block_root → slot (NEW)

Without eviction, if a parent never arrives (network partition, eclipse attack, pruned ancestor), these entries accumulate forever. The new pending_block_slots map tracks each pending block's slot, and prune_pending_blocks(cutoff_slot) evicts entries older than the safety-net cutoff.

Wiring:

  • Insert: when a block is pended in process_or_pend_block (both direct pend and ancestor-walk loop)
  • Remove: when a pending block's parent arrives in collect_pending_children
  • Evict: at interval 0 right after safety_net_prune(), using the same cutoff formula
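
The bookkeeping across the three maps can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `Root` stands in for H256, the `PendingBlocks` struct and its `pend` helper are hypothetical, the direct parent is treated as the missing ancestor, and the `slot <= cutoff` boundary is assumed.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for the H256 root type (assumption for this sketch).
type Root = [u8; 32];

#[derive(Default)]
pub struct PendingBlocks {
    /// parent_root -> children roots
    pending_blocks: HashMap<Root, HashSet<Root>>,
    /// block_root -> missing ancestor
    pending_block_parents: HashMap<Root, Root>,
    /// block_root -> slot
    pending_block_slots: HashMap<Root, u64>,
}

impl PendingBlocks {
    /// Record a block that is waiting on `missing_parent`.
    pub fn pend(&mut self, block_root: Root, missing_parent: Root, slot: u64) {
        self.pending_blocks.entry(missing_parent).or_default().insert(block_root);
        self.pending_block_parents.insert(block_root, missing_parent);
        self.pending_block_slots.insert(block_root, slot);
    }

    /// Evict every pending block with slot <= cutoff_slot.
    /// Returns the number of evicted blocks.
    pub fn prune_pending_blocks(&mut self, cutoff_slot: u64) -> usize {
        // Collect stale roots first so the maps are not mutated while iterating.
        let stale: HashSet<Root> = self
            .pending_block_slots
            .iter()
            .filter(|(_, slot)| **slot <= cutoff_slot)
            .map(|(root, _)| *root)
            .collect();

        for root in &stale {
            self.pending_block_slots.remove(root);
            self.pending_block_parents.remove(root);
        }
        // Drop stale children from every parent entry, then drop parents
        // that are left without children.
        self.pending_blocks.retain(|_, children| {
            children.retain(|child| !stale.contains(child));
            !children.is_empty()
        });
        stale.len()
    }

    pub fn pending_count(&self) -> usize {
        self.pending_block_slots.len()
    }

    pub fn parent_entries(&self) -> usize {
        self.pending_blocks.len()
    }
}
```

Because `retain` visits every parent entry, a stale child is removed regardless of which parent it is filed under.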

3. Honest Vec return types (crates/storage/src/store.rs, crates/blockchain/src/store.rs, crates/blockchain/tests/forkchoice_spectests.rs)

Three Store methods pretended to return lazy iterators but internally .collect() into a Vec first, then returned .into_iter():

  • iter_gossip_signatures() → impl Iterator over collected Vec
  • iter_known_aggregated_payloads() → same
  • iter_new_aggregated_payloads() → same

Changed to return Vec directly. This makes the allocation cost visible and lets callers avoid double-collecting (e.g., store.iter_gossip_signatures().collect::<Vec<_>>() was collecting a Vec that was already collected).

Updated all callers:

  • aggregate_committee_signatures: removed redundant .collect()
  • produce_block_with_signatures: removed redundant .collect()
  • update_safe_target: added .into_iter() before .collect() into HashMap
  • extract_latest_known_attestations: added .into_iter() before .map()
  • Forkchoice spectests: added .into_iter() before .map()
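
To illustrate the before/after shape of this refactor, here is a toy sketch. The `Store` below is a stand-in with placeholder key and value types, and `iter_gossip_signatures_old` is a hypothetical name for the previous version; none of this is the real storage code.

```rust
use std::collections::HashMap;

/// Toy store; the real table keys and values are more involved.
pub struct Store {
    gossip_signatures: HashMap<u64, Vec<u8>>,
}

impl Store {
    pub fn new(entries: Vec<(u64, Vec<u8>)>) -> Self {
        Store { gossip_signatures: entries.into_iter().collect() }
    }

    /// Before: collects into a Vec internally, then hides it behind
    /// `impl Iterator`, so callers that need a Vec collect a second time.
    pub fn iter_gossip_signatures_old(&self) -> impl Iterator<Item = (u64, Vec<u8>)> {
        let rows: Vec<(u64, Vec<u8>)> = self
            .gossip_signatures
            .iter()
            .map(|(k, v)| (*k, v.clone()))
            .collect();
        rows.into_iter()
    }

    /// After: the allocation is visible in the signature.
    pub fn iter_gossip_signatures(&self) -> Vec<(u64, Vec<u8>)> {
        self.gossip_signatures
            .iter()
            .map(|(k, v)| (*k, v.clone()))
            .collect()
    }
}
```

Callers that want the Vec use the return value as is; callers that want another collection add `.into_iter()` before `.collect()` or `.map()`, as in the call-site updates listed above.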

How to test

make fmt    # passes
make lint   # passes
make test   # all workspace tests pass (102/102)

In a live devnet without finalization, after 1024 slots:

  • Observe "Safety-net pruning: finalization stalled" log from the DB pruner
  • Observe "Pruned stale pending blocks" log from the in-memory pruner
  • Memory should plateau instead of growing linearly

@github-actions

github-actions bot commented Mar 2, 2026

🤖 Kimi Code Review

Security & Correctness Review

1. Critical: Pruning logic can delete finalized state

In store.rs:1054-1065, the prune_states function uses BlockHeaders table to determine which states to prune, but it doesn't verify the block is actually finalized. This could delete states for finalized blocks if the slot is ≤ cutoff_slot.

Fix: Add check to ensure root != self.latest_finalized().root before pruning states.

2. Race condition in pending block pruning

In lib.rs:495-500, the prune_pending_blocks function collects stale roots while holding references to pending_block_slots, then removes them. However, between collection and removal, new blocks could be added with the same roots.

Fix: Use drain_filter pattern or collect keys first, then remove in separate step.

3. Memory leak in pending_blocks cleanup

In lib.rs:518-523, the cleanup only removes empty parent entries from pending_blocks, but orphaned children in other parent entries aren't cleaned up when their stale roots are pruned.

Fix: Need to scan all parent entries for stale children, not just the ones being directly pruned.

4. Inconsistent iterator changes

The change from iterator to Vec return types in store.rs methods (iter_known_aggregated_payloads, iter_new_aggregated_payloads, iter_gossip_signatures) creates unnecessary allocations and breaks the lazy evaluation pattern.

Recommendation: Revert to iterator types or provide both iterator and collection methods.

5. Potential underflow in cutoff calculation

In store.rs:981, the calculation head_slot.saturating_sub(MAX_UNFINALIZED_SLOTS) would underflow if head_slot < MAX_UNFINALIZED_SLOTS, but saturating_sub clamps the result to 0 instead of panicking or wrapping.

Note: This is handled correctly, but worth documenting the edge case behavior.

6. Missing error handling in pruning

Several pruning functions (prune_states, prune_old_blocks, etc.) use expect() on database operations which could panic on I/O errors.

Fix: Propagate errors properly instead of panicking.

7. Inefficient duplicate iteration

In prune_states and prune_old_blocks, the code iterates BlockHeaders table twice - once for states and once for blocks. This could be combined.

Optimization: Single pass to collect all keys, then batch delete across tables.

8. Missing validation in safety_net_prune

The safety_net_prune function doesn't validate that cutoff_slot is reasonable (e.g., not in the future).

Fix: Add bounds checking: cutoff_slot <= head_slot.

Minor Issues

  • Typo in comment: store.rs:970 - "finalization-triggered pruning already handles it" should be "finalization-triggered pruning already handles them"
  • Inconsistent logging: Some pruning functions log counts, others don't. Standardize logging across all pruning operations.
  • Magic number: 1024 appears in both lib.rs:177 and as MAX_UNFINALIZED_SLOTS. Should use the constant consistently.

Positive Notes

  • Good addition of safety-net pruning to prevent OOM attacks
  • Proper protection of critical roots during pruning
  • Clean separation of concerns between blockchain and store layers

Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions

github-actions bot commented Mar 2, 2026

🤖 Codex Code Review

  1. High: pruning can trigger a panic in attestation target selection (remote DoS vector under stalled finalization).
    get_attestation_target assumes parent headers are always available and uses expect("parent block exists") in two walk-back loops (store.rs, store.rs, store.rs).
    The new safety-net pruning deletes old headers/states by slot (store.rs, store.rs), explicitly violating that assumption during long non-finality. If the walk crosses pruned depth, the node panics.
    Suggested fix: make get_attestation_target return Result<Checkpoint, StoreError> and handle missing parents gracefully (fallback to latest_justified/safe_target, no panic).

  2. High: safety-net pruning can break consensus liveness on valid long non-finality forks.
    safety_net_prune removes non-finalized historical blocks/states older than head - 1024 (store.rs, store.rs, store.rs).
    In 3SF/LMD-GHOST contexts with prolonged non-finality, a valid competing branch may require ancestors older than this window. After pruning, has_state(parent_root) fails and block processing can get stuck in pending/request loops instead of importing valid chain progress (lib.rs, lib.rs).
    Suggested fix: prune only finalized history, or gate this mode behind explicit degraded-mode config with clear consensus-tradeoff docs.

  3. Medium: inconsistent cutoff semantics create index/data skew at boundary slot.
    prune_live_chain deletes < cutoff (store.rs, store.rs), while header/body/state pruning deletes <= cutoff (store.rs, store.rs).
    This can leave LiveChain entries pointing to roots with missing headers at exactly cutoff_slot, which weakens invariants and can cause avoidable Unknown*Block failures in attestation validation paths.
    Suggested fix: make boundary conditions consistent across all pruners.

  4. Medium: performance regression risk from full-table scans every slot during stalled finalization.
    Under non-finality, on_tick runs pruning at each slot start (lib.rs), and safety_net_prune scans BlockHeaders twice (prune_states + prune_old_blocks) (store.rs, store.rs).
    Suggested fix: single-pass key collection reused across state/header deletes, or keep a slot index for old roots to avoid O(N) scans per slot.
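
For item 1, a panic-free walk-back could look roughly like the sketch below. It is a simplified illustration only: `walk_back`, `attestation_target_or_fallback`, the `Header` struct, and the `MissingParent` variant are hypothetical, and the real get_attestation_target logic is not reproduced here.

```rust
use std::collections::HashMap;

/// Stand-in for the H256 root type (assumption for this sketch).
type Root = [u8; 32];

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Checkpoint {
    pub root: Root,
    pub slot: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum StoreError {
    /// A header needed by the walk was not found (for example, pruned).
    MissingParent(Root),
}

pub struct Header {
    pub slot: u64,
    pub parent_root: Root,
}

/// Walk parent links from `start` until reaching a block at or below
/// `target_slot`. A missing header becomes an error instead of a panic.
pub fn walk_back(
    headers: &HashMap<Root, Header>,
    start: Root,
    target_slot: u64,
) -> Result<Checkpoint, StoreError> {
    let mut root = start;
    loop {
        let header = headers.get(&root).ok_or(StoreError::MissingParent(root))?;
        if header.slot <= target_slot {
            return Ok(Checkpoint { root, slot: header.slot });
        }
        root = header.parent_root;
    }
}

/// Fall back to the latest justified checkpoint when the walk crosses
/// pruned history.
pub fn attestation_target_or_fallback(
    headers: &HashMap<Root, Header>,
    head: Root,
    target_slot: u64,
    latest_justified: Checkpoint,
) -> Checkpoint {
    walk_back(headers, head, target_slot).unwrap_or(latest_justified)
}
```

Whether falling back to the justified checkpoint is acceptable from a consensus point of view is a separate question that this sketch does not answer.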

Assumptions:

  • I could not run cargo check in this environment because rustup cannot write to /home/runner/.rustup/tmp (permission denied), so this is static review only.

Brief positive note:

  • The pending-block memory bound addition is directionally good (pending_block_slots + pruning) and addresses a real OOM class; it just needs stronger safety around consensus invariants and panic-free fallback paths.

Automated review by OpenAI Codex · custom prompt

@github-actions

github-actions bot commented Mar 2, 2026

🤖 Claude Code Review


PR Review: Bound in-memory growth during long non-finalization periods

Overall the PR is well-motivated and the implementation is thoughtful. The three-part change (DB safety-net, pending-block eviction, Vec return types) is clearly decomposed. Below are the issues I found, ordered by severity.


Correctness / Potential Bugs

1. prune_live_chain off-by-one creates dangling LiveChain references

prune_live_chain uses a strict < boundary (slot < finalized_slot), while prune_states and prune_old_blocks use <=:

// store.rs:447 — prunes slot < cutoff_slot, keeps slot == cutoff_slot
.take_while(|(k, _)| {
    let (slot, _) = decode_live_chain_key(k);
    slot < finalized_slot  // strict <
})
// store.rs:1040 — prunes slot <= cutoff_slot
if header.slot <= cutoff_slot {

When safety_net_prune calls both with the same cutoff_slot, blocks at exactly cutoff_slot have their headers, bodies, and states deleted but their LiveChain entry preserved. The LiveChain then holds a dangling reference for that slot. During a stall this entry is unlikely to be traversed, but it's an inconsistency. Consider using the same boundary (<) in prune_states / prune_old_blocks, or adjust prune_live_chain to use <= when called from the safety-net path.
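
The boundary mismatch can be reproduced with a toy in-memory model. This is a sketch under assumptions: `Root` is shrunk to a single byte, and the two tables are plain std collections rather than the real DB tables.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy root type; the real one is a 32-byte hash.
type Root = u8;

/// Mirrors the strict boundary: deletes entries with slot < cutoff.
pub fn prune_live_chain(live_chain: &mut BTreeSet<(u64, Root)>, cutoff: u64) {
    live_chain.retain(|(slot, _)| *slot >= cutoff);
}

/// Mirrors the inclusive boundary: deletes headers with slot <= cutoff.
pub fn prune_headers(headers: &mut HashMap<Root, u64>, cutoff: u64) {
    headers.retain(|_, slot| *slot > cutoff);
}

/// LiveChain entries whose header no longer exists.
pub fn dangling(
    live_chain: &BTreeSet<(u64, Root)>,
    headers: &HashMap<Root, u64>,
) -> Vec<(u64, Root)> {
    live_chain
        .iter()
        .filter(|(_, root)| !headers.contains_key(root))
        .copied()
        .collect()
}
```

With blocks at slots 9, 10, and 11 and a cutoff of 10, the slot-10 entry survives in the LiveChain set while its header is gone. Using the same comparison in both pruners leaves no dangling entry.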


2. Non-atomic pruning — possible cross-table inconsistency on crash

safety_net_prune issues 7 separate write batches (one per prune_* call). A crash between any two leaves the store partially pruned:

  • prune_states commits → crash → prune_old_blocks never runs → state deleted but block header/body still present (or vice versa after the next run)
  • prune_old_blocks commits → crash → prune_live_chain never runs → block headers deleted but LiveChain entry remains

For a safety-net path this may be acceptable, but a single atomic batch would eliminate the window. Worth a comment at minimum explaining the deliberate choice.


3. Magic number 1024 duplicated in lib.rs

MAX_UNFINALIZED_SLOTS is defined in crates/storage/src/store.rs but the cutoff formula in lib.rs:177 hardcodes 1024:

// lib.rs:177
let cutoff = finalized_slot.max(self.store.head_slot().saturating_sub(1024));

If MAX_UNFINALIZED_SLOTS is ever changed, lib.rs won't follow. Either make the constant pub and re-use it, or expose a helper like Store::safety_net_cutoff() -> u64 that encapsulates the formula.


4. prune_states and prune_old_blocks scan BlockHeaders twice

Both functions are byte-for-byte identical in their scan/filter logic (lines 1026–1046 and 1062–1081). That's two full BlockHeaders table scans where one would suffice. The keys-to-delete set is the same in both cases; they only differ in which tables they delete from. A single shared helper that returns the prunable key list would halve the I/O:

fn collect_prunable_block_roots(
    &self, cutoff_slot: u64, protected: &HashSet<H256>
) -> Vec<Vec<u8>> { ... }
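
One way such a shared helper might be filled in, sketched against in-memory maps instead of the real tables. Assumptions: `Root` stands in for H256, the helper is a free function returning `Vec<Root>` instead of raw key bytes, and `prune_states_and_blocks` is a hypothetical caller.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for the H256 root type (assumption for this sketch).
type Root = [u8; 32];

/// Single scan over the headers: every root at or below the cutoff that is
/// not in the protected set.
pub fn collect_prunable_block_roots(
    headers: &HashMap<Root, u64>,
    cutoff_slot: u64,
    protected: &HashSet<Root>,
) -> Vec<Root> {
    headers
        .iter()
        .filter(|(root, slot)| **slot <= cutoff_slot && !protected.contains(*root))
        .map(|(root, _)| *root)
        .collect()
}

/// Reuse the same key list for both deletes, so the scan happens once.
pub fn prune_states_and_blocks(
    states: &mut HashMap<Root, Vec<u8>>,
    headers: &mut HashMap<Root, u64>,
    cutoff_slot: u64,
    protected: &HashSet<Root>,
) -> usize {
    let roots = collect_prunable_block_roots(headers, cutoff_slot, protected);
    for root in &roots {
        states.remove(root);
        headers.remove(root);
    }
    roots.len()
}
```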

Minor Issues

5. Log condition silently drops signature-only pruning

// store.rs:1007
if pruned_states > 0 || pruned_blocks > 0 || pruned_chain > 0 {

pruned_sigs and pruned_att_data are not part of the condition. If only gossip signatures or attestation data are pruned (plausible on first activation if blocks are somehow already gone), the log is suppressed. Consider || pruned_sigs > 0 || pruned_att_data > 0.


6. Redundant .into_iter() calls at call sites

After changing the return types to Vec, several call sites add .into_iter() before .map(...). That's fine since Vec doesn't implement Iterator directly, so the chain is correct. However, the for loop in update_safe_target:

for (key, new_proofs) in store.iter_new_aggregated_payloads() { ... }

works without .into_iter() because for calls it implicitly. The explicit .into_iter() in the test file and extract_latest_known_attestations is consistent and readable; just noting it for completeness.


What's Good

  • The no-op guard (if cutoff_slot <= finalized_slot { return; }) is correct and efficient — no work on a healthy chain.
  • The protected_roots set (head, finalized, justified, safe_target) covers the critical roots that fork choice needs even in a stalled state.
  • prune_pending_blocks correctly maintains all three maps atomically in-memory: pending_block_slots, pending_block_parents, and the retain-based cleanup of pending_blocks. There's no leak in the map-level bookkeeping.
  • Evicted pending block DB entries are handled: since insert_pending_block writes to BlockHeaders/BlockBodies/BlockSignatures (not LiveChain), prune_old_blocks will find and delete them on the same tick via the BlockHeaders scan.
  • The Vec return-type refactor is correct. The old code was doing a Vec allocation inside the method and then wrapping it in an iterator; callers then re-collected. Now the allocation is honest and visible.
  • The take_while early-exit in prune_live_chain is a legitimate optimization since LiveChain keys are big-endian slot-prefixed and therefore sorted.
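
The last point depends on big-endian encoding making byte order agree with numeric slot order. A small sketch of that property, assuming a hypothetical key layout of an 8-byte big-endian slot followed by the 32-byte root (the helper names are illustrative, and a `BTreeMap` stands in for the sorted table):

```rust
use std::collections::BTreeMap;

/// Hypothetical key layout: 8-byte big-endian slot, then the 32-byte root.
pub fn encode_live_chain_key(slot: u64, root: &[u8; 32]) -> Vec<u8> {
    let mut key = slot.to_be_bytes().to_vec();
    key.extend_from_slice(root);
    key
}

/// Read the slot back from the first 8 bytes of a key.
pub fn decode_slot(key: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&key[..8]);
    u64::from_be_bytes(buf)
}

/// Keys with slot < cutoff. Because keys iterate in ascending slot order,
/// the scan can stop at the first key at or past the cutoff.
pub fn keys_below(table: &BTreeMap<Vec<u8>, ()>, cutoff: u64) -> Vec<Vec<u8>> {
    table
        .keys()
        .take_while(|k| decode_slot(k.as_slice()) < cutoff)
        .cloned()
        .collect()
}
```

With little-endian keys the early exit would be unsound: slot 256 encodes with a leading 0x00 byte and would sort before slot 1.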

Automated review by Claude (Anthropic) · custom prompt

@greptile-apps
Contributor

greptile-apps bot commented Mar 2, 2026

Greptile Summary

Implemented comprehensive memory management safeguards to prevent OOM during extended non-finalization periods. The changes add safety-net pruning to the database layer that automatically prunes data older than 1024 slots behind head when finalization stalls, and introduces age-based eviction for in-memory pending blocks whose parents never arrive.

Key improvements:

  • Safety-net DB pruning triggered at interval 0 prevents unbounded growth of States, Blocks, LiveChain, signatures, and attestation data when finalization is stalled for >1024 slots
  • New pending_block_slots map tracks pending block ages, enabling eviction of stale orphan blocks from memory maps
  • Iterator methods now honestly return Vec instead of impl Iterator, eliminating internal double-collection and making allocation costs visible to callers
  • All workspace tests pass (102/102), changes are backward-compatible

Confidence Score: 4/5

  • This PR is safe to merge with low risk - addresses a real OOM vulnerability with well-scoped changes
  • The implementation is clean and well-tested, reusing existing proven pruning methods for the safety-net mechanism. The API changes (Vec return types) are straightforward refactorings with all callers properly updated. Minor confidence reduction due to limited ability to verify edge cases in pending block eviction logic without live devnet testing under non-finalization conditions
  • Pay close attention to crates/blockchain/src/lib.rs for the pending block eviction logic, particularly the interaction between the three maps during pruning

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/blockchain/src/lib.rs | Adds pending block eviction mechanism with new pending_block_slots map and prune_pending_blocks method called at interval 0 |
| crates/storage/src/store.rs | Implements safety-net pruning for stalled finalization and changes iterator methods to return Vec directly instead of impl Iterator |
| crates/blockchain/src/store.rs | Updates callers of changed iterator methods by removing redundant .collect() calls or adding .into_iter() where needed |
| crates/blockchain/tests/forkchoice_spectests.rs | Updates test code to handle new Vec return types from iterator methods by adding .into_iter() before .map() |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Slot Tick - Interval 0] --> B{Calculate Cutoff}
    B --> C[cutoff = max finalized_slot, head_slot - 1024]
    C --> D{cutoff > finalized_slot?}
    D -->|No - Finalization Healthy| E[Skip Safety-Net Pruning]
    D -->|Yes - Finalization Stalled| F[Store.safety_net_prune]
    F --> G[Build Protected Roots Set]
    G --> H[head, finalized, justified, safe_target]
    H --> I[Prune States slot ≤ cutoff]
    I --> J[Prune Blocks slot ≤ cutoff]
    J --> K[Prune LiveChain slot ≤ cutoff]
    K --> L[Prune Gossip Signatures]
    L --> M[Prune Attestation Data]
    M --> N[Prune Aggregated Payloads]
    N --> O[BlockChain.prune_pending_blocks cutoff]
    O --> P[Find Stale Pending Blocks slot ≤ cutoff]
    P --> Q[Remove from pending_block_slots]
    Q --> R[Remove from pending_block_parents]
    R --> S[Clean from pending_blocks children]
    S --> T[Remove Empty Parent Entries]
    T --> U[Log Pruning Stats]

Last reviewed commit: 6d62a2a

  When the chain runs without finalization (e.g., insufficient aggregators),
  all pruning is disabled since every prune function gates on finalized_slot
  advancing. The States table has no pruning at all, and each state is 100+ MB.
  After ~12 hours without finalization this can reach terabytes of data.

  Add a safety-net that computes cutoff_slot = max(finalized_slot,
  head_slot - 1024) and prunes states, blocks, live chain, signatures,
  attestation data, and aggregated payloads older than the cutoff. Protected
  roots (head, finalized, justified, safe_target) are never pruned. When
  finalization is healthy, cutoff equals finalized_slot and this is a no-op.

  Runs once per slot at interval 0 in on_tick, after tick processing but
  before block proposal.

When a block's parent never arrives (network partition, attack), entries in
pending_blocks and pending_block_parents accumulate forever. Add a
pending_block_slots map to track each pending block's slot, then evict
entries older than the safety-net cutoff (max of finalized_slot and
head_slot - 1024) at interval 0, right after safety_net_prune().

iter_gossip_signatures, iter_known_aggregated_payloads, and
iter_new_aggregated_payloads all .collect() into a Vec internally then
return .into_iter(). Return the Vec directly to make the allocation cost
visible and let callers avoid double-collecting.

…4 constant

  safety_net_prune already computes the cutoff using MAX_UNFINALIZED_SLOTS.
  Have it return the value so prune_pending_blocks can reuse it instead of
  recomputing with a hardcoded 1024.

@pablodeymo force-pushed the memory-growth-non-finalization branch from 6d62a2a to eddf01c on March 2, 2026 20:29

@MegaRedHand
Collaborator

We'll try another approach

@MegaRedHand closed this on Mar 2, 2026
@MegaRedHand deleted the memory-growth-non-finalization branch on March 2, 2026 20:53
Development

Successfully merging this pull request may close these issues.

Handle long periods of non-finalization