5 changes: 5 additions & 0 deletions crates/blockchain/src/lib.rs
@@ -167,6 +167,11 @@ impl BlockChainServer {
.inspect_err(|err| error!(%err, "Failed to publish aggregated attestation"));
}

// Safety-net pruning once per slot: prevents OOM when finalization is stalled
if interval == 0 {
self.store.safety_net_prune();
}

// Now build and publish the block (after attestations have been accepted)
if let Some(validator_id) = proposer_validator_id {
self.propose_block(slot, validator_id);
136 changes: 135 additions & 1 deletion crates/storage/src/store.rs
@@ -24,7 +24,7 @@ use ethlambda_types::{
signature::ValidatorSignature,
state::{ChainConfig, State},
};
- use tracing::info;
+ use tracing::{info, warn};

/// Key for looking up individual validator signatures.
/// Used to index signature caches by (validator, message) pairs.
@@ -112,6 +112,10 @@ fn decode_live_chain_key(bytes: &[u8]) -> (u64, H256) {
(slot, root)
}

/// Maximum number of unfinalized slots to retain before safety-net pruning kicks in.
/// 1024 slots at 4 seconds each = ~68 minutes of chain history.
const MAX_UNFINALIZED_SLOTS: u64 = 1024;

/// Fork choice store backed by a pluggable storage backend.
///
/// The Store maintains all state required for fork choice and block processing:
@@ -965,6 +969,136 @@ impl Store {
self.get_state(&self.head())
.expect("head state is always available")
}

// ============ Safety-Net Pruning ============

/// Safety-net pruning: prevents OOM when finalization is stalled.
///
/// Computes `cutoff_slot = max(finalized_slot, head_slot - MAX_UNFINALIZED_SLOTS)`.
/// When finalization is healthy, `cutoff == finalized_slot` and this is a no-op
/// (finalization-triggered pruning already handles it).
/// When finalization is stalled, prunes data older than 1024 slots behind head.
pub fn safety_net_prune(&mut self) {
let head_slot = self.head_slot();
let finalized_slot = self.latest_finalized().slot;
let cutoff_slot = finalized_slot.max(head_slot.saturating_sub(MAX_UNFINALIZED_SLOTS));

// No-op when finalization is healthy
if cutoff_slot <= finalized_slot {
return;
}

// Build set of roots that must never be pruned
let protected_roots: HashSet<H256> = [
self.head(),
self.latest_finalized().root,
self.latest_justified().root,
self.safe_target(),
]
.into_iter()
.collect();

let pruned_states = self.prune_states(cutoff_slot, &protected_roots);
let pruned_blocks = self.prune_old_blocks(cutoff_slot, &protected_roots);
let pruned_chain = self.prune_live_chain(cutoff_slot);
Contributor

`prune_live_chain` doesn't respect `protected_roots`, which could break fork choice if `justified_slot < cutoff_slot`.

When `justified_slot < cutoff_slot` (e.g., justified at slot 500, cutoff at 976), the LiveChain entry for the justified checkpoint gets pruned. Fork choice starts from the justified root and requires its LiveChain entry to look up `(slot, parent_root)` (see `fork_choice/src/lib.rs:52-54`). Without this entry, `blocks.get(&start_root)` returns `None`, causing fork choice to fail.

This scenario occurs when both finalization and justification are stalled for more than 1024 slots (~68 minutes). While less common than finalization-only stalls, it's possible with severe network issues or insufficient validator participation.

To keep the protected roots reachable, either:

  1. Check `protected_roots` in `prune_live_chain` (skip pruning LiveChain entries for protected roots), or
  2. Use a separate cutoff that ensures justified/safe always remain in LiveChain (e.g., `max(finalized_slot, justified_slot.saturating_sub(1024))`)
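
A minimal sketch of option 1, not the PR's actual implementation. It assumes LiveChain can be modelled as an in-memory map from `(slot, root)` to `parent_root` and that entries with `slot <= cutoff_slot` are the ones pruned (matching `prune_states`). The `Root` alias and the extra `protected_roots` parameter on `prune_live_chain` are hypothetical stand-ins for illustration.

```rust
use std::collections::{BTreeMap, HashSet};

/// Stand-in for the 32-byte block root (`H256` in the PR).
type Root = [u8; 32];

const MAX_UNFINALIZED_SLOTS: u64 = 1024;

/// Same cutoff formula as `safety_net_prune` in the diff.
fn cutoff_slot(head_slot: u64, finalized_slot: u64) -> u64 {
    finalized_slot.max(head_slot.saturating_sub(MAX_UNFINALIZED_SLOTS))
}

/// Hypothetical variant of `prune_live_chain` that takes the protected set.
/// `live_chain` maps (slot, root) -> parent_root (an assumed, simplified layout).
/// Returns the number of entries removed.
fn prune_live_chain(
    live_chain: &mut BTreeMap<(u64, Root), Root>,
    cutoff_slot: u64,
    protected_roots: &HashSet<Root>,
) -> usize {
    let before = live_chain.len();
    // Keep an entry if it is newer than the cutoff OR its root is protected.
    live_chain.retain(|(slot, root), _| *slot > cutoff_slot || protected_roots.contains(root));
    before - live_chain.len()
}
```

With head at slot 2000, finalized at 400 and justified at 500, `cutoff_slot` evaluates to 976 (the example in the comment above), and the justified entry at slot 500 survives because its root is in `protected_roots`. The trade-off is one extra set lookup per entry.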

let pruned_sigs = self.prune_gossip_signatures(cutoff_slot);
let pruned_att_data = self.prune_attestation_data_by_root(cutoff_slot);
self.prune_aggregated_payload_table(Table::LatestNewAggregatedPayloads, cutoff_slot);
self.prune_aggregated_payload_table(Table::LatestKnownAggregatedPayloads, cutoff_slot);

if pruned_states > 0 || pruned_blocks > 0 || pruned_chain > 0 {
info!(
cutoff_slot,
head_slot,
finalized_slot,
pruned_states,
pruned_blocks,
pruned_chain,
pruned_sigs,
pruned_att_data,
"Safety-net pruning: finalization stalled"
);
}
}

/// Prune states for blocks with slot <= cutoff_slot, preserving protected roots.
///
/// Iterates BlockHeaders to find slots, then deletes matching States entries.
/// Returns the count of pruned states.
fn prune_states(&mut self, cutoff_slot: u64, protected_roots: &HashSet<H256>) -> usize {
let view = self.backend.begin_read().expect("read view");
let mut keys_to_delete = vec![];

for (key_bytes, value_bytes) in view
.prefix_iterator(Table::BlockHeaders, &[])
.expect("iterator")
.filter_map(|r| r.ok())
{
let Some(header) = BlockHeader::from_ssz_bytes(&value_bytes).ok() else {
warn!("Failed to decode block header during safety-net pruning");
continue;
};

if header.slot <= cutoff_slot {
let root = H256::from_ssz_bytes(&key_bytes).expect("valid root");
if !protected_roots.contains(&root) {
keys_to_delete.push(key_bytes.to_vec());
}
}
}
drop(view);

let count = keys_to_delete.len();
if !keys_to_delete.is_empty() {
let mut batch = self.backend.begin_write().expect("write batch");
batch
.delete_batch(Table::States, keys_to_delete)
.expect("delete states");
batch.commit().expect("commit");
}
count
}

/// Prune block headers, bodies, and signatures for blocks with slot <= cutoff_slot,
/// preserving protected roots. Returns the count of pruned blocks.
fn prune_old_blocks(&mut self, cutoff_slot: u64, protected_roots: &HashSet<H256>) -> usize {
let view = self.backend.begin_read().expect("read view");
let mut keys_to_delete = vec![];

for (key_bytes, value_bytes) in view
.prefix_iterator(Table::BlockHeaders, &[])
.expect("iterator")
.filter_map(|r| r.ok())
{
let Some(header) = BlockHeader::from_ssz_bytes(&value_bytes).ok() else {
continue;
};
Comment on lines +1073 to +1075
Contributor

This silently continues on decode failure, which is inconsistent with `prune_states:1038`, where the same failure logs a warning. Add logging for consistency:

Suggested change
- let Some(header) = BlockHeader::from_ssz_bytes(&value_bytes).ok() else {
-     continue;
- };
+ let Some(header) = BlockHeader::from_ssz_bytes(&value_bytes).ok() else {
+     warn!("Failed to decode block header during safety-net pruning");
+     continue;
+ };
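
One way to keep the two pruning paths from drifting apart again is to share the scan loop, so the warning lives in one place. The sketch below is an illustration under stated assumptions, not code from the PR: `Header`, `decode_header` and `collect_prunable_keys` are hypothetical, the decoder is a toy stand-in for `BlockHeader::from_ssz_bytes` (8 little-endian bytes encoding the slot), and `eprintln!` stands in for `tracing::warn!` so the snippet needs no external crates. The protected-root check is omitted for brevity.

```rust
/// Stand-in for `BlockHeader`; only the slot matters for pruning.
struct Header {
    slot: u64,
}

/// Toy decoder (assumption): the value is exactly 8 little-endian bytes.
/// The PR uses `BlockHeader::from_ssz_bytes` instead.
fn decode_header(bytes: &[u8]) -> Option<Header> {
    if bytes.len() != 8 {
        return None;
    }
    let mut raw = [0u8; 8];
    raw.copy_from_slice(bytes);
    Some(Header { slot: u64::from_le_bytes(raw) })
}

/// Hypothetical shared helper: returns the keys whose header slot is
/// <= cutoff_slot, plus the number of entries that failed to decode.
/// Every decode failure is logged, so no caller can skip one silently.
fn collect_prunable_keys(entries: &[(&[u8], &[u8])], cutoff_slot: u64) -> (Vec<Vec<u8>>, usize) {
    let mut keys = Vec::new();
    let mut decode_failures = 0;
    for &(key, value) in entries {
        let Some(header) = decode_header(value) else {
            // `warn!` in the real code; `eprintln!` keeps this sketch dependency-free.
            eprintln!("Failed to decode block header during safety-net pruning");
            decode_failures += 1;
            continue;
        };
        if header.slot <= cutoff_slot {
            keys.push(key.to_vec());
        }
    }
    (keys, decode_failures)
}
```

Both `prune_states` and `prune_old_blocks` could then call such a helper and differ only in which tables they delete from.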

if header.slot <= cutoff_slot {
let root = H256::from_ssz_bytes(&key_bytes).expect("valid root");
if !protected_roots.contains(&root) {
keys_to_delete.push(key_bytes.to_vec());
}
}
}
drop(view);

let count = keys_to_delete.len();
if !keys_to_delete.is_empty() {
let mut batch = self.backend.begin_write().expect("write batch");
batch
.delete_batch(Table::BlockHeaders, keys_to_delete.clone())
.expect("delete block headers");
batch
.delete_batch(Table::BlockBodies, keys_to_delete.clone())
.expect("delete block bodies");
batch
.delete_batch(Table::BlockSignatures, keys_to_delete)
.expect("delete block signatures");
batch.commit().expect("commit");
}
count
}
}

/// Write block header, body, and signatures onto an existing batch.