Skip to content

fix: do not penalize peers for Invalid blocks from new_payload#297

Open
constwz wants to merge 2 commits intodevelopfrom
fix/issue-290-skip-invalid-peer-penalty
Open

fix: do not penalize peers for Invalid blocks from new_payload#297
constwz wants to merge 2 commits intodevelopfrom
fix/issue-290-skip-invalid-peer-penalty

Conversation

@constwz
Copy link
Contributor

@constwz constwz commented Mar 23, 2026

Summary

  • Stop penalizing peers when new_payload returns Invalid. Return None instead of Err(BadBlock) to the network layer.

Problem

In BSC's PoSA with devp2p block propagation, new_payload returning Invalid frequently results from timing issues during concurrent reorgs — the block itself is legitimate but was executed against the wrong state. The current code applies BadBlock reputation penalty (-16,384) per occurrence.

Under BSC's fast block time (down to 0.45s), this creates a death spiral:

  1. Concurrent reorg → some blocks executed on wrong state → Invalid
  2. Peers penalized: 4 hits → peer banned (threshold -51,200)
  3. Recovery: +1/sec → 4.5 hours per penalty — but new reorgs keep arriving
  4. Fewer peers → easier to fall behind → more reorgs → more penalties → fewer peers
  5. Eventually peers=1 → network gap → pipeline sync stall (Occasional sync stall triggered by deep reorg with low peer count #290)

This is the root cause behind peers dropping from 4–5 to 1 as described in #290, and the persistent peer loss in #258.

Solution

All Invalid results from new_payload are now logged but do not trigger peer reputation penalties. This aligns with geth's behavior:

  • geth's fetcher only calls dropPeer when verifyHeader() fails (header-level validation)
  • geth does not penalize peers when insertChain() fails (execution-level validation)
  • reth's new_payload combines both header and execution validation, so we cannot distinguish between the two — the safe default is to not penalize

Truly malicious peers (sending malformed messages, protocol violations) are still caught by the network layer's existing BadMessage / BadProtocol reputation penalties, which operate at the RLPx session level before blocks ever reach new_payload.

Test plan

  • Deploy to BSC testnet node that currently experiences peer loss
  • Monitor reth_network_connected_peers — should stabilize at normal levels (20-50)
  • Monitor with RUST_LOG="net::peers=trace" — confirm no BadBlock reputation changes
  • Verify reth_consensus_engine_beacon_pipeline_runs stops incrementing during normal operation
  • Confirm node handles concurrent reorgs without losing peers

Refs #290, #258

🤖 Generated with Claude Code

In BSC's PoSA with devp2p block propagation, new_payload returning
Invalid frequently results from timing issues during concurrent
reorgs — the block itself is legitimate but was executed against the
wrong state. The current code reports all Invalid blocks as BadBlock
to the network layer, applying -16384 reputation per occurrence.
Under BSC's fast block time (0.45s), this rapidly drains peers:
just 4 reorg-related Invalids ban a peer, while recovery takes
4.5 hours per penalty at +1/sec.

This aligns with geth's behavior: geth's fetcher only drops a peer
when header verification fails (verifyHeader), never when block
execution fails (insertChain). Truly malicious peers are still
caught by the network layer's BadMessage / BadProtocol penalties.

Refs #290, #258

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants