Panic on checkpoint-sync restart: 'no reactor running' in delay_map (Tokio runtime context missing) #698

@zclawz

Description

zeam crashes with a panic during a checkpoint-sync based restart. The panic originates in the delay_map crate when attempting to use a Tokio runtime outside of its context.

Error

thread '<unnamed>' (1) panicked at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/delay_map-0.4.1/src/hashmap_delay.rs:94:46:
there is no reactor running, must be called from the context of a Tokio 1.x runtime
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5, aborting
General protection exception (no address available)
???:?:?: 0x73da065b99a2 in ??? (libc.so.6)
Unwind information for libc.so.6:0x73da065b99a2 was not available, trace may be incomplete

Context

  • Triggered on checkpoint-sync based restart
  • Crash occurs after receiving gossip aggregation for slot=10591
  • Node was receiving blocks_by_root requests and responding with block not found warnings immediately before the crash
  • The panic happens in a spawned thread not running within a Tokio async runtime context

Relevant Log Snippet

Mar-25 18:46:37.562 [s=10591 i=4] [info] (zeam): [node] received gossip aggregation for slot=10591 from peer=unknown_peer
thread '<unnamed>' (1) panicked at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/delay_map-0.4.1/src/hashmap_delay.rs:94:46:
there is no reactor running, must be called from the context of a Tokio 1.x runtime

Root Cause (Suspected)

A thread spawned outside of the Tokio async runtime is attempting to use delay_map::HashMapDelay, which internally relies on Tokio timers. Creating such a timer requires the calling thread to have a runtime context, so the call panics. This is likely triggered by the checkpoint-sync restart path initializing something in a non-async context.

Steps to Reproduce

  1. Start zeam with checkpoint-sync
  2. Restart the node using checkpoint-sync based restart
  3. Wait for gossip aggregations to arrive

Environment

Suggested Fix

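Ensure any code path that uses delay_map::HashMapDelay (or other Tokio-dependent primitives) during a checkpoint-sync restart runs within a Tokio runtime context. Note that tokio::runtime::Handle::current() itself panics when called from a thread without a runtime context, so the handle has to be obtained where the runtime is available and passed to the thread, which can then enter the context with Handle::enter(). Alternatively, run the work as a task on the runtime (tokio::spawn or Handle::spawn) instead of on a separately spawned thread.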

Labels: bug