Skip to content

genlayer patches: decoded calldata, historical eth_call, mempool, ws subscribe, gas limit, client version#1

Open
MuncleUscles wants to merge 9 commits into
mainfrom
feat/eth-subscribe
Open

genlayer patches: decoded calldata, historical eth_call, mempool, ws subscribe, gas limit, client version#1
MuncleUscles wants to merge 9 commits into
mainfrom
feat/eth-subscribe

Conversation

@MuncleUscles

@MuncleUscles MuncleUscles commented Apr 16, 2026

Copy link
Copy Markdown
Member

Branch of matter-labs/anvil-zksync with a set of patches that make it usable as a local L1 for the GenLayer E2E stack. All six commits are self-contained and can be split out into individual upstream PRs later.

Commits

  1. feat: implement eth_subscribe/eth_unsubscribe over WebSocket — WS subscriptions for newHeads and logs, gated on ServerBuilder having WS enabled. Required by any client that watches chain events (including genlayer-node's SubscribeNewHead/SubscribeFilterLogs).

  2. feat: queue future-nonce transactions in the mempool — match real Ethereum mempool semantics. anvil-zksync rejected any tx whose nonce was ahead of the account's current nonce; this patch queues them and promotes on-demand.

  3. feat: identify as anvil in web3_clientVersion — return anvil/... so upstream dev-network detection in OZ Upgrades, wagmi, and friends works.

  4. feat: report per-tx gas limit (80M) for block.gasLimit — prevents aggressive gas clamping that otherwise rejected large deploys.

  5. fix(rpc): return decoded calldata from eth_getTransactionByHashget_tx_api was putting the raw RLP-encoded signed transaction into api::Transaction.input. Every ABI-aware client expects decoded calldata starting with the 4-byte selector. Use l2_tx.execute.calldata instead. This is straightforwardly a bug and worth upstreaming first.

  6. feat(rpc): honor block parameter on eth_call using previous_states snapshotscall_impl dropped the block parameter and always served latest state. Wire the existing per-block snapshots (already used by eth_getStorageAt) into eth_call as well. System-context keys are restored from current storage so the VM's block-level assertions (assert_next_block) stay consistent. On PrunedBlock errors we warn and fall back to current storage rather than hard-erroring, to keep retry-based consumers from hanging.

What this enables

Running the full GenLayer E2E test suite against a single local anvil-zksync process instead of the 30+ minute geth-based consensus deploy on CI. 33 of the 35 sampled E2E scenarios pass against this binary (the two failures are unrelated test-setup timing issues).

Relationship to upstream

This is an internal fork. Upstream PRs for the cleaner commits (especially 5, 1, 2, 4) can be opened separately against matter-labs/anvil-zksync without depending on this fork.

Test plan

  • cargo build --release --bin anvil-zksync — clean
  • Manual probe: eth_call at historical blocks now returns distinct values (same contract, different blocks, verified via getLatestAcceptedTxCount against the GenLayer consensus contract)
  • eth_getTransactionByHash returns 0xaab85d64... (decoded selector) instead of 0x02f9012e... (raw RLP)
  • Full GenLayer E2E: 001_happy_path, 006_constant_eqp, 010_gen_call (all three sub-scenarios), 014_gen_dbg_trace, 017_gen_get_contract_state pass against this binary — these previously failed either on calldata mismatch or on historical eth_call returning latest state

Summary by CodeRabbit

  • New Features

    • WebSocket subscriptions for eth_subscribe/eth_unsubscribe (newHeads, logs).
    • eth_call now supports block-specific execution context.
    • Mempool accepts and queues future-nonce transactions and promotes them as nonces advance.
  • Improvements

    • Increased historical state retention for more accurate historical calls.
    • Client version string adjusted for clearer tooling identification.
    • RPC telemetry and middleware enhanced.
  • Documentation

    • Updated supported RPC docs to mark subscription methods supported and clarify eth_subscribe behavior.

Enable WebSocket subscriptions for newHeads and logs event types.
Uses upstream EthPubSubServer trait from zksync_web3_decl with
broadcast channels for fan-out to multiple subscribers.

- Remove .http_only() to enable WS on the same port as HTTP
- Fix double set_rpc_middleware (second call replaced first)
- Add EthSubscriptionIdProvider for Ethereum-style hex IDs
- Broadcast full BlockHeader for newHeads (not just hash)
- Use PubSubFilter.matches() for log subscription filtering
- Validate topic count <= 4 before accepting log subscriptions
- Terminate subscriptions on anvil_reset/evm_revert via Notify
- Use send_timeout + sink.closed() for backpressure handling

Closes matter-labs#620
Add a pending-nonce queue to TxPool so that transactions whose nonce is
ahead of the sender's current account nonce are held until their
predecessor is mined, instead of being executed out of order and rejected
by the VM with `ValueMismatch`.

This matches geth/anvil mempool behavior and fixes parallel deploy flows
(e.g. zksync-era's parallel-deploy.ts) where multiple sequential-nonce
transactions are submitted via Promise.all from the same sender.

Changes:
- TxPool gains a `pending: HashMap<Address, BTreeMap<Nonce, Transaction>>`
  alongside the active set.
- New `add_tx_with_nonce_check(tx, current_nonce)` routes txs to active
  pool or pending queue based on nonce comparison.
- New `promote_pending(sender, next_nonce)` moves now-ready txs from
  pending to active after a block seals.
- `eth_sendRawTransaction` reads the sender's current nonce from storage
  and uses the new routing method.
- `NodeExecutor` collects unique senders before sealing, then re-reads
  their nonces post-execution and triggers `promote_pending` for each.
- `estimate_gas_impl` overrides future-nonce values in the estimation
  request with the current account nonce, so callers like ethers.js
  (which always estimates first) don't fail with `ValueMismatch`.

Tested with a smoke test that submits 5 sequential-nonce txs in REVERSE
order via viem; all 5 are accepted, queued, and mined in correct order.
Return `anvil/v<version> (zkSync v2.0)` instead of `zkSync/v2.0` so
tooling that special-cases dev networks recognizes anvil-zksync as one.

Most notably, OpenZeppelin Upgrades' `isDevelopmentNetwork` checks
`web3_clientVersion.split('/', 1)[0] === 'anvil'` to decide whether
to silently re-deploy implementations whose bytecode no longer exists
on a fresh chain. Without this, OZ throws `InvalidDeployment` from
stale .openzeppelin/<chain>.json entries, breaking any deploy script
that uses `deployProxy` against a fresh anvil-zksync instance.

Returning a string that starts with `anvil/` matches regular Anvil's
convention for local dev nodes and is the appropriate identity for an
anvil-style local zkSync node.
Previously `eth_getBlockByNumber` returned `MAX_BATCH_GAS_LIMIT` (~2^50)
as `block.gasLimit`. That's the VM batch gas limit (across all txs in
the block), not the limit for a single tx.

zkSync deploy tooling (e.g. genlayer-consensus' `epochAdvanceEpoch2.ts`)
reads `block.gasLimit` and uses it as the per-tx gas cap:

    const txGasLimit = blockGasLimit || 10_000_000n
    await staking.epochAdvance({ gasLimit: txGasLimit })

Sending a tx with gasLimit = 2^50 exceeds the VM's actual per-tx limit
(`MAX_L2_TX_GAS_LIMIT` = 80M) and triggers an `account-validation halt`
during execution.

Returning `MAX_L2_TX_GAS_LIMIT` (80M) as `block.gasLimit` matches what
real EVM chains report — the per-block gas cap is also the per-tx cap
because there's at most one tx with that limit in practice. Tooling
relying on `block.gasLimit` for gas estimation now works correctly.
`get_tx_api` populated `api::Transaction.input` with `common_data.input.data`,
which is documented in zksync-era (`core/lib/types/src/lib.rs`) as "Optional
input Ethereum-like encoded transaction if submitted via Web3 API" — i.e. the
raw signed RLP transaction bytes, not the inner calldata. For an EIP-1559
transaction this produced responses like:

    0x02f9012e8201040780840564eba0832fdc7894...aab85d64...

where `0x02` is the type byte, `f9012e` is the RLP list header, and the actual
4-byte function selector (`aab85d64` above) is buried inside the RLP-encoded
fields. Every ABI-aware tool expects `input` to start with the 4-byte selector
and silently mismatches when it does not — breaking any caller that does
`tx.input[:4] == fn_selector`.

Use `l2_tx.execute.calldata` instead; that is the decoded inner calldata and
matches the behavior of geth, foundry's anvil, and every other
Ethereum-compatible node. The unused `input_data` local binding is removed.
…apshots

`call_impl` (and the `call` JSON-RPC entry point) discarded the `block`
parameter and always executed against the current fork storage, so
`eth_call(..., at_block=N)` silently returned latest state. This is
a well-known limitation — the handler had an explicit `_block` placeholder
and a `TODO: Support` marker — but it leaves downstream tooling that does
archive-style queries (ABI-level readers, block-indexed RPCs) stuck with
stale data and no error signal.

The node already archives per-block state into `previous_states`
(`IndexMap<BlockHash, HashMap<StorageKey, StorageValue>>`, bounded by
`MAX_PREVIOUS_STATES`) and serves it from `get_storage_at_block` for
`eth_getStorageAt`. This change wires that same path into `eth_call`.

Specifically:

- Add `InMemoryNodeInner::read_storage_at_block`. For a historical block it
  clones the archived snapshot, then restores `SYSTEM_CONTEXT_ADDRESS` keys
  from current storage. That preserves user-contract state at block N while
  keeping the VM's internal block-level assertions (`assert_next_block` in
  multivm: `prev_block.number + 1 == next_block.number`, timestamp and
  hash chaining) consistent with the current batch env the caller builds
  from `create_l1_batch_env`.
- Thread `block: Option<BlockIdVariant>` through `InMemoryNode::run_l2_call`
  and `InMemoryNode::call_impl`.
- Plumb the JSON-RPC parameter instead of dropping it; remove the TODO.
- On error (e.g. requesting a block older than `MAX_PREVIOUS_STATES`,
  yielding `PrunedBlock`), log a warning and fall back to current storage
  rather than propagating the error. The live consumers we tested (genlayer
  node) interpret a hard error here as a transient RPC failure and retry
  indefinitely, turning "block outside archive window" into a hang. The
  warning preserves the original latest-state behavior while making the
  fallback observable.

Known limitations (called out in the new method's doc comment):
- Returns a flat `InMemoryStorage` on the historical path, so for forked
  networks it cannot reach back to the upstream fork for untouched keys.
  The `None` and not-found branches still use `ForkStorage::clone()`, which
  does fall back.
- System-context overlay means `eth_call` at a historical block sees the
  historical contract state but *current* block metadata. For the common
  case (ABI reads of user contracts), this is the intended behavior.

This unblocks running the GenLayer E2E test suite against anvil-zksync as a
local L1: the test harness's historical `gen_call` queries (`at block N`)
now resolve to the correct on-chain state instead of always-latest.
@coderabbitai

coderabbitai Bot commented Apr 16, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

Implements Ethereum JSON-RPC PubSub (WebSocket) support (eth_subscribe/eth_unsubscribe for newHeads and logs), adds nonce-aware pending transaction queue and promotion, enables historical block-aware eth_call via storage snapshots, and wires subscription primitives and reset notifications through node/server layers.

Changes

Cohort / File(s) Summary
Docs & API exports
SUPPORTED_APIS.md, crates/api_decl/src/lib.rs, crates/api_server/src/lib.rs
Marked eth_subscribe/eth_unsubscribe as SUPPORTED and added EthPubSubServer/EthPubSubNamespace/EthSubscriptionIdProvider to public re-exports.
PubSub module
crates/api_server/src/impls/mod.rs, crates/api_server/src/impls/pubsub.rs
New pubsub module implementing Ethereum-compatible PubSub: EthPubSubNamespace, EthSubscriptionIdProvider, subscribe handling for newHeads and logs, event filtering, 5s send timeout, and reset/cleanup behavior.
Server wiring & middleware
crates/api_server/src/server.rs, crates/api_server/Cargo.toml
Added PubSub namespace into RPC module, set EthSubscriptionIdProvider, combined RPC logger and telemetry middleware; added async-trait and anyhow deps.
RPC implementations
crates/api_server/src/impls/eth.rs, crates/api_server/src/impls/web3.rs
eth_call now forwards block: Option<BlockIdVariant> into node call; max_priority_fee_per_gas returns zero; client_version returns an anvil-style string including CARGO_PKG_VERSION.
CLI / initialization
crates/cli/src/main.rs
Created block_subscription_tx and log_subscription_tx broadcast channels and shared reset_notify; threaded them into node initialization and executor construction.
In-memory node & historical calls
crates/core/src/node/in_memory.rs, crates/core/src/node/eth.rs, crates/core/src/node/inner/in_memory_inner.rs, crates/core/src/node/inner/mod.rs
Added broadcast senders and reset_notify fields; increased MAX_PREVIOUS_STATES to 1024; extended run_l2_call/call_impl to accept block param; implemented read_storage_at_block, resolve_historical_block, create_l1_batch_env_at_block, and reset-triggered subscription termination; emit headers/logs to subscription channels on seal.
Node executor & pool integration
crates/core/src/node/inner/node_executor.rs, crates/core/src/node/pool.rs, crates/core/src/node/eth.rs
NodeExecutor::new accepts optional TxPool; executor promotes pending txs after sealing; TxPool gains pending queue, add_tx_with_nonce_check, promote_pending, and pending_count; send_raw_transaction_impl now uses nonce-aware enqueueing.
Blockchain tx mapping
crates/core/src/node/inner/blockchain.rs
get_tx_api now derives input from l2_tx.execute.calldata instead of common_data.input.
Error formatting & imports
crates/zksync_error/src/*
Reordered some imports and simplified format! call layouts across error messages; no behavioral changes.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant RPC_Server as RPC Server
    participant PubSub as EthPubSubNamespace
    participant Broadcast as Broadcast Channel
    participant Sink as Subscription Sink

    Client->>RPC_Server: eth_subscribe("newHeads")
    RPC_Server->>PubSub: subscribe(pending_sink, "newHeads")
    PubSub->>Broadcast: subscribe to block_tx
    PubSub->>PubSub: spawn forwarding task
    PubSub-->>RPC_Server: subscription_id
    RPC_Server-->>Client: {"jsonrpc":"2.0","result":"0x..."}

    Note over Broadcast,Sink: Event forwarding loop
    Broadcast->>PubSub: new header
    PubSub->>Sink: send SubscriptionMessage(header) (5s timeout)
    Sink-->>Client: {"jsonrpc":"2.0","method":"eth_subscription",...}

    Note over PubSub,Sink: Cleanup scenarios
    alt Sink closes
        PubSub->>PubSub: terminate task
    else Reset notified
        PubSub->>PubSub: notify_waiters() triggers cleanup
    else Channel lag
        PubSub->>PubSub: warn and close
    end
Loading
sequenceDiagram
    participant Client
    participant RPC as eth_call
    participant Node as InMemoryNode
    participant Storage as Storage

    Client->>RPC: eth_call(tx, block)
    RPC->>Node: call_impl(request, block)
    alt Block specified
        Node->>Node: read_storage_at_block(block)
        Node->>Storage: lookup historical snapshot
        alt Snapshot found
            Storage-->>Node: historical state
        else Not found
            Node-->>Node: fallback to fork_storage
        end
    else No block
        Node->>Node: use fork_storage
    end
    Node->>Node: apply state_override
    Node->>Node: run_l2_call with storage
    Node-->>RPC: result
    RPC-->>Client: response
Loading
sequenceDiagram
    participant Client
    participant RPC as send_raw_transaction
    participant Node as InMemoryNode
    participant Pool as TxPool
    participant Pending as Pending Queue

    Client->>RPC: send_raw_transaction(tx_data)
    RPC->>Node: send_raw_transaction_impl
    Node->>Node: decode & get sender current_nonce
    Node->>Pool: add_tx_with_nonce_check(tx, current_nonce)

    alt tx.nonce <= current_nonce
        Pool->>Pool: add_tx (active pool)
        Pool-->>Node: true
    else tx.nonce > current_nonce
        Pool->>Pending: insert pending[sender][nonce]
        Pool-->>Node: false (queued)
    end

    Note over Node,Pool: After block sealing
    Node->>Pool: promote_pending(sender, next_nonce)
    Pool->>Pending: read contiguous chain from next_nonce
    Pending-->>Pool: transactions ready
    Pool->>Pool: add_tx for each promoted tx
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Poem

🐰 I hopped into sockets with a twitch and a cheer,
Subscribed to newHeads and logs far and near,
Nonces lined up like carrots in rows,
Pending queues whisper where each transaction goes,
A rabbit's grin — the chain now hums clear! 🎉

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The PR title comprehensively covers the six self-contained commits and their main purposes (decoded calldata, historical eth_call, mempool, ws subscribe, gas limit, client version), matching the detailed PR description and objectives.
Description check ✅ Passed The PR description provides excellent context: branch purpose, all six detailed commits with explanations, what it enables, test plan, and known issues. However, it deviates significantly from the repository's required template format (What/Why/Evidence sections with bullets and optional Notes).
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/eth-subscribe

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@MuncleUscles

Copy link
Copy Markdown
Member Author

Known issues (flagged by Codex adversarial review, deferred for internal use)

Both verified as either edge-case or not-applicable for the GenLayer E2E use case; both would need to be addressed before upstreaming individual patches to matter-labs/anvil-zksync.

Block context mismatch in historical `eth_call`

run_l2_call (crates/core/src/node/in_memory.rs:425-448) builds batch_env / system_env from create_l1_batch_env(), which derives the next block from the current head. We then swap in block-specific storage but leave the env alone, so contracts reading block.number, block.timestamp, base-fee, or any system-context-backed value at historical blocks get current values instead of the requested block's values.

Impact on GenLayer E2E: limited. Of the three consensus view functions we call historically — getLatestAcceptedTransactions, getLatestFinalizedTransactions, getTransactionAllData — only getTransactionAllData reads block.timestamp, and only on the Pending/Proposing branch used to compute the current activator. Most of our historical lookups target Accepted/Finalized statuses, which skip that branch entirely. A test that asserts activator identity at a Pending snapshot at a historical block during an idleness-timeout boundary would still misbehave — none have surfaced in the ~30-scenario run.

Before upstreaming to matter-labs: build the batch/system env from the requested block metadata as well, or reject non-latest block params until that is true.

Pool drop/clear/remove APIs don't touch the pending-nonce queue

drop_transaction, drop_transactions, and clear in crates/core/src/node/pool.rs:90-168 still mutate only the active inner set. anvil_dropTransaction, anvil_dropAllTransactions, and anvil_removePoolTransactions therefore report success while leaving queued transactions behind to be promoted once the nonce gap closes.

Impact on GenLayer E2E: zero — none of the three RPCs are called by the node, the test harness, or the runner scripts.

Before upstreaming: teach all pool-removal paths to inspect and mutate pending as well, and add regression tests.

128 historical snapshots is not enough for test runs that deploy, mutate,
then query at the deploy block — on our local GenLayer E2E setup a single
multi-minute scenario spans ~500–1000 blocks between the deploy and the
historical `gen_call`, and the query block gets pruned out of
`previous_states` before the test can reach it. `eth_call` then falls back
to latest state and the test sees "updated state" instead of "initial
state".

Each snapshot is a full clone of the storage HashMap, so the tradeoff is
memory. 1024 is ~8× the previous window at proportional memory cost
(~bounded few hundred MB to a few GB for realistic local stacks), which is
acceptable for a dev tool and comfortably covers the test scenarios we
hit.

Proper upstream fix would make this a CLI/config knob; leaving a TODO in
the doc comment.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
SUPPORTED_APIS.md (1)

140-140: ⚠️ Potential issue | 🟡 Minor

Stale description — web3_clientVersion no longer returns zkSync/v2.0.

Web3Namespace::client_version was changed in this PR to return anvil/v<CARGO_PKG_VERSION> (zkSync v2.0). Please update this row so the documented behavior matches the code.

✏️ Proposed doc fix
-| [`WEB3`](`#web3-namespace`) | [`web3_clientVersion`](`#web3_clientversion`) | `SUPPORTED` | Returns `zkSync/v2.0` |
+| [`WEB3`](`#web3-namespace`) | [`web3_clientVersion`](`#web3_clientversion`) | `SUPPORTED` | Returns `anvil/v<version> (zkSync v2.0)` (prefixed with `anvil/` so dev-network detection in tooling like OpenZeppelin Upgrades recognizes this as a local dev chain) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@SUPPORTED_APIS.md` at line 140, Update the table row for web3_clientVersion
to match the new behavior of Web3Namespace::client_version: replace the
description "Returns `zkSync/v2.0`" with something like "Returns
`anvil/v<CARGO_PKG_VERSION> (zkSync v2.0)`" so the documented return value
mirrors the code change (include the CARGO_PKG_VERSION placeholder as shown by
the implementation).
🧹 Nitpick comments (2)
crates/core/src/node/inner/blockchain.rs (1)

416-416: Decoded calldata source looks correct.

Switching input to l2_tx.execute.calldata properly returns the decoded calldata for eth_getTransactionByHash, matching standard Ethereum semantics (previously common_data.input held the raw RLP-encoded signed transaction, which is not what clients expect here).

Two small notes, both non-blocking:

  • l2_tx is locally owned in this closure, so the .clone() on calldata can be avoided by moving it. Since l2_tx is still used below for common_data/transaction_type/fees, consider destructuring once (e.g. let L2Tx { execute, common_data, .. } = l2_tx;) to move execute.calldata without cloning.
  • The raw: None field at Line 421 means eth_getTransactionByHash no longer exposes the raw signed tx anywhere in this response. If any downstream tooling in the GenLayer stack relied on the old (incorrect) input == raw RLP behavior, it will now see decoded calldata only — worth a quick sanity check.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/core/src/node/inner/blockchain.rs` at line 416, Change the
construction that sets input: l2_tx.execute.calldata.clone().into() to move
calldata instead of cloning by first destructuring the local l2_tx (e.g. let
L2Tx { execute, common_data, transaction_type, .. } = l2_tx;) so you can use
input: execute.calldata.into() without .clone(); also confirm that raw: None
intentionally removes the raw RLP from the eth_getTransactionByHash response
and, if any downstream code expects the old raw RLP, update that consumer or add
a field to preserve it.
crates/cli/src/main.rs (1)

330-345: Wiring LGTM; minor nit on Arc import usage.

Channel capacities (1024) and Notify creation look reasonable; the initial receivers being dropped is the standard pattern for broadcast fan-out via tx.subscribe() on WS client connect.

Nit: Arc is already imported at line 45 (use std::sync::Arc;), so the fully-qualified path on line 332 is redundant.

🧹 Optional nit
-    let reset_notify = std::sync::Arc::new(tokio::sync::Notify::new());
+    let reset_notify = Arc::new(tokio::sync::Notify::new());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/cli/src/main.rs` around lines 330 - 345, The reset_notify creation
uses a fully-qualified std::sync::Arc::new(...) even though Arc is already
imported; replace std::sync::Arc::new(...) with Arc::new(...) where reset_notify
is defined (the variable used in the InMemoryNodeInner::init call) so the code
uses the existing use std::sync::Arc import and removes the redundant path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/core/src/node/in_memory.rs`:
- Around line 221-227: The block-creation change makes create_block() report
MAX_L2_TX_GAS_LIMIT instead of the VM batch limit, so update the assertions in
crates/core/src/node/eth.rs that currently call
get_max_batch_gas_limit(VmVersion::latest()) for produced blocks to expect
U256::from(MAX_L2_TX_GAS_LIMIT) (or otherwise reference the same
MAX_L2_TX_GAS_LIMIT constant) so tests match create_block()'s reported per-tx
gas limit.

In `@crates/core/src/node/inner/in_memory_inner.rs`:
- Around line 1312-1361: The helper read_storage_at_block currently returns
fork_storage or patches only SYSTEM_CONTEXT_ADDRESS from current_raw which lets
eth_call on historical blocks observe head metadata; update
read_storage_at_block and the VM env construction (where batch_env/system_env
are created in in_memory.rs) so that when a non-latest block is requested you
(1) load the matching historical state's block metadata (number, timestamp,
base_fee, etc.) from previous_states via the resolved block_hash and build both
the returned ReadStorage and the VM/system_env from that historical metadata, or
(2) if you cannot fully construct the historical VM/system context, reject
non-latest requests (return a Web3Error) instead of falling back to
fork_storage; ensure symbols referenced are read_storage_at_block,
previous_states, fork_storage, SYSTEM_CONTEXT_ADDRESS and the
batch_env/system_env construction points.

In `@crates/core/src/node/pool.rs`:
- Around line 12-14: The drop/remove APIs (drop_transaction, drop_transactions,
clear) only mutate TxPool::inner and fail to remove transactions from pending
(pending: Arc<RwLock<HashMap<Address, BTreeMap<Nonce, Transaction>>>>), so
pending entries can survive RPC drops and be promoted later; update each of
these methods to also acquire a write lock on pending and remove the
corresponding sender/nonce entries (for drop_transaction remove the specific
nonce from the sender's BTreeMap and remove the sender key if the map becomes
empty; for drop_transactions remove all nonces for the sender(s) being dropped;
for clear iterate and clear the entire pending map), ensuring you handle locking
correctly and keep invariants between inner and pending in TxPool.
- Around line 94-98: The code currently calls promote_pending(sender,
Nonce(current_nonce.0 + 1)) right after add_tx in the branch handling tx_nonce
<= current_nonce, which can promote N+1 into the active BTreeSet and allow
take_uniform()/TransactionOrder::Fees to select it before the gap-closing N is
mined; remove or disable the promote_pending call from this branch in pool.rs so
that queued txs are not promoted here and only add_tx(tx) is performed, and
ensure promotions remain handled in the post-seal logic in node_executor.rs (the
safe promotion point) instead; also update the nearby comment to state that
promotion is deferred until post-seal processing.

In `@SUPPORTED_APIS.md`:
- Around line 101-104: The table contains broken anchor links for ETH methods
eth_subscribe and eth_unsubscribe; either add corresponding detailed sections
titled "### eth_subscribe" and "### eth_unsubscribe" later in SUPPORTED_APIS.md
(matching the style/format of other ETH method sections like eth_call and
eth_getLogs) or remove the link wrappers in the table rows so they appear as
plain text (`eth_subscribe` and `eth_unsubscribe`) to match other non-detailed
entries; update the table rows referencing `ETH` / `eth_subscribe` and `ETH` /
`eth_unsubscribe` accordingly.

---

Outside diff comments:
In `@SUPPORTED_APIS.md`:
- Line 140: Update the table row for web3_clientVersion to match the new
behavior of Web3Namespace::client_version: replace the description "Returns
`zkSync/v2.0`" with something like "Returns `anvil/v<CARGO_PKG_VERSION> (zkSync
v2.0)`" so the documented return value mirrors the code change (include the
CARGO_PKG_VERSION placeholder as shown by the implementation).

---

Nitpick comments:
In `@crates/cli/src/main.rs`:
- Around line 330-345: The reset_notify creation uses a fully-qualified
std::sync::Arc::new(...) even though Arc is already imported; replace
std::sync::Arc::new(...) with Arc::new(...) where reset_notify is defined (the
variable used in the InMemoryNodeInner::init call) so the code uses the existing
use std::sync::Arc import and removes the redundant path.

In `@crates/core/src/node/inner/blockchain.rs`:
- Line 416: Change the construction that sets input:
l2_tx.execute.calldata.clone().into() to move calldata instead of cloning by
first destructuring the local l2_tx (e.g. let L2Tx { execute, common_data,
transaction_type, .. } = l2_tx;) so you can use input: execute.calldata.into()
without .clone(); also confirm that raw: None intentionally removes the raw RLP
from the eth_getTransactionByHash response and, if any downstream code expects
the old raw RLP, update that consumer or add a field to preserve it.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3cd01f29-9bf9-4b31-9ff4-6c7840ee66d5

📥 Commits

Reviewing files that changed from the base of the PR and between 3a531cc and 3f51e75.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • SUPPORTED_APIS.md
  • crates/api_decl/src/lib.rs
  • crates/api_server/Cargo.toml
  • crates/api_server/src/impls/eth.rs
  • crates/api_server/src/impls/mod.rs
  • crates/api_server/src/impls/pubsub.rs
  • crates/api_server/src/impls/web3.rs
  • crates/api_server/src/lib.rs
  • crates/api_server/src/server.rs
  • crates/cli/src/main.rs
  • crates/core/src/node/eth.rs
  • crates/core/src/node/in_memory.rs
  • crates/core/src/node/inner/blockchain.rs
  • crates/core/src/node/inner/in_memory_inner.rs
  • crates/core/src/node/inner/mod.rs
  • crates/core/src/node/inner/node_executor.rs
  • crates/core/src/node/pool.rs

Comment on lines +221 to +227
// Report the per-tx gas limit (`MAX_L2_TX_GAS_LIMIT` = 80M) rather than the
// VM batch gas limit (~2^50). Tooling like `epochAdvanceEpoch2.ts` reads
// `block.gasLimit` and uses it as a per-tx gas cap, which would exceed the
// VM's actual per-tx limit and trigger an account-validation halt. Using
// the per-tx limit matches what real EVM chains report and what L2 clients
// expect.
gas_limit: U256::from(MAX_L2_TX_GAS_LIMIT),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Update the block gas-limit assertions to the new constant.

create_block() now reports MAX_L2_TX_GAS_LIMIT, but the expectations in crates/core/src/node/eth.rs still use get_max_batch_gas_limit(VmVersion::latest()) for produced blocks. Those assertions will start failing as soon as this behavior lands.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/core/src/node/in_memory.rs` around lines 221 - 227, The block-creation
change makes create_block() report MAX_L2_TX_GAS_LIMIT instead of the VM batch
limit, so update the assertions in crates/core/src/node/eth.rs that currently
call get_max_batch_gas_limit(VmVersion::latest()) for produced blocks to expect
U256::from(MAX_L2_TX_GAS_LIMIT) (or otherwise reference the same
MAX_L2_TX_GAS_LIMIT constant) so tests match create_block()'s reported per-tx
gas limit.

Comment thread crates/core/src/node/inner/in_memory_inner.rs
Comment on lines +12 to +14
/// Transactions waiting for their predecessor nonce to be mined.
/// Keyed by sender address, then by nonce.
pending: Arc<RwLock<HashMap<Address, BTreeMap<Nonce, Transaction>>>>,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Update the removal APIs for the new pending store.

TxPool now has two sources of truth, but drop_transaction, drop_transactions, and clear still only mutate inner. A future-nonce tx left in pending can survive anvil_dropTransaction / anvil_dropAllTransactions and be promoted later, so those RPCs can report success without actually removing it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/core/src/node/pool.rs` around lines 12 - 14, The drop/remove APIs
(drop_transaction, drop_transactions, clear) only mutate TxPool::inner and fail
to remove transactions from pending (pending: Arc<RwLock<HashMap<Address,
BTreeMap<Nonce, Transaction>>>>), so pending entries can survive RPC drops and
be promoted later; update each of these methods to also acquire a write lock on
pending and remove the corresponding sender/nonce entries (for drop_transaction
remove the specific nonce from the sender's BTreeMap and remove the sender key
if the map becomes empty; for drop_transactions remove all nonces for the
sender(s) being dropped; for clear iterate and clear the entire pending map),
ensuring you handle locking correctly and keep invariants between inner and
pending in TxPool.

Comment on lines +94 to +98
if tx_nonce <= current_nonce {
self.add_tx(tx);
// Promote any contiguous pending txs that follow the next expected nonce.
self.promote_pending(sender, Nonce(current_nonce.0 + 1));
true

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't promote queued txs into the active pool before the gap-closing tx is mined.

promote_pending() moves N+1... into the active BTreeSet, but take_uniform() still orders only by fee / submission order. With TransactionOrder::Fees, a promoted N+1 can outrank the newly added N and be sealed first, which recreates the nonce-gap failure this queue is trying to avoid. The post-seal promotion in crates/core/src/node/inner/node_executor.rs is the safe place to do this.

Suggested direction
-        if tx_nonce <= current_nonce {
-            self.add_tx(tx);
-            // Promote any contiguous pending txs that follow the next expected nonce.
-            self.promote_pending(sender, Nonce(current_nonce.0 + 1));
-            true
+        if tx_nonce <= current_nonce {
+            self.add_tx(tx);
+            true
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if tx_nonce <= current_nonce {
self.add_tx(tx);
// Promote any contiguous pending txs that follow the next expected nonce.
self.promote_pending(sender, Nonce(current_nonce.0 + 1));
true
if tx_nonce <= current_nonce {
self.add_tx(tx);
true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/core/src/node/pool.rs` around lines 94 - 98, The code currently calls
promote_pending(sender, Nonce(current_nonce.0 + 1)) right after add_tx in the
branch handling tx_nonce <= current_nonce, which can promote N+1 into the active
BTreeSet and allow take_uniform()/TransactionOrder::Fees to select it before the
gap-closing N is mined; remove or disable the promote_pending call from this
branch in pool.rs so that queued txs are not promoted here and only add_tx(tx)
is performed, and ensure promotions remain handled in the post-seal logic in
node_executor.rs (the safe promotion point) instead; also update the nearby
comment to state that promotion is deferred until post-seal processing.

Comment thread SUPPORTED_APIS.md
Comment on lines +101 to +104
| [`ETH`](#eth-namespace) | [`eth_subscribe`](#eth_subscribe) | `SUPPORTED` | Starts a subscription (newHeads, logs) over WebSocket |
| [`ETH`](#eth-namespace) | [`eth_syncing`](#eth_syncing) | `SUPPORTED` | Returns an object containing data about the sync status or `false` when not syncing |
| [`ETH`](#eth-namespace) | [`eth_uninstallFilter`](#`eth_uninstallfilter) | `SUPPORTED` | Uninstalls a filter with given id |
| `ETH` | `eth_unsubscribe` | `NOT IMPLEMENTED` | Cancel a subscription to a particular event |
| [`ETH`](#eth-namespace) | [`eth_unsubscribe`](#eth_unsubscribe) | `SUPPORTED` | Cancel a subscription to a particular event |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Broken link fragments for eth_subscribe / eth_unsubscribe.

The table entries link to #eth_subscribe and #eth_unsubscribe, but no corresponding ### eth_subscribe / ### eth_unsubscribe sections were added further down in this document. Either add those sections (consistent with other ETH methods such as eth_call, eth_getLogs, etc.), or drop the link wrappers so these rows match the plain-text style used for other non-detailed entries. This is also what markdownlint MD051 is flagging.

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 101-101: Link fragments should be valid

(MD051, link-fragments)


[warning] 103-103: Link fragments should be valid

(MD051, link-fragments)


[warning] 104-104: Link fragments should be valid

(MD051, link-fragments)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@SUPPORTED_APIS.md` around lines 101 - 104, The table contains broken anchor
links for ETH methods eth_subscribe and eth_unsubscribe; either add
corresponding detailed sections titled "### eth_subscribe" and "###
eth_unsubscribe" later in SUPPORTED_APIS.md (matching the style/format of other
ETH method sections like eth_call and eth_getLogs) or remove the link wrappers
in the table rows so they appear as plain text (`eth_subscribe` and
`eth_unsubscribe`) to match other non-detailed entries; update the table rows
referencing `ETH` / `eth_subscribe` and `ETH` / `eth_unsubscribe` accordingly.

MuncleUscles and others added 2 commits April 17, 2026 12:05
When `eth_call` is pinned to a historical block N, `run_l2_call` previously
built its `L1BatchEnv` / `SystemEnv` from `create_l1_batch_env()`, which
always derives the "next block on current head" (batch_number, miniblock,
and especially timestamp taken from `self.time.peek_next_timestamp()`).
The caller's requested historical block only affected the storage view
we swapped in — the VM still saw *current* `block.number` and
`block.timestamp` inside the call.

Contracts with time-sensitive logic (deadlines, idleness timeouts,
time-based branches — e.g. the `agreement_time_checker.py` pattern in
GenLayer E2E tests) therefore branched on latest-head time even when
the caller explicitly pinned a historical block, producing silently
wrong results.

Add `resolve_historical_block(Option<BlockIdVariant>) -> Option<L2BlockNumber>`
and `create_l1_batch_env_at_block(L2BlockNumber) -> Option<(L1BatchEnv, BlockContext)>`
on `InMemoryNodeInner`, and in `run_l2_call` build the env from the pinned
block when one is requested (timestamp = block.timestamp + 1 for monotonicity,
miniblock = block.number + 1, prev_block_hash = block.hash, batch number
derived the same way). Latest / None falls through to the original
`create_l1_batch_env()` path unchanged.

Not a full archive-execution implementation — calls still share the current
fee input provider and the current storage-overlay scheme (SYSTEM_CONTEXT
keys from current storage per the doc on `read_storage_at_block`). But it
fixes the immediate wrong-timestamp class of bugs that we observed
breaking time-sensitive GenLayer appeal tests.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/core/src/node/in_memory.rs (1)

418-469: ⚠️ Potential issue | 🟠 Major

Block parameter must be resolved consistently between batch env and storage lookups — do not call independently.

The block parameter is passed to both resolve_historical_block (line 435) and read_storage_at_block (line 454) independently. Each method reads self.blockchain via separate .await calls, and both can make different decisions about the same block input:

  1. Race condition: If blocks are added or pruned from previous_states between the two reads, resolve_historical_block might return Some(n) while read_storage_at_block cannot find that block's historical state and returns Ok(fork_storage) as fallback. This produces latest batch env + current storage, not the intended historical consistency.

  2. Divergent semantics: Both methods interpret unknown/future block IDs silently as "current" by returning None or Ok(fork_storage) respectively. If they read different blockchain states during their independent .await calls, they may disagree on whether a block is resolvable.

Fix: Resolve the block once and pass the resolved Option<L2BlockNumber> to both helpers, or short-circuit both to current when either cannot honor the historical request. This ensures batch env and storage always refer to the same block semantically.

Note: The fallback-on-error pattern at lines 448–453 is appropriate once read_storage_at_block is tightened to error on genuinely unreachable blocks rather than silently falling back for all unknowns.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/core/src/node/in_memory.rs` around lines 418 - 469, In run_l2_call,
avoid calling resolve_historical_block and read_storage_at_block independently;
instead call resolve_historical_block(block).await once, store the resulting
Option<L2BlockNumber> (e.g. resolved_block) and use that same resolved_block
when invoking create_l1_batch_env_at_block and read_storage_at_block (or, if
create_l1_batch_env_at_block/read_storage_at_block cannot honor the historical
request, short-circuit to the current env/storage for both). Update the logic
around create_l1_batch_env_at_block, create_l1_batch_env, and
read_storage_at_block so they accept/consume the resolved_block decision and
ensure system_env (create_system_env) and storage_override (apply_state_override
/ StorageWithOverrides::new) are derived from the same resolved_block to
guarantee consistent historical vs latest semantics.
🧹 Nitpick comments (3)
crates/core/src/node/inner/in_memory_inner.rs (2)

259-307: Minor: redundant cast and a fragile default for l1_batch_number.

  • Line 273: (historical_batch + 1) as u32historical_batch is already u32 (Line 268), so the cast is dead. Drop it.
  • Line 268: block.l1_batch_number.unwrap_or_default() silently maps a None (which can happen for blocks where the L1 batch isn't tracked, e.g. some genesis/edge cases) to batch 0, producing batch = 1 in the env. For a historical call this can disagree with what block.number was actually executed under and skew anything that reads L1BatchNumber via system context. Prefer to bail out (return None) when l1_batch_number is missing so run_l2_call falls back deterministically rather than fabricating a batch number.
-        let historical_batch = block.l1_batch_number.unwrap_or_default().as_u32();
+        let historical_batch = block.l1_batch_number?.as_u32();
         let historical_miniblock = block.number.as_u32();
@@
-            batch: (historical_batch + 1) as u32,
+            batch: historical_batch + 1,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/core/src/node/inner/in_memory_inner.rs` around lines 259 - 307, In
create_l1_batch_env_at_block: remove the redundant cast on (historical_batch +
1) as u32 since historical_batch is already a u32, and change the handling of
block.l1_batch_number so that if l1_batch_number is None the function returns
None instead of using unwrap_or_default; use early return None when
block.l1_batch_number.is_none() and otherwise set historical_batch from the Some
value, then compute BlockContext.batch = historical_batch + 1 and construct
L1BatchEnv as before.

561-589: broadcast::send lag on bursty blocks/logs will silently drop notifications.

tokio::sync::broadcast channels have a fixed capacity (1024 in test_config, 16 in the test inner). When subscribers fall behind, the oldest queued items are dropped and receivers observe RecvError::Lagged(n). The producer side here uses let _ = ...send(...) (Lines 463 and 586), which discards both the no-subscribers Err(SendError) and any indication that the channel is being driven faster than consumers can drain it.

For a node that may emit hundreds of logs per block, this is the path most likely to silently corrupt a eth_subscribe("logs") stream for downstream tooling. Two cheap mitigations:

  • Size the channels based on expected fan-out (the receive side already has to handle Lagged, but make capacity proportional to max(logs_per_block) * subscribers).
  • At minimum, log at debug!/warn! when a send observes Err and there are known subscribers (i.e., block_subscription_tx.receiver_count() > 0), so subscriber starvation is observable.

Not a blocker for the tested GenLayer scenarios, but worth noting before upstreaming since eth_subscribe users (Geth-style consumers) won't expect silent gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/core/src/node/inner/in_memory_inner.rs` around lines 561 - 589, The
broadcast send calls (e.g.,
block_subscription_tx.send(Arc::new(PubSubResult::Header(header)))) currently
ignore SendError and thus silently drop notifications under lag; change the send
handling to inspect the Result and if Err occurs and
block_subscription_tx.receiver_count() > 0 then log a warn/debug with the
SendError (include context like "block header pubsub send failed: {err}"), and
consider increasing the broadcast channel capacity where it is created
(test_config / test inner) to match expected fan-out; update all similar sends
in this module (the header/log publish sites that call .send) to follow the same
pattern.
crates/core/src/node/in_memory.rs (1)

87-94: Memory budget: each archived snapshot is a full storage clone — 1024 deep can be heavy.

apply_batch snapshots fork_storage.inner.read().unwrap().raw_storage.state.clone() for every block (and again for the virtual block), so the steady-state cost is ~min(blocks_produced, 1024) full copies of the raw storage HashMap<StorageKey, StorageValue>. For long-running test runs against forked mainnet, that can grow into hundreds of MB to multiple GB before any pruning kicks in. The doc-comment already calls out the proper fix ("make this a CLI/config option"); please at least track this as a follow-up before upstreaming, since the previous default (128) was visibly load-bearing for memory.

If memory shows up as an issue in CI, a cheap interim mitigation is to store only diffs vs. the previous snapshot rather than full copies — but the config knob is the right fix.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/core/src/node/in_memory.rs` around lines 87 - 94, The constant
MAX_PREVIOUS_STATES and the heavy cloning in apply_batch (which calls
fork_storage.inner.read().unwrap().raw_storage.state.clone()) make long-running
forks memory-heavy; change MAX_PREVIOUS_STATES to be configurable (e.g., pass
through a Node/Config struct or environment/CLI flag used by the code that
constructs the in-memory node) and have apply_batch/archival logic read that
config instead of the hardcoded constant; as an interim mitigation consider
switching the archive format to store diffs against the previous snapshot rather
than full clones (or cap archival during tests), and add a follow-up issue/TODO
referencing MAX_PREVIOUS_STATES and apply_batch so this design decision is
tracked before upstreaming.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/core/src/node/inner/in_memory_inner.rs`:
- Around line 226-253: resolve_historical_block currently returns "future" block
numbers and silently falls back to latest; change resolve_historical_block to
validate the resolved L2BlockNumber actually exists in storage (use storage
lookups rather than trusting utils::to_real_block_number) and return None/Err
when the block id cannot be resolved instead of returning a future number; in
read_storage_at_block, do not return Ok(self.fork_storage.clone()) for unknown
BlockHashObject paths or missing-hash branches—return an explicit
Web3Error::NoBlock (or PrunedBlock where appropriate) for unknown/future block
ids and reserve the warn+fallback behavior only for the previous_states miss
case (when the block existed but its snapshot is pruned), keeping references to
resolve_historical_block, read_storage_at_block, utils::to_real_block_number,
create_l1_batch_env_at_block, run_l2_call, previous_states, and fork_storage to
locate the changes.

---

Outside diff comments:
In `@crates/core/src/node/in_memory.rs`:
- Around line 418-469: In run_l2_call, avoid calling resolve_historical_block
and read_storage_at_block independently; instead call
resolve_historical_block(block).await once, store the resulting
Option<L2BlockNumber> (e.g. resolved_block) and use that same resolved_block
when invoking create_l1_batch_env_at_block and read_storage_at_block (or, if
create_l1_batch_env_at_block/read_storage_at_block cannot honor the historical
request, short-circuit to the current env/storage for both). Update the logic
around create_l1_batch_env_at_block, create_l1_batch_env, and
read_storage_at_block so they accept/consume the resolved_block decision and
ensure system_env (create_system_env) and storage_override (apply_state_override
/ StorageWithOverrides::new) are derived from the same resolved_block to
guarantee consistent historical vs latest semantics.

---

Nitpick comments:
In `@crates/core/src/node/in_memory.rs`:
- Around line 87-94: The constant MAX_PREVIOUS_STATES and the heavy cloning in
apply_batch (which calls
fork_storage.inner.read().unwrap().raw_storage.state.clone()) make long-running
forks memory-heavy; change MAX_PREVIOUS_STATES to be configurable (e.g., pass
through a Node/Config struct or environment/CLI flag used by the code that
constructs the in-memory node) and have apply_batch/archival logic read that
config instead of the hardcoded constant; as an interim mitigation consider
switching the archive format to store diffs against the previous snapshot rather
than full clones (or cap archival during tests), and add a follow-up issue/TODO
referencing MAX_PREVIOUS_STATES and apply_batch so this design decision is
tracked before upstreaming.

In `@crates/core/src/node/inner/in_memory_inner.rs`:
- Around line 259-307: In create_l1_batch_env_at_block: remove the redundant
cast on (historical_batch + 1) as u32 since historical_batch is already a u32,
and change the handling of block.l1_batch_number so that if l1_batch_number is
None the function returns None instead of using unwrap_or_default; use early
return None when block.l1_batch_number.is_none() and otherwise set
historical_batch from the Some value, then compute BlockContext.batch =
historical_batch + 1 and construct L1BatchEnv as before.
- Around line 561-589: The broadcast send calls (e.g.,
block_subscription_tx.send(Arc::new(PubSubResult::Header(header)))) currently
ignore SendError and thus silently drop notifications under lag; change the send
handling to inspect the Result and if Err occurs and
block_subscription_tx.receiver_count() > 0 then log a warn/debug with the
SendError (include context like "block header pubsub send failed: {err}"), and
consider increasing the broadcast channel capacity where it is created
(test_config / test inner) to match expected fan-out; update all similar sends
in this module (the header/log publish sites that call .send) to follow the same
pattern.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5a411095-7870-42e1-b9c6-ef23ec615c13

📥 Commits

Reviewing files that changed from the base of the PR and between 3f51e75 and 67d1998.

📒 Files selected for processing (7)
  • crates/api_server/src/impls/eth.rs
  • crates/core/src/node/in_memory.rs
  • crates/core/src/node/inner/in_memory_inner.rs
  • crates/zksync_error/src/error/definitions.rs
  • crates/zksync_error/src/error/domains.rs
  • crates/zksync_error/src/identifier.rs
  • crates/zksync_error/src/lib.rs
✅ Files skipped from review due to trivial changes (3)
  • crates/zksync_error/src/lib.rs
  • crates/zksync_error/src/error/domains.rs
  • crates/zksync_error/src/identifier.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/api_server/src/impls/eth.rs

Comment on lines +226 to +253
/// Resolve an optional [api::BlockIdVariant] to an [L2BlockNumber], returning
/// `None` when the block is the current head (caller should fall back to
/// latest) or when the block cannot be resolved.
pub async fn resolve_historical_block(
&self,
block: Option<api::BlockIdVariant>,
) -> Option<L2BlockNumber> {
let block = block?;
let storage = self.blockchain.read().await;
let current = storage.current_block;
let number_u64 = match block {
BlockIdVariant::BlockNumber(bn) => {
utils::to_real_block_number(bn, U64::from(current.0))
}
BlockIdVariant::BlockNumberObject(o) => {
utils::to_real_block_number(o.block_number, U64::from(current.0))
}
BlockIdVariant::BlockHashObject(o) => {
storage.blocks.get(&o.block_hash).map(|b| b.number)?
}
};
let resolved = L2BlockNumber(number_u64.as_u32());
if resolved == current {
None
} else {
Some(resolved)
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Silent fallback to current head for unresolved historical blocks hides correctness issues.

Both helpers degrade to "use latest" without telling the caller, and run_l2_call then layers another silent fallback on top of that:

  • resolve_historical_block (Lines 229-253): for BlockNumber/BlockNumberObject, utils::to_real_block_number does not validate that the resolved number actually exists. If the caller passes a number greater than current_block, the function returns Some(future_number). create_l1_batch_env_at_block then fails (no hash), run_l2_call warns and uses the latest batch env — but read_storage_at_block is invoked with the same future block id and also silently returns current storage. Net effect: a request pinned to a non-existent block executes against latest state with only a debug-log breadcrumb.
  • read_storage_at_block (Lines 1395-1445): the BlockHashObject branch (Line 1410-1413) and the missing-hash branch (Line 1422-1425) Ok(Box::new(self.fork_storage.clone())) for unknown blocks rather than returning Web3Error::NoBlock/PrunedBlock. Only the truly-pruned-but-known case (previous_states.get miss after we found the hash) returns an error. From the caller's point of view, "block I never produced" and "block at the head" are indistinguishable.

For unknown / future block ids the correct behavior is to surface an error (Web3Error::NoBlock or similar) so JSON-RPC clients see a real failure. Only the genuinely-pruned case (block existed but its snapshot has aged out of MAX_PREVIOUS_STATES) should warn-and-fallback per the PR description.

♻️ Suggested direction
-        let number_u64 = match block {
-            BlockIdVariant::BlockNumber(bn) => {
-                utils::to_real_block_number(bn, U64::from(current.0))
-            }
-            BlockIdVariant::BlockNumberObject(o) => {
-                utils::to_real_block_number(o.block_number, U64::from(current.0))
-            }
-            BlockIdVariant::BlockHashObject(o) => {
-                storage.blocks.get(&o.block_hash).map(|b| b.number)?
-            }
-        };
-        let resolved = L2BlockNumber(number_u64.as_u32());
-        if resolved == current {
-            None
-        } else {
-            Some(resolved)
-        }
+        let number_u64 = match block {
+            BlockIdVariant::BlockNumber(bn) => utils::to_real_block_number(bn, U64::from(current.0)),
+            BlockIdVariant::BlockNumberObject(o) => {
+                utils::to_real_block_number(o.block_number, U64::from(current.0))
+            }
+            BlockIdVariant::BlockHashObject(o) => storage.blocks.get(&o.block_hash).map(|b| b.number)?,
+        };
+        let resolved = L2BlockNumber(number_u64.as_u32());
+        // Reject ids past the current head; they are not "latest".
+        if resolved > current { return None; }
+        if resolved == current { return None; }
+        // Verify we actually have the block locally before claiming we can serve it.
+        storage.hashes.get(&resolved)?;
+        Some(resolved)

And in read_storage_at_block, return Web3Error::NoBlock for the unknown-hash and missing-hash paths instead of Ok(fork_storage.clone()), leaving only the previous_states miss as the warn+fallback case (which is what run_l2_call's comment on Lines 448-453 of in_memory.rs actually intends to handle).

This is the same class of issue called out in past review (commit 6992e15 addressed env construction); the resolver/storage selectors still silently mask "block does not exist" cases.

Also applies to: 1384-1445

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/core/src/node/inner/in_memory_inner.rs` around lines 226 - 253,
resolve_historical_block currently returns "future" block numbers and silently
falls back to latest; change resolve_historical_block to validate the resolved
L2BlockNumber actually exists in storage (use storage lookups rather than
trusting utils::to_real_block_number) and return None/Err when the block id
cannot be resolved instead of returning a future number; in
read_storage_at_block, do not return Ok(self.fork_storage.clone()) for unknown
BlockHashObject paths or missing-hash branches—return an explicit
Web3Error::NoBlock (or PrunedBlock where appropriate) for unknown/future block
ids and reserve the warn+fallback behavior only for the previous_states miss
case (when the block existed but its snapshot is pruned), keeping references to
resolve_historical_block, read_storage_at_block, utils::to_real_block_number,
create_l1_batch_env_at_block, run_l2_call, previous_states, and fork_storage to
locate the changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants