@MarcosNicolau MarcosNicolau commented Mar 7, 2025

Motivation
Accelerate syncing!

Description
This PR introduces block batching during full sync:

  1. Instead of computing and storing the state root for each block individually, we now maintain a single state trie for the entire batch and commit it only at the end. This results in one state trie per n blocks instead of one per block, which also reduces storage.
  2. The new full sync process:
    • Request 1024 headers
    • Request 1024 block bodies and collect them
    • Once all blocks are received, process them in batches using a single state trie, which is attached to the last block.
  3. Blocks are now stored in a single transaction.
  4. State root, receipts root, and request root validation are only required for the last block in the batch.
  5. The new add_blocks_in_batch function takes a flag, should_commit_intermediate_tries. When set to true, it stores the tries for each block. This was added to make the hive tests pass. Currently, the flag is set by checking whether the block is within the STATE_TRIES_TO_KEEP range. In a real syncing scenario, my intuition is that it would be better to wait until we are fully synced, then start storing the state of new blocks and pruning once we reach STATE_TRIES_TO_KEEP.
  6. Sync throughput is now measured per batch.
  7. A new command was added to import blocks in batch.
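
For readers skimming the list above, here is a minimal sketch of the core idea in items 1 and 4: execute every block against one shared state structure and validate the state root only for the last block. This is not the code in this PR. `Block`, `StateTrie`, `BatchError` and `add_blocks_in_batch_sketch` are hypothetical stand-ins, the "trie" is a plain ordered map, and the "root" is an ordinary hash rather than a Merkle root.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical toy block: each "transaction" credits an account.
pub struct Block {
    pub number: u64,
    pub credits: Vec<(u64, u64)>, // (account id, amount)
    pub state_root: u64,          // root the header claims after execution
}

/// Hypothetical stand-in for the state trie: account id -> balance.
#[derive(Default, Clone)]
pub struct StateTrie {
    balances: BTreeMap<u64, u64>,
}

impl StateTrie {
    /// "Executes" a block by applying its credits to the shared state.
    pub fn apply(&mut self, block: &Block) {
        for (account, amount) in &block.credits {
            *self.balances.entry(*account).or_insert(0) += amount;
        }
    }

    /// Placeholder root: a hash of the ordered contents, not a Merkle root.
    pub fn root(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.balances.hash(&mut hasher);
        hasher.finish()
    }
}

#[derive(Debug, PartialEq)]
pub enum BatchError {
    EmptyBatch,
    StateRootMismatch { block_number: u64 },
}

/// Runs all blocks against one trie and checks the state root only for
/// the last block. On success, returns the single trie to commit.
pub fn add_blocks_in_batch_sketch(
    parent_state: &StateTrie,
    blocks: &[Block],
) -> Result<StateTrie, BatchError> {
    let last = blocks.last().ok_or(BatchError::EmptyBatch)?;
    let mut trie = parent_state.clone();
    for block in blocks {
        trie.apply(block); // no per-block root computation or commit
    }
    if trie.root() != last.state_root {
        return Err(BatchError::StateRootMismatch { block_number: last.number });
    }
    Ok(trie)
}
```

In this sketch the `state_root` of intermediate blocks is never read, which mirrors item 4: a bad intermediate block is only detected when the final root fails to match.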

Considerations:

  1. Optimize account updates: Instead of inserting updates into the state trie after each block execution, batch them at the end, merging repeated accounts to reduce insertions and improve performance (see "Optimize account updates in add_blocks_in_batch", #2216). Closes #2216.
  2. Improve transaction handling: Avoid committing storage tries to the database separately. Instead, create a single transaction for storing receipts, storage tries, and blocks. This would require additional abstractions for transaction management (see "Write batch of blocks in a single transaction in add_blocks_in_batch", #2217).
  3. This does not yet work with the levm backend: it needs to cache the execution state and persist it between blocks, since nothing is stored until the end of the batch (see "Make add_blocks_in_batch work for LEVM", #2218).
  4. A new CI job is added in "ci(core): benchmark for batch block import" (#2210) that runs a benchmark comparing the main and head branches using import-in-batch.
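
As an illustration of the first consideration, merging repeated accounts before touching the trie could look roughly like the sketch below. It is an assumption-laden toy, not the PR's implementation: `AccountUpdate` here is a simplified hypothetical type (balance and storage slots only), `merge_account_updates` is a made-up name, and account removal is deliberately ignored.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified account update produced by executing one block.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct AccountUpdate {
    pub balance: Option<u64>,        // None means "unchanged in this block"
    pub storage: BTreeMap<u64, u64>, // slot -> new value
}

/// Folds the per-block updates of a batch into one update per account.
/// Later blocks win over earlier ones, so each account (and each storage
/// slot) would be written to the trie once per batch instead of once per block.
pub fn merge_account_updates(
    per_block: &[Vec<(u64, AccountUpdate)>],
) -> BTreeMap<u64, AccountUpdate> {
    let mut merged: BTreeMap<u64, AccountUpdate> = BTreeMap::new();
    for block_updates in per_block {
        for (address, update) in block_updates {
            let entry = merged.entry(*address).or_default();
            if update.balance.is_some() {
                entry.balance = update.balance;
            }
            for (slot, value) in &update.storage {
                entry.storage.insert(*slot, *value);
            }
        }
    }
    merged
}
```

With this shape, an account touched in every block of a 1024-block batch results in a single trie insertion. Handling removed or recreated accounts would need extra bookkeeping that this sketch leaves out.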

Closes None

@MarcosNicolau MarcosNicolau requested a review from a team as a code owner March 7, 2025 12:13

github-actions bot commented Mar 7, 2025

Lines of code report

Total lines added: 361
Total lines removed: 0
Total lines changed: 361

Detailed view
+---------------------------------------------+-------+------+
| File                                        | Lines | Diff |
+---------------------------------------------+-------+------+
| ethrex/crates/blockchain/blockchain.rs      | 495   | +109 |
+---------------------------------------------+-------+------+
| ethrex/crates/common/trie/trie.rs           | 812   | +4   |
+---------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync.rs        | 551   | +67  |
+---------------------------------------------+-------+------+
| ethrex/crates/storage/api.rs                | 197   | +6   |
+---------------------------------------------+-------+------+
| ethrex/crates/storage/store.rs              | 1137  | +6   |
+---------------------------------------------+-------+------+
| ethrex/crates/storage/store_db/in_memory.rs | 521   | +35  |
+---------------------------------------------+-------+------+
| ethrex/crates/storage/store_db/libmdbx.rs   | 1163  | +52  |
+---------------------------------------------+-------+------+
| ethrex/crates/storage/store_db/redb.rs      | 957   | +71  |
+---------------------------------------------+-------+------+
| ethrex/crates/vm/backends/mod.rs            | 321   | +11  |
+---------------------------------------------+-------+------+

// todo only execute transactions
// batch account updates to merge the repeated accounts
self.storage
    .apply_account_updates_to_trie(&account_updates, &mut state_trie)?;

Collaborator

Ahh, I understand now. I was hoping we could call get_state_transitions (https://github.com/lambdaclass/lambda_ethereum_rust/blob/0acc5e28b861f88c30cebb6cbfe0230970df25ed/crates/vm/backends/revm.rs#L96) only once.

We would need to add an execute_blocks function inside the vm.

Collaborator

We can discuss this later.

Contributor Author

I've tried this approach but it won't work without making larger modifications to the vm backend.

@MarcosNicolau MarcosNicolau marked this pull request as draft March 11, 2025 13:23
@MarcosNicolau MarcosNicolau force-pushed the feat/store-state-trie-n-blocks branch from 19e3593 to 6c037c7 Compare March 25, 2025 20:15
@MarcosNicolau MarcosNicolau added this pull request to the merge queue Mar 25, 2025
Merged via the queue into main with commit cdbfbe9 Mar 25, 2025
25 of 26 checks passed
@MarcosNicolau MarcosNicolau deleted the feat/store-state-trie-n-blocks branch March 25, 2025 21:59
pedrobergamini pushed a commit to pedrobergamini/ethrex that referenced this pull request Aug 24, 2025
…aclass#2174)

Co-authored-by: Martin Paulucci
Co-authored-by: fmoletta