Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(l1): metrics for syncing #2080

Closed
wants to merge 26 commits into from
Closed

Conversation

MarcosNicolau
Copy link
Member

@MarcosNicolau MarcosNicolau commented Feb 25, 2025

Motivation
Full syncing on Holesky.

Description
Introduce SyncMetrics for full-syncing that logs performance metrics, including:

  • Total sync time: the time taken to fetch headers, bodies, and execute blocks.
  • Average block execution time: calculated as total_time / executed_blocks.
  • Add block time: the time spent executing add_block and its percentage over the total sync time.
  • Sync range details: the starting and final block numbers along with their corresponding hashes.
  • Execution blocks: the total number of blocks executed.
  • Performance comparison: the time difference from the previous execution.

Logging intervals are configured at 100, 1,000, 10,000, 100,000, and 1,000,000 blocks.

Example
Here is an example of what the logs look like:

INFO ethrex_p2p::sync::metrics: [SYNCING PERF] Last 1007 blocks performance:
    Total time (takes into account network requests): 329 seconds
    Time spent adding blocks: 185 seconds ~= 56.23% of total time
    Average block in total time: 0.327 seconds
    Blocks per second: 3.061
    Throughput: 0.093 Gigagas/s
    Started at block: 923437 (hash: 0xee769a45b1692be5cdbbf010bc84150e66dfc91d7ae7880412e248d57bceb3c0)
    Finished at block: 924453 (hash: 0x5da5e84b94a7ece5523e854ce6d5b3dc36783ec053eee0fb325053a4b98ecc8b)
    Execution count: 1007
    ======= Overall, this cycle took 329 seconds with respect to the previous one =======

Closes None

@MarcosNicolau MarcosNicolau requested a review from a team as a code owner February 25, 2025 15:35
Copy link

github-actions bot commented Feb 25, 2025

| File                                                                  | Lines | Diff |
+-----------------------------------------------------------------------+-------+------+
| /home/runner/work/ethrex/ethrex/crates/blockchain/blockchain.rs       | 378   | +8   |
+-----------------------------------------------------------------------+-------+------+
| /home/runner/work/ethrex/ethrex/crates/networking/p2p/sync.rs         | 502   | +29  |
+-----------------------------------------------------------------------+-------+------+
| /home/runner/work/ethrex/ethrex/crates/networking/p2p/sync/metrics.rs | 151   | +151 |
+-----------------------------------------------------------------------+-------+------+

Total lines added: +188
Total lines removed: 0
Total lines changed: 188

@MarcosNicolau MarcosNicolau self-assigned this Feb 25, 2025
Base automatically changed from fix/full-sync to main February 28, 2025 16:31
let throughput =
self.current_cycle.throughput / self.current_cycle.executed_blocks_count as f64;

tracing::info!(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but the metric we care the most is gigagas per second

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This metrics is indeed gigagas per second, I've added the unit in 322ab25

Copy link
Collaborator

@mpaulucci mpaulucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

.get_block_header_by_hash(current_head)?
.unwrap_or_default()
.number;
self.metrics = SyncMetrics::new(current_head_block_num, current_head);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to persist this struct as part of the SyncManager if we will be overwriting it on each cycle? Why not have it as a separate temporary entity?

Comment on lines -329 to +340
// Full sync stores and executes blocks as it asks for the headers
// Full sync stores and executes blocks as it ask for the headers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert

"[SYNCING PERF] Last {} blocks performance:\n\
\tTotal time (takes into account network requests): {} seconds\n\
\tTime spent adding blocks: {} seconds ~= {:.2}% of total time\n\
\tAverage block in total time: {:.3} seconds\n\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rephrase this? It is not clear what the number represents

Comment on lines +113 to +114
\tBlocks per second over total time: {:.3}\n\
\tBlocks per second add block: {:.3}\n\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, It is not clear enough

\tThroughput: {:.3} Gigagas/s\n\
\tStarted at block: {} (hash: {:?})\n\
\tFinished at block: {} (hash: {:?})\n\
\tExecution count: {}\n\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe Total blocks executed ?

\tStarted at block: {} (hash: {:?})\n\
\tFinished at block: {} (hash: {:?})\n\
\tExecution count: {}\n\
\t======= Overall, this cycle took {} seconds with respect to the previous one =======",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not clear, is this the time this cycle took? is it the time difference between this cycle and the previous one? the wording is confusing

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I looked correctly it's the time the cycle took, as it was the same as Total time (takes into account network requests), agree the wording seems confusing mentioning the previous cycle.

Comment on lines +116 to +119
Ok(AddBlockResultStats {
throughput,
time_spent_as_ms: interval,
})
Copy link
Collaborator

@rodrigo-o rodrigo-o Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a question given I'm coming from outside rust, is this an idiomatic way to handle metrics? Usually I try to avoid modifying the type of functions to accommodate to metrics and instead have a way to send the metrics inside it without affecting the outside world, something like:

struct SyncManager {
    metrics: Arc<dyn SomeKindOfMetricsCollector>,
    // other fields
}

impl SyncManager {
    fn new(metrics: Arc<dyn SomeKindOfMetricsCollector>) -> Self {
        Self {
            metrics,
            // initialize other fields
        }
    }
    
    [...]
    
    fn add_block(&self, block: &Block) -> Result<(), ChainError> {
        let start = Instant::now();
        // block processing logic

        let duration = start.elapsed().as_millis();
        let throughput = calculate_throughput(block, duration);
        
        self.metrics.record("throughput", throughput);
        self.metrics.record("duration", duration);

        Ok(())
    }
}

This is just an example, but wanted to show what I mean. Anyway, maybe it's more idiomatic to just return this kind of info in rust, just wanted to be sure.

@MarcosNicolau
Copy link
Member Author

Closing this pr as it is getting replaced by #2174

@MarcosNicolau MarcosNicolau deleted the feat/syncing-benchmark branch March 20, 2025 00:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants