feat: improve ledger metrics gathering + updated ledger size management #410

Open
wants to merge 26 commits into
base: master

thlorenz
Collaborator

@thlorenz thlorenz commented Jun 24, 2025

Summary

Ledger metrics no longer rely on counting entries in ledger columns.

Ledger size management was redone to not rely on column counting either.

Details

Metrics

Previously, metrics relied on counting entries in ledger columns, which is very slow for larger
ledgers.
We had also tried to improve performance by caching counts, but that proved insufficient.

Instead, we now update Prometheus counters directly when an entry is added. We no longer use any
methods that count entries in ledger columns in order to update metrics.

The only piece that still relies on these methods is the ledger_stats command. However, this
only runs during inspection/diagnostics, so performance is less of a concern. For that reason
I also removed all counter caching to avoid incurring any extra overhead while the validator is
running.
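
As a rough illustration of the difference between the two paths, here is a minimal sketch. It
assumes a simplified in-memory column; `Counter`, `Column`, `put` and `count_entries` are
hypothetical names, and the atomic integer only stands in for a real Prometheus counter.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for a Prometheus counter (hypothetical, not the type used in the PR).
#[derive(Default)]
pub struct Counter(AtomicU64);

impl Counter {
    pub fn inc(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    pub fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical ledger column: its entries plus the counter backing its metric.
#[derive(Default)]
pub struct Column {
    entries: Vec<(u64, Vec<u8>)>,
    pub inserted: Counter,
}

impl Column {
    /// Write path: store the entry and bump the counter in constant time.
    pub fn put(&mut self, slot: u64, value: Vec<u8>) {
        self.entries.push((slot, value));
        self.inserted.inc();
    }

    /// Diagnostics path: walks every entry, so it is only suitable for
    /// offline inspection such as a stats command.
    pub fn count_entries(&self) -> usize {
        self.entries.iter().count()
    }
}
```

Note that a counter only ever grows, so in this sketch it reports how many entries were written
rather than how many are currently stored.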

Renaming Metrics

Since ledger metrics were previously tracked as gauges and are now counters, the metric names
have changed as follows:

  • ledger_blocktimes_gauge -> ledger_blocktimes_count
  • ledger_blockhashes_gauge -> ledger_blockhashes_count
  • ledger_slot_signatures_gauge -> ledger_slot_signatures_count
  • ledger_address_signatures_gauge -> ledger_address_signatures_count
  • ledger_transaction_status_gauge -> ledger_transaction_status_count
  • ledger_transaction_successful_status_gauge -> ledger_transaction_successful_status_count
  • ledger_transaction_failed_status_gauge -> ledger_transaction_failed_status_count
  • ledger_transactions_gauge -> ledger_transactions_count
  • ledger_transaction_memos_gauge -> ledger_transaction_memos_count
  • ledger_perf_samples_gauge -> ledger_perf_samples_count
  • ledger_account_mod_data_gauge -> ledger_account_mod_data_count
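
Every rename in the list swaps the `_gauge` suffix for `_count`. For anyone updating dashboards
or alert rules, the mapping can be sketched as below; `renamed_metric` is a hypothetical helper
and not part of this PR.

```rust
/// Maps a pre-PR gauge name to its counter name by swapping the suffix.
/// Returns `None` for names that do not follow the `ledger_*_gauge` pattern.
pub fn renamed_metric(old: &str) -> Option<String> {
    if !old.starts_with("ledger_") {
        return None;
    }
    old.strip_suffix("_gauge")
        .map(|stem| format!("{}_count", stem))
}
```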

Ledger Size Management

We now use watermarks to keep track of the ledger size at particular slots and thus predict
more accurately how many slots to truncate to bring the ledger size below max size.

The default strategy truncates to 75% of max size whenever we go above max size.
The code ended up more complex than the original truncator, partly because it handles many
edge cases related to restarts with a lower max size and to the finality slot.

Unit tests for each of those ensure that they are all handled correctly.

The original truncator and tests were removed.

NOTE: we track the accounts mod id at the watermark boundaries, which we could use to truncate
these columns correctly as well; however, we do not do that yet.
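
To make the watermark idea concrete, here is a minimal sketch of how a truncation point could
be chosen. It is not the implementation in this PR: it assumes each watermark records the
ledger size observed at a slot, ignores the restart and finality slot edge cases, and the names
`Watermark` and `truncation_slot` are hypothetical.

```rust
use std::collections::VecDeque;

/// Hypothetical watermark: the ledger size observed when `slot` was reached.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Watermark {
    pub slot: u64,
    pub size: u64,
}

/// Picks the slot up to which the ledger should be truncated.
///
/// `watermarks` is ordered oldest to newest. Returns `None` while the ledger
/// is at or below `max_size`, or when no watermark has been recorded yet.
pub fn truncation_slot(
    watermarks: &VecDeque<Watermark>,
    current_size: u64,
    max_size: u64,
) -> Option<u64> {
    if current_size <= max_size {
        return None;
    }
    // Target 75% of the max size.
    let target = max_size - max_size / 4;
    // Assumption: dropping everything up to a watermark frees roughly the
    // size that was recorded at that watermark.
    watermarks
        .iter()
        .find(|wm| current_size.saturating_sub(wm.size) <= target)
        // If no watermark frees enough, fall back to the newest one.
        .or_else(|| watermarks.back())
        .map(|wm| wm.slot)
}
```

Choosing the oldest watermark that reaches the target keeps as much history as possible while
still getting under the limit in a single truncation.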

Truncation via Compaction

The updated truncation implementation was cherry picked from the
fix/ledger/delete-using-compaction-filter branch.

The main difference is that we don't create tombstones anymore, since we use a filter to delete
rows during manual (triggered by the ledger size manager) or automatic compaction.
After cherry picking I added a guard to avoid deleting account mod ids, since their keys don't
start with a slot. Otherwise we'd potentially delete account mod ids that are actually still
needed.
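
The per-row decision such a filter makes can be sketched as a plain function. This is only an
illustration under stated assumptions: it assumes slot-prefixed keys begin with the slot as 8
big-endian bytes, and `Decision`, `KeyLayout` and `filter_row` are hypothetical names rather
than the RocksDB API or the code in this PR.

```rust
/// Outcome of the filter for a single row (hypothetical names).
#[derive(Debug, PartialEq)]
pub enum Decision {
    Keep,
    Remove,
}

/// Hypothetical tag describing how keys of a column are laid out.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum KeyLayout {
    /// Key starts with a slot (assumed here: 8 big-endian bytes).
    SlotPrefixed,
    /// Key does not start with a slot, e.g. account mod ids.
    Other,
}

/// Decides whether compaction should drop a row.
/// Rows older than `oldest_kept_slot` are removed without writing tombstones.
pub fn filter_row(layout: KeyLayout, key: &[u8], oldest_kept_slot: u64) -> Decision {
    // Guard: never interpret the key of a non-slot column as a slot.
    if layout != KeyLayout::SlotPrefixed {
        return Decision::Keep;
    }
    let mut bytes = [0u8; 8];
    match key.get(..8) {
        Some(prefix) => bytes.copy_from_slice(prefix),
        // Key too short to contain a slot: keep it rather than guess.
        None => return Decision::Keep,
    }
    let slot = u64::from_be_bytes(bytes);
    if slot < oldest_kept_slot {
        Decision::Remove
    } else {
        Decision::Keep
    }
}
```

Without the guard, the first bytes of an account mod id key would be read as a slot and the row
could be dropped even though it is still needed.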

Greptile Summary

Major overhaul of ledger metrics and size management system, replacing slow column entry counting with direct Prometheus counter updates and introducing a watermark-based ledger size management system.

  • Replaced gauge-based metrics with counters (e.g. ledger_blocktimes_gauge -> ledger_blocktimes_count) for more efficient tracking
  • Implemented watermark-based ledger size tracking to accurately predict truncation points and target 75% max size
  • Removed entry counter caching from LedgerColumn to eliminate memory overhead
  • Added RocksDB compaction filter system to efficiently purge old slots without tombstones
  • Added safeguards in compaction to protect account mod IDs from accidental deletion

Contributor

@greptile-apps greptile-apps bot left a comment

LGTM

5 files reviewed, no comments

@thlorenz thlorenz marked this pull request as draft June 24, 2025 04:55
@thlorenz thlorenz changed the title from "feat: improve ledger metrics gathering" to "feat: improve ledger metrics gathering + updated ledger size management" Jul 2, 2025
@thlorenz thlorenz marked this pull request as ready for review July 2, 2025 03:17
* master:
  release: v0.1.7 (#409)
  Feat: simpler gasless implementation (#406)
  perf: run ensure accounts concurrently
Contributor

@greptile-apps greptile-apps bot left a comment

20 files reviewed, 7 comments

@thlorenz thlorenz requested review from bmuddha and taco-paco July 2, 2025 03:39