feat: committor service #366

Merged: 104 commits into master on Jul 10, 2025

Conversation

@thlorenz thlorenz commented May 15, 2025

Summary

Adding a full-fledged committor service to support the following kinds of commits:

  • commits up to 10KB as long as the size increase per commit is <=1KB
  • max accounts per bundle is 3 without lookup tables and 12 with lookup tables
  • commit results are persisted into an SQLite database in the ledger directory, which makes it
    easy to retry them manually or automatically (the next committor version will support this)
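
The limits above can be sketched as a simple guard. The constants and function below are illustrative, not the actual committor-service API:

```rust
// Hypothetical sketch of the commit limits listed above.
const MAX_COMMIT_BYTES: usize = 10 * 1024; // commits up to 10KB
const MAX_GROWTH_BYTES: usize = 1024; // size increase per commit <= 1KB
const MAX_ACCOUNTS_WITHOUT_TABLES: usize = 3;
const MAX_ACCOUNTS_WITH_TABLES: usize = 12;

fn commit_is_supported(
    new_len: usize,
    prev_len: usize,
    accounts_in_bundle: usize,
    use_lookup_tables: bool,
) -> bool {
    let max_accounts = if use_lookup_tables {
        MAX_ACCOUNTS_WITH_TABLES
    } else {
        MAX_ACCOUNTS_WITHOUT_TABLES
    };
    new_len <= MAX_COMMIT_BYTES
        && new_len.saturating_sub(prev_len) <= MAX_GROWTH_BYTES
        && accounts_in_bundle <= max_accounts
}

fn main() {
    // 8KB account that did not grow, 3 accounts, no tables: supported
    assert!(commit_is_supported(8 * 1024, 8 * 1024, 3, false));
    // grew by 4KB in one commit: rejected
    assert!(!commit_is_supported(8 * 1024, 4 * 1024, 3, false));
    // 5 accounts only fit with lookup tables
    assert!(!commit_is_supported(1024, 1024, 5, false));
    assert!(commit_is_supported(1024, 1024, 5, true));
    println!("all limit checks passed");
}
```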

Details

The following crates were developed here and integrated into the validator:

magicblock-rpc-client

  • thin wrapper around the Solana RPC client with added features like improved transaction
    signature status checking (including sane retries)
  • lookup table convenience methods
  • multi account fetch with known max limit
  • waiting for slots
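
The "sane retries" point can be illustrated with a bounded exponential backoff schedule. This is a sketch under assumed names and numbers, not the magicblock-rpc-client API:

```rust
// Illustrative retry schedule for signature-status polling: delays double
// each attempt but are capped, so retries stay "sane" instead of growing
// without bound.
fn backoff_schedule(base_ms: u64, retries: u32, cap_ms: u64) -> Vec<u64> {
    (0..retries)
        .map(|attempt| base_ms.saturating_mul(1u64 << attempt).min(cap_ms))
        .collect()
}

fn main() {
    // 100ms base, doubling each retry, capped at 800ms
    let delays = backoff_schedule(100, 5, 800);
    assert_eq!(delays, vec![100, 200, 400, 800, 800]);
    println!("{:?}", delays);
}
```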

magicblock-table-mania

  • a lookup table manager that supports creating, extending, deactivating and closing lookup tables
  • the API supports reserving pubkeys which will be added to a table; when they need to be
    used in a transaction the needed table addresses are provided
  • once a pubkey is released this is noted, and a table is deactivated and closed on chain once
    all its pubkeys are released
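
The reserve/release lifecycle can be sketched as simple refcount bookkeeping; the struct and method names below are illustrative, not the table-mania API, and pubkeys are stringly-typed to keep the sketch self-contained:

```rust
use std::collections::HashMap;

// Minimal sketch of reserve/release bookkeeping: a table may be deactivated
// and closed once every pubkey it holds has been released.
#[derive(Default)]
struct TableRefs {
    // pubkey -> active reservation count
    refs: HashMap<String, usize>,
}

impl TableRefs {
    fn reserve(&mut self, pubkey: &str) {
        *self.refs.entry(pubkey.to_string()).or_insert(0) += 1;
    }
    fn release(&mut self, pubkey: &str) {
        if let Some(count) = self.refs.get_mut(pubkey) {
            *count = count.saturating_sub(1);
        }
    }
    /// All pubkeys released -> the table can be deactivated and closed.
    fn can_close(&self) -> bool {
        self.refs.values().all(|&count| count == 0)
    }
}

fn main() {
    let mut table = TableRefs::default();
    table.reserve("alice");
    table.reserve("bob");
    assert!(!table.can_close());
    table.release("alice");
    table.release("bob");
    assert!(table.can_close()); // now eligible for on-chain close
    println!("table can be closed");
}
```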

magicblock-committor-program

  • allows the committor service to upload account data to be committed in chunks
  • supports creation/resizing of buffers and filling them with transactions running in parallel
    (out of order is ok)
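
Out-of-order chunk filling boils down to tracking which chunks have landed; below is a minimal sketch, not the actual committor-program buffer layout:

```rust
// Sketch of out-of-order chunk uploads: the buffer counts as complete once
// every chunk has been written, regardless of arrival order.
struct ChunkBuffer {
    data: Vec<u8>,
    written: Vec<bool>, // one flag per chunk
    chunk_size: usize,
}

impl ChunkBuffer {
    fn new(total_len: usize, chunk_size: usize) -> Self {
        let chunks = (total_len + chunk_size - 1) / chunk_size;
        Self {
            data: vec![0; total_len],
            written: vec![false; chunks],
            chunk_size,
        }
    }
    fn write_chunk(&mut self, index: usize, bytes: &[u8]) {
        let start = index * self.chunk_size;
        self.data[start..start + bytes.len()].copy_from_slice(bytes);
        self.written[index] = true;
    }
    fn is_complete(&self) -> bool {
        self.written.iter().all(|&w| w)
    }
}

fn main() {
    let mut buffer = ChunkBuffer::new(6, 2);
    buffer.write_chunk(2, b"ef"); // chunks may arrive in any order
    buffer.write_chunk(0, b"ab");
    assert!(!buffer.is_complete());
    buffer.write_chunk(1, b"cd");
    assert!(buffer.is_complete());
    assert_eq!(buffer.data, b"abcdef");
    println!("buffer complete");
}
```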

magicblock-committor-service

  • commits a changeset as fast and efficiently as possible
  • first prefers transactions that don't require lookup tables since those take time to become
    available
  • then prefers args-only commits over the use of buffers (which require preparation)
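
This preference order can be sketched as an ordered enum; the variant names are illustrative, not the committor-service types:

```rust
// Sketch of the preference order described above: commits that need neither
// lookup tables nor buffers go first, those that need both go last.
// derive(Ord) makes declaration order the preference order, so sorting the
// candidate strategies puts the preferred one first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum CommitStrategy {
    Args,                 // fastest: no preparation at all
    ArgsWithLookupTable,  // waits for the lookup table to become available
    Buffer,               // needs buffer preparation
    BufferWithLookupTable,
}

fn pick(mut candidates: Vec<CommitStrategy>) -> Option<CommitStrategy> {
    candidates.sort();
    candidates.first().copied()
}

fn main() {
    let picked = pick(vec![
        CommitStrategy::BufferWithLookupTable,
        CommitStrategy::Args,
        CommitStrategy::Buffer,
    ]);
    assert_eq!(picked, Some(CommitStrategy::Args));
    println!("{:?}", picked);
}
```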

Sample Flow

  1. Validator starts up and reserves common pubkeys (used in all commits) with the committor
    service
  2. Accounts are cloned, and for each account the pubkeys for commits are reserved to prepare
    the lookup tables which we might need, as fast as possible
  3. Account(s) commit is scheduled and registered as we did before
  4. The commits are processed via the committor service and the on-chain transaction signature
    is provided to the user via logs in a transaction (as we did before, except now we have to
    wait for the commit to complete)

For those commits the committor service ensures that accounts with the same commit (bundle) id
are always processed atomically.

The committor service also picks the best possible strategy to commit each changeset,
preferring speed and then cost.

It also inserts configurable (via code) compute budget instructions into each transaction.

The outcome of each commit is persisted to a database which allows manual retries of failed
commits (the next version will automate them).
Successful commits are also persisted to the database and can be used for diagnostics. In the
future they should be removed since no retry is necessary.

On-chain signatures for process-commit/finalize/undelegate transactions are also persisted to
the database in a separate table.

This table is queried by the validator to log those signatures as part of a transaction that
the user waits for.

Performance

I made sure that there was no performance degradation.

Redline config using 1 account per transaction:
```toml
# Example configuration file for Redline application

# Indicates how many concurrent benchmarks to run
#
# NOTE: each provided unit of parallelism will start its own thread of
# execution for benchmark, dramatically increasing the load on target
# validator. I.e. each benchmark will run in parallel with others on their own
# OS thread, so for example providing 10 for parallelism, has 10x load in
# comparison to 1. This might negatively affect validator performance running
# on the same host as the REDLINE will compete for compute resources with
# validator process
parallelism = 1

# Connection settings for main chain and ephemeral URLs
[connection]
# URL of the chain node to connect to
chain-url = "http://127.0.0.1:7799"
# chain-url = "http://api.devnet.solana.com"
# URL of the ephemeral node to connect to
ephem-url = "http://127.0.0.1:8899"
# Type of HTTP connection: "http1" or "{ http2 = { streams = <NUM> } }"
# http-connection-type = { http2 = { streams = 128 } }
http-connection-type = "http1"
# Maximum number of HTTP connections
http-connections-count = 16
# Maximum number of WebSocket connections
ws-connections-count = 16

# RPS Benchmark settings (getX Requests)
[rps-benchmark]
# whether to perform RPS benchmark; either the RPS or TPS benchmark (or both) should be enabled
enabled = false
# Number of iterations for the benchmark
iterations = 1500
# The desired throughput for transaction submission to the target ER node,
# expressed in transactions per second.
# Note: This serves as a hint rather than a strict limit. For instance,
# specifying a rate of 10,000 RPS while the ER node's request handling
# rate is 3,000 RPS, will not increase the validator's capacity to handle the
# specified throughput. Consequently, any value exceeding the validator's
# saturation point is ineffective.
rps = 50
# Number of concurrent executions
concurrency = 50
# Number of accounts in the pool to be used for getX requests
accounts-count = 8

# Mode of rps benchmark, options:
mode = "get-account-info"
# mode = "get-multiple-accounts"
# mode = "get-token-account-balance"
# mode = "get-balance"
# mode = { combined = [ "get-account-info", "get-balance" ] }

# TPS Benchmark settings
[tps-benchmark]
enabled = true
# Number of iterations for the benchmark
iterations = 2000

# The desired throughput for transaction submission to the target ER node,
# expressed in transactions per second.
# Note: This serves as a hint rather than a strict limit. For instance,
# specifying a rate of 10,000 TPS while the ER node's transaction ingestion
# rate is 3,000 TPS, will not increase the validator's capacity to handle the
# specified throughput. Consequently, any value exceeding the validator's
# saturation point is ineffective.
tps = 40
# Number of concurrent executions
concurrency = 40
# Perform a preflight check for each transaction: true or false
preflight-check = false


# Mode of tps benchmark, options:
# mode = "simple-byte-set"

# Alternative modes are:
#--------------------------------------------------------------------------------
#--------------------------------------------------------------------------------
# every clone-frequency-secs, an airdrop will be performed on one of the readonly
# accounts on chain, thus triggering a clone of the account on the ER

# mode = { trigger-clones = { clone-frequency-secs = 1, accounts-count = 16 } }

#--------------------------------------------------------------------------------
# performs an expensive hash compute in a loop (iters times); 28 iterations
# consume almost the entirety of 200K CUs

# mode = { high-cu-cost = { iters = 8 } }

#--------------------------------------------------------------------------------
# performs data copy operations between accounts in the pool; each transaction
# involves 2 accounts, one readable, one writable, thus a high number of
# transactions using an intersecting set of accounts will create pressure on the
# scheduler due to account locking

# mode = { read-write = { accounts-count = 16 } }

#--------------------------------------------------------------------------------
# Uses provided accounts in the pool to read the length of the data field and log it;
# all of the accounts are used in read-only mode, so multiple transactions of this
# kind can run in parallel on a multi-threaded scheduler

# mode = { read-only = { accounts-count = 32, accounts-per-transaction = 8 } }

#-------------------------------### COMMIT ###------------------------------------
# Sends commit transactions to the ER choosing accounts-per-transaction from
# the pool of accounts-count

mode = { commit = { accounts-count = 16, accounts-per-transaction = 1 } }

#--------------------------------------------------------------------------------
# a combination of various benchmarking modes, to simulate more real world scenarios

# mode = { mixed = [ { read-write = { accounts-count = 16 } }, "simple-byte-set",  { commit = { accounts-count = 16, accounts-per-transaction = 2 } } ] }

[confirmations]
# Subscription settings
# Whether to subscribe to account notifications: true or false
subscribe-to-accounts = false
# Whether to subscribe to signature notifications: true or false
subscribe-to-signatures = true
# Enforce total synchronization: true or false
# this ensures that a transaction is considered completed only if the account
# update and signature update have been received, while preventing other
# transactions from running, thus significantly decreasing throughput.
# If disabled, a transaction will reserve a concurrency slot only for as long as
# it's required to receive the HTTP response
enforce-total-sync = true
# TODO: unused
get-signature-status = false

[data]
# Data settings for account encoding and size
# Encoding type of the account data: "base58", "base64", "base64+zstd"
account-encoding = "base64+zstd"
# Size of the account data: bytes128, bytes512, bytes2048, bytes8192
account-size = "bytes512"
```

master

```
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Metric                        | Observations | Median | Min  | Max   | Avg  | 95th Perc | Stddev |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| sendTransaction Response (μs) | 2000         | 2690   | 1116 | 12516 | 3340 | 7159      | 1801   |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Account Update (μs)           | 0            | 0      | 0    | 0     | 0    | 0         | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Signature Confirmation (μs)   | 2000         | 2756   | 1125 | 12559 | 3433 | 7494      | 1901   |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Transactions Per Second (TPS) | 51           | 40     | 40   | 40    | 40   | 40        | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
```

new committor

```
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| Metric                        | Observations | Median | Min  | Max     | Avg   | 95th Perc | Stddev |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| sendTransaction Response (μs) | 2000         | 2219   | 1104 | 5525915 | 39628 | 3647      | 329493 |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| Account Update (μs)           | 0            | 0      | 0    | 0       | 0     | 0         | 0      |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| Signature Confirmation (μs)   | 2000         | 2247   | 1116 | 5525967 | 39668 | 3730      | 329498 |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| Transactions Per Second (TPS) | 52           | 40     | 13   | 40      | 39    | 17        | 4      |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
```

We can see that there is a huge deviation in this branch due to the first clone taking a lot
longer as it reserves the pubkeys needed to commit the cloned account in a lookup table.

I improved this delay a bit by initiating the reserve transaction before running the
clone transaction, thus performing both steps in parallel. However the reserve transaction takes
an order of magnitude longer than the cloning into our validator, and thus the overall time
didn't improve much.

Redline configuration is the same as above, with the change below:

```toml
#-------------------------------### COMMIT ###------------------------------------
# Sends commit transactions to the ER choosing accounts-per-transaction from
# the pool of accounts-count

mode = { commit = { accounts-count = 16, accounts-per-transaction = 8 } }
```

master

Lots of committor transactions fail here since the created transactions are too large, as we
commit up to 8 accounts in each transaction.

```
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Metric                        | Observations | Median | Min  | Max   | Avg  | 95th Perc | Stddev |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| sendTransaction Response (μs) | 2000         | 1863   | 1010 | 17892 | 1959 | 2716      | 554    |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Account Update (μs)           | 0            | 0      | 0    | 0     | 0    | 0         | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Signature Confirmation (μs)   | 2000         | 1901   | 1239 | 37572 | 2048 | 2949      | 1052   |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Transactions Per Second (TPS) | 51           | 40     | 40   | 40    | 40   | 40        | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
```

new committor

NOTE: here lots of commit transactions fail since they are created with the same ephemeral
blockhash and thus are identical to transactions that already ran.

Others fail because commits are requested before the finalize of the previous commit has
completed.

This is an issue that we won't see in real life since identical account bundles won't be
committed in such quick succession in practice.

```
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| Metric                        | Observations | Median | Min  | Max     | Avg   | 95th Perc | Stddev |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| sendTransaction Response (μs) | 2000         | 1578   | 1110 | 7361215 | 55643 | 2932      | 592707 |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| Account Update (μs)           | 0            | 0      | 0    | 0       | 0     | 0         | 0      |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| Signature Confirmation (μs)   | 2000         | 1615   | 1145 | 7361283 | 55699 | 3010      | 592711 |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| Transactions Per Second (TPS) | 51           | 40     | 17   | 40      | 39    | 40        | 3      |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
```

thlorenz added 30 commits May 7, 2025 11:10
- at this point schedule commit tests pass with maximum concurrency
@thlorenz thlorenz force-pushed the thlorenz/committor branch from 629d0a2 to 6449d59 on July 8, 2025 10:01
@thlorenz thlorenz merged commit c24571a into master Jul 10, 2025
4 checks passed
@thlorenz thlorenz deleted the thlorenz/committor branch July 10, 2025 07:48
thlorenz added a commit that referenced this pull request Jul 11, 2025
* master:
  feat: committor service (#366)
thlorenz added a commit that referenced this pull request Jul 17, 2025
## Summary

This PR fixes the performance regression introduced by [this PR](#366) by not waiting for the
lookup table transaction to succeed. Instead it just logs the result and ensures, right before
committing an account, that the necessary pubkeys are in place.

## Details

### Table Mania Enhancements

- added `ensure_pubkeys_table()` method to guarantee pubkeys exist in
lookup tables without increasing reference counts
- implemented `get_pubkey_refcount()` method for querying the refcount of
pubkeys across tables
- added a test for the new ensure functionality
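
The two methods can be sketched as follows; stringly-typed pubkeys keep the sketch self-contained, and this is not the actual table-mania implementation:

```rust
use std::collections::HashMap;

// Sketch of the two new methods described above: `ensure_pubkeys_table`
// guarantees a pubkey is present without bumping its refcount, while
// `reserve` does bump it.
#[derive(Default)]
struct TableMania {
    refcounts: HashMap<String, usize>,
}

impl TableMania {
    fn reserve(&mut self, pubkey: &str) {
        *self.refcounts.entry(pubkey.to_string()).or_insert(0) += 1;
    }
    /// Make sure the pubkey exists in a table, leaving refcounts untouched.
    fn ensure_pubkeys_table(&mut self, pubkey: &str) {
        self.refcounts.entry(pubkey.to_string()).or_insert(0);
    }
    fn get_pubkey_refcount(&self, pubkey: &str) -> Option<usize> {
        self.refcounts.get(pubkey).copied()
    }
}

fn main() {
    let mut mania = TableMania::default();
    mania.ensure_pubkeys_table("some-common-pubkey");
    assert_eq!(mania.get_pubkey_refcount("some-common-pubkey"), Some(0));
    mania.reserve("some-common-pubkey");
    assert_eq!(mania.get_pubkey_refcount("some-common-pubkey"), Some(1));
    // ensuring again does not change the refcount
    mania.ensure_pubkeys_table("some-common-pubkey");
    assert_eq!(mania.get_pubkey_refcount("some-common-pubkey"), Some(1));
    println!("refcount checks passed");
}
```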

### Committor Service Optimizations

- Modified commit process to ensure all pubkeys have tables before
proceeding with transactions
- Improved parallel processing by moving table reservation to async
spawned tasks

### General Improvements

#### Integration Testing Improvements

- Added individual make targets for all integration tests
(test-schedulecommit, test-cloning, test-committor, etc.)
- Renamed list-tasks to list in the Makefile
- Enhanced test runner output with better test name reporting for
clearer failure diagnostics

### Performance

Performance is back to what it was on master, i.e. the first clone no longer takes much longer
than subsequent clones. For comparison here are the performance results; for more details see
[this PR](#366).

#### master

```
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Metric                        | Observations | Median | Min  | Max   | Avg  | 95th Perc | Stddev |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| sendTransaction Response (μs) | 2000         | 2690   | 1116 | 12516 | 3340 | 7159      | 1801   |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Account Update (μs)           | 0            | 0      | 0    | 0     | 0    | 0         | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Signature Confirmation (μs)   | 2000         | 2756   | 1125 | 12559 | 3433 | 7494      | 1901   |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Transactions Per Second (TPS) | 51           | 40     | 40   | 40    | 40   | 40        | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
```

#### new committor

```
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| Metric                        | Observations | Median | Min  | Max     | Avg   | 95th Perc | Stddev |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| sendTransaction Response (μs) | 2000         | 2219   | 1104 | 5525915 | 39628 | 3647      | 329493 |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| Account Update (μs)           | 0            | 0      | 0    | 0       | 0     | 0         | 0      |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| Signature Confirmation (μs)   | 2000         | 2247   | 1116 | 5525967 | 39668 | 3730      | 329498 |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
| Transactions Per Second (TPS) | 52           | 40     | 13   | 40      | 39    | 17        | 4      |
+-------------------------------+--------------+--------+------+---------+-------+-----------+--------+
```

We can see that there is a huge deviation in this branch due to the first clone taking a lot
longer as it reserves the pubkeys needed to commit the cloned account in a lookup table.


#### new committor on this branch

```
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Metric                        | Observations | Median | Min  | Max   | Avg  | 95th Perc | Stddev |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| sendTransaction Response (μs) | 2000         | 1827   | 1250 | 19942 | 1929 | 2537      | 705    |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Account Update (μs)           | 0            | 0      | 0    | 0     | 0    | 0         | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Signature Confirmation (μs)   | 2000         | 1867   | 1272 | 21436 | 2003 | 2768      | 848    |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Transactions Per Second (TPS) | 51           | 40     | 40   | 40    | 40   | 40        | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
```

We can see that the deviation is back to a _sane_ amount.
In this case the max is still higher than on master, but that could be
an outlier.

I ran the perf test another time and confirmed this:

```
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Metric                        | Observations | Median | Min  | Max   | Avg  | 95th Perc | Stddev |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| sendTransaction Response (μs) | 2000         | 1889   | 1221 | 11291 | 1977 | 2656      | 471    |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Account Update (μs)           | 0            | 0      | 0    | 0     | 0    | 0         | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Signature Confirmation (μs)   | 2000         | 1922   | 1354 | 40313 | 2064 | 2847      | 1047   |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
| Transactions Per Second (TPS) | 51           | 40     | 40   | 40    | 40   | 40        | 0      |
+-------------------------------+--------------+--------+------+-------+------+-----------+--------+
```