
perf(db): disable SQLite memory accounting#1988

Merged
kkovaacs merged 18 commits into main from krisztian/improve-sqlite-malloc-performance on Apr 24, 2026

Conversation

@kkovaacs
Contributor

While doing some GetAccount benchmarking on the 0.13 node implementation with a snapshot of the testnet database, I noticed that a significant amount of CPU cycles is spent on mutex contention:

flamegraph

Lock contention profiling shows that SQLite memory allocation (sqlite3Malloc) is serialized behind a global mutex: futex_syscall_profile_short.txt

As described in #1966, some of our SQLite queries are not cacheable, so SQLite re-parses them on every invocation. Making sure we use cached prepared statements as much as possible would be the best fix, but unfortunately that's not trivial.

A short-term improvement is to make sure SQLite memory allocation doesn't go through a global mutex. It turns out the mutex is only required because SQLite by default maintains statistics about allocations. Disabling that in the runtime configuration leads to a measurable improvement in GetAccount performance (which on 0.13 uses a query that's not cacheable).

Baseline performance measured with a local 0.13 node using a Grafana K6 script doing 1M GetAccount queries (using random account IDs that are in the DB snapshot), with 500 concurrent virtual users:

➤ k6 run -u 500 -i 2000 testnet_get_account.js

         /\      Grafana   /‾‾/
    /\  /  \     |\  __   /  /
   /  \/    \    | |/ /  /   ‾‾\
  /          \   |   (  |  (‾)  |
 / __________ \  |_|\_\  \_____/


     execution: local
        script: testnet_get_account.js
        output: -

     scenarios: (100.00%) 1 scenario, 500 max VUs, 10m30s max duration (incl. graceful stop):
              * default: 2000 iterations shared among 500 VUs (maxDuration: 10m0s, gracefulStop: 30s)



  █ TOTAL RESULTS

    checks_total.......: 1000001 6296.189678/s
    checks_succeeded...: 77.04%  770454 out of 1000001
    checks_failed......: 22.95%  229547 out of 1000001

    ✗ status is OK
      ↳  77% — ✓ 770454 / ✗ 229547

    EXECUTION
    iteration_duration...: avg=39.67s  min=15.84s   med=46.59s  max=48.4s    p(90)=47.38s   p(95)=47.61s
    iterations...........: 2000   12.592367/s
    vus..................: 76     min=76      max=500
    vus_max..............: 500    min=500     max=500

    NETWORK
    data_received........: 6.8 GB 43 MB/s
    data_sent............: 90 MB  563 kB/s

    GRPC
    grpc_req_duration....: avg=78.56ms min=360.66µs med=85.25ms max=471.93ms p(90)=121.74ms p(95)=132.41ms




running (02m38.8s), 000/500 VUs, 2000 complete and 0 interrupted iterations
default ✓ [======================================] 500 VUs  02m38.8s/10m0s  2000/2000 shared iters

With SQLite memory accounting disabled:

➤ k6 run -u 500 -i 2000 testnet_get_account.js

         /\      Grafana   /‾‾/
    /\  /  \     |\  __   /  /
   /  \/    \    | |/ /  /   ‾‾\
  /          \   |   (  |  (‾)  |
 / __________ \  |_|\_\  \_____/


     execution: local
        script: testnet_get_account.js
        output: -

     scenarios: (100.00%) 1 scenario, 500 max VUs, 10m30s max duration (incl. graceful stop):
              * default: 2000 iterations shared among 500 VUs (maxDuration: 10m0s, gracefulStop: 30s)



  █ TOTAL RESULTS

    checks_total.......: 1000001 9897.223174/s
    checks_succeeded...: 95.70%  957064 out of 1000001
    checks_failed......: 4.29%   42937 out of 1000001

    ✗ status is OK
      ↳  95% — ✓ 957064 / ✗ 42937

    EXECUTION
    iteration_duration...: avg=25.07s min=24.02s   med=25.09s  max=26.29s   p(90)=25.48s  p(95)=25.59s
    iterations...........: 2000   19.794427/s
    vus..................: 7      min=7       max=500
    vus_max..............: 500    min=500     max=500

    NETWORK
    data_received........: 8.5 GB 84 MB/s
    data_sent............: 95 MB  945 kB/s

    GRPC
    grpc_req_duration....: avg=49.3ms min=330.05µs med=46.69ms max=444.51ms p(90)=64.88ms p(95)=70.93ms




running (01m41.0s), 000/500 VUs, 2000 complete and 0 interrupted iterations
default ✓ [======================================] 500 VUs  01m41.0s/10m0s  2000/2000 shared iters

@kkovaacs kkovaacs added the no changelog This PR does not require an entry in the `CHANGELOG.md` file label Apr 23, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR aims to improve SQLite-heavy read performance (notably GetAccount) by disabling SQLite’s memory allocation statistics, which otherwise introduce a global mutex and significant contention under concurrency.

Changes:

  • Add a one-time SQLite global configuration initializer.
  • Disable SQLite memory accounting via sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0).
  • Wire the initializer into Db::new and add the libsqlite3-sys dependency.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.

File Description
crates/db/src/lib.rs Calls the new SQLite global initializer during DB pool creation.
crates/db/src/init.rs Implements Once-guarded SQLite global configuration and disables MEMSTATUS.
crates/db/Cargo.toml Adds libsqlite3-sys for calling sqlite3_config.
Cargo.lock Locks the new direct dependency for the db crate.


@Mirko-von-Leipzig
Collaborator

Mirko-von-Leipzig commented Apr 23, 2026

iiuc, the core bench results can be summarized by the grpc_req_duration,

before: avg=78.56ms min=360.66µs med=85.25ms max=471.93ms p(90)=121.74ms p(95)=132.41ms
after:  avg=49.3ms  min=330.05µs med=46.69ms max=444.51ms p(90)= 64.88ms p(95)= 70.93ms

for a total runtime change from 2m38s to 1m41s, or a 36% decrease.
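As a quick sanity check on the quoted numbers (an illustrative calculation, not part of the PR):

```python
# Runtimes from the two k6 runs above: 2m38.8s before, 1m41.0s after.
before = 2 * 60 + 38.8   # 158.8 s
after = 1 * 60 + 41.0    # 101.0 s

decrease = (before - after) / before * 100
print(round(decrease, 1))  # ~36% decrease, matching the comment
```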

That's pretty nuts..

Collaborator

@Mirko-von-Leipzig Mirko-von-Leipzig left a comment


I think this looks correct, though the stress-test failure makes me wonder why the Once is being set twice? Or why is the assert tripping..

kkovaacs and others added 2 commits April 23, 2026 10:31
Some places are creating SQLiteConnections directly so we have to explicitly
call the initialization function there.
Co-authored-by: Mirko <48352201+Mirko-von-Leipzig@users.noreply.github.com>
@kkovaacs
Contributor Author

I think this looks correct, though the stress-test failure makes me wonder why the Once is being set twice? Or why is the assert tripping..

We're apparently using "naked" diesel::sqlite::SqliteConnection in some places (like store's bootstrap()) before creating a Db instance. That won't work, because configuration changes have to be performed before any SQLite operation.

I've now added an explicit call to the config init function to bootstrap. (BTW: is there a good reason we can't use Db for the migration step?)

@kkovaacs kkovaacs marked this pull request as ready for review April 23, 2026 09:27
@Mirko-von-Leipzig
Collaborator

We're apparently using "naked" diesel::sqlite::SqliteConnection in some places (like store's bootstrap()) before creating a Db instance. That won't work because configuration changes have to be performed before making any SQLite operations.

I've now added an explicit call to the config init function to bootstrap. (BTW: is there a good reason we can't use Db for the migration step?)

No, in general we need a better database interface.

Collaborator

@Mirko-von-Leipzig Mirko-von-Leipzig left a comment


Should we also add in the additional compile-time flag where appropriate? Or separate PR?

@kkovaacs
Contributor Author

Should we also add in the additional compile-time flag where appropriate? Or separate PR?

I'd rather have that as a separate PR.

@kkovaacs
Contributor Author

Hmm, wondering if we should just disable memory accounting compile time instead of this runtime setup.

The advantage would be that we don't need code changes at all. The disadvantage is that it's easy to miss setting the environment variable and then you'll get bad performance... Maybe we should just add a runtime check and print a warning on startup?

@Mirko-von-Leipzig
Collaborator

Hmm, wondering if we should just disable memory accounting compile time instead of this runtime setup.

The advantage would be that we don't need code changes at all. The disadvantage is that it's easy to miss setting the environment variable and then you'll get bad performance... Maybe we should just add a runtime check and print a warning on startup?

Something we could inject in a build.rs maybe? Or is that too late?

@kkovaacs
Contributor Author

kkovaacs commented Apr 23, 2026

Something we could inject in a build.rs maybe? Or is that too late?

That's too late. This env var has to be present when the build.rs of libsqlite3-sys is running. We could add this to our Makefile and Dockerfile instead.
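For local experimentation, a build-time invocation would look roughly like this. This is a sketch: the exact flag value isn't stated in the thread, but `SQLITE_DEFAULT_MEMSTATUS=0` is SQLite's documented compile-time switch for memory-allocation statistics, and `LIBSQLITE3_FLAGS` is the variable libsqlite3-sys's build script reads when compiling the bundled SQLite:

```sh
# Must be in the environment when libsqlite3-sys's build.rs runs,
# which is why build.rs in our own crates is too late.
LIBSQLITE3_FLAGS="-DSQLITE_DEFAULT_MEMSTATUS=0" cargo build --release
```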

I'm now quite convinced that this is what we should be doing... This matters only for our release builds after all, so why complicate the code with ugly initializers?

@Mirko-von-Leipzig
Collaborator

Something we could inject in a build.rs maybe? Or is that too late?

That's too late. This env var has to be present when the build.rs of libsqlite3-sys is running. We could add this to our Makefile and Dockerfile instead.

In theory we could also wrap all these crates into one, with a build.rs that exports this env var or no?

And then re-exports the relevant things.. sounds invasive though

@kkovaacs
Contributor Author

I've removed all code changes from this PR and have just added setting the LIBSQLITE3_FLAGS compile-time environment variable to our Makefile and Dockerfile.

(Please note that the Dockerfile was actually broken: the node wouldn't even start up in that image because of the mismatching runner base image.)

The results are the same, and we're not littering our code with these settings.
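For illustration, the Makefile and Dockerfile additions would look roughly like this. A hedged sketch: the thread doesn't show the actual flag value, but `SQLITE_DEFAULT_MEMSTATUS=0` is SQLite's documented compile-time equivalent of the runtime MEMSTATUS setting:

```make
# Makefile: export so cargo (and libsqlite3-sys's build.rs) sees it
export LIBSQLITE3_FLAGS := -DSQLITE_DEFAULT_MEMSTATUS=0
```

```dockerfile
# Dockerfile: set before the cargo build step
ENV LIBSQLITE3_FLAGS="-DSQLITE_DEFAULT_MEMSTATUS=0"
```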

@Mirko-von-Leipzig
Collaborator

Mirko-von-Leipzig commented Apr 24, 2026

I've removed all code changes from this PR and have just added setting the LIBSQLITE3_FLAGS compile-time environment variable to our Makefile and Dockerfile.

(Please note that the Dockerfile was actually broken: the node wouldn't even start up in that image because of the mismatching runner base image.)

The results are the same, and we're not littering our code with these settings.

We may have additional places where this is required. Could you check the debian workflows?

We also need to inform our infra runners, who currently have their own Docker images.
IMO we should just switch to Docker images only; at least then we can reuse a base image.

@kkovaacs
Contributor Author

Applying these build options on my local 0.13 node and re-running the K6 GetAccount stress:

➤ k6 run -u 500 -i 2000 testnet_get_account.js

         /\      Grafana   /‾‾/  
    /\  /  \     |\  __   /  /   
   /  \/    \    | |/ /  /   ‾‾\ 
  /          \   |   (  |  (‾)  |
 / __________ \  |_|\_\  \_____/ 


     execution: local
        script: testnet_get_account.js
        output: -

     scenarios: (100.00%) 1 scenario, 500 max VUs, 10m30s max duration (incl. graceful stop):
              * default: 2000 iterations shared among 500 VUs (maxDuration: 10m0s, gracefulStop: 30s)



  █ TOTAL RESULTS 

    checks_total.......: 1000001 11554.29693/s
    checks_succeeded...: 95.73%  957367 out of 1000001
    checks_failed......: 4.26%   42634 out of 1000001

    ✗ status is OK
      ↳  95% — ✓ 957367 / ✗ 42634

    EXECUTION
    iteration_duration...: avg=21.45s  min=20.76s   med=21.44s  max=22.26s   p(90)=21.79s  p(95)=21.88s 
    iterations...........: 2000   23.108571/s
    vus..................: 272    min=272     max=500
    vus_max..............: 500    min=500     max=500

    NETWORK
    data_received........: 8.5 GB 98 MB/s
    data_sent............: 95 MB  1.1 MB/s

    GRPC
    grpc_req_duration....: avg=42.03ms min=442.07µs med=42.58ms max=325.61ms p(90)=52.08ms p(95)=57.26ms




running (01m26.5s), 000/500 VUs, 2000 complete and 0 interrupted iterations
default ✓ [======================================] 500 VUs  01m26.5s/10m0s  2000/2000 shared iters

That is, gRPC request median duration is down to 42.58ms, total runtime is 1:26.5.

@kkovaacs
Contributor Author

As @Mirko-von-Leipzig pointed out, we can just add the environment variable to .cargo/config.toml.
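For reference, a sketch of how that could look in `.cargo/config.toml` using Cargo's `[env]` table (the flag value is an assumption, as noted above):

```toml
[env]
LIBSQLITE3_FLAGS = "-DSQLITE_DEFAULT_MEMSTATUS=0"
```

The advantage over the Makefile/Dockerfile route is that the setting also applies to plain `cargo build` invocations on developer machines.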

@kkovaacs
Contributor Author

kkovaacs commented Apr 24, 2026

stress-test sync-notes baseline (store seeded with seed-store -d /tmp/data --num-accounts 100000 --public-accounts-percentage 50):

➤ RUST_LOG=warn RUST_BACKTRACE=1 target/release/miden-node-stress-test benchmark-store -d /tmp/data --iterations 10000 --concurrency 16 sync-notes
Average request latency: 291.460051ms
P50 request latency: 289.198458ms
P95 request latency: 320.842778ms
P99 request latency: 338.393193ms
P99.9 request latency: 355.222449ms

With this change applied:

➤ RUST_LOG=warn RUST_BACKTRACE=1 target/release/miden-node-stress-test benchmark-store -d /tmp/data --iterations 10000 --concurrency 16 sync-notes
Average request latency: 68.528478ms
P50 request latency: 67.669285ms
P95 request latency: 77.917171ms
P99 request latency: 93.051844ms
P99.9 request latency: 107.414673ms

This is an extreme edge case though due to the SQLite queries involved.

@kkovaacs kkovaacs merged commit a44d980 into main Apr 24, 2026
18 checks passed
@kkovaacs kkovaacs deleted the krisztian/improve-sqlite-malloc-performance branch April 24, 2026 12:37