draft: store seed refactor #657

Draft
wants to merge 4 commits into base: tomasarrachea-stress-test

Conversation

SantiagoPittella
Collaborator

@bobbinth I'm applying your suggestion of a new structure (your comment here). The only thing I changed is that I create all the accounts beforehand. But my numbers don't look that great.

I'm getting around 1.56 blocks/second and I'm now investigating what the bottleneck could be.

Also, I have this info related to the store size:

Store file size every 1k batches:
0: 4096 bytes
1000: 4096 bytes
2000: 78749696 bytes
3000: 160059392 bytes
4000: 238931968 bytes
5000: 318918656 bytes
6000: 397672448 bytes
7000: 477810688 bytes
Average growth rate: 1215792.854961832 bytes per batch

These numbers are from running the binary for 100k accounts.

@bobbinth
Contributor

I'm getting around 1.56 blocks/second and I'm now investigating what the bottleneck could be.

Do you know where the bottlenecks are? Taking a brief look at the code it seems like we are instantiating a lot of random number generators - these could be quite expensive (especially the RPO ones). So, I'd switch to lighter versions and also try to use the same instance as much as possible.
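
Just to illustrate the idea (a rough sketch, not the project's actual API: rand's SmallRng and the AccountSeed type below are stand-ins for whatever generator and seed type the seeding code really uses), the point is to construct the generator once and reuse it for every account instead of building a fresh one inside the loop:

use rand::{rngs::SmallRng, RngCore, SeedableRng};

// Hypothetical stand-in for however the seeding code represents an account seed.
struct AccountSeed([u8; 32]);

// One RNG instance is created up front and reused for every account, instead of
// instantiating a fresh (and potentially expensive) generator per account.
fn generate_seeds(count: usize) -> Vec<AccountSeed> {
    let mut rng = SmallRng::seed_from_u64(0xC0FFEE); // single instance, reused below
    (0..count)
        .map(|_| {
            let mut bytes = [0u8; 32];
            rng.fill_bytes(&mut bytes);
            AccountSeed(bytes)
        })
        .collect()
}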

Store file size every 1k batches:
0: 4096 bytes
1000: 4096 bytes
2000: 78749696 bytes
3000: 160059392 bytes
4000: 238931968 bytes
5000: 318918656 bytes
6000: 397672448 bytes
7000: 477810688 bytes
Average growth rate: 1215792.854961832 bytes per batch

Something doesn't seem right here:

  • Why does the first batch of 1000 blocks not affect storage size? I guess after that we get consistent growth of about 75MB per 1000 blocks.
  • 7000 blocks with 256 accounts created per block should result in 1.8M accounts. Where does the 100K accounts number come from?
  • Related to the above, if there are only 100K accounts, a database size of almost 500MB doesn't really make a lot of sense (this would imply almost 5KB per account). If it is more like 1.8M accounts then it seems a bit low (about 270 bytes per account) - though, maybe it is possible.

@Mirko-von-Leipzig
Contributor

Something doesn't seem right here:

  • Why does the first batch of 1000 blocks not affect storage size? I guess after that we get consistent growth of about 75MB per 1000 blocks.

I didn't check how the size is measured, but could the writes still be stuck in the WAL file?
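
For what it's worth, here is a minimal sketch (assuming the store lives at a single path such as miden-store.sqlite3) of measuring the main database file together with its -wal and -shm companions:

use std::{fs, path::Path};

// Total on-disk size of an SQLite database: the main file plus its
// write-ahead log (-wal) and shared-memory (-shm) files, if they exist.
fn store_size_bytes(db_path: &Path) -> u64 {
    let db = db_path.to_string_lossy();
    [db.to_string(), format!("{db}-wal"), format!("{db}-shm")]
        .into_iter()
        .filter_map(|path| fs::metadata(path).ok())
        .map(|meta| meta.len())
        .sum()
}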

@SantiagoPittella
Collaborator Author

Why does the first batch of 1000 blocks not affect storage size? I guess after that we get consistent growth of about 75MB per 1000 blocks.

I was only checking the size of miden-store.sqlite3; I'm now running it again for 1M accounts, using the combined size of both files (as Mirko mentioned) as the total size.

Though I want to clarify that it is every 1000 batches: each block is 16 batches in this implementation, so we track the size increase every 62.5 blocks.

7000 blocks with 256 accounts created per block should result in 1.8M accounts. Where does the 100K accounts number come from?

The 7k are batches, so it is ~440 blocks. We are using 255 accounts per block + 1 tx to mint assets to each of the accounts.

Related to the above, if there are only 100K accounts, a database size of almost 500MB doesn't really make a lot of sense (this would imply almost 5KB per account). If it is more like 1.8M accounts then it seems a bit low (about 270 bytes per account) - though, maybe it is possible.

I will come back with the results of this new run with 1M accounts and with the store size measurement fixed. Currently it takes ~1 hour to run for 1M accounts (for the whole process).

@SantiagoPittella
Collaborator Author

Here you can see a flamegraph for 1M accounts (I removed the inline preview because it was too big):
https://github.com/user-attachments/assets/f7639a8c-63cc-4384-99cb-fedc50e8cc58

@SantiagoPittella
Collaborator Author

I added a couple more metrics and re-ran it a couple of times with different numbers of accounts.

I'm consistently getting ~4550 bytes/account.

@SantiagoPittella
Collaborator Author

I'm consistently getting ~4550 bytes/account.

It is worth mentioning that this number is the result of doing total_db_size / total_number_account, so it includes blocks and anything else we store in the DB.

I ran the following query against the DB:

SELECT
    name,
    SUM(pgsize) AS size_bytes,
    (SUM(pgsize) * 1.0) / (SELECT COUNT(*) FROM accounts) AS bytes_per_row
FROM dbstat
WHERE name = 'accounts';

And got this result:
accounts | 6025216 bytes | 60.2750645245193 bytes per account
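
A natural follow-up (sketched below, assuming the store is opened with rusqlite and that SQLite was built with SQLITE_ENABLE_DBSTAT_VTAB, which the dbstat table requires) is to break the size down for every table and index, to see where the rest of the ~4550 bytes per account goes:

use rusqlite::Connection;

// Per-table (and per-index) size breakdown via the dbstat virtual table.
fn print_table_sizes(db_path: &str) -> rusqlite::Result<()> {
    let conn = Connection::open(db_path)?;
    let mut stmt = conn.prepare(
        "SELECT name, SUM(pgsize) AS size_bytes
         FROM dbstat
         GROUP BY name
         ORDER BY size_bytes DESC",
    )?;
    let rows = stmt.query_map([], |row| {
        Ok((row.get::<_, String>(0)?, row.get::<_, i64>(1)?))
    })?;
    for row in rows {
        let (name, size_bytes) = row?;
        println!("{name}: {size_bytes} bytes");
    }
    Ok(())
}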

@bobbinth
Contributor

I added a couple more metrics and re-ran it a couple of times with different numbers of accounts.

I'm consistently getting ~4550 bytes/account.

4.5KB per account is roughly what the raw file sizes implied - but it still looks pretty high. Part of this is nullifiers and notes - but I don't see how these contribute more than 1KB (about 40 bytes for nullifiers + 80 bytes for notes + 500 bytes for note authentication paths). But it is also possible I'm missing something.

Another possibility is that SQLite doesn't do compaction, and maybe there is a lot of "slack" in the file.

@igamigo
Collaborator

igamigo commented Jan 30, 2025

There's also the overhead of block headers and indices, but those are probably too small to matter.
One test you can do is run a VACUUM command and see whether the file size shrinks considerably after the initial seeding finishes.
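
Something along these lines could do it (a sketch assuming the store is opened with rusqlite; VACUUM rewrites the whole file, so it should only run once seeding has completed):

use rusqlite::Connection;

// Rebuild the database file to reclaim unused pages; comparing the on-disk
// size before and after gives an idea of how much of the file is slack.
fn vacuum_store(db_path: &str) -> rusqlite::Result<()> {
    let conn = Connection::open(db_path)?;
    conn.execute_batch("VACUUM;")?;
    Ok(())
}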
