draft: store seed refactor #657

Draft
wants to merge 4 commits into base: tomasarrachea-stress-test

Conversation

SantiagoPittella
Collaborator

@bobbinth I'm applying your suggestion of a new structure (your comment here). The only thing I changed is that I create all the accounts beforehand. But my numbers don't look that great.

I'm getting around 1.56 blocks/second and I'm now investigating what the bottleneck could be.

Also, I have this info related to the store size:

Store file size every 1k batches:
0: 4096 bytes
1000: 4096 bytes
2000: 78749696 bytes
3000: 160059392 bytes
4000: 238931968 bytes
5000: 318918656 bytes
6000: 397672448 bytes
7000: 477810688 bytes
Average growth rate: 1215792.854961832 bytes per batch

These numbers are from running the binary for 100k accounts.

@bobbinth
Contributor

I'm getting around 1.56 blocks/second and I'm now investigating what the bottleneck could be.

Do you know where the bottlenecks are? Taking a brief look at the code it seems like we are instantiating a lot of random number generators - these could be quite expensive (especially the RPO ones). So, I'd switch to lighter versions and also try to use the same instance as much as possible.
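
Just to illustrate the idea (a rough sketch, not the project's actual API: rand's SmallRng and the AccountSeed type below are stand-ins for whatever generator and seed type the seeding code really uses), the point is to construct the generator once and reuse it for every account instead of building a fresh one inside the loop:

use rand::{rngs::SmallRng, RngCore, SeedableRng};

// Hypothetical stand-in for however the seeding code represents an account seed.
struct AccountSeed([u8; 32]);

// One RNG instance is created up front and reused for every account, instead of
// instantiating a fresh (and potentially expensive) generator per account.
fn generate_seeds(count: usize) -> Vec<AccountSeed> {
    let mut rng = SmallRng::seed_from_u64(0xC0FFEE); // single instance, reused below
    (0..count)
        .map(|_| {
            let mut bytes = [0u8; 32];
            rng.fill_bytes(&mut bytes);
            AccountSeed(bytes)
        })
        .collect()
}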

Store file size every 1k batches:
0: 4096 bytes
1000: 4096 bytes
2000: 78749696 bytes
3000: 160059392 bytes
4000: 238931968 bytes
5000: 318918656 bytes
6000: 397672448 bytes
7000: 477810688 bytes
Average growth rate: 1215792.854961832 bytes per batch

Something doesn't seem right here:

  • Why does the first batch of 1000 blocks not affect storage size? I guess after that we get consistent growth of about 75MB per 1000 blocks.
  • 7000 blocks with 256 accounts created per block should result in 1.8M accounts. Where does the 100K accounts number come from?
  • Related to the above, if there are only 100K accounts, a database size of almost 500MB doesn't really make a lot of sense (this would imply almost 5KB per account). If it is more like 1.8M accounts then it seems a bit low (about 270 bytes per account) - though, maybe it is possible.

@Mirko-von-Leipzig
Contributor

Something doesn't seem right here:

  • Why does the first batch of 1000 blocks not affect storage size? I guess after that we get consistent growth of about 75MB per 1000 blocks.

I didn't check how the size is measured, but could the writes still be stuck in the WAL file?
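
For what it's worth, here is a minimal sketch (assuming the store lives at a single path such as miden-store.sqlite3) of measuring the main database file together with its -wal and -shm companions:

use std::{fs, path::Path};

// Total on-disk size of an SQLite database: the main file plus its
// write-ahead log (-wal) and shared-memory (-shm) files, if they exist.
fn store_size_bytes(db_path: &Path) -> u64 {
    let db = db_path.to_string_lossy();
    [db.to_string(), format!("{db}-wal"), format!("{db}-shm")]
        .into_iter()
        .filter_map(|path| fs::metadata(path).ok())
        .map(|meta| meta.len())
        .sum()
}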

@SantiagoPittella
Collaborator Author

Why does the first batch of 1000 blocks not affect storage size? I guess after that we get consistent growth of about 75MB per 1000 blocks.

I was only checking the size of miden-store.sqlite3; I'm now running it again for 1M accounts, using the combined size of both files (as Mirko mentioned) as the total size.

Though I want to clarify that it is every 1000 batches: each block is 16 batches in this implementation, so we track the size increase every 62.5 blocks.

7000 blocks with 256 accounts created per block should result in 1.8M accounts. Where does the 100K accounts number come from?

The 7k are batches, so it is ~440 blocks. We are using 255 accounts per block + 1 tx to mint assets to each of the accounts.

Related to the above, if there are only 100K accounts, a database size of almost 500MB doesn't really make a lot of sense (this would imply almost 5KB per account). If it is more like 1.8M accounts then it seems a bit low (about 270 bytes per account) - though, maybe it is possible.

I will come back with the results of this new run with 1M accounts and with the store size measurement fixed. Currently it takes ~1 hour to run for 1M accounts (for the whole process).

@SantiagoPittella
Collaborator Author

Here you can see a flamegraph for 1M accounts (I removed the inline preview because it was too big):
https://github.com/user-attachments/assets/f7639a8c-63cc-4384-99cb-fedc50e8cc58

@SantiagoPittella
Collaborator Author

I added a couple more metrics and re-ran it a couple of times with different numbers of accounts.

I'm consistently getting ~4550 bytes/account.

@SantiagoPittella
Collaborator Author

I'm consistently getting ~4550 bytes/account.

It is worth mentioning that this number is the result of doing total_db_size / total_number_account, so it includes blocks and anything else we store in the DB.

I ran the following query against the DB:

SELECT
    name,
    SUM(pgsize) AS size_bytes,
    (SUM(pgsize) * 1.0) / (SELECT COUNT(*) FROM accounts) AS bytes_per_row
FROM dbstat
WHERE name = 'accounts';

And got this result:
accounts | 6025216 bytes | 60.2750645245193 bytes per account
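
A natural follow-up (sketched below, assuming the store is opened with rusqlite and that SQLite was built with SQLITE_ENABLE_DBSTAT_VTAB, which the dbstat table requires) is to break the size down for every table and index, to see where the rest of the ~4550 bytes per account goes:

use rusqlite::Connection;

// Per-table (and per-index) size breakdown via the dbstat virtual table.
fn print_table_sizes(db_path: &str) -> rusqlite::Result<()> {
    let conn = Connection::open(db_path)?;
    let mut stmt = conn.prepare(
        "SELECT name, SUM(pgsize) AS size_bytes
         FROM dbstat
         GROUP BY name
         ORDER BY size_bytes DESC",
    )?;
    let rows = stmt.query_map([], |row| {
        Ok((row.get::<_, String>(0)?, row.get::<_, i64>(1)?))
    })?;
    for row in rows {
        let (name, size_bytes) = row?;
        println!("{name}: {size_bytes} bytes");
    }
    Ok(())
}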

@bobbinth
Contributor

I added a couple more metrics and re-ran it a couple of times with different numbers of accounts.

I'm consistently getting ~4550 bytes/account.

4.5KB per account is roughly what the raw file sizes implied - but it still looks pretty high. Part of this is nullifiers and notes - but I don't see how these contribute more than 1KB (about 40 bytes for nullifiers + 80 bytes for notes + 500 bytes for note authentication paths). But it is also possible I'm missing something.

Another possibility is that SQLite doesn't do compaction, and maybe there is a lot of "slack" in the file.

@igamigo
Collaborator

igamigo commented Jan 30, 2025

There's also the overhead of block headers and indices, but those are probably too small to matter.
One test you can do is run a VACUUM command and see whether the file size shrinks considerably after the initial seeding finishes.
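
Something along these lines could do it (a sketch assuming the store is opened with rusqlite; VACUUM rewrites the whole file, so it should only run once seeding has completed):

use rusqlite::Connection;

// Rebuild the database file to reclaim unused pages; comparing the on-disk
// size before and after gives an idea of how much of the file is slack.
fn vacuum_store(db_path: &str) -> rusqlite::Result<()> {
    let conn = Connection::open(db_path)?;
    conn.execute_batch("VACUUM;")?;
    Ok(())
}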
