refactor(tests): Use pytest collection to load JSON fixtures #1666
base: forks/osaka
d18197e to 8503878
# Remove any python files in the downloaded files to avoid
# importing them.
for python_file in glob(
    os.path.join(fixture_path, "**/*.py"), recursive=True
):
    try:
        os.unlink(python_file)
    except FileNotFoundError:
        # Not breaking error, another process deleted it first
        pass
This feels... strange? I can't quite put my finger on why.
Like, why do the fixtures contain python files at all? Is there another way we could accomplish the same thing (like excluding a directory)?
I dunno, this just triggers my spidey sense 🤣
This is the culprit: https://github.com/ethereum/legacytests/tree/1f581b8ccdc4c63acf5f2c5c1b155c690c32a8eb/src/LegacyTests/Cancun/GeneralStateTestsFiller/Pyspecs
Checking out ethereum/tests at this commit, when submodules are included, results in these python files being checked out too, and when collecting ./tests/json_infra/fixtures for JSON files, pytest tries to collect these files too.
Don't we exclude that directory on the command line?
I removed that because with this approach the files are collected directly by pytest, as opposed to doing a glob in the test itself.
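One alternative to deleting the files, along the lines of the "excluding a directory" question above, could be a `pytest_ignore_collect` hook in `conftest.py`. The sketch below is not what this PR does; it assumes a recent pytest (the `collection_path` argument), and the directory name constant is a hypothetical stand-in for the fixtures location.

```python
from pathlib import Path
from typing import Optional

# Hypothetical: the directory name under which downloaded fixtures live.
FIXTURES_DIR_NAME = "fixtures"


def pytest_ignore_collect(collection_path: Path, config) -> Optional[bool]:
    """Skip Python files located anywhere below the fixtures directory.

    Returning True tells pytest not to collect the path; returning None
    leaves the decision to other plugins and to pytest's defaults.
    """
    is_python_file = collection_path.suffix == ".py"
    inside_fixtures = FIXTURES_DIR_NAME in collection_path.parts
    if is_python_file and inside_fixtures:
        return True
    return None
```

With a hook like this the downloaded tree stays untouched, so there is no race between workers deleting the same file.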
ALL_FIXTURE_TYPES.append(BlockchainTestFixture)
ALL_FIXTURE_TYPES.append(StateTestFixture)
Do these get executed when importing only, for example, .load_state_tests? From my limited knowledge of Python's import machinery, I would guess yes, but I'm just checking.
Yes, that's correct: it gets executed only when importing from .helpers. If we were to import directly from .helpers.fixtures, for example, this logic would not be executed and ALL_FIXTURE_TYPES would be empty, so it is indeed a bit brittle, to be honest.
Oh really? I thought parent modules were implicitly imported. I'm glad I checked!
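For reference, in standard Python import semantics, importing a submodule first imports its parent package, so code in the package's `__init__.py` runs either way. The sketch below checks this with a throwaway package; every name in it is hypothetical and it does not mirror this PR's module layout, so whether it applies here depends on where the `append` calls actually live.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import List


def build_demo_package(root: Path) -> None:
    """Create a hypothetical package whose __init__.py fills a registry."""
    pkg = root / "demo_helpers"
    pkg.mkdir()
    (pkg / "registry.py").write_text("ALL_TYPES = []\n")
    (pkg / "__init__.py").write_text(
        "from .registry import ALL_TYPES\n"
        "ALL_TYPES.append('registered in __init__')\n"
    )
    (pkg / "fixtures.py").write_text("LOADED = True\n")


def import_submodule_only() -> List[str]:
    """Import only demo_helpers.fixtures and return the registry contents."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            # Only the submodule is requested; Python imports the parent
            # package first, which executes demo_helpers/__init__.py.
            importlib.import_module("demo_helpers.fixtures")
            registry = importlib.import_module("demo_helpers.registry")
            return list(registry.ALL_TYPES)
        finally:
            # Clean up so the demo leaves no trace in the interpreter.
            sys.path.remove(tmp)
            for name in [n for n in sys.modules if n.startswith("demo_helpers")]:
                del sys.modules[name]
```

Calling `import_submodule_only()` shows whether the registry was populated even though `demo_helpers` itself was never imported explicitly.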
big_memory: Tuple[Pattern[str], ...]

@lru_cache
How often is this called to require an lru_cache? O.o
Depending on when the cache is populated (in worker vs. in master), using lru_cache can explode memory: each worker has its own cache.
I removed it, thinking it might reduce the memory footprint, and it did, by half a GB, but the run still consumes 30+ GB because all fixtures are held in memory while running.
* zkevm: add BLOBHASH benchs
* generalize params
* improvements
Signed-off-by: Ignacio Hagopian
I was thinking briefly about this. I also know next to nothing about pytest, so this might not make any sense at all, but... What if we use an LRU cache for the JSON files (one per worker), and loadgroup all the tests that come from the same file? So you'd read once during collection, find all the tests and group them by file, then while running the tests you minimize the number of times you need to re-read the same file.
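A rough sketch of that idea, using only the standard library: a small per-process LRU cache around the JSON loader, plus a helper that groups test keys by their source file. The function names and the cache size are assumptions for illustration, not code from this PR. With pytest-xdist, the file path could serve as the group name for the `xdist_group` marker when running with `--dist loadgroup`, so all cases from one file land on the same worker.

```python
import json
from collections import defaultdict
from functools import lru_cache
from typing import Any, Dict, Iterable, List, Tuple


@lru_cache(maxsize=8)  # Assumed bound: keep only a few parsed files per process.
def load_fixture_file(path: str) -> Dict[str, Any]:
    """Parse one JSON fixture file; repeated calls for the same path hit the cache."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def group_tests_by_file(cases: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each fixture file path to the test keys it contains."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path, test_key in cases:
        groups[path].append(test_key)
    return dict(groups)


def run_group(path: str, test_keys: List[str]) -> List[Any]:
    """Fetch every case of one file back to back, so the file is parsed once."""
    return [load_fixture_file(path)[key] for key in test_keys]
```

Because each worker is a separate process, each one holds its own cache; the small `maxsize` is what keeps the total memory bounded, at the cost of re-reading a file if its tests are not scheduled together.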
fix(tests): Don't cache fixtures
Try to implement cache
Fix caching
feat(tests): Manage cache during execution
53e92c6 to c6408c9

🗒️ Description
This PR refactors the blockchain and state test infrastructure to leverage pytest's native collection mechanism via pytest_collect_file, eliminating redundant JSON file reads and improving test execution efficiency.
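For readers unfamiliar with the hook, here is a minimal sketch of JSON collection through `pytest_collect_file`, following the pattern in pytest's documentation for non-Python tests. It assumes pytest 7 or later (the `file_path` argument); the class names and the trivial `runtest` body are hypothetical and do not reflect this PR's actual implementation.

```python
import json

import pytest


class JsonFixtureFile(pytest.File):
    """Collector that reads a JSON file once and yields one item per test case."""

    def collect(self):
        with self.path.open("r", encoding="utf-8") as handle:
            data = json.load(handle)
        for name, case in data.items():
            yield JsonFixtureItem.from_parent(self, name=name, case=case)


class JsonFixtureItem(pytest.Item):
    """A single test case taken from a JSON fixture file."""

    def __init__(self, *, case, **kwargs):
        super().__init__(**kwargs)
        self.case = case

    def runtest(self):
        # Placeholder check; a real item would execute the fixture here.
        assert isinstance(self.case, dict)

    def reportinfo(self):
        return self.path, 0, f"fixture: {self.name}"


def pytest_collect_file(parent, file_path):
    """Hand every .json file to the custom collector; ignore everything else."""
    if file_path.suffix == ".json":
        return JsonFixtureFile.from_parent(parent, path=file_path)
    return None
```

Placed in a `conftest.py`, a hook like this lets pytest discover the JSON files itself instead of each test module globbing for them.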
Key Improvements
Performance Impact
This refactoring significantly reduces I/O overhead for large test suites where the same JSON files contain multiple test cases across different forks.
Open Issues
Some tests are still failing and need to be investigated. For now, I'd like to start running this in CI to see how it improves execution speed.
🔗 Related Issues or PRs
N/A.
✅ Checklist
tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks: uvx --with=tox-uv tox -e static
type(scope):.
Cute Animal Picture