Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial version of benchmark experiment runner #1266

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

dannycjones
Copy link
Contributor

In order to investigate performance in Mountpoint, we want to be able to vary different parameters. In fact, it can be very useful to vary these parameters together to see how performance (such as sequential read throughput) changes as we vary two parameters together.

This change introduces a new benchmark running script which uses the Python framework Hydra to enumerate combinations of parameters, and then execute some function with each combination. The script manages the lifecycle of the mount-s3 file system and collecting data into an output folder.

The change currently does not reuse the FIO definitions used by our regression benchmarks. In the mid-term, these should be reconciled.

This pull request (PR) supersedes a previous PR: #986.

Does this change impact existing behavior?

No, this adds a new benchmark runner and benchmark definitions. This does not impact the Mountpoint file system.

Does this change need a changelog entry? Does it require a version change?

No, no impact to Mountpoint file system or crates.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

@dannycjones dannycjones temporarily deployed to PR integration tests February 11, 2025 17:16 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests February 11, 2025 17:16 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests February 11, 2025 17:16 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests February 11, 2025 17:16 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests February 11, 2025 17:16 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests February 11, 2025 17:16 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests February 11, 2025 17:16 — with GitHub Actions Inactive
@dannycjones dannycjones marked this pull request as ready for review February 12, 2025 09:18
@dannycjones dannycjones requested review from mansi153 and a team February 12, 2025 09:18
try:
# TODO: Add resource monitoring during FIO job
_run_fio(cfg, mount_dir)
success = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try/except/else also is probably wanted here as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow what we want the else for?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a section on benchmarks and how to create a new one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a short desc on how the benchmark works, where its defined

Comment on lines 194 to 196
mp_version = _mount_mp(cfg, metadata, mount_dir)
mounted = True
metadata["mp_version"] = mp_version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be useful to parameterise mountpoint flags? If so, we could also add them to metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean here, can you rephrase?

Comment on lines 42 to 43
This will run the default experiment, including many different configuration combinations.
Output is written to `multirun/` within directories for the date, time, and job run.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it could be nice to include an example output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I described the output a bit more. I'm worried it'll be hard to maintain and become outdated quickly here.

]
subprocess_env = {}

if cfg['s3_prefix'] is not None:
Copy link

@sahityadg sahityadg Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just use mountpoint args as they are than have a different mapping? For example, we could just use prefix instead of s3_prefix, debug-crt instead of mountpoint_debug_crt etc.,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is that it doesn't map very well as a straight forward mapping. Right now, it wraps both Mountpoint and FIO - maybe we might extend either (more benchmarking tools or more layers of MP, though I'm not confident on how well it can be generalized).

Additionally, there's the parameters mentioned in this script as well as when they are read in Pandas when analysing results. I diverged from MP naming to try and make the results clearer (for example, using fuse_threads instead of max-threads).

I'm leaning more to keep them diverged for now.

@dannycjones dannycjones temporarily deployed to PR integration tests February 20, 2025 08:22 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests February 20, 2025 08:22 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests February 20, 2025 08:22 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests February 20, 2025 08:22 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests February 20, 2025 08:22 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests February 20, 2025 08:22 — with GitHub Actions Inactive
@dannycjones dannycjones temporarily deployed to PR integration tests February 20, 2025 08:22 — with GitHub Actions Inactive
@muddyfish
Copy link
Contributor

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants