Model-driven autotuning of MPI-IO hints and Lustre striping parameters for HPC applications. The project combines active learning with gradient-boosted and extra-trees regressors to discover high-throughput configurations for two widely used proxy applications: S3D-IO and BT-IO.
- Manual exploration of the parallel I/O stack is slow and error-prone. We automate it by steering benchmarks with Bayesian optimisation.
- Three complementary models shorten the tuning cycle: an active-learning loop (model 1), a fast extra-trees regressor for runtime prediction (model 2), and an XGBoost model for bandwidth prediction (model 3).
- Results, plots, and the full final report are available in `final/` (see `final/code/final.png` and `final/report.pdf`).
- `final/code/` – polished autotuning pipeline, notebooks, trained models, benchmark harness, and figures used in the final report.
- `final/readme` – legacy notes kept for provenance.
- `progress/` – earlier experiments, notebooks, and scripts preserved for reference.
- `scripts/` – helper shell scripts used to sync data during development.
Key files inside `final/code/`:
- `active/` – Jupyter notebooks implementing the three models, plus saved scalers and trained regressors.
- `S3D-IO/` & `btio-pnetcdf-1.1.1/` – benchmark sources and PBS wrappers used to collect measurements.
- `read_config_general.py`, `bt_read_config_general.py` – non-notebook runners that execute a benchmark using the parameters stored in `confex.json` (an illustrative example follows this list).
- `stats.txt`, `BTIOstats.txt`, `S3DIOstats.txt` – consolidated results captured during active-learning runs.
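The exact schema of `confex.json` is defined by the notebooks. Purely as an illustration, a candidate configuration combining MPI-IO hints with Lustre striping values might look like the following sketch (the key names and values here are hypothetical, not the repository's actual schema):

```python
import json

# Hypothetical keys -- the real schema is defined by the notebooks in active/.
candidate = {
    "striping_factor": 8,        # Lustre stripe count
    "striping_unit": 1048576,    # Lustre stripe size in bytes
    "cb_nodes": 4,               # MPI-IO collective-buffering aggregators
    "cb_buffer_size": 16777216,  # MPI-IO collective buffer size in bytes
}

with open("confex.json", "w") as f:
    json.dump(candidate, f, indent=2)
```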
- Access to a cluster with a PBS-compatible scheduler (`qsub`) and Lustre (`lfs`) commands (see the sanity check after this list).
- PnetCDF installed and visible via `PNETCDF_DIR`.
- Python 3.6.7 (tested version), `virtualenv`, Jupyter, and the compiler toolchain required by the benchmarks.
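Before starting, a quick sanity check along these lines (a hypothetical helper, not part of the repository) confirms the tools are reachable:

```python
import os
import shutil

# Confirm the PBS and Lustre command-line tools are on PATH.
for tool in ("qsub", "lfs"):
    print(f"{tool}: {'found' if shutil.which(tool) else 'MISSING'}")

# Confirm PnetCDF is discoverable by the benchmark Makefiles.
print("PNETCDF_DIR:", os.environ.get("PNETCDF_DIR", "NOT SET"))
```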
Set up the Python environment:
```bash
python3 -m venv env
source env/bin/activate
pip install -r final/code/requirements.txt
```

Build the benchmarks (adjust `PNETCDF_DIR` if required):
```bash
export PNETCDF_DIR=/path/to/pnetcdf
cd final/code/S3D-IO
mkdir -p output
make
cd ../btio-pnetcdf-1.1.1
make
mkdir -p output
```

All notebooks expect the repository path to be assigned to `project_dir` (include the trailing `/`) and assume the benchmarks were built as above.
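For example, the relevant notebook cell might be edited along these lines (the path is a placeholder for your own checkout):

```python
# Point project_dir at the root of this repository; keep the trailing slash,
# since the notebooks concatenate sub-paths onto it.
project_dir = "/home/username/io-autotuning/"  # placeholder path
```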
Launch a Jupyter notebook server:

```bash
cd final/code/active
jupyter notebook
```

- Open either `S3D-IO active learning.ipynb` or `BTIO active learning.ipynb`.
- Update `project_dir` in the second cell, adjust the command-line arguments for your node/PPN/grid configuration, and execute the notebook (restart-and-run-all).
- The notebook iteratively updates `confex.json`, launches benchmark runs via `read_config_general.py` / `bt_read_config_general.py`, and appends measurements to `stats.txt` or `BTIOstats.txt` (a simplified sketch of one iteration follows this list).
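Conceptually, each active-learning iteration behaves roughly as follows (a simplified sketch, not the notebooks' actual code; the surrogate model and acquisition logic live in the notebooks, and the paths and benchmark arguments below are illustrative):

```python
import json
import subprocess

def evaluate_candidate(params, nodes=4, ppn=16):
    """Write one candidate configuration and measure it on the cluster."""
    # 1. Persist the configuration proposed by the optimiser.
    with open("confex.json", "w") as f:
        json.dump(params, f, indent=2)

    # 2. Launch the benchmark through the CLI runner (S3D-IO shown here); the
    #    runner submits a PBS job, waits, and appends the result to stats.txt.
    subprocess.check_call(
        ["python3", "../read_config_general.py",
         "-c", "200 200 200 2 2 2 0",  # illustrative <nx ny nz npx npy npz restart>
         "-n", str(nodes), "-p", str(ppn)]
    )

# In the notebook, the Bayesian optimiser proposes `params`, evaluate_candidate()
# runs them, and the new measurement is fed back to refit the surrogate model.
```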
- From the same notebook server, open `predicting_time.ipynb` and ensure `project_dir` is correct.
- Run the notebook to regenerate the Extra Trees model artefacts (outlined below).
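In outline, regenerating the runtime model amounts to something like this (a sketch assuming the stats file is pandas-readable and that scikit-learn/joblib are used; the column and artefact names are placeholders):

```python
import joblib
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.preprocessing import StandardScaler

# Placeholder column names; the real feature set is defined in the notebook.
df = pd.read_csv("stats.txt")
X = df.drop(columns=["runtime"])
y = df["runtime"]

scaler = StandardScaler().fit(X)
model = ExtraTreesRegressor(n_estimators=200, random_state=0)
model.fit(scaler.transform(X), y)

# The notebooks persist artefacts as *.sav (models) and *.save (scalers).
joblib.dump(model, "time_model.sav")
joblib.dump(scaler, "scaler.save")
```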
- Open `predicted_model.ipynb`, comment out cells 3–4 and enable cells 5–6 as instructed inside the notebook to load the time model.
- Update the `os.chdir` paths and `confex.json` locations before executing the notebook to obtain the predicted optimal parameters (the gist of this step is sketched below).
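The essence of this prediction step is to reload the saved model and scaler and score candidate configurations, keeping the one with the best prediction (hypothetical artefact names and an arbitrary three-parameter search space shown here):

```python
import itertools

import joblib
import numpy as np

# Hypothetical artefact names -- use the files written by predicting_time.ipynb.
model = joblib.load("time_model.sav")
scaler = joblib.load("scaler.save")

# Arbitrary illustrative grid over (stripe_count, stripe_size_MB, cb_nodes).
candidates = np.array(list(itertools.product([1, 2, 4, 8], [1, 4, 16], [1, 2, 4])))

predicted_runtime = model.predict(scaler.transform(candidates))
best = candidates[np.argmin(predicted_runtime)]  # lower runtime is better
print("Predicted best configuration:", best)
```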
For bandwidth prediction:

- S3D-IO: run `predicting Write Bandwidth-XGB-BOOST.ipynb`, then execute `predicted_model.ipynb` with cells 3–4 active and paths updated to your environment.
- BT-IO: run `predicting Write Bandwidth-XGB-BOOST-BTIO.ipynb`, then execute `predicted_model-BTIO.ipynb` with the correct paths (see the outline after this list).
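The bandwidth notebooks follow the same pattern with an XGBoost regressor; in outline (placeholder file and column names, and note that bandwidth is maximised rather than minimised):

```python
import pandas as pd
import xgboost as xgb

# Placeholder schema; the notebooks define the real feature and target columns.
df = pd.read_csv("S3DIOstats.txt")
X = df.drop(columns=["write_bandwidth"])
y = df["write_bandwidth"]

model = xgb.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.1)
model.fit(X, y)

# Downstream, predicted_model*.ipynb picks the candidate with the largest
# predicted bandwidth instead of the smallest predicted runtime.
```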
Both notebooks emit the best-performing configuration into `confex.json`, which you can immediately evaluate with the CLI runners:

```bash
cd final/code
python3 read_config_general.py -c "<nx ny nz npx npy npz restart>" -n <nodes> -p <ppn>
python3 bt_read_config_general.py -c "<grid points>" -n <nodes> -p <ppn>
```

The scripts submit a PBS job, wait for completion, parse the generated output, and append the measurements to the corresponding `*stats.txt` file.
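Internally the runners follow a submit-and-poll pattern that can be sketched like this (simplified; the actual scripts generate a benchmark-specific PBS script and parse benchmark-specific output):

```python
import subprocess
import time

def submit_and_wait(pbs_script):
    """Submit a PBS job and block until it leaves the queue."""
    # qsub prints the job identifier on stdout, e.g. "12345.pbsserver".
    job_id = subprocess.check_output(["qsub", pbs_script]).decode().strip()

    # Poll qstat until the job is no longer listed (simplified: some PBS
    # variants keep completed jobs visible for a short grace period).
    while subprocess.call(["qstat", job_id],
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL) == 0:
        time.sleep(30)

    return job_id
```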
- `default_S3D.py`, `bt_default.py`, and `default_run.sh` reproduce baseline runs with stock MPI/Lustre settings.
- Plotting helpers (`default-best-plotscript.py`, `btio-default-best-plotscript.py`, `plotcombine.py`) compare default throughput to tuned results (a minimal example follows this list). Generated figures are stored in `plots/`, `bt_plots/`, `somemoreplots/`, and summarised in `final/code/final.png`.
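A minimal version of such a comparison figure, with purely illustrative numbers (matplotlib assumed):

```python
import matplotlib.pyplot as plt

# Purely illustrative values -- real numbers come from the *stats.txt files.
labels = ["default", "tuned"]
bandwidth_mbs = [850.0, 2100.0]

plt.bar(labels, bandwidth_mbs)
plt.ylabel("Write bandwidth (MB/s)")
plt.title("Default vs tuned configuration (illustrative)")
plt.savefig("default_vs_tuned.png", dpi=150)
```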
- `app.log` captures all benchmark submissions made through the Python wrappers.
- `active/result/gbm_trials-*.csv` records every configuration explored by the Bayesian optimiser.
- Intermediate CSVs, trained model pickles (`*.sav`), and scaler dumps (`*.save`) are kept in `active/` for reproducibility.
- If the benchmarks fail to build, hardcode `PNETCDF_DIR` inside the respective Makefiles.
- Ensure Lustre striping commands (`lfs setstripe`) succeed; otherwise adjust permissions or run against a Lustre-backed directory (a quick check is sketched after this list).
- When adding new parameters to the search space, update both the notebooks and the `confex.json` schema accordingly.
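To confirm striping works on your target directory, a quick check along these lines can help (`lfs setstripe -c/-S` is standard Lustre usage; the stripe values and path are arbitrary examples):

```python
import subprocess

target = "final/code/S3D-IO/output"  # any Lustre-backed directory

# Request 4 stripes of 1 MiB each; failure here usually means the directory
# is not on Lustre or you lack permission to change its layout.
subprocess.check_call(["lfs", "setstripe", "-c", "4", "-S", "1m", target])
subprocess.check_call(["lfs", "getstripe", target])
```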
The final write-up (`final/report.pdf`) details the methodology, design choices, and performance gains achieved with this framework.