Standardization, extension of visualizations, and general updates #3

fjwillemsen · 2025-07-07T15:11:25Z

This PR introduces substantial updates to further standardize and extend this software for evaluation of auto-tuning algorithms, in particular by the following:

Visualization and reporting of results are now modular; scores can be obtained programmatically using report_experiments.
New benchmark_hub repository and submodule hosting brute-forced benchmarking resources.
Extended and improved visualizations:
- Heatmaps for comparison per search space and / or over time
- Head-to-head plots for direct comparisons on practical impact (beta)
- New color handling based on existing color maps
Improved Experiments Schema:
- Strategy grouping and coloring
- Extended visualization settings such as visual minima and maxima
Further improvements:
- Switched to NumPy 2.0
- Python 3.12 support
- Auto-retry on missing data, smart cutoff handling, better error messages
- Execution Engine Enhancements
- Time-based cutoffs and runtime conversion
- Flexible support for optimization direction, cutoffs, objectives, and valid result thresholds
- Reformatted code with Ruff, improved test coverage


Copilot

Pull Request Overview

This PR standardizes NaN usage in tests, refactors integration test paths and parameterization, and extends visualization/reporting with richer schema and format support.

Updated tests to use consistent np.nan, improved file path management, and enhanced parametric plotting tests.
Refactored core modules (runner, searchspace_statistics, visualize_experiments, report_experiments) to support T4 format, JSON schema validation, and new visualization scopes (heatmaps, head-to-head).
Added comprehensive JSON schemas (experiments.json, T4.json) and default experiment definitions.

Reviewed Changes

Copilot reviewed 51 out of 53 changed files in this pull request and generated 3 comments.


File	Description
tests/autotuning_methodology/unit/test_curves.py	Standardize NaN literal usage in unit tests
tests/autotuning_methodology/integration/test_visualization.py	Parametrize and enhance integration tests for plots
tests/autotuning_methodology/integration/test_run_experiment.py	Refactor experiment path setup and CLI argument tests
tests/autotuning_methodology/integration/test_report.py	Switch from `get_strategies` to `get_experimental_groups`
src/autotuning_methodology/visualize_experiments.py	Extend visualization settings and plotting logic
src/autotuning_methodology/validators.py	Add JSON schema validation helpers
src/autotuning_methodology/searchspace_statistics.py	Support T4 format loading and time-unit conversion
src/autotuning_methodology/runner.py	Refactor tuning interfaces and fix path logic
src/autotuning_methodology/report_experiments.py	Update aggregation key to use application names
src/autotuning_methodology/formats_interface.py	Introduce T4 format loader
src/autotuning_methodology/schemas/experiments.json	Major schema update for experiment setup files
src/autotuning_methodology/schemas/T4.json	Add T4 results schema
src/autotuning_methodology/experiments_defaults.json	Add default experiment configuration
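
The time-unit conversion mentioned for searchspace_statistics.py can be pictured with a minimal sketch. The function name `convert_time`, the unit names, and the conversion table below are all hypothetical and are not taken from the repository or the T4 schema; the sketch only illustrates converting scalars and lists to a common unit.

```python
from typing import List, Union

# Hypothetical unit table; the unit names actually used in T4 metadata may differ.
SECONDS_PER_UNIT = {
    "nanoseconds": 1e-9,
    "microseconds": 1e-6,
    "milliseconds": 1e-3,
    "seconds": 1.0,
    "minutes": 60.0,
}

Number = Union[int, float]


def convert_time(
    value: Union[Number, List[Number]],
    from_unit: str,
    to_unit: str = "seconds",
) -> Union[float, List[float]]:
    """Convert a time value, or a list of time values, between units."""
    for unit in (from_unit, to_unit):
        if unit not in SECONDS_PER_UNIT:
            raise ValueError(f"Unknown time unit: {unit!r}")
    # Lists are converted element by element, so nested lists also work.
    if isinstance(value, (list, tuple)):
        return [convert_time(v, from_unit, to_unit) for v in value]
    return value * SECONDS_PER_UNIT[from_unit] / SECONDS_PER_UNIT[to_unit]
```

A NaN input stays NaN under this conversion, so missing measurements would pass through unchanged.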

Comments suppressed due to low confidence (1)

tests/autotuning_methodology/integration/test_run_experiment.py:64

In teardown_module, _remove_dir asserts experiment_path exists but it may not have been created, causing teardown failures. Guard with if experiment_path.exists() or pass ignore_permission_error=True.

    assert mockfiles_path_source.exists()
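
One way to apply the suggested guard is sketched below. The helper name `remove_dir_if_exists` is hypothetical and is not the repository's `_remove_dir`; the sketch only shows how checking for existence first keeps teardown from failing when the directory was never created.

```python
import shutil
from pathlib import Path


def remove_dir_if_exists(path: Path) -> bool:
    """Remove a directory tree if present; return True if something was removed."""
    if not path.exists():
        # Nothing was created, so there is nothing to clean up.
        return False
    shutil.rmtree(path)
    return True
```

Returning a boolean instead of asserting lets the teardown code decide whether a missing directory is worth reporting.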

Copilot · 2025-07-07T15:31:12Z

tests/autotuning_methodology/integration/test_visualization.py

    for plot_filepath in plot_filepaths:
-        assert plot_filepath.exists(), f"{plot_filepath} does not exist"
+        assert (
+            plot_filepath.exists()
+        ), f"{plot_filepath} does not exist, files in folder: {[f.name for f in plot_filepath.parent.iterdir() if f.is_file()]}"


The inner loop overrides the parameterized plot_filepath and tests all files each time. Remove the loop and assert existence only on the passed-in plot_filepath.
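
A minimal sketch of the loop-free check follows. The function name `check_plot_exists` is hypothetical; in the test suite this would presumably be the body of a test that receives `plot_filepath` through `pytest.mark.parametrize`, which is an assumption about how the suite is set up.

```python
from pathlib import Path


def check_plot_exists(plot_filepath: Path) -> None:
    """Assert that the single plot file passed in exists."""
    parent = plot_filepath.parent
    # List sibling files only to make a failure message easier to debug.
    siblings = (
        sorted(f.name for f in parent.iterdir() if f.is_file())
        if parent.is_dir()
        else []
    )
    assert (
        plot_filepath.exists()
    ), f"{plot_filepath} does not exist, files in folder: {siblings}"
```

Each parameterized case then fails or passes on its own file, so one missing plot no longer causes every case to fail.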

Copilot · 2025-07-07T15:31:13Z

tests/autotuning_methodology/integration/test_report.py

    experiment_filepath = str(experiment_filepath_test)
    experiment = get_experiment(experiment_filepath)
-    strategies = get_strategies(experiment)
+    strategies = get_experimental_groups(experiment)  # TODO fix this test that used to use get_strategies


[nitpick] There's a stale TODO indicating this test needs fixing. Either implement the intended behavior or remove the comment to keep tests up to date.

Suggested change

-    strategies = get_experimental_groups(experiment)  # TODO fix this test that used to use get_strategies
+    strategies = get_experimental_groups(experiment)

Copilot · 2025-07-07T15:31:14Z

src/autotuning_methodology/runner.py

    filename_results: str = f"{folder}../last_run/_tune_configuration-results.json",
    filename_metadata: str = f"{folder}../last_run/_tune_configuration-metadata.json",


The default path literal is missing a path separator before `..`. It should be f"{folder}/../last_run/..." to resolve the directory correctly.

Suggested change

-    filename_results: str = f"{folder}../last_run/_tune_configuration-results.json",
-    filename_metadata: str = f"{folder}../last_run/_tune_configuration-metadata.json",
+    filename_results: str = str(folder / "../last_run/_tune_configuration-results.json"),
+    filename_metadata: str = str(folder / "../last_run/_tune_configuration-metadata.json"),
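
The difference between the two forms can be seen in a small sketch. The folder value is hypothetical, and `PurePosixPath` is used only so the result is the same on every platform; the suggestion assumes `folder` is a pathlib path, since the `/` operator would not work on a plain string.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical folder; in runner.py the actual value comes from the module itself.
folder = PurePosixPath("/tmp/experiments/run")

# String interpolation glues ".." onto the last path component.
broken = f"{folder}../last_run/_tune_configuration-results.json"

# The "/" operator inserts the separator that the f-string omitted.
fixed = str(folder / "../last_run/_tune_configuration-results.json")

# Normalizing shows where each variant ends up pointing.
broken_normalized = posixpath.normpath(broken)
fixed_normalized = posixpath.normpath(fixed)
```

In the broken variant `run..` is treated as a directory name in its own right, so the path never steps up to the parent directory.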


jhozzova and others added 30 commits August 20, 2024 11:56

Update the experiments setup json schema

a3e1a15

Update experiments.py to work with a new schema

594a87b

Update caching.py to work with a new schema

f4fdac7

Add KTT tuning to runner.py

f34a7f0

Update searchspace_statistics.py to work with a common output json schema

aad790d

Update visualization scripts regarding new schema

77289b0

However, not working.

Fix version number for kernel_tuner in pyproject.toml

13a8ffc

Add example experiment files for KTT

a28a240

Merge pull request #2 from jhozzova/new_schema_integration

94c8193

Integration of the new schema and KTT support

Update

6810ede

Wrote conversion script to new experiments file

25397e8

Updated to new experiments schema

0ee3bb5

Ensure a usable error is given in the case of an incorrect experiments file

90c3790

Fixed an issue where cachefile patterns weren't recognized, improved error handling

e6edebc

Several minor improvements

9e0ae03

Generated T1 input files now have absolute kernel paths

4b5f376

Improved warnings, fixed order of tunable parameters

05e2e50

Fixed an issue that led to groups being aliased instead of copied

30c82c3

Improved the paths figures are saved to

ac9ed40

Various minor changes enabling compatibility with the new experiments, T1 and T4 formats

a8d6506

Silence warnings when tuning, added experiments file for comparing BO

64036bb

Implemented experiments file generation

92ad96f

Generate an experiments file using provided values and defaults

d1b4310

Will infer display name from strategy name if not specified

761f2c9

Cutoff percentiles are properly read from new format

7b29cfd

Simplified validation of experiments file format

53b1e70

Implemented T4 schema validator

71f0cbb

Implemented conversion of time units based on T4 metadata

b9259a1

Time unit conversion supports lists

c647b87

Generalized several useful Searchspace statistics functions

69a98af

fjwillemsen added 5 commits July 3, 2025 19:45

Minor improvements based on tests

348e91c

Adjusted tests to recent changes

a0774d4

Adjusted head2head plot generation to be resilient to NaNs

47e11b4

Added head2head plot to tests

8229311

Minor improvements to head2head plots

90d471e

fjwillemsen requested a review from Copilot July 7, 2025 15:28

Copilot AI reviewed Jul 7, 2025


fjwillemsen added 17 commits July 18, 2025 15:49

Implemented quick and optional full validation of T4 files on loading, much improving performance

f53955e

Improved to_valid_array performance

6081542

Updated benchmarkhub

c8a1f04

Improvements to docstrings

9312c4c

Remove the second color (orange) to avoid confusion with the fourth (red)

f78802b

Implemented color index override for parent strategies

161f093

Implemented extensive comparison for hyperparameter tuning paper

b9aac41

Automatic compression and memory pressure reduction in results collection

0feecd2

Implemented optional printing means of columns and rows of heatmaps

c81fd74

Experiment files for the upcoming LLaMEA paper with Niki

43067cf

Improved the color index resolution and assertions

5536c5a

Updated experiment files for upcoming paper

5543e8a

Updated dependency version

60e9e12

Removed publication-specific experiment files to other repository

b86d569

Add note to readme that installing in a virtual environment or with pipx is recommended

30e6950

Dropped python3.9 support, ensured python 3.12 and 3.13 are tested throughout

f4edf3a

Various improvements to linting

5e7b92c

fjwillemsen marked this pull request as ready for review September 3, 2025 10:31

fjwillemsen added 4 commits September 3, 2025 12:35

Merge remote-tracking branch 'origin/main' into standardization

d167a0d

Updated benchmark hub submodule

3b60e4f

Merge branch 'standardization' of https://github.com/fjwillemsen/autotuning_methodology into standardization

e216cc7

Updated Kernel Tuner dependency version

0a83604

fjwillemsen merged commit 5c6cf6c into main Sep 3, 2025
4 checks passed
