script to export benchmark information as Line Protocol format #14662

Status: Open. logan-keede wants to merge 4 commits into main.

@logan-keede logan-keede commented Feb 14, 2025

Which issue does this PR close?

Rationale for this change

a step towards #5504

What changes are included in this PR?

Addition of a Python script that converts some **benchmark**.json to Line Protocol format.

Are these changes tested?

on some files in result.zip

[nix-shell:~/dev/datafusion/benchmarks]$ python3 lineformat.py ~/Downloads/results/alamb_sort-merge-accounting/sort.json 
benchmark,name=sort,version=28.0.0,datafusion_version=28.0.0,num_cpus=8 query="sort utf8",iteration=0,row_count=10838832,elapsed_ms=85626006 1691105678000000000
benchmark,name=sort,version=28.0.0,datafusion_version=28.0.0,num_cpus=8 query="sort utf8",iteration=1,row_count=10838832,elapsed_ms=68694468 1691105678000000000
benchmark,name=sort,version=28.0.0,datafusion_version=28.0.0,num_cpus=8 query="sort utf8",iteration=2,row_count=10838832,elapsed_ms=63392883 1691105678000000000
benchmark,name=sort,version=28.0.0,datafusion_version=28.0.0,num_cpus=8 query="sort utf8",iteration=3,row_count=10838832,elapsed_ms=66388367 1691105678000000000
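
For readers unfamiliar with the format, the lines above follow the shape `measurement,tag_set field_set timestamp`. The following is a minimal sketch of a formatter that produces that shape, not the script from this PR: the helper names `escape_tag`, `format_field`, and `to_line` are hypothetical, and numeric fields are written bare to match the sample output (InfluxDB reads bare numbers as floats; integer fields would need an `i` suffix).

```python
def escape_tag(value: str) -> str:
    """Escape the characters that are special in tag keys, tag values, and field keys."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def format_field(value) -> str:
    """Quote string field values; write everything else as-is."""
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return str(value)


def to_line(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Build one Line Protocol line: measurement,tag_set field_set timestamp."""
    tag_set = ",".join(f"{escape_tag(k)}={escape_tag(str(v))}" for k, v in tags.items())
    field_set = ",".join(f"{escape_tag(k)}={format_field(v)}" for k, v in fields.items())
    return f"{measurement},{tag_set} {field_set} {timestamp_ns}"


# Values copied from the first sample line above.
line = to_line(
    "benchmark",
    {"name": "sort", "version": "28.0.0", "datafusion_version": "28.0.0", "num_cpus": 8},
    {"query": "sort utf8", "iteration": 0, "row_count": 10838832, "elapsed_ms": 85626006},
    1691105678000000000,
)
print(line)
```

Tags are indexed strings and fields carry the measured values, which is why the run-level metadata sits in the tag set while the per-iteration numbers sit in the field set.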

Are there any user-facing changes?

No

@logan-keede changed the title from "benchmark information as Lineformat" to "script to export benchmark information as Lineformat" on Feb 14, 2025

@logan-keede (Contributor Author) commented:

@alamb Check this out.

Do we have some documentation on the JSON format? That would be helpful if we want to show more arguments in the tag set.

@alamb (Contributor) commented Feb 14, 2025

> @alamb Check this out.
>
> Do we have some documentation on the JSON format? That would be helpful if we want to show more arguments in the tag set.

I believe the JSON is a serialized version of this structure:

pub struct RunContext {
    /// Benchmark crate version
    pub benchmark_version: String,
    /// DataFusion crate version
    pub datafusion_version: String,
    /// Number of CPU cores
    pub num_cpus: usize,
    /// Start time
    #[serde(serialize_with = "serialize_start_time")]
    pub start_time: SystemTime,
    /// CLI arguments
    pub arguments: Vec<String>,
}

This looks amazing @logan-keede -- I'll check it out later today
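
To show how the `RunContext` fields above could feed the tag set, here is a rough Python sketch. It rests on assumptions that are not confirmed in this thread: that the serialized struct sits under a top-level `"context"` key, and that `start_time` is written as whole seconds since the Unix epoch. The names `load_context`, `context_tags`, and `start_time_ns` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class RunContext:
    """Python mirror of the Rust RunContext struct quoted above."""
    benchmark_version: str
    datafusion_version: str
    num_cpus: int
    start_time: int  # ASSUMPTION: seconds since the Unix epoch
    arguments: List[str]


def load_context(text: str) -> RunContext:
    """Parse the run context out of a benchmark JSON document."""
    ctx = json.loads(text)["context"]  # ASSUMPTION: top-level "context" key
    return RunContext(
        benchmark_version=ctx["benchmark_version"],
        datafusion_version=ctx["datafusion_version"],
        num_cpus=ctx["num_cpus"],
        start_time=ctx["start_time"],
        arguments=list(ctx["arguments"]),
    )


def context_tags(ctx: RunContext) -> dict:
    """Pick the run-level values used as tags in the sample output."""
    return {
        "version": ctx.benchmark_version,
        "datafusion_version": ctx.datafusion_version,
        "num_cpus": ctx.num_cpus,
    }


def start_time_ns(ctx: RunContext) -> int:
    """Convert the start time to the nanosecond timestamp Line Protocol expects."""
    return ctx.start_time * 1_000_000_000


# Illustrative input only; the values are taken from the sample output in the PR description.
sample = json.dumps({
    "context": {
        "benchmark_version": "28.0.0",
        "datafusion_version": "28.0.0",
        "num_cpus": 8,
        "start_time": 1691105678,
        "arguments": ["sort"],
    }
})
context = load_context(sample)
print(context_tags(context), start_time_ns(context))
```

Adding more of the CLI `arguments` to the tag set would mean extending `context_tags`, which is where documentation of the JSON layout becomes useful.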

@alamb changed the title from "script to export benchmark information as Lineformat" to "script to export benchmark information as Line Protocol format" on Feb 14, 2025

@alamb (Contributor) left a comment

Thank you @logan-keede -- this is great. I tried it on some of my own data and it works well ❤️

I am thinking through the next steps (trying to gather historical performance measurements), but I think I just need to try to gather that information and see what additional data is needed.

compare_parser.add_argument(
    "baseline_path",
    type=Path,
    help="Path to the baseline summary file.",
Contributor

Given this is just a conversion, I don't think there is any notion of a baseline. Maybe this help text needs to be adjusted?

Contributor Author

remnants of compare.py 😅
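
Since the script only converts one file, a single positional path without any "baseline" wording would be enough. A minimal sketch of what the adjusted argument could look like, assuming a standalone parser; the function name `build_parser`, the argument name `path`, and the help strings are hypothetical and not taken from the PR:

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a one-input conversion script (hypothetical names)."""
    parser = argparse.ArgumentParser(
        description="Convert a benchmark JSON file to Line Protocol."
    )
    parser.add_argument(
        "path",
        type=Path,
        help="Path to the benchmark JSON file to convert.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the example is self-contained.
args = build_parser().parse_args(["results/sort.json"])
print(args.path)
```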

# specific language governing permissions and limitations
# under the License.

from __future__ import annotations
Contributor

Would it be possible to add some comments here explaining what this script does along with the example you have in your PR description?

It would also be great to add some documentation to https://github.com/apache/datafusion/tree/main/benchmarks#comparing-performance-of-main-and-a-branch explaining how to use this script.

@logan-keede (Contributor Author) commented Feb 15, 2025

I have added some, check it out.

@@ -0,0 +1,130 @@
#!/usr/bin/env python
Contributor

Perhaps we can rename this file from benchmarks/lineformat.py to benchmarks/lineprotocol.py so that its name is a bit more self-describing.

Successfully merging this pull request may close these issues.

Export benchmark information as line protocol