script to export benchmark information as Line Protocol format #14662

logan-keede · 2025-02-14T08:36:38Z

Which issue does this PR close?

Closes #6107 (Export benchmark information as line protocol)

Rationale for this change

A step towards #5504 (run DataFusion benchmarks regularly and track performance history over time).

What changes are included in this PR?

Adds a Python script that converts benchmark result JSON files (such as `sort.json`) to line protocol format.

Are these changes tested?

Tested on some of the files in `result.zip`:

```console
[nix-shell:~/dev/datafusion/benchmarks]$ python3 lineformat.py ~/Downloads/results/alamb_sort-merge-accounting/sort.json
benchmark,name=sort,version=28.0.0,datafusion_version=28.0.0,num_cpus=8 query="sort utf8",iteration=0,row_count=10838832,elapsed_ms=85626006 1691105678000000000
benchmark,name=sort,version=28.0.0,datafusion_version=28.0.0,num_cpus=8 query="sort utf8",iteration=1,row_count=10838832,elapsed_ms=68694468 1691105678000000000
benchmark,name=sort,version=28.0.0,datafusion_version=28.0.0,num_cpus=8 query="sort utf8",iteration=2,row_count=10838832,elapsed_ms=63392883 1691105678000000000
benchmark,name=sort,version=28.0.0,datafusion_version=28.0.0,num_cpus=8 query="sort utf8",iteration=3,row_count=10838832,elapsed_ms=66388367 1691105678000000000
```
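Each output line follows InfluxDB line protocol: a measurement name (`benchmark`), comma-separated tags, a space, comma-separated fields, a space, and a timestamp in nanoseconds. The following is a minimal sketch of formatting one such record with the escaping rules line protocol requires. `to_line_protocol` and its helpers are hypothetical names for illustration, not the functions in `lineformat.py`.

```python
from __future__ import annotations


def _escape_key(s: str) -> str:
    # Tag keys, tag values and field keys escape commas, equals signs and spaces.
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _escape_measurement(s: str) -> str:
    # Measurement names only need commas and spaces escaped.
    return s.replace(",", r"\,").replace(" ", r"\ ")


def _format_field_value(v: object) -> str:
    # bool is checked first because it is a subclass of int in Python.
    if isinstance(v, bool):
        return "true" if v else "false"
    # Numbers are written bare, as in the sample output above.
    if isinstance(v, (int, float)):
        return str(v)
    # Strings are double-quoted; backslashes and quotes inside are escaped.
    s = str(v).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{s}"'


def to_line_protocol(
    measurement: str,
    tags: dict[str, object],
    fields: dict[str, object],
    timestamp_ns: int,
) -> str:
    """Format one line protocol record (hypothetical helper)."""
    head = _escape_measurement(measurement)
    tag_str = ",".join(
        f"{_escape_key(k)}={_escape_key(str(v))}" for k, v in tags.items()
    )
    if tag_str:
        head += "," + tag_str
    field_str = ",".join(
        f"{_escape_key(k)}={_format_field_value(v)}" for k, v in fields.items()
    )
    return f"{head} {field_str} {timestamp_ns}"
```

Passing the tags and fields of the first sample record reproduces that line exactly. Note that InfluxDB stores unsuffixed numeric field values as floats; an `i` suffix would be needed to store them as integers.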

Are there any user-facing changes?

No

logan-keede · 2025-02-14T08:45:42Z

@alamb Check this out.

Do we have any documentation on the JSON format? That would help if we want to include more arguments in the tag set.

alamb · 2025-02-14T11:05:41Z

> @alamb Check this out.
>
> do we have some documentation on json format? that would be helpful if we want to show more arguments in tag set.

I believe the JSON is a serialized version of this structure:

datafusion/benchmarks/src/util/run.rs

Lines 47 to 59 in 68306ac

```rust
pub struct RunContext {
    /// Benchmark crate version
    pub benchmark_version: String,
    /// DataFusion crate version
    pub datafusion_version: String,
    /// Number of CPU cores
    pub num_cpus: usize,
    /// Start time
    #[serde(serialize_with = "serialize_start_time")]
    pub start_time: SystemTime,
    /// CLI arguments
    pub arguments: Vec<String>,
}
```
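If the JSON keys match the Rust field names (serde's default), the serialized context can be read into a Python mirror of this struct and turned into tags and a timestamp. This is a sketch under stated assumptions: `RunContext.from_dict`, `tags`, and `timestamp_ns` are hypothetical, mapping the `version` tag to `benchmark_version` is a guess, and it assumes `serialize_start_time` writes whole seconds since the Unix epoch (consistent with the `1691105678000000000` timestamps in the sample output, but not confirmed here).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RunContext:
    """Python mirror of the Rust `RunContext` struct (hypothetical helper)."""

    benchmark_version: str
    datafusion_version: str
    num_cpus: int
    # ASSUMPTION: serialize_start_time emits whole seconds since the Unix epoch.
    start_time: int
    arguments: list[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: dict) -> RunContext:
        # ASSUMPTION: JSON keys are the Rust field names (serde's default).
        return cls(
            benchmark_version=str(data["benchmark_version"]),
            datafusion_version=str(data["datafusion_version"]),
            num_cpus=int(data["num_cpus"]),
            start_time=int(data["start_time"]),
            arguments=[str(a) for a in data.get("arguments", [])],
        )

    def tags(self, name: str, include_arguments: bool = False) -> dict[str, str]:
        # Tag set shaped like the sample output in the PR description.
        tags = {
            "name": name,
            "version": self.benchmark_version,
            "datafusion_version": self.datafusion_version,
            "num_cpus": str(self.num_cpus),
        }
        if include_arguments:
            # Spaces in this value must be escaped when the line is written.
            tags["arguments"] = " ".join(self.arguments)
        return tags

    def timestamp_ns(self) -> int:
        # Line protocol timestamps default to nanosecond precision.
        return self.start_time * 1_000_000_000
```

Passing `include_arguments=True` shows one way the CLI arguments could be added to the tag set, which is what the question above was after.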

This looks amazing @logan-keede -- I'll check it out later today

alamb

Thank you @logan-keede -- this is great. I tried it on some of my own data and it works well ❤️

I am thinking through the next steps (trying to gather historical performance measurements), but I think I just need to try gathering that information and see what additional data is needed.

alamb · 2025-02-15T10:23:52Z

benchmarks/lineformat.py

```diff
+    compare_parser.add_argument(
+        "baseline_path",
+        type=Path,
+        help="Path to the baseline summary file.",
```


Given this is just a conversion, I don't think there is any notion of a baseline. Maybe this help text needs to be adjusted?

logan-keede · Feb 15, 2025

Remnants of `compare.py` 😅
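Because the script converts a single results file, its argument parser needs only one input path, matching the `python3 lineformat.py <path>` invocation in the PR description. The following is a minimal sketch using the standard `argparse` module, assuming a flat interface with no subcommands; `build_parser` is a hypothetical name.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a single-input CLI for the conversion (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Convert a DataFusion benchmark JSON file to line protocol.",
    )
    # A conversion reads exactly one results file, so there is no baseline argument.
    parser.add_argument(
        "path",
        type=Path,
        help="Path to the benchmark results JSON file.",
    )
    return parser
```

Passing a second positional argument makes `argparse` print a usage error and exit with status 2.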

alamb · 2025-02-15T10:26:31Z

benchmarks/lineformat.py

```diff
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
```


Would it be possible to add some comments here explaining what this script does along with the example you have in your PR description?

It would also be great to add some documentation to https://github.com/apache/datafusion/tree/main/benchmarks#comparing-performance-of-main-and-a-branch explaining how to use this script.

logan-keede · Feb 15, 2025

I have added some, check it out.

alamb · 2025-02-15T10:28:04Z

benchmarks/lineformat.py

```diff
@@ -0,0 +1,130 @@
+#!/usr/bin/env python
```


Perhaps we can rename this file from `benchmarks/lineformat.py` to `benchmarks/lineprotocol.py` to make its name a bit more self-describing.

alamb

Thank you again @logan-keede

logan-keede added 3 commits February 14, 2025 13:42

- initial version (9d79c8a)
- small changes (4f760a1)
- getting rid of rich + using stdout instead of python print (5ed26d3)

logan-keede changed the title ~~benchmark information as Lineformat~~ script to export benchmark information as Lineformat Feb 14, 2025

alamb changed the title ~~script to export benchmark information as Lineformat~~ script to export benchmark information as Line Protocol format Feb 14, 2025

alamb approved these changes Feb 15, 2025


tweaks

5ab53df

alamb added the performance Make DataFusion faster label Feb 16, 2025

alamb approved these changes Feb 16, 2025


alamb merged commit ba2b4c3 into apache:main Feb 16, 2025
24 checks passed

alamb mentioned this pull request Mar 7, 2025

Run DataFusion benchmarks regularly and track performance history over time #5504

Open
