Datafusion v28 #127

Merged
merged 4 commits into from
Jul 31, 2023
Changes from 3 commits
16 changes: 5 additions & 11 deletions datafusion/README.md
@@ -4,24 +4,18 @@ DataFusion is an extensible query execution framework, written in Rust, that use

We use a Parquet file here, create an external table for it, and then run the queries.

## Generate benchmark results

### To solve

q32 (line 33 in queries.sql) runs out of memory on my 32 GB VM; it outputs null for now because the process is killed


### to generate benchmark results

The benchmark should be completed in under an hour. On-demand pricing is $0.6 per hour while spot pricing is only $0.2 per hour.
The benchmark should be completed in under an hour. On-demand pricing is $0.6 per hour while spot pricing is only $0.2 to $0.3 per hour (us-east-2).

1. manually start an AWS EC2 instance
- `c6a.4xlarge`
- Amazon Linux 2 AMI
- Root 500GB gp2 SSD
- no EBS optimized
- no instance store
1. wait for status check passed, then ssh to EC2
1. `sudo yum update -y` and `sudo yum install gcc -y`
1. wait for status check passed, then ssh to EC2 `ssh ec2-user@{ip}`
1. `sudo yum update -y` and `sudo yum install gcc git -y`
1. `git clone https://github.com/ClickHouse/ClickBench`
1. `cd ClickBench/datafusion`
1. `vi benchmark.sh` and modify the following line to target the DataFusion version
@@ -37,7 +31,7 @@ The benchmark should be completed in under an hour. On-demand pricing is $0.6 pe
3. `comparing binary with utf-8` and `group by binary` don't work on macOS; if you run these queries on a Mac, the ones that involve the binary format fail with errors (apache/arrow-datafusion#3050)


### to generate full human readable results (for debugging)
## Generate full human readable results (for debugging)

1. install datafusion-cli
2. download the Parquet file: ```wget --no-verbose --continue https://datasets.clickhouse.com/hits_compatible/hits.parquet```
92 changes: 46 additions & 46 deletions datafusion/results/single.json
@@ -1,58 +1,58 @@
{
"system": "DataFusion (single parquet)",
"date": "2023-04-11",
"system": "DataFusion (Parquet, single)",
"date": "2023-07-29",
"machine": "c6a.4xlarge, 500gb gp2",
"cluster_size": 1,
"comment": "v22.0.0 (34c9bce)",
"comment": "v28.0.0 (51b4392)",

"tags": ["Rust", "column-oriented", "embedded", "stateless"],

"load_time": 0,
"data_size": 14779976446,

"result": [
[2.646, 0.225, 0.226],
[0.099, 0.079, 0.079],
[0.173, 0.138, 0.139],
[0.349, 0.126, 0.125],
[2.790, 6.718, 2.717],
[1.911, 1.753, 1.770],
[0.090, 0.079, 0.076],
[0.106, 0.081, 0.081],
[3.032, 3.054, 3.675],
[3.634, 3.564, 3.647],
[0.508, 0.397, 0.402],
[0.529, 0.424, 0.423],
[2.100, 2.057, 2.040],
[4.415, 4.169, 3.707],
[2.198, 2.143, 2.080],
[4.043, 3.516, 5.126],
[5.065, 6.106, 8.249],
[4.524, 4.420, 4.347],
[10.981, 11.070, 11.203],
[0.348, 0.112, 0.116],
[9.968, 1.594, 1.619],
[11.178, 1.917, 1.900],
[22.120, 4.337, 4.352],
[56.098, 12.159, 12.128],
[2.581, 0.582, 0.583],
[0.756, 0.478, 0.475],
[2.556, 0.604, 0.570],
[9.592, 2.525, 2.479],
[9.067, 5.925, 5.927],
[0.638, 0.563, 0.585],
[3.278, 3.119, 2.940],
[7.710, 4.401, 4.398],
[null, null, null],
nice

Notably, apache/datafusion#6904 and related PRs for faster grouping performance show 2-3x performance improvements

[12.566, 8.695, 8.954],
[12.827, 9.199, 11.148],
[3.623, 3.514, 3.526],
[0.506, 0.413, 0.406],
[0.245, 0.217, 0.214],
[0.249, 0.194, 0.194],
[0.892, 0.772, 0.774],
[0.175, 0.107, 0.106],
[0.116, 0.095, 0.093],
[0.142, 0.112, 0.117]
[2.641, 0.232, 0.216],
[0.092, 0.074, 0.073],
[0.159, 0.121, 0.117],
[0.351, 0.122, 0.120],
[1.142, 0.874, 0.869],
[1.399, 1.303, 1.329],
[0.096, 0.075, 0.076],
[0.094, 0.074, 0.076],
[1.516, 1.481, 1.475],
[2.711, 2.582, 2.583],
[0.429, 0.332, 0.327],
[0.558, 0.357, 0.360],
[1.374, 1.323, 1.336],
[3.559, 2.670, 2.733],
[1.519, 1.487, 1.477],
[1.041, 0.981, 0.988],
[3.248, 2.748, 2.801],
[3.139, 2.696, 2.688],
[7.014, 5.770, 5.775],
[0.271, 0.111, 0.109],
[9.975, 1.558, 1.578],
[11.163, 1.926, 1.881],
[22.053, 4.208, 4.194],
[56.007, 12.132, 12.113],
[2.559, 0.602, 0.586],
[0.737, 0.479, 0.489],
[2.548, 0.589, 0.589],
[9.550, 2.299, 2.286],
[9.076, 5.399, 5.388],
[0.579, 0.568, 0.584],
[2.214, 1.136, 1.136],
[5.734, 1.593, 1.598],
[8.357, 7.896, 8.053],
[11.556, 7.358, 7.369],
[12.051, 7.878, 7.956],
[1.866, 1.820, 1.807],
[0.448, 0.347, 0.358],
[0.231, 0.197, 0.195],
[0.248, 0.198, 0.188],
[0.843, 0.725, 0.722],
[0.138, 0.091, 0.095],
[0.115, 0.089, 0.088],
[0.124, 0.097, 0.092]
]
}
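
To put a number on the per-query change between the two result sets, here is a minimal Python sketch. It is not part of the PR. It assumes that the first value in each row is the cold run and the other two are hot runs, and it compares the faster hot run of the old (v22.0.0) and new (v28.0.0) rows. The helper names `best_hot_run` and `speedup` are hypothetical; the sample rows are copied from the diff above, keyed by their 1-based position in `"result"`.

```python
from typing import Optional, Sequence

Row = Sequence[Optional[float]]


def best_hot_run(row: Row) -> Optional[float]:
    """Return the faster of runs 2 and 3, or None if neither produced a time.

    Assumption: run 1 is the cold run and is ignored here.
    """
    hot = [t for t in row[1:] if t is not None]
    return min(hot) if hot else None


def speedup(old: Row, new: Row) -> Optional[float]:
    """Return old_time / new_time for the best hot runs, or None if undefined."""
    old_t, new_t = best_hot_run(old), best_hot_run(new)
    if old_t is None or new_t is None or new_t == 0:
        return None
    return old_t / new_t


# (old row, new row) pairs copied from the single.json diff.
samples = {
    5: ([2.790, 6.718, 2.717], [1.142, 0.874, 0.869]),
    9: ([3.032, 3.054, 3.675], [1.516, 1.481, 1.475]),
    16: ([4.043, 3.516, 5.126], [1.041, 0.981, 0.988]),
    33: ([None, None, None], [8.357, 7.896, 8.053]),  # old run was killed (OOM)
}

for number, (old, new) in samples.items():
    ratio = speedup(old, new)
    label = "n/a (old run failed)" if ratio is None else f"{ratio:.2f}x"
    print(f"row {number}: {label}")
```

Using the minimum of the hot runs keeps a single noisy measurement (such as the 6.718 in old row 5) from distorting the ratio. A row that was `null` in the old results yields no ratio instead of a misleading number.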