[EPIC] ClickBench Improvements (Vanity Benchmark) #14586
Comments
I took a brief look at some results for Q24 and Q26.
Here is Q24:
Here is Q26:
Both have `ORDER BY to_timestamp_seconds("EventTime")` as part of the query.
A low-hanging fruit is #13617; I plan to finish it this week. And maybe it is time to push #11943 forward... I am trying a PoC about support It is really horrible if we need to implement
If the performance gains are worth it, I can potentially help organize a larger refactoring effort too (to incrementally port over the code). We are in much better shape test-wise now. If you have a good approach, I'll find time to help coordinate
I will find a query to measure the performance of the old implementation in #11943 and of the new implementation. And if approach about supporting this only by
On the optimizer side, I am not sure if
Is your feature request related to a problem or challenge?
The ClickBench Benchmark measures the performance of filtering and aggregation
Being on top of ClickBench is somewhat of a vanity benchmark: in my opinion, all the engines within a factor of 2 of each other likely offer similar user experiences (and the exact speed will depend on real user queries, etc.)
That being said, being the engine at the top of the benchmark is certainly good for publicity, and DataFusion has used it as such (see our blog post "Apache DataFusion is now the fastest single node engine for querying Apache Parquet files")
So this ticket tracks improving the ClickBench performance even more
Recently, as @Dandandan has pointed out on #14246 (comment), DuckDB slipped past us in the most recent results
Describe the solution you'd like
Get DataFusion back on top
Describe alternatives you've considered
While we could clearly implement ClickBench-specific optimizations, I don't think that is really a valuable exercise for users. I would very much like to focus our efforts on actually useful optimizations
Some ideas of real improvements:
What I would like is for people to profile queries and try to find ways to improve them
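As a starting point for that kind of profiling, here is a minimal timing sketch. It is not part of DataFusion or ClickBench: `run_query` is a hypothetical callable you would supply yourself (anything that executes one SQL string against the engine under test), and the `repeats=3` default is an assumption. It only ranks queries by median wall-clock time so you know where to point a real profiler.

```python
import time
from statistics import median
from typing import Callable, Dict, List


def time_queries(
    run_query: Callable[[str], object],  # hypothetical: executes one SQL string
    queries: Dict[str, str],             # query name -> SQL text
    repeats: int = 3,                    # assumed number of runs per query
) -> Dict[str, List[float]]:
    """Run each query `repeats` times and record elapsed seconds per run."""
    timings: Dict[str, List[float]] = {}
    for name, sql in queries.items():
        runs = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(sql)
            runs.append(time.perf_counter() - start)
        timings[name] = runs
    return timings


def slowest_first(timings: Dict[str, List[float]]) -> List[str]:
    """Return query names ordered from slowest to fastest by median time."""
    return sorted(timings, key=lambda name: median(timings[name]), reverse=True)
```

The queries at the front of `slowest_first(...)` are the natural candidates to profile in more depth (for example with a flamegraph), since that is where a general optimization is most likely to show up in the results.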
Additional context
See related discussions on
45.0.0 (When Published) #14246
45.0.0 #14008