[DISCUSSION] DataFusion Road Map: Q3-Q4 2025 #15878

alamb opened this issue Apr 28, 2025 · 12 comments
Labels
discussion Project Discussion

Comments

@alamb
Contributor

alamb commented Apr 28, 2025

I suggest we use this ticket to discuss and coordinate the "roadmap" for DataFusion over the next 3-6 months

DataFusion is a community project, and unlike many other open source projects there is no "DataFusion Company" that pays contributors and determines their priority. The project's evolution and priorities are determined by those community members able and willing to devote their time and energy to driving the project forward.

In the past it has been helpful to have a public discussion about features in order to:

  1. Find others interested in features you may be thinking of working on
  2. See who is actively looking for certain features / capabilities

So please feel free to post comments about:

  1. Features you plan to work on
  2. Features you would like others to work on
  3. Features you are willing / able / planning to help others with

See previous discussions:

@alamb alamb added the discussion Project Discussion label Apr 28, 2025
@alamb alamb pinned this issue Apr 28, 2025
@alamb
Contributor Author

alamb commented Apr 28, 2025

My plans:

🚀 performance (dynamic filter pushdown)

Variant / semi structured data support

I plan to help @PinkCrow007 and others work on variant support. This likely requires starting in arrow-rs / parquet and then moving on to features in DataFusion such as expression pushdown and user defined types.

I also plan to continue doing releases, reviewing code for important projects, writing blog posts, etc.

@Rachelint
Contributor

Rachelint commented Apr 28, 2025

My list (still mainly about aggregation performance, which has fallen far behind DuckDB on ClickBench...):

@alamb
Contributor Author

alamb commented Apr 28, 2025

My list (still mainly about aggregation performance, which has fallen far behind DuckDB on ClickBench...):

Welcome back!

@jonathanc-n
Contributor

I would like to help push forward the CMU effort for variant types.

@skyzh

skyzh commented Apr 28, 2025

I think #14595 is in decent shape and could be merged :) Though it cannot unnest all queries and might produce wrong results for some lateral joins, it would be a good starting point for us to slowly add lateral join and full unnesting support to the codebase.

@alamb
Contributor Author

alamb commented Apr 28, 2025

@alamb
Contributor Author

alamb commented Apr 28, 2025

I think #14595 is in decent shape and could be merged :) Though it cannot unnest all queries and might produce wrong results for some lateral joins, it would be a good starting point for us to slowly add lateral join and full unnesting support to the codebase.

@skyzh -- I think the conversation on #5492 might be relevant to you as well

@comphead
Contributor

comphead commented Apr 28, 2025

@xudong963
Member

My lists:

@Omega359
Contributor

Omega359 commented Apr 30, 2025

My list includes:

@Dandandan
Contributor

I am currently interested in the following subjects where I'll probably experiment with some things or help out others.

  • Window Functions (profiling, implementing improvements / optimizations)
      1. Vectorize window functions #15607
      2. Possible (logical plan) optimizations - limit pushdown?
  • Sorting performance ideas / helping out, examples:
      1. Reuse Rows allocation in SortPreservingMergeStream / RowCursorStream #15720
      2. Perf: Optimize in memory sort #15380 (review)
  • Streamlining arrow-rs kernels in terms of speed, consistency, reduced use of unsafe, etc.

@niebayes
Contributor

niebayes commented May 7, 2025

Do we have blogs about these epics?


9 participants