Release DataFusion 45.0.0
#14008
Comments
@andygrove would you like to coordinate this release or would you like me to? (or does anyone else want to do so?) |
I also added some issues to the description above that I think would be worth fixing |
I don't have a preference. I will be traveling around this time though, so perhaps it would make sense for someone else to be release manager for this one. |
I am happy to do it again for 45 if no one else would like the opportunity (see what I did there 😆 ) |
Thanks, alamb, I booked 46 in advance! |
Awesome -- I filed #14123 to track 46 |
I plan to start assembling and testing the release candidate the week of Jan 27 (in about 2 weeks' time) |
As promised, Sail is working on porting relevant tests into DataFusion. A good starting point is a regression our tests caught in DataFusion 43, which still seems to persist in DataFusion 44. A regression was introduced in DataFusion 43.0.0 related to casting to UTF8 in various places. Upgrading to DataFusion 43.0.0 required adding explicit casting in several areas as a workaround. This PR (lakehq/sail#355) comments out those changes to expose the regression through the 12 additional failed tests compared to the main branch. Once I’ve pinpointed the root cause(s) of the regression, I’ll create an issue in DataFusion to track the work. I want to ensure the issue accurately reflects the problem before filing it. I’m happy to address these regressions and port over the tests that cover them in the same PR. Hopefully, we can get this resolved in time for the DataFusion 45 release! |
Thank you very much @shehabgamin 🙏 I strongly suspect this is related to switching to Utf8View by default in Parquet; You can validate this theory by disabling the following config setting: https://datafusion.apache.org/user-guide/configs.html
I think we are pretty close to closing out the Utf8View epic (now that we have upgraded to the latest arrow): I'll add that to the list for 45 too |
I plan to start preparing / testing / pushing this release the week of Jan 27, aiming to get a release candidate out early the next week |
Thanks for the pointer @alamb! I tried setting I'll take a deeper look into the issue after the weekend. Hope you have a great rest of your weekend! |
Most of the regressions are related to this issue: #14230. I should be able to resolve them well before the While testing my local Sail code with the latest commit on DataFusion's main branch, I encountered several breaking changes that may make DataFusion 45 a jarring upgrade for some users. Given the previous discussion about wanting to make releases less jarring (#13334 (comment)), I wanted to bring this to your attention, @alamb. Aside from that, there is one remaining regression I haven't investigated yet, which seems to be related to Parquet. |
Thanks @shehabgamin -- Can you enumerate these changes (or point me at a PR) so we can see if there is some way to make the upgrade less jarring |
Yeah I'll work on that right now! |
My apologies @alamb, the DataFusion upgrade from the latest main branch commit is smoother than I initially thought. After investigating the flood of errors, I discovered that many were resolved by simply updating Sail's PyO3 DataFusion If you'd like to see these changes, they're in my PR that's testing the regression fixes: lakehq/sail#355 |
To replace |
Some people currently use |
I see. |
I just upgraded my project to latest main from DF 42. The primary compilation and test suite issues I encountered after setting utf8view, i32 comparison no longer worked
switching to the obvious change below worked
i32 -> utf8view coercion in regexp_like udf stopped working. For example: Auto coercion of date/timestamp to utf8 no longer worked. I had to update my code to use I'll likely do a full data run with this early next week. |
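The coercion failures described above can be pictured with a toy rule table. This is only a sketch under stated assumptions: `DataType`, `comparison_type`, and the `view_rules` flag are hypothetical stand-ins written for illustration, not DataFusion's actual coercion code. It shows how a comparison that used to plan successfully can start failing once a column is read as `Utf8View` and no rule covers the new type pairing.

```rust
/// Toy stand-ins for three Arrow data types; not the real `arrow` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Int32,
    Utf8,
    Utf8View,
}

/// Hypothetical rule table: the common type used to evaluate `lhs = rhs`,
/// or `None` when no coercion rule applies and planning would fail.
/// `view_rules` toggles whether rules for the newer `Utf8View` type exist.
pub fn comparison_type(lhs: DataType, rhs: DataType, view_rules: bool) -> Option<DataType> {
    use DataType::*;
    match (lhs, rhs) {
        // Identical types never need coercion.
        (a, b) if a == b => Some(a),
        // In this toy, numbers and strings are compared as strings.
        (Utf8, Int32) | (Int32, Utf8) => Some(Utf8),
        // Rules that have to be added once `Utf8View` columns appear.
        (Utf8View, Utf8) | (Utf8, Utf8View) if view_rules => Some(Utf8View),
        (Utf8View, Int32) | (Int32, Utf8View) if view_rules => Some(Utf8View),
        // No rule: the caller has to add an explicit cast.
        _ => None,
    }
}
```

With `view_rules` set to `false`, `comparison_type(DataType::Utf8View, DataType::Int32, false)` returns `None`, which corresponds to the situation where an explicit cast is the only workaround.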
Thanks @Omega359. The delta-rs upgrade seems to have gone pretty smoothly: delta-io/delta-rs#3175. In terms of releasing DataFusion 45, I think it is better than 44, so I was planning to make the RC tomorrow, but I can wait a day or two if we want to make a push to finish up a few of those issues
I believe this is fixed in the following PR (I am waiting on another committer to approve and if they do I can backport it)
I believe this is tracked by But is waiting on the next arrow release (which I just need to scrounge up one more PMC vote to approve and release). It might "just work" after that |
hey @alamb quick question about the release process: are the releases for datafusion and datafusion-python in lockstep? I see the next release for datafusion is 45.0.0, while datafusion-python's main branch is currently on 43.0.0. Is the plan to release datafusion 45.0.0 and then upgrade datafusion-python to 45.0.0 too, taking on the new datafusion library version? |
Derived TPC-DS Query 66 (https://github.com/lakehq/sail/blob/main/python/pysail/data/tpcds/queries/q66.sql) fails on Sail, when it previously did not. I'm out for the day, but will look more into this tonight. |
Error is:
@alamb I created another PR that's dedicated only to the DF 45 upgrade. Could you please update the issue with this PR: lakehq/sail#365 Additionally, I created a tracking issue: #14408 |
Another regression is that implementing the
Digging into the code, I see that it's deprecated (it should still work even though it's deprecated). What's strange, however, is that the deprecation warning is not propagating to me as a downstream user. I only found this out due to a failed test. |
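The situation described above, a deprecated trait method that downstream code still overrides without seeing a warning, can be sketched in plain Rust. The trait and method names below (`Function`, `call`, `call_with_args`) are hypothetical and are not the DataFusion API under discussion. As far as I know, rustc reports the deprecation lint where a deprecated item is called, not where a trait method is overridden in an `impl`, which would explain why only a failing test reveals the change.

```rust
/// Hypothetical extension trait; the names are illustrative, not a real API.
pub trait Function {
    /// Old entry point, kept so that existing implementations still compile.
    #[deprecated(note = "implement `call_with_args` instead")]
    fn call(&self, _input: i64) -> Result<i64, String> {
        Err("not implemented".to_string())
    }

    /// New entry point. The default forwards to the old method so that
    /// implementations that only override `call` keep working.
    #[allow(deprecated)]
    fn call_with_args(&self, input: i64) -> Result<i64, String> {
        self.call(input)
    }
}

/// Downstream type written against the old API.
pub struct Legacy;

impl Function for Legacy {
    // Overrides only the deprecated method; this is a definition, not a call.
    fn call(&self, input: i64) -> Result<i64, String> {
        Ok(input + 1)
    }
}

/// Downstream type written against the new API.
pub struct Modern;

impl Function for Modern {
    fn call_with_args(&self, input: i64) -> Result<i64, String> {
        Ok(input * 2)
    }
}

/// The library itself only ever calls the new entry point.
pub fn run(f: &dyn Function, input: i64) -> Result<i64, String> {
    f.call_with_args(input)
}
```

In this sketch `run(&Legacy, 1)` still reaches the old override because the default `call_with_args` forwards to it. If that forwarding default were missing or changed, the `Legacy` override would silently stop being used, which is a breaking change rather than a deprecation.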
Hi @kevinjqliu -- I am not sure what the plan for datafusion-python is. The main branch seems to be on 44 to me 🤔 @timsaucer updated it a few weeks ago. Any chance you can make a PR to test upgrading datafusion-python to 45? Update: filed ticket to track: |
Ok, given the feedback what I think we should do is:
Thanks @shehabgamin for the testing and reports. I'll take a more careful look tomorrow |
I found that we can't treat this as a mere deprecation: it is a breaking change if the user overrides the |
I am currently working on building the 44 release. I hope to get that done this morning or tomorrow. It generally runs about a month behind this upstream repo |
In case anyone is interested, here is the release thread: |
Update on the release:
|
Ok, I think we are ready with a release candidate. Tag: https://github.com/apache/datafusion/releases/tag/45.0.0-rc1 Release voting thread: https://lists.apache.org/thread/g20ywc9yto8xp07lcllmvgyn8g5z4420 This release candidate is based on commit 26058ac. The standard verification procedure is documented at https://github.com/apache/datafusion/blob/main/dev/release/README.md#verifying-release-candidates. |
@alamb I'll test on Sail again sometime today! |
I updated #14408 (comment), but same issues as before with: |
I think we came to a temporary solution here #14268 (comment) |
I don't think I'll have the bandwidth to test a type coercion fix for UDFs myself this week, to be honest. I'm about to fire off a full run of my application against the 45 branch, but I likely can only do that once this week. |
Beyond the type coercion issues, which I can work around, my testing is working ... as long as I don't compile with the 'release' target. That seems to segfault, I think somewhere in DF code, but I haven't yet been able to get a core dump from the crashing nodes to investigate further. I thought it was Rust 1.84.1 dependent, but I retested with 1.83.0 and had the same issue. I don't think it's a blocker, but it is something I'm going to continue to try and narrow down. |
@alamb -- release candidate re-tested on InfluxData, and is good. |
Is your feature request related to a problem or challenge?
Tracking ticket for next release, also a place to track desired inclusions
Last release was https://crates.io/crates/datafusion/44.0.0 on December 31, 2024, so the next major release would be around Feb 1, 2025
Steps:
- 45.0.0 release: Version and Changelog #14397
- 46.0.0 #14123

Pre-release testing
- 54.0.0 delta-io/delta-rs#3175

Prior release tickets:
- 44.0.0 #13334

Please let me know if you would like to add any items on this list or move the categorization

Items to fix before release
- 54.0.0 #14114
- DataFrame::schema returns incorrect schema for NATURAL JOIN #14058
- Invalid comparison operation: Utf8 == Utf8View error during LEFT ANTI JOIN #13510
- encode(..., "hex") errors on non-UTF-8 binaries since Datafusion v43 #14055

Items maybe to complete (not sure if they are blockers)
- CREATE TABLE AS SELECT ... inserting VALUES #13124
- EnforceDistribution generates invalid plan #14150

Nice to Have (but non blockers -- e.g. bugs but not regressions)
- UNION and ORDER BY queries #13748
- median by implementing special GroupsAccumulator #13681
- FULL OUTER JOIN and LIMIT produces wrong results #14335