-
Notifications
You must be signed in to change notification settings - Fork 119
feat!: arrow 57, MSRV 1.85+ #1424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,11 +5,12 @@ edition = "2021" | |
| publish = false | ||
|
|
||
| [dependencies] | ||
| arrow = { version = "56", features = ["prettyprint", "chrono-tz"] } | ||
| arrow = { version = "57", features = ["prettyprint", "chrono-tz"] } | ||
| clap = { version = "4.5", features = ["derive"] } | ||
| # common pulls in arrow latest so we have to keep all these in sync here | ||
| common = { path = "../common" } | ||
| delta_kernel = { path = "../../../kernel", features = [ | ||
| "arrow-56", | ||
| "arrow", | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We're intentionally pulling in whatever arrow the user configured, instead of forcing arrow-56 (which was latest)? Why do that, instead of forcing arrow-57 (which is now latest)? (again below) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Update: I see... we just pull generic "arrow" from kernel (with the indirection it implies), but take actual arrow-57 for ourselves. I think that will lead to incompatible arrow versions in practice, as we pass record batches from arrow-56 kernel to arrow-57 example? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yep but I was actually using that as a desired behavior: this way, we pull in arrow-57 here in cargo.toml (which we need to do since we enable those other feature flags) and then we just say There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So just to confirm (been a while, I'm forgetting): |
||
| "default-engine-rustls", | ||
| "internal-api", | ||
| ] } | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,25 +1,25 @@ | ||
| //! This module re-exports the different versions of arrow, parquet, and object_store we support. | ||
|
|
||
| #[cfg(feature = "arrow-56")] | ||
| #[cfg(feature = "arrow-57")] | ||
| mod arrow_compat_shims { | ||
| pub use arrow_56 as arrow; | ||
| pub use parquet_56 as parquet; | ||
| pub use arrow_57 as arrow; | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we should probably turn off these warnings if they are spurious There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The compiler is usually pretty darn accurate. IMO we should figure out why it emitted the warnings instead of just assuming they're spurious. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hm maybe they disappeared now? sorry i don't know what messages this comment was linked to haha |
||
| pub use parquet_57 as parquet; | ||
| } | ||
|
|
||
| #[cfg(all(feature = "arrow-55", not(feature = "arrow-56")))] | ||
| #[cfg(all(feature = "arrow-56", not(feature = "arrow-57")))] | ||
| mod arrow_compat_shims { | ||
| pub use arrow_55 as arrow; | ||
| pub use parquet_55 as parquet; | ||
| pub use arrow_56 as arrow; | ||
| pub use parquet_56 as parquet; | ||
| } | ||
|
|
||
| // if nothing is enabled but we need arrow because of some other feature flag, throw compile-time | ||
| // error | ||
| #[cfg(all( | ||
| feature = "need-arrow", | ||
| not(feature = "arrow-55"), | ||
| not(feature = "arrow-56") | ||
| not(feature = "arrow-56"), | ||
| not(feature = "arrow-57") | ||
| ))] | ||
| compile_error!("Requested a feature that needs arrow without enabling arrow. Please enable the `arrow-55` or `arrow-56` feature"); | ||
| compile_error!("Requested a feature that needs arrow without enabling arrow. Please enable the `arrow-56` or `arrow-57` feature"); | ||
|
|
||
| #[cfg(any(feature = "arrow-55", feature = "arrow-56"))] | ||
| #[cfg(any(feature = "arrow-56", feature = "arrow-57"))] | ||
| pub use arrow_compat_shims::*; | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -6,6 +6,10 @@ use crate::action_reconciliation::{ | |
| use crate::actions::{Add, Metadata, Protocol, Remove}; | ||
| use crate::arrow::array::{ArrayRef, StructArray}; | ||
| use crate::arrow::datatypes::{DataType, Schema}; | ||
| use crate::arrow::{ | ||
| array::{create_array, RecordBatch}, | ||
| datatypes::Field, | ||
| }; | ||
| use crate::checkpoint::create_last_checkpoint_data; | ||
| use crate::engine::arrow_data::ArrowEngineData; | ||
| use crate::engine::default::{executor::tokio::TokioBackgroundExecutor, DefaultEngine}; | ||
|
|
@@ -14,11 +18,6 @@ use crate::schema::{DataType as KernelDataType, StructField, StructType}; | |
| use crate::utils::test_utils::Action; | ||
| use crate::{DeltaResult, FileMeta, LogPath, Snapshot}; | ||
|
|
||
| use arrow_56::{ | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. huh, I wonder why it used to hard-wire the version like this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yea and actually this may highlight a test gap? i don't see how this could have previously compiled with arrow 55..? |
||
| array::{create_array, RecordBatch}, | ||
| datatypes::Field, | ||
| }; | ||
|
|
||
| use object_store::{memory::InMemory, path::Path, ObjectStore}; | ||
| use serde_json::{from_slice, json, Value}; | ||
| use test_utils::delta_path_for_version; | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -14,7 +14,6 @@ version.workspace = true | |
| release = false | ||
|
|
||
| [dependencies] | ||
| arrow = "56" | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should change to "57"? I think the dependency was originally there to ensure we build mem-test with the latest available arrow version? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. since kernel re-exports arrow I elected to remove this, and we just use the kernel There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Doesn't kernel pull in arrow-56 OR arrow-57 depending on the flags it was configured with, tho? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the |
||
| delta_kernel = { path = "../kernel", features = ["arrow", "default-engine-rustls"] } | ||
| dhat = "0.3" | ||
| object_store = "0.12.3" | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -45,9 +45,9 @@ fn write_large_parquet_to(path: &Path) -> Result<(), Box<dyn std::error::Error>> | |
| let metadata = std::fs::metadata(&path)?; | ||
| let file_size = metadata.len(); | ||
| let total_row_group_size: i64 = parquet_metadata | ||
| .row_groups | ||
| .row_groups() | ||
| .iter() | ||
| .map(|rg| rg.total_byte_size) | ||
| .map(|rg| rg.total_byte_size()) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Did these accessor methods already exist in arrow-56, and arrow-57 just took the corresponding fields private? Or do we need conditional compilation here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think they existed - looks like they were just fields in parquet 56. But since this is in the separate (and internal) mem-test crate which selects its own version of arrow (57 now) this doesn't have to support both so I think we are good! |
||
| .sum(); | ||
| println!("File size (compressed file size): {} bytes", file_size); | ||
| println!( | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coming back to #1424 (comment) --
If these were just fields before, and now they're accessors (that didn't exist before), doesn't that mean this FFI code will fail to compile against arrow-56? The other example was hard-wired to arrow-57, but I thought FFI allowed either one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep correct this won't work for arrow-56 but FFI specifically takes a dependency on delta_kernel/arrow = latest version of arrow, so we directly switch to the 57 code here without issue