2 changes: 1 addition & 1 deletion Cargo.toml
@@ -26,5 +26,5 @@ keywords = ["deltalake", "delta", "datalake"]
 license = "Apache-2.0"
 repository = "https://github.com/delta-io/delta-kernel-rs"
 readme = "README.md"
-rust-version = "1.84"
+rust-version = "1.85"
 version = "0.16.0"
8 changes: 4 additions & 4 deletions README.md
@@ -1,11 +1,11 @@
-# Delta Kernel (rust)   [![build-status]][actions] [![latest-version]][crates.io] [![docs]][docs.rs] [![rustc-version-1.84+]][rustc]
+# Delta Kernel (rust)   [![build-status]][actions] [![latest-version]][crates.io] [![docs]][docs.rs] [![rustc-version-1.85+]][rustc]

 [build-status]: https://img.shields.io/github/actions/workflow/status/delta-io/delta-kernel-rs/build.yml?branch=main
 [actions]: https://github.com/delta-io/delta-kernel-rs/actions/workflows/build.yml?query=branch%3Amain
 [latest-version]: https://img.shields.io/crates/v/delta_kernel.svg
 [crates.io]: https://crates.io/crates/delta\_kernel
-[rustc-version-1.84+]: https://img.shields.io/badge/rustc-1.84+-lightgray.svg
-[rustc]: https://blog.rust-lang.org/2025/01/09/Rust-1.84.0/
+[rustc-version-1.85+]: https://img.shields.io/badge/rustc-1.85+-lightgray.svg
+[rustc]: https://blog.rust-lang.org/2025/02/20/Rust-1.85.0/
 [docs]: https://img.shields.io/docsrs/delta_kernel
 [docs.rs]: https://docs.rs/delta_kernel/latest/delta_kernel/

@@ -85,8 +85,8 @@ arrow versions as we can.
 We allow selecting the version of arrow to use via feature flags. Currently we support the following
 flags:

-- `arrow-55`: Use arrow version 55
 - `arrow-56`: Use arrow version 56
+- `arrow-57`: Use arrow version 57
 - `arrow`: Use the latest arrow version. Note that this is an _unstable_ flag: we will bump this to
 the latest arrow version at every arrow version release. Only removing old arrow versions will
 cause a breaking change for kernel. If you require a specific version N of arrow, you should
2 changes: 1 addition & 1 deletion ffi/src/transaction/mod.rs
@@ -241,7 +241,7 @@ mod tests {
 // writer must be closed to write footer
 let res = writer.close().unwrap();

-create_file_metadata(file_path, res.num_rows, metadata_schema)
+create_file_metadata(file_path, res.file_metadata().num_rows(), metadata_schema)
Collaborator

Coming back to #1424 (comment) --

If these were just fields before, and now they're accessors (that didn't exist before), doesn't that mean this FFI code will fail to compile against arrow-56? The other example was hard-wired to arrow-57, but I thought FFI allowed either one?

Member Author

Yep, correct: this won't work for arrow-56. But FFI specifically takes a dependency on `delta_kernel/arrow` (the latest version of arrow), so we can switch directly to the arrow-57 code here without issue.
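For illustration only: if the FFI crate ever did need to build against both shapes, one option would be a small trait that hides the field-versus-accessor difference. The sketch below uses hypothetical stand-in types (`parquet_old`, `parquet_new`, `WriteResult`, `RowCount`); none of them are the real parquet API, and the PR itself takes the simpler route of targeting only the latest version.

```rust
// Hypothetical stand-ins for the two metadata shapes discussed in this thread.
// Neither module is the real parquet crate.
mod parquet_old {
    /// Older shape: the row count is a public field.
    pub struct WriteResult {
        pub num_rows: i64,
    }
}

mod parquet_new {
    /// Newer shape: the row count sits behind accessor methods.
    pub struct FileMetaData {
        num_rows: i64,
    }

    impl FileMetaData {
        pub fn num_rows(&self) -> i64 {
            self.num_rows
        }
    }

    pub struct WriteResult {
        file_metadata: FileMetaData,
    }

    impl WriteResult {
        pub fn new(num_rows: i64) -> Self {
            Self { file_metadata: FileMetaData { num_rows } }
        }

        pub fn file_metadata(&self) -> &FileMetaData {
            &self.file_metadata
        }
    }
}

/// Hypothetical shim: one method that works for either shape.
trait RowCount {
    fn row_count(&self) -> i64;
}

impl RowCount for parquet_old::WriteResult {
    fn row_count(&self) -> i64 {
        self.num_rows // direct field access
    }
}

impl RowCount for parquet_new::WriteResult {
    fn row_count(&self) -> i64 {
        self.file_metadata().num_rows() // accessor chain
    }
}

/// Caller code no longer cares which shape it was handed.
fn describe(res: &impl RowCount) -> String {
    format!("wrote {} rows", res.row_count())
}

fn main() {
    let old = parquet_old::WriteResult { num_rows: 3 };
    let new = parquet_new::WriteResult::new(3);
    println!("{}", describe(&old));
    println!("{}", describe(&new));
}
```

A shim like this only pays off when both versions must be supported at once; with a single pinned version, calling the accessor directly is less code to maintain.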

}

#[tokio::test]
24 changes: 12 additions & 12 deletions kernel/Cargo.toml
@@ -66,27 +66,27 @@ object_store = { version = "0.12.3", optional = true, features = ["aws", "azure"
 # TODO: Remove this once https://github.com/apache/arrow-rs/pull/8244 ships
 comfy-table = { version = "~7.1", optional = true }

-# arrow 55
-[dependencies.arrow_55]
+# arrow 56
+[dependencies.arrow_56]
 package = "arrow"
-version = "55"
+version = "56"
 features = ["chrono-tz", "ffi", "json", "prettyprint"]
 optional = true
-[dependencies.parquet_55]
+[dependencies.parquet_56]
 package = "parquet"
-version = "55"
+version = "56"
 features = ["async", "object_store"]
 optional = true

-# arrow 56
-[dependencies.arrow_56]
+# arrow 57
+[dependencies.arrow_57]
 package = "arrow"
-version = "56"
+version = "57"
 features = ["chrono-tz", "ffi", "json", "prettyprint"]
 optional = true
-[dependencies.parquet_56]
+[dependencies.parquet_57]
 package = "parquet"
-version = "56"
+version = "57"
 features = ["async", "object_store"]
 optional = true

@@ -99,11 +99,11 @@ internal-api = []
 integration-test = ["hdfs-native-object-store/integration-test"]

 # The default versions for arrow/parquet/object_store
-arrow = ["arrow-56"] # latest arrow version
+arrow = ["arrow-57"] # latest arrow version
 need-arrow = [] # need-arrow is a marker that the feature needs arrow dep

-arrow-55 = ["dep:arrow_55", "dep:parquet_55", "object_store", "comfy-table"]
 arrow-56 = ["dep:arrow_56", "dep:parquet_56", "object_store", "comfy-table"]
+arrow-57 = ["dep:arrow_57", "dep:parquet_57", "object_store", "comfy-table"]
 arrow-conversion = ["need-arrow"]
 arrow-expression = ["need-arrow"]

5 changes: 3 additions & 2 deletions kernel/examples/read-table-multi-threaded/Cargo.toml
@@ -5,11 +5,12 @@ edition = "2021"
 publish = false

 [dependencies]
-arrow = { version = "56", features = ["prettyprint", "chrono-tz"] }
+arrow = { version = "57", features = ["prettyprint", "chrono-tz"] }
 clap = { version = "4.5", features = ["derive"] }
+# common pulls in arrow latest so we have to keep all these in sync here
 common = { path = "../common" }
 delta_kernel = { path = "../../../kernel", features = [
-    "arrow-56",
+    "arrow",
Collaborator

We're intentionally pulling in whatever arrow the user configured, instead of forcing arrow-56 (which was latest)? Why do that, instead of forcing arrow-57 (which is now latest)?

(again below)

Collaborator

Update: I see... we just pull generic "arrow" from kernel (with the indirection it implies), but take actual arrow-57 for ourselves. I think that will lead to incompatible arrow versions in practice, as we pass record batches from arrow-56 kernel to arrow-57 example?

Member Author

Yep, but I was actually relying on that as the desired behavior: we pull in arrow 57 here in Cargo.toml (which we need to do, since we enable those other feature flags) and then just request the `arrow` feature flag for kernel. As we upgrade kernel, this example will break and force us to bump it, e.g. to 58 whenever we default to arrow 58 in the future. The old way required us to remember to update the two numbers, which admittedly isn't that bad, but this felt more explicit.

Collaborator

So just to confirm (it's been a while, I'm forgetting): `arrow` brings in the latest arrow version, unless the older version is also specifically requested?
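The breakage this thread relies on comes from Cargo treating two major versions of a crate as unrelated crates, so their types never unify. A minimal sketch of that effect, using hypothetical stand-in modules named `arrow_56` and `arrow_57` (plain local modules here, not the real crates) and made-up helper functions:

```rust
// Hypothetical stand-ins for two major versions of the same crate. The structs
// are textually identical, yet the compiler sees two unrelated types.
mod arrow_56 {
    #[derive(Debug, Clone, PartialEq)]
    pub struct RecordBatch {
        pub rows: usize,
    }
}

mod arrow_57 {
    #[derive(Debug, Clone, PartialEq)]
    pub struct RecordBatch {
        pub rows: usize,
    }
}

/// Stands in for a kernel that was built against the older version.
fn kernel_scan() -> arrow_56::RecordBatch {
    arrow_56::RecordBatch { rows: 10 }
}

/// Stands in for example code that depends on the newer version directly.
fn print_batch(batch: &arrow_57::RecordBatch) -> String {
    format!("{} rows", batch.rows)
}

fn main() {
    let batch = kernel_scan();

    // The next line would be rejected with a mismatched-types error, because
    // `arrow_56::RecordBatch` is not `arrow_57::RecordBatch`:
    // print_batch(&batch);

    // Only an explicit conversion gets the data across the version boundary.
    let converted = arrow_57::RecordBatch { rows: batch.rows };
    println!("{}", print_batch(&converted));
}
```

This is why pinning the example's own `arrow` dependency to a number while asking kernel for the `arrow` feature acts as a tripwire: once the two drift apart, the example stops compiling instead of silently mixing versions.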

"default-engine-rustls",
"internal-api",
] }
4 changes: 2 additions & 2 deletions kernel/examples/read-table-single-threaded/Cargo.toml
@@ -5,11 +5,11 @@ edition = "2021"
 publish = false

 [dependencies]
-arrow = { version = "56", features = ["prettyprint", "chrono-tz"] }
+arrow = { version = "57", features = ["prettyprint", "chrono-tz"] }
 clap = { version = "4.5", features = ["derive"] }
 common = { path = "../common" }
 delta_kernel = { path = "../../../kernel", features = [
-    "arrow-56",
+    "arrow",
     "default-engine-rustls",
     "internal-api",
 ] }
5 changes: 3 additions & 2 deletions kernel/examples/write-table/Cargo.toml
@@ -5,11 +5,12 @@ edition = "2021"
 publish = false

 [dependencies]
-arrow = { version = "56", features = ["prettyprint", "chrono-tz"] }
+arrow = { version = "57", features = ["prettyprint", "chrono-tz"] }
 clap = { version = "4.5", features = ["derive"] }
+# NB: common depends on 'arrow' (latest) so have to match here
 common = { path = "../common" }
 delta_kernel = { path = "../../../kernel", features = [
-    "arrow-56",
+    "arrow",
     "default-engine-rustls",
     "internal-api",
 ] }
20 changes: 10 additions & 10 deletions kernel/src/arrow_compat.rs
@@ -1,25 +1,25 @@
 //! This module re-exports the different versions of arrow, parquet, and object_store we support.

-#[cfg(feature = "arrow-56")]
+#[cfg(feature = "arrow-57")]
 mod arrow_compat_shims {
-    pub use arrow_56 as arrow;
-    pub use parquet_56 as parquet;
+    pub use arrow_57 as arrow;
Collaborator

we should probably turn off these warnings if they are spurious

Collaborator

The compiler is usually pretty darn accurate. IMO we should figure out why it emitted the warnings instead of just assuming they're spurious.

Member Author

Hm, maybe they've disappeared now? Sorry, I don't know which messages this comment was linked to.

+    pub use parquet_57 as parquet;
 }

-#[cfg(all(feature = "arrow-55", not(feature = "arrow-56")))]
+#[cfg(all(feature = "arrow-56", not(feature = "arrow-57")))]
 mod arrow_compat_shims {
-    pub use arrow_55 as arrow;
-    pub use parquet_55 as parquet;
+    pub use arrow_56 as arrow;
+    pub use parquet_56 as parquet;
 }

 // if nothing is enabled but we need arrow because of some other feature flag, throw compile-time
 // error
 #[cfg(all(
     feature = "need-arrow",
-    not(feature = "arrow-55"),
-    not(feature = "arrow-56")
+    not(feature = "arrow-56"),
+    not(feature = "arrow-57")
 ))]
-compile_error!("Requested a feature that needs arrow without enabling arrow. Please enable the `arrow-55` or `arrow-56` feature");
+compile_error!("Requested a feature that needs arrow without enabling arrow. Please enable the `arrow-56` or `arrow-57` feature");

-#[cfg(any(feature = "arrow-55", feature = "arrow-56"))]
+#[cfg(any(feature = "arrow-56", feature = "arrow-57"))]
 pub use arrow_compat_shims::*;
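To make the precedence in this file easier to check, here is a sketch that models the new `cfg` cascade as an ordinary function. The `resolve` function and `Selection` enum are hypothetical and exist only for illustration; the real selection happens at compile time through the attributes above. Since `kernel/Cargo.toml` defines `arrow = ["arrow-57"]`, enabling `arrow` corresponds to passing `arrow_57 = true`.

```rust
/// Outcome of the feature cascade, modeled as data (hypothetical type).
#[derive(Debug, PartialEq)]
enum Selection {
    /// The arrow major version that gets re-exported as `crate::arrow`.
    Arrow(u32),
    /// No arrow feature enabled and nothing needs arrow: no re-export.
    NoArrow,
    /// Corresponds to the `compile_error!` branch.
    CompileError,
}

/// Mirrors the order of the `cfg` conditions in `arrow_compat.rs` after this change.
fn resolve(arrow_56: bool, arrow_57: bool, need_arrow: bool) -> Selection {
    if arrow_57 {
        // #[cfg(feature = "arrow-57")]
        Selection::Arrow(57)
    } else if arrow_56 {
        // #[cfg(all(feature = "arrow-56", not(feature = "arrow-57")))]
        Selection::Arrow(56)
    } else if need_arrow {
        // #[cfg(all(feature = "need-arrow", not(...), not(...)))]
        Selection::CompileError
    } else {
        Selection::NoArrow
    }
}

fn main() {
    // Both version features enabled: the newer one wins.
    println!("{:?}", resolve(true, true, false));
    // Only the older version requested.
    println!("{:?}", resolve(true, false, false));
    // A feature that needs arrow, but no arrow version selected.
    println!("{:?}", resolve(false, false, true));
    // Nothing arrow-related enabled at all.
    println!("{:?}", resolve(false, false, false));
}
```

Because Cargo features are additive, a build that enables both `arrow-56` and `arrow-57` still compiles both dependencies; the cascade only decides which one is re-exported.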
9 changes: 4 additions & 5 deletions kernel/src/checkpoint/tests.rs
@@ -6,6 +6,10 @@ use crate::action_reconciliation::{
 use crate::actions::{Add, Metadata, Protocol, Remove};
 use crate::arrow::array::{ArrayRef, StructArray};
 use crate::arrow::datatypes::{DataType, Schema};
+use crate::arrow::{
+    array::{create_array, RecordBatch},
+    datatypes::Field,
+};
 use crate::checkpoint::create_last_checkpoint_data;
 use crate::engine::arrow_data::ArrowEngineData;
 use crate::engine::default::{executor::tokio::TokioBackgroundExecutor, DefaultEngine};
@@ -14,11 +18,6 @@ use crate::schema::{DataType as KernelDataType, StructField, StructType};
 use crate::utils::test_utils::Action;
 use crate::{DeltaResult, FileMeta, LogPath, Snapshot};

-use arrow_56::{
Collaborator

huh, I wonder why it used to hard-wire the version like this?
(your change seems like the better way)

Member Author

Yeah, and this may actually highlight a test gap: I don't see how this could previously have compiled with arrow 55.

-    array::{create_array, RecordBatch},
-    datatypes::Field,
-};
-
 use object_store::{memory::InMemory, path::Path, ObjectStore};
 use serde_json::{from_slice, json, Value};
 use test_utils::delta_path_for_version;
2 changes: 1 addition & 1 deletion kernel/src/engine/ensure_data_types.rs
@@ -352,7 +352,7 @@ mod tests {
 &incorrect_variant_arrow_type(),
 true,
 ),
-"Invalid argument error: Incorrect datatype. Expected Struct(metadata Binary, value Binary), got Struct(field_1 Binary, field_2 Binary)",
+"Invalid argument error: Incorrect datatype. Expected Struct(\"metadata\": Binary, \"value\": Binary), got Struct(\"field_1\": nullable Binary, \"field_2\": nullable Binary)",
 )
 }

2 changes: 1 addition & 1 deletion kernel/src/lib.rs
@@ -108,7 +108,7 @@ pub use log_path::LogPath;
 mod row_tracking;

 mod arrow_compat;
-#[cfg(any(feature = "arrow-55", feature = "arrow-56"))]
+#[cfg(any(feature = "arrow-56", feature = "arrow-57"))]
 pub use arrow_compat::*;

 pub mod kernel_predicates;
1 change: 0 additions & 1 deletion mem-test/Cargo.toml
@@ -14,7 +14,6 @@ version.workspace = true
 release = false

 [dependencies]
-arrow = "56"
Collaborator

This should change to "57"? I think the dependency was originally there to ensure we build mem-test with the latest available arrow version?

Member Author

Since kernel re-exports arrow, I elected to remove this; we just use kernel's `arrow` feature flag to pull in the latest arrow version kernel supports.

Collaborator

Doesn't kernel pull in arrow-56 OR arrow-57 depending on the flags it was configured with, though?

Member Author

The `arrow` feature flag in kernel means "latest arrow", so I was using that flag to get the latest arrow, which is what we want here.

delta_kernel = { path = "../kernel", features = ["arrow", "default-engine-rustls"] }
dhat = "0.3"
object_store = "0.12.3"
4 changes: 2 additions & 2 deletions mem-test/tests/dhat_large_table_data.rs
@@ -45,9 +45,9 @@ fn write_large_parquet_to(path: &Path) -> Result<(), Box<dyn std::error::Error>>
 let metadata = std::fs::metadata(&path)?;
 let file_size = metadata.len();
 let total_row_group_size: i64 = parquet_metadata
-.row_groups
+.row_groups()
 .iter()
-.map(|rg| rg.total_byte_size)
+.map(|rg| rg.total_byte_size())
Collaborator

Did these accessor methods already exist in arrow-56, and arrow-57 just took the corresponding fields private? Or do we need conditional compilation here? 

Member Author

I don't think they existed; it looks like they were just fields in parquet 56. But since this is in the separate (and internal) mem-test crate, which selects its own version of arrow (57 now), it doesn't have to support both, so I think we are good!
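For reference, the conditional compilation asked about above could look roughly like the following if a crate did have to support both shapes. This is a sketch under assumptions: the `meta` module and `RowGroup` type are hypothetical stand-ins rather than the parquet API, and the `arrow-57` feature name is borrowed from kernel; nothing in this PR shows mem-test defining such a feature.

```rust
// Newer shape (selected when the assumed `arrow-57` feature is on):
// the size is private and read through an accessor.
#[cfg(feature = "arrow-57")]
mod meta {
    pub struct RowGroup {
        total_byte_size: i64,
    }

    impl RowGroup {
        pub fn new(total_byte_size: i64) -> Self {
            Self { total_byte_size }
        }

        pub fn total_byte_size(&self) -> i64 {
            self.total_byte_size
        }
    }
}

// Older shape (selected otherwise): the size is a public field.
#[cfg(not(feature = "arrow-57"))]
mod meta {
    pub struct RowGroup {
        pub total_byte_size: i64,
    }

    impl RowGroup {
        pub fn new(total_byte_size: i64) -> Self {
            Self { total_byte_size }
        }
    }
}

/// Sums row-group sizes through the accessor.
#[cfg(feature = "arrow-57")]
fn total_row_group_size(groups: &[meta::RowGroup]) -> i64 {
    groups.iter().map(|rg| rg.total_byte_size()).sum()
}

/// Sums row-group sizes through the public field.
#[cfg(not(feature = "arrow-57"))]
fn total_row_group_size(groups: &[meta::RowGroup]) -> i64 {
    groups.iter().map(|rg| rg.total_byte_size).sum()
}

fn main() {
    let groups = vec![meta::RowGroup::new(100), meta::RowGroup::new(250)];
    println!("total row group size: {} bytes", total_row_group_size(&groups));
}
```

Exactly one definition of `meta` and of `total_row_group_size` is compiled in any given build, so callers see a single function regardless of which shape is active.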

.sum();
println!("File size (compressed file size): {} bytes", file_size);
println!(