Delta-kernel-rs is an experimental Delta implementation focused on interoperability with a wide range of query engines. It currently supports reads and (experimental) writes. Only blind appends are currently supported in the write path.
The Delta Kernel project is a Rust and C library for building Delta connectors that can read and write Delta tables without needing to understand the Delta protocol details. This is the Rust/C equivalent of Java Delta Kernel.
Delta-kernel-rs is split into a few different crates:
- kernel: The actual core kernel crate
- acceptance: Acceptance tests that validate correctness via the Delta Acceptance Tests
- derive-macros: A crate for our derive-macros to live in
- ffi: Functionality that enables delta-kernel-rs to be used from C or C++. See the ffi directory for more information.
By default we build only the kernel and acceptance crates, which will also build derive-macros
as a dependency.
To get started, install Rust via rustup, clone the repository, and then run:

    cargo test --all-features

This will build the kernel, run all unit tests, fetch the Delta Acceptance Tests data, and run the acceptance tests against it.
In general, Rust projects that use cargo will want to depend on delta-kernel-rs by adding it as a
dependency in Cargo.toml; other projects should see the FFI module. The core kernel includes
facilities for reading and writing Delta tables, and allows the consumer to implement their own
Engine trait in order to build engine-specific implementations of the various Engine APIs that
the kernel relies on (e.g. implement an engine-specific read_json_files() using the native
engine JSON reader). If you do not need to implement your own Engine trait, the kernel has a
feature flag that enables a default, asynchronous Engine implementation built with Arrow and
Tokio.
    # fewer dependencies, requires consumer to implement Engine trait.
    # allows consumers to implement their own in-memory format
    delta_kernel = "0.16.0"

    # or turn on the default engine, based on arrow
    delta_kernel = { version = "0.16.0", features = ["default-engine", "arrow-56"] }

There are more feature flags in addition to the default-engine flag shown above. Relevant flags
include:
| Feature flag | Description | 
|---|---|
| default-engine | Turn on the 'default' engine: async, arrow-based Engine implementation | 
| arrow-conversion | Conversion utilities for arrow/kernel schema interoperation | 
| arrow-expression | Expression system implementation for arrow | 
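The idea of a consumer-supplied Engine can be illustrated with a small, std-only sketch. Everything here is hypothetical: `ToyEngine`, `InMemoryEngine`, and `count_log_actions` are illustrative names, and the single `read_json_files` method is a heavily simplified stand-in, not the actual signature of the kernel's Engine trait. The point is only the shape of the design: kernel-side logic is written once against a trait, and each engine plugs in its own reader.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for an engine trait (NOT the real kernel API).
pub trait ToyEngine {
    /// Return the lines of each requested JSON file, concatenated in order.
    fn read_json_files(&self, paths: &[&str]) -> Result<Vec<String>, String>;
}

/// A toy engine that serves file contents from an in-memory map.
pub struct InMemoryEngine {
    pub files: HashMap<String, String>,
}

impl ToyEngine for InMemoryEngine {
    fn read_json_files(&self, paths: &[&str]) -> Result<Vec<String>, String> {
        let mut lines = Vec::new();
        for path in paths {
            let contents = self
                .files
                .get(*path)
                .ok_or_else(|| format!("file not found: {path}"))?;
            lines.extend(contents.lines().map(str::to_owned));
        }
        Ok(lines)
    }
}

/// "Kernel-side" logic: depends only on the trait, never on a concrete engine.
/// Counts the non-empty lines across the given files.
pub fn count_log_actions(engine: &dyn ToyEngine, paths: &[&str]) -> Result<usize, String> {
    let lines = engine.read_json_files(paths)?;
    Ok(lines.iter().filter(|l| !l.trim().is_empty()).count())
}
```

A different engine (for example one backed by a native JSON reader) would implement the same trait, and `count_log_actions` would work with it unchanged.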
We intend to follow Semantic Versioning. However, in the 0.x line, the APIs
are still unstable. We therefore may break APIs within minor releases (that is, 0.1 -> 0.2), but
we will not break APIs in patch releases (0.1.0 -> 0.1.1).
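As a rough illustration of this policy, the sketch below classifies an upgrade as potentially breaking or not. The helper names `parse_version` and `may_break_api` are hypothetical and not part of the kernel; the handling of versions at 1.0 and above is an assumption based on ordinary Semantic Versioning, since the text above only describes the 0.x line.

```rust
/// Parse "MAJOR.MINOR.PATCH" into a tuple; returns None on malformed input.
/// (Hypothetical helper; pre-release and build metadata are not handled.)
pub fn parse_version(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.split('.').map(|p| p.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// Under the policy described above, may moving from `from` to `to` break APIs?
/// In the 0.x line a minor bump may break; a patch bump may not.
/// For 1.0 and later we ASSUME standard semver: only a major bump may break.
pub fn may_break_api(from: &str, to: &str) -> Option<bool> {
    let (from_major, from_minor, _) = parse_version(from)?;
    let (to_major, to_minor, _) = parse_version(to)?;
    Some(if from_major == 0 && to_major == 0 {
        from_minor != to_minor
    } else {
        from_major != to_major
    })
}
```

This lines up with how cargo interprets a requirement such as `delta_kernel = "0.16.0"`: it accepts 0.16.x patch releases but not 0.17.0.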
If you enable the default-engine feature, you get an implementation of the Engine trait that
uses Arrow as its data format.
The arrow crate tends to release new major versions rather
frequently. To enable engines that already integrate arrow to also integrate kernel and not force
them to track a specific version of arrow that kernel depends on, we take as broad a dependency on
arrow versions as we can.
We allow selecting the version of arrow to use via feature flags. Currently we support the following flags:
- arrow-55: Use arrow version 55
- arrow-56: Use arrow version 56
- arrow: Use the latest arrow version. Note that this is an unstable flag: we will bump this to the latest arrow version at every arrow version release. Only removing old arrow versions will cause a breaking change for kernel. If you require a specific version N of arrow, you should specify it directly with arrow-N, e.g. arrow-56.
Note that if more than one arrow-x feature is enabled, kernel will use the highest (latest)
specified flag. This also means that if you use --all-features you will get the latest version of
arrow that kernel supports.
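The "highest flag wins" rule can be modeled with a few lines of plain Rust. This is only a model of the behavior described above, not how the kernel implements it (the real selection happens at compile time through cargo features). The function name `selected_arrow_version` and the `SUPPORTED` list are assumptions taken from the flags listed in this section.

```rust
/// Arrow major versions assumed to be supported, per the flags listed above.
const SUPPORTED: [u32; 2] = [55, 56];

/// Model of the selection rule: given the enabled feature names, return the
/// arrow major version that would be used, or None if no arrow flag is enabled.
/// Flags such as "arrow-conversion" are ignored because their suffix is not a number.
pub fn selected_arrow_version(enabled: &[&str]) -> Option<u32> {
    let latest = SUPPORTED.iter().copied().max();
    enabled
        .iter()
        .filter_map(|feature| match *feature {
            // The unversioned flag tracks the latest supported version.
            "arrow" => latest,
            other => other
                .strip_prefix("arrow-")
                .and_then(|n| n.parse::<u32>().ok())
                .filter(|n| SUPPORTED.contains(n)),
        })
        .max()
}
```

For instance, enabling both arrow-55 and arrow-56 yields 56 in this model, which mirrors what happens when building with --all-features.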
You may also need to patch the object_store version used if the version of parquet you depend on
depends on a different version of object_store. This can be done by including object_store in
the patch list with the required version. You can find this out by checking the parquet docs.rs
page, switching to the version you want to use,
and then checking what version of object_store it depends on.
- API Docs
- architecture.md document describing the kernel architecture (currently wip)
There are some example programs showing how delta-kernel-rs can be used to interact with delta
tables. They live in the kernel/examples directory.
delta-kernel-rs is still under heavy development but follows conventions adopted by most Rust projects.
There are a few key concepts that will help in understanding kernel:
- The Engine trait encapsulates all the functionality an engine or connector needs to provide to the Delta Kernel in order to read/write the Delta table.
- The DefaultEngine is our default implementation of the above trait. It lives in engine/default, and provides a reference implementation for all Engine functionality. DefaultEngine uses arrow as its in-memory data format.
- A Scan is the entrypoint for reading data from a table.
- A Transaction is the entrypoint for writing data to a table.
Some design principles which should be considered:
- async should live only in the Engine implementation. The core kernel does not use async at all. We do not wish to impose the need for an entire async runtime on an engine or connector. The DefaultEngine does use async quite heavily. It doesn't depend on a particular runtime however, and implementations could provide an "executor" based on tokio, smol, async-std, or whatever might be needed. Currently only a tokio based executor is provided.
- Prefer builder style APIs over object oriented ones.
- Keep the set of default features "simple": enable only the basic functionality with as few dependencies as possible, and put more complex optimizations or APIs behind feature flags.
- Use API conventions that make it clear which operations involve I/O, e.g. fetch or retrieve type verbiage in method signatures.
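To make the builder-style principle concrete, here is a minimal std-only sketch. The types `ToyScan` and `ToyScanBuilder` and their methods are hypothetical and deliberately named so they are not mistaken for the kernel's own scan types; they only show the pattern of chaining configuration calls and validating once in `build`.

```rust
/// Hypothetical result type produced by the builder (not a kernel type).
#[derive(Debug, Clone, PartialEq)]
pub struct ToyScan {
    pub table_path: String,
    pub columns: Option<Vec<String>>,
    pub version: Option<u64>,
}

/// Hypothetical builder: each `with_*` method consumes and returns `self`,
/// so optional settings can be chained in any order.
pub struct ToyScanBuilder {
    table_path: String,
    columns: Option<Vec<String>>,
    version: Option<u64>,
}

impl ToyScanBuilder {
    pub fn new(table_path: impl Into<String>) -> Self {
        Self { table_path: table_path.into(), columns: None, version: None }
    }

    /// Restrict the scan to the given columns.
    pub fn with_columns<I, S>(mut self, columns: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        self.columns = Some(columns.into_iter().map(Into::into).collect());
        self
    }

    /// Pin the scan to a specific table version.
    pub fn with_version(mut self, version: u64) -> Self {
        self.version = Some(version);
        self
    }

    /// Validate the accumulated settings and produce the final value.
    pub fn build(self) -> Result<ToyScan, String> {
        if self.table_path.is_empty() {
            return Err("table path must not be empty".to_string());
        }
        Ok(ToyScan {
            table_path: self.table_path,
            columns: self.columns,
            version: self.version,
        })
    }
}
```

A call site then reads as a single expression, e.g. `ToyScanBuilder::new("/tmp/table").with_version(3).build()`, and new options can be added later without breaking existing callers.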
- When developing, rust-analyzer is your friend. Install it with: rustup component add rust-analyzer
- If using emacs, both eglot and lsp-mode provide excellent integration with rust-analyzer. rustic is a nice mode as well.
- When also developing in VS Code it's sometimes convenient to configure rust-analyzer in
.vscode/settings.json.
{
  "editor.formatOnSave": true,
  "rust-analyzer.cargo.features": ["default-engine"]
}
- The crate's documentation can be easily reviewed with: cargo doc --open
- Code coverage is available on codecov via cargo-llvm-cov. See their docs for instructions to install/run locally.