-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Introduce C FFI for iceberg rust #966
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
license-eye has checked 261 files.
Valid | Invalid | Ignored | Fixed |
---|---|---|---|
221 | 1 | 39 | 0 |
Click to see the invalid file list
- bindings/c/.gitignore
Use this command to fix any missing license headers
```bash
docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes header fix
</details>
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Xuanwo for this great pr, it looks like a good starting point!
|
||
#[repr(C)] | ||
#[derive(Debug, Copy, Clone)] | ||
pub enum PrimitiveLiteralTag { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little worring about maintainability of this approach? Should we have things like:
pub extern "C" fn datatum_int(val: i32, datum_out: *mut *mut Datum) {
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid that we can't expose Datum
to C side directly.
Signed-off-by: Xuanwo <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Xuanwo for this pr! LGTM as initial move toward c ffi, let's wait for a moment to have others to take a look.
@Xuanwo Just curious how would this interface work with iceberg-cpp? Will there be collaboration or duplication? |
Thanks for the question! I have to admit that there is some duplication for specific use cases, such as when users only need a C ABI, which both the iceberg-rust C binding and iceberg-cpp can provide—although neither of them is fully functional at the moment. However, I started this C binding with the intention of fostering collaboration:
|
I'd suggest notifying the dev list on this feature and open early discussion with iceberg-cpp folks for collaboration. |
bindings/c/.gitignore
Outdated
# specific language governing permissions and limitations | ||
# under the License. | ||
|
||
build/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is a separate .gitignore
file needed instead of updating that under root directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to keep bindings/c
as a mostly separate project so that C developers can manage it as they see fit without impacting the entire project.
[package] | ||
name = "iceberg-c" | ||
publish = false | ||
edition = "2021" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this edition mean? why is it 2021
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, it's the rust edition, and 2021
is the latest one. The 2024
edition will be next, but it hasn't been released yet.
Signed-off-by: Xuanwo <[email protected]>
With regard to the possible integration with iceberg-cpp (or even replace it), I have some initial questions (might be dumb since I'm a newbie to the Rust world):
|
I'm also curious if there's any potential for linking issues if say, we use this (with OpenDAL enabled) and also OpenDAL's C++ bindings separately. (For another project, we have received reports of multiple Go-built shared objects being linked into one application causing problems on some platforms - presumably Rust, lacking a heavy language runtime, wouldn't cause such issues but it'd be good to think about). I'm also assuming (given Rust has no stable ABI) that this builds one shared object with all enabled features; given that maybe another thing to think about (down the line!) is integration with CMake etc. so consumers can choose which features to build and link more easily? |
For @wgtmac
It will be battery-included solution but can split into different modules so that users can use their own IO.
No limitation on mainstream platforms. (x86_64, ARM) × (Linux, macOS, Windows) are known to be supported. The complete list of supported platforms can be found at https://doc.rust-lang.org/nightly/rustc/platform-support.html.
cbindgen also supports generating C++ headers, but I haven't tried it yet. I'll take a look if needed. It's also possible to use
This FFI bridge operates at zero or negligible overhead, i.e. no copying, no serialization, no runtime checks needed. For @lidavidm
I believe they will work seamlessly since neither iceberg-rust nor opendal has runtime or global variables to share.
Yes, it is definitely possible. We have explored this in OpenDAL, where we can build it using only selected services or disable async support. Users can control these options through different CMake profiles. |
Good to know, thanks! I have run into As far as I am concerned, as long as it is possible to disable most/all of the IO then this would provide the same functionality I'm interested in from the C++ library. I'd like to parse snapshots/manifests (preferably with a C++-provided reader) but would want to do all the reading of data files myself (to leverage cuDF etc) |
Maybe we can prioritize on a C++ binding? |
|
||
# custom target for cargo build | ||
add_custom_target(cargo_build | ||
COMMAND sh -c "cargo build ${CARGO_BUILD_TYPE}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't quite seem to work for me. The following does:
COMMAND sh -c "cargo build ${CARGO_BUILD_TYPE}" | |
COMMAND cargo build ${CARGO_BUILD_TYPE} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably needs install
commands so that the expectations of CMake users are met. Also, ideally it would generate CMake targets and possibly a pkg-config file as well.
set(CMAKE_CXX_STANDARD 14) | ||
set(CMAKE_CXX_STANDARD_REQUIRED ON) | ||
|
||
set(CARGO_DIST_DIR "${PROJECT_SOURCE_DIR}/target/debug") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, for me they end up under target/x86_64-unknown-linux-gnu/debug/
not target/debug
Ah, and the bindings are fairly minimal I see? I was going to try to experiment with them + cuDF but it seems more work is necessary. In any case it's neat that we can get a C/potentially C++ interface out of this! |
This PR introduces a C FFI for iceberg-rust, allowing users to interact with Iceberg from C.
This PR is a PoC demonstrating how this can be implemented and what it will look like. Most of the code is inspired by https://github.com/apache/opendal.
The
iceberg-c
project will be written in Rust but will expose a C ABI as.so
and.a
files. This design ensures that rust developers don't need to worry about the C part or handle C-related tasks. Everything related to the C API will be contained withinbindings/c
.We also use
cbindgen
to generate a header file based on our public API. This allows users to linkiceberg-c
just as they would with a C or C++ library.