-
Notifications
You must be signed in to change notification settings - Fork 1
Add sparrow pycapsule #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From sparrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From sparrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From sparrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From sparrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From sparrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From sparrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces the sparrow-pycapsule library, which provides C++ utilities for interacting with Python's C API capsule system to export and import Apache Arrow arrays. The library enables seamless data exchange between C++ (using the Sparrow library) and Python through the Arrow C Data Interface.
- Implements bidirectional conversion functions between Sparrow arrays and Python capsules
- Adds comprehensive test suite covering export, import, and round-trip operations
- Sets up complete build infrastructure with CMake, CI workflows, and development tooling
Reviewed Changes
Copilot reviewed 19 out of 20 changed files in this pull request and generated 19 comments.
Show a summary per file
| File | Description |
|---|---|
include/sparrow-pycapsule/pycapsule.hpp |
Defines public API for capsule export/import operations |
src/pycapsule.cpp |
Implements capsule conversion functions and destructors |
test/test_pycapsule.cpp |
Comprehensive test suite for all pycapsule functionality |
CMakeLists.txt |
Main build configuration defining the sparrow-pycapsule library target |
test/CMakeLists.txt |
Test build configuration and targets |
cmake/external_dependencies.cmake |
Dependency management for Sparrow, doctest, and Python |
cmake/compile_options.cmake |
Compiler flags and warnings configuration |
.github/workflows/*.yml |
CI workflows for Linux, macOS, and Windows builds |
environment-dev.yml |
Conda environment specification for development |
sparrow-pycapsuleConfig.cmake.in |
CMake package configuration template |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| - name: Run tests | ||
| working-directory: build | ||
| run: cmake --build . --target run_tests_with_junit_report |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The target run_tests_with_junit_report is referenced in the workflow but is never defined in the test CMakeLists.txt. The actual test target defined is run_pycapsule_tests (line 47 in test/CMakeLists.txt). This will cause the workflow to fail when trying to run tests.
| if(LIBCPP) | ||
| message(STATUS "Using libc++") | ||
|
|
||
| # Allow the use of not visible yet availabile features, such |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in comment: "availabile" should be "available".
| # Allow the use of not visible yet availabile features, such | |
| # Allow the use of not visible yet available features, such |
| endif() | ||
|
|
||
| if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU" AND CMAKE_CXX_COMPILER_VERSION VERSION_LESS 11.3) | ||
| set(compile_options ${compile_optoins} PRIVATE "-Wno-error=shift-negative-value") |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: compile_optoins should be compile_options. This will cause the variable to not be set correctly.
| set(compile_options ${compile_optoins} PRIVATE "-Wno-error=shift-negative-value") | |
| set(compile_options ${compile_options} PRIVATE "-Wno-error=shift-negative-value") |
|
|
||
| - name: Run tests | ||
| working-directory: build | ||
| run: cmake --build . --target run_tests_with_junit_report |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The target run_tests_with_junit_report is referenced in the workflow but is never defined in the test CMakeLists.txt. The actual test target defined is run_pycapsule_tests (line 47 in test/CMakeLists.txt). This will cause the workflow to fail when trying to run tests.
|
|
||
| - name: Run tests | ||
| working-directory: build | ||
| run: cmake --build . --target run_tests_with_junit_report |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The target run_tests_with_junit_report is referenced in the workflow but is never defined in the test CMakeLists.txt. The actual test target defined is run_pycapsule_tests (line 47 in test/CMakeLists.txt). This will cause the workflow to fail when trying to run tests.
Co-authored-by: Copilot <[email protected]>
…arrow-pycapsule into add_initiale_code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 19 out of 20 changed files in this pull request and generated 10 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| message(STATUS "Using C++ compiler launcher: ${CMAKE_CXX_COMPILER_LAUNCHER}") | ||
| endif() | ||
|
|
||
| # Versionning |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: Versionning should be Versioning (one 'n').
| message(STATUS "Building sparrow v${CMAKE_PROJECT_VERSION}") | ||
|
|
||
| # Binary version | ||
| # See the following URL for explanations about the binary versionning |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: versionning should be versioning (one 'n').
| # See the following URL for explanations about the binary versionning | |
| # See the following URL for explanations about the binary versioning |
| PyObject* schema_capsule = PyCapsule_New(schema_ptr, arrow_schema_str.data(), ReleaseArrowSchemaPyCapsule); | ||
| PyObject* array_capsule = PyCapsule_New(array_ptr, arrow_array_str.data(), ReleaseArrowArrayPyCapsule); |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Potential resource leak if PyCapsule_New fails. If the first PyCapsule_New call (line 104) fails and returns nullptr, the allocated schema_ptr would leak. Similarly, if the second PyCapsule_New call (line 105) fails, both schema_ptr and array_ptr could leak.
Consider checking for nullptr after each PyCapsule_New call and cleaning up allocated resources before returning on failure, or using RAII wrappers to ensure cleanup.
| PyObject* schema_capsule = PyCapsule_New(schema_ptr, arrow_schema_str.data(), ReleaseArrowSchemaPyCapsule); | |
| PyObject* array_capsule = PyCapsule_New(array_ptr, arrow_array_str.data(), ReleaseArrowArrayPyCapsule); | |
| PyObject* schema_capsule = PyCapsule_New(schema_ptr, arrow_schema_str.data(), ReleaseArrowSchemaPyCapsule); | |
| if (!schema_capsule) { | |
| delete schema_ptr; | |
| delete array_ptr; | |
| return {nullptr, nullptr}; | |
| } | |
| PyObject* array_capsule = PyCapsule_New(array_ptr, arrow_array_str.data(), ReleaseArrowArrayPyCapsule); | |
| if (!array_capsule) { | |
| delete schema_ptr; | |
| delete array_ptr; | |
| Py_DECREF(schema_capsule); | |
| return {nullptr, nullptr}; | |
| } |
| ArrowArray* arrow_array_ptr = new ArrowArray(); | ||
| *arrow_array_ptr = extract_arrow_array(std::move(arr)); | ||
|
|
||
| return PyCapsule_New(arrow_array_ptr, arrow_array_str.data(), ReleaseArrowArrayPyCapsule); |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Potential resource leak if PyCapsule_New fails. If PyCapsule_New returns nullptr (e.g., due to memory allocation failure), the allocated arrow_array_ptr would leak. Consider checking for nullptr and cleaning up the allocated array before returning, or document that callers must ensure Python is properly initialized.
| PyObject* schema_capsule = ExportArrowSchemaPyCapsule(arr); | ||
| PyObject* array_capsule = ExportArrowArrayPyCapsule(arr); |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Using a moved-from object. ExportArrowSchemaPyCapsule(arr) on line 443 moves from arr, making it invalid. The subsequent call to ExportArrowArrayPyCapsule(arr) on line 444 is then operating on a moved-from object, which leads to undefined behavior.
The correct approach is to use export_array_to_capsules(arr) which exports both capsules from a single array object, as shown in other test cases.
| PyObject* schema_capsule = ExportArrowSchemaPyCapsule(arr); | |
| PyObject* array_capsule = ExportArrowArrayPyCapsule(arr); | |
| auto [schema_capsule, array_capsule] = export_array_to_capsules(arr); |
| SUBCASE("destructor_handles_already_released_schema") | ||
| { | ||
| // Create a schema with null release callback | ||
| ArrowSchema* schema = new ArrowSchema(); |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Uninitialized memory. The ArrowSchema is allocated but not fully initialized. While release is set to nullptr, other fields of the ArrowSchema structure remain uninitialized. The destructor ReleaseArrowSchemaPyCapsule may attempt to read these fields. Use new ArrowSchema{} to zero-initialize all fields or explicitly initialize the structure.
| ArrowSchema* schema = new ArrowSchema(); | |
| ArrowSchema* schema = new ArrowSchema{}; |
| SUBCASE("destructor_handles_already_released_array") | ||
| { | ||
| // Create an array with null release callback | ||
| ArrowArray* array = new ArrowArray(); |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Uninitialized memory. The ArrowArray is allocated but not fully initialized. While release is set to nullptr, other fields of the ArrowArray structure remain uninitialized. The destructor ReleaseArrowArrayPyCapsule may attempt to read these fields. Use new ArrowArray{} to zero-initialize all fields or explicitly initialize the structure.
| ArrowArray* array = new ArrowArray(); | |
| ArrowArray* array = new ArrowArray{}; |
| ArrowSchema* arrow_schema_ptr = new ArrowSchema(); | ||
| *arrow_schema_ptr = extract_arrow_schema(std::move(arr)); | ||
|
|
||
| return PyCapsule_New(arrow_schema_ptr, arrow_schema_str.data(), ReleaseArrowSchemaPyCapsule); |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Potential resource leak if PyCapsule_New fails. If PyCapsule_New returns nullptr (e.g., due to memory allocation failure), the allocated arrow_schema_ptr would leak. Consider checking for nullptr and cleaning up the allocated schema before returning, or document that callers must ensure Python is properly initialized.
| auto schema = static_cast<ArrowSchema*>(PyCapsule_GetPointer(capsule, arrow_schema_str.data())); | ||
| if (schema->release != nullptr) | ||
| { | ||
| schema->release(schema); | ||
| } | ||
| delete schema; |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Missing error handling. If PyCapsule_GetPointer fails and returns nullptr (which can happen if the capsule is invalid), dereferencing schema on line 21 would cause undefined behavior. The function should check if schema is nullptr before accessing its members.
| auto array = static_cast<ArrowArray*>(PyCapsule_GetPointer(capsule, arrow_array_str.data())); | ||
| if (array->release != nullptr) | ||
| { | ||
| array->release(array); | ||
| } | ||
| delete array; |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Missing error handling. If PyCapsule_GetPointer fails and returns nullptr (which can happen if the capsule is invalid), dereferencing array on line 45 would cause undefined behavior. The function should check if array is nullptr before accessing its members.
No description provided.