Description
I am attempting to use Appender::append_record_batch to append data from an Arrow RecordBatch into a table. I've found that when I attempt to push more than 2048 rows this way, I hit the following assertion:
thread 'main' panicked at .../.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/duckdb-1.2.2/src/core/vector.rs:119:9:
assertion failed: data.len() <= self.capacity()
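For reference, here is a minimal sketch of the kind of code that triggers this. The table name, schema, and row count are illustrative rather than my exact code, and the imports assume the arrow re-export that duckdb-rs provides (adjust if you depend on arrow directly):

```rust
use std::sync::Arc;

use duckdb::arrow::array::{ArrayRef, Int64Array};
use duckdb::arrow::datatypes::{DataType, Field, Schema};
use duckdb::arrow::record_batch::RecordBatch;
use duckdb::{Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;
    conn.execute_batch("CREATE TABLE t (v BIGINT)")?;

    // One batch larger than DuckDB's vector size (2048 rows).
    let schema = Arc::new(Schema::new(vec![Field::new("v", DataType::Int64, false)]));
    let col: ArrayRef = Arc::new(Int64Array::from((0..2244_i64).collect::<Vec<_>>()));
    let batch = RecordBatch::try_new(schema, vec![col]).expect("valid batch");

    let mut appender = conn.appender("t")?;
    // Panics here: assertion failed: data.len() <= self.capacity()
    appender.append_record_batch(batch)?;
    Ok(())
}
```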
Debugging at the point of failure confirms that we are attempting to push more than 2048 rows here:
(lldb) p data
(&[u64]) size=2244 {
...
(lldb) p *self
(duckdb::core::vector::FlatVector) {
ptr = 0x00007fecc38a4fc0
capacity = 2048
}
Relevant part of the backtrace:
frame #7: 0x000000010048b046 prof2duckdb`duckdb::core::vector::FlatVector::copy::h3b8ba62694fbf9c3(self=0x00007ff7bfefd2b0, data=size=2244) at vector.rs:119:9
frame #8: 0x0000000100483d71 prof2duckdb`duckdb::vtab::arrow::primitive_array_to_flat_vector::he8b0c0d99f9a7a4a(array=0x00006000035bddd0, out_vector=0x00007ff7bfefd2b0) at arrow.rs:689:5
frame #9: 0x0000000100484a49 prof2duckdb`duckdb::vtab::arrow::primitive_array_to_vector::ha3cd4ab5953dcfd3(array=&dyn arrow_array::array::Array @ 0x00007ff7bfefd010, out=&mut dyn duckdb::core::vector::Vector @ 0x00007ff7bfefd020) at arrow.rs:728:13
frame #10: 0x0000000100482aaf prof2duckdb`duckdb::vtab::arrow::write_arrow_array_to_vector::hb51f0e66e71720f3(col=0x0000600001098120, chunk=&mut dyn duckdb::vtab::arrow::WritableVector @ 0x00007ff7bfefd4f8) at arrow.rs:563:13
frame #11: 0x0000000100483861 prof2duckdb`duckdb::vtab::arrow::record_batch_to_duckdb_data_chunk::h50784510996b5f2d(batch=0x00007ff7bfefde20, chunk=0x00007ff7bfefd8e0) at arrow.rs:681:9
frame #12: 0x000000010000c7cb prof2duckdb`duckdb::appender::arrow::_$LT$impl$u20$duckdb..appender..Appender$GT$::append_record_batch::ha157b8edd8898e54(self=0x00007ff7bfefdcd0, record_batch=RecordBatch @ 0x00007ff7bfefde20) at arrow.rs:40:9
The code in question:
duckdb-rs/crates/duckdb/src/core/vector.rs
Line 119 in ffa3f4e
assert!(data.len() <= self.capacity());
Walking up the stack I see this call, which seems to attempt to write into a data chunk:
duckdb-rs/crates/duckdb/src/vtab/arrow.rs
Line 681 in ffa3f4e
write_arrow_array_to_vector(col, &mut DataChunkHandleSlice::new(chunk, i))?;
Based on the data chunk documentation (https://duckdb.org/docs/stable/clients/c/data_chunk.html), I'm guessing we're attempting to push the entire RecordBatch into a single data chunk without splitting it up.
This seems like something that ought to be handled at the duckdb-rs level; otherwise users have to manually split up their data before appending. A rough workaround is sketched below.
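Until it is, callers can work around the panic by slicing the RecordBatch into pieces no larger than DuckDB's vector size before appending. This is only a sketch; the append_in_chunks helper and the hard-coded 2048 limit are my own, not part of the crate:

```rust
use duckdb::arrow::record_batch::RecordBatch;
use duckdb::{Appender, Result};

/// Hypothetical helper: append `batch` in slices of at most `max_rows` rows,
/// so each slice fits into a single DuckDB data chunk.
fn append_in_chunks(appender: &mut Appender<'_>, batch: &RecordBatch, max_rows: usize) -> Result<()> {
    let mut offset = 0;
    while offset < batch.num_rows() {
        let len = max_rows.min(batch.num_rows() - offset);
        // RecordBatch::slice is zero-copy, so this does not duplicate the column data.
        appender.append_record_batch(batch.slice(offset, len))?;
        offset += len;
    }
    Ok(())
}
```

Calling something like append_in_chunks(&mut appender, &batch, 2048) gets the data in, but ideally append_record_batch would do this splitting internally.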
Cargo.toml entry for duckdb-rs:
duckdb = { version = "1.2.2", features = ["appender-arrow", "bundled"], optional = true }
$ rustc --version
rustc 1.86.0 (05f9846f8 2025-03-31)