
Appender::append_record_batch fails with more than VECTOR_SIZE rows #503

@elliottslaughter

Description


I am using Appender::append_record_batch to append data from an Arrow RecordBatch into a table. When I push more than 2048 rows this way, I hit the following assertion:

thread 'main' panicked at .../.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/duckdb-1.2.2/src/core/vector.rs:119:9:
assertion failed: data.len() <= self.capacity()

Debugging at the point of failure confirms that we are attempting to push more than 2048 rows into a vector with a capacity of 2048:

(lldb) p data
(&[u64]) size=2244 {
...
(lldb) p *self
(duckdb::core::vector::FlatVector) {
  ptr = 0x00007fecc38a4fc0
  capacity = 2048
}

Relevant part of the backtrace:

    frame #7: 0x000000010048b046 prof2duckdb`duckdb::core::vector::FlatVector::copy::h3b8ba62694fbf9c3(self=0x00007ff7bfefd2b0, data=size=2244) at vector.rs:119:9
    frame #8: 0x0000000100483d71 prof2duckdb`duckdb::vtab::arrow::primitive_array_to_flat_vector::he8b0c0d99f9a7a4a(array=0x00006000035bddd0, out_vector=0x00007ff7bfefd2b0) at arrow.rs:689:5
    frame #9: 0x0000000100484a49 prof2duckdb`duckdb::vtab::arrow::primitive_array_to_vector::ha3cd4ab5953dcfd3(array=&dyn arrow_array::array::Array @ 0x00007ff7bfefd010, out=&mut dyn duckdb::core::vector::Vector @ 0x00007ff7bfefd020) at arrow.rs:728:13
    frame #10: 0x0000000100482aaf prof2duckdb`duckdb::vtab::arrow::write_arrow_array_to_vector::hb51f0e66e71720f3(col=0x0000600001098120, chunk=&mut dyn duckdb::vtab::arrow::WritableVector @ 0x00007ff7bfefd4f8) at arrow.rs:563:13
    frame #11: 0x0000000100483861 prof2duckdb`duckdb::vtab::arrow::record_batch_to_duckdb_data_chunk::h50784510996b5f2d(batch=0x00007ff7bfefde20, chunk=0x00007ff7bfefd8e0) at arrow.rs:681:9
    frame #12: 0x000000010000c7cb prof2duckdb`duckdb::appender::arrow::_$LT$impl$u20$duckdb..appender..Appender$GT$::append_record_batch::ha157b8edd8898e54(self=0x00007ff7bfefdcd0, record_batch=RecordBatch @ 0x00007ff7bfefde20) at arrow.rs:40:9

The code in question:

assert!(data.len() <= self.capacity());

Walking up the stack, I see this call, which attempts to write an entire column into a single data chunk:

write_arrow_array_to_vector(col, &mut DataChunkHandleSlice::new(chunk, i))?;

Based on the data chunk documentation (https://duckdb.org/docs/stable/clients/c/data_chunk.html), I'm guessing we're attempting to push the entire RecordBatch into a single data chunk without splitting it up, even though a data chunk holds at most the standard vector size (2048) rows.

This seems like something that ought to get handled at the duckdb-rs level, because otherwise users would need to manually split up their data before appending.
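To illustrate, the splitting that duckdb-rs would presumably need to do can be sketched as follows. This is only a sketch, not the crate's actual code: the `VECTOR_SIZE` constant and `split_offsets` helper are hypothetical names I'm introducing here.

```rust
// Hypothetical sketch: compute (offset, length) pairs so that a batch of
// `num_rows` rows is written in pieces of at most VECTOR_SIZE rows each.
const VECTOR_SIZE: usize = 2048; // DuckDB's standard vector size

fn split_offsets(num_rows: usize) -> Vec<(usize, usize)> {
    (0..num_rows)
        .step_by(VECTOR_SIZE)
        .map(|offset| (offset, VECTOR_SIZE.min(num_rows - offset)))
        .collect()
}

fn main() {
    // The failing case from this report: 2244 rows become two slices.
    let slices = split_offsets(2244);
    println!("{:?}", slices); // [(0, 2048), (2048, 196)]
}
```

Each (offset, length) pair could then drive arrow's `RecordBatch::slice(offset, length)` inside append_record_batch before the existing record_batch_to_duckdb_data_chunk call, so each slice fits in one data chunk.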

Cargo.toml entry for duckdb-rs:

duckdb = { version = "1.2.2", features = ["appender-arrow", "bundled"], optional = true }

Rust version:

$ rustc --version
rustc 1.86.0 (05f9846f8 2025-03-31)
