Skip to content

feat: basic BoolBuffer / BoolBufferMut #2456

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from
Open

Conversation

a10y
Copy link
Contributor

@a10y a10y commented Feb 21, 2025

It's very annoying to do random in-place setting of bits in Arrow BooleanBufferBuilder. It's very append focused. For something like SparseArray in particular, you want to create a BoolArray against uninitialized memory and only set the bits corresponding to positions of non-null values.

@gatesn
Copy link
Contributor

gatesn commented Feb 21, 2025

Can we call this BitBuffer?

@a10y a10y marked this pull request as ready for review February 21, 2025 17:10
@a10y a10y marked this pull request as draft February 21, 2025 17:12
Copy link
Contributor

@gatesn gatesn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think we'll also want iterators and vectorized constructors (i.e. BooleanBuffer::collect_bool)

Self {
buffer: self.buffer.clone(),
len,
offset: self.offset + start,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to guarantee that we only hold a bit_offset? i.e. offset < 8?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, does that help much?

@a10y a10y force-pushed the aduffy/bool-buffer branch 2 times, most recently from 25fc4bd to 67729e2 Compare February 24, 2025 22:51
@a10y a10y marked this pull request as ready for review February 24, 2025 22:52
@a10y
Copy link
Contributor Author

a10y commented Feb 25, 2025

Sadly new_uninit leads to UB when you try and do inplace updates on it. I can't think of a great way to work around that without doing something like having a second bitset, which sort of defeats the point of avoiding the initialization cost anyway, so for now I'm just removing it.

@gatesn
Copy link
Contributor

gatesn commented Mar 1, 2025

Where are we at with this? Need someone to takeover while you poke at FFI bindings?

@a10y
Copy link
Contributor Author

a10y commented Mar 2, 2025

Maybe? I'm no longer sure there's a strong path forward that gives us something better than Arrow's BooleanBufferBuilder.

The big issue is that mapping a bitset on top of uninit memory causes UB when you try and set bits in-place. While it generally seems harmless it is technically a violation of Rust's memory model so not guaranteed to work across platforms.

@gatesn
Copy link
Contributor

gatesn commented Mar 2, 2025

Why does the memory have to be uninitialized? Even the ability to into_mut and update bits is valuable from an API perspective.

@robert3005
Copy link
Contributor

I will take over this pr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants