Add `HashTable` methods related to the raw bucket index #657

cuviper · 2025-10-23T01:29:03Z

On HashTable<T, A>:

pub fn num_buckets(&self) -> usize;
pub fn find_bucket_index(&self, hash: u64, eq: impl FnMut(&T) -> bool) -> Option<usize>;
pub fn get_bucket(&self, index: usize) -> Option<&T>;
pub fn get_bucket_mut(&mut self, index: usize) -> Option<&mut T>;
pub fn get_bucket_entry(&mut self, index: usize) -> Option<OccupiedEntry<'_, T, A>>;
pub unsafe fn get_bucket_unchecked(&self, index: usize) -> &T;
pub unsafe fn get_bucket_unchecked_mut(&mut self, index: usize) -> &mut T;

On OccupiedEntry<'_, T, A>:

pub fn bucket_index(&self) -> usize;

Closes #613, although I chose different names.

cuviper · 2025-10-23T01:30:03Z

although I chose different names.

I'm still willing to change if other names are preferred.

src/table.rs

clarfonthey

I think that get_bucket and get_bucket_mut are good, but buckets should be renamed to num_buckets to make it clear it's returning a count and not an iterator.

In terms of find_bucket_index and get_bucket_entry, I left some comments. But if you wanted to merge the smaller set first, I'm fine with that.

clarfonthey

Also, not required at all, but since a lot of code using the HashTable API is going to be unsafe anyway, it might be worth adding get_bucket_unchecked and get_bucket_unchecked_mut as well, which don't return Option.

moulins · 2025-10-23T15:50:13Z

Does get_bucket_entry really need to take a hash? The Entry structs currently store the full hash, but I suspect the code could be refactored to only store a raw::control::Tag, which can then be fetched from the hashtable storage when constructing an OccupiedEntry.

cuviper · 2025-10-23T17:15:44Z

Also, not required at all, but since a lot of code using the HashTable API is going to be unsafe anyway, it might be worth adding get_bucket_unchecked and get_bucket_unchecked_mut as well, which don't return Option.

I personally think HashTable should emphasize safe usage, but I guess that line has already been breached with get_disjoint_unchecked_mut. These would be easy to add if desired.

Does get_bucket_entry really need to take a hash? The Entry structs currently store the full hash, but I suspect the code could be refactored to only store a raw::control::Tag, which can then be fetched from the hashtable storage when constructing an OccupiedEntry.

Ooh, I like that! We couldn't support the full Option<Entry> suggested above, but this avoids all my concerns about breaking the hash probe sequence if there's no new hash involved. I think we also only need this Tag in VacantEntry, because OccupiedEntry::remove can just get it from the control byte.

cuviper · 2025-10-23T18:34:23Z

Calling out this additional benefit of the Tag change, as I wrote in the commit:

Also, since OccupiedEntry is now smaller, enum Entry will be the
same size as VacantEntry by using a niche for the discriminant.
(Although this is not guaranteed by the compiler.)

clarfonthey · 2025-10-23T21:29:46Z

re: unsafe methods, I guess the main concern is whether the bounds checks can easily be optimised out. While there isn't the same robust codegen testing as the compiler here, maybe it would be worth poking around on godbolt and seeing how find_bucket_index followed by one of the relevant methods is handled.

cuviper · 2025-10-23T23:04:46Z

I've added those two unchecked methods.

seeing how find_bucket_index followed by one of the relevant methods is handled.

Using cargo asm I can confirm that the optimizer does not see that the checks are redundant after find_bucket_index -- comparing get_bucket_unchecked(index) to get_bucket(index).unwrap() adds:

+       inc rdi
+       cmp r10, rdi
+       jae .LBB6_11
+       cmp byte ptr [rcx + r10], 0
+       js .LBB6_11
        add rax, -8
        pop rcx
        .cfi_def_cfa_offset 8
        ret

... where .LBB6_11 calls unwrap_failed.

clarfonthey · 2025-10-23T23:36:32Z

I guess that another benefit to bumping MSRV would be the ability to add assert_unchecked here to potentially help, but I'm not 100% sure it actually would in this case. It's very finicky in cases like this.

Even without an MSRV bump, I guess you could use the nightly feature as a temporary measure.

cuviper · 2025-10-24T00:04:38Z

I assume you mean adding that in find_bucket_index?

    pub fn find_bucket_index(&self, hash: u64, eq: impl FnMut(&T) -> bool) -> Option<usize> {
        match self.raw.find(hash, eq) {
            Some(bucket) => Some(unsafe {
                let index = self.raw.bucket_index(&bucket);
                core::hint::assert_unchecked(index < self.raw.buckets());
                core::hint::assert_unchecked(self.raw.is_bucket_full(index));
                index
            }),
            None => None,
        }
    }

It does remove the unwrap in my simple back-to-back test, but it also pessimizes the unchecked case a little, recomputing the final address. Different variations of that assertion had the same effect.

+        neg r9
+        lea rax, [rax + 8*r9]
         add rax, -8
         pop rcx
         .cfi_def_cfa_offset 8
         ret

In a real scenario I expect you'd also have other interim code, and who knows if the assertion could still be propagated. I don't think we should bother until there's a real measured benefit, and anyway even the unhinted checked case is still much less than a full hash probe that you would have needed before!

cuviper · 2025-10-24T00:05:50Z

RE MSRV, I recently bumped indexmap to 1.82, which is just over a year old.

src/raw/mod.rs

`VacantEntry` now stores a `Tag` instead of a full `hash: u64`. This means that `OccupiedEntry` doesn't need to store anything, because it can get that tag from the control byte for `remove -> VacantEntry`. The `get_bucket_entry` method doesn't need a hash argument either. Also, since `OccupiedEntry` is now smaller, `enum Entry` will be the same size as `VacantEntry` by using a niche for the discriminant. (Although this is not _guaranteed_ by the compiler.)

Add HashTable methods related to the raw bucket index

95a2cdd

cuviper requested a review from Amanieu October 23, 2025 01:29

clarfonthey reviewed Oct 23, 2025

View reviewed changes

src/table.rs Outdated Show resolved Hide resolved

clarfonthey reviewed Oct 23, 2025

View reviewed changes

src/table.rs Outdated Show resolved Hide resolved

clarfonthey suggested changes Oct 23, 2025

View reviewed changes

clarfonthey reviewed Oct 23, 2025

View reviewed changes

Rename HashTable::buckets to num_buckets

8ad61ce

cuviper force-pushed the table-bucket branch from 32d1dac to 8c30147 Compare October 23, 2025 18:30

clarfonthey reviewed Oct 24, 2025

View reviewed changes

src/raw/mod.rs Outdated Show resolved Hide resolved

cuviper added 2 commits October 23, 2025 23:35

Add get_bucket_unchecked and get_bucket_unchecked_mut

99176cc

cuviper force-pushed the table-bucket branch from 8f72446 to 99176cc Compare October 24, 2025 06:35

clarfonthey approved these changes Oct 24, 2025

View reviewed changes

cuviper mentioned this pull request Oct 24, 2025

Bucket-based API for HashTable #613

Open

Uh oh!

Add HashTable methods related to the raw bucket index #657

Are you sure you want to change the base?

Add HashTable methods related to the raw bucket index #657

Conversation

cuviper commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cuviper commented Oct 23, 2025

Uh oh!

Uh oh!

Uh oh!

clarfonthey left a comment

Choose a reason for hiding this comment

Uh oh!

clarfonthey left a comment

Choose a reason for hiding this comment

Uh oh!

moulins commented Oct 23, 2025

Uh oh!

cuviper commented Oct 23, 2025

Uh oh!

cuviper commented Oct 23, 2025

Uh oh!

clarfonthey commented Oct 23, 2025

Uh oh!

cuviper commented Oct 23, 2025

Uh oh!

clarfonthey commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cuviper commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cuviper commented Oct 24, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Add `HashTable` methods related to the raw bucket index #657

Add `HashTable` methods related to the raw bucket index #657

cuviper commented Oct 23, 2025 •

edited

Loading

clarfonthey commented Oct 23, 2025 •

edited

Loading

cuviper commented Oct 24, 2025 •

edited

Loading