fix(boxcar): current bucket pre-allocation code is over-allocating #73
Conversation
I am inclined to agree that this is a bug and that this is the correct fix. This same code is in the original boxcar library I vendored this from. You seem to have done some benchmarking; I was wondering whether this change had any performance impact (good or bad). This allocation is supposed to reduce contention when allocating the next shard, so it's intended as an optimization.
Looking deeper into it, I believe this bug actually causes two problems: it allocates the next bucket far too early for the first few buckets, and it never pre-allocates at all for the later ones.

For the second point above, it's easy to convince yourself that this is indeed the case by looking at an example: bucket 4 ranges from absolute index 480 to 991 (512 entries), yet the condition only holds at global index `512 - 512/8 = 448`, which means no position inside bucket 4 can actually satisfy the condition. In a more general way: bucket N has length `32 * 2^N` and starts at absolute index `32 * (2^N - 1)`, while the condition only holds at global index `28 * 2^N`, and beyond N >= 4 the following holds:

`28 * 2^N < 32 * (2^N - 1)`

which is equivalent to

`2^N > 8`

And that phenomenon just gets worse as you increase the bucket number. What this means in the end is that the pre-allocation condition never seems to be met past bucket 4.
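To double-check this claim mechanically, here is a small self-contained sketch. It does not use the actual boxcar internals; it assumes a layout where the first bucket holds 32 entries and each subsequent bucket doubles in size, which matches bucket 4 spanning absolute indices 480 to 991:

```rust
fn main() {
    for n in 0..8u32 {
        let len: u64 = 32 << n; // bucket N holds 32 * 2^N entries
        let start: u64 = 32 * ((1u64 << n) - 1); // first absolute index of bucket N
        let end = start + len - 1; // last absolute index of bucket N

        // Buggy condition: compares the *global* index against a threshold
        // (7/8 of the bucket length) that is only meaningful as a *relative*
        // offset within the bucket.
        let threshold = len - (len >> 3);

        let fires_inside = threshold >= start && threshold <= end;
        println!(
            "bucket {n}: spans {start}..={end}, buggy condition fires at global \
             index {threshold} (inside bucket: {fires_inside})"
        );

        // From bucket 4 onwards the threshold falls before the bucket even
        // starts, so pre-allocation can never trigger.
        if n >= 4 {
            assert!(threshold < start);
        }
    }
}
```

Running this also shows the other half of the bug: for buckets 1 through 3 the threshold falls inside the bucket but well before its last 1/8th (for bucket 3 it lands on the very first entry), so those buckets pre-allocate far too early.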
Yeah, I agree. I was just curious whether you had done any benchmarking to see what the performance impact is.
* feat(injector): add an `extend` method to Nucleo's injector
* Update lib.rs
* remove benches and #73 patch
* updates following pascalkuthe's review
* simplification
* adding tests
* remove unused method

Co-authored-by: Michael Davis <[email protected]>
So, I did a bit of benchmarking. Surprisingly, the fix alone didn't yield a measurable improvement; I'm assuming this is because the pre-allocation rarely has an effect in practice. This led me to question whether that pre-allocation is doing anything useful at all, and so I decided to benchmark and compare with no pre-allocation at all to see how that performed. Removing the pre-allocation code altogether actually improved performance, as shown below:
Here is the benchmark code (uses criterion):

```rust
use std::{sync::Arc, thread::available_parallelism};

use criterion::{BenchmarkId, Criterion};
use nucleo::boxcar;
use rayon::prelude::*;

const TINY_LINE_COUNT: u32 = 100;
const SMALL_LINE_COUNT: u32 = 1_000;
const MEDIUM_LINE_COUNT: u32 = 50_000;
const LARGE_LINE_COUNT: u32 = 500_000;
const XLARGE_LINE_COUNT: u32 = 5_000_000;
const XXLARGE_LINE_COUNT: u32 = 20_000_000;

fn grow_boxcar(c: &mut Criterion) {
    let mut group = c.benchmark_group("grow_boxcar");
    for line_count in [
        TINY_LINE_COUNT,
        SMALL_LINE_COUNT,
        MEDIUM_LINE_COUNT,
        LARGE_LINE_COUNT,
        //XLARGE_LINE_COUNT,
        //XXLARGE_LINE_COUNT,
    ] {
        // generate random strings
        let lines = random_lines(line_count);
        group.bench_with_input(BenchmarkId::new("push", line_count), &lines, |b, lines| {
            b.iter(move || {
                let v = Arc::new(boxcar::Vec::with_capacity(2 * 1024, 1));
                for line in lines {
                    v.push(line, |_, _cols| {});
                }
            });
        });
    }
}

fn grow_boxcar_threaded(c: &mut Criterion) {
    let mut group = c.benchmark_group("grow_boxcar_push_threaded");
    for line_count in [
        TINY_LINE_COUNT,
        SMALL_LINE_COUNT,
        MEDIUM_LINE_COUNT,
        LARGE_LINE_COUNT,
        //XLARGE_LINE_COUNT,
        //XXLARGE_LINE_COUNT,
    ] {
        // generate random strings
        let lines = random_lines(line_count);
        let available_parallelism = available_parallelism().unwrap();
        let batch_size = lines.len() / usize::from(available_parallelism);
        group.bench_with_input(BenchmarkId::new("push", line_count), &lines, |b, lines| {
            b.iter(|| {
                let v = Arc::new(boxcar::Vec::with_capacity(2 * 1024, 1));
                lines.chunks(batch_size).par_bridge().for_each(|batch| {
                    for line in batch {
                        v.push(line, |_, _cols| {});
                    }
                });
            });
        });
    }
}

fn random_lines(count: u32) -> Vec<String> {
    let count = i64::from(count);
    let word_count = 1;
    (0..count)
        .map(|_| fakeit::words::sentence(word_count))
        .collect()
}

criterion::criterion_group!(benches, grow_boxcar, grow_boxcar_threaded);
criterion::criterion_main!(benches);
```

🤔
Fixes #75
From my understanding, this code's intent is to eagerly allocate the next bucket if we're about to write the first entry of the last 1/8th of the current bucket.
The code currently checks `index`, which is, if I'm not mistaken, a global vector index; the check should really use `location.entry` instead, which corresponds to the relative location inside the current bucket.

NOTE: in practice, this would mean the current code was over-allocating, by allocating ahead of time for potentially unneeded buckets.
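As an illustration only, the difference between the two checks can be sketched as follows. The names mirror the discussion above (`index` is the global vector index, `entry` the offset within the current bucket, `bucket_len` the bucket's capacity); they are assumptions for this sketch, not the library's actual API:

```rust
// Hypothetical stand-in for the push location; field names are assumed
// from the discussion, not taken from the real boxcar internals.
struct Location {
    entry: usize,      // offset of the write *within* the current bucket
    bucket_len: usize, // total capacity of the current bucket
}

// Buggy: compares the *global* index against a threshold that is only
// meaningful as an offset relative to the current bucket.
fn should_preallocate_buggy(index: usize, location: &Location) -> bool {
    index == location.bucket_len - (location.bucket_len >> 3)
}

// Fixed: fires exactly when we are about to write the first entry of the
// last 1/8th of the current bucket.
fn should_preallocate_fixed(location: &Location) -> bool {
    location.entry == location.bucket_len - (location.bucket_len >> 3)
}

fn main() {
    // Bucket 4 from the example above: 512 entries, absolute indices 480..=991.
    let bucket_len = 512;
    let start = 480;
    let buggy_hits = (0..bucket_len)
        .filter(|&e| should_preallocate_buggy(start + e, &Location { entry: e, bucket_len }))
        .count();
    let fixed_hits = (0..bucket_len)
        .filter(|&e| should_preallocate_fixed(&Location { entry: e, bucket_len }))
        .count();
    println!("buggy condition fires {buggy_hits} time(s) in bucket 4"); // 0
    println!("fixed condition fires {fixed_hits} time(s) in bucket 4"); // 1
    assert_eq!(buggy_hits, 0);
    assert_eq!(fixed_hits, 1);
}
```

With the global index, the threshold (448 here) falls before bucket 4 even starts, so the condition can never fire; with the relative offset it fires exactly once per bucket, at the intended 7/8 mark.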