use optional index in multivalued index (#2439)
* use optional index in multivalued index

For mostly empty multivalued indices there was a large overhead during creation when iterating all docids. This is alleviated by placing an optional index in the multivalued index to mark documents that have values.

There's some performance overhead when accessing values in a multivalued index. The accessing cost is now optional index + multivalue index. The sparse codec performs relatively bad with the binary_search when accessing data. This is reflected in the benchmarks below.

This changes the format of columnar to v2, but code is added to handle the v1 formats.

```
Running benches/bench_access.rs (/home/pascal/Development/tantivy/optional_multivalues/target/release/deps/bench_access-ea323c028db88db4)
multi sparse 1/13
access_values_for_doc    Avg: 42.8946ms (+241.80%)    Median: 42.8869ms (+244.10%)    [42.7484ms .. 43.1074ms]
access_first_vals        Avg: 42.8022ms (+421.93%)    Median: 42.7553ms (+439.84%)    [42.6794ms .. 43.7404ms]
multi 2x
access_values_for_doc    Avg: 31.1244ms (+24.17%)    Median: 30.8339ms (+23.46%)    [30.7192ms .. 33.6059ms]
access_first_vals        Avg: 24.3070ms (+70.92%)    Median: 24.0966ms (+70.18%)    [23.9328ms .. 26.4851ms]
sparse 1/13
access_values_for_doc    Avg: 42.2490ms (+0.61%)    Median: 42.2346ms (+2.28%)    [41.8988ms .. 43.7821ms]
access_first_vals        Avg: 43.6272ms (+0.23%)    Median: 43.6197ms (+1.78%)    [43.4920ms .. 43.9009ms]
dense 1/12
access_values_for_doc    Avg: 8.6184ms (+23.18%)    Median: 8.6126ms (+23.78%)    [8.5843ms .. 8.7527ms]
access_first_vals        Avg: 6.8112ms (+4.47%)    Median: 6.8002ms (+4.55%)    [6.7887ms .. 6.8991ms]
full
access_values_for_doc    Avg: 9.4073ms (-5.09%)    Median: 9.4023ms (-2.23%)    [9.3694ms .. 9.4568ms]
access_first_vals        Avg: 4.9531ms (+6.24%)    Median: 4.9502ms (+7.85%)    [4.9423ms .. 4.9718ms]
```

```
Running benches/bench_merge.rs (/home/pascal/Development/tantivy/optional_multivalues/target/release/deps/bench_merge-475697dfceb3639f)
merge_multi 2x_and_multi 2x                      Avg: 20.2280ms (+34.33%)    Median: 20.1829ms (+35.33%)    [19.9933ms .. 20.8806ms]
merge_multi sparse 1/13_and_multi sparse 1/13    Avg: 0.8961ms (-78.04%)    Median: 0.8943ms (-77.61%)    [0.8899ms .. 0.9272ms]
merge_dense 1/12_and_dense 1/12                  Avg: 0.6619ms (-1.26%)    Median: 0.6616ms (+2.20%)    [0.6473ms .. 0.6837ms]
merge_sparse 1/13_and_sparse 1/13                Avg: 0.5508ms (-0.85%)    Median: 0.5508ms (+2.80%)    [0.5420ms .. 0.5634ms]
merge_sparse 1/13_and_dense 1/12                 Avg: 0.6046ms (-4.64%)    Median: 0.6038ms (+2.80%)    [0.5939ms .. 0.6296ms]
merge_multi sparse 1/13_and_dense 1/12           Avg: 0.9111ms (-83.48%)    Median: 0.9063ms (-83.50%)    [0.9047ms .. 0.9663ms]
merge_multi sparse 1/13_and_sparse 1/13          Avg: 0.8451ms (-89.49%)    Median: 0.8428ms (-89.43%)    [0.8411ms .. 0.8563ms]
merge_multi 2x_and_dense 1/12                    Avg: 10.6624ms (-4.82%)    Median: 10.6568ms (-4.49%)    [10.5738ms .. 10.8353ms]
merge_multi 2x_and_sparse 1/13                   Avg: 10.6336ms (-22.95%)    Median: 10.5925ms (-22.33%)    [10.5149ms .. 11.5657ms]
```

* Update columnar/src/columnar/format_version.rs

Co-authored-by: Paul Masurel <[email protected]>

* Update columnar/src/column_index/mod.rs

Co-authored-by: Paul Masurel <[email protected]>

---------

Co-authored-by: Paul Masurel <[email protected]>
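To make the access cost described in the commit message concrete, here is a minimal sketch of a two-level lookup: an optional index that records which documents have values, followed by a table of start offsets that only covers those documents. The type `SketchMultiValueIndex` and its methods are hypothetical names invented for this illustration and are not tantivy's API. A sorted `Vec<u32>` with a binary search stands in for the sparse codec, which may be organized differently in the actual crate.

```rust
use std::ops::Range;

/// Hypothetical two-level multivalued index (illustrative names only).
struct SketchMultiValueIndex {
    /// Sorted ids of the documents that have at least one value.
    /// Stands in for a sparse optional index.
    docs_with_values: Vec<u32>,
    /// `start_offsets[rank]..start_offsets[rank + 1]` is the value range of
    /// the rank-th non-empty document, so there is one entry per non-empty
    /// document plus a final end offset.
    start_offsets: Vec<u32>,
}

impl SketchMultiValueIndex {
    /// Builds the index from `(doc_id, value_count)` pairs sorted by `doc_id`.
    /// Documents with zero values get no entry at all, so creation cost
    /// follows the number of non-empty documents rather than all doc ids.
    fn from_counts(counts: &[(u32, u32)]) -> Self {
        let mut docs_with_values = Vec::new();
        let mut start_offsets = vec![0u32];
        let mut next_offset = 0u32;
        for &(doc, num_values) in counts {
            if num_values == 0 {
                continue;
            }
            docs_with_values.push(doc);
            next_offset += num_values;
            start_offsets.push(next_offset);
        }
        Self { docs_with_values, start_offsets }
    }

    /// Number of documents that carry at least one value.
    fn num_docs_with_values(&self) -> usize {
        self.docs_with_values.len()
    }

    /// Step 1: optional index lookup (binary search for the doc id).
    /// Step 2: offset lookup using the rank found in step 1.
    /// Documents without values map to the empty range `0..0`.
    fn value_range(&self, doc: u32) -> Range<u32> {
        match self.docs_with_values.binary_search(&doc) {
            Ok(rank) => self.start_offsets[rank]..self.start_offsets[rank + 1],
            Err(_) => 0..0,
        }
    }
}
```

Under these assumptions, `SketchMultiValueIndex::from_counts(&[(0, 3), (13, 1), (26, 2)])` stores three document ids and four offsets regardless of how many empty documents lie in between, and `value_range(13)` pays for one binary search plus two offset reads. That extra search on every access is the kind of overhead the `multi sparse 1/13` access benchmarks report.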
1 parent 511b027, commit 5908414
Showing 28 changed files with 1,007 additions and 366 deletions.
@@ -0,0 +1,59 @@
extern crate tantivy_columnar;

use core::fmt;
use std::fmt::{Display, Formatter};

use tantivy_columnar::{ColumnarReader, ColumnarWriter};

pub enum Card {
    MultiSparse,
    Multi,
    Sparse,
    Dense,
    Full,
}
impl Display for Card {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        match self {
            Card::MultiSparse => write!(f, "multi sparse 1/13"),
            Card::Multi => write!(f, "multi 2x"),
            Card::Sparse => write!(f, "sparse 1/13"),
            Card::Dense => write!(f, "dense 1/12"),
            Card::Full => write!(f, "full"),
        }
    }
}
pub fn generate_columnar_with_name(card: Card, num_docs: u32, column_name: &str) -> ColumnarReader {
    let mut columnar_writer = ColumnarWriter::default();

    if let Card::MultiSparse = card {
        columnar_writer.record_numerical(0, column_name, 10u64);
        columnar_writer.record_numerical(0, column_name, 10u64);
    }

    for i in 0..num_docs {
        match card {
            Card::MultiSparse | Card::Sparse => {
                if i % 13 == 0 {
                    columnar_writer.record_numerical(i, column_name, i as u64);
                }
            }
            Card::Dense => {
                if i % 12 == 0 {
                    columnar_writer.record_numerical(i, column_name, i as u64);
                }
            }
            Card::Full => {
                columnar_writer.record_numerical(i, column_name, i as u64);
            }
            Card::Multi => {
                columnar_writer.record_numerical(i, column_name, i as u64);
                columnar_writer.record_numerical(i, column_name, i as u64);
            }
        }
    }

    let mut wrt: Vec<u8> = Vec::new();
    columnar_writer.serialize(num_docs, &mut wrt).unwrap();
    ColumnarReader::open(wrt).unwrap()
}
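The generator above determines how many values each document receives per cardinality. The following standard-library-only sketch mirrors that distribution without depending on `tantivy_columnar`, which may help when reading the benchmark labels. `SketchCard`, `values_for_doc`, and `summarize` are hypothetical names introduced here. The sketch also assumes `num_docs > 0`: the generator records two values for document 0 in the `MultiSparse` case even before its loop runs, whereas `summarize` only visits ids below `num_docs`.

```rust
/// Hypothetical mirror of the `Card` enum, used only for this sketch.
#[derive(Clone, Copy)]
enum SketchCard {
    MultiSparse,
    Multi,
    Sparse,
    Dense,
    Full,
}

/// Number of values the generator would record for `doc` under `card`.
fn values_for_doc(card: SketchCard, doc: u32) -> u32 {
    match card {
        SketchCard::MultiSparse => {
            // One value for every 13th document, plus two extra values on
            // document 0, which is what makes the column multivalued.
            let from_loop = u32::from(doc % 13 == 0);
            if doc == 0 { from_loop + 2 } else { from_loop }
        }
        SketchCard::Sparse => u32::from(doc % 13 == 0),
        SketchCard::Dense => u32::from(doc % 12 == 0),
        SketchCard::Full => 1,
        SketchCard::Multi => 2,
    }
}

/// Returns `(documents with at least one value, total values)` over the
/// document ids `0..num_docs`.
fn summarize(card: SketchCard, num_docs: u32) -> (u32, u32) {
    let mut docs_with_values = 0;
    let mut total_values = 0;
    for doc in 0..num_docs {
        let n = values_for_doc(card, doc);
        if n > 0 {
            docs_with_values += 1;
        }
        total_values += n;
    }
    (docs_with_values, total_values)
}
```

For 26 documents, `summarize(SketchCard::MultiSparse, 26)` counts documents 0 and 13 as non-empty, with three values on document 0 and one on document 13. This is the mostly empty multivalued shape that the commit message targets.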
Binary file not shown.
Binary file not shown.