Deduplicate min and max term in single-term FieldReader (#13618)
I noticed that single-term readers are an edge case, but not that uncommon, in
Elasticsearch heap dumps. It seems quite common to have a constant value for
some field across a complete segment (e.g. a version value that is repeated
endlessly in logs).
It seems simple enough to deduplicate the two terms here to save a couple of MB of heap.
original-brownbear authored Jul 31, 2024
1 parent ca098e6 commit 47650a4
Showing 2 changed files with 5 additions and 2 deletions.
@@ -200,6 +200,11 @@ public Lucene90BlockTreeTermsReader(PostingsReaderBase postingsReader, SegmentRe
     final int docCount = metaIn.readVInt();
     BytesRef minTerm = readBytesRef(metaIn);
     BytesRef maxTerm = readBytesRef(metaIn);
+    if (numTerms == 1) {
+      assert maxTerm.equals(minTerm);
+      // save heap for edge case of a single term only so min == max
+      maxTerm = minTerm;
+    }
     if (docCount < 0
         || docCount > state.segmentInfo.maxDoc()) { // #docs with field must be <= #docs
       throw new CorruptIndexException(
@@ -598,8 +598,6 @@ private void append(
   private final ByteBuffersDataOutput scratchBytes = ByteBuffersDataOutput.newResettableInstance();
   private final IntsRefBuilder scratchIntsRef = new IntsRefBuilder();
 
-  static final BytesRef EMPTY_BYTES_REF = new BytesRef();
-
   private static class StatsWriter {
 
     private final DataOutput out;
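The reader-side change boils down to aliasing two references when their bytes are equal. A minimal standalone sketch of the idea, using plain byte arrays and hypothetical names rather than Lucene's actual `BytesRef`/`FieldReader` classes:

```java
import java.util.Arrays;

public class SingleTermDedup {
    // Returns the max term, aliased to the min term when the field holds
    // exactly one term (in that case both terms are byte-for-byte equal,
    // so keeping two copies on the heap is pure waste).
    static byte[] dedupMaxTerm(long numTerms, byte[] minTerm, byte[] maxTerm) {
        if (numTerms == 1 && Arrays.equals(minTerm, maxTerm)) {
            return minTerm; // drop the duplicate copy
        }
        return maxTerm;
    }

    public static void main(String[] args) {
        // e.g. a constant version field repeated across a whole segment
        byte[] min = "8.13.0".getBytes();
        byte[] max = "8.13.0".getBytes(); // decoded separately: a distinct copy

        System.out.println(dedupMaxTerm(1, min, max) == min); // true: aliased
        System.out.println(dedupMaxTerm(2, min, max) == max); // true: left as-is
    }
}
```

The actual commit applies this at read time in the terms-dictionary metadata path, so every single-term field in every open segment retains one term object instead of two.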
