I stumbled across this one in a real-life application, where matches-API based highlighting of a query like this:
field:(a OR b OR c OR d OR ...)
took very long to complete, even though query execution itself is blazing fast. The reason is (I think!) in how the MultiTermQuery handles matches - the AbstractMultiTermQueryConstantScoreWrapper returns a disjunction of iterators from a terms enum:
@Override
public Matches matches(LeafReaderContext context, int doc) throws IOException {
  final Terms terms = context.reader().terms(q.field);
  if (terms == null) {
    return null;
  }
  return MatchesUtils.forField(
      q.field,
      () ->
          DisjunctionMatchesIterator.fromTermsEnum(
              context, doc, q, q.field, q.getTermsEnum(terms)));
}
but for a large set of alternatives, the loop inside fromTermsEnum can take a long time before it reaches a term whose postings contain the target document:
static MatchesIterator fromTermsEnum(
    LeafReaderContext context, int doc, Query query, String field, BytesRefIterator terms)
    throws IOException {
  Objects.requireNonNull(field);
  Terms t = Terms.getTerms(context.reader(), field);
  TermsEnum te = t.iterator();
  PostingsEnum reuse = null;
  for (BytesRef term = terms.next(); term != null; term = terms.next()) {
    if (te.seekExact(term)) {
      PostingsEnum pe = te.postings(reuse, PostingsEnum.OFFSETS);
      if (pe.advance(doc) == doc) {
        return new TermsEnumDisjunctionMatchesIterator(
            new TermMatchesIterator(query, pe), terms, te, doc, query);
      } else {
        reuse = pe;
      }
    }
  }
  return null;
}
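To make the suspected cost concrete, here is a toy model of that scan in plain Java. It is only a sketch and not Lucene code: the class MatchScanSketch, the method probesUntilMatch, and the representation of postings as sorted int arrays are all hypothetical, and a binary search stands in for pe.advance(doc) == doc. It counts how many terms are probed before one is found that contains the target document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the linear term scan; hypothetical names, not Lucene code. */
public class MatchScanSketch {

  /**
   * Probes each term's postings in order and returns how many terms were
   * probed up to and including the first one that contains {@code doc},
   * or -1 if no term contains it.
   */
  public static int probesUntilMatch(List<int[]> postingsPerTerm, int doc) {
    int probes = 0;
    for (int[] postings : postingsPerTerm) {
      probes++;
      // Stand-in for "pe.advance(doc) == doc": postings are sorted doc ids.
      if (Arrays.binarySearch(postings, doc) >= 0) {
        return probes;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    int numTerms = 10_000;
    List<int[]> postings = new ArrayList<>();
    for (int i = 0; i < numTerms; i++) {
      postings.add(new int[] {i}); // assumption: term i occurs only in document i
    }
    // Best case: the first term already matches the document.
    System.out.println(probesUntilMatch(postings, 0)); // prints 1
    // Worst case: only the last term matches, so every term is probed.
    System.out.println(probesUntilMatch(postings, numTerms - 1)); // prints 10000
  }
}
```

In this model the work per highlighted document grows linearly with the number of terms in the set, and in the real code each probe also pays for a seekExact and a postings advance. That would be consistent with the slowdown described above, although this has not been profiled against Lucene itself.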
I've no idea what the fix can be here, just mentioning the problem before I forget it.
Version and environment details
No response
Perhaps this wasn't clear - the important bit here is the use of TermInSetQuery (the query parser substitutes large boolean expressions with this type of query to prevent max-boolean-clauses-exceeded errors).