Skip to content

Updating Dense#intoBitSet to take into account the provided bitset if upTo equals to NO_MORE_DOCS #14882

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jul 7, 2025

Conversation

pmpailis
Copy link
Contributor

@pmpailis pmpailis commented Jul 1, 2025

When specifying a knn query with a custom filter, it is possible to be met with the following IOOB exception

Range [0, 0 + 65536) out of bounds for length 10083
java.lang.IndexOutOfBoundsException: Range [0, 0 + 65536) out of bounds for length 10083
	at __randomizedtesting.SeedInfo.seed([D5F220708924BDAF:C71838EC9B6B347C]:0)
	at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:100)
	at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckFromIndexSize(Preconditions.java:118)
	at java.base/jdk.internal.util.Preconditions.checkFromIndexSize(Preconditions.java:397)
	at java.base/java.util.Objects.checkFromIndexSize(Objects.java:417)
	at org.apache.lucene.util.FixedBitSet.orRange(FixedBitSet.java:402)
	at org.apache.lucene.codecs.lucene90.IndexedDISI$Method$2.intoBitSetWithinBlock(IndexedDISI.java:731)
	at org.apache.lucene.codecs.lucene90.IndexedDISI.intoBitSet(IndexedDISI.java:477)
	at org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$21.intoBitSet(Lucene90DocValuesProducer.java:1022)
	at org.apache.lucene.index.SingletonSortedSetDocValues.intoBitSet(SingletonSortedSetDocValues.java:93)
	at org.apache.lucene.util.FixedBitSet.or(FixedBitSet.java:380)
	at org.apache.lucene.search.AbstractKnnVectorQuery.createBitSet(AbstractKnnVectorQuery.java:234)
	at org.apache.lucene.search.AbstractKnnVectorQuery.getLeafResults(AbstractKnnVectorQuery.java:194)
	at org.apache.lucene.search.AbstractKnnVectorQuery.searchLeaf(AbstractKnnVectorQuery.java:168)
	at org.apache.lucene.search.AbstractKnnVectorQuery.lambda$rewrite$0(AbstractKnnVectorQuery.java:108)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:328)

This seems to happen because when creating a FixedBitSet with maxDoc size in AbstractKnnVectorQuery#createBitSet

FixedBitSet bitSet = new FixedBitSet(maxDoc);

we end up providing an upTo value of DocIdSetIterator.NO_MORE_DOCS through the or. So, when we reach FixedBitSet.orRange we try to verify the dest bitset based on BLOCK_SIZE size, even though the actual size is usually less than that, causing the above IIOB exception.

The issue is reproducible for IndexedDISI.Method.Dense#intoBitSetWithinBlock only, and not for Sparse.

In this PR, we update the upper bound provided to FixedBitSet.orRange to take into account the bitSet.length() if upTo == NO_MORE_DOCS. If this seems too broad, we can narrow down the fix specifically to the AbstractKnnVectorQuery case, by providing a well defined upTo to or.

Copy link

github-actions bot commented Jul 1, 2025

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

doc.add(
new KnnFloatVectorField(
"vector", new float[] {i, i * 2}, VectorSimilarityFunction.DOT_PRODUCT));
if (i % 2 == 0) {
Copy link
Contributor Author

@pmpailis pmpailis Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried adding a reproducible case so that we can enforce DENSE for IndexedDISI. Happy to discuss alternative approaches to ensure this.

Copy link
Contributor

@gf2121 gf2121 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! I left some minor comments

int destFrom = disi.doc - offset;
int disiTo = upTo == DocIdSetIterator.NO_MORE_DOCS ? bitSet.length() : upTo;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use Math.min(upTo, bitSet.length()) to be more robust.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in e03699a.

@@ -276,6 +277,40 @@ public void testDocAndScoreQueryBasics() throws IOException {
}
}

@Nightly
public void testKnnQueryWithFiltering() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if TestIndexedDISI is a better place to test this?

Copy link
Contributor Author

@pmpailis pmpailis Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added another one in TestIndexedDisi in e03699a, that I believe captures the issue. Should we remove this one altogether?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to remove this, the test in TestIndexedDISI looks good enough to me.

Copy link

github-actions bot commented Jul 1, 2025

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

Copy link

github-actions bot commented Jul 1, 2025

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@@ -276,6 +277,40 @@ public void testDocAndScoreQueryBasics() throws IOException {
}
}

@Nightly
public void testKnnQueryWithFiltering() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to remove this, the test in TestIndexedDISI looks good enough to me.

@gf2121
Copy link
Contributor

gf2121 commented Jul 2, 2025

Maybe add a CHANGES entry under 10.3.0 as well

@gf2121 gf2121 added this to the 10.3.0 milestone Jul 2, 2025
@pmpailis
Copy link
Contributor Author

pmpailis commented Jul 2, 2025

Maybe add a CHANGES entry under 10.3.0 as well

Added something in 9864123. Let me know if you see anything wrong with the wording.

Thanks for reviewing @gf2121 !

Copy link
Contributor

@gf2121 gf2121 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@benwtrent benwtrent merged commit 96aaa7e into apache:main Jul 7, 2025
8 checks passed
@benwtrent
Copy link
Member

During the backport process, I noticed many tests failing. I wonder if CI didn't fully exercise paths here, or there were other changes since this PR was opened.

./gradlew :lucene:core:test --tests "org.apache.lucene.codecs.lucene90.TestIndexedDISI.testDenseMultiBlock" -Ptests.asserts=true -Ptests.file.encoding=UTF-8 -Ptests.forceintegervectors=true -Ptests.gui=false "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC -XX:ActiveProcessorCount=1" -Ptests.jvms=5 -Ptests.locale=en-MW -Ptests.seed=5F1E0E17D760E62D -Ptests.timezone=America/El_Salvador -Ptests.vectorsize=512
org.apache.lucene.codecs.lucene90.TestIndexedDISI > test suite's output saved to /Users/benjamintrent/Projects/lucene-bench/baseline/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.codecs.lucene90.TestIndexedDISI.txt, copied below:
  2> Jul 07, 2025 12:33:35 PM org.apache.lucene.internal.vectorization.VectorizationProvider lookup
  2> WARNING: Vector bitsize and/or integer vectors enforcement; using default vectorization provider outside of testMode
   >     java.lang.AssertionError
   >         at __randomizedtesting.SeedInfo.seed([5F1E0E17D760E62D:461BC729954A0539]:0)
   >         at org.apache.lucene.util.FixedBitSet.orRange(FixedBitSet.java:400)
   >         at org.apache.lucene.codecs.lucene90.IndexedDISI$Method$2.intoBitSetWithinBlock(IndexedDISI.java:731)
   >         at org.apache.lucene.codecs.lucene90.IndexedDISI.intoBitSet(IndexedDISI.java:477)
   >         at org.apache.lucene.codecs.lucene90.TestIndexedDISI.assertIntoBitsetRandomized(TestIndexedDISI.java:631)
   >         at org.apache.lucene.codecs.lucene90.TestIndexedDISI.doTest(TestIndexedDISI.java:563)
   >         at org.apache.lucene.codecs.lucene90.TestIndexedDISI.testDenseMultiBlock(TestIndexedDISI.java:387)
   >         at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
   >         at java.base/java.lang.reflect.Method.invoke(Method.java:565)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1763)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
   >         at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
   >         at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
   >         at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   >         at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   >         at org.junit.rules.RunRules.evaluate(RunRules.java:20)

On 10x with the backport, I ran into things like:

java.lang.IndexOutOfBoundsException: Range [96, 96 + -32) out of bounds for length 1024

benwtrent added a commit that referenced this pull request Jul 7, 2025
…itset if upTo equals to NO_MORE_DOCS (#14882)"

This reverts commit 96aaa7e.
@benwtrent
Copy link
Member

@pmpailis @gf2121 I reverted on main as it was just failing so many tests immediately. Some edge case was missed.

@dweiss
Copy link
Contributor

dweiss commented Jul 7, 2025

I wonder if CI didn't fully exercise paths here, or there were other changes since this PR was opened.

The tests are randomized, it can happen that a PR run doesn't hit something that will surface later. I think it's ok. So is reverting to figure out what's going on.

@benwtrent
Copy link
Member

The tests are randomized, it can happen that a PR run doesn't hit something that will surface later. I think it's ok. So is reverting to figure out what's going on.

@dweiss for sure, its random. But it failed immediately when I tested locally. It just surprised me :).

I wonder if there was other work done in this area recently and this PR just needed to be updated to ensure it was fully tested.

@pmpailis
Copy link
Contributor Author

pmpailis commented Jul 8, 2025

I think that the issue seems to be with the following

int disiTo = Math.min(upTo, bitSet.length())

as the above computation does not take into account the initial offset, so we could end up reading less data than initially requested. So for example, we could have bitSize.length: 100, offset: 90, and upTo: 150. Instead of reading [90, 150], we would end up operating on just the [90, 100] range causing all sorts of issues later in the pipeline.
Switching back to

int disiTo = upTo == DocIdSetIterator.NO_MORE_DOCS ? bitSet.length() : upTo;

seems to address all failures (had 50 full test successful runs).

Would you suggest to correct only when upTo == NO_MORE_DOCS (or sth like upTo - offset >= bitSet.length()) or to maybe just restrict the fix for the initial knn filter case and provide a custom upTo at that point?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants