Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add UnwrappingReuseStrategy for AnalyzerWrapper #14154

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@

import java.io.Reader;
import org.apache.lucene.util.AttributeFactory;
import org.apache.lucene.util.CloseableThreadLocal;

/**
* Extension to {@link Analyzer} suitable for Analyzers which wrap other Analyzers.
Expand Down Expand Up @@ -151,4 +152,57 @@ protected final Reader initReaderForNormalization(String fieldName, Reader reade
protected final AttributeFactory attributeFactory(String fieldName) {
return getWrappedAnalyzer(fieldName).attributeFactory(fieldName);
}

/**
* A {@link org.apache.lucene.analysis.Analyzer.ReuseStrategy} that checks the wrapped analyzer's
* strategy for reusability. If the wrapped analyzer's strategy returns null, components need to
* be re-created. During components creation, this analyzer must store the wrapped analyzer's
* components in {@code wrappedComponents} local thread variable.
*/
public static final class WrappingReuseStrategy extends ReuseStrategy {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that the better name is UnWrapping as if we find that the analyzer is a wrapped analyzer, we unwrap it and use the underlying analyzer?

private AnalyzerWrapper wrapper;
private Analyzer wrappedAnalyzer;
private CloseableThreadLocal<TokenStreamComponents> wrappedComponents;
private final ReuseStrategy fallbackStrategy;

public WrappingReuseStrategy(ReuseStrategy fallbackStrategy) {
this.fallbackStrategy = fallbackStrategy;
}

public void setUp(
AnalyzerWrapper wrapper,
Analyzer wrappedAnalyzer,
CloseableThreadLocal<TokenStreamComponents> wrappedComponents) {
this.wrapper = wrapper;
this.wrappedAnalyzer = wrappedAnalyzer;
this.wrappedComponents = wrappedComponents;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bugs me. We allow a reuse strategy to be mutated after construction. It seems ready to cause a bug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see that the "wrappedComponents" is key here. its tricky, as the analyzer provides access directly to storedValue and there is no way to override :(.

That seems like a bad API already, but we don't want to change all these long lived interfaces unnecessarily.


@Override
public TokenStreamComponents getReusableComponents(Analyzer analyzer, String fieldName) {
if (analyzer == wrapper) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to know the instance, or can we do Analyzer analyzer instanceof WrapperAnalyzer wrapperAnalyzer then get the strategy from wrapperAnalyzer.getWrappedAnalyzer(fieldName).getReuseStategy().getReusableComponents...?

if (wrappedAnalyzer.getReuseStrategy().getReusableComponents(wrappedAnalyzer, fieldName)
== null) {
return null;
} else {
return (TokenStreamComponents) getStoredValue(analyzer);
}
} else {
return fallbackStrategy.getReusableComponents(analyzer, fieldName);
}
}

@Override
public void setReusableComponents(
Analyzer analyzer, String fieldName, TokenStreamComponents components) {
if (analyzer == wrapper) {
setStoredValue(analyzer, components);
wrappedAnalyzer
.getReuseStrategy()
.setReusableComponents(wrappedAnalyzer, fieldName, wrappedComponents.get());
} else {
fallbackStrategy.setReusableComponents(analyzer, fieldName, components);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to know the instance, or can we do Analyzer analyzer instanceof WrapperAnalyzer wrapperAnalyzer then get the wrapped analyzer via getWrappedAnalyzer(fieldName)?

}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -17,10 +17,16 @@

package org.apache.lucene.analysis;

import static org.apache.lucene.analysis.Analyzer.GLOBAL_REUSE_STRATEGY;

import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.lucene.analysis.Analyzer.ReuseStrategy;
import org.apache.lucene.analysis.Analyzer.TokenStreamComponents;
import org.apache.lucene.analysis.AnalyzerWrapper.WrappingReuseStrategy;
import org.apache.lucene.tests.analysis.CannedTokenStream;
import org.apache.lucene.tests.util.LuceneTestCase;
import org.apache.lucene.util.CloseableThreadLocal;

public class TestAnalyzerWrapper extends LuceneTestCase {

Expand All @@ -40,7 +46,7 @@ protected TokenStreamComponents createComponents(String fieldName) {
}
};

Analyzer wrapped =
Analyzer wrapper =
new AnalyzerWrapper(analyzer.getReuseStrategy()) {
@Override
protected Analyzer getWrappedAnalyzer(String fieldName) {
Expand All @@ -55,9 +61,78 @@ protected TokenStreamComponents wrapComponents(
}
};

try (TokenStream ts = wrapped.tokenStream("", "text")) {
try (TokenStream ts = wrapper.tokenStream("", "text")) {
assert ts != null;
assertTrue(sourceCalled.get());
}
}

/**
* Test that {@link AnalyzerWrapper.WrappingReuseStrategy} consults the wrapped analyzer's reuse
* strategy if components can be reused or need to be updated.
*/
public void testWrappingReuseStrategy() {
AtomicBoolean reuse = new AtomicBoolean(true);

final ReuseStrategy wrappedAnalyzerStrategy =
new ReuseStrategy() {
@Override
public TokenStreamComponents getReusableComponents(Analyzer analyzer, String fieldName) {
if (reuse.get() == false) {
return null;
} else {
return (TokenStreamComponents) getStoredValue(analyzer);
}
}

@Override
public void setReusableComponents(
Analyzer analyzer, String fieldName, TokenStreamComponents components) {
setStoredValue(analyzer, components);
}
};
Analyzer wrappedAnalyzer =
new Analyzer(wrappedAnalyzerStrategy) {
@Override
protected TokenStreamComponents createComponents(String fieldName) {
return new TokenStreamComponents(r -> {}, new CannedTokenStream());
}
};

final WrappingReuseStrategy wrapperAnalyzerStrategy =
new AnalyzerWrapper.WrappingReuseStrategy(GLOBAL_REUSE_STRATEGY);
CloseableThreadLocal<TokenStreamComponents> wrappedComponents = new CloseableThreadLocal<>();
AnalyzerWrapper wrapperAnalyzer =
new AnalyzerWrapper(wrapperAnalyzerStrategy) {
@Override
protected Analyzer getWrappedAnalyzer(String fieldName) {
return wrappedAnalyzer;
}

@Override
protected TokenStreamComponents wrapComponents(
String fieldName, TokenStreamComponents components) {
wrappedComponents.set(components);
return new TokenStreamComponents(
components.getSource(), new LowerCaseFilter(components.getTokenStream()));
}
};
wrapperAnalyzerStrategy.setUp(wrapperAnalyzer, wrappedAnalyzer, wrappedComponents);

TokenStream ts = wrapperAnalyzer.tokenStream("", "text");
TokenStream ts2 = wrapperAnalyzer.tokenStream("", "text");
assertEquals(ts2, ts);

reuse.set(false);
TokenStream ts3 = wrapperAnalyzer.tokenStream("", "text");
assertNotSame(ts3, ts2);
TokenStream ts4 = wrapperAnalyzer.tokenStream("", "text");
assertNotSame(ts4, ts3);

reuse.set(true);
TokenStream ts5 = wrapperAnalyzer.tokenStream("", "text");
assertEquals(ts5, ts4);
TokenStream ts6 = wrapperAnalyzer.tokenStream("", "text");
assertEquals(ts6, ts5);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
import org.apache.lucene.analysis.AnalyzerWrapper;
import org.apache.lucene.analysis.TokenStreamToAutomaton;
import org.apache.lucene.analysis.miscellaneous.ConcatenateGraphFilter;
import org.apache.lucene.util.CloseableThreadLocal;

/**
* Wraps an {@link org.apache.lucene.analysis.Analyzer} to provide additional completion-only tuning
Expand Down Expand Up @@ -66,6 +67,9 @@ public final class CompletionAnalyzer extends AnalyzerWrapper {
*/
private final int maxGraphExpansions;

private CloseableThreadLocal<TokenStreamComponents> wrappedComponents =
new CloseableThreadLocal<>();

/**
* Wraps an analyzer to convert its output token stream to an automaton
*
Expand Down Expand Up @@ -112,6 +116,25 @@ public CompletionAnalyzer(
ConcatenateGraphFilter.DEFAULT_MAX_GRAPH_EXPANSIONS);
}

/**
* Creates CompletionAnalyzer with the given analyzer, preserving token separation, position
* increments, and using the {@link
* org.apache.lucene.analysis.AnalyzerWrapper.WrappingReuseStrategy} reuse strategy
*/
public CompletionAnalyzer(
Analyzer analyzer,
boolean preserveSep,
boolean preservePositionIncrements,
ReuseStrategy fallbackStrategy) {
super(new WrappingReuseStrategy(fallbackStrategy));
// häckidy-hick-hack, because we cannot call super() with a reference to "this":
((WrappingReuseStrategy) getReuseStrategy()).setUp(this, analyzer, wrappedComponents);
this.analyzer = analyzer;
this.preserveSep = preserveSep;
this.preservePositionIncrements = preservePositionIncrements;
this.maxGraphExpansions = ConcatenateGraphFilter.DEFAULT_MAX_GRAPH_EXPANSIONS;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

   /**
   * JAVA DOCS about when you should use this and what it does
   */
   public static CompletionAnalyzer withWrappingReuseStrategy(
      Analyzer analyzer,
      boolean preserveSep,
      boolean preservePositionIncrements,
      ReuseStrategy fallbackStrategy) {
    CompletionAnalyzer completionAnalyzer = new CompletionAnalyzer(analyzer, preserveSep, preservePositionIncrements, fallbackStrategy);
    ((WrappingReuseStrategy)completionAnalyzer.getReuseStrategy()).setUp(completionAnalyzer, analyzer, completionAnalyzer.wrappedComponents);
    return completionAnalyzer;
  }

What if we made the ctor private, removed the setup hack and did it instead within a public static method?


/**
* Calls {@link #CompletionAnalyzer(org.apache.lucene.analysis.Analyzer, boolean, boolean, int)}
* preserving token separation and position increments
Expand Down Expand Up @@ -145,9 +168,16 @@ protected Analyzer getWrappedAnalyzer(String fieldName) {
return analyzer;
}

@Override
public void close() {
super.close();
wrappedComponents.close();
}

@Override
protected TokenStreamComponents wrapComponents(
String fieldName, TokenStreamComponents components) {
wrappedComponents.set(components);
CompletionTokenStream tokenStream =
new CompletionTokenStream(
components.getTokenStream(),
Expand Down
Loading