[Feature] Qwen3 Reranker #695
@sigridjineth Thanks for the great work! Excited to see if this will get merged.
I wonder if it would be a simpler code change to support the model as a SequenceClassificationModel, as mentioned in this discussion?
I will try to work on #698.
Mark
I tried this PR with a "converted-to-classifier" Qwen3-reranker-0.6B and it literally explodes during warm-up, trying to allocate more than 80 GB (tested on the Metal version on my Mac). Is there something obvious I may have missed?
What does this PR do?
This PR adds support for Qwen3 reranker models to `text-embeddings-inference` (Issue: #643).
These models function as binary classifiers that determine the relevance between a query and a document. They output a single probability score, which makes them well suited to re-ranking search results.
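To make the scoring idea concrete, here is a minimal sketch of how a probability score could be derived from the "yes" and "no" token logits mentioned in this description. It assumes the score is a softmax restricted to those two logits at the final position; that is an assumption for illustration, not a description of the PR's Rust code. The function names `relevance_score` and `rerank` are hypothetical, and the token IDs are the ones listed under Technical Details, so they should be checked against the tokenizer actually in use.

```python
import math

# Token IDs as listed in this PR's description; treat them as assumptions
# and verify them against the tokenizer you actually load.
YES_TOKEN_ID = 9693
NO_TOKEN_ID = 2152


def relevance_score(last_token_logits):
    """Return P("yes") from a softmax restricted to the "yes"/"no" logits.

    `last_token_logits` is anything indexable by token ID (list, dict, array)
    holding the logits for the final position of the prompt.
    """
    diff = last_token_logits[YES_TOKEN_ID] - last_token_logits[NO_TOKEN_ID]
    # A two-way softmax equals the sigmoid of the logit difference.
    # Branch on the sign so math.exp never receives a large positive argument.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def rerank(documents, logits_per_document):
    """Pair each document with its score and sort from most to least relevant."""
    scored = [
        (relevance_score(logits), doc)
        for doc, logits in zip(documents, logits_per_document)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rerank(docs, logits)` returns `(score, document)` pairs in descending score order, which is the shape a search pipeline would typically consume.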
Key Changes
- `ListwiseReranker` model type to properly distinguish these models from standard cross-encoder models.
- `predict` method to extract the logits for "yes" and "no" tokens.
- … `is_reranker` flag is set in the model's config.
Technical Details
- … "yes" (ID: `9693`) and "no" (ID: `2152`) tokens.
Who can review?
Anyone in the community is welcome to review the PR once the tests have passed. Feel free to tag anyone who might be interested.
@OlivierDehaene or @Narsil