Thoughts about design philosophy of RankLLM #109
Copy-pasting comments from our Slack discussion:
Ronak: "Yup, these are two directions we can take it in. I'm not sure which you prefer."
Sahel: "When I decided to keep it as a separate repo, I had option 2 in mind as well. That's why it has a Pyserini retriever. I see pros and cons for both approaches. My main concern about #1 is expanding RankLLM, for example adding training and other types of ranking prompts like pointwise. I think keeping it separate might make it easier to expand. I think Pyserini on its own is large enough and still expanding."
Current state of the design is available in these examples:
Comment from @lintool on the current design:
Just dropping thoughts here:
I'm not sure either approach would hold it down generally (besides training). With training specifically, I think the dependency chart shared with Pyserini will be affected, because training frameworks move quickly through torch/transformers versions while Pyserini moves more slowly. Another issue with adding training is that we'll have to re-benchmark our models whenever these changes are made, especially the 2CR pages.
I think retrieval is always going to be important to users (even in our pipelines), but at the end of the day, they might not use Pyserini for it. Academically, we do need these coupled well, and I'm sure the community will use it. Practically, I think people will mostly use something like LangChain/LlamaIndex/Vespa, which can interface with RankLLM to form a multi-stage system. At least, so I think.
Yup, I'm not sure it's worse than 2; it just carries some of the prerequisite baggage of 1, which makes 2 a bit annoying. I'd say there's a lot to be done to make it easier and more accessible to use, simplifying some workflows, improving consistency, etc., but those can be worked on.
I agree with @ronakice: as a retriever, I think hybrid search via LangChain, or even simply BM25 via LangChain, is as popular as Pyserini, if not more so.
Seems like we're leaning toward Approach 2. I concur with this decision.
Approach 2 is now implemented!
What is RankLLM? I can think of two obvious answers:
Approach 1. RankLLM is a fully-integrated layer on top of Anserini and Pyserini.
If this is the case, then we need "deep" integration with Pyserini, pulling it in as a dependency (perhaps with parts of it optional, etc.). Iteration would need to be coupled with Pyserini, and would likely be slower.
Approach 2. RankLLM is a lightweight general-purpose reranking library.
Basically, we can rerank anything... just give us something in this JSON format, and we'll rerank it for you. By the way, you can get the candidates from Pyserini, here's the command you run.
In this case, RankLLM does not need to have Pyserini as a dependency. We just need shim code in Pyserini to get its output into the right format, and similarly for Anserini directly.
Integration is not as tight - but this simplifies dependencies quite a bit...
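To make Approach 2 concrete, here is a minimal sketch of what such a shim might look like: a small, dependency-free function that converts generic retriever output (from Pyserini, LangChain, or anything else) into a JSON candidates payload a reranker could consume. The field names (`query`, `candidates`, `docid`, `score`, `content`) are illustrative assumptions, not RankLLM's actual schema.

```python
import json

def to_candidates_json(query, hits):
    """Convert retriever output into a generic JSON-serializable dict.

    hits: an iterable of (docid, score, text) tuples from any first-stage
    retriever. All field names below are hypothetical, chosen only to
    illustrate the 'rerank anything in this JSON format' idea.
    """
    return {
        "query": query,
        "candidates": [
            {"docid": docid, "score": score, "content": text}
            for docid, score, text in hits
        ],
    }

# Example: fake BM25 output standing in for Pyserini/Anserini hits.
hits = [("doc1", 12.3, "first passage text"),
        ("doc2", 9.8, "second passage text")]
payload = to_candidates_json("what is reranking?", hits)
print(json.dumps(payload, indent=2))
```

The point of this design is that the shim lives on the retriever side, so RankLLM itself stays free of heavyweight retrieval dependencies.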
Thoughts about these two approaches @ronakice @sahel-sh ?