
Support reusing a prebuilt pipeline for multiple API calls #28

Open
ilya-kolchinsky opened this issue Jan 10, 2025 · 11 comments

@ilya-kolchinsky
Collaborator

This should include optionally altering the parameters specified at pipeline construction time.
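A minimal sketch of what this could look like from the caller's side (all names here are hypothetical, not the current Pragmatic API): the pipeline is built once, and per-call keyword arguments shadow the parameters fixed at construction time.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class PipelineConfig:
    top_k: int = 5
    temperature: float = 0.2

class RagPipeline:
    def __init__(self, config: PipelineConfig):
        # Expensive setup (index, converters, model client) happens once here.
        self.config = config

    def run(self, query: str, **overrides) -> str:
        # Per-call overrides shadow the construction-time parameters.
        cfg = replace(self.config, **overrides)
        return f"answer to {query!r} with top_k={cfg.top_k}"  # stand-in result

pipeline = RagPipeline(PipelineConfig())
pipeline.run("What is RAG?")            # first call, construction-time defaults
pipeline.run("What is RAG?", top_k=10)  # reuse the same pipeline, new top_k
```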

@hemajv self-assigned this Jan 13, 2025
@dmartinol

Can we make these APIs compatible with the DocumentStoreRetriever and DocumentStoreIngestore interfaces defined in the instructlab package here?

FYI, @jwm4 @anastasds

@ilya-kolchinsky
Collaborator Author

The ingestion interface (DocumentStoreIngestore) should not be a problem, as it is identical to what we already have available. However, DocumentStoreRetriever deviates significantly from Pragmatic: it only handles retrieval + prompt building, whereas Pragmatic also encapsulates the communication with the model and executes the prompt.
If it is absolutely necessary and there is no workaround, we can discuss adding a "no_execute" flag that would instruct Pragmatic to return the augmented prompt instead of the query result.
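For illustration, a minimal sketch of how such a flag could behave (hypothetical; no such flag exists in Pragmatic today): with no_execute set, the pipeline stops after retrieval and prompt building, which matches the contract of a DocumentStoreRetriever-style interface.

```python
def run(query: str, no_execute: bool = False) -> str:
    # Stand-ins for the real retrieval and generation stages.
    docs = ["retrieved snippet 1", "retrieved snippet 2"]
    prompt = "Context:\n" + "\n".join(docs) + f"\n\nQuestion: {query}"
    if no_execute:
        return prompt  # return the augmented prompt, skip the model call
    return f"<model answer to: {query}>"  # normal path: execute the prompt

augmented_prompt = run("How is ingestion configured?", no_execute=True)
```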

@dmartinol

> If it is absolutely necessary and there is no workaround, we can discuss adding a "no_execute" flag that would instruct Pragmatic to return the augmented prompt instead of the query result.

IMO it's necessary; otherwise we would have to move all model communication to Pragmatic, including the non-RAG cases.

@jwm4

jwm4 commented Jan 20, 2025

Here are some thoughts from me:

  1. I think the full end-to-end run-time pipeline for RAG should live inside PRAGmatic (which, as I understand it, is Ilya's view too). That is very important for unlocking the ability to innovate effectively within PRAGmatic and for achieving some consistency in behavior across different consumers of PRAGmatic.
  2. I don't agree with Daniele that this would require PRAGmatic to handle non-RAG model calls.
  3. As we discussed on Slack a while back, I think the best way for ilab chat to integrate with the RAG pipeline is via the OpenAI API. I think PRAGmatic should provide an OpenAI wrapper that wraps the entire RAG pipeline (see the sketch after this list).
  4. I think we should plan to unwind the changes that Anastas made for RAG in chat in 1.4 and replace them so that the only RAG-specific code in ilab chat is the instantiation of the OpenAI instance that implements the chat: if RAG is on, we instantiate a RAG/PRAGmatic OpenAI object; if RAG is off, we instantiate one that talks directly to the model.
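To make point 3 concrete, here is a minimal sketch of such a wrapper (the class, its methods, and pipeline.build_prompt are all hypothetical; only the openai client calls are real): it exposes the chat.completions.create surface that ilab chat already uses, while running the full RAG pipeline before delegating to the real endpoint.

```python
from types import SimpleNamespace
from openai import OpenAI

class RagOpenAIWrapper:
    """Looks like an OpenAI client to the caller; runs RAG under the hood."""

    def __init__(self, pipeline, client: OpenAI):
        self._pipeline = pipeline  # prebuilt RAG pipeline (retrieval + prompt)
        self._client = client      # real OpenAI-compatible model endpoint
        # Mirror the `client.chat.completions.create(...)` surface.
        self.chat = SimpleNamespace(
            completions=SimpleNamespace(create=self._create)
        )

    def _create(self, *, messages, **kwargs):
        query = messages[-1]["content"]
        augmented = self._pipeline.build_prompt(query)  # hypothetical method
        return self._client.chat.completions.create(
            messages=[{"role": "user", "content": augmented}], **kwargs
        )

# Point 4 then reduces to a single decision in ilab chat:
# client = RagOpenAIWrapper(pipeline, OpenAI()) if rag_enabled else OpenAI()
```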

@anastasds

anastasds commented Jan 20, 2025

I do not have an opinion on direction for PRAGmatic itself and will only speak to an architectural point of view regarding InstructLab.

I struggle to find a coherent product design philosophy behind encapsulating model interactions within a RAG pipeline. Adding a second model interaction paradigm would almost certainly duplicate code, create feature disparity between the two, add configuration complexity, double the implementation cost of any new chat-related functionality, likely double model evaluation costs by calling into question the equivalency of experimental setups, unnecessarily silo domain knowledge within the organization, complicate prompt template management, make the user interface more complex and less understandable, and more.

Regarding the OpenAI-compatible API around PRAGmatic: that would make sense to me if it returned model responses. Otherwise, coercing a retrieval pipeline to return outputs formatted like chat model responses seems like a significant amount of unnecessary complexity that, as far as I can tell, actually subtracts value.

Another way of expressing all of this is from the point of view of modularization: "things that change together should stay together" (see also: shearing layers). Changes to model interactions in the InstructLab product should have to be made in only one place.

This discussion calls into question whether PRAGmatic is the best fit for InstructLab's immediate need: a retrieval pipeline whose output can be incorporated into its existing chat functionality. Perhaps there is an underlying retrieval component that should be a common component of both PRAGmatic and InstructLab?

@jwm4

jwm4 commented Jan 20, 2025

I agree with you that "things that change together should stay together" -- but I feel the entire RAG pipeline (including retrieval and generation) is something that should change together, which is why I think it should stay together. Also, I agree that the OpenAI API makes sense only if it returns a response: that is what I was proposing.

@jwm4

jwm4 commented Jan 20, 2025

We have a lot going on this week, so let's revisit next week once we've all had more time to think about this topic.

@ilya-kolchinsky
Collaborator Author

Two things to mention here:

  1. The issue under which we are commenting is the one to introduce these "OpenAI API wrappers" that would encapsulate the entire RAG pipeline and return model responses. So, if we decide to implement it the way Bill suggested (which I like a lot), it will be supported out of the box.
  2. On the other hand, why don't we go the other way around? How about having ilab provide an OpenAI-API-compatible object to Pragmatic, through which the latter would interact with the model (see the sketch after this list)? It would require Pragmatic to support using a client-supplied Python object for LLM communication (as opposed to creating its own, as it currently does), and it would solve the potential code duplication / work duplication issues mentioned by Anastas. And it's a useful feature anyway.
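A minimal sketch of what point 2 could look like (hypothetical constructor; not how Pragmatic is built today): the pipeline accepts a caller-supplied OpenAI-compatible client and only falls back to creating its own when none is given, so ilab keeps sole ownership of model communication.

```python
from openai import OpenAI

class RagPipeline:
    def __init__(self, llm_client: OpenAI | None = None, model: str = "some-model"):
        # Use the client-supplied object if given; create one only as a fallback.
        self._llm = llm_client if llm_client is not None else OpenAI()
        self._model = model

    def run(self, query: str) -> str:
        augmented = f"Context: ...\n\nQuestion: {query}"  # stand-in for retrieval
        response = self._llm.chat.completions.create(
            model=self._model,
            messages=[{"role": "user", "content": augmented}],
        )
        return response.choices[0].message.content

# ilab side: hand Pragmatic the same client it already uses for non-RAG chat.
# pipeline = RagPipeline(llm_client=existing_ilab_openai_client)
```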

@jwm4

jwm4 commented Jan 20, 2025

Or even do both: have an OpenAI-compatible wrapper which takes as input an OpenAI-compatible response generation object.

@anastasds

anastasds commented Jan 20, 2025

> why don't we go the other way around? How about having ilab provide an OpenAI-API-compatible object to Pragmatic, through which the latter would interact with the model?

I almost left a comment suggesting that one approach to a unified chat paradigm would be "InstructLab as a backend" to PRAGmatic, which is exactly the same idea. However, the naive approach (InstructLab uses PRAGmatic for retrieval, PRAGmatic uses InstructLab for chat) would create a circular dependency between the two codebases. Even if both publish client libraries independent of their core codebases, the dependency graph would end up having a loop (or be at significant risk of developing one in the future).

@dmartinol

> Perhaps there is an underlying retrieval component that should be a common component of both PRAGmatic and InstructLab?

This looks like the implementation of the DocumentStoreRetriever interface from which we started the discussion, doesn't it?

BTW, I apologize for that, but my initial understanding was that PRAGmatic would act as a provider of modular RAG components (ingest, retrieve, and similar functions to further configure the RAG pipeline), rather than serving as the platform that executes the RAG runtime. Even if the final result is the same, the impact on the client platform(s) may be different.
