Support multiple LLMBlocks in a pipeline that use different inference endpoints #521
@williamcaban Tagging you for visibility here, since this is related to a discussion we had elsewhere recently about preparing for cases where we need separate OpenAI Client endpoints in a single Pipeline. I outlined what I believe the general issue to be above, including some proposed (but not set in stone) backwards-compatible design changes to enable this.
The only new attribute in the pipeline config I propose above is the `client` key.
This change allows a user to construct a PipelineContext with multiple OpenAI clients, such as:

```python
PipelineContext(
    clients={
        "default": OpenAI(base_url="https://foo.local"),
        "server_a": OpenAI(base_url="https://server_a.local"),
        "server_b": OpenAI(base_url="https://server_b.local"),
    }
)
```

And then, within the pipeline yaml, choose which client to apply to which LLMBlock via a new `client` key, such as:

```yaml
version: "1.0"
blocks:
  - name: server_a_client
    type: LLMBlock
    config:
      client: server_a
      ...
  - name: server_b_client
    type: LLMBlock
    config:
      client: server_b
      ...
```

See `docs/examples/multiple_llm_clients` for more details and a full example.

Resolves instructlab#521

Signed-off-by: Ben Browning <[email protected]>
We have upcoming features from our research team that will require Pipelines containing multiple LLMBlocks, where each LLMBlock may use a different model id deployed on the same inference server, a different model family (i.e. granite vs mistral, and so on), or even an entirely different inference endpoint.
Today, users specify the inference endpoint by passing an OpenAI client into the PipelineContext. That client gets used for every LLMBlock, with no way to map different clients to different LLMBlocks.
What we need is a way to pass in multiple OpenAI clients and map each one to the relevant LLMBlock. One example of what this could look like:
```python
PipelineContext(
    clients={
        "default": OpenAI(base_url="https://foo.local"),
        "server_a": OpenAI(base_url="https://server_a.local"),
        "server_b": OpenAI(base_url="https://server_b.local"),
    }
)
```
We pass a map of OpenAI clients to the PipelineContext in this example instead of a single one. If a single client were passed (i.e. for backwards compatibility), we could internally turn it into a map with that single client as the "default" value. Users would control the full set of client parameters here, including SSL cert handling, timeouts, retries, and anything else that can be configured on the OpenAI client or its underlying httpx.Client.
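The backwards-compatibility shim described above could be sketched roughly as follows. This is an illustrative sketch, not actual SDG code: the helper name `normalize_clients` and the `FakeClient` stand-in (used so the example runs without the OpenAI SDK) are assumptions.

```python
def normalize_clients(client=None, clients=None):
    """Return a dict mapping client names to client objects.

    Accepts either the legacy single `client` argument or the proposed
    `clients` map; a lone client is wrapped as the "default" entry, so
    downstream code always works with a dict of clients.
    """
    if clients is not None:
        return dict(clients)
    if client is not None:
        return {"default": client}
    raise ValueError("either 'client' or 'clients' must be provided")


class FakeClient:
    """Stand-in for openai.OpenAI so this sketch runs without the SDK."""

    def __init__(self, base_url):
        self.base_url = base_url


# Legacy call sites keep working: a single client becomes the "default".
legacy = normalize_clients(client=FakeClient("https://foo.local"))
assert list(legacy) == ["default"]
```

The point of the shim is that the rest of the pipeline code never needs to branch on whether one client or many were supplied.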
Then, in your pipeline.yaml, we map each LLMBlock to a client. Any block that does not specify a client gets the "default" client. Otherwise, a block can use a string value to pick one of the clients passed into the PipelineContext out of that map. This allows us to create a reusable Pipeline that expects N clients and gives users running that Pipeline a way to configure those clients for their specific environment without modifying the Pipeline yaml itself.
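The per-block lookup with the "default" fallback could look roughly like this. Again a hypothetical sketch: the function name `resolve_client` and the shape of `block_config` (a plain dict holding the block's `config` section) are assumptions, not the actual SDG implementation.

```python
def resolve_client(clients, block_config):
    """Pick the client for one block.

    Falls back to the "default" client when the block's config does not
    name one, and fails loudly when a named client was never passed in.
    """
    name = block_config.get("client", "default")
    try:
        return clients[name]
    except KeyError:
        raise KeyError(
            f"pipeline block requested client '{name}', but only "
            f"{sorted(clients)} were passed to the PipelineContext"
        )


clients = {"default": "default-client", "server_a": "server-a-client"}
# A block that names a client gets that client; one that doesn't gets "default".
assert resolve_client(clients, {"client": "server_a"}) == "server-a-client"
assert resolve_client(clients, {}) == "default-client"
```

Failing with the list of known client names makes misconfigured pipelines easy to diagnose when the yaml and the PipelineContext disagree.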