Support multiple LLMBlocks in a pipeline that use different inference endpoints #521

Open · bbrowning opened this issue Jan 28, 2025 · 3 comments · May be fixed by #563
Labels: enhancement (New feature or request), epic (Larger tracking issue encompassing multiple smaller issues)

@bbrowning (Contributor):

We have upcoming features from our research team that will require Pipelines containing multiple LLMBlocks, where each LLMBlock may use a different model id deployed on the same inference server, a different model family (i.e. granite vs mistral and so on), or even an entirely different inference endpoint.

Today, users specify the inference endpoint by passing a single OpenAI client into the PipelineContext. That client is used for every LLMBlock, with no way to map a different client to each LLMBlock.

What we need is some way to pass in multiple OpenAI clients, and map each to the relevant LLMBlock. One example of what this could look like:

```python
from openai import OpenAI
from instructlab.sdg.pipeline import PipelineContext  # instructlab sdg library

clients = {
  "default": OpenAI(base_url="http://localhost:8000/v1", api_key="foo"),
  "granite-endpoint": OpenAI(base_url="http://localhost:8001/v1", api_key="bar"),
}
ctx = PipelineContext(clients=clients)
```

We pass a map of OpenAI clients to PipelineContext in this example instead of a single one. If a single client is passed (i.e. for backwards compatibility), we could internally turn it into a map with that single client as the "default" value. Users would be able to control the full range of client parameters here, including SSL cert handling, timeouts, retries, and anything else that can be configured on the OpenAI client or its underlying httpx.Client.
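
To make the backwards-compatibility idea and the client configurability concrete, here is a minimal sketch, not the actual implementation: the `normalize_clients` helper name and the CA-cert path are purely illustrative, and only the standard OpenAI/httpx client parameters are assumed.

```python
from typing import Dict, Optional

import httpx
from openai import OpenAI


def normalize_clients(
    client: Optional[OpenAI] = None,
    clients: Optional[Dict[str, OpenAI]] = None,
) -> Dict[str, OpenAI]:
    """Accept either the legacy single `client` or the new `clients` map."""
    if clients is not None:
        return clients
    if client is not None:
        # Backwards compatibility: a lone client becomes the "default" entry.
        return {"default": client}
    raise ValueError("either `client` or `clients` must be provided")


# Each entry is a fully user-configured OpenAI client, so timeouts, retries,
# and SSL cert handling stay entirely under the caller's control.
clients = {
    "default": OpenAI(base_url="http://localhost:8000/v1", api_key="foo"),
    "granite-endpoint": OpenAI(
        base_url="http://localhost:8001/v1",
        api_key="bar",
        timeout=120.0,
        max_retries=3,
        # hypothetical internal CA bundle, handed to the underlying httpx.Client
        http_client=httpx.Client(verify="/etc/ssl/certs/internal-ca.pem"),
    ),
}
```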

pipeline.yaml

```yaml
version: "1.0"
blocks:
  - name: model_one
    type: LLMBlock
    config:
      model_family: mixtral
      model_id: Mixtral-8x7B-Instruct-v0.1
      config_path: model_one_config.yaml
      output_cols:
        - column_one
      gen_kwargs:
        max_tokens: 2048

  - name: model_two
    type: LLMBlock
    config:
      client: granite-endpoint
      model_family: granite
      model_id: granite-7b-lab
      config_path: model_two_config.yaml
      output_cols:
        - column_two
      gen_kwargs:
        max_tokens: 512

  - name: model_three
    type: LLMBlock
    config:
      client: granite-endpoint
      model_id: granite-7b-lab
      model_prompt: <s> [INST] {prompt} [/INST]
      config_path: model_three_config.yaml
      output_cols:
        - column_three
      gen_kwargs:
        max_tokens: 5
```

Then, in your pipeline.yaml, we map each LLMBlock to a client. Any block that does not specify a client gets the "default" client; otherwise, a block can name one of the clients passed into the PipelineContext. This allows us to create a reusable Pipeline that expects N clients and gives users running that Pipeline a way to configure those clients for their specific environment without modifying the Pipeline yaml itself.
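
A rough sketch of the lookup each LLMBlock could perform against that map; the `resolve_client` helper name is illustrative and not part of any existing API.

```python
from typing import Any, Dict, Mapping

from openai import OpenAI


def resolve_client(clients: Mapping[str, OpenAI], block_config: Dict[str, Any]) -> OpenAI:
    """Return the client named by the block's `client` key, defaulting to "default"."""
    client_name = block_config.get("client", "default")
    try:
        return clients[client_name]
    except KeyError:
        raise ValueError(
            f"block requests client {client_name!r}, but no client with that name "
            "was passed into the PipelineContext"
        ) from None
```

Failing loudly when a named client is missing keeps a reusable Pipeline honest about the N clients it expects from its environment.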

@bbrowning (Contributor, Author):

@williamcaban Tagging you for visibility here, since this is related to a discussion we had elsewhere recently about preparing for cases where we need separate OpenAI Client endpoints in a single Pipeline. I outlined what I believe the general issue to be above, including some proposed (but not set in stone) backwards-compatible design changes to enable this.

@williamcaban:

Is config_path: modifying client behavior (like setting temperatures, etc.)? Is that part of existing constructs, or should it be covered in this enhancement?

@bbrowning (Contributor, Author):

The config_path key already exists in the Pipeline yamls (e.g. https://github.com/instructlab/sdg/blob/v0.7.0/src/instructlab/sdg/pipelines/full/knowledge.yaml#L12). That's where we specify the actual prompt to use, as well as any start/end tags to give to the model during generation.

The only new attribute in the pipeline config I propose above is the client key to choose which OpenAI client to use. I didn't explicitly state that in the text above, so thanks for asking the clarifying questions!

bbrowning added a commit to bbrowning/instructlab-sdg that referenced this issue Feb 19, 2025
This change allows a user to construct a PipelineContext with multiple
OpenAI clients, such as:

```python
PipelineContext(
    clients={
        "default": OpenAI(base_url="https://foo.local"),
        "server_a": OpenAI(base_url="https://server_a.local"),
        "server_b": OpenAI(base_url="https://server_b.local"),
    }
)
```

And then, within the pipeline yaml, choose which client to apply to
which LLMBlock via a new `client` key, such as:

```yaml
version: "1.0"
blocks:
  - name: server_a_client
    type: LLMBlock
    config:
      client: server_a
      ...

  - name: server_b_client
    type: LLMBlock
    config:
      client: server_b
      ...
```

See `docs/examples/multiple_llm_clients` for more details and a full
example.

Resolves instructlab#521

Signed-off-by: Ben Browning <[email protected]>
bbrowning linked a pull request (#563) on Feb 19, 2025 that will close this issue