
Support multiple LLMBlocks in a pipeline that use different inference endpoints #521

Open
bbrowning opened this issue Jan 28, 2025 · 3 comments
Labels: enhancement, epic

Comments

@bbrowning
Contributor

Some upcoming features from our research team will require Pipelines that contain multiple LLMBlocks, where each LLMBlock may use a different model id deployed on the same inference server, a different model family (e.g. granite vs mistral), or even an entirely different inference endpoint.

Today, users specify the inference endpoint by passing an OpenAI client into the PipelineContext. That client gets used for every LLMBlock, with no way to map different clients to different LLMBlocks.

What we need is some way to pass in multiple OpenAI clients and map each one to the relevant LLMBlock. One example of what this could look like:

Python code

clients = {
  "default": OpenAI(base_url="http://localhost:8000/v1", api_key="foo"),
  "granite-endpoint": OpenAI(base_url="http://localhost:8001/v1", api_key="bar"),
}
ctx = PipelineContext(clients=clients)

In this example we pass a map of OpenAI clients to PipelineContext instead of a single one. If a single client were passed (i.e. for backwards compatibility), we could turn it internally into a map with that single client as the "default" value. Users would be able to control the full range of client parameters here, including SSL cert handling, timeouts, retries, and anything else that can be configured on the OpenAI client or its underlying httpx.Client.
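
As a rough sketch of that backwards-compatibility handling (the constructor signature and attribute names here are illustrative, not the current API):

Python code

from typing import Dict, Optional
from openai import OpenAI

class PipelineContext:
    def __init__(self, client: Optional[OpenAI] = None,
                 clients: Optional[Dict[str, OpenAI]] = None):
        if clients is not None:
            self.clients = dict(clients)
        elif client is not None:
            # Backwards compatibility: wrap the single client in a map
            # under the "default" key.
            self.clients = {"default": client}
        else:
            raise ValueError("either client or clients must be provided")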

pipeline.yaml

version: "1.0"
blocks:
  - name: model_one
    type: LLMBlock
    config:
      model_family: mixtral
      model_id: Mixtral-8x7B-Instruct-v0.1
      config_path: model_one_config.yaml
      output_cols:
        - column_one
      gen_kwargs:
        max_tokens: 2048

  - name: model_two
    type: LLMBlock
    config:
      client: granite-endpoint
      model_family: granite
      model_id: granite-7b-lab
      config_path: model_two_config.yaml
      output_cols:
        - column_two
      gen_kwargs:
        max_tokens: 512

  - name: model_three
    type: LLMBlock
    config:
      client: granite-endpoint
      model_id: granite-7b-lab
      model_prompt: "<s> [INST] {prompt} [/INST]"
      config_path: model_three_config.yaml
      output_cols:
        - column_three
      gen_kwargs:
        max_tokens: 5

Then, in the pipeline.yaml, we map each LLMBlock to a client. Any block that does not specify a client gets the "default" client; otherwise, the block's client value names one of the clients passed into the PipelineContext. This allows us to create a reusable Pipeline that expects N clients, while giving users running that Pipeline a way to configure those clients for their specific environment without modifying the Pipeline yaml itself.
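
To illustrate the lookup (the helper and attribute names here are hypothetical, just to show the intent), per-block client resolution could be roughly:

Python code

def resolve_client(ctx, block_config):
    # Blocks that do not name a client fall back to the "default" client.
    client_name = block_config.get("client", "default")
    try:
        return ctx.clients[client_name]
    except KeyError:
        raise ValueError(
            f"pipeline expects a client named {client_name!r}, "
            f"but only {sorted(ctx.clients)} were provided"
        )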

@bbrowning added the enhancement and epic labels on Jan 28, 2025
@bbrowning
Contributor Author

@williamcaban Tagging you for visibility here, since this is related to a discussion we had elsewhere recently about preparing for cases where we need separate OpenAI Client endpoints in a single Pipeline. I outlined what I believe the general issue to be above, including some proposed (but not set in stone) backwards-compatible design changes to enable this.

@williamcaban

Is config_path: modifying client behavior (like setting temperature, etc.)? Is that part of existing constructs, or should it be covered in this enhancement?

@bbrowning
Contributor Author

The config_path key already exists in the Pipeline yamls (e.g. https://github.com/instructlab/sdg/blob/v0.7.0/src/instructlab/sdg/pipelines/full/knowledge.yaml#L12). That's where we specify the actual prompt to use, as well as any start/end tags to give to the model during generation.

The only new attribute in the pipeline config I propose above is the client key to choose which OpenAI client to use. I didn't explicitly state that in the text above, so thanks for asking the clarifying questions!
