Support multiple LLMBlocks in a pipeline that use different inference endpoints #521

Open · bbrowning opened this issue Jan 28, 2025 · 3 comments · May be fixed by #563
Labels: enhancement (New feature or request), epic (Larger tracking issue encompassing multiple smaller issues)

@bbrowning (Contributor):

We have upcoming features from our research team that will require Pipelines containing multiple LLMBlocks, where each LLMBlock may use a different model id deployed on the same inference server, a different model family (i.e. granite vs mistral and so on), or even an entirely different inference endpoint.

Today, users specify the inference endpoint by passing a single OpenAI client into the PipelineContext. That client is used for every LLMBlock, with no way to map a different client to each LLMBlock.

What we need is some way to pass in multiple OpenAI clients, and map each to the relevant LLMBlock. One example of what this could look like:

```python
from openai import OpenAI
from instructlab.sdg.pipeline import PipelineContext  # instructlab sdg library

clients = {
  "default": OpenAI(base_url="http://localhost:8000/v1", api_key="foo"),
  "granite-endpoint": OpenAI(base_url="http://localhost:8001/v1", api_key="bar"),
}
ctx = PipelineContext(clients=clients)
```

We pass a map of OpenAI clients to PipelineContext in this example instead of a single one. If a single client is passed (i.e. for backwards compatibility), we could internally turn it into a map with that single client as the "default" value. Users would be able to control the full range of client parameters here, including SSL cert handling, timeouts, retries, and anything else that can be configured on the OpenAI client or its underlying httpx.Client.
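
To make the backwards-compatibility idea and the client configurability concrete, here is a minimal sketch, not the actual implementation: the `normalize_clients` helper name and the CA-cert path are purely illustrative, and only the standard OpenAI/httpx client parameters are assumed.

```python
from typing import Dict, Optional

import httpx
from openai import OpenAI


def normalize_clients(
    client: Optional[OpenAI] = None,
    clients: Optional[Dict[str, OpenAI]] = None,
) -> Dict[str, OpenAI]:
    """Accept either the legacy single `client` or the new `clients` map."""
    if clients is not None:
        return clients
    if client is not None:
        # Backwards compatibility: a lone client becomes the "default" entry.
        return {"default": client}
    raise ValueError("either `client` or `clients` must be provided")


# Each entry is a fully user-configured OpenAI client, so timeouts, retries,
# and SSL cert handling stay entirely under the caller's control.
clients = {
    "default": OpenAI(base_url="http://localhost:8000/v1", api_key="foo"),
    "granite-endpoint": OpenAI(
        base_url="http://localhost:8001/v1",
        api_key="bar",
        timeout=120.0,
        max_retries=3,
        # hypothetical internal CA bundle, handed to the underlying httpx.Client
        http_client=httpx.Client(verify="/etc/ssl/certs/internal-ca.pem"),
    ),
}
```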

pipeline.yaml

```yaml
version: "1.0"
blocks:
  - name: model_one
    type: LLMBlock
    config:
      model_family: mixtral
      model_id: Mixtral-8x7B-Instruct-v0.1
      config_path: model_one_config.yaml
      output_cols:
        - column_one
      gen_kwargs:
        max_tokens: 2048

  - name: model_two
    type: LLMBlock
    config:
      client: granite-endpoint
      model_family: granite
      model_id: granite-7b-lab
      config_path: model_two_config.yaml
      output_cols:
        - column_two
      gen_kwargs:
        max_tokens: 512

  - name: model_three
    type: LLMBlock
    config:
      client: granite-endpoint
      model_id: granite-7b-lab
      model_prompt: <s> [INST] {prompt} [/INST]
      config_path: model_three_config.yaml
      output_cols:
        - column_three
      gen_kwargs:
        max_tokens: 5
```

Then, in your pipeline.yaml, we map each LLMBlock to a client. Any block that does not specify a client gets the "default" client; otherwise, a block can name one of the clients passed into the PipelineContext. This allows us to create a reusable Pipeline that expects N clients and gives users running that Pipeline a way to configure those clients for their specific environment without modifying the Pipeline yaml itself.
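
A rough sketch of the lookup each LLMBlock could perform against that map; the `resolve_client` helper name is illustrative and not part of any existing API.

```python
from typing import Any, Dict, Mapping

from openai import OpenAI


def resolve_client(clients: Mapping[str, OpenAI], block_config: Dict[str, Any]) -> OpenAI:
    """Return the client named by the block's `client` key, defaulting to "default"."""
    client_name = block_config.get("client", "default")
    try:
        return clients[client_name]
    except KeyError:
        raise ValueError(
            f"block requests client {client_name!r}, but no client with that name "
            "was passed into the PipelineContext"
        ) from None
```

Failing loudly when a named client is missing keeps a reusable Pipeline honest about the N clients it expects from its environment.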

@bbrowning (Contributor, Author):

@williamcaban Tagging you for visibility here, since this is related to a discussion we had elsewhere recently about preparing for cases where we need separate OpenAI Client endpoints in a single Pipeline. I outlined what I believe the general issue to be above, including some proposed (but not set in stone) backwards-compatible design changes to enable this.

@williamcaban:

Is config_path: modifying client behavior (like setting temperatures, etc.)? Is that part of existing constructs, or should it be covered in this enhancement?

@bbrowning (Contributor, Author):

The config_path key already exists in the Pipeline yamls (e.g. https://github.com/instructlab/sdg/blob/v0.7.0/src/instructlab/sdg/pipelines/full/knowledge.yaml#L12). That's where we specify the actual prompt to use, as well as any start/end tags to give to the model during generation.

The only new attribute in the pipeline config I propose above is the client key to choose which OpenAI client to use. I didn't explicitly state that in the text above, so thanks for asking the clarifying questions!

bbrowning added a commit to bbrowning/instructlab-sdg that referenced this issue Feb 19, 2025
This change allows a user to construct a PipelineContext with multiple
OpenAI clients, such as:

```python
PipelineContext(
    clients={
        "default": OpenAI(base_url="https://foo.local"),
        "server_a": OpenAI(base_url="https://server_a.local"),
        "server_b": OpenAI(base_url="https://server_b.local"),
    }
)
```

And then, within the pipeline yaml, choose which client to apply to
which LLMBlock via a new `client` key, such as:

```yaml
version: "1.0"
blocks:
  - name: server_a_client
    type: LLMBlock
    config:
      client: server_a
      ...

  - name: server_b_client
    type: LLMBlock
    config:
      client: server_b
      ...
```

See `docs/examples/multiple_llm_clients` for more details and a full
example.

Resolves instructlab#521

Signed-off-by: Ben Browning <[email protected]>
bbrowning linked a pull request (#563) on Feb 19, 2025 that will close this issue