Description
Capturing some observations from #157
1. `Pipeline` has "batching" support - it can shard the dataset and spawn an instance of the pipeline for each shard - `batch_num_workers` and `batch_size`
2. `LLMBlock` has "batching" support - it can request multiple chat completions from the OpenAI server using the `n` argument - `num_instructions_to_generate` (see the sketch below)
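
For reference, (2) boils down to the OpenAI API's `n` parameter. A minimal standalone sketch (not the `LLMBlock` implementation; the base URL, model name, and prompt are placeholders):

```python
# Minimal illustration of (2): ask the server for several chat
# completions in a single request via the OpenAI `n` parameter.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Generate an instruction."}],
    n=5,  # request 5 completions in one call
)

# A server that honors `n` returns one choice per requested completion.
completions = [choice.message.content for choice in response.choices]
```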
In `ilab` we disable (1) with llama-cpp by passing `batch_size=None` - see instructlab/instructlab#346
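
A rough sketch of the dispatch logic this implies - hypothetical helper, not the actual `Pipeline` code, assuming `pipeline.generate(rows)` takes a list of rows: when `batch_size` is `None`, run the pipeline in-process; otherwise shard and fan out to workers.

```python
# Hypothetical sketch of (1): batch_size=None means "don't shard";
# otherwise split the dataset and run one pipeline instance per shard.
from concurrent.futures import ProcessPoolExecutor


def run_pipeline(pipeline, dataset, batch_size=None, batch_num_workers=None):
    if batch_size is None:
        # llama-cpp path: a single in-process run, no sharding
        return pipeline.generate(dataset)

    shards = [dataset[i : i + batch_size] for i in range(0, len(dataset), batch_size)]
    with ProcessPoolExecutor(max_workers=batch_num_workers) as pool:
        results = list(pool.map(pipeline.generate, shards))
    return [row for shard in results for row in shard]
```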
In `LLMBlock` we disable (2) with llama-cpp via the `server_supports_batched` check, which probes whether the `n` argument works
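
The probe amounts to: send one throwaway request with `n > 1` and see whether the server actually returns that many choices. Sketch only, assuming an OpenAI-compatible endpoint; the test prompt and error handling are illustrative:

```python
# Sketch of a capability probe for the `n` argument: ask for two
# completions and check how many choices actually come back.
from openai import OpenAI


def probe_supports_batched(client: OpenAI, model: str) -> bool:
    try:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "test"}],
            n=2,
            max_tokens=1,
        )
    except Exception:
        return False
    return len(resp.choices) > 1
```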
Resolve:
- Do we want to call both of these "batching"?
- Do we want two different ways of handling backend-specific capabilities?
- Should the library be trying to probe the backend for its capabilities, or should the library user give it information about the backend?
- `server_supports_batched` should be a property on `PipelineContext`, not something we set on the OpenAI client object (see the sketch below)
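
One possible shape for that last point - a sketch only, reusing the probe sketched above; field names other than `client` and `server_supports_batched` are illustrative, not the current `PipelineContext` API:

```python
# Sketch: carry the capability on PipelineContext instead of patching
# the OpenAI client object. A lazy probe (or an explicit user-supplied
# value) fills it in once, and blocks read it from the context.
from dataclasses import dataclass, field

from openai import OpenAI


@dataclass
class PipelineContext:
    client: OpenAI
    model_id: str
    _server_supports_batched: bool | None = field(default=None, repr=False)

    @property
    def server_supports_batched(self) -> bool:
        if self._server_supports_batched is None:
            # Probe the backend the first time a block asks; a library
            # user could instead set the flag explicitly up front.
            self._server_supports_batched = probe_supports_batched(
                self.client, self.model_id
            )
        return self._server_supports_batched
```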