Description
Capturing some observations from #157
1. `Pipeline` has "batching" support - it can shard the dataset and spawn an instance of the pipeline for each shard - `batch_num_workers` and `batch_size`
2. `LLMBlock` has "batching" support - it can request multiple chat completions from the OpenAI server using the `n` argument - `num_instructions_to_generate` (see the sketch below)
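
For reference, (2) boils down to the OpenAI API's `n` parameter. A minimal standalone sketch (not the `LLMBlock` implementation; the base URL, model name, and prompt are placeholders):

```python
# Minimal illustration of (2): ask the server for several chat
# completions in a single request via the OpenAI `n` parameter.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Generate an instruction."}],
    n=5,  # request 5 completions in one call
)

# A server that honors `n` returns one choice per requested completion.
completions = [choice.message.content for choice in response.choices]
```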
In `ilab` we disable (1) with llama-cpp by passing `batch_size=None` - see instructlab/instructlab#346
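
A rough sketch of the dispatch logic this implies - hypothetical helper, not the actual `Pipeline` code, assuming `pipeline.generate(rows)` takes a list of rows: when `batch_size` is `None`, run the pipeline in-process; otherwise shard and fan out to workers.

```python
# Hypothetical sketch of (1): batch_size=None means "don't shard";
# otherwise split the dataset and run one pipeline instance per shard.
from concurrent.futures import ProcessPoolExecutor


def run_pipeline(pipeline, dataset, batch_size=None, batch_num_workers=None):
    if batch_size is None:
        # llama-cpp path: a single in-process run, no sharding
        return pipeline.generate(dataset)

    shards = [dataset[i : i + batch_size] for i in range(0, len(dataset), batch_size)]
    with ProcessPoolExecutor(max_workers=batch_num_workers) as pool:
        results = list(pool.map(pipeline.generate, shards))
    return [row for shard in results for row in shard]
```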
In `LLMBlock` we disable (2) with llama-cpp via the `server_supports_batched` check, which probes whether the `n` argument works
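
The probe amounts to: send one throwaway request with `n > 1` and see whether the server actually returns that many choices. Sketch only, assuming an OpenAI-compatible endpoint; the test prompt and error handling are illustrative:

```python
# Sketch of a capability probe for the `n` argument: ask for two
# completions and check how many choices actually come back.
from openai import OpenAI


def probe_supports_batched(client: OpenAI, model: str) -> bool:
    try:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "test"}],
            n=2,
            max_tokens=1,
        )
    except Exception:
        return False
    return len(resp.choices) > 1
```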
Resolve:
- Do we want to call both of these "batching"?
- Do we want two different ways of handling backend-specific capabilities?
- Should the library be trying to probe the backend for its capabilities, or should the library user give it information about the backend?
- `server_supports_batched` should be a property on `PipelineContext`, not something we set on the OpenAI client object (see the sketch below)
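
One possible shape for that last point - a sketch only, reusing the probe sketched above; field names other than `client` and `server_supports_batched` are illustrative, not the current `PipelineContext` API:

```python
# Sketch: carry the capability on PipelineContext instead of patching
# the OpenAI client object. A lazy probe (or an explicit user-supplied
# value) fills it in once, and blocks read it from the context.
from dataclasses import dataclass, field

from openai import OpenAI


@dataclass
class PipelineContext:
    client: OpenAI
    model_id: str
    _server_supports_batched: bool | None = field(default=None, repr=False)

    @property
    def server_supports_batched(self) -> bool:
        if self._server_supports_batched is None:
            # Probe the backend the first time a block asks; a library
            # user could instead set the flag explicitly up front.
            self._server_supports_batched = probe_supports_batched(
                self.client, self.model_id
            )
        return self._server_supports_batched
```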