
Resolve confusion about "batching" support #174

Closed as not planned

Description

@markmc

Capturing some observations from #157

  1. Pipeline has "batching" support - it can shard the dataset and spawn an instance of the pipeline for each shard, controlled by batch_num_workers and batch_size
  2. LLMBlock has "batching" support - it can request multiple chat completions from the OpenAI server in a single call using the n argument, controlled by num_instructions_to_generate (see the sketch below)
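
As a rough illustration of where the two knobs live (a sketch only; the keyword names come from the points above, everything else is illustrative, not the actual API):

```python
# (1) Pipeline-level "batching": shard the dataset and run one pipeline
#     instance per shard, bounded by a pool of workers.
pipeline_batching = {
    "batch_size": 8,         # rows per shard; ilab passes None for llama-cpp
    "batch_num_workers": 4,  # concurrent pipeline instances
}

# (2) LLMBlock-level "batching": one chat-completion request that asks the
#     OpenAI-compatible server for several completions at once via `n`.
llmblock_batching = {
    "n": 10,  # driven by num_instructions_to_generate
}
```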

In ilab we disable (1) with llama-cpp by passing batch_size=None - see instructlab/instructlab#346

In LLMBlock we disable (2) for llama-cpp with the server_supports_batched check, which probes whether the n argument actually works
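
Roughly, that check amounts to something like the following (a minimal sketch, not the code in LLMBlock; probe_n_support, the prompt, and the choice count are all placeholders):

```python
import openai


def probe_n_support(client: openai.OpenAI, model_id: str) -> bool:
    """Probe whether the backend honours the `n` completions parameter."""
    try:
        response = client.chat.completions.create(
            model=model_id,
            messages=[{"role": "user", "content": "test"}],
            n=2,
            max_tokens=1,
        )
    except openai.OpenAIError:
        return False
    # Some backends (e.g. llama-cpp) may accept the request but still return a
    # single choice, so count the choices rather than relying on success alone.
    return len(response.choices) == 2
```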

Resolve:

  • Do we want to call both of these "batching"?
  • Do we want two different ways of handling backend-specific capabilities?
  • Should the library be trying to probe the backend for its capabilities, or should the library user give it information about the backend?
  • server_supports_batched should be a property on PipelineContext, not something we set on the OpenAI client object
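
On that last point, one possible shape (PipelineContext is real, but the fields and the lazy probing shown here are assumptions, reusing the probe_n_support sketch above):

```python
from dataclasses import dataclass
from typing import Optional

import openai


@dataclass
class PipelineContext:
    client: openai.OpenAI
    model_id: str
    num_instructions_to_generate: int
    # Either supplied up front by the library user who knows the backend,
    # or left as None and probed lazily the first time it is needed.
    _server_supports_batched: Optional[bool] = None

    @property
    def server_supports_batched(self) -> bool:
        if self._server_supports_batched is None:
            self._server_supports_batched = probe_n_support(
                self.client, self.model_id
            )
        return self._server_supports_batched
```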

Metadata

Assignees: no one assigned
Labels: question (Further information is requested), stale
