Standardise model deployment handling across synthetic data generators #134

@jjuritzno10

Background

Follow-up from @tnetennba3's review comment on PR #113.

response_generator.py accepts deployment as a parameter (resolving to gpt-5-nano in practice), while the other three generators (context_generator.py, question_generator.py, theme_generator.py) hardcode "gpt-5-mini". This mix is inconsistent, and the rationale for which model is used where is not documented.

Proposal

  • Decide on one consistent way to specify the deployment across all four generators (either all hardcoded or all parameterised).
  • Add brief comments at each call site explaining why a given model was chosen (e.g. gpt-5-nano for cheaper bulk response generation, gpt-5-mini for higher-quality context/question/theme generation).
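
One possible shape for the parameterised option, as a minimal sketch only: a single shared mapping holds the default deployment per generator, with the rationale recorded next to each entry, and each generator resolves its deployment through one helper. The names DEFAULT_DEPLOYMENTS and resolve_deployment are hypothetical and do not exist in the repo; the rationale comments simply echo the examples given above and would need confirming.

```python
from typing import Optional

# Hypothetical shared defaults (illustrative names, not taken from the repo).
# Each entry documents why that model is used for that generator.
DEFAULT_DEPLOYMENTS = {
    # Higher-quality model for context/question/theme generation.
    "context": "gpt-5-mini",
    "question": "gpt-5-mini",
    "theme": "gpt-5-mini",
    # Cheaper model for bulk response generation.
    "response": "gpt-5-nano",
}


def resolve_deployment(generator: str, override: Optional[str] = None) -> str:
    """Return the caller's override if given, else the documented default."""
    if override:
        return override
    if generator not in DEFAULT_DEPLOYMENTS:
        raise ValueError(f"unknown generator: {generator!r}")
    return DEFAULT_DEPLOYMENTS[generator]
```

Under this sketch, every generator would accept an optional deployment argument and pass it through resolve_deployment, so the defaults and their rationale live in one place while callers can still override them.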

Scope

  • evals/synthetic/llm_generators/context_generator.py
  • evals/synthetic/llm_generators/question_generator.py
  • evals/synthetic/llm_generators/theme_generator.py
  • evals/synthetic/llm_generators/response_generator.py

Metadata

Labels

enhancement (New feature or request)
