
JSON serialization error with Gemini 2.5 models in multi-turn evaluation #38

@AbhiPrasad

Description

Bug Report

Error Message:

TypeError: Object of type ChatCompletionMessage is not JSON serializable

Location: sdk/py/src/braintrust/functions/invoke.py:173
Affected Versions: Current Braintrust SDK (Python)
Impact: Blocks evaluation of multi-turn conversations when using Gemini 2.5 models with LLM judges

Symptoms

  1. Model-specific: Only occurs with Gemini 2.5 models (e.g., gemini-2.5-flash)
  2. Multi-turn specific: Single-turn conversations work fine
  3. Judge-specific: Only fails when LLM judge scorers are used (local scorers work)
  4. Consistent failure: Reproducible 100% of the time under the specific conditions above
  5. Other models unaffected: GPT-4o-mini and other models work correctly with the same setup

Reproduction

The issue occurs when:

  1. Loading a multi-turn conversation dataset
  2. Evaluating with gemini-2.5-flash (or other Gemini 2.5 models)
  3. Using LLM judge scorers loaded via init_function() (e.g., rag-factuality-llm, rag-relevance-llm, rag-completeness-llm)

Key code path:

# LLM judges loaded from Braintrust
from braintrust import init_function

rag_factuality_llm = init_function(project_name=project_name, slug="rag-factuality-llm")

# Evaluating a multi-turn conversation dataset with gemini-2.5-flash
# → triggers JSON serialization of ChatCompletionMessage at invoke.py:173

The ChatCompletionMessage object returned from the Gemini 2.5 model response is not being serialized to a plain dict/JSON-compatible format before being passed to the LLM judge scorer. This works with other models (e.g., GPT-4o-mini) that may return serializable types by default.
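The failure mode can be reproduced in isolation: the OpenAI SDK's ChatCompletionMessage is a Pydantic model, which json.dumps cannot serialize without help. Below is a minimal sketch using a hypothetical stand-in class (so it runs without the openai package) showing both the error and one possible workaround via a default= hook that calls model_dump(); this is an illustration of the mechanism, not the actual fix in invoke.py.

```python
import json

# Hypothetical stand-in for openai's ChatCompletionMessage, which is a
# Pydantic v2 model exposing model_dump(); used here for illustration only.
class FakeChatCompletionMessage:
    def __init__(self, role, content):
        self.role = role
        self.content = content

    def model_dump(self):
        # Mimics Pydantic's model_dump(): return a plain, JSON-safe dict.
        return {"role": self.role, "content": self.content}

def to_jsonable(obj):
    """json.dumps default= hook: coerce Pydantic-style objects to dicts."""
    if hasattr(obj, "model_dump"):
        return obj.model_dump()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

msg = FakeChatCompletionMessage("assistant", "hello")

# Plain json.dumps raises the same class of TypeError seen at invoke.py:173:
try:
    json.dumps({"messages": [msg]})
except TypeError as e:
    print(e)

# With the default= hook, the payload serializes cleanly before being
# handed to an LLM judge scorer:
print(json.dumps({"messages": [msg]}, default=to_jsonable))
```

A fix along these lines would make serialization independent of whether a given provider (Gemini 2.5 vs. GPT-4o-mini) happens to return plain dicts or Pydantic objects.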

Labels

  • python
  • vertex-ai

Linear issue: https://linear.app/braintrustdata/issue/BRA-2972/json-serialization-error-with-gemini-25-models-in-multi-turn
