
Conversation


@hsaeed3 hsaeed3 commented Dec 29, 2025

Description

Add Mode.TOON for TOON (Token-Oriented Object Notation) - a compact data format that achieves 30-60% token reduction compared to JSON while maintaining structured output capabilities.

TOON uses a YAML-like syntax that eliminates JSON's redundant braces, brackets, and quotes:

from pydantic import BaseModel
from openai import OpenAI
from instructor import from_openai, Mode

# User model defined here so the example is self-contained
class User(BaseModel):
    name: str
    age: int

client = from_openai(OpenAI(), mode=Mode.TOON)

user = client.create(
    messages=[{"role": "user", "content": "John Doe is 25 years old"}],
    model="gpt-4o-mini",
    response_model=User,
)

Changes

  • Add Mode.TOON enum value classified as JSON-like mode
  • Implement handle_toon / reask_toon request handlers in OpenAI utils
  • Add parse_toon method to OpenAISchema for response parsing
  • Support partial streaming via line-based TOON parsing
  • Add extract_code_block_from_stream utilities for streaming
  • Recursive structure generation for nested Pydantic models
  • Enable for OpenAI-compatible providers (OpenAI, OpenRouter, Anyscale, Together, Databricks)
  • Early import check fails before API call if toon-format not installed

Testing

  • Unit tests for Mode enum, structure generation, handlers
  • Integration tests for TOON parsing and code block extraction
  • Manual testing with simple, nested, list[str], and list[Model] schemas
  • Streaming tested with create_partial

Install with: pip install 'instructor[toon]'

Uses toon-format library: https://github.com/toon-format/toon-python

This PR was written by Cursor


Important

Add Mode.TOON for compact structured outputs with new handlers, parsers, and tests.

  • Behavior:
    • Add Mode.TOON for compact token-efficient structured outputs.
    • Implement handle_toon and reask_toon in utils.py for TOON request handling.
    • Add parse_toon to function_calls.py for TOON response parsing.
    • Support partial streaming via line-based TOON parsing in partial.py.
    • Add extract_code_block_from_stream utilities in core.py.
    • Enable TOON mode for OpenAI-compatible providers.
    • Early import check for toon-format package.
  • Testing:
    • Unit tests for TOON mode in test_toon_mode.py.
    • Integration tests for TOON parsing and code block extraction.
    • Manual testing with various schemas.
  • Misc:
    • Add toon-format to pyproject.toml dependencies.

This description was created by Ellipsis for 40a6603.

…ured outputs

TOON (Token-Oriented Object Notation) is a YAML-like format that achieves
30-60% token reduction on LLM outputs compared to JSON.

Uses toon-format library: https://github.com/toon-format/toon-python

Features:
- Add Mode.TOON enum value classified as JSON-like mode (wasn't sure exactly where it would fit, but made the most sense within the JSON classification)
- Implement handle_toon/reask_toon request handlers
- Add parse_toon to OpenAISchema for response parsing
- Support partial streaming via line-based parsing
- Add extract_code_block_from_stream utilities
- Recursive structure generation for nested Pydantic models
- Proper TOON array format with [N] count markers
- String quoting hints for numeric-looking string fields

Supported types:
- Simple fields (str, int, float, bool)
- Nested Pydantic models
- Lists of primitives (list[str], list[int])
- Lists of objects (list[Model])
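The recursive structure generation over these types can be sketched as below. This is a simplified, dependency-free illustration using dataclasses instead of Pydantic models; the function name and placeholder markers are mine, not instructor's:

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import get_args, get_origin

def toon_structure(cls, indent: int = 0) -> str:
    """Recursively render a TOON-style response skeleton for a dataclass."""
    pad = "  " * indent
    lines = []
    for f in fields(cls):
        if is_dataclass(f.type):
            # nested model: recurse one indent level deeper
            lines.append(f"{pad}{f.name}:")
            lines.append(toon_structure(f.type, indent + 1))
        elif get_origin(f.type) is list:
            (item,) = get_args(f.type)
            if is_dataclass(item):
                # tabular array: [N] count marker plus a shared {field} header
                cols = ",".join(g.name for g in fields(item))
                lines.append(f"{pad}{f.name}[N]{{{cols}}}:")
                lines.append(f"{pad}  <one row per item>")
            else:
                lines.append(f"{pad}{f.name}[N]: <comma-separated values>")
        else:
            lines.append(f"{pad}{f.name}: <{f.type.__name__}>")
    return "\n".join(lines)

@dataclass
class Address:
    city: str
    zip_code: str

@dataclass
class User:
    name: str
    age: int
    tags: list[str]
    addresses: list[Address]

print(toon_structure(User))
```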

Enabled for OpenAI-compatible providers:
- OpenAI, OpenRouter, Anyscale, Together, Databricks

NOTES:
- I've tested various response cases using the models listed below, but further testing may still be required to ensure complex nested schemas are handled properly in both standard and partial (streaming) cases:
    - openai/gpt-4o-mini
    - openai/gpt-5.2
    - openrouter/moonshotai/kimi-k2-0905
    - openrouter/google/gemma-2-27b-it
- The dynamic system prompt used to provide the TOON response schema may benefit from optimization through DSPy or a similar framework.
Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed everything up to 40a6603 in 2 minutes and 57 seconds.
  • Reviewed 1010 lines of code in 11 files
  • Skipped 0 files when reviewing.
  • Skipped posting 7 draft comments. View those below.
1. instructor/core/client.py:809
  • Draft comment:
Consider centralizing the provider mode lists to avoid duplicating Mode.TOON across multiple provider checks.
  • Reason this comment was not posted:
    Confidence changes required: 70% <= threshold 85% None
2. instructor/dsl/partial.py:290
  • Draft comment:
    Avoid silently passing exceptions during TOON decoding; log or handle decoding errors to aid debugging in both sync and async chunk parsers.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 20% vs. threshold = 85%). The comment flags the bare except Exception: pass blocks, but the silent handling appears intentional: the code decodes partial TOON content as it streams in, and some chunks are expected to fail until more data arrives. The parser tracks last_successful_data and falls back to it if the final decode fails, which suggests the error-handling strategy was thought through, and adding logging could create noise since exceptions are expected during streaming. This is a speculative code-quality suggestion without evidence of an actual problem, so it should be deleted.
3. instructor/providers/openai/utils.py:510
  • Draft comment:
    For type checking in _generate_toon_structure, consider using typing.get_origin and typing.get_args instead of comparing str(origin) with 'typing.Union'.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 75% vs. threshold = 85%). The suggestion is technically correct: typing.get_origin and typing.get_args are the proper way to introspect type hints in Python 3.8+, and comparing str(origin) == "typing.Union" is fragile across Python versions and typing implementations. The _generate_toon_structure function is entirely new in this diff, and the recommendation is a concrete, actionable code-quality refactor with a clear alternative, which aligns with the rule that actionable, clear refactor suggestions are good.
4. instructor/providers/openai/utils.py:560
  • Draft comment:
    Consider adding a closing triple-backtick in the reask_toon error message for consistent TOON code block formatting.
  • Reason this comment was not posted:
    Confidence changes required: 80% <= threshold 85% None
5. instructor/utils/core.py:322
  • Draft comment:
    Clarify the skip_lang_tag logic in extract_code_block_from_stream by adding an inline comment explaining that a newline ends the language tag.
  • Reason this comment was not posted:
    Confidence changes required: 80% <= threshold 85% None
6. instructor/processing/function_calls.py:440
  • Draft comment:
    If _extract_toon_from_response is intended for broader use, consider making it public and enhance its documentation.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 85% None
7. instructor/providers/openai/utils.py:579
  • Draft comment:
    Typographical error: In the error message for TOON mode reask, the opening code block marker toon does not have a matching closing marker. Consider updating this to include the proper closing triple backticks (e.g., toon```) to clearly indicate a code block.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 10% vs. threshold = 85%). Line 579 contains instructional text sent to the LLM ("Return your corrected response in a toon code block."), not markdown being rendered, so there is no missing closing marker: the backticks are part of the instruction telling the model what format to use in its response. The comment misidentifies intentional instructional text as a typographical error and should be deleted.

Workflow ID: wflow_C6xIqOk9sFwr0sIt


@jxnl
Collaborator

jxnl commented Dec 30, 2025

can you update docs and specifically figure out which models this supports?

…documentation

- Add enum coercion for TOON parsing (enums returned as strings from toon-format)
- Support Literal, Union, Optional, and Annotated type annotations
- Add TOON mode documentation to `docs/modes-comparison.md`
- Add unit tests for all supported type annotations

Token savings after testing with various schemas & models: ~17% vs JSON, ~20% vs MD_JSON & ~28% vs TOOLS mode
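The enum coercion added in this commit can be sketched like this. It is a minimal, dependency-free illustration of the idea; the actual logic lives in instructor's parse path and also covers Literal, Union, and Optional annotations:

```python
from enum import Enum

class Role(str, Enum):
    ADMIN = "admin"
    USER = "user"

def coerce_enum(value, annotation):
    """toon-format decodes enum fields as plain strings; map them back
    onto the Enum member (by value, falling back to member name) before
    validation. Simplified sketch of the coercion described above."""
    if isinstance(annotation, type) and issubclass(annotation, Enum):
        if isinstance(value, annotation):
            return value
        try:
            return annotation(value)   # match by value: "admin" -> Role.ADMIN
        except ValueError:
            return annotation[value]   # match by name: "ADMIN" -> Role.ADMIN
    return value

assert coerce_enum("admin", Role) is Role.ADMIN
assert coerce_enum("ADMIN", Role) is Role.ADMIN
assert coerce_enum(42, int) == 42  # non-enum annotations pass through
```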
@hsaeed3
Author

hsaeed3 commented Dec 30, 2025

@jxnl I was doing a bit more experimentation/testing with this last night. There are definitely meaningful savings in output token length compared to TOOLS/JSON/MD_JSON, along with high model support, but there's a bit of internal complexity I had to add to support/parse Enum/Literal/Union types in responses.

Hopefully once this PR is merged, it should significantly reduce the complexity I had to add directly to instructor.

I've updated the docs with relevant information, however. Up to you whether you'd want to go through with supporting this mode; otherwise, happy to close the PR as an experiment for now.

@jxnl
Collaborator

jxnl commented Dec 31, 2025

should be fine with this as a mode, since all other models can support this too with the same mode, right?

whether it's gemini, claude, or gpt

@jxnl
Collaborator

jxnl commented Dec 31, 2025

can you add some runnable scripts in examples/toon/run.py

Add ANTHROPIC_TOON mode for using TOON format with Claude models:
- Add handle_anthropic_toon and reask_anthropic_toon handlers
- Add parse_anthropic_toon for response parsing
- Register Mode.ANTHROPIC_TOON in mode handlers and reask handlers

Improve TOON structure generation:
- Fix tabular array format for models containing list fields
- Use list format instead of tabular when objects have nested lists

Add example scripts in examples/toon/:
- run.py: Demonstrates OpenAI and Anthropic TOON modes
- Includes token usage comparison between TOON, JSON, MD_JSON, and TOOLS

Supported providers for TOON:
- OpenAI (Mode.TOON)
- OpenRouter, Together, Anyscale, Groq (Mode.TOON)
- Anthropic (Mode.ANTHROPIC_TOON)
@hsaeed3
Author

hsaeed3 commented Jan 2, 2026

added implementation for Claude models, but it still needs further handling to directly support gemini/vertex/etc. models. examples/toon/run.py tests responses from openai and anthropic, as well as comparing token output between JSON, MD_JSON, and TOOLS versus TOON.

@hsaeed3 hsaeed3 closed this Jan 6, 2026
@hsaeed3
Author

hsaeed3 commented Jan 6, 2026

closed in favor of an updated PR
