Token-Oriented Object Notation (TOON) is an LLM-optimized data serialization format implemented in Python.
- LLM-optimized and human-readable: more compact and easier to read than JSON
- Python-native: automatic handling of datetime, dataclasses, and Pydantic models
- Smart array formatting: inline, tabular, or list formats chosen automatically
- Configurable: custom delimiters, indentation, and length markers
- Type-safe: full type hints and Pydantic validation
- Data-science compatible: works with JSON, Pandas, and Pandas-like data tasks
Get more cognitive output and efficiency from LLMs with fewer tokens in your prompts!
# Using uv (recommended)
uv add toon-llm
# Using pip
pip install toon-llm

from toon import encode, decode
# Encode Python data to TOON LLM format
data = {
"username": "Alice",
"age": 30,
"tags": ["python", "coding", "llm"],
"active": True,
"invoices": [
{"id": 1, "amount": 250.75, "paid": False},
{"id": 2, "amount": 125.00, "paid": True},
{"id": 3, "amount": 320.40, "paid": True},
{"id": 4, "amount": 75.20, "paid": False},
{"id": 5, "amount": 600.00, "paid": True}
]
}
encoded = encode(data)
# username: Alice
# age: 30
# tags[3]: python,coding,llm
# active: true
# invoices[5]{id,amount,paid}:
# 1,250.75,false
# 2,125,true
# 3,320.40,true
# 4,75.20,false
# 5,600,true
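The tabular layout above (a header of field names followed by one comma-separated row per object) can be sketched in plain Python. This is an illustration of the layout only, not the library's actual encoder, and number formatting here uses Python's `str()`, which may differ from TOON LLM's output (e.g. `125.0` vs `125`):

```python
# Illustrative sketch of TOON's tabular array layout (NOT the toon-llm encoder).
def tabular_sketch(key: str, rows: list[dict]) -> str:
    """Render a list of uniform dicts as a TOON-style tabular block."""
    fields = list(rows[0])

    def fmt(v):
        # TOON writes lowercase booleans, like JSON
        if isinstance(v, bool):
            return "true" if v else "false"
        return str(v)

    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = ["  " + ",".join(fmt(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)

invoices = [
    {"id": 1, "amount": 250.75, "paid": False},
    {"id": 2, "amount": 125.00, "paid": True},
]
print(tabular_sketch("invoices", invoices))
# invoices[2]{id,amount,paid}:
#   1,250.75,false
#   2,125.0,true
```

Because the field names appear once in the header instead of being repeated in every object, the per-row cost drops to just the values and delimiters.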
llm_prompt = f"""
Process the following structured data and return the invoices that have not been paid:
```
{encoded}
```
"""
# Call your LLM with llm_prompt...

TOON LLM includes a command-line interface for encoding and decoding data:
# Show help
uv run toon --help
# Encode JSON file to TOON format
uv run toon encode input.json -o output.toon
# Encode from stdin
echo '{"name": "Alice", "age": 30}' | uv run toon encode
# Decode TOON file to JSON
uv run toon decode input.toon -o output.json
# Decode with pretty printing
uv run toon decode input.toon --pretty
# Decode with validation
uv run toon decode input.toon --validate
# Custom formatting options
uv run toon encode input.json --indent 4 --delimiter "|"
# Show version
uv run toon --version

See `uv run toon encode --help` and `uv run toon decode --help` for all available options.
- Quick Start Guide - Examples and usage overview
- Format Specification - Token Oriented Object Notation (TOON) specification (language agnostic)
- API Reference - Complete API documentation of the Python implementation
- LLM Prompts - Guidance for LLMs to understand and generate TOON format
- Coding Standards - For contributors
TOON LLM is a Python library that provides a clean, compact, and highly readable alternative to JSON for serializing Python data structures, minimizing token usage with large language models (LLMs).
It is a Python-native implementation of the language-agnostic Token-Oriented Object Notation (TOON) specification.
Cognitive load in LLMs can be significantly reduced by using more concise and structured data formats. TOON LLM achieves this by minimizing syntax noise and enhancing readability, making it easier for both humans and machines to parse and understand the data.
Using the cl100k_base tokenizer from OpenAI, here is a comparison of how the same data is represented in JSON vs TOON LLM.
JSON:
{
"weather_observations": [
{ "high_temp": 75, "low_temp": 50, "average_temp": 62.5, "dew_point": 45, "wind_chill": 60 },
{ "high_temp": 78, "low_temp": 52, "average_temp": 65.0, "dew_point": 48, "wind_chill": 63 },
{ "high_temp": 72, "low_temp": 48, "average_temp": 60.0, "dew_point": 42, "wind_chill": 58 },
{ "high_temp": 80, "low_temp": 55, "average_temp": 67.5, "dew_point": 50, "wind_chill": 65 },
{ "high_temp": 76, "low_temp": 51, "average_temp": 63.5, "dew_point": 46, "wind_chill": 61 },
{ "high_temp": 74, "low_temp": 49, "average_temp": 61.5, "dew_point": 44, "wind_chill": 59 },
{ "high_temp": 79, "low_temp": 54, "average_temp": 66.5, "dew_point": 49, "wind_chill": 64 },
{ "high_temp": 73, "low_temp": 47, "average_temp": 60.0, "dew_point": 41, "wind_chill": 57 },
{ "high_temp": 77, "low_temp": 53, "average_temp": 65.0, "dew_point": 47, "wind_chill": 62 },
{ "high_temp": 81, "low_temp": 56, "average_temp": 68.5, "dew_point": 51, "wind_chill": 66 }
]
}

Token Count: 411
TOON LLM:
weather_observations[10]{high_temp,low_temp,average_temp,dew_point,wind_chill}:
75,50,62.5,45,60
78,52,65.0,48,63
72,48,60.0,42,58
80,55,67.5,50,65
76,51,63.5,46,61
74,49,61.5,44,59
79,54,66.5,49,64
73,47,60.0,41,57
77,53,65.0,47,62
81,56,68.5,51,66
Token Count: 162
That is over a 60% reduction in token count compared to JSON!
Multiply that over large datasets and complex structures, and the savings become substantial.
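The shape of this saving can be reproduced with the standard library. Character counts below are only a rough proxy; the figures above were measured with OpenAI's cl100k_base tokenizer (e.g. via the `tiktoken` package). The TOON text is hand-built here for illustration rather than produced by the library:

```python
import json

# Two rows of the weather data from the comparison above.
rows = [
    {"high_temp": 75, "low_temp": 50, "average_temp": 62.5, "dew_point": 45, "wind_chill": 60},
    {"high_temp": 78, "low_temp": 52, "average_temp": 65.0, "dew_point": 48, "wind_chill": 63},
]

json_text = json.dumps({"weather_observations": rows})

# Hand-built TOON-style rendering of the same data: one header, then bare rows.
fields = list(rows[0])
toon_text = "\n".join(
    [f"weather_observations[{len(rows)}]{{{','.join(fields)}}}:"]
    + ["  " + ",".join(str(r[f]) for f in fields) for r in rows]
)

print(len(json_text), len(toon_text))  # TOON-style text is substantially shorter
```

The gap widens with every additional row, since JSON repeats all five keys per object while the TOON form pays for them once.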
Benefits:
- Less syntax noise (no braces, fewer quotes)
- More compact (fewer lines and characters)
- Easier to read and scan
- Clear structure through indentation
- Smart array formatting (inline, tabular, or list)
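For reference, the inline and tabular array forms below are taken from the examples in this document; the list form shown for non-uniform items is an assumption about how the format handles arrays that do not fit the other two shapes:

```
tags[3]: python,coding,llm          # inline: short primitive arrays on one line

invoices[2]{id,amount,paid}:        # tabular: uniform objects share one header
  1,250.75,false
  2,125,true

events[2]:                          # list: one "- " entry per non-uniform item (assumed)
  - type: login
    user: Alice
  - type: error
```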
TOON LLM provides flexible configuration options to customize the encoding format.
Read about them in the Specification and the API Documentation.
# Run tests
uv run pytest tests/ -v
# Run with coverage
uv run coverage run -m pytest && uv run coverage report
# Current status
# 310 tests passing
# 80.52% coverage

Contributions are welcome! Please read our Coding Standards before contributing.
# Clone repository
git clone https://github.com/davidpirogov/toon-llm.git
cd toon-llm
# Install dependencies
uv sync
# Run tests
uv run pytest
# Run linting
uv run ruff check src/toon/
# Format code
uv run ruff format src/toon/

- Follow PEP 8 and our Coding Standards
- Add tests for new features
- Update documentation
- Ensure all tests pass
- Maintain or improve coverage
- Python 3.11 or higher
- Pydantic 2.x
This project is licensed under the MIT License - see the LICENSE file for details.
Inspired by Token-Oriented Object Notation by Johann Schopplich.
If you are looking for a TypeScript/JavaScript implementation, check out the toon repository.
- GitHub: https://github.com/davidpirogov/toon-llm
- Documentation: ./docs/
- Issues: GitHub Issues
- PyPI: https://pypi.org/project/toon-llm/