Contributor

@xaviviro xaviviro commented Nov 3, 2025

Initial Release: Python TOON Format Implementation v1.0.0

Description

This PR establishes the official Python implementation of the TOON (Token-Oriented Object Notation) format. TOON is a compact, human-readable serialization format designed for passing structured data to Large Language Models with 30-60% token reduction compared to JSON.

This release migrates the complete implementation from the pytoon repository, adds comprehensive CI/CD infrastructure, and establishes the package as python-toon on PyPI.

Type of Change

  • New feature (non-breaking change that adds functionality)
  • Documentation update
  • Bug fix (non-breaking change that fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Related Issues

Initial release - no related issues.

Changes Made

Core Implementation (11 modules, ~1,922 lines)

  • Complete encoder implementation with support for objects, arrays, tabular format, and primitives
  • Full decoder with strict/lenient parsing modes
  • CLI tool for JSON ↔ TOON conversion
  • Type definitions and constants following TOON specification
  • Value normalization for Python-specific types (Decimal, datetime, etc.)
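
To illustrate the value-normalization step listed above, here is a minimal sketch of how Python-specific types might be mapped onto JSON-like primitives before encoding. The function name `normalize_value` and the specific choices (Decimal to float, ISO 8601 strings for dates, `None` for non-finite floats and unsupported types) are assumptions made for illustration; they are not taken from the package source.

```python
from datetime import date, datetime
from decimal import Decimal


def normalize_value(value):
    """Hypothetical helper: reduce a Python value to JSON-like primitives."""
    if value is None or isinstance(value, (bool, int, str)):
        return value
    if isinstance(value, float):
        # NaN and infinities have no JSON equivalent; this sketch maps them to None.
        is_finite = value == value and abs(value) != float("inf")
        return value if is_finite else None
    if isinstance(value, Decimal):
        # Assumption: Decimal is converted to float (this may lose precision).
        return normalize_value(float(value))
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, dict):
        return {str(key): normalize_value(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize_value(item) for item in value]
    # Assumption: anything else is dropped to None rather than raising.
    return None


record = {"price": Decimal("9.5"), "created": datetime(2025, 11, 3, 9, 18)}
print(normalize_value(record))
```

Whether unsupported types should collapse to `None` or raise an error is a design choice; the sketch picks the lenient option only to keep the example short.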

Package Configuration

  • Package name: python-toon (PyPI)
  • Module name: toon_format (Python import)
  • Version: 1.0.0
  • Python support: 3.8-3.14 (including 3.14t free-threaded)
  • Build system: uv_build for fast, reliable builds
  • Dependencies: Zero runtime dependencies

CI/CD Infrastructure

  • GitHub Actions workflow for testing across Python 3.8-3.12
  • Automated PyPI publishing via OIDC trusted publishing
  • TestPyPI workflow for pre-release validation
  • Ruff linting and formatting enforcement
  • Type checking with mypy
  • Coverage reporting with pytest-cov

Testing

  • 73 comprehensive tests covering:
    • Encoding: primitives, objects, arrays (tabular and mixed), delimiters, indentation
    • Decoding: basic structures, strict mode, delimiters, length markers, edge cases
    • Roundtrip: encode → decode → encode consistency
  • 100% test pass rate

Documentation

  • Comprehensive README.md with:
    • Installation instructions (pip and uv)
    • Quick start guide
    • Complete API reference
    • CLI usage examples
    • LLM integration best practices
    • Token efficiency comparisons
  • CONTRIBUTING.md with development workflow
  • PR template for future contributions
  • Issue templates for bug reports
  • examples.py with 7 runnable demonstrations

SPEC Compliance

Implementation Details:

  • ✅ YAML-style indentation for nested objects
  • ✅ CSV-style tabular format for uniform arrays
  • ✅ Inline format for primitive arrays
  • ✅ List format for mixed arrays
  • ✅ Length markers [N] for all arrays
  • ✅ Optional # prefix for length markers
  • ✅ Delimiter options: comma (default), tab, pipe
  • ✅ Quoting rules for strings (minimal, spec-compliant)
  • ✅ Escape sequences: \", \\, \n, \r, \t
  • ✅ Primitives: null, true, false, numbers, strings
  • ✅ Strict and lenient parsing modes
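
The quoting and escape items above can be sketched roughly as follows. This is a simplified approximation written for illustration, not the package's actual rules: `needs_quotes` and `encode_string` are hypothetical names, and the exact set of characters that force quoting is an assumption.

```python
import re

# Strings that look like numbers must be quoted so they decode back as strings.
_NUMERIC = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")

# The five escape sequences listed above.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}

# Assumed set of structural characters that force quoting.
_STRUCTURAL = (":", '"', "\\", "\n", "\r", "\t", "[", "]", "{", "}")


def needs_quotes(text, delimiter=","):
    """Hypothetical check: would this string be ambiguous if left unquoted?"""
    if text == "" or text != text.strip():
        return True  # empty, or leading/trailing whitespace
    if text in ("true", "false", "null") or _NUMERIC.fullmatch(text):
        return True  # would otherwise decode as a non-string primitive
    if text.startswith("-"):
        return True  # could be confused with a list item marker
    return delimiter in text or any(ch in text for ch in _STRUCTURAL)


def encode_string(text, delimiter=","):
    """Quote and escape only when necessary (minimal quoting)."""
    if not needs_quotes(text, delimiter):
        return text
    escaped = "".join(_ESCAPES.get(ch, ch) for ch in text)
    return f'"{escaped}"'


print(encode_string("Alice"))
print(encode_string("a,b"))
print(encode_string("a,b", delimiter="|"))
```

Note that the active delimiter matters: in this sketch a comma only forces quoting when the comma is the delimiter in use.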

Testing

  • All existing tests pass
  • Added new tests for changes
  • Tested on Python 3.8
  • Tested on Python 3.9
  • Tested on Python 3.10
  • Tested on Python 3.11
  • Tested on Python 3.12

Test Output

============================= test session starts ==============================
platform darwin -- Python 3.11.14, pytest-8.4.2, pluggy-1.6.0
collected 73 items

tests/test_decoder.py .................................            [ 45%]
tests/test_encoder.py ........................................      [100%]

============================== 73 passed in 0.03s ==============================

Test Coverage:

  • Encoder: 40 tests covering all encoding scenarios
  • Decoder: 33 tests covering parsing and validation
  • All edge cases, delimiters, and format options tested
  • 100% pass rate

Code Quality

  • Ran ruff check src/toon_format tests - no issues
  • Ran ruff format src/toon_format tests - code formatted
  • Ran mypy src/toon_format - no critical errors
  • All tests pass: pytest tests/ -v

Linter Output:

$ ruff check src/toon_format tests
All checks passed!

Checklist

  • My code follows the project's coding standards (PEP 8, line length 100)
  • I have added type hints to new code
  • I have added tests that prove my fix/feature works
  • New and existing tests pass locally
  • I have updated documentation (README.md)
  • My changes do not introduce new dependencies
  • I have maintained Python 3.8+ compatibility
  • I have reviewed the TOON specification for relevant sections

Performance Impact

  • No performance impact
  • Performance improvement (describe below)
  • Potential performance regression (describe and justify below)

Performance Characteristics:

  • Encoder: Fast string building with minimal allocations
  • Decoder: Single-pass parsing with minimal backtracking
  • Zero runtime dependencies for optimal load times
  • Suitable for high-frequency encoding/decoding scenarios
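
As an illustration of single-pass parsing with a strict-mode check, here is a small sketch that decodes one top-level tabular block by visiting each line once. The name `decode_tabular`, the header regex, and the error messages are assumptions for this example; quoted strings and nested values are deliberately not handled.

```python
import re

# Header shape assumed from the tabular example: [N<delim>]{field,...}:
_HEADER = re.compile(r"^\[#?(\d+)([,\t|]?)\]\{(.*)\}:$")
_NUMBER = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")


def parse_cell(token):
    """Convert an unquoted cell to null/bool/number, else keep it as a string."""
    if token == "null":
        return None
    if token in ("true", "false"):
        return token == "true"
    if _NUMBER.fullmatch(token):
        return float(token) if any(ch in token for ch in ".eE") else int(token)
    return token


def decode_tabular(text, strict=True):
    """Hypothetical sketch: decode a tabular block in one pass over its lines."""
    lines = text.splitlines()
    match = _HEADER.match(lines[0]) if lines else None
    if match is None:
        raise ValueError("missing or malformed tabular header")
    declared = int(match.group(1))
    delimiter = match.group(2) or ","
    fields = match.group(3).split(delimiter)

    rows = []
    for line in lines[1:]:
        if not line.strip():
            continue  # skip blank lines
        cells = line.lstrip(" ").split(delimiter)
        if strict and len(cells) != len(fields):
            raise ValueError(f"expected {len(fields)} cells, got {len(cells)}")
        rows.append(dict(zip(fields, (parse_cell(cell) for cell in cells))))

    # Strict mode: the declared length marker must match the rows actually seen.
    if strict and len(rows) != declared:
        raise ValueError(f"declared {declared} rows, found {len(rows)}")
    return rows


print(decode_tabular("[2,]{id,name}:\n  1,Alice\n  2,Bob"))
```

With `strict=False` the same input is accepted even when the length marker disagrees with the row count, which is one plausible reading of the lenient mode described in this PR.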

Breaking Changes

  • No breaking changes
  • Breaking changes (describe migration path below)

This is the initial release, so no breaking changes apply.

Screenshots / Examples

Basic Usage

from toon_format import encode

# Simple object
data = {"name": "Alice", "age": 30}
print(encode(data))

Output:

name: Alice
age: 30

Tabular Array Example

from toon_format import encode

# Uniform list of objects: encoded in tabular form
users = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Charlie", "age": 35},
]
print(encode(users))

Output:

[3,]{id,name,age}:
  1,Alice,30
  2,Bob,25
  3,Charlie,35
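
For readers who want to see how such a block can be produced, the sketch below builds the same output from scratch using only the standard library. `encode_tabular` and `format_cell` are hypothetical names and not the package's internals; the header layout simply mirrors the output shown above, and string quoting is omitted for brevity.

```python
def format_cell(value):
    """Render a primitive as a table cell (no quoting in this sketch)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: True is also an int
        return "true" if value else "false"
    return str(value)


def encode_tabular(rows, delimiter=",", indent=2):
    """Hypothetical sketch: encode a list of uniform dicts as a tabular block."""
    if not rows:
        raise ValueError("tabular format needs at least one row")
    fields = list(rows[0])
    if any(list(row) != fields for row in rows):
        raise ValueError("rows are not uniform; tabular format does not apply")

    # Header: length marker plus delimiter, then the shared field names.
    header = f"[{len(rows)}{delimiter}]{{{delimiter.join(fields)}}}:"
    pad = " " * indent
    body = [
        pad + delimiter.join(format_cell(row[name]) for name in fields)
        for row in rows
    ]
    return "\n".join([header, *body])


users = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Charlie", "age": 35},
]
print(encode_tabular(users))
```

The field names are written once in the header instead of once per row, which is where most of the size reduction for uniform arrays comes from.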

Token Efficiency

import json
from toon_format import encode

data = {
    "users": [
        {"id": 1, "name": "Alice", "age": 30, "active": True},
        {"id": 2, "name": "Bob", "age": 25, "active": True},
        {"id": 3, "name": "Charlie", "age": 35, "active": False},
    ]
}

json_str = json.dumps(data)
toon_str = encode(data)

print(f"JSON: {len(json_str)} characters")
print(f"TOON: {len(toon_str)} characters")
print(f"Reduction: {100 * (1 - len(toon_str) / len(json_str)):.1f}%")

Output:

JSON: 177 characters
TOON: 85 characters
Reduction: 52.0%
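
The figures above can be checked without installing the package. In the sketch below the TOON text is written by hand, on the assumption that the keyed `users` array is emitted in the same tabular form as the previous example; that string is an assumption, not captured encoder output. Note also that this measures characters, which is only a rough proxy for tokens: actual token counts depend on the tokenizer.

```python
import json

data = {
    "users": [
        {"id": 1, "name": "Alice", "age": 30, "active": True},
        {"id": 2, "name": "Bob", "age": 25, "active": True},
        {"id": 3, "name": "Charlie", "age": 35, "active": False},
    ]
}

# Assumed TOON rendering of `data`, written out by hand for the comparison.
toon_text = "\n".join(
    [
        "users[3,]{id,name,age,active}:",
        "  1,Alice,30,true",
        "  2,Bob,25,true",
        "  3,Charlie,35,false",
    ]
)

json_text = json.dumps(data)  # default separators include spaces
compact_json = json.dumps(data, separators=(",", ":"))  # whitespace-free baseline


def reduction(baseline, candidate):
    """Percentage of characters saved relative to the baseline."""
    return 100 * (1 - len(candidate) / len(baseline))


print(f"JSON: {len(json_text)} characters")
print(f"Compact JSON: {len(compact_json)} characters")
print(f"TOON: {len(toon_text)} characters")
print(f"Reduction vs JSON: {reduction(json_text, toon_text):.1f}%")
print(f"Reduction vs compact JSON: {reduction(compact_json, toon_text):.1f}%")
```

The baseline matters: against `json.dumps` with default separators the saving is 52.0%, while against whitespace-free JSON (153 characters) it drops to about 44.4%.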

Additional Context

Package Details

Installation

# With pip
pip install python-toon

# With uv (recommended)
uv pip install python-toon

Development Setup

# Clone repository
git clone https://github.com/toon-format/toon-python.git
cd toon-python

# Install with uv
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run linters
ruff check src/toon_format tests
mypy src/toon_format

Key Features

  1. Token Efficiency: 30-60% reduction compared to JSON
  2. Human Readable: YAML-like syntax for objects, CSV-like for arrays
  3. Spec Compliant: 100% compatible with the official TOON specification
  4. Type Safe: Full type hints throughout codebase
  5. Well Tested: 73 tests with 100% pass rate
  6. Zero Dependencies: No runtime dependencies
  7. Python 3.8+: Supports Python 3.8 through 3.14t (free-threaded)
  8. Fast: Single-pass parsing, minimal allocations
  9. Flexible: Multiple delimiters, indentation options, strict/lenient modes
  10. CLI Included: Command-line tool for JSON ↔ TOON conversion

Future Roadmap

  • Additional encoding options (custom formatters)
  • Performance optimizations for large datasets
  • Streaming encoder/decoder for very large files
  • Additional language implementations
  • Enhanced CLI features (pretty-printing, validation)

Checklist for Reviewers

  • Code changes are clear and well-documented
  • Tests adequately cover the changes
  • Documentation is updated
  • No security concerns
  • Follows TOON specification
  • Backward compatible (or breaking changes are justified and documented)

Review Focus Areas

  1. Spec Compliance: Verify encoding/decoding matches TOON spec exactly
  2. Edge Cases: Check handling of empty strings, special characters, nested structures
  3. Type Safety: Ensure type hints are accurate and complete
  4. Error Messages: Verify error messages are clear and helpful
  5. Documentation: Confirm examples work as shown
  6. CI/CD: Verify workflows are properly configured for PyPI deployment

@xaviviro xaviviro requested review from a team and johannschopplich as code owners November 3, 2025 09:18
@xaviviro xaviviro closed this Nov 3, 2025