First version #3

xaviviro · 2025-11-03T09:18:45Z

Initial Release: Python TOON Format Implementation v1.0.0

Description

This PR establishes the official Python implementation of the TOON (Token-Oriented Object Notation) format. TOON is a compact, human-readable serialization format designed for passing structured data to Large Language Models with 30-60% token reduction compared to JSON.

This release migrates the complete implementation from the pytoon repository, adds comprehensive CI/CD infrastructure, and establishes the package as python-toon on PyPI.

Type of Change

New feature (non-breaking change that adds functionality)
Documentation update
Bug fix (non-breaking change that fixes an issue)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Refactoring (no functional changes)
Performance improvement
Test coverage improvement

Related Issues

Initial release - no related issues.

Changes Made

Core Implementation (11 modules, ~1,922 lines)

Complete encoder implementation with support for objects, arrays, tabular format, and primitives
Full decoder with strict/lenient parsing modes
CLI tool for JSON ↔ TOON conversion
Type definitions and constants following TOON specification
Value normalization for Python-specific types (Decimal, datetime, etc.)
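As an illustration of the value-normalization step listed above, here is a minimal sketch of how Python-specific types might be mapped onto JSON-like primitives before encoding. The helper name `normalize_value` and the specific choices (Decimal to float, datetime to ISO 8601 string, unknown types to None) are assumptions made for this example, not the package's confirmed behavior.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Any


def normalize_value(value: Any) -> Any:
    """Hypothetical helper: map Python-specific values onto JSON-like ones."""
    # None, bool, int, float and str already correspond to TOON primitives.
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    # Assumption: Decimal becomes a float (this can lose precision).
    if isinstance(value, Decimal):
        return float(value)
    # Assumption: dates and datetimes become ISO 8601 strings.
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    # Containers are normalized recursively; keys are coerced to strings.
    if isinstance(value, dict):
        return {str(k): normalize_value(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize_value(v) for v in value]
    # Assumption: anything unrecognized collapses to null.
    return None


record = {
    "price": Decimal("9.50"),
    "when": datetime(2025, 11, 3, 9, 18, 45),
    "tags": ("a", "b"),
}
normalized = normalize_value(record)
```

Running normalization as a separate pass keeps the encoder itself limited to dicts, lists, and primitives.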

Package Configuration

Package name: python-toon (PyPI)
Module name: toon_format (Python import)
Version: 1.0.0
Python support: 3.8-3.14 (including 3.14t free-threaded)
Build system: uv_build for fast, reliable builds
Dependencies: Zero runtime dependencies

CI/CD Infrastructure

GitHub Actions workflow for testing across Python 3.8-3.12
Automated PyPI publishing via OIDC trusted publishing
TestPyPI workflow for pre-release validation
Ruff linting and formatting enforcement
Type checking with mypy
Coverage reporting with pytest-cov

Testing

73 tests, all passing, covering:
- Encoding: primitives, objects, arrays (tabular and mixed), delimiters, indentation
- Decoding: basic structures, strict mode, delimiters, length markers, edge cases
- Roundtrip: encode → decode → encode consistency
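The roundtrip property listed above (encode → decode → encode) can be sketched as a small helper that takes the encoder and decoder as parameters. The helper `assert_roundtrip` is hypothetical, and `json.dumps`/`json.loads` are used only as a stand-in codec so the sketch runs without the package installed; in the project's own tests the package's encoder and decoder would be passed instead.

```python
import json
from typing import Any, Callable


def assert_roundtrip(
    value: Any,
    encode: Callable[[Any], str],
    decode: Callable[[str], Any],
) -> str:
    """Check encode -> decode -> encode consistency and return the encoded text."""
    first = encode(value)
    restored = decode(first)
    second = encode(restored)
    # Decoding must give back the original value...
    assert restored == value, "decode(encode(x)) changed the value"
    # ...and encoding again must reproduce the same text.
    assert first == second, "re-encoding produced different text"
    return first


sample = {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}
# Stand-in codec (JSON); swap in the TOON encoder and decoder in real tests.
encoded = assert_roundtrip(sample, json.dumps, json.loads)
```

Checking both the restored value and the re-encoded text catches decoders that lose information as well as encoders whose output is not stable.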

Documentation

Comprehensive README.md with:
- Installation instructions (pip and uv)
- Quick start guide
- Complete API reference
- CLI usage examples
- LLM integration best practices
- Token efficiency comparisons
CONTRIBUTING.md with development workflow
PR template for future contributions
Issue templates for bug reports
examples.py with 7 runnable demonstrations

SPEC Compliance

This PR implements/fixes spec compliance
Spec section(s) affected: All sections (complete implementation)
Spec version: Latest (https://github.com/toon-format/spec)

Implementation Details:

✅ YAML-style indentation for nested objects
✅ CSV-style tabular format for uniform arrays
✅ Inline format for primitive arrays
✅ List format for mixed arrays
✅ Length markers [N] for all arrays
✅ Optional # prefix for length markers
✅ Delimiter options: comma (default), tab, pipe
✅ Quoting rules for strings (minimal, spec-compliant)
✅ Escape sequences: \", \\, \n, \r, \t
✅ Primitives: null, true, false, numbers, strings
✅ Strict and lenient parsing modes
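To make the escaping and quoting items above concrete, here is a minimal sketch. The five escape sequences come from the list above; the `needs_quotes` rule is a deliberately simplified, conservative assumption and does not reproduce the full quoting rules of the TOON specification. All function names are hypothetical.

```python
# The five escape sequences listed above, keyed by the raw character.
ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def escape_string(text: str) -> str:
    """Replace each special character with its escape sequence."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def needs_quotes(text: str, delimiter: str = ",") -> bool:
    """Simplified, hypothetical check for when a string must be quoted."""
    # Empty strings and strings with surrounding whitespace are ambiguous.
    if text == "" or text != text.strip():
        return True
    # These would otherwise be read back as literals rather than strings.
    if text in ("true", "false", "null"):
        return True
    # Anything Python can parse as a number is quoted (conservative choice).
    try:
        float(text)
        return True
    except ValueError:
        pass
    # Structural characters and characters that require escaping.
    return any(ch in text for ch in (delimiter, ":", '"', "\\", "\n", "\r", "\t"))


def encode_string(text: str, delimiter: str = ",") -> str:
    """Quote and escape only when the simplified rule says it is needed."""
    if needs_quotes(text, delimiter):
        return '"' + escape_string(text) + '"'
    return text
```

For example, `encode_string("Alice")` is left bare, while `encode_string("a,b")` and `encode_string("true")` are wrapped in double quotes so they survive decoding as strings.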

Testing

Test Output

============================= test session starts ==============================
platform darwin -- Python 3.11.14, pytest-8.4.2, pluggy-1.6.0
collected 73 items

tests/test_decoder.py .................................            [ 45%]
tests/test_encoder.py ........................................      [100%]

============================== 73 passed in 0.03s ==============================

Test Coverage:

Encoder: 40 tests covering all encoding scenarios
Decoder: 33 tests covering parsing and validation
All edge cases, delimiters, and format options tested
100% pass rate

Code Quality

Ran ruff check src/toon_format tests - no issues
Ran ruff format src/toon_format tests - code formatted
Ran mypy src/toon_format - no critical errors
All tests pass: pytest tests/ -v

Linter Output:

$ ruff check src/toon_format tests
All checks passed!

Checklist

My code follows the project's coding standards (PEP 8, line length 100)
I have added type hints to new code
I have added tests that prove my fix/feature works
New and existing tests pass locally
I have updated documentation (README.md)
My changes do not introduce new dependencies
I have maintained Python 3.8+ compatibility
I have reviewed the TOON specification for relevant sections

Performance Impact

No performance impact
Performance improvement (describe below)
Potential performance regression (describe and justify below)

Performance Characteristics:

Encoder: Fast string building with minimal allocations
Decoder: Single-pass parsing with minimal backtracking
Zero runtime dependencies for optimal load times
Suitable for high-frequency encoding/decoding scenarios

Breaking Changes

No breaking changes
Breaking changes (describe migration path below)

This is the initial release, so no breaking changes apply.

Screenshots / Examples

Basic Usage

from toon_format import encode

# Simple object
data = {"name": "Alice", "age": 30}
print(encode(data))

Output:

name: Alice
age: 30

Tabular Array Example

from toon_format import encode

users = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Charlie", "age": 35},
]
print(encode(users))

Output:

[3,]{id,name,age}:
  1,Alice,30
  2,Bob,25
  3,Charlie,35
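How a uniform array might be detected and laid out as a table can be sketched as follows. This is a simplified illustration, not the package's implementation: the helper names are hypothetical, string quoting is omitted, and the header shape (length plus delimiter in brackets, then the field list) simply mirrors the example output above.

```python
from typing import Any, Dict, List, Optional

PRIMITIVES = (str, int, float, bool, type(None))


def tabular_fields(rows: List[Any]) -> Optional[List[str]]:
    """Return the shared field names if the rows qualify for tabular layout."""
    if not rows or not all(isinstance(row, dict) for row in rows):
        return None
    fields = list(rows[0].keys())
    for row in rows:
        # Assumption: every row must have the same keys in the same order.
        if list(row.keys()) != fields:
            return None
        # Nested values would need the list format instead.
        if not all(isinstance(v, PRIMITIVES) for v in row.values()):
            return None
    return fields


def format_cell(value: Any) -> str:
    """Render one primitive; quoting and escaping are left out of this sketch."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def encode_table(rows: List[Dict[str, Any]], delimiter: str = ",", indent: int = 2) -> str:
    """Encode a uniform list of dicts as a header line plus one row per item."""
    fields = tabular_fields(rows)
    if fields is None:
        raise ValueError("rows are not uniform; the list format would be needed")
    header = f"[{len(rows)}{delimiter}]{{{delimiter.join(fields)}}}:"
    pad = " " * indent
    lines = [pad + delimiter.join(format_cell(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)


users = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Charlie", "age": 35},
]
table = encode_table(users)
```

Because the field names appear once in the header rather than once per row, the saving relative to JSON grows with the number of rows.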

Token Efficiency

import json
from toon_format import encode

data = {
    "users": [
        {"id": 1, "name": "Alice", "age": 30, "active": True},
        {"id": 2, "name": "Bob", "age": 25, "active": True},
        {"id": 3, "name": "Charlie", "age": 35, "active": False},
    ]
}

json_str = json.dumps(data)
toon_str = encode(data)

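# Character counts are used here as a rough proxy for token counts;
# actual token savings depend on the tokenizer.
print(f"JSON: {len(json_str)} characters")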
print(f"TOON: {len(toon_str)} characters")
print(f"Reduction: {100 * (1 - len(toon_str) / len(json_str)):.1f}%")

Output:

JSON: 177 characters
TOON: 85 characters
Reduction: 52.0%

Additional Context

Package Details

PyPI Package: python-toon
Import Path: toon_format
CLI Command: toon
License: MIT
Repository: https://github.com/toon-format/toon-python
Documentation: https://github.com/toon-format/spec

Installation

# With pip
pip install python-toon

# With uv (recommended)
uv pip install python-toon

Development Setup

# Clone repository
git clone https://github.com/toon-format/toon-python.git
cd toon-python

# Install with uv
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run linters
ruff check src/toon_format tests
mypy src/toon_format

Key Features

Token Efficiency: 30-60% reduction compared to JSON
Human Readable: YAML-like syntax for objects, CSV-like for arrays
Spec Compliant: 100% compatible with official TOON specification
Type Safe: Full type hints throughout codebase
Well Tested: 73 tests with 100% pass rate
Zero Dependencies: No runtime dependencies
Python 3.8+: Supports Python 3.8 through 3.14t (free-threaded)
Fast: Single-pass parsing, minimal allocations
Flexible: Multiple delimiters, indentation options, strict/lenient modes
CLI Included: Command-line tool for JSON ↔ TOON conversion

Future Roadmap

Additional encoding options (custom formatters)
Performance optimizations for large datasets
Streaming encoder/decoder for very large files
Additional language implementations
Enhanced CLI features (pretty-printing, validation)

Checklist for Reviewers

Code changes are clear and well-documented
Tests adequately cover the changes
Documentation is updated
No security concerns
Follows TOON specification
Backward compatible (or breaking changes are justified and documented)

Review Focus Areas

Spec Compliance: Verify encoding/decoding matches TOON spec exactly
Edge Cases: Check handling of empty strings, special characters, nested structures
Type Safety: Ensure type hints are accurate and complete
Error Messages: Verify error messages are clear and helpful
Documentation: Confirm examples work as shown
CI/CD: Verify workflows are properly configured for PyPI deployment

xaviviro added 2 commits November 3, 2025 09:53

first commit

4721c8d

first

f3e0040

xaviviro requested review from a team and johannschopplich as code owners November 3, 2025 09:18

xaviviro closed this Nov 3, 2025
