Compact, human-readable serialization format for LLM contexts with **30-60% token reduction** vs JSON. Combines YAML-like indentation with CSV-like tabular arrays. Working towards full compatibility with the [official TOON specification](https://github.com/toon-format/spec).

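As a rough illustration of where the savings come from, here is a toy sketch of the tabular-array idea (illustration only, not this library's implementation):

```python
import json

# Toy TOON-style tabular encoder for a flat, uniform list of dicts.
# Illustration only -- the real library handles nesting, quoting, etc.
def toy_tabular(name, rows):
    fields = list(rows[0])
    lines = [f"{name}[{len(rows)}]{{{','.join(fields)}}}:"]
    for row in rows:
        lines.append("  " + ",".join(str(row[f]) for f in fields))
    return "\n".join(lines)

users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
toon = toy_tabular("users", users)
print(toon)
# users[2]{id,name}:
#   1,Alice
#   2,Bob

# Field names appear once instead of once per row, which is where
# much of the token reduction over JSON comes from.
print(len(json.dumps({"users": users})), "vs", len(toon))
```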
**Key Features:** Minimal syntax • Tabular arrays for uniform data • Array length validation • Python 3.8+ • Comprehensive test coverage

**🚀 Advanced Features (v0.9+):**
- **Type-Safe Integration**: Pydantic, dataclasses, attrs support
- **Streaming Processing**: Handle datasets larger than memory
- **Plugin System**: Custom encoders for NumPy, Pandas, UUID, etc.
- **Semantic Optimization**: AI-aware token reduction & field ordering
- **Batch Processing**: Multi-format conversion (JSON/YAML/XML/CSV) with auto-detection

```bash
# Beta published to PyPI - install from source:
```

```python
decode("items[2]: apple,banana")
# {'items': ['apple', 'banana']}
```
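The `[2]` marker is what enables the array length validation listed above: a decoder can check the declared count against the items it actually parsed. A minimal sketch of that check (toy code, not the library's decoder):

```python
import re

# Toy parser for a single inline TOON array like "items[2]: apple,banana".
# Illustrates the length-validation idea only, not the library's decoder.
def parse_inline_array(line):
    m = re.fullmatch(r"(\w+)\[(\d+)\]:\s*(.*)", line)
    if not m:
        raise ValueError(f"not an inline array: {line!r}")
    key, declared, body = m.group(1), int(m.group(2)), m.group(3)
    items = body.split(",") if body else []
    if len(items) != declared:
        raise ValueError(f"declared {declared} items, found {len(items)}")
    return {key: items}

print(parse_inline_array("items[2]: apple,banana"))
# {'items': ['apple', 'banana']}
```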

### Advanced Usage (v0.9+)

```python
# Type-safe with Pydantic
from pydantic import BaseModel
from toon_format import encode_model, decode_model

class User(BaseModel):
    name: str
    age: int

user = User(name="Alice", age=30)
toon_str = encode_model(user)
decoded = decode_model(toon_str, User)

# Streaming for large datasets
from toon_format.streaming import StreamEncoder

with StreamEncoder("large_data.toon") as encoder:
    encoder.start_array(fields=["id", "name"])
    for i in range(1_000_000):
        encoder.encode_item({"id": i, "name": f"user_{i}"})
    encoder.end_array()

# Semantic optimization
from toon_format.semantic import optimize_for_llm

data = {"employee_identifier": 123, "full_name": "Alice"}
optimized = optimize_for_llm(data, abbreviate_keys=True)
# Result: {"emp_id": 123, "name": "Alice"}

# Batch convert JSON to TOON
from toon_format.batch import batch_convert

batch_convert("json_files/", "toon_files/", from_format="json", to_format="toon")
```
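The plugin system from the feature list has no snippet above. Such hooks are commonly a type-to-encoder registry; the sketch below shows that general pattern in plain Python — the names (`register_encoder`, `to_encodable`) are assumptions for illustration, not this library's API:

```python
from datetime import date

# Hypothetical sketch of a custom-encoder registry. The actual
# toon_format plugin API may differ; names here are assumptions.
_encoders = {}

def register_encoder(typ, fn):
    _encoders[typ] = fn

def to_encodable(value):
    """Recursively convert values a base encoder wouldn't know."""
    for typ, fn in _encoders.items():
        if isinstance(value, typ):
            return fn(value)
    if isinstance(value, dict):
        return {k: to_encodable(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_encodable(v) for v in value]
    return value

# Teach the registry how to render dates before encoding.
register_encoder(date, lambda d: d.isoformat())
print(to_encodable({"hired": date(2024, 1, 15)}))
# {'hired': '2024-01-15'}
```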

## CLI Usage

```bash
```