Description
Type of Change
- Breaking change (incompatible with current spec)
- Backward-compatible addition
- Clarification or editorial improvement
- New optional feature
- Changes to conformance requirements
Summary
This proposal extends TOON's tabular array syntax to allow nested content per row.
The parent array header declares common fields, each row provides inline values for those fields, and rows can optionally contain additional nested arrays on subsequent indented lines.
Motivation
### Problem
The current TOON v3.0 specification (Section 9.3) restricts tabular format to arrays where:
1. Every element is an object
2. All objects have the **same set of keys**
3. All values are **primitives only** (no nested arrays/objects)
This forces a choice between compact tabular syntax (no nesting) and verbose expanded list syntax (which allows nesting but repeats field names on every item).
### Use Case
When serializing data with repeated structure plus per-item nested arrays (e.g., cathedrals with mass times, products with variants, users with roles), the current spec requires the verbose expanded list format, which significantly increases token count for LLM contexts.
### Benefits
- **Reduced token count**: Combines tabular compactness with nested content flexibility
- **Improved readability**: Common fields declared once in header, nested content clearly indented
- **Natural extension**: Builds on existing tabular syntax without breaking changes
- **Consistent with spec**: Rows remain marker-free, matching TOON v3.0 Section 9.3
## Detailed Design
### Proposed Syntax
Hybrid tabular arrays use the existing tabular header syntax. Rows provide inline values for declared fields, with optional nested content on subsequent indented lines:
cathedrals[3]{name,city,country,description}:
  St. Peter's Basilica,Vatican City,Vatican,Principal church of the Catholic Church
    massTimes[3]{day,time,type}:
      Sunday,"09:00",Papal Mass
      Sunday,"11:00",Solemn Mass
      Daily,"07:00",Low Mass
  Cologne Cathedral,Cologne,Germany,Gothic masterpiece and UNESCO World Heritage Site
    massTimes[2]{day,time,type}:
      Sunday,"10:00",High Mass
      Daily,"08:00",Morning Mass
  St. Paul's Cathedral,London,England,Anglican cathedral and iconic London landmark
    massTimes[2]{day,time,type}:
      Sunday,"10:00",Sung Eucharist
      Sunday,"15:15",Evensong
**Optional nested array hint in header:**
cathedrals[3]{name,city,country,description,massTimes[]}:
The `massTimes[]` hint signals "expect a massTimes array per row" without prescribing schema.
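As a rough illustration of how a parser might separate declared fields from nested array hints, here is a minimal Python sketch. The `Header` type, the `parse_header` function, and the regular expression are all hypothetical names introduced for this example; the pattern handles only unquoted keys and is not taken from the reference implementation.

```python
import re
from typing import NamedTuple


class Header(NamedTuple):
    key: str
    length: int
    delimiter: str
    fields: list
    nested_hints: list


# Assumption: unquoted keys only; optional "#" and delimiter symbol inside [ ].
HEADER_RE = re.compile(
    r"^(?P<key>[A-Za-z_][\w.]*)?"
    r"\[#?(?P<n>\d+)(?P<delim>[\t|])?\]"
    r"(?:\{(?P<fields>[^{}]*)\})?:$"
)


def parse_header(line: str) -> Header:
    """Parse a tabular header, splitting plain fields from `name[]` hints."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an array header: {line!r}")
    delimiter = match["delim"] or ","
    fields, hints = [], []
    for name in (match["fields"] or "").split(delimiter):
        name = name.strip()
        if not name:
            continue
        if name.endswith("[]"):
            hints.append(name[:-2])  # informational hint, not a row column
        else:
            fields.append(name)
    return Header(match["key"] or "", int(match["n"]), delimiter, fields, hints)


header = parse_header("cathedrals[3]{name,city,country,description,massTimes[]}:")
print(header.fields)        # ['name', 'city', 'country', 'description']
print(header.nested_hints)  # ['massTimes']
```

Because hints are kept apart from `fields`, the number of inline values per row still equals the number of declared primitive fields.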
### Encoding Rules
- When encoding arrays of objects with nested arrays, encoders MAY use hybrid tabular format
- The parent header MUST declare all primitive fields: `key[N]{f1,f2,...}:`
- Rows MUST provide values for all declared fields in header order
- Nested content MUST appear at row depth + indentSize
- Nested arrays MUST use standard tabular syntax with their own headers
- Encoders MAY include nested array hints (`fieldName[]`) in the parent header
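The encoding rules above could be sketched roughly as follows in Python. This is an illustrative sketch, not the reference encoder: `format_value` and `encode_hybrid` are hypothetical names, the quoting logic is a simplified approximation of the TOON quoting rules, only the comma delimiter is supported, and every nested array is assumed to contain objects. An empty nested array is emitted as `key[0]:` because its fields cannot be inferred from the data.

```python
import json
import re

_NUMERIC = re.compile(r"^-?\d+(\.\d+)?([eE][+-]?\d+)?$")


def format_value(value, delimiter=","):
    """Render a primitive; quoting here is a simplified approximation."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    needs_quotes = (
        value == ""
        or value != value.strip()
        or value in ("true", "false", "null")
        or _NUMERIC.match(value) is not None
        or value.startswith("-")
        or any(ch in value for ch in (delimiter, ":", '"', "\\", "[", "]", "{", "}"))
    )
    return json.dumps(value, ensure_ascii=False) if needs_quotes else value


def encode_hybrid(key, rows, depth=0, indent=2, include_hints=False):
    """Return the lines of a hybrid tabular array for a list of dicts."""
    pad = " " * (depth * indent)
    if not rows:
        return [f"{pad}{key}[0]:"]
    # Primitive fields come from the first row; list values become nested content.
    fields = [k for k, v in rows[0].items() if not isinstance(v, list)]
    hints = []
    if include_hints:
        for row in rows:
            for k, v in row.items():
                if isinstance(v, list) and k not in hints:
                    hints.append(k)
    declared = ",".join(fields + [f"{name}[]" for name in hints])
    lines = [f"{pad}{key}[{len(rows)}]{{{declared}}}:"]
    row_pad = " " * ((depth + 1) * indent)
    for row in rows:
        lines.append(row_pad + ",".join(format_value(row[f]) for f in fields))
        for k, v in row.items():
            if isinstance(v, list):
                # Nested headers sit one level below the row (row depth + indentSize).
                lines.extend(encode_hybrid(k, v, depth + 2, indent, include_hints))
    return lines


items = [
    {"name": "A", "tags": [{"k": "v1"}, {"k": "v2"}]},
    {"name": "B", "tags": [{"k": "v3"}]},
]
print("\n".join(encode_hybrid("items", items)))
```

Passing `include_hints=True` appends `tags[]` to the parent header, which corresponds to the optional hint rule.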
### Decoding Rules
- When parsing a tabular array, decoders MUST check for nested content at depth +1 after each row
- Row disambiguation follows TOON v3.0 Section 9.3 (delimiter-before-colon detection)
- Lines at row depth +1 with a colon indicate nested content for the current row
- Nested arrays are parsed recursively using standard tabular rules
- Decoders MUST support nested array hints (`fieldName[]`) in headers as informational
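A recursive decoder following these rules might look like the sketch below. It is a simplified illustration under several assumptions: `split_row` and `decode_hybrid` are hypothetical names, only the comma delimiter and unquoted keys are handled, every value is returned as a string (no type inference), and only the `\"` and `\\` escapes are treated correctly.

```python
import re

# Assumption: simple unquoted keys and comma-delimited field lists.
HEADER = re.compile(r"^(\w+)\[(\d+)\](?:\{([^{}]*)\})?:$")


def split_row(text, delimiter=","):
    """Split a row on delimiters that appear outside double quotes."""
    cells, buf, in_quotes, i = [], [], False, 0
    while i < len(text):
        ch = text[i]
        if in_quotes and ch == "\\" and i + 1 < len(text):
            buf.append(text[i + 1])  # keep the escaped character literally
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == delimiter and not in_quotes:
            cells.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    cells.append("".join(buf))
    return cells


def decode_hybrid(text, indent=2):
    """Decode one hybrid tabular array into {key: [row dicts]}."""
    parsed = [
        ((len(line) - len(line.lstrip(" "))) // indent, line.strip())
        for line in text.splitlines()
        if line.strip()
    ]
    pos = 0

    def parse_array(depth):
        nonlocal pos
        match = HEADER.match(parsed[pos][1])
        if match is None:
            raise ValueError(f"expected array header, got {parsed[pos][1]!r}")
        key, count = match.group(1), int(match.group(2))
        # Hints such as massTimes[] are informational and carry no row value.
        fields = [f for f in (match.group(3) or "").split(",") if f and not f.endswith("[]")]
        pos += 1
        rows = []
        while pos < len(parsed) and parsed[pos][0] == depth + 1:
            row = dict(zip(fields, split_row(parsed[pos][1])))
            pos += 1
            # Lines one level deeper than the row are nested arrays for that row.
            while pos < len(parsed) and parsed[pos][0] == depth + 2:
                nested_key, nested_rows = parse_array(depth + 2)
                row[nested_key] = nested_rows
            rows.append(row)
        if len(rows) != count:
            raise ValueError(f"{key}: declared {count} rows, found {len(rows)}")
        return key, rows

    key, rows = parse_array(0)
    return {key: rows}


doc = "items[2]{name}:\n  A\n    tags[2]{k}:\n      v1\n      v2\n  B\n    tags[1]{k}:\n      v3"
print(decode_hybrid(doc))
```

The same `parse_array` function handles every nesting level, which mirrors the rule that nested arrays are parsed recursively using the standard tabular rules.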
### Grammar Changes
; Existing (from TOON v3.0 Section 6)
bracket-seg = "[" [ "#" ] 1*DIGIT [ delimsym ] "]"
delimsym = HTAB / "|"
delim = delimsym / ","
; Field declarations (existing)
fields-seg = "{" fieldname *( delim fieldname ) "}"
fieldname = key
; Optional nested array hint (new); the fields-seg rule below supersedes the existing one above
nested-hint = key "[]"
fields-seg = "{" ( fieldname / nested-hint ) *( delim ( fieldname / nested-hint ) ) "}"
header = [ key ] bracket-seg [ fields-seg ] ":"
; Hybrid tabular row (new - extends existing tabular row)
hybrid-row = value *( delim value ) [ LF nested-content ]
nested-content = 1*( indent nested-array LF )
nested-array = key bracket-seg [ fields-seg ] ":" [ SP inline-values / LF rows ]
Examples
### Before (current spec)
cathedrals[3]:
  - name: St. Peter's Basilica
    city: Vatican City
    country: Vatican
    description: Principal church of the Catholic Church
    massTimes[3]:
      - day: Sunday
        time: "09:00"
        type: Papal Mass
      - day: Sunday
        time: "11:00"
        type: Solemn Mass
      - day: Daily
        time: "07:00"
        type: Low Mass
  - name: Cologne Cathedral
    city: Cologne
    country: Germany
    description: Gothic masterpiece and UNESCO World Heritage Site
    massTimes[2]:
      - day: Sunday
        time: "10:00"
        type: High Mass
      - day: Daily
        time: "08:00"
        type: Morning Mass
  - name: St. Paul's Cathedral
    city: London
    country: England
    description: Anglican cathedral and iconic London landmark
    massTimes[2]:
      - day: Sunday
        time: "10:00"
        type: Sung Eucharist
      - day: Sunday
        time: "15:15"
        type: Evensong
### After (proposed)
cathedrals[3]{name,city,country,description}:
  St. Peter's Basilica,Vatican City,Vatican,Principal church of the Catholic Church
    massTimes[3]{day,time,type}:
      Sunday,"09:00",Papal Mass
      Sunday,"11:00",Solemn Mass
      Daily,"07:00",Low Mass
  Cologne Cathedral,Cologne,Germany,Gothic masterpiece and UNESCO World Heritage Site
    massTimes[2]{day,time,type}:
      Sunday,"10:00",High Mass
      Daily,"08:00",Morning Mass
  St. Paul's Cathedral,London,England,Anglican cathedral and iconic London landmark
    massTimes[2]{day,time,type}:
      Sunday,"10:00",Sung Eucharist
      Sunday,"15:15",Evensong
### Use Case: Multiple Nested Arrays
cathedrals[2]{name,city,country,description}:
  St. Peter's Basilica,Vatican City,Vatican,Principal church of the Catholic Church
    massTimes[3]{day,time,type}:
      Sunday,"09:00",Papal Mass
      Sunday,"11:00",Solemn Mass
      Daily,"07:00",Low Mass
    confessionTimes[2]{day,time}:
      Saturday,"10:00"
      Saturday,"16:00"
  Cologne Cathedral,Cologne,Germany,Gothic masterpiece and UNESCO World Heritage Site
    massTimes[2]{day,time,type}:
      Sunday,"10:00",High Mass
      Daily,"08:00",Morning Mass
    confessionTimes[2]{day,time}:
      Friday,"17:00"
      Saturday,"10:00"
### Use Case: Deeply Nested
countries[2]{name,continent}:
  Italy,Europe
    cathedrals[2]{name,city,description}:
      St. Peter's Basilica,Vatican City,Principal church of the Catholic Church
        massTimes[3]{day,time,type}:
          Sunday,"09:00",Papal Mass
          Sunday,"11:00",Solemn Mass
          Daily,"07:00",Low Mass
      Milan Cathedral,Milan,Gothic cathedral dedicated to the Nativity of Mary
        massTimes[1]{day,time,type}:
          Sunday,"10:00",Solemn Mass
  Germany,Europe
    cathedrals[1]{name,city,description}:
      Cologne Cathedral,Cologne,Gothic masterpiece and UNESCO World Heritage Site
        massTimes[2]{day,time,type}:
          Sunday,"10:00",High Mass
          Daily,"08:00",Morning Mass
### Use Case: With Nested Array Hint
cathedrals[3]{name,city,country,description,massTimes[]}:
  St. Peter's Basilica,Vatican City,Vatican,Principal church of the Catholic Church
    massTimes[3]{day,time,type}:
      Sunday,"09:00",Papal Mass
      Sunday,"11:00",Solemn Mass
      Daily,"07:00",Low Mass
  Cologne Cathedral,Cologne,Germany,Gothic masterpiece and UNESCO World Heritage Site
    massTimes[2]{day,time,type}:
      Sunday,"10:00",High Mass
      Daily,"08:00",Morning Mass
  St. Paul's Cathedral,London,England,Anglican cathedral and iconic London landmark
    massTimes[2]{day,time,type}:
      Sunday,"10:00",Sung Eucharist
      Sunday,"15:15",Evensong
### Use Case: Rows Without Nested Content (Fallback)
When no nested content is needed, hybrid tabular is identical to standard tabular:
cathedrals[3]{name,city,country,description}:
  St. Peter's Basilica,Vatican City,Vatican,Principal church of the Catholic Church
  Cologne Cathedral,Cologne,Germany,Gothic masterpiece and UNESCO World Heritage Site
  St. Paul's Cathedral,London,England,Anglican cathedral and iconic London landmark
Drawbacks
- Increased parser complexity: Parsers must check for nested content after each row
- Indentation sensitivity: Deeper nesting requires careful indentation management
- Potential ambiguity: Row disambiguation must handle nested content detection
- Learning curve: Users must understand when hybrid tabular applies vs expanded list
Alternatives Considered
Alternative 1: List Markers for Hybrid Rows
cathedrals[3]{name,city,country,description}:
  - St. Peter's Basilica,Vatican City,Vatican,Principal church
    massTimes[2]{day,time,type}:
      Sunday,"09:00",Papal Mass
      Sunday,"11:00",Solemn Mass
Rejected because: Deviates from TOON v3.0 tabular syntax, which uses no `-` markers. It would introduce an inconsistency where tabular rows sometimes have markers and sometimes don't.
Alternative 2: Schema Inheritance in Header
cathedrals[3]{name,city,country,description,massTimes[]{day,time,type}}:
  St. Peter's Basilica,Vatican City,Vatican,Principal church
    massTimes[2]:
      Sunday,"09:00",Papal Mass
      Sunday,"11:00",Solemn Mass
Deferred: More complex parsing, reduces readability when viewing rows in isolation. Could be added as optional enhancement in future version.
Do Nothing
Not acceptable because: Forces verbose expanded list format for common use cases, significantly increasing token count in LLM contexts where compactness is valuable.
Impact on Implementations
- Reference implementation: Requires extending tabular row parsing to check for nested content at depth +1
- Community implementations: Need to update parsers to handle nested content detection
- Backward compatibility: Existing documents remain valid; new syntax is additive
- Encoder options: New `allowHybridTabular` and `includeNestedArrayHints` options
Migration Strategy
For Implementers
- Update tabular row parser to check next line's depth
- If next line is at row depth +1 with a colon, parse as nested content
- Add support for `fieldName[]` hints in field declarations
- Add encoder options for hybrid tabular output
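The first two implementer steps (inspect the next line's depth, then decide whether it is nested content) could be sketched as below. The names `first_unquoted` and `classify_next_line` are hypothetical. The sketch assumes depth is checked before delimiter-before-colon detection, because a nested header such as `massTimes[3]{day,time,type}:` contains delimiters ahead of its colon and would otherwise look like a row.

```python
def first_unquoted(text, chars):
    """Return the first character from `chars` found outside double quotes."""
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes and ch == "\\":
            i += 2  # skip the escaped character
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch in chars:
            return ch
        i += 1
    return None


def classify_next_line(line, row_depth, indent=2, delimiter=","):
    """Classify the line that follows a tabular row.

    Returns "nested", "row", "key-value", "end", or "invalid".
    """
    stripped = line.lstrip(" ")
    depth = (len(line) - len(stripped)) // indent
    if depth == row_depth + 1:
        # One level below the row: nested content must carry an unquoted colon.
        return "nested" if first_unquoted(stripped, ":") else "invalid"
    if depth == row_depth:
        # Same depth as the rows: delimiter-before-colon detection applies.
        marker = first_unquoted(stripped, delimiter + ":")
        return "key-value" if marker == ":" else "row"
    return "end" if depth < row_depth else "invalid"


print(classify_next_line("    massTimes[3]{day,time,type}:", row_depth=1))  # nested
print(classify_next_line('  "a:b","x,y"', row_depth=1))                     # row
```

A decoder would call this after each row and hand any "nested" line to its regular array header parser.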
For Users
Existing TOON documents require no changes. To use hybrid tabular:
- Convert expanded list arrays to tabular format with field headers
- Move nested arrays to indented lines below each row
- Optionally add nested array hints to parent header
Test Cases
### Test 1: Basic Hybrid Tabular
{
"name": "basic_hybrid_tabular",
"input": {
"items": [
{"name": "A", "tags": [{"k": "v1"}, {"k": "v2"}]},
{"name": "B", "tags": [{"k": "v3"}]}
]
},
"expected": "items[2]{name}:\n  A\n    tags[2]{k}:\n      v1\n      v2\n  B\n    tags[1]{k}:\n      v3",
"note": "Tests basic hybrid tabular with nested arrays"
}
**Expected TOON (pretty):**
items[2]{name}:
  A
    tags[2]{k}:
      v1
      v2
  B
    tags[1]{k}:
      v3
### Test 2: Multiple Nested Arrays
{
"name": "multiple_nested_arrays",
"input": {
"cathedrals": [
{
"name": "St. Peter's",
"city": "Vatican",
"massTimes": [{"day": "Sunday", "time": "09:00"}],
"confessionTimes": [{"day": "Saturday", "time": "10:00"}]
},
{
"name": "Cologne",
"city": "Germany",
"massTimes": [{"day": "Sunday", "time": "10:00"}],
"confessionTimes": [{"day": "Friday", "time": "17:00"}]
}
]
},
"expected": "cathedrals[2]{name,city}:\n  St. Peter's,Vatican\n    massTimes[1]{day,time}:\n      Sunday,\"09:00\"\n    confessionTimes[1]{day,time}:\n      Saturday,\"10:00\"\n  Cologne,Germany\n    massTimes[1]{day,time}:\n      Sunday,\"10:00\"\n    confessionTimes[1]{day,time}:\n      Friday,\"17:00\"",
"note": "Tests rows with multiple nested arrays as siblings"
}
**Expected TOON (pretty):**
cathedrals[2]{name,city}:
  St. Peter's,Vatican
    massTimes[1]{day,time}:
      Sunday,"09:00"
    confessionTimes[1]{day,time}:
      Saturday,"10:00"
  Cologne,Germany
    massTimes[1]{day,time}:
      Sunday,"10:00"
    confessionTimes[1]{day,time}:
      Friday,"17:00"
### Test 3: Deeply Nested
{
"name": "deeply_nested",
"input": {
"countries": [
{
"name": "Italy",
"cities": [
{
"name": "Rome",
"landmarks": [{"name": "Colosseum"}, {"name": "Pantheon"}]
}
]
},
{
"name": "France",
"cities": [
{
"name": "Paris",
"landmarks": [{"name": "Eiffel Tower"}]
}
]
}
]
},
"expected": "countries[2]{name}:\n  Italy\n    cities[1]{name}:\n      Rome\n        landmarks[2]{name}:\n          Colosseum\n          Pantheon\n  France\n    cities[1]{name}:\n      Paris\n        landmarks[1]{name}:\n          Eiffel Tower",
"note": "Tests nested arrays within nested arrays"
}
**Expected TOON (pretty):**
countries[2]{name}:
  Italy
    cities[1]{name}:
      Rome
        landmarks[2]{name}:
          Colosseum
          Pantheon
  France
    cities[1]{name}:
      Paris
        landmarks[1]{name}:
          Eiffel Tower
### Test 4: Nested Array Hint in Header
{
"name": "nested_array_hint",
"input": {
"products": [
{"sku": "A1", "name": "Widget", "variants": [{"size": "S"}, {"size": "M"}]},
{"sku": "B2", "name": "Gadget", "variants": [{"size": "L"}]}
]
},
"expected": "products[2]{sku,name,variants[]}:\n  A1,Widget\n    variants[2]{size}:\n      S\n      M\n  B2,Gadget\n    variants[1]{size}:\n      L",
"note": "Tests fieldName[] hint in header",
"encoderOption": "includeNestedArrayHints: true"
}
**Expected TOON (pretty):**
products[2]{sku,name,variants[]}:
  A1,Widget
    variants[2]{size}:
      S
      M
  B2,Gadget
    variants[1]{size}:
      L
### Test 5: Empty Nested Array
{
"name": "empty_nested_array",
"input": {
"users": [
{"name": "Alice", "roles": [{"name": "admin"}, {"name": "user"}]},
{"name": "Bob", "roles": []}
]
},
"expected": "users[2]{name}:\n  Alice\n    roles[2]{name}:\n      admin\n      user\n  Bob\n    roles[0]{name}:",
"note": "Tests [0]{fields}: within row"
}
**Expected TOON (pretty):**
users[2]{name}:
  Alice
    roles[2]{name}:
      admin
      user
  Bob
    roles[0]{name}:
### Test 6: Mixed Rows (Some With Nested, Some Without)
{
"name": "mixed_rows",
"input": {
"items": [
{"id": 1, "name": "Simple"},
{"id": 2, "name": "Complex", "tags": [{"k": "a"}, {"k": "b"}]},
{"id": 3, "name": "Another Simple"}
]
},
"expected": "items[3]{id,name}:\n  1,Simple\n  2,Complex\n    tags[2]{k}:\n      a\n      b\n  3,Another Simple",
"note": "Tests some rows with nested content, others without"
}
**Expected TOON (pretty):**
items[3]{id,name}:
  1,Simple
  2,Complex
    tags[2]{k}:
      a
      b
  3,Another Simple
### Test 7: Row Disambiguation
{
"name": "row_disambiguation",
"input": {
"records": [
{"key": "a:b", "value": "x,y", "meta": [{"t": "1"}]},
{"key": "c:d", "value": "z", "meta": [{"t": "2"}]}
]
},
"expected": "records[2]{key,value}:\n  \"a:b\",\"x,y\"\n    meta[1]{t}:\n      \"1\"\n  \"c:d\",z\n    meta[1]{t}:\n      \"2\"",
"note": "Tests delimiter-before-colon detection with values containing colons; numeric-looking strings stay quoted so they decode as strings"
}
**Expected TOON (pretty):**
records[2]{key,value}:
  "a:b","x,y"
    meta[1]{t}:
      "1"
  "c:d",z
    meta[1]{t}:
      "2"
### Test 8: Tab Delimiter
{
"name": "tab_delimiter",
"input": {
"data": [
{"col1": "a", "col2": "b", "nested": [{"x": "1"}, {"x": "2"}]},
{"col1": "c", "col2": "d", "nested": [{"x": "3"}]}
]
},
"expected": "data[2\t]{col1\tcol2}:\n  a\tb\n    nested[2\t]{x}:\n      \"1\"\n      \"2\"\n  c\td\n    nested[1\t]{x}:\n      \"3\"",
"note": "Tests hybrid tabular with tab delimiter; numeric-looking strings stay quoted so they decode as strings",
"encoderOption": "delimiter: tab"
}
**Expected TOON (pretty):**
data[2	]{col1	col2}:
  a	b
    nested[2	]{x}:
      "1"
      "2"
  c	d
    nested[1	]{x}:
      "3"
### Test 9: Quoted Values in Nested Arrays
{
"name": "quoted_nested_values",
"input": {
"events": [
{
"name": "Conference",
"sessions": [
{"title": "Intro, Part 1", "time": "09:00"},
{"title": "Q&A", "time": "10:00"}
]
}
]
},
"expected": "events[1]{name}:\n  Conference\n    sessions[2]{title,time}:\n      \"Intro, Part 1\",\"09:00\"\n      Q&A,\"10:00\"",
"note": "Tests values requiring quoting in nested arrays"
}
**Expected TOON (pretty):**
events[1]{name}:
  Conference
    sessions[2]{title,time}:
      "Intro, Part 1","09:00"
      Q&A,"10:00"
### Test 10: Single Row with Nested Content
{
"name": "single_row_nested",
"input": {
"report": [
{
"title": "Q4 Report",
"metrics": [
{"name": "Revenue", "value": 1000000},
{"name": "Costs", "value": 750000},
{"name": "Profit", "value": 250000}
]
}
]
},
"expected": "report[1]{title}:\n  Q4 Report\n    metrics[3]{name,value}:\n      Revenue,1000000\n      Costs,750000\n      Profit,250000",
"note": "Tests single-element array with nested content"
}
**Expected TOON (pretty):**
report[1]{title}:
  Q4 Report
    metrics[3]{name,value}:
      Revenue,1000000
      Costs,750000
      Profit,250000
Affected Specification Sections
- Section 6: Header Syntax (add nested array hint grammar)
- Section 9.3: Arrays of Objects - Tabular Form (extend for nested content)
- Section 12: Indentation and Whitespace (clarify nested content depth)
- Section 14: Strict Mode Errors and Diagnostics (add hybrid tabular validations)
- Section 19: TOON Core Profile (update tabular array rules)
- Appendix A: Examples (add hybrid tabular examples)
Unresolved Questions
- Nested array hint scope: Should `fieldName[]` hints in headers be informational only, or should parsers validate their presence?
- Maximum nesting depth: Should there be a recommended limit for nested tabular depth?
- Schema inheritance: Should nested arrays be able to inherit field schemas from parent headers? (DRY vs complexity tradeoff)
- Empty nested arrays: How should empty nested arrays (`[0]{fields}:`) be represented?
  - Option A: Just the header line with no rows
  - Option B: Omit entirely if empty
Additional Context
- This proposal addresses token efficiency concerns for LLM context serialization
- CSV typically handles nested data by denormalizing it into repeated rows; TOON can instead preserve the nested structure
- Consistent with TOON's design philosophy of being "YAML-like but more compact"
Checklist
- I have read the RFC process in CONTRIBUTING.md
- I have searched for similar proposals
- I have considered backward compatibility
- I understand this may require community discussion before acceptance