[RFC]: Hybrid Tabular Arrays with Nested Row Content #21

@michal-ciechan

Description

Type of Change

  • Breaking change (incompatible with current spec)
  • Backward-compatible addition
  • Clarification or editorial improvement
  • New optional feature
  • Changes to conformance requirements

Summary

This proposal extends TOON's tabular array syntax to allow nested content per row.

The parent array header declares common fields, each row provides inline values for those fields, and rows can optionally contain additional nested arrays on subsequent indented lines.

Motivation

### Problem

The current TOON v3.0 specification (Section 9.3) restricts tabular format to arrays where:
1. Every element is an object
2. All objects have the **same set of keys**
3. All values are **primitives only** (no nested arrays/objects)

This forces a choice between compact tabular syntax (no nesting) and verbose expanded list syntax (allows nesting but repeats field names).

### Use Case

When serializing data with repeated structure plus per-item nested arrays (e.g., cathedrals with mass times, products with variants, users with roles), the current spec requires the verbose expanded list format, which significantly increases token count for LLM contexts.

### Benefits

- **Reduced token count**: Combines tabular compactness with nested content flexibility
- **Improved readability**: Common fields declared once in header, nested content clearly indented
- **Natural extension**: Builds on existing tabular syntax without breaking changes
- **Consistent with spec**: Rows remain marker-free, matching TOON v3.0 Section 9.3

Detailed Design

### Proposed Syntax

Hybrid tabular arrays use the existing tabular header syntax. Rows provide inline values for declared fields, with optional nested content on subsequent indented lines:


```
cathedrals[3]{name,city,country,description}:
  St. Peter's Basilica,Vatican City,Vatican,Principal church of the Catholic Church
    massTimes[3]{day,time,type}:
      Sunday,"09:00",Papal Mass
      Sunday,"11:00",Solemn Mass
      Daily,"07:00",Low Mass
  Cologne Cathedral,Cologne,Germany,Gothic masterpiece and UNESCO World Heritage Site
    massTimes[2]{day,time,type}:
      Sunday,"10:00",High Mass
      Daily,"08:00",Morning Mass
  St. Paul's Cathedral,London,England,Anglican cathedral and iconic London landmark
    massTimes[2]{day,time,type}:
      Sunday,"10:00",Sung Eucharist
      Sunday,"15:15",Evensong
```


**Optional nested array hint in header:**


```
cathedrals[3]{name,city,country,description,massTimes[]}:
```


The `massTimes[]` hint signals "expect a massTimes array per row" without prescribing schema.

### Encoding Rules

- When encoding arrays of objects with nested arrays, encoders MAY use hybrid tabular format
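- The parent header MUST declare all primitive fields: `key[N]{f1,f2,...}:`
- Rows MUST provide values for all declared primitive fields, in header order; nested array hints (`fieldName[]`) do not take a row value
- Nested content MUST appear one indentation level below its row (row indentation + indentSize)
- Nested arrays MUST carry their own headers; arrays of objects use tabular syntax and MAY themselves be hybrid tabular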
- Encoders MAY include nested array hints (`fieldName[]`) in the parent header
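The encoding rules above can be sketched in Python. This is a minimal, hypothetical illustration, not the reference implementation. `encode_hybrid` and `fmt` are invented names. The sketch assumes a 2-space indent, the comma delimiter, unquoted keys, and arrays whose elements are all objects. Its quoting helper is deliberately conservative: it quotes anything that contains a structural character or looks like a number, boolean, or null. It does not reproduce the full TOON quoting rules.

```python
import json
import re

INDENT = "  "  # assumption: indentSize = 2
DELIM = ","    # assumption: comma delimiter only
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")
_STRUCTURAL = set(',:"\\[]{}\n\r\t')  # conservative: quote anything structural


def fmt(value):
    """Format a primitive. Simplified; not the full TOON quoting rules."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)  # canonical number formatting is not handled here
    s = str(value)
    ambiguous = s in ("", "true", "false", "null") or _NUMBER.fullmatch(s)
    if ambiguous or s != s.strip() or any(ch in _STRUCTURAL for ch in s):
        return json.dumps(s, ensure_ascii=False)
    return s


def encode_hybrid(key, rows, depth=0, hints=False):
    """Encode a list of dicts as hybrid tabular lines (sketch)."""
    pad = INDENT * depth
    if not all(isinstance(r, dict) for r in rows):
        raise ValueError(f"{key!r}: only arrays of objects are handled")
    if not rows:
        return [f"{pad}{key}[0]:"]
    fields, nested = [], []  # first-seen order across all rows
    for row in rows:
        for k, v in row.items():
            if isinstance(v, dict):
                raise ValueError(f"{k!r}: nested objects are out of scope")
            target = nested if isinstance(v, list) else fields
            if k not in target:
                target.append(k)
    if not fields or set(fields) & set(nested):
        raise ValueError(f"{key!r}: need >=1 primitive field; a key cannot be both")
    declared = fields + ([f"{k}[]" for k in nested] if hints else [])
    lines = [f"{pad}{key}[{len(rows)}]{{{DELIM.join(declared)}}}:"]
    for row in rows:
        missing = [f for f in fields if f not in row]
        if missing:
            raise ValueError(f"{key!r}: row is missing declared fields {missing}")
        lines.append(pad + INDENT + DELIM.join(fmt(row[f]) for f in fields))
        for k in nested:  # nested header sits one level below the row
            if k in row:
                lines.extend(encode_hybrid(k, row[k], depth + 2, hints))
    return lines
```

Joining the returned lines with `"\n"` produces the expected strings used in the test cases below. With `hints=True`, the header also lists a `name[]` hint for every nested array key, in the spirit of the proposed `includeNestedArrayHints` option.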

### Decoding Rules

- When parsing a tabular array, decoders MUST check for nested content at depth +1 after each row
- Row disambiguation follows TOON v3.0 Section 9.3 (delimiter-before-colon detection)
- Lines at row depth +1 containing an unquoted colon (for example `massTimes[2]{day,time,type}:`) are nested content for the current row
- Nested arrays are parsed recursively with the same rules, so they MAY themselves be hybrid tabular
- Decoders MUST support nested array hints (`fieldName[]`) in headers as informational
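A matching decoding sketch follows. It uses the same assumptions: a 2-space indent, the comma delimiter, unquoted keys, a single top-level hybrid array, and every nested array introduced by its own header. `decode_hybrid` and its helpers are hypothetical names.

- It treats every line one level below a header as a row, and every line one level below a row as a nested header.
- It enforces the declared `[N]` counts.
- It drops hint entries (`name[]`) from the field list, so they never consume a row value.
- It does not implement the full delimiter-before-colon disambiguation.

```python
import json
import re

INDENT = 2  # assumption: indentSize = 2, comma delimiter, unquoted keys
HEADER = re.compile(r"(?P<key>[A-Za-z_][A-Za-z0-9_.]*)\[(?P<n>\d+)\](?:\{(?P<fields>[^}]*)\})?:")
NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")


def split_row(text, delim=","):
    """Split on unquoted delimiters, keeping quoted tokens (and escapes) intact."""
    parts, cur, quoted, i = [], "", False, 0
    while i < len(text):
        c = text[i]
        if quoted and c == "\\":
            cur += text[i:i + 2]  # keep the escape sequence for json.loads
            i += 2
            continue
        if c == '"':
            quoted = not quoted
        if c == delim and not quoted:
            parts.append(cur)
            cur = ""
        else:
            cur += c
        i += 1
    parts.append(cur)
    return parts


def parse_value(token):
    token = token.strip()
    if token.startswith('"'):
        return json.loads(token)  # assumes TOON escapes are a subset of JSON's
    if token in ("true", "false"):
        return token == "true"
    if token == "null":
        return None
    if NUMBER.fullmatch(token):
        return float(token) if any(c in token for c in ".eE") else int(token)
    return token


def decode_hybrid(text):
    lines = [ln for ln in text.split("\n") if ln.strip()]
    pos = 0

    def depth(line):
        return (len(line) - len(line.lstrip(" "))) // INDENT

    def parse_array(d):
        nonlocal pos
        m = HEADER.fullmatch(lines[pos].strip())
        if m is None:
            raise ValueError(f"expected a header: {lines[pos]!r}")
        fields = [f for f in (m["fields"] or "").split(",") if f and not f.endswith("[]")]
        pos += 1
        rows = []
        while pos < len(lines) and depth(lines[pos]) == d + 1:
            values = [parse_value(v) for v in split_row(lines[pos].strip())]
            if len(values) != len(fields):
                raise ValueError(f"row width mismatch: {lines[pos]!r}")
            row = dict(zip(fields, values))
            pos += 1
            # Nested content: header lines exactly one level deeper than the row.
            while pos < len(lines) and depth(lines[pos]) == d + 2:
                key, value = parse_array(d + 2)
                row[key] = value
            rows.append(row)
        if len(rows) != int(m["n"]):
            raise ValueError(f"{m['key']}: declared {m['n']} rows, found {len(rows)}")
        return m["key"], rows

    key, rows = parse_array(0)
    if pos != len(lines):
        raise ValueError(f"unexpected trailing content: {lines[pos]!r}")
    return {key: rows}
```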

### Grammar Changes


```abnf
; Existing (from TOON v3.0 Section 6)
bracket-seg   = "[" [ "#" ] 1*DIGIT [ delimsym ] "]"
delimsym      = HTAB / "|"
delim         = delimsym / ","

; Field declarations (existing)
fieldname     = key

; Optional nested array hint (new); replaces the existing
; fields-seg = "{" fieldname *( delim fieldname ) "}"
nested-hint   = key "[]"
fields-seg    = "{" ( fieldname / nested-hint ) *( delim ( fieldname / nested-hint ) ) "}"

header        = [ key ] bracket-seg [ fields-seg ] ":"

; Hybrid tabular row (new - extends existing tabular row)
hybrid-row     = value *( delim value ) [ nested-content ]
nested-content = 1*( LF indent nested-array )
nested-array   = key bracket-seg [ fields-seg ] ":" [ SP inline-values / LF rows ]
```

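The header production above maps fairly directly onto a regular expression. This Python sketch recognizes `[ key ] bracket-seg [ fields-seg ] ":"`, honors the delimiter symbol inside the brackets, and separates ordinary field names from `key[]` hints. `parse_header`, `HEADER_RE`, and the returned dictionary shape are hypothetical. The sketch assumes keys are unquoted identifiers matching `[A-Za-z_][A-Za-z0-9_.]*`; quoted keys are not handled.

```python
import re

# Assumption: only unquoted keys matching this identifier pattern are handled.
KEY = r"[A-Za-z_][A-Za-z0-9_.]*"
HEADER_RE = re.compile(
    rf"(?P<key>{KEY})?"                    # [ key ]
    r"\[#?(?P<n>\d+)(?P<delim>[\t|])?\]"   # bracket-seg
    r"(?:\{(?P<fields>[^}]*)\})?"          # [ fields-seg ]
    r":"
)
HINT_RE = re.compile(rf"({KEY})\[\]")      # nested-hint = key "[]"
FIELD_RE = re.compile(KEY)                 # fieldname = key


def parse_header(line):
    """Split a (hybrid) tabular header into key, length, delimiter, fields, hints."""
    m = HEADER_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"not a header: {line!r}")
    delim = m["delim"] or ","
    fields, hints = [], []
    if m["fields"] is not None:
        for name in m["fields"].split(delim):
            hint = HINT_RE.fullmatch(name)
            if hint:
                hints.append(hint.group(1))  # informational: takes no row value
            elif FIELD_RE.fullmatch(name):
                fields.append(name)
            else:
                raise ValueError(f"invalid field declaration: {name!r}")
    return {"key": m["key"], "length": int(m["n"]), "delimiter": delim,
            "fields": fields, "hints": hints}
```

For example, `parse_header("cathedrals[3]{name,city,country,description,massTimes[]}:")` yields four value fields and the single hint `massTimes`.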
Examples

### Before (current spec)


```
cathedrals[3]:
  - name: St. Peter's Basilica
    city: Vatican City
    country: Vatican
    description: Principal church of the Catholic Church
    massTimes[3]:
      - day: Sunday
        time: "09:00"
        type: Papal Mass
      - day: Sunday
        time: "11:00"
        type: Solemn Mass
      - day: Daily
        time: "07:00"
        type: Low Mass
  - name: Cologne Cathedral
    city: Cologne
    country: Germany
    description: Gothic masterpiece and UNESCO World Heritage Site
    massTimes[2]:
      - day: Sunday
        time: "10:00"
        type: High Mass
      - day: Daily
        time: "08:00"
        type: Morning Mass
  - name: St. Paul's Cathedral
    city: London
    country: England
    description: Anglican cathedral and iconic London landmark
    massTimes[2]:
      - day: Sunday
        time: "10:00"
        type: Sung Eucharist
      - day: Sunday
        time: "15:15"
        type: Evensong
```


### After (proposed)


```
cathedrals[3]{name,city,country,description}:
  St. Peter's Basilica,Vatican City,Vatican,Principal church of the Catholic Church
    massTimes[3]{day,time,type}:
      Sunday,"09:00",Papal Mass
      Sunday,"11:00",Solemn Mass
      Daily,"07:00",Low Mass
  Cologne Cathedral,Cologne,Germany,Gothic masterpiece and UNESCO World Heritage Site
    massTimes[2]{day,time,type}:
      Sunday,"10:00",High Mass
      Daily,"08:00",Morning Mass
  St. Paul's Cathedral,London,England,Anglican cathedral and iconic London landmark
    massTimes[2]{day,time,type}:
      Sunday,"10:00",Sung Eucharist
      Sunday,"15:15",Evensong
```


### Use Case: Multiple Nested Arrays


```
cathedrals[2]{name,city,country,description}:
  St. Peter's Basilica,Vatican City,Vatican,Principal church of the Catholic Church
    massTimes[3]{day,time,type}:
      Sunday,"09:00",Papal Mass
      Sunday,"11:00",Solemn Mass
      Daily,"07:00",Low Mass
    confessionTimes[2]{day,time}:
      Saturday,"10:00"
      Saturday,"16:00"
  Cologne Cathedral,Cologne,Germany,Gothic masterpiece and UNESCO World Heritage Site
    massTimes[2]{day,time,type}:
      Sunday,"10:00",High Mass
      Daily,"08:00",Morning Mass
    confessionTimes[2]{day,time}:
      Friday,"17:00"
      Saturday,"10:00"
```


### Use Case: Deeply Nested


```
countries[2]{name,continent}:
  Italy,Europe
    cathedrals[2]{name,city,description}:
      St. Peter's Basilica,Vatican City,Principal church of the Catholic Church
        massTimes[3]{day,time,type}:
          Sunday,"09:00",Papal Mass
          Sunday,"11:00",Solemn Mass
          Daily,"07:00",Low Mass
      Milan Cathedral,Milan,Gothic cathedral dedicated to the Nativity of Mary
        massTimes[1]{day,time,type}:
          Sunday,"10:00",Solemn Mass
  Germany,Europe
    cathedrals[1]{name,city,description}:
      Cologne Cathedral,Cologne,Gothic masterpiece and UNESCO World Heritage Site
        massTimes[2]{day,time,type}:
          Sunday,"10:00",High Mass
          Daily,"08:00",Morning Mass
```


### Use Case: With Nested Array Hint


```
cathedrals[3]{name,city,country,description,massTimes[]}:
  St. Peter's Basilica,Vatican City,Vatican,Principal church of the Catholic Church
    massTimes[3]{day,time,type}:
      Sunday,"09:00",Papal Mass
      Sunday,"11:00",Solemn Mass
      Daily,"07:00",Low Mass
  Cologne Cathedral,Cologne,Germany,Gothic masterpiece and UNESCO World Heritage Site
    massTimes[2]{day,time,type}:
      Sunday,"10:00",High Mass
      Daily,"08:00",Morning Mass
  St. Paul's Cathedral,London,England,Anglican cathedral and iconic London landmark
    massTimes[2]{day,time,type}:
      Sunday,"10:00",Sung Eucharist
      Sunday,"15:15",Evensong
```


### Use Case: Rows Without Nested Content (Fallback)

When no nested content is needed, hybrid tabular is identical to standard tabular:


```
cathedrals[3]{name,city,country,description}:
  St. Peter's Basilica,Vatican City,Vatican,Principal church of the Catholic Church
  Cologne Cathedral,Cologne,Germany,Gothic masterpiece and UNESCO World Heritage Site
  St. Paul's Cathedral,London,England,Anglican cathedral and iconic London landmark
```

Drawbacks

  • Increased parser complexity: Parsers must check for nested content after each row
  • Indentation sensitivity: Deeper nesting requires careful indentation management
  • Potential ambiguity: Row disambiguation must handle nested content detection
  • Learning curve: Users must understand when hybrid tabular applies vs expanded list

Alternatives Considered

Alternative 1: List Markers for Hybrid Rows

```
cathedrals[3]{name,city,country,description}:
  - St. Peter's Basilica,Vatican City,Vatican,Principal church
    massTimes[2]{day,time,type}:
      Sunday,"09:00",Papal Mass
      Sunday,"11:00",Solemn Mass
```

Rejected because: Deviates from TOON v3.0 tabular syntax, which uses no `-` markers. Introduces inconsistency where tabular rows sometimes have markers and sometimes don't.

Alternative 2: Schema Inheritance in Header

```
cathedrals[3]{name,city,country,description,massTimes[]{day,time,type}}:
  St. Peter's Basilica,Vatican City,Vatican,Principal church
    massTimes[2]:
      Sunday,"09:00",Papal Mass
      Sunday,"11:00",Solemn Mass
```

Deferred: More complex parsing, reduces readability when viewing rows in isolation. Could be added as optional enhancement in future version.

Do Nothing

Not acceptable because: Forces verbose expanded list format for common use cases, significantly increasing token count in LLM contexts where compactness is valuable.

Impact on Implementations

  • Reference implementation: Requires extending tabular row parsing to check for nested content at depth +1
  • Community implementations: Need to update parsers to handle nested content detection
  • Backward compatibility: Existing documents remain valid; new syntax is additive
  • Encoder options: New `allowHybridTabular` and `includeNestedArrayHints` options

Migration Strategy

For Implementers

  1. Update tabular row parser to check next line's depth
  2. If next line is at row depth +1 with a colon, parse as nested content
  3. Add support for fieldName[] hints in field declarations
  4. Add encoder options for hybrid tabular output
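Steps 1 and 2 amount to classifying the line that follows each row. This minimal Python sketch assumes a 2-space indent and non-blank input lines; `Next`, `is_row`, and `classify_next` are hypothetical names. It classifies the next line as follows:

- A line at the row's depth is either another row or, under the delimiter-before-colon rule, a sibling `key: value` line that ends the table.
- A line one level deeper that contains an unquoted colon is nested content.
- Anything deeper should already have been consumed by the nested array's own parser.

```python
from enum import Enum


class Next(Enum):
    ROW = "row"        # another row of the same table
    NESTED = "nested"  # nested content owned by the current row
    END = "end"        # the table is finished


def _first_unquoted(text, targets):
    """Return the first unquoted character of `text` found in `targets`, or None."""
    quoted, i = False, 0
    while i < len(text):
        c = text[i]
        if quoted and c == "\\":
            i += 2  # skip an escaped character inside quotes
            continue
        if c == '"':
            quoted = not quoted
        elif not quoted and c in targets:
            return c
        i += 1
    return None


def is_row(text, delim=","):
    """Delimiter-before-colon rule: not a row only if an unquoted colon comes first."""
    return _first_unquoted(text, delim + ":") != ":"


def classify_next(line, row_depth, indent_size=2, delim=","):
    """Classify the (non-blank) line that follows a row at `row_depth`."""
    stripped = line.lstrip(" ")
    depth = (len(line) - len(stripped)) // indent_size
    if depth < row_depth:
        return Next.END
    if depth == row_depth:
        # Same depth: another row, or a sibling `key: value` line ending the table.
        return Next.ROW if is_row(stripped, delim) else Next.END
    if depth == row_depth + 1 and _first_unquoted(stripped, ":") == ":":
        return Next.NESTED  # e.g. `massTimes[2]{day,time,type}:`
    raise ValueError(f"unexpected line at depth {depth}: {line!r}")
```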

For Users

Existing TOON documents require no changes. To use hybrid tabular:

  1. Convert expanded list arrays to tabular format with field headers
  2. Move nested arrays to indented lines below each row
  3. Optionally add nested array hints to parent header

Test Cases

### Test 1: Basic Hybrid Tabular


```json
{
  "name": "basic_hybrid_tabular",
  "input": {
    "items": [
      {"name": "A", "tags": [{"k": "v1"}, {"k": "v2"}]},
      {"name": "B", "tags": [{"k": "v3"}]}
    ]
  },
  "expected": "items[2]{name}:\n  A\n    tags[2]{k}:\n      v1\n      v2\n  B\n    tags[1]{k}:\n      v3",
  "note": "Tests basic hybrid tabular with nested arrays"
}
```

**Expected TOON (pretty):**

```
items[2]{name}:
  A
    tags[2]{k}:
      v1
      v2
  B
    tags[1]{k}:
      v3
```


### Test 2: Multiple Nested Arrays


```json
{
  "name": "multiple_nested_arrays",
  "input": {
    "cathedrals": [
      {
        "name": "St. Peter's",
        "city": "Vatican",
        "massTimes": [{"day": "Sunday", "time": "09:00"}],
        "confessionTimes": [{"day": "Saturday", "time": "10:00"}]
      },
      {
        "name": "Cologne",
        "city": "Germany",
        "massTimes": [{"day": "Sunday", "time": "10:00"}],
        "confessionTimes": [{"day": "Friday", "time": "17:00"}]
      }
    ]
  },
  "expected": "cathedrals[2]{name,city}:\n  St. Peter's,Vatican\n    massTimes[1]{day,time}:\n      Sunday,\"09:00\"\n    confessionTimes[1]{day,time}:\n      Saturday,\"10:00\"\n  Cologne,Germany\n    massTimes[1]{day,time}:\n      Sunday,\"10:00\"\n    confessionTimes[1]{day,time}:\n      Friday,\"17:00\"",
  "note": "Tests rows with multiple nested arrays as siblings"
}
```

**Expected TOON (pretty):**

```
cathedrals[2]{name,city}:
  St. Peter's,Vatican
    massTimes[1]{day,time}:
      Sunday,"09:00"
    confessionTimes[1]{day,time}:
      Saturday,"10:00"
  Cologne,Germany
    massTimes[1]{day,time}:
      Sunday,"10:00"
    confessionTimes[1]{day,time}:
      Friday,"17:00"
```


### Test 3: Deeply Nested


```json
{
  "name": "deeply_nested",
  "input": {
    "countries": [
      {
        "name": "Italy",
        "cities": [
          {
            "name": "Rome",
            "landmarks": [{"name": "Colosseum"}, {"name": "Pantheon"}]
          }
        ]
      },
      {
        "name": "France",
        "cities": [
          {
            "name": "Paris",
            "landmarks": [{"name": "Eiffel Tower"}]
          }
        ]
      }
    ]
  },
  "expected": "countries[2]{name}:\n  Italy\n    cities[1]{name}:\n      Rome\n        landmarks[2]{name}:\n          Colosseum\n          Pantheon\n  France\n    cities[1]{name}:\n      Paris\n        landmarks[1]{name}:\n          Eiffel Tower",
  "note": "Tests nested arrays within nested arrays"
}
```

**Expected TOON (pretty):**

```
countries[2]{name}:
  Italy
    cities[1]{name}:
      Rome
        landmarks[2]{name}:
          Colosseum
          Pantheon
  France
    cities[1]{name}:
      Paris
        landmarks[1]{name}:
          Eiffel Tower
```


### Test 4: Nested Array Hint in Header


```json
{
  "name": "nested_array_hint",
  "input": {
    "products": [
      {"sku": "A1", "name": "Widget", "variants": [{"size": "S"}, {"size": "M"}]},
      {"sku": "B2", "name": "Gadget", "variants": [{"size": "L"}]}
    ]
  },
  "expected": "products[2]{sku,name,variants[]}:\n  A1,Widget\n    variants[2]{size}:\n      S\n      M\n  B2,Gadget\n    variants[1]{size}:\n      L",
  "note": "Tests fieldName[] hint in header",
  "encoderOption": "includeNestedArrayHints: true"
}
```

**Expected TOON (pretty):**

```
products[2]{sku,name,variants[]}:
  A1,Widget
    variants[2]{size}:
      S
      M
  B2,Gadget
    variants[1]{size}:
      L
```


### Test 5: Empty Nested Array


```json
{
  "name": "empty_nested_array",
  "input": {
    "users": [
      {"name": "Alice", "roles": [{"name": "admin"}, {"name": "user"}]},
      {"name": "Bob", "roles": []}
    ]
  },
  "expected": "users[2]{name}:\n  Alice\n    roles[2]{name}:\n      admin\n      user\n  Bob\n    roles[0]:",
  "note": "Tests an empty nested array ([0]:) within a row; field names cannot be derived from an empty array"
}
```

**Expected TOON (pretty):**

```
users[2]{name}:
  Alice
    roles[2]{name}:
      admin
      user
  Bob
    roles[0]:
```


### Test 6: Mixed Rows (Some With Nested, Some Without)


```json
{
  "name": "mixed_rows",
  "input": {
    "items": [
      {"id": 1, "name": "Simple"},
      {"id": 2, "name": "Complex", "tags": [{"k": "a"}, {"k": "b"}]},
      {"id": 3, "name": "Another Simple"}
    ]
  },
  "expected": "items[3]{id,name}:\n  1,Simple\n  2,Complex\n    tags[2]{k}:\n      a\n      b\n  3,Another Simple",
  "note": "Tests some rows with nested content, others without"
}
```

**Expected TOON (pretty):**

```
items[3]{id,name}:
  1,Simple
  2,Complex
    tags[2]{k}:
      a
      b
  3,Another Simple
```


### Test 7: Row Disambiguation


```json
{
  "name": "row_disambiguation",
  "input": {
    "records": [
      {"key": "a:b", "value": "x,y", "meta": [{"t": "1"}]},
      {"key": "c:d", "value": "z", "meta": [{"t": "2"}]}
    ]
  },
  "expected": "records[2]{key,value}:\n  \"a:b\",\"x,y\"\n    meta[1]{t}:\n      \"1\"\n  \"c:d\",z\n    meta[1]{t}:\n      \"2\"",
  "note": "Tests delimiter-before-colon detection with values containing colons; numeric-looking strings stay quoted"
}
```

**Expected TOON (pretty):**

```
records[2]{key,value}:
  "a:b","x,y"
    meta[1]{t}:
      "1"
  "c:d",z
    meta[1]{t}:
      "2"
```


### Test 8: Tab Delimiter


```json
{
  "name": "tab_delimiter",
  "input": {
    "data": [
      {"col1": "a", "col2": "b", "nested": [{"x": "1"}, {"x": "2"}]},
      {"col1": "c", "col2": "d", "nested": [{"x": "3"}]}
    ]
  },
  "expected": "data[2\t]{col1\tcol2}:\n  a\tb\n    nested[2\t]{x}:\n      \"1\"\n      \"2\"\n  c\td\n    nested[1\t]{x}:\n      \"3\"",
  "note": "Tests hybrid tabular with tab delimiter",
  "encoderOption": "delimiter: tab"
}
```

**Expected TOON (pretty):**

```
data[2	]{col1	col2}:
  a	b
    nested[2	]{x}:
      "1"
      "2"
  c	d
    nested[1	]{x}:
      "3"
```


### Test 9: Quoted Values in Nested Arrays


```json
{
  "name": "quoted_nested_values",
  "input": {
    "events": [
      {
        "name": "Conference",
        "sessions": [
          {"title": "Intro, Part 1", "time": "09:00"},
          {"title": "Q&A", "time": "10:00"}
        ]
      }
    ]
  },
  "expected": "events[1]{name}:\n  Conference\n    sessions[2]{title,time}:\n      \"Intro, Part 1\",\"09:00\"\n      Q&A,\"10:00\"",
  "note": "Tests values requiring quoting in nested arrays"
}
```

**Expected TOON (pretty):**

```
events[1]{name}:
  Conference
    sessions[2]{title,time}:
      "Intro, Part 1","09:00"
      Q&A,"10:00"
```


### Test 10: Single Row with Nested Content


```json
{
  "name": "single_row_nested",
  "input": {
    "report": [
      {
        "title": "Q4 Report",
        "metrics": [
          {"name": "Revenue", "value": 1000000},
          {"name": "Costs", "value": 750000},
          {"name": "Profit", "value": 250000}
        ]
      }
    ]
  },
  "expected": "report[1]{title}:\n  Q4 Report\n    metrics[3]{name,value}:\n      Revenue,1000000\n      Costs,750000\n      Profit,250000",
  "note": "Tests single-element array with nested content"
}
```

**Expected TOON (pretty):**

```
report[1]{title}:
  Q4 Report
    metrics[3]{name,value}:
      Revenue,1000000
      Costs,750000
      Profit,250000
```

Affected Specification Sections

  • Section 6: Header Syntax (add nested array hint grammar)
  • Section 9.3: Arrays of Objects - Tabular Form (extend for nested content)
  • Section 12: Indentation and Whitespace (clarify nested content depth)
  • Section 14: Strict Mode Errors and Diagnostics (add hybrid tabular validations)
  • Section 19: TOON Core Profile (update tabular array rules)
  • Appendix A: Examples (add hybrid tabular examples)

Unresolved Questions

  1. Nested array hint scope: Should fieldName[] hints in headers be informational only, or should parsers validate their presence?

  2. Maximum nesting depth: Should there be a recommended limit for nested tabular depth?

  3. Schema inheritance: Should nested arrays be able to inherit field schemas from parent headers? (DRY vs complexity tradeoff)

  4. Empty nested arrays: How should empty nested arrays (a `[0]` header) be represented?

    • Option A: Just the header line with no rows
    • Option B: Omit entirely if empty

Additional Context

  • This proposal addresses token efficiency concerns for LLM context serialization
  • Unlike CSV, which can only represent nested data by denormalizing it (repeating parent fields on every child row), TOON can preserve the structure
  • Consistent with TOON's design philosophy of being "YAML-like but more compact"
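To make the CSV contrast concrete, the sketch below denormalizes two cathedrals from the examples above, trimmed to a few fields, into one CSV table. Each mass time becomes a row that repeats its cathedral's fields. It uses Python's standard `csv` module; `denormalize` is a hypothetical helper written only for this illustration.

```python
import csv
import io

# Two cathedrals from the examples above, trimmed to a few fields.
cathedrals = [
    {"name": "Cologne Cathedral", "city": "Cologne",
     "massTimes": [{"day": "Sunday", "time": "10:00"}, {"day": "Daily", "time": "08:00"}]},
    {"name": "St. Paul's Cathedral", "city": "London",
     "massTimes": [{"day": "Sunday", "time": "10:00"}, {"day": "Sunday", "time": "15:15"}]},
]


def denormalize(rows):
    """Flatten parent + child rows into one CSV table, repeating parent fields."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "city", "day", "time"])
    for parent in rows:
        for child in parent["massTimes"]:
            writer.writerow([parent["name"], parent["city"], child["day"], child["time"]])
    return out.getvalue()


print(denormalize(cathedrals))
```

Each cathedral's name and city appear once per mass time, and a cathedral with no mass times would produce no CSV row at all. Hybrid tabular instead states each parent row once and keeps its nested array beneath it.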

Checklist

  • I have read the RFC process in CONTRIBUTING.md
  • I have searched for similar proposals
  • I have considered backward compatibility
  • I understand this may require community discussion before acceptance

Metadata

Labels: enhancement (New feature or request)