Commit 0bd6c96

feat(spec): v3: standardized encoding for list-item objects
1 parent 7fddfa2 commit 0bd6c96

6 files changed

Lines changed: 131 additions & 123 deletions

CHANGELOG.md

Lines changed: 21 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -5,6 +5,27 @@ All notable changes to the TOON specification will be documented in this file.
55
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
66
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
77

8+
## [3.0] - 2025-11-24
9+
10+
### Breaking Changes
11+
12+
- Standardized encoding for list-item objects whose first field is a tabular array (§10):
13+
- Encoders MUST emit `- key[N]{fields}:` on the hyphen line.
14+
- Tabular rows MUST appear at depth +2 relative to the hyphen line.
15+
- All other fields of the same object MUST appear at depth +1.
16+
- The v2.0 shallow form (rows and fields at the same depth) and the v2.1 bare-hyphen form are no longer normative and MUST NOT be emitted by conforming encoders.
17+
18+
### Changed
19+
20+
- Encoding/decoding rules (§10) simplified to describe only the YAML-style pattern; legacy layouts are treated as generic nesting and are not covered by conformance tests.
21+
- Nested tabular list-item example in Appendix A updated to the canonical v3.0 form.
22+
23+
### Migration from v2.1
24+
25+
- Update encoders to emit the YAML-style form for list-item objects whose first field is a tabular array.
26+
- If you rely on v2.0/v2.1 layouts, keep decoder compatibility in non-strict or implementation-defined modes; the spec no longer requires or tests these patterns.
27+
- Optionally regenerate existing `.toon` files for consistent v3 formatting.
28+
829
## [2.1] - 2025-11-23
930

1031
### Changed

README.md

Lines changed: 3 additions & 3 deletions
@@ -1,7 +1,7 @@
11
# TOON Format Specification
22

3-
[![SPEC v2.1](https://img.shields.io/badge/spec-v2.1-lightgrey)](./SPEC.md)
4-
[![Tests](https://img.shields.io/badge/tests-344-green)](./tests/fixtures/)
3+
[![SPEC v3.0](https://img.shields.io/badge/spec-v3.0-lightgrey)](./SPEC.md)
4+
[![Tests](https://img.shields.io/badge/tests-345-green)](./tests/fixtures/)
55
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)
66

77
This repository contains the official specification for **Token-Oriented Object Notation (TOON)**, a compact, human-readable encoding of the JSON data model for LLM prompts. It provides a lossless serialization of the same objects, arrays, and primitives as JSON, but in a syntax that minimizes tokens and makes structure easy for models to follow.
@@ -10,7 +10,7 @@ This repository contains the official specification for **Token-Oriented Object
1010

1111
[→ Read the full specification (SPEC.md)](./SPEC.md)
1212

13-
- **Version:** 2.1 (2025-11-23)
13+
- **Version:** 3.0 (2025-11-24)
1414
- **Status:** Working Draft
1515
- **License:** MIT
1616

SPEC.md

Lines changed: 50 additions & 52 deletions
@@ -2,9 +2,9 @@
22

33
## Token-Oriented Object Notation
44

5-
**Version:** 2.1
5+
**Version:** 3.0
66

7-
**Date:** 2025-11-23
7+
**Date:** 2025-11-24
88

99
**Status:** Working Draft
1010

@@ -20,7 +20,7 @@ Token-Oriented Object Notation (TOON) is a line-oriented, indentation-based text
2020

2121
## Status of This Document
2222

23-
This document is a Working Draft v2.1 and may be updated, replaced, or obsoleted. Implementers should monitor the canonical repository at https://github.com/toon-format/spec for changes.
23+
This document is a Working Draft v3.0 and may be updated, replaced, or obsoleted. Implementers should monitor the canonical repository at https://github.com/toon-format/spec for changes.
2424

2525
This specification is stable for implementation but not yet finalized. Breaking changes may occur in future major versions.
2626

@@ -227,12 +227,11 @@ Implementations that fail to conform to any MUST or REQUIRED level requirement a
227227

228228
## 3. Encoding Normalization (Reference Encoder)
229229

230-
Encoders MUST normalize non-JSON values to the JSON data model before encoding:
230+
Encoders MUST normalize non-JSON values to the JSON data model before encoding. The mapping from host-specific types to JSON model is implementation-defined and MUST be documented.
231231

232232
- Number:
233233
- Finite → number (canonical decimal form per Section 2). -0 → 0.
234234
- NaN, +Infinity, -Infinity → null.
235-
- Non-JSON types MUST be normalized to the JSON data model (object, array, string, number, boolean, or null) before encoding. The mapping from host-specific types to JSON model is implementation-defined and MUST be documented.
236235
- Examples of host-type normalization (non-normative):
237236
- Date/time objects → ISO 8601 string representation.
238237
- Set-like collections → array.
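The normalization rules in this hunk can be illustrated with a minimal Python sketch. It is not the reference encoder: the function name `normalize` is hypothetical, and the host-type mapping (dates to ISO 8601 strings, sets to lists) follows only the non-normative examples shown above.

```python
import math
from datetime import date, datetime


def normalize(value):
    """Map a Python value onto the JSON data model (illustrative sketch only)."""
    if value is None or isinstance(value, (bool, str, int)):
        return value
    if isinstance(value, float):
        if not math.isfinite(value):
            return None  # NaN, +Infinity, -Infinity -> null
        return 0 if value == 0 else value  # -0.0 -> 0
    if isinstance(value, (datetime, date)):
        return value.isoformat()  # date/time -> ISO 8601 string
    if isinstance(value, (set, frozenset)):
        # Set iteration order is unspecified; a real encoder should document its choice.
        return [normalize(v) for v in value]
    if isinstance(value, dict):
        return {str(k): normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    raise TypeError(f"no documented mapping for {type(value).__name__}")
```

Raising on unknown types is one possible design choice; the spec only requires that whatever mapping an implementation picks is documented.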
@@ -384,9 +383,9 @@ A string value MUST be quoted if any of the following is true:
384383
- It contains a colon (:), double quote ("), or backslash (\).
385384
- It contains brackets or braces ([, ], {, }).
386385
- It contains control characters: newline, carriage return, or tab.
387-
- It contains the relevant delimiter:
388-
- Inside array scope: the active delimiter (Section 1).
389-
- Outside array scope: the document delimiter (Section 1).
386+
- It contains the relevant delimiter (see §11 for complete delimiter rules):
387+
- For inline array values and tabular row cells: the active delimiter from the nearest array header.
388+
- For object field values (key: value): the document delimiter, even when the object is within an array's scope.
390389
- It equals "-" or starts with "-" (any hyphen at position 0).
391390
392391
Otherwise, the string MAY be emitted without quotes. Unicode, emoji, and strings with internal (non-leading/trailing) spaces are safe unquoted provided they do not violate the conditions.
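As a rough illustration of the quoting conditions visible in this hunk, the sketch below checks a string against them. The function name `must_quote` and its `delimiter` parameter are hypothetical. The caller is assumed to pass the active delimiter for inline array values and tabular row cells, and the document delimiter for object field values. Conditions listed earlier in §7.2 but outside this hunk are not covered.

```python
def must_quote(value: str, delimiter: str = ",") -> bool:
    """Return True if any quoting condition shown in this hunk applies."""
    if any(ch in value for ch in ':"\\'):
        return True  # colon, double quote, or backslash
    if any(ch in value for ch in "[]{}"):
        return True  # brackets or braces
    if any(ch in value for ch in "\n\r\t"):
        return True  # control characters
    if delimiter in value:
        return True  # the relevant delimiter for this context
    if value.startswith("-"):
        return True  # equals "-" or starts with a hyphen
    return False
```

For example, `must_quote("a,b")` holds under the default comma delimiter, while `must_quote("a,b", "|")` does not, since a non-active delimiter alone does not force quoting.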
@@ -403,12 +402,10 @@ Encoders MAY perform key folding when enabled (see §13.4 for complete folding r
403402
404403
### 7.4 Decoding Rules for Strings and Keys (Decoding)
405404
406-
- Quoted strings and keys MUST be unescaped per Section 7.1; any other escape MUST error. Quoted primitives remain strings.
407-
- Unquoted values:
408-
- true/false/null → boolean/null
409-
- Numeric tokens → numbers (with the leading-zero rule in Section 4)
410-
- Otherwise → strings
411-
- Keys (quoted or unquoted) MUST be followed by ":"; missing colon MUST error.
405+
Decoding of value tokens follows §4 (unquoted type inference, quoted strings, numeric rules). This section adds key-specific requirements:
406+
407+
- Quoted keys MUST be unescaped per Section 7.1; any other escape MUST error.
408+
- Keys (quoted or unquoted) MUST be followed by ":"; missing colon MUST error (see also §14.2).
412409
413410
## 8. Objects
414411
@@ -421,7 +418,6 @@ Encoders MAY perform key folding when enabled (see §13.4 for complete folding r
421418
- Decoding:
422419
- A line "key:" with nothing after the colon at depth d opens an object; subsequent lines at depth > d belong to that object until the depth decreases to ≤ d.
423420
- Lines "key: value" at the same depth are sibling fields.
424-
- Missing colon after a key MUST error.
425421
426422
## 9. Arrays
427423
@@ -474,6 +470,7 @@ Decoding:
474470
- Delimiter before colon → row.
475471
- Colon before delimiter → key-value line (end of rows).
476472
- If a line has an unquoted colon but no unquoted active delimiter → key-value line (end of rows).
473+
- When a tabular array appears as the first field of a list-item object, indentation is governed by Section 10.
477474
478475
### 9.4 Mixed / Non-Uniform Arrays — Expanded List
479476
@@ -499,48 +496,44 @@ Decoding:
499496
For an object appearing as a list item:
500497
501498
- Empty object list item: a single "-" at the list-item indentation level.
502-
- Encoding selection (normative):
503-
- When an object has **exactly one field** and that field encodes to a tabular array, encoders SHOULD use the compact form with the tabular header on the hyphen line:
504-
- Tabular array: - key[N<delim?>]{fields}:
505-
- Followed by tabular rows at depth +1 (relative to the hyphen line).
506-
- For all other cases (multiple fields, or single non-tabular field), encoders SHOULD emit a bare hyphen on its own line:
507-
- Bare hyphen: -
508-
- All fields appear at depth +1 under the hyphen line in encounter order, using normal object field rules (Section 8).
509-
- When a field is a tabular array, its header appears at depth +1 and its rows at depth +2 (relative to the hyphen line).
510-
- First field on the hyphen line (legacy encoding, still valid for decoding):
511-
- Primitive: - key: value
512-
- Primitive array: - key[M<delim?>]: v1<delim>…
513-
- Tabular array: - key[N<delim?>]{fields}:
514-
- Followed by tabular rows at depth +1 (relative to the hyphen line).
515-
- Non-uniform array: - key[N<delim?>]:
516-
- Followed by list items at depth +1.
517-
- Object: - key:
518-
- Nested object fields appear at depth +2 (i.e., one deeper than subsequent sibling fields of the same list item).
519-
- Remaining fields of the same object appear at depth +1 under the hyphen line in encounter order, using normal object field rules.
520-
521-
Decoding:
522-
- The first field is parsed from the hyphen line. If it is a nested object (- key:), nested fields are at +2 relative to the hyphen line; subsequent fields of the same list item are at +1.
523-
- If the first field is a tabular header on the hyphen line, its rows are at +1; subsequent sibling fields continue at +1 after the rows.
499+
- Encoding (normative):
500+
- When a list-item object has a tabular array (Section 9.3) as its first field in encounter order, encoders MUST emit the tabular header on the hyphen line:
501+
- The hyphen and tabular header appear on the same line at the list-item depth: - key[N<delim?>]{fields}:
502+
- Tabular rows MUST appear at depth +2 (relative to the hyphen line).
503+
- All other fields of the same object MUST appear at depth +1 under the hyphen line, in encounter order, using normal object field rules (Section 8).
504+
- Encoders MUST NOT emit tabular rows at depth +1 or sibling fields at the same depth as rows when the first field is a tabular array.
505+
- For all other cases (first field is not a tabular array), encoders SHOULD place the first field on the hyphen line. A bare hyphen on its own line is used only for empty list-item objects.
506+
- Decoding (normative):
507+
- When a decoder encounters a list-item line of the form - key[N<delim?>]{fields}: at depth d, it MUST treat this as the start of a tabular array field named key in the list-item object.
508+
- Lines at depth d+2 that conform to tabular row syntax (Section 9.3) are rows of that tabular array.
509+
- Lines at depth d+1 are additional fields of the same list-item object; the presence of a line at depth d+1 after rows terminates the rows.
510+
- All other object-as-list-item patterns (bare hyphen, first field on hyphen line for non-tabular values) are decoded according to the general rules in Section 8 and Section 9.
524511
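The decoding rules added for §10 can be sketched in Python as follows. This is a simplified illustration, not a conforming decoder: it assumes the comma delimiter, an indent size of 2, unquoted cells, and primitive sibling fields, and the names `decode_list_item`, `depth_of`, and `parse_scalar` are hypothetical.

```python
import re

# Matches "- key[N]{f1,f2}:" (comma delimiter only in this sketch).
HEADER = re.compile(r"^- (?P<key>[^\[\s]+)\[(?P<n>\d+)\]\{(?P<fields>[^}]*)\}:$")


def depth_of(line: str, indent: int = 2) -> int:
    return (len(line) - len(line.lstrip(" "))) // indent


def parse_scalar(token: str):
    token = token.strip()
    return int(token) if re.fullmatch(r"-?\d+", token) else token


def decode_list_item(lines: list[str], indent: int = 2) -> dict:
    """Decode one list item whose first field is a tabular array."""
    d = depth_of(lines[0], indent)
    m = HEADER.match(lines[0].strip())
    if m is None:
        raise ValueError("not a tabular header on the hyphen line")
    fields = m["fields"].split(",")
    rows: list[dict] = []
    obj = {m["key"]: rows}
    in_rows = True
    for line in lines[1:]:
        depth, text = depth_of(line, indent), line.strip()
        if in_rows and depth == d + 2:
            # Rows live at depth d+2.
            rows.append(dict(zip(fields, map(parse_scalar, text.split(",")))))
        elif depth == d + 1:
            # A line at depth d+1 terminates the rows and is a sibling field.
            in_rows = False
            key, _, raw = text.partition(":")
            obj[key] = parse_scalar(raw)
        else:
            raise ValueError(f"unexpected depth {depth} for line {text!r}")
    if len(rows) != int(m["n"]):
        raise ValueError("row count does not match declared length")
    return obj
```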
525512
## 11. Delimiters
526513
527514
- Supported delimiters:
528515
- Comma (default): header omits the delimiter symbol.
529516
- Tab: header includes HTAB inside brackets and braces (e.g., [N<TAB>], {a<TAB>b}); rows/inline arrays use tabs.
530517
- Pipe: header includes "|" inside brackets and braces; rows/inline arrays use "|".
531-
- Document vs Active delimiter:
532-
- Encoders select a document delimiter (option) that influences quoting for all object values (key: value) throughout the document.
533-
- Inside an array header's scope, the active delimiter governs splitting and quoting only for inline arrays and tabular rows that the header introduces. Object values (key: value) follow document-delimiter quoting rules regardless of array scope.
534-
- Delimiter-aware quoting (encoding):
535-
- Inline array values and tabular row cells: strings containing the active delimiter MUST be quoted to avoid splitting.
536-
- Object values (key: value): encoders use the document delimiter to decide delimiter-aware quoting, regardless of whether the object appears within an array's scope.
537-
- Strings containing non-active delimiters do not require quoting unless another quoting condition applies (Section 7.2).
538-
- Delimiter-aware parsing (decoding):
539-
- Inline arrays and tabular rows MUST be split only on the active delimiter declared by the nearest array header.
518+
519+
### 11.1 Encoding Rules (Normative for Encoders)
520+
521+
- Document delimiter: Encoders select a document delimiter (option: comma, tab, pipe; default comma) that influences quoting for all object field values (key: value) throughout the document.
522+
- Active delimiter: Inside an array header's scope, the active delimiter governs quoting only for inline array values and tabular row cells.
523+
- Delimiter-aware quoting:
524+
- Inline array values and tabular row cells: strings containing the active delimiter MUST be quoted.
525+
- Object field values (key: value): encoders use the document delimiter to decide delimiter-aware quoting, regardless of whether the object appears within an array's scope.
526+
- Strings containing non-active delimiters do not require quoting unless another condition applies (§7.2).
527+
528+
### 11.2 Decoding Rules (Normative for Decoders)
529+
530+
- Active delimiter: Decoders use only the active delimiter declared by the nearest array header to split inline arrays and tabular rows.
531+
- Delimiter-aware parsing:
532+
- Inline arrays and tabular rows MUST be split only on the active delimiter.
540533
- Splitting MUST preserve empty tokens; surrounding spaces are trimmed, and empty tokens decode to the empty string.
541-
- Strings containing the active delimiter MUST be quoted to avoid splitting; non-active delimiters MUST NOT cause splits.
542534
- Nested headers may change the active delimiter; decoding MUST use the delimiter declared by the nearest header.
543-
- If the bracket declares tab or pipe, the same symbol MUST be used in the fields segment and for splitting all rows/values in that scope.
535+
- If the bracket declares tab or pipe, the same symbol MUST be used in the fields segment and for splitting all rows/values in that scope (§6).
536+
- Object field values (key: value): Decoders parse the entire post-colon token as a single value; document delimiter is not a decoder concept.
544537
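The splitting behavior described in §11.2 might look like the following Python sketch. The function name `split_row` is hypothetical. Quoted tokens are returned with their quotes intact, so unescaping per §7.1 would be a separate step that is not shown here.

```python
def split_row(line: str, delimiter: str = ",") -> list[str]:
    """Split an inline array or tabular row on the active delimiter only."""
    tokens: list[str] = []
    current: list[str] = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes and ch == "\\" and i + 1 < len(line):
            # Keep escape sequences together so an escaped quote does not end the string.
            current.append(line[i:i + 2])
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == delimiter and not in_quotes:
            tokens.append("".join(current).strip(" "))  # trim surrounding spaces
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    tokens.append("".join(current).strip(" "))
    return tokens
```

Empty tokens are preserved, so `split_row("a,,b")` yields three tokens, and a comma inside a pipe-delimited row does not cause a split.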
545538
## 12. Indentation and Whitespace
546539
@@ -738,12 +731,14 @@ When strict mode is enabled (default), decoders MUST error on the following cond
738731
739732
### 14.3 Indentation Errors
740733
734+
See §12 for indentation semantics. In strict mode, decoders MUST error on:
741735
- Leading spaces not a multiple of indentSize.
742736
- Any tab used in indentation (tabs allowed in quoted strings and as HTAB delimiter).
743737
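A strict-mode indentation check along the lines of §14.3 could be sketched as below. The function name `check_indentation` is hypothetical, and `indent_size` stands in for the spec's indentSize option, assumed here to default to 2.

```python
def check_indentation(line: str, indent_size: int = 2) -> int:
    """Return the depth of a line, or raise on a strict-mode indentation error."""
    leading = line[: len(line) - len(line.lstrip(" \t"))]
    if "\t" in leading:
        raise ValueError("tab used in indentation")
    if len(leading) % indent_size != 0:
        raise ValueError("leading spaces are not a multiple of indentSize")
    return len(leading) // indent_size
```

Only the leading whitespace is inspected, so tabs inside quoted strings or used as the HTAB delimiter later in the line are unaffected.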
744738
### 14.4 Structural Errors
745739
746-
- Blank lines inside arrays/tabular rows.
740+
See §12 for blank line semantics. In strict mode, decoders MUST error on:
741+
- Blank lines inside arrays/tabular rows (between the first and last item/row).
747742
748743
For root-form rules, including handling of empty documents, see §5.
749744
@@ -1000,14 +995,13 @@ items[2]:
1000995
Nested tabular inside a list item:
1001996
```
1002997
items[1]:
1003-
  -
1004-
    users[2]{id,name}:
998+
  - users[2]{id,name}:
1005999
      1,Ada
10061000
      2,Bob
10071001
    status: active
10081002
```
10091003

1010-
Note: Encoders use this format (bare hyphen with all fields indented) for objects with multiple fields. Older encodings may place the first field on the hyphen line; both are valid for decoders.
1004+
Note: When a list-item object has a tabular array as its first field, encoders emit the tabular header on the hyphen line with rows at depth +2 and other fields at depth +1. This is the canonical encoding for list-item objects whose first field is a tabular array.
10111005
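The canonical layout described in the note above can be produced by an encoder along these lines. This is a minimal Python sketch, assuming the comma delimiter, an indent size of 2, uniform rows, and primitive values that need no quoting. The function name `encode_list_item` is hypothetical.

```python
def encode_list_item(obj: dict, depth: int, indent: int = 2) -> list[str]:
    """Encode a list-item object whose first field is a tabular array (v3.0 layout)."""
    items = list(obj.items())
    key, rows = items[0]
    fields = list(rows[0].keys())
    header = key + "[" + str(len(rows)) + "]{" + ",".join(fields) + "}:"
    # Hyphen and tabular header share one line at the list-item depth.
    lines = [" " * (indent * depth) + "- " + header]
    # Tabular rows go at depth +2 relative to the hyphen line.
    row_pad = " " * (indent * (depth + 2))
    for row in rows:
        lines.append(row_pad + ",".join(str(row[f]) for f in fields))
    # Remaining fields go at depth +1, in encounter order.
    field_pad = " " * (indent * (depth + 1))
    for k, v in items[1:]:
        lines.append(f"{field_pad}{k}: {v}")
    return lines
```

Called with the object from the example above at depth 1, the sketch returns the header line at two spaces, the rows at six spaces, and `status: active` at four spaces.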

10121006
Delimiter variations:
10131007
```
@@ -1235,6 +1229,10 @@ Note: Host-type normalization tests (e.g., BigInt, Date, Set, Map) are language-
12351229

12361230
This appendix summarizes major changes between spec versions. For the complete changelog, see [`CHANGELOG.md`](./CHANGELOG.md) in the specification repository.
12371231

1232+
### v3.0 (2025-11-24)
1233+
1234+
- Standardized encoding for list-item objects whose first field is a tabular array (§10).
1235+
12381236
### v2.1 (2025-11-23)
12391237

12401238
- Tightened canonical encoding for objects as list items (§10): bare `-` for multi-field objects, compact `- key[N]{fields}:` only for single-field tabular arrays, to improve visual consistency and LLM readability.

tests/fixtures/decode/arrays-nested.json

Lines changed: 9 additions & 11 deletions
@@ -1,5 +1,5 @@
11
{
2-
"version": "2.1",
2+
"version": "3.0",
33
"category": "decode",
44
"description": "Nested and mixed array decoding - list format, arrays of arrays, root arrays, mixed types",
55
"tests": [
@@ -52,8 +52,8 @@
5252
"specSection": "9.4"
5353
},
5454
{
55-
"name": "parses nested tabular arrays as first field on hyphen line (legacy)",
56-
"input": "items[1]:\n  - users[2]{id,name}:\n    1,Ada\n    2,Bob\n    status: active",
55+
"name": "parses list items whose first field is a tabular array",
56+
"input": "items[1]:\n  - users[2]{id,name}:\n      1,Ada\n      2,Bob\n    status: active",
5757
"expected": {
5858
"items": [
5959
{
@@ -66,25 +66,23 @@
6666
]
6767
},
6868
"specSection": "10",
69-
"note": "Still valid for backward compatibility"
69+
"note": "Canonical encoding: tabular header on hyphen line, rows at depth +2, sibling fields at depth +1"
7070
},
7171
{
72-
"name": "parses nested tabular arrays in list items with bare hyphen",
73-
"input": "items[1]:\n  -\n    users[2]{id,name}:\n      1,Ada\n      2,Bob\n    status: active",
72+
"name": "parses single-field list-item object with tabular array",
73+
"input": "items[1]:\n  - users[2]{id,name}:\n      1,Ada\n      2,Bob",
7474
"expected": {
7575
"items": [
7676
{
7777
"users": [
7878
{ "id": 1, "name": "Ada" },
7979
{ "id": 2, "name": "Bob" }
80-
],
81-
"status": "active"
80+
]
8281
}
8382
]
8483
},
8584
"specSection": "10",
86-
"minSpecVersion": "2.1",
87-
"note": "Canonical v2.1+ encoding (bare hyphen with all fields indented)"
85+
"note": "Single-field list-item object: only the tabular array, no sibling fields"
8886
},
8987
{
9088
"name": "parses objects containing arrays (including empty arrays) in list format",
@@ -98,7 +96,7 @@
9896
},
9997
{
10098
"name": "parses arrays of arrays within objects",
101-
"input": "items[1]:\n - matrix[2]:\n - [2]: 1,2\n - [2]: 3,4\n name: grid",
99+
"input": "items[1]:\n - matrix[2]:\n - [2]: 1,2\n - [2]: 3,4\n name: grid",
102100
"expected": {
103101
"items": [
104102
{ "matrix": [[1, 2], [3, 4]], "name": "grid" }

tests/fixtures/encode/arrays-nested.json

Lines changed: 9 additions & 13 deletions
@@ -1,5 +1,5 @@
11
{
2-
"version": "2.1",
2+
"version": "3.0",
33
"category": "encode",
44
"description": "Nested and mixed array encoding - arrays of arrays, mixed type arrays, root arrays",
55
"tests": [
@@ -50,16 +50,14 @@
5050
{
5151
"name": "encodes root-level array of non-uniform objects in list format",
5252
"input": [{ "id": 1 }, { "id": 2, "name": "Ada" }],
53-
"expected": "[2]:\n  -\n    id: 1\n  -\n    id: 2\n    name: Ada",
54-
"specSection": "9.4",
55-
"minSpecVersion": "2.1"
53+
"expected": "[2]:\n  - id: 1\n  - id: 2\n    name: Ada",
54+
"specSection": "9.4"
5655
},
5756
{
5857
"name": "encodes root-level array mixing primitive, object, and array of objects in list format",
5958
"input": ["summary", { "id": 1, "name": "Ada" }, [{ "id": 2 }, { "status": "draft" }]],
60-
"expected": "[3]:\n  - summary\n  -\n    id: 1\n    name: Ada\n  - [2]:\n    -\n      id: 2\n    -\n      status: draft",
61-
"specSection": "9.4",
62-
"minSpecVersion": "2.1"
59+
"expected": "[3]:\n  - summary\n  - id: 1\n    name: Ada\n  - [2]:\n    - id: 2\n    - status: draft",
60+
"specSection": "9.4"
6361
},
6462
{
6563
"name": "encodes root-level arrays of arrays",
@@ -92,18 +90,16 @@
9290
"input": {
9391
"items": [1, { "a": 1 }, "text"]
9492
},
95-
"expected": "items[3]:\n  - 1\n  -\n    a: 1\n  - text",
96-
"specSection": "9.4",
97-
"minSpecVersion": "2.1"
93+
"expected": "items[3]:\n  - 1\n  - a: 1\n  - text",
94+
"specSection": "9.4"
9895
},
9996
{
10097
"name": "uses list format for arrays mixing objects and arrays",
10198
"input": {
10299
"items": [{ "a": 1 }, [1, 2]]
103100
},
104-
"expected": "items[2]:\n  -\n    a: 1\n  - [2]: 1,2",
105-
"specSection": "9.4",
106-
"minSpecVersion": "2.1"
101+
"expected": "items[2]:\n  - a: 1\n  - [2]: 1,2",
102+
"specSection": "9.4"
107103
}
108104
]
109105
}
