CHANGELOG.md
+21 lines changed: 21 additions & 0 deletions
@@ -5,6 +5,27 @@ All notable changes to the TOON specification will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [3.0] - 2025-11-24
+
+### Breaking Changes
+
+- Standardized encoding for list-item objects whose first field is a tabular array (§10):
+  - Encoders MUST emit `- key[N]{fields}:` on the hyphen line.
+  - Tabular rows MUST appear at depth +2 relative to the hyphen line.
+  - All other fields of the same object MUST appear at depth +1.
+- The v2.0 shallow form (rows and fields at the same depth) and the v2.1 bare-hyphen form are no longer normative and MUST NOT be emitted by conforming encoders.
+
+### Changed
+
+- Encoding/decoding rules (§10) simplified to describe only the YAML-style pattern; legacy layouts are treated as generic nesting and are not covered by conformance tests.
+- Nested tabular list-item example in Appendix A updated to the canonical v3.0 form.
+
+### Migration from v2.1
+
+- Update encoders to emit the YAML-style form for list-item objects whose first field is a tabular array.
+- If you rely on v2.0/v2.1 layouts, keep decoder compatibility in non-strict or implementation-defined modes; the spec no longer requires or tests these patterns.
+- Optionally regenerate existing `.toon` files for consistent v3 formatting.
This repository contains the official specification for **Token-Oriented Object Notation (TOON)**, a compact, human-readable encoding of the JSON data model for LLM prompts. It provides a lossless serialization of the same objects, arrays, and primitives as JSON, but in a syntax that minimizes tokens and makes structure easy for models to follow.
@@ -10,7 +10,7 @@ This repository contains the official specification for **Token-Oriented Object
 
 [→ Read the full specification (SPEC.md)](./SPEC.md)
SPEC.md
+50 -52 lines changed: 50 additions & 52 deletions
@@ -2,9 +2,9 @@
 
 ## Token-Oriented Object Notation
 
-**Version:** 2.1
+**Version:** 3.0
 
-**Date:** 2025-11-23
+**Date:** 2025-11-24
 
 **Status:** Working Draft
 
@@ -20,7 +20,7 @@ Token-Oriented Object Notation (TOON) is a line-oriented, indentation-based text
 
 ## Status of This Document
 
-This document is a Working Draft v2.1 and may be updated, replaced, or obsoleted. Implementers should monitor the canonical repository at https://github.com/toon-format/spec for changes.
+This document is a Working Draft v3.0 and may be updated, replaced, or obsoleted. Implementers should monitor the canonical repository at https://github.com/toon-format/spec for changes.
 
 This specification is stable for implementation but not yet finalized. Breaking changes may occur in future major versions.
 
@@ -227,12 +227,11 @@ Implementations that fail to conform to any MUST or REQUIRED level requirement a
 
 ## 3. Encoding Normalization (Reference Encoder)
 
-Encoders MUST normalize non-JSON values to the JSON data model before encoding:
+Encoders MUST normalize non-JSON values to the JSON data model before encoding. The mapping from host-specific types to JSON model is implementation-defined and MUST be documented.
 
 - Number:
   - Finite → number (canonical decimal form per Section 2). -0 → 0.
   - NaN, +Infinity, -Infinity → null.
-- Non-JSON types MUST be normalized to the JSON data model (object, array, string, number, boolean, or null) before encoding. The mapping from host-specific types to JSON model is implementation-defined and MUST be documented.
 - Examples of host-type normalization (non-normative):
   - Date/time objects → ISO 8601 string representation.
   - Set-like collections → array.
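
The normalization rules in this hunk can be sketched in Python. This is a minimal illustration, not the reference encoder: the function name `normalize` and the choice to sort set members by `repr` for a deterministic order are assumptions of this sketch, and the canonical decimal formatting from Section 2 is not covered.

```python
import math
from datetime import date, datetime


def normalize(value):
    """Map a host value onto the JSON data model (hypothetical helper)."""
    if value is None or isinstance(value, (bool, str)):
        return value
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            return None  # NaN, +Infinity, -Infinity -> null
        if value == 0:
            return 0  # collapses -0.0 to 0
        return value
    if isinstance(value, (datetime, date)):
        return value.isoformat()  # date/time object -> ISO 8601 string
    if isinstance(value, (set, frozenset)):
        # Set-like collection -> array; the sort order is this sketch's own choice.
        return [normalize(v) for v in sorted(value, key=repr)]
    if isinstance(value, dict):
        return {str(k): normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    # The spec leaves other mappings implementation-defined; this sketch refuses them.
    raise TypeError(f"no documented mapping for {type(value).__name__}")
```

Raising on unknown types is one way to honor the "MUST be documented" requirement: a type is only accepted once its mapping has been written down in the function.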
@@ -384,9 +383,9 @@ A string value MUST be quoted if any of the following is true:
 - It contains a colon (:), double quote ("), or backslash (\).
 - It contains brackets or braces ([, ], {, }).
 - It contains control characters: newline, carriage return, or tab.
-- It contains the relevant delimiter:
-  - Inside array scope: the active delimiter (Section 1).
-  - Outside array scope: the document delimiter (Section 1).
+- It contains the relevant delimiter (see §11 for complete delimiter rules):
+  - For inline array values and tabular row cells: the active delimiter from the nearest array header.
+  - For object field values (key: value): the document delimiter, even when the object is within an array's scope.
 - It equals "-" or starts with "-" (any hyphen at position 0).
 
 Otherwise, the string MAY be emitted without quotes. Unicode, emoji, and strings with internal (non-leading/trailing) spaces are safe unquoted provided they do not violate the conditions.
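
As a rough illustration of the conditions visible in this hunk, the check below decides whether a string needs quotes. It is deliberately partial: the hunk starts mid-list, so any earlier conditions in §7.2 are not shown here and are not implemented. The name `needs_quotes_partial` is hypothetical, and the caller is assumed to pass the active delimiter for inline array values and tabular cells, or the document delimiter for object field values.

```python
STRUCTURAL = set(':"\\[]{}')  # colon, double quote, backslash, brackets, braces
CONTROL = set("\n\r\t")       # newline, carriage return, tab


def needs_quotes_partial(s, delimiter=","):
    """Apply only the quoting conditions shown in this hunk (not the full list)."""
    if any(ch in STRUCTURAL or ch in CONTROL for ch in s):
        return True
    if delimiter in s:
        return True  # contains the relevant delimiter
    return s.startswith("-")  # equals "-" or has a hyphen at position 0
```

Because the relevant delimiter is a parameter, the same function covers both branches of the revised rule; only the argument changes between array cells and object field values.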
@@ -403,12 +402,10 @@ Encoders MAY perform key folding when enabled (see §13.4 for complete folding r
 
 ### 7.4 Decoding Rules for Strings and Keys (Decoding)
 
-- Quoted strings and keys MUST be unescaped per Section 7.1; any other escape MUST error. Quoted primitives remain strings.
-- Unquoted values:
-  - true/false/null → boolean/null
-  - Numeric tokens → numbers (with the leading-zero rule in Section 4)
-  - Otherwise → strings
-- Keys (quoted or unquoted) MUST be followed by ":"; missing colon MUST error.
+Decoding of value tokens follows §4 (unquoted type inference, quoted strings, numeric rules). This section adds key-specific requirements:
+
+- Quoted keys MUST be unescaped per Section 7.1; any other escape MUST error.
+- Keys (quoted or unquoted) MUST be followed by ":"; missing colon MUST error (see also §14.2).
 
 ## 8. Objects
 
@@ -421,7 +418,6 @@ Encoders MAY perform key folding when enabled (see §13.4 for complete folding r
 - Decoding:
   - A line "key:" with nothing after the colon at depth d opens an object; subsequent lines at depth > d belong to that object until the depth decreases to ≤ d.
   - Lines "key: value" at the same depth are sibling fields.
-  - Missing colon after a key MUST error.
 
 ## 9. Arrays
 
@@ -474,6 +470,7 @@ Decoding:
 - Delimiter before colon → row.
 - Colon before delimiter → key-value line (end of rows).
 - If a line has an unquoted colon but no unquoted active delimiter → key-value line (end of rows).
+- When a tabular array appears as the first field of a list-item object, indentation is governed by Section 10.
 
 ### 9.4 Mixed / Non-Uniform Arrays — Expanded List
 
@@ -499,48 +496,44 @@ Decoding:
 For an object appearing as a list item:
 
 - Empty object list item: a single "-" at the list-item indentation level.
-- Encoding selection (normative):
-  - When an object has **exactly one field** and that field encodes to a tabular array, encoders SHOULD use the compact form with the tabular header on the hyphen line:
-    - Tabular array: - key[N<delim?>]{fields}:
-    - Followed by tabular rows at depth +1 (relative to the hyphen line).
-  - For all other cases (multiple fields, or single non-tabular field), encoders SHOULD emit a bare hyphen on its own line:
-    - Bare hyphen: -
-    - All fields appear at depth +1 under the hyphen line in encounter order, using normal object field rules (Section 8).
-    - When a field is a tabular array, its header appears at depth +1 and its rows at depth +2 (relative to the hyphen line).
-- First field on the hyphen line (legacy encoding, still valid for decoding):
-  - Primitive: - key: value
-  - Primitive array: - key[M<delim?>]: v1<delim>…
-  - Tabular array: - key[N<delim?>]{fields}:
-    - Followed by tabular rows at depth +1 (relative to the hyphen line).
-  - Non-uniform array: - key[N<delim?>]:
-    - Followed by list items at depth +1.
-  - Object: - key:
-    - Nested object fields appear at depth +2 (i.e., one deeper than subsequent sibling fields of the same list item).
-- Remaining fields of the same object appear at depth +1 under the hyphen line in encounter order, using normal object field rules.
-
-Decoding:
-- The first field is parsed from the hyphen line. If it is a nested object (- key:), nested fields are at +2 relative to the hyphen line; subsequent fields of the same list item are at +1.
-- If the first field is a tabular header on the hyphen line, its rows are at +1; subsequent sibling fields continue at +1 after the rows.
+- Encoding (normative):
+  - When a list-item object has a tabular array (Section 9.3) as its first field in encounter order, encoders MUST emit the tabular header on the hyphen line:
+    - The hyphen and tabular header appear on the same line at the list-item depth: - key[N<delim?>]{fields}:
+    - Tabular rows MUST appear at depth +2 (relative to the hyphen line).
+    - All other fields of the same object MUST appear at depth +1 under the hyphen line, in encounter order, using normal object field rules (Section 8).
+    - Encoders MUST NOT emit tabular rows at depth +1 or sibling fields at the same depth as rows when the first field is a tabular array.
+  - For all other cases (first field is not a tabular array), encoders SHOULD place the first field on the hyphen line. A bare hyphen on its own line is used only for empty list-item objects.
+- Decoding (normative):
+  - When a decoder encounters a list-item line of the form - key[N<delim?>]{fields}: at depth d, it MUST treat this as the start of a tabular array field named key in the list-item object.
+  - Lines at depth d+2 that conform to tabular row syntax (Section 9.3) are rows of that tabular array.
+  - Lines at depth d+1 are additional fields of the same list-item object; the presence of a line at depth d+1 after rows terminates the rows.
+  - All other object-as-list-item patterns (bare hyphen, first field on hyphen line for non-tabular values) are decoded according to the general rules in Section 8 and Section 9.
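
To make the new depth rules concrete, here is a small Python sketch of an encoder for the one case this hunk standardizes: a list-item object whose first field is a tabular array. It is an illustration under stated assumptions, not the reference implementation. The names `is_tabular`, `fmt`, and `encode_list_item` are hypothetical, the comma delimiter and a two-space indent are assumed, rows are required to share one key order (a simplification of this sketch), and quoting is skipped entirely.

```python
def is_tabular(value):
    """True for a non-empty list of dicts with identical key order and primitive values."""
    if not isinstance(value, list) or not value:
        return False
    if not all(isinstance(row, dict) for row in value):
        return False
    fields = list(value[0])
    if not fields:
        return False
    return all(
        list(row) == fields
        and all(not isinstance(v, (dict, list)) for v in row.values())
        for row in value
    )


def fmt(v):
    """Simplified primitive formatter; a real encoder also applies the quoting rules."""
    if v is None:
        return "null"
    if isinstance(v, bool):
        return "true" if v else "false"
    return str(v)


def encode_list_item(obj, depth, indent=2):
    """Encode one list-item object whose first field is a tabular array."""
    def pad(d):
        return " " * (indent * d)

    items = list(obj.items())
    if not items:
        return [pad(depth) + "-"]  # empty object: bare hyphen
    first_key, first_val = items[0]
    if not is_tabular(first_val):
        raise NotImplementedError("sketch covers only the tabular-first case")
    fields = list(first_val[0])
    header = first_key + "[" + str(len(first_val)) + "]{" + ",".join(fields) + "}:"
    lines = [pad(depth) + "- " + header]  # header shares the hyphen line
    for row in first_val:
        lines.append(pad(depth + 2) + ",".join(fmt(row[f]) for f in fields))  # rows at +2
    for key, val in items[1:]:
        if isinstance(val, (dict, list)):
            raise NotImplementedError("sketch handles primitive sibling fields only")
        lines.append(pad(depth + 1) + key + ": " + fmt(val))  # siblings at +1
    return lines
```

Called with `{"users": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}], "status": "active"}` at depth 1, the sketch produces the hyphen-line header, two rows indented six spaces, and `status: active` indented four spaces, which is the layout the decoding rules above rely on to tell rows from sibling fields.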
 
 ## 11. Delimiters
 
 - Supported delimiters:
   - Comma (default): header omits the delimiter symbol.
   - Tab: header includes HTAB inside brackets and braces (e.g., [N<TAB>], {a<TAB>b}); rows/inline arrays use tabs.
   - Pipe: header includes "|" inside brackets and braces; rows/inline arrays use "|".
-- Document vs Active delimiter:
-  - Encoders select a document delimiter (option) that influences quoting for all object values (key: value) throughout the document.
-  - Inside an array header's scope, the active delimiter governs splitting and quoting only for inline arrays and tabular rows that the header introduces. Object values (key: value) follow document-delimiter quoting rules regardless of array scope.
-- Delimiter-aware quoting (encoding):
-  - Inline array values and tabular row cells: strings containing the active delimiter MUST be quoted to avoid splitting.
-  - Object values (key: value): encoders use the document delimiter to decide delimiter-aware quoting, regardless of whether the object appears within an array's scope.
-  - Strings containing non-active delimiters do not require quoting unless another quoting condition applies (Section 7.2).
-- Delimiter-aware parsing (decoding):
-  - Inline arrays and tabular rows MUST be split only on the active delimiter declared by the nearest array header.
+
+### 11.1 Encoding Rules (Normative for Encoders)
+
+- Document delimiter: Encoders select a document delimiter (option: comma, tab, pipe; default comma) that influences quoting for all object field values (key: value) throughout the document.
+- Active delimiter: Inside an array header's scope, the active delimiter governs quoting only for inline array values and tabular row cells.
+- Delimiter-aware quoting:
+  - Inline array values and tabular row cells: strings containing the active delimiter MUST be quoted.
+  - Object field values (key: value): encoders use the document delimiter to decide delimiter-aware quoting, regardless of whether the object appears within an array's scope.
+  - Strings containing non-active delimiters do not require quoting unless another condition applies (§7.2).
+
+### 11.2 Decoding Rules (Normative for Decoders)
+
+- Active delimiter: Decoders use only the active delimiter declared by the nearest array header to split inline arrays and tabular rows.
+- Delimiter-aware parsing:
+  - Inline arrays and tabular rows MUST be split only on the active delimiter.
   - Splitting MUST preserve empty tokens; surrounding spaces are trimmed, and empty tokens decode to the empty string.
-  - Strings containing the active delimiter MUST be quoted to avoid splitting; non-active delimiters MUST NOT cause splits.
   - Nested headers may change the active delimiter; decoding MUST use the delimiter declared by the nearest header.
-  - If the bracket declares tab or pipe, the same symbol MUST be used in the fields segment and for splitting all rows/values in that scope.
+  - If the bracket declares tab or pipe, the same symbol MUST be used in the fields segment and for splitting all rows/values in that scope (§6).
+- Object field values (key: value): Decoders parse the entire post-colon token as a single value; document delimiter is not a decoder concept.
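
The decoder-side splitting rules in §11.2 can be sketched as follows. This is a minimal illustration, assuming double-quoted strings with backslash escapes; the function name `split_row` is hypothetical, and the returned tokens are left raw (quotes and escapes intact), so unescaping and type inference would still be applied afterwards.

```python
def split_row(line, delimiter=","):
    """Split an inline array or tabular row on the active delimiter only."""
    tokens, current, in_quotes = [], [], False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes and ch == "\\" and i + 1 < len(line):
            current.append(ch + line[i + 1])  # keep the escape pair together
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == delimiter and not in_quotes:
            tokens.append("".join(current).strip(" "))  # trim surrounding spaces
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted string")
    tokens.append("".join(current).strip(" "))  # the last token may be empty
    return tokens
```

For example, `split_row("a,,b")` keeps the empty middle token, and `split_row("a,b|c")` does not split on the pipe because it is not the active delimiter. Only spaces are trimmed, so a tab delimiter is never stripped by accident.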
 
 ## 12. Indentation and Whitespace
 
@@ -738,12 +731,14 @@ When strict mode is enabled (default), decoders MUST error on the following cond
 
 ### 14.3 Indentation Errors
 
+See §12 for indentation semantics. In strict mode, decoders MUST error on:
 - Leading spaces not a multiple of indentSize.
 - Any tab used in indentation (tabs allowed in quoted strings and as HTAB delimiter).
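
The two strict-mode indentation checks listed in §14.3 are simple enough to sketch directly. The function below is a hypothetical helper, not part of the spec: `indent_size` mirrors the spec's indentSize option with an assumed default of 2, and blank or whitespace-only lines are assumed to be filtered out by the caller.

```python
def check_indentation(line, indent_size=2):
    """Return the depth of a non-blank line, raising ValueError on strict-mode violations."""
    stripped = line.lstrip(" \t")
    leading = line[: len(line) - len(stripped)]
    if "\t" in leading:
        raise ValueError("tab used in indentation")
    if len(leading) % indent_size != 0:
        raise ValueError("leading spaces are not a multiple of indent_size")
    return len(leading) // indent_size
```

Only the leading run of whitespace is inspected, so a tab later in the line (inside a quoted string or as an HTAB delimiter) does not trigger the error.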
 
 ### 14.4 Structural Errors
 
-- Blank lines inside arrays/tabular rows.
+See §12 for blank line semantics. In strict mode, decoders MUST error on:
+- Blank lines inside arrays/tabular rows (between the first and last item/row).
 
 For root-form rules, including handling of empty documents, see §5.
 
@@ -1000,14 +995,13 @@ items[2]:
 Nested tabular inside a list item:
 ```
 items[1]:
-  -
-    users[2]{id,name}:
+  - users[2]{id,name}:
       1,Ada
       2,Bob
     status: active
 ```
 
-Note: Encoders use this format (bare hyphen with all fields indented) for objects with multiple fields. Older encodings may place the first field on the hyphen line; both are valid for decoders.
+Note: When a list-item object has a tabular array as its first field, encoders emit the tabular header on the hyphen line with rows at depth +2 and other fields at depth +1. This is the canonical encoding for list-item objects whose first field is a tabular array.
This appendix summarizes major changes between spec versions. For the complete changelog, see [`CHANGELOG.md`](./CHANGELOG.md) in the specification repository.
 
+### v3.0 (2025-11-24)
+
+- Standardized encoding for list-item objects whose first field is a tabular array (§10).
+
 ### v2.1 (2025-11-23)
 
 - Tightened canonical encoding for objects as list items (§10): bare `-` for multi-field objects, compact `- key[N]{fields}:` only for single-field tabular arrays, to improve visual consistency and LLM readability.