Commit a5c25a1

feat(spec): parse nested tabular arrays in list items with bare hyphen
1 parent 78d3b20 commit a5c25a1

6 files changed

Lines changed: 157 additions & 115 deletions

CHANGELOG.md

Lines changed: 34 additions & 38 deletions
@@ -5,87 +5,83 @@ All notable changes to the TOON specification will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [2.1] - 2025-11-23
+
+### Changed
+
+- Canonical encoding for objects as list items (§10):
+- Encoders SHOULD emit `- key[N]{fields}:` only when the list-item object has exactly one field and that field is a tabular array.
+- In all other cases, encoders SHOULD emit a bare `-` line and place all fields at depth +1; tabular array headers then appear at depth +1 and their rows at depth +2.
+
 ## [2.0] - 2025-11-10
 
 ### Breaking Changes
 
-- **Removed:** Length marker (`#`) prefix in array headers has been completely removed from the specification
-- The `[#N]` format is no longer valid syntax. All array headers MUST use `[N]` format only
-- Encoders MUST NOT emit `[#N]` format
-- Decoders MUST NOT accept `[#N]` format (breaking change from v1.5)
+- Removed `[#N]` length-marker syntax in array headers; `[N]` is now the only valid format.
+- Encoders MUST NOT emit `[#N]`; decoders MUST reject it.
 
 ### Removed
 
-- All references to length marker from terminology (§1.4), header syntax (§6), ABNF grammar, conformance requirements (§13.2), and parsing helpers (Appendix B)
-- `lengthMarker` encoder option removed from all implementations
-- Length marker test fixtures removed
+- The `lengthMarker` encoder option and any CLI flags exposing it.
 
 ### Migration from v1.5
 
-- Update decoder implementations to reject `[#N]` syntax
-- Convert any existing `.toon` files using `[#N]` format to `[N]` format
-- Remove `lengthMarker` option from encoder configurations
-- Remove `--length-marker` CLI flags if present
+- Update decoders to reject `[#N]` syntax.
+- Convert existing `.toon` files using `[#N]` to `[N]`.
+- Remove `lengthMarker` configuration and CLI options.
 
 ## [1.5] - 2025-11-08
 
 ### Added
 
-- Optional key folding for encoders: `keyFolding="safe"` mode with `flattenDepth` control to collapse single-key object chains into dotted-path notation (§13.4)
-- Optional path expansion for decoders: `expandPaths="safe"` mode to split dotted keys into nested objects, with conflict resolution tied to `strict` option (§13.4, §14.5)
-- IdentifierSegment terminology and path separator definition (fixed to `"."` in v1.5) (§1.9)
-- Deep-merge semantics for path expansion: recursive merge for objects, error on conflict when `strict=true`, last-write-wins (LWW) when `strict=false` (§13.4)
+- Optional key folding for encoders: `keyFolding="safe"` with `flattenDepth` to collapse single-key object chains into dotted paths (§13.4).
+- Optional path expansion for decoders: `expandPaths="safe"` to split dotted keys into nested objects with deep-merge semantics and conflict handling tied to `strict` (§13.4, §14.5).
+- IdentifierSegment terminology and fixed `"."` path separator for safe folding/expansion (§1.9).
 
 ### Changed
 
-- Both new features default to OFF and are fully backward-compatible
-- Safe-mode folding requires IdentifierSegment validation, collision avoidance, and no quoting
+- Safe-mode folding requires IdentifierSegment-only segments, no path separator in segments, no quoting, and collision avoidance.
+- Both features default to `off` and are backward-compatible.
 
 ## [1.4] - 2025-11-05
 
 ### Changed
 
-- Removed JavaScript-specific normalization details from specification; replaced with language-agnostic requirements (Section 3)
-- Defined canonical number format for encoders: no exponent notation, no trailing zeros, no leading zeros except "0" (Section 2)
-- Clarified decoder handling of exponent notation and out-of-range numbers (Section 2)
-- Expanded `\w` regex notation to explicit character class `[A-Za-z0-9_]` for cross-language clarity (Section 7.3)
-- Clarified non-strict mode tab handling as implementation-defined (Section 12)
+- Generalized normalization rules and defined canonical number format for encoders (no exponent notation, no trailing zeros, no leading zeros except `"0"`), plus decoder handling of exponent forms and out-of-range numbers (§2-§3).
+- Replaced `\w` with explicit `[A-Za-z0-9_]` in key regexes for cross-language clarity (§7.3).
+- Clarified non-strict mode tab handling as implementation-defined (§12).
 
 ### Added
 
-- Appendix G: Host Type Normalization Examples with guidance for Go, JavaScript, Python, and Rust implementations
+- Appendix G with host-type normalization examples for Go, JavaScript, Python, and Rust.
 
 ## [1.3] - 2025-10-31
 
 ### Added
 
-- Numeric precision requirements: JavaScript implementations SHOULD use `Number.toString()` precision (15-17 digits), all implementations MUST preserve round-trip fidelity (Section 2)
-- RFC 5234 core rules (ALPHA, DIGIT, DQUOTE, HTAB, LF, SP) to ABNF grammar definitions (Section 6)
+- Numeric precision requirements: JavaScript implementations SHOULD use `Number.toString()` precision (15-17 digits); all implementations MUST preserve round-trip fidelity (§2).
+- RFC 5234 core rules (ALPHA, DIGIT, DQUOTE, HTAB, LF, SP) to ABNF grammar definitions (§6).
 
 ## [1.2] - 2025-10-29
 
 ### Changed
 
-- Clarified delimiter scoping behavior between array headers
-- Tightened strict-mode indentation requirements: leading spaces MUST be exact multiples of indentSize; tabs in indentation MUST error
-- Defined blank-line and trailing-newline decoding behavior with explicit skipping rules outside arrays
-- Clarified hyphen-based quoting: "-" or any string starting with "-" MUST be quoted
-- Clarified BigInt normalization: values outside safe integer range are converted to quoted decimal strings
-- Clarified row/key disambiguation: uses first unquoted delimiter vs colon position
+- Tightened delimiter scoping, indentation, blank-line handling, and hyphen-based quoting rules (§11-§12).
+- Clarified BigInt normalization (out-of-range values → quoted decimal strings) and row/key disambiguation (first unquoted delimiter vs colon) (§2, §9.3).
 
 ## [1.1] - 2025-10-29
 
 ### Added
 
-- Strict-mode rules
-- Delimiter-aware parsing
-- Decoder options (indent, strict)
+- Strict-mode rules.
+- Delimiter-aware parsing.
+- Decoder options (`indent`, `strict`).
 
 ## [1.0] - 2025-10-28
 
 ### Added
 
-- Initial specification release
-- Encoding normalization rules
-- Decoding interpretation guidelines
-- Conformance requirements
+- Initial specification release.
+- Encoding normalization rules.
+- Decoding interpretation guidelines.
+- Conformance requirements.
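Reduced to code, the v2.1 changelog entry above is a three-way choice per list-item object. The Python sketch below is illustrative only: `is_tabular` is a hypothetical, simplified stand-in for the spec's tabular-eligibility rules (a non-empty list of objects sharing one key set, with primitive values only), and the `"empty"`, `"compact"`, and `"bare"` labels are names invented for this sketch, not spec terms.

```python
from typing import Any

# JSON primitives as Python sees them (bool is a subclass of int).
PRIMITIVES = (str, int, float, bool, type(None))


def is_tabular(value: Any) -> bool:
    """Hypothetical, simplified tabular check: a non-empty list of non-empty
    dicts that share the same key set and hold only primitive values."""
    if not isinstance(value, list) or not value:
        return False
    if not all(isinstance(row, dict) and row for row in value):
        return False
    keys = set(value[0])
    return all(
        set(row) == keys and all(isinstance(v, PRIMITIVES) for v in row.values())
        for row in value
    )


def list_item_form(obj: dict) -> str:
    """Choose the v2.1 canonical shape for an object that is a list item."""
    if not obj:
        return "empty"    # a lone "-" at the list-item indentation level
    if len(obj) == 1 and is_tabular(next(iter(obj.values()))):
        return "compact"  # "- key[N]{fields}:" on the hyphen line
    return "bare"         # "-" alone; every field moves to depth +1


users = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
print(list_item_form({"users": users}))                      # compact
print(list_item_form({"users": users, "status": "active"}))  # bare
print(list_item_form({"a": 1}))                              # bare
```

A single primitive field such as `{"a": 1}` also lands in the bare form, which is why the encode fixtures in this commit change `- a: 1` into a bare `-` followed by an indented `a: 1`.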

README.md

Lines changed: 3 additions & 3 deletions
@@ -1,7 +1,7 @@
 # TOON Format Specification
 
-[![SPEC v2.0](https://img.shields.io/badge/spec-v2.0-lightgrey)](./SPEC.md)
-[![Tests](https://img.shields.io/badge/tests-342-green)](./tests/fixtures/)
+[![SPEC v2.1](https://img.shields.io/badge/spec-v2.1-lightgrey)](./SPEC.md)
+[![Tests](https://img.shields.io/badge/tests-344-green)](./tests/fixtures/)
 [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)
 
 This repository contains the official specification for **Token-Oriented Object Notation (TOON)**, a compact, human-readable encoding of the JSON data model for LLM prompts. It provides a lossless serialization of the same objects, arrays, and primitives as JSON, but in a syntax that minimizes tokens and makes structure easy for models to follow.
@@ -10,7 +10,7 @@ This repository contains the official specification for **Token-Oriented Object
 
 [→ Read the full specification (SPEC.md)](./SPEC.md)
 
-- **Version:** 2.0 (2025-11-10)
+- **Version:** 2.1 (2025-11-23)
 - **Status:** Working Draft
 - **License:** MIT
 

SPEC.md

Lines changed: 32 additions & 34 deletions
@@ -2,9 +2,9 @@
 
 ## Token-Oriented Object Notation
 
-**Version:** 2.0
+**Version:** 2.1
 
-**Date:** 2025-11-10
+**Date:** 2025-11-23
 
 **Status:** Working Draft
 
@@ -20,7 +20,7 @@ Token-Oriented Object Notation (TOON) is a line-oriented, indentation-based text
 
 ## Status of This Document
 
-This document is a Working Draft v2.0 and may be updated, replaced, or obsoleted. Implementers should monitor the canonical repository at https://github.com/toon-format/spec for changes.
+This document is a Working Draft v2.1 and may be updated, replaced, or obsoleted. Implementers should monitor the canonical repository at https://github.com/toon-format/spec for changes.
 
 This specification is stable for implementation but not yet finalized. Breaking changes may occur in future major versions.
 
@@ -499,7 +499,15 @@ Decoding:
 For an object appearing as a list item:
 
 - Empty object list item: a single "-" at the list-item indentation level.
-- First field on the hyphen line:
+- Encoding selection (normative):
+- When an object has **exactly one field** and that field encodes to a tabular array, encoders SHOULD use the compact form with the tabular header on the hyphen line:
+- Tabular array: - key[N<delim?>]{fields}:
+- Followed by tabular rows at depth +1 (relative to the hyphen line).
+- For all other cases (multiple fields, or single non-tabular field), encoders SHOULD emit a bare hyphen on its own line:
+- Bare hyphen: -
+- All fields appear at depth +1 under the hyphen line in encounter order, using normal object field rules (Section 8).
+- When a field is a tabular array, its header appears at depth +1 and its rows at depth +2 (relative to the hyphen line).
+- First field on the hyphen line (legacy encoding, still valid for decoding):
 - Primitive: - key: value
 - Primitive array: - key[M<delim?>]: v1<delim>…
 - Tabular array: - key[N<delim?>]{fields}:
@@ -508,7 +516,7 @@ For an object appearing as a list item:
 - Followed by list items at depth +1.
 - Object: - key:
 - Nested object fields appear at depth +2 (i.e., one deeper than subsequent sibling fields of the same list item).
-- Remaining fields of the same object appear at depth +1 under the hyphen line in encounter order, using normal object field rules.
+- Remaining fields of the same object appear at depth +1 under the hyphen line in encounter order, using normal object field rules.
 
 Decoding:
 - The first field is parsed from the hyphen line. If it is a nested object (- key:), nested fields are at +2 relative to the hyphen line; subsequent fields of the same list item are at +1.
@@ -992,12 +1000,15 @@ items[2]:
 Nested tabular inside a list item:
 ```
 items[1]:
-- users[2]{id,name}:
-1,Ada
-2,Bob
+-
+users[2]{id,name}:
+1,Ada
+2,Bob
 status: active
 ```
 
+Note: Encoders use this format (bare hyphen with all fields indented) for objects with multiple fields. Older encodings may place the first field on the hyphen line; both are valid for decoders.
+
 Delimiter variations:
 ```
 items[2 ]{sku name qty price}:
@@ -1222,52 +1233,39 @@ Note: Host-type normalization tests (e.g., BigInt, Date, Set, Map) are language-
 
 ## Appendix D: Document Changelog (Informative)
 
+This appendix summarizes major changes between spec versions. For the complete changelog, see [`CHANGELOG.md`](./CHANGELOG.md) in the specification repository.
+
+### v2.1 (2025-11-23)
+
+- Tightened canonical encoding for objects as list items (§10): bare `-` for multi-field objects, compact `- key[N]{fields}:` only for single-field tabular arrays, to improve visual consistency and LLM readability.
+
 ### v2.0 (2025-11-10)
 
-- Breaking change: Length marker (`#`) prefix in array headers has been completely removed from the specification.
-- The `[#N]` format is no longer valid syntax. All array headers MUST use `[N]` format only.
-- Encoders MUST NOT emit `[#N]` format.
-- Decoders MUST NOT accept `[#N]` format (breaking change from v1.5).
-- Removed all references to length marker from terminology, grammar, conformance requirements, and parsing helpers.
+- Removed `[#N]` length-marker syntax from array headers; `[N]` is now the only valid form.
 
 ### v1.5 (2025-11-08)
 
-- Added optional key folding for encoders: `keyFolding='safe'` mode with `flattenDepth` control (§13.4).
-- Added optional path expansion for decoders: `expandPaths='safe'` mode with conflict resolution tied to existing `strict` option (§13.4).
-- Defined safe-mode requirements for folding: IdentifierSegment validation, no path separator in segments, collision avoidance, no quoting required (§7.3, §13.4).
-- Specified deep-merge semantics for expansion: recursive merge for objects; conflict policy (error in strict mode, LWW when strict=false) for non-objects (§13.4).
-- Added strict-mode error category for path expansion conflicts (§14.5).
-- Both features default to OFF; fully backward-compatible.
+- Added optional key folding (`keyFolding="safe"`) and path expansion (`expandPaths="safe"`) with deep-merge semantics and strict-mode conflict handling (§13.4, §14.5).
 
 ### v1.4 (2025-11-05)
 
-- Removed JavaScript-specific normalization details; replaced with language-agnostic requirements (Section 3).
-- Defined canonical number format for encoders and decoder acceptance rules (Section 2).
-- Added Appendix G with host-type normalization examples for Go, JavaScript, Python, and Rust.
-- Clarified non-strict mode tab handling as implementation-defined (Section 12).
-- Expanded regex notation for cross-language clarity (Section 7.3).
+- Generalized normalization and numeric canonicalization rules, and added host-type normalization guidance (Appendix G).
 
 ### v1.3 (2025-10-31)
 
-- Added numeric precision requirements: JavaScript implementations SHOULD use Number.toString() precision (15-17 digits), all implementations MUST preserve round-trip fidelity (Section 2).
-- Added RFC 5234 core rules (ALPHA, DIGIT, DQUOTE, HTAB, LF, SP) to ABNF grammar definitions (Section 6).
+- Added numeric precision guidance and ABNF core rules for headers and keys (§2, §6).
 
 ### v1.2 (2025-10-29)
 
-- Clarified delimiter scoping behavior between array headers.
-- Tightened strict-mode indentation requirements: leading spaces MUST be exact multiples of indentSize; tabs in indentation MUST error.
-- Defined blank-line and trailing-newline decoding behavior with explicit skipping rules outside arrays.
-- Clarified hyphen-based quoting: "-" or any string starting with "-" MUST be quoted.
-- Clarified BigInt normalization: values outside safe integer range are converted to quoted decimal strings.
-- Clarified row/key disambiguation: uses first unquoted delimiter vs colon position.
+- Tightened delimiter scoping, indentation, blank-line handling, hyphen-based quoting, BigInt normalization, and row/key disambiguation rules (§2, §9, §11-§12).
 
 ### v1.1 (2025-10-29)
 
-Added strict-mode rules, delimiter-aware parsing, and decoder options (indent, strict).
+- Introduced strict-mode validation, delimiter-aware parsing, and decoder options (indent, strict).
 
 ### v1.0 (2025-10-28)
 
-Initial encoding, normalization, and conformance rules.
+- Initial specification: encoding normalization, decoding interpretation, and conformance requirements.
 
 ## Appendix E: Acknowledgments and License
 

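To see the revised §10 layout end to end, here is a minimal Python sketch that emits the lines for one list-item object. It assumes a two-space indent, the comma delimiter, and primitives that need no quoting, and it follows the depths stated in the SPEC text above (compact form: rows at depth +1; bare form: header at +1, rows at +2). Nested objects and non-tabular arrays are rejected rather than guessed at. `encode_list_item`, `tabular_lines`, `is_tabular`, and `fmt` are hypothetical helper names, not part of any TOON library.

```python
from typing import Any

PRIMITIVES = (str, int, float, bool, type(None))


def fmt(v: Any) -> str:
    """Render a primitive; string quoting rules are out of scope here."""
    if v is None:
        return "null"
    if isinstance(v, bool):
        return "true" if v else "false"
    return str(v)


def is_tabular(value: Any) -> bool:
    """Simplified check: non-empty list of dicts with one shared key set and primitive values."""
    if not isinstance(value, list) or not value:
        return False
    if not all(isinstance(r, dict) and r for r in value):
        return False
    keys = set(value[0])
    return all(set(r) == keys and all(isinstance(x, PRIMITIVES) for x in r.values()) for r in value)


def tabular_lines(key: str, rows: list, head: int, body: int, ind: str, hyphen: bool) -> list:
    """Header at depth `head` (optionally after "- "), one comma-joined row per object at depth `body`."""
    fields = list(rows[0])  # field order taken from the first row
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    out = [ind * head + ("- " if hyphen else "") + header]
    out += [ind * body + ",".join(fmt(r[f]) for f in fields) for r in rows]
    return out


def encode_list_item(obj: dict, depth: int, ind: str = "  ") -> list:
    """Emit one list-item object whose hyphen sits at `depth`."""
    if not obj:
        return [ind * depth + "-"]  # empty object: a lone hyphen
    if len(obj) == 1:
        key, value = next(iter(obj.items()))
        if is_tabular(value):
            # Compact form: header on the hyphen line, rows at depth +1.
            return tabular_lines(key, value, depth, depth + 1, ind, hyphen=True)
    lines = [ind * depth + "-"]  # bare hyphen on its own line
    for key, value in obj.items():
        if is_tabular(value):
            # Header at depth +1, rows at depth +2.
            lines += tabular_lines(key, value, depth + 1, depth + 2, ind, hyphen=False)
        elif isinstance(value, PRIMITIVES):
            lines.append(f"{ind * (depth + 1)}{key}: {fmt(value)}")
        else:
            raise NotImplementedError("nested objects and non-tabular arrays are not covered")
    return lines


item = {"users": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}], "status": "active"}
print("\n".join(["items[1]:"] + encode_list_item(item, depth=1)))
# Prints:
# items[1]:
#   -
#     users[2]{id,name}:
#       1,Ada
#       2,Bob
#     status: active
```

With the assumed two-space indent, the printed output has the same shape as the updated "Nested tabular inside a list item" example, whose leading whitespace the page scrape collapsed.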
tests/fixtures/decode/arrays-nested.json

Lines changed: 22 additions & 3 deletions
@@ -1,5 +1,5 @@
 {
-"version": "1.4",
+"version": "2.1",
 "category": "decode",
 "description": "Nested and mixed array decoding - list format, arrays of arrays, root arrays, mixed types",
 "tests": [
@@ -52,7 +52,7 @@
 "specSection": "9.4"
 },
 {
-"name": "parses nested tabular arrays as first field on hyphen line",
+"name": "parses nested tabular arrays as first field on hyphen line (legacy)",
 "input": "items[1]:\n - users[2]{id,name}:\n 1,Ada\n 2,Bob\n status: active",
 "expected": {
 "items": [
@@ -65,7 +65,26 @@
 }
 ]
 },
-"specSection": "10"
+"specSection": "10",
+"note": "Still valid for backward compatibility"
+},
+{
+"name": "parses nested tabular arrays in list items with bare hyphen",
+"input": "items[1]:\n -\n users[2]{id,name}:\n 1,Ada\n 2,Bob\n status: active",
+"expected": {
+"items": [
+{
+"users": [
+{ "id": 1, "name": "Ada" },
+{ "id": 2, "name": "Bob" }
+],
+"status": "active"
+}
+]
+},
+"specSection": "10",
+"minSpecVersion": "2.1",
+"note": "Canonical v2.1+ encoding (bare hyphen with all fields indented)"
 },
 {
 "name": "parses objects containing arrays (including empty arrays) in list format",

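The decode fixture now tags the bare-hyphen case with `minSpecVersion`, while the legacy case carries no such field. One plausible way for a test harness to honor it is sketched below in Python. Treating a missing `minSpecVersion` as "applies to every version" and comparing dot-separated versions numerically are this sketch's own assumptions, not rules stated in the fixtures; `applicable_tests`, `parse_version`, and `load_fixture` are hypothetical names.

```python
import json
from pathlib import Path


def parse_version(v: str) -> tuple:
    """'2.1' -> (2, 1); compare numerically so that '2.10' sorts after '2.9'."""
    return tuple(int(part) for part in v.split("."))


def applicable_tests(fixture: dict, supported: str) -> list:
    """Keep tests whose minSpecVersion (if any) does not exceed the supported spec version."""
    have = parse_version(supported)
    return [
        t for t in fixture["tests"]
        if parse_version(t.get("minSpecVersion", "0")) <= have
    ]


def load_fixture(path: str) -> dict:
    """Read a fixture file such as tests/fixtures/decode/arrays-nested.json."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Inline stand-in for a fixture, so the example runs without the repository.
demo = {"tests": [{"name": "legacy form"}, {"name": "bare hyphen", "minSpecVersion": "2.1"}]}
print([t["name"] for t in applicable_tests(demo, "2.0")])  # ['legacy form']
```

A decoder that still targets v2.0 would therefore skip the new bare-hyphen test but keep running the legacy one, which the fixture marks as still valid.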
tests/fixtures/encode/arrays-nested.json

Lines changed: 13 additions & 9 deletions
@@ -1,5 +1,5 @@
 {
-"version": "1.4",
+"version": "2.1",
 "category": "encode",
 "description": "Nested and mixed array encoding - arrays of arrays, mixed type arrays, root arrays",
 "tests": [
@@ -50,14 +50,16 @@
 {
 "name": "encodes root-level array of non-uniform objects in list format",
 "input": [{ "id": 1 }, { "id": 2, "name": "Ada" }],
-"expected": "[2]:\n - id: 1\n - id: 2\n name: Ada",
-"specSection": "9.4"
+"expected": "[2]:\n -\n id: 1\n -\n id: 2\n name: Ada",
+"specSection": "9.4",
+"minSpecVersion": "2.1"
 },
 {
 "name": "encodes root-level array mixing primitive, object, and array of objects in list format",
 "input": ["summary", { "id": 1, "name": "Ada" }, [{ "id": 2 }, { "status": "draft" }]],
-"expected": "[3]:\n - summary\n - id: 1\n name: Ada\n - [2]:\n - id: 2\n - status: draft",
-"specSection": "9.4"
+"expected": "[3]:\n - summary\n -\n id: 1\n name: Ada\n - [2]:\n -\n id: 2\n -\n status: draft",
+"specSection": "9.4",
+"minSpecVersion": "2.1"
 },
 {
 "name": "encodes root-level arrays of arrays",
@@ -90,16 +92,18 @@
 "input": {
 "items": [1, { "a": 1 }, "text"]
 },
-"expected": "items[3]:\n - 1\n - a: 1\n - text",
-"specSection": "9.4"
+"expected": "items[3]:\n - 1\n -\n a: 1\n - text",
+"specSection": "9.4",
+"minSpecVersion": "2.1"
 },
 {
 "name": "uses list format for arrays mixing objects and arrays",
 "input": {
 "items": [{ "a": 1 }, [1, 2]]
 },
-"expected": "items[2]:\n - a: 1\n - [2]: 1,2",
-"specSection": "9.4"
+"expected": "items[2]:\n -\n a: 1\n - [2]: 1,2",
+"specSection": "9.4",
+"minSpecVersion": "2.1"
 }
 ]
 }
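The updated encode expectations follow from the new rule: every non-empty object item in a list-format array now starts with a bare `-`, even when it has a single primitive field, because the compact form is reserved for a lone tabular field. The minimal Python sketch below reproduces the `items[3]` and root `[2]` cases. It assumes two-space indentation (the scrape above collapses it) and limits items to primitives and flat objects of primitives; string quoting and nested arrays are left out, and `encode_list_array` is a hypothetical name.

```python
from typing import Any

PRIMITIVES = (str, int, float, bool, type(None))


def fmt(v: Any) -> str:
    """Render a primitive; quoting of strings is intentionally omitted."""
    if v is None:
        return "null"
    if isinstance(v, bool):
        return "true" if v else "false"
    return str(v)


def encode_list_array(key: str, items: list, depth: int = 0, ind: str = "  ") -> str:
    """Encode a list-format array; pass key="" for a root array."""
    lines = [f"{ind * depth}{key}[{len(items)}]:"]
    for item in items:
        if isinstance(item, dict):
            lines.append(ind * (depth + 1) + "-")  # v2.1: bare hyphen for object items
            for k, v in item.items():
                if not isinstance(v, PRIMITIVES):
                    raise NotImplementedError("only flat objects are handled here")
                lines.append(f"{ind * (depth + 2)}{k}: {fmt(v)}")
        elif isinstance(item, PRIMITIVES):
            lines.append(f"{ind * (depth + 1)}- {fmt(item)}")
        else:
            raise NotImplementedError("nested arrays are not handled here")
    return "\n".join(lines)


print(encode_list_array("items", [1, {"a": 1}, "text"]))
# items[3]:
#   - 1
#   -
#     a: 1
#   - text
```

An empty object item falls out naturally as a lone `-`, matching the "Empty object list item" rule in §10.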
