File: CHANGELOG.md (14 additions, 0 deletions)
@@ -5,6 +5,20 @@ All notable changes to the TOON specification will be documented in this file.
5
5
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
8
+
## [1.5] - 2025-11-08
9
+
10
+
### Added
11
+
12
+
- Optional key folding for encoders: `keyFolding="safe"` mode with `flattenDepth` control to collapse single-key object chains into dotted-path notation (§13.4)
13
+
- Optional path expansion for decoders: `expandPaths="safe"` mode to split dotted keys into nested objects, with conflict resolution tied to `strict` option (§13.4, §14.5)
14
+
- IdentifierSegment terminology and path separator definition (fixed to `"."` in v1.5) (§1.9)
15
+
- Deep-merge semantics for path expansion: recursive merge for objects, error on conflict when `strict=true`, last-write-wins (LWW) when `strict=false` (§13.4)
16
+
17
+
### Changed
18
+
19
+
- Both new features default to OFF and are fully backward-compatible
20
+
- Safe-mode folding requires IdentifierSegment validation, collision avoidance, and no quoting
- Opt-in via `keyFolding="safe"` with `flattenDepth` control
24
+
- **Path Expansion** (decode): Expand dotted keys back to nested objects
25
+
- `a.b.c: 1` → `{"a": {"b": {"c": 1}}}`
26
+
- Opt-in via `expandPaths="safe"` with deep-merge semantics
27
+
28
+
> [!NOTE]
29
+
> Both features are opt-in to maintain backward compatibility.
30
+
19
31
## What is TOON?
20
32
21
33
**Token-Oriented Object Notation** is a compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage. It's intended for LLM input, not output.
File: SPEC.md (186 additions, 5 deletions)
@@ -2,9 +2,9 @@
2
2
3
3
## Token-Oriented Object Notation
4
4
5
-
**Version:** 1.4
5
+
**Version:** 1.5
6
6
7
-
**Date:** 2025-11-05
7
+
**Date:** 2025-11-10
8
8
9
9
**Status:** Working Draft
10
10
@@ -189,6 +189,12 @@ Implementations that fail to conform to any MUST or REQUIRED level requirement a
189
189
- Regular expressions appear in slash-delimited form.
190
190
- ABNF snippets follow RFC 5234; HTAB means the U+0009 character.
191
191
192
+
### 1.9 Key Folding and Path Expansion Terms
193
+
194
+
- IdentifierSegment: A key segment eligible for safe folding and expansion, matching the pattern `^[A-Za-z_][A-Za-z0-9_]*$` (contains only letters, digits, and underscores; does not start with a digit; does not contain dots).
195
+
- Path separator: The character used to join/split key segments during folding and expansion. Fixed to `"."` (U+002E, FULL STOP) in v1.5.
196
+
- Note: Unquoted keys in TOON remain permissive per §7.3 (`^[A-Za-z_][A-Za-z0-9_.]*$`, allowing dots). IdentifierSegment is a stricter pattern used only for safe folding and expansion eligibility checks.
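Informative sketch (not part of the specification): the difference between the two patterns above can be checked directly. The function names `is_identifier_segment` and `is_unquoted_key` are hypothetical helpers chosen for illustration; only the two regular expressions come from the text.

```python
import re

# Patterns quoted from the definitions above (§1.9 and §7.3).
IDENTIFIER_SEGMENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
UNQUOTED_KEY = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")

PATH_SEPARATOR = "."  # fixed to U+002E in v1.5


def is_identifier_segment(segment: str) -> bool:
    """True if the segment is eligible for safe folding and expansion."""
    # fullmatch avoids Python's "$ matches before a trailing newline" quirk.
    return IDENTIFIER_SEGMENT.fullmatch(segment) is not None


def is_unquoted_key(key: str) -> bool:
    """True if the key may appear unquoted (the more permissive rule)."""
    return UNQUOTED_KEY.fullmatch(key) is not None


for candidate in ["user", "user.name", "_x9", "1abc"]:
    print(candidate, is_identifier_segment(candidate), is_unquoted_key(candidate))
```

A dotted key such as `user.name` is a valid unquoted key but not an IdentifierSegment, which is why it can only take part in folding or expansion after being split at the path separator.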
197
+
192
198
## 2. Data Model
193
199
194
200
- TOON models data as:
@@ -351,6 +357,8 @@ Decoding requirements:
351
357
- If a fields segment occurs between the bracket and the colon, parse field names using the active delimiter; quoted names MUST be unescaped per Section 7.1.
352
358
- A colon MUST follow the bracket and optional fields; missing colon MUST error.
353
359
360
+
Note: Key folding (§13.4) affects only the key prefix in headers. The header grammar remains unchanged. Example: `data.meta.items[2]{id,name}:` is a valid header with a folded key prefix `data.meta.items`, followed by a standard bracket segment, field list, and colon. Parsing treats folded keys as literal keys; see §13.4 for optional path expansion.
361
+
354
362
## 7. Strings and Keys
355
363
356
364
### 7.1 Escaping (Encoding and Decoding)
@@ -393,6 +401,8 @@ Object keys and tabular field names:
393
401
394
402
Keys requiring quoting per the above rules MUST be quoted in all contexts, including array headers (e.g., "my-key"[N]:).
395
403
404
+
Encoders MAY perform key folding when enabled (see §13.4 for complete folding rules and requirements).
405
+
396
406
### 7.4 Decoding Rules for Strings and Keys (Decoding)
397
407
398
408
- Quoted strings and keys MUST be unescaped per Section 7.1; any other escape MUST error. Quoted primitives remain strings.
@@ -409,6 +419,7 @@ Keys requiring quoting per the above rules MUST be quoted in all contexts, inclu
409
419
- Nested or empty objects: key: on its own line. If non-empty, nested fields appear at depth +1.
410
420
- Key order: Implementations MUST preserve encounter order when emitting fields.
411
421
- An empty object at the root yields an empty document (no lines).
422
+
- Dotted keys (e.g., `user.name`) are valid literal keys in TOON. Decoders MUST treat them as single literal keys unless path expansion is explicitly enabled (see §13.4). This preserves backward compatibility and allows safe opt-in expansion behavior.
412
423
- Decoding:
413
424
- A line "key:" with nothing after the colon at depth d opens an object; subsequent lines at depth > d belong to that object until the depth decreases to ≤ d.
414
425
- Lines "key: value" at the same depth are sibling fields.
Strict-mode errors are enumerated in §14; validators MAY add informative diagnostics for style and encoding invariants.
590
604
605
+
### 13.4 Key Folding and Path Expansion
606
+
607
+
Key folding and path expansion are optional transformations for compact dotted-path notation. Both default to `"off"`.
608
+
609
+
#### Encoder: Key Folding
610
+
611
+
Key folding allows encoders to collapse chains of single-key objects into dotted-path notation, reducing verbosity for deeply nested structures.
612
+
613
+
Mode: `"off"` | `"safe"` (default: `"off"`)
614
+
- `"off"`: No folding is performed. All objects are encoded with standard nesting.
615
+
- `"safe"`: Fold eligible chains according to the rules below.
616
+
617
+
flattenDepth: The maximum number of segments from K0 to include in the folded path (default: Infinity when keyFolding is `"safe"`; values less than 2 have no practical effect).
618
+
- A value of 2 folds only two-segment chains: `{a: {b: val}}` → `a.b: val`.
619
+
- A value of Infinity folds entire eligible chains: `{a: {b: {c: val}}}` → `a.b.c: val`.
620
+
621
+
Foldable chain: A chain K0 → K1 → ... → Kn is foldable when:
622
+
- For each i from 0 to n−1, the value at Ki is an object with exactly one key, K(i+1).
623
+
- The chain stops at the first non-single-key object or when encountering a leaf value.
624
+
- Arrays are not considered single-key objects; a chain stops at arrays.
625
+
- The leaf value at Kn is either a primitive, an array, or an empty object.
626
+
627
+
Safe mode requirements (all MUST hold for a chain to be folded):
628
+
1. All folded segments K0 through K(d−1) (where d = min(chain length, flattenDepth)) MUST be IdentifierSegments (§1.9): matching `^[A-Za-z_][A-Za-z0-9_]*$`.
629
+
2. No segment may contain the path separator (`.` in v1.5).
630
+
3. The resulting folded key string MUST NOT equal any existing sibling literal key at the same object depth (collision avoidance).
631
+
4. If any segment would require quoting per §7.3, the chain MUST NOT be folded.
632
+
633
+
Folding process:
634
+
- For a foldable chain of length n, determine d = min(n, flattenDepth).
635
+
- Fold segments K0 through K(d−1) into a single key: `K0.K1.....K(d−1)`.
636
+
- If d < n, emit the remaining structure (Kd through Kn) as normal nested objects.
637
+
- The leaf value at Kn is encoded normally (primitive, array, or empty object).
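Informative sketch (not part of the specification): one way an encoder might apply the folding rules above to an in-memory object before emitting lines. The function name `fold_keys` and its `flatten_depth` parameter are hypothetical. The sketch assumes that chain length counts key segments (which matches the `a.b.c` example), that a remainder cut off by `flattenDepth` is emitted unfolded, and that arrays are left untouched.

```python
import math
import re

IDENTIFIER_SEGMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _is_segment(key):
    return isinstance(key, str) and IDENTIFIER_SEGMENT.fullmatch(key) is not None


def fold_keys(obj, flatten_depth=math.inf):
    """Return a copy of a dict with eligible single-key chains folded."""
    if not isinstance(obj, dict):
        return obj  # primitives and arrays are returned as-is (assumption)
    out = {}
    for key, value in obj.items():
        # Walk the chain of single-key objects, up to flatten_depth segments.
        segments, leaf = [key], value
        while isinstance(leaf, dict) and len(leaf) == 1 and len(segments) < flatten_depth:
            next_key, leaf = next(iter(leaf.items()))
            segments.append(next_key)
        # True when the depth limit, not the structure, ended the walk.
        truncated = isinstance(leaf, dict) and len(leaf) == 1

        # Safe-mode checks: every segment is an IdentifierSegment, and the
        # folded key does not collide with a sibling literal key.
        eligible = len(segments) >= 2 and all(_is_segment(s) for s in segments)
        folded = ".".join(segments) if eligible else None
        if eligible and folded not in obj:
            out[folded] = leaf if truncated else fold_keys(leaf, flatten_depth)
        else:
            out[key] = fold_keys(value, flatten_depth)
    return out


print(fold_keys({"a": {"b": {"c": 1}}}))
print(fold_keys({"a": {"b": {"c": 1}}}, flatten_depth=2))
```

With the default depth the first call yields a single `a.b.c` key; with `flatten_depth=2` only `a.b` is folded and `c` stays nested. A chain containing a key that would need quoting, or whose folded form equals a sibling key, is left unfolded.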
Path expansion allows decoders to split dotted keys into nested object structures, enabling round-trip compatibility with folded encodings.
647
+
648
+
Mode: `"off"` | `"safe"` (default: `"off"`)
649
+
- `"off"`: Dotted keys are treated as literal keys. No expansion is performed.
650
+
- `"safe"`: Expand eligible dotted keys according to the rules below.
651
+
652
+
Safe mode behavior:
653
+
- Any key containing the path separator (`.`) is considered for expansion.
654
+
- Split the key into segments at each occurrence of `.`.
655
+
- Only expand when ALL resulting segments are IdentifierSegments (§1.9) and none contain `.` after splitting.
656
+
- Keys that do not meet the expansion criteria remain as literal keys.
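Informative sketch (not part of the specification): the eligibility test above reduces to a split followed by a per-segment check. The helper name `split_expandable` is hypothetical; it returns the segments for an eligible key and `None` for a key that stays literal.

```python
import re

IDENTIFIER_SEGMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_expandable(key: str):
    """Return the list of segments if the key is eligible, otherwise None."""
    if "." not in key:
        return None  # no path separator, nothing to expand
    segments = key.split(".")
    # Empty segments (from "a..b" or a trailing dot) fail the pattern too.
    if all(IDENTIFIER_SEGMENT.fullmatch(s) for s in segments):
        return segments
    return None


for key in ["a.b.c", "plain", "a..b", "a.1b", "a.b-c"]:
    print(repr(key), split_expandable(key))
```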
657
+
658
+
Deep merge semantics:
659
+
When multiple expanded keys construct overlapping object paths, the decoder MUST merge them recursively:
660
+
- Object + Object: Deep merge recursively (recurse into nested keys and apply these rules).
661
+
- Object + Non-object (array or primitive): This is a conflict. Apply conflict resolution policy.
662
+
- Array + Array or Primitive + Primitive: This is a conflict. Apply conflict resolution policy. Arrays are never merged element-wise.
663
+
- Key ordering: During expansion, newly created keys are inserted in encounter order (the order they appear in the document). When merging creates nested keys, keys from later lines are appended after existing keys at the same depth. This ensures deterministic, predictable key order in the resulting object.
664
+
665
+
Conflict resolution:
666
+
- Conflict definition: A conflict occurs when expansion requires an object at a given path but finds a non-object value (array or primitive), or vice versa. A conflict also occurs when a final leaf key already exists with a non-object value that must be overwritten.
667
+
- `strict=true` (default): Decoders MUST error on any conflict. This ensures data integrity and catches structural inconsistencies.
668
+
- `strict=false`: Last-write-wins (LWW) conflict resolution: keys appearing later in document order (encounter order during parsing) overwrite earlier values. This provides deterministic behavior for lenient parsing.
669
+
670
+
Application order: Path expansion is applied AFTER all base parsing rules (§4–12) have been applied and BEFORE the final decoded value is returned to the caller. Structural validations enumerated in §14 (strict-mode errors for array counts, indentation, etc.) operate on the pre-expanded structure and remain unaffected by expansion.
- Input: `user.name: Ada` with `expandPaths="off"` → Output: `{"user.name": "Ada"}`
675
+
- Input: `a.b.c: 1` and `a.b.d: 2` and `a.e: 3` with `expandPaths="safe"` → Output: `{"a": {"b": {"c": 1, "d": 2}, "e": 3}}` (deep merge)
676
+
- Input: `a.b: 1` then `a: 2` with `expandPaths="safe"` and `strict=true` → Error: "Expansion conflict at path 'a' (object vs primitive)"
677
+
- Input: `a.b: 1` then `a: 2` with `expandPaths="safe"` and `strict=false` → Output: `{"a": 2}` (LWW)
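Informative sketch (not part of the specification): the deep-merge and conflict rules can be modeled as a post-processing pass over an already decoded object whose keys are in document order. The names `expand_paths` and `ExpansionConflict` are hypothetical, the error message is illustrative rather than normative, and recursing into nested objects and array elements is an assumption of this sketch.

```python
import re

IDENTIFIER_SEGMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


class ExpansionConflict(ValueError):
    """Raised in strict mode when expansion hits a conflicting value."""


def _segments(key):
    # Eligible keys contain "." and split into IdentifierSegments only.
    parts = key.split(".")
    if len(parts) > 1 and all(IDENTIFIER_SEGMENT.fullmatch(p) for p in parts):
        return parts
    return [key]  # literal key, left untouched


def _merge(target, key, value, strict, path):
    if key not in target:
        target[key] = value  # new keys are appended in encounter order
        return
    existing = target[key]
    if isinstance(existing, dict) and isinstance(value, dict):
        # Object + Object: recurse and apply the same rules to each child.
        for child_key, child_value in value.items():
            _merge(existing, child_key, child_value, strict, path + [child_key])
        return
    # Any other combination is a conflict.
    if strict:
        raise ExpansionConflict("Expansion conflict at path '%s'" % ".".join(path))
    target[key] = value  # strict=False: last write wins


def expand_paths(node, strict=True):
    if isinstance(node, list):
        return [expand_paths(item, strict) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        parts = _segments(key)
        # Wrap the value so that `a.b.c: 1` becomes {"a": {"b": {"c": 1}}}.
        nested = expand_paths(value, strict)
        for part in reversed(parts[1:]):
            nested = {part: nested}
        _merge(result, parts[0], nested, strict, [parts[0]])
    return result


print(expand_paths({"a.b.c": 1, "a.b.d": 2, "a.e": 3}))
print(expand_paths({"a.b": 1, "a": 2}, strict=False))
```

The first call reproduces the deep-merge example above and the second reproduces the LWW example; calling the second with `strict=True` raises `ExpansionConflict` instead.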
678
+
591
679
### 13.1 Encoder Conformance Checklist
592
680
593
681
Conforming encoders MUST:
@@ -601,6 +689,8 @@ Conforming encoders MUST:
601
689
- [ ] Convert -0 to 0 (§2)
602
690
- [ ] Convert NaN/±Infinity to null (§3)
603
691
- [ ] Emit no trailing spaces or trailing newline (§12)
692
+
- [ ] When `keyFolding="safe"`, folding MUST comply with §13.4 (IdentifierSegment validation, no separator in segments, collision avoidance, no quoting required)
693
+
- [ ] When `flattenDepth` is set, folding MUST stop at the configured segment count (§13.4)
604
694
605
695
### 13.2 Decoder Conformance Checklist
606
696
@@ -609,9 +699,12 @@ Conforming decoders MUST:
609
699
- [ ] Split inline arrays and tabular rows using active delimiter only (§11)
610
700
- [ ] Unescape quoted strings with only valid escapes (§7.1)
- [ ] Preserve array order and object key order (§2)
705
+
- [ ] When `expandPaths="safe"`, expansion MUST follow §13.4 (IdentifierSegment-only segments, deep merge, conflict rules)
706
+
- [ ] When `expandPaths="safe"` with `strict=true`, MUST error on expansion conflicts per §14.5
707
+
- [ ] When `expandPaths="safe"` with `strict=false`, apply LWW conflict resolution (§13.4)
615
708
616
709
### 13.3 Validator Conformance Checklist
617
710
@@ -650,7 +743,17 @@ When strict mode is enabled (default), decoders MUST error on the following cond
650
743
651
744
For root-form rules, including handling of empty documents, see §5.
652
745
653
-
### 14.5 Recommended Error Messages and Validator Diagnostics (Informative)
746
+
### 14.5 Path Expansion Conflicts
747
+
748
+
When `expandPaths="safe"` is enabled:
749
+
- With `strict=true` (default): Decoders MUST error on any expansion conflict.
750
+
- With `strict=false`: Decoders MUST apply deterministic last-write-wins (LWW) resolution in document order. Implementations MUST resolve conflicts silently and MUST NOT emit diagnostics during normal decode operations.
751
+
752
+
See §13.4 for complete conflict definitions, deep-merge semantics, and examples.
753
+
754
+
Note (informative): Implementations MAY expose conflict diagnostics via out-of-band mechanisms (e.g., debug hooks, verbose CLI flags, or separate validation APIs), but such facilities are non-normative and MUST NOT affect default decode behavior or output.
755
+
756
+
### 14.6 Recommended Error Messages and Validator Diagnostics (Informative)
- Added optional key folding for encoders: `keyFolding='safe'` mode with `flattenDepth` control (§13.4).
1237
+
- Added optional path expansion for decoders: `expandPaths='safe'` mode with conflict resolution tied to existing `strict` option (§13.4).
1238
+
- Defined safe-mode requirements for folding: IdentifierSegment validation, no path separator in segments, collision avoidance, no quoting required (§7.3, §13.4).
1239
+
- Specified deep-merge semantics for expansion: recursive merge for objects; conflict policy (error in strict mode, LWW when strict=false) for non-objects (§13.4).
1240
+
- Added strict-mode error category for path expansion conflicts (§14.5).
1241
+
- Both features default to OFF; fully backward-compatible.