
Commit c90d354

feat(spec): key folding & path expansion (closes #4, #5)
1 parent 51fe1e9 commit c90d354

8 files changed

Lines changed: 640 additions & 15 deletions

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -5,6 +5,20 @@ All notable changes to the TOON specification will be documented in this file.
55
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
66
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
77

8+
## [1.5] - 2025-11-08
9+
10+
### Added
11+
12+
- Optional key folding for encoders: `keyFolding="safe"` mode with `flattenDepth` control to collapse single-key object chains into dotted-path notation (§13.4)
13+
- Optional path expansion for decoders: `expandPaths="safe"` mode to split dotted keys into nested objects, with conflict resolution tied to `strict` option (§13.4, §14.5)
14+
- IdentifierSegment terminology and path separator definition (fixed to `"."` in v1.5) (§1.9)
15+
- Deep-merge semantics for path expansion: recursive merge for objects, error on conflict when `strict=true`, last-write-wins (LWW) when `strict=false` (§13.4)
16+
17+
### Changed
18+
19+
- Both new features default to OFF and are fully backward-compatible
20+
- Safe-mode folding requires IdentifierSegment validation, collision avoidance, and no quoting
21+
822
## [1.4] - 2025-11-05
923

1024
### Changed

README.md

Lines changed: 14 additions & 2 deletions
@@ -1,6 +1,6 @@
11
# TOON Format Specification
22

3-
[![SPEC v1.4](https://img.shields.io/badge/spec-v1.4-lightgrey)](./SPEC.md)
3+
[![SPEC v1.5](https://img.shields.io/badge/spec-v1.5-lightgrey)](./SPEC.md)
44
[![Tests](https://img.shields.io/badge/tests-323-green)](./tests/fixtures/)
55
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)
66

@@ -10,12 +10,24 @@ This repository contains the official specification for **Token-Oriented Object
1010

1111
[→ Read the full specification (SPEC.md)](./SPEC.md)
1212

13-
- **Version:** 1.4 (2025-11-05)
13+
- **Version:** 1.5 (2025-11-10)
1414
- **Status:** Working Draft
1515
- **License:** MIT
1616

1717
The specification includes complete grammar (ABNF), encoding rules, validation requirements, and conformance criteria.
1818

19+
### New in v1.5
20+
21+
- **Key Folding** (encode): Collapse nested single-key objects into compact dotted paths
22+
- `{"a": {"b": {"c": 1}}}` → `a.b.c: 1`
23+
- Opt-in via `keyFolding="safe"` with `flattenDepth` control
24+
- **Path Expansion** (decode): Expand dotted keys back to nested objects
25+
- `a.b.c: 1` → `{"a": {"b": {"c": 1}}}`
26+
- Opt-in via `expandPaths="safe"` with deep-merge semantics
27+
28+
> [!NOTE]
29+
> Both features are opt-in to maintain backward compatibility.
30+
1931
## What is TOON?
2032

2133
**Token-Oriented Object Notation** is a compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage. It's intended for LLM input, not output.

SPEC.md

Lines changed: 186 additions & 5 deletions
@@ -2,9 +2,9 @@
22

33
## Token-Oriented Object Notation
44

5-
**Version:** 1.4
5+
**Version:** 1.5
66

7-
**Date:** 2025-11-05
7+
**Date:** 2025-11-10
88

99
**Status:** Working Draft
1010

@@ -189,6 +189,12 @@ Implementations that fail to conform to any MUST or REQUIRED level requirement a
189189
- Regular expressions appear in slash-delimited form.
190190
- ABNF snippets follow RFC 5234; HTAB means the U+0009 character.
191191

192+
### 1.9 Key Folding and Path Expansion Terms
193+
194+
- IdentifierSegment: A key segment eligible for safe folding and expansion, matching the pattern `^[A-Za-z_][A-Za-z0-9_]*$` (contains only letters, digits, and underscores; does not start with a digit; does not contain dots).
195+
- Path separator: The character used to join/split key segments during folding and expansion. Fixed to `"."` (U+002E, FULL STOP) in v1.5.
196+
- Note: Unquoted keys in TOON remain permissive per §7.3 (`^[A-Za-z_][A-Za-z0-9_.]*$`, allowing dots). IdentifierSegment is a stricter pattern used only for safe folding and expansion eligibility checks.
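The difference between the two patterns can be checked directly. The following is a minimal Python sketch, not part of the specification: the two regular expressions are quoted from §1.9 and §7.3 above, while the helper names `is_identifier_segment` and `is_unquoted_key` are hypothetical.

```python
import re

# Patterns quoted from the spec text: §1.9 (strict) and §7.3 (permissive, dots allowed).
# fullmatch() is used below, so the ^ and $ anchors are omitted here.
IDENTIFIER_SEGMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
UNQUOTED_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def is_identifier_segment(segment: str) -> bool:
    """True when the whole segment matches the IdentifierSegment pattern."""
    return IDENTIFIER_SEGMENT.fullmatch(segment) is not None


def is_unquoted_key(key: str) -> bool:
    """True when the whole key matches the permissive unquoted-key pattern."""
    return UNQUOTED_KEY.fullmatch(key) is not None


if __name__ == "__main__":
    # Print (key, valid unquoted key?, valid IdentifierSegment?) for a few samples.
    for key in ["user", "user.name", "full-name", "1st", "_tmp1"]:
        print(key, is_unquoted_key(key), is_identifier_segment(key))
```

A dotted key such as `user.name` is a valid unquoted key but not an IdentifierSegment; only its individual segments (`user`, `name`) are.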
197+
192198
## 2. Data Model
193199

194200
- TOON models data as:
@@ -351,6 +357,8 @@ Decoding requirements:
351357
- If a fields segment occurs between the bracket and the colon, parse field names using the active delimiter; quoted names MUST be unescaped per Section 7.1.
352358
- A colon MUST follow the bracket and optional fields; missing colon MUST error.
353359
360+
Note: Key folding (§13.4) affects only the key prefix in headers. The header grammar remains unchanged. Example: `data.meta.items[2]{id,name}:` is a valid header with a folded key prefix `data.meta.items`, followed by a standard bracket segment, field list, and colon. Parsing treats folded keys as literal keys; see §13.4 for optional path expansion.
361+
354362
## 7. Strings and Keys
355363
356364
### 7.1 Escaping (Encoding and Decoding)
@@ -393,6 +401,8 @@ Object keys and tabular field names:
393401
394402
Keys requiring quoting per the above rules MUST be quoted in all contexts, including array headers (e.g., "my-key"[N]:).
395403
404+
Encoders MAY perform key folding when enabled (see §13.4 for complete folding rules and requirements).
405+
396406
### 7.4 Decoding Rules for Strings and Keys (Decoding)
397407
398408
- Quoted strings and keys MUST be unescaped per Section 7.1; any other escape MUST error. Quoted primitives remain strings.
@@ -409,6 +419,7 @@ Keys requiring quoting per the above rules MUST be quoted in all contexts, inclu
409419
- Nested or empty objects: key: on its own line. If non-empty, nested fields appear at depth +1.
410420
- Key order: Implementations MUST preserve encounter order when emitting fields.
411421
- An empty object at the root yields an empty document (no lines).
422+
- Dotted keys (e.g., `user.name`) are valid literal keys in TOON. Decoders MUST treat them as single literal keys unless path expansion is explicitly enabled (see §13.4). This preserves backward compatibility and allows safe opt-in expansion behavior.
412423
- Decoding:
413424
- A line "key:" with nothing after the colon at depth d opens an object; subsequent lines at depth > d belong to that object until the depth decreases to ≤ d.
414425
- Lines "key: value" at the same depth are sibling fields.
@@ -582,12 +593,89 @@ Options:
582593
- indent (default: 2 spaces)
583594
- delimiter (document delimiter; default: comma; alternatives: tab, pipe)
584595
- lengthMarker (default: disabled)
596+
- keyFolding (default: `"off"`; alternatives: `"safe"`)
597+
- flattenDepth (default: Infinity when keyFolding is `"safe"`; non-negative integer; values 0 or 1 have no practical folding effect)
585598
- Decoder options:
586599
- indent (default: 2 spaces)
587-
- strict (default: true)
600+
- strict (default: `true`)
601+
- expandPaths (default: `"off"`; alternatives: `"safe"`)
588602
589603
Strict-mode errors are enumerated in §14; validators MAY add informative diagnostics for style and encoding invariants.
590604
605+
### 13.4 Key Folding and Path Expansion
606+
607+
Key folding and path expansion are optional transformations for compact dotted-path notation. Both default to `"off"`.
608+
609+
#### Encoder: Key Folding
610+
611+
Key folding allows encoders to collapse chains of single-key objects into dotted-path notation, reducing verbosity for deeply nested structures.
612+
613+
Mode: `"off"` | `"safe"` (default: `"off"`)
614+
- `"off"`: No folding is performed. All objects are encoded with standard nesting.
615+
- `"safe"`: Fold eligible chains according to the rules below.
616+
617+
flattenDepth: The maximum number of segments from K0 to include in the folded path (default: Infinity when keyFolding is `"safe"`; values less than 2 have no practical effect).
618+
- A value of 2 folds only two-segment chains: `{a: {b: val}}` → `a.b: val`.
619+
- A value of Infinity folds entire eligible chains: `{a: {b: {c: val}}}` → `a.b.c: val`.
620+
621+
Foldable chain: A chain K0 → K1 → ... → Kn is foldable when:
622+
- Each Ki (where i = 0 to n−1) is an object with exactly one key Ki+1.
623+
- The chain stops at the first non-single-key object or when encountering a leaf value.
624+
- Arrays are not considered single-key objects; a chain stops at arrays.
625+
- The leaf value at Kn is either a primitive, an array, or an empty object.
626+
627+
Safe mode requirements (all MUST hold for a chain to be folded):
628+
1. All folded segments K0 through K(d−1) (where d = min(chain length, flattenDepth)) MUST be IdentifierSegments (§1.9): matching `^[A-Za-z_][A-Za-z0-9_]*$`.
629+
2. No segment may contain the path separator (`.` in v1.5).
630+
3. The resulting folded key string MUST NOT equal any existing sibling literal key at the same object depth (collision avoidance).
631+
4. If any segment would require quoting per §7.3, the chain MUST NOT be folded.
632+
633+
Folding process:
634+
- For a foldable chain of length n, determine d = min(n, flattenDepth).
635+
- Fold segments K0 through K(d−1) into a single key: `K0.K1.....K(d−1)`.
636+
- If d < n, emit the remaining structure (Kd through Kn) as normal nested objects.
637+
- The leaf value at Kn is encoded normally (primitive, array, or empty object).
638+
639+
Examples:
640+
- `{a: {b: {c: 1}}}` with safe mode, depth=Infinity → `a.b.c: 1`
641+
- `{a: {b: {c: {d: 1}}}}` with safe mode, depth=2 → produces `a.b:` followed by nested `c:` and `d: 1` at appropriate depths
642+
- `{data: {"full-name": {x: 1}}}` → safe mode skips (segment `"full-name"` requires quoting); emits standard nested structure
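The folding rules above can be sketched in Python as a transformation on the decoded data model (plain dicts with string keys) rather than on TOON text. This is an illustrative sketch under stated assumptions, not the reference encoder: the function name `fold_keys` and its parameter `flatten_depth` are hypothetical, the remainder of a chain cut off by the depth limit is left unfolded (as in the depth=2 example), and objects inside arrays are not visited.

```python
import math
import re

# IdentifierSegment pattern from §1.9 (fullmatch() makes the anchors implicit).
IDENTIFIER_SEGMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def fold_keys(obj, flatten_depth=math.inf):
    """Return a copy of obj with eligible single-key chains folded (safe mode)."""
    if not isinstance(obj, dict):
        return obj  # primitives and arrays are returned unchanged
    out = {}
    for key, value in obj.items():
        # Walk down the chain while each node is a single-key object
        # and the segment budget (flatten_depth) is not exhausted.
        segments, node = [key], value
        while (isinstance(node, dict) and len(node) == 1
               and len(segments) < flatten_depth):
            next_key, node = next(iter(node.items()))
            segments.append(next_key)
        folded = ".".join(segments)
        safe = (
            len(segments) >= 2
            and all(IDENTIFIER_SEGMENT.fullmatch(s) for s in segments)
            and folded not in obj  # collision with a sibling literal key
        )
        if safe:
            hit_limit = len(segments) >= flatten_depth
            # Assumption: a remainder cut off by the depth limit stays nested.
            out[folded] = node if hit_limit else fold_keys(node, flatten_depth)
        else:
            out[key] = fold_keys(value, flatten_depth)
    return out


if __name__ == "__main__":
    print(fold_keys({"a": {"b": {"c": 1}}}))
    print(fold_keys({"a": {"b": {"c": {"d": 1}}}}, flatten_depth=2))
    print(fold_keys({"data": {"full-name": {"x": 1}}}))
```

In this sketch a chain containing any non-IdentifierSegment key is left entirely unfolded, which is the conservative reading of requirement 4.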
643+
644+
#### Decoder: Path Expansion
645+
646+
Path expansion allows decoders to split dotted keys into nested object structures, enabling round-trip compatibility with folded encodings.
647+
648+
Mode: `"off"` | `"safe"` (default: `"off"`)
649+
- `"off"`: Dotted keys are treated as literal keys. No expansion is performed.
650+
- `"safe"`: Expand eligible dotted keys according to the rules below.
651+
652+
Safe mode behavior:
653+
- Any key containing the path separator (`.`) is considered for expansion.
654+
- Split the key into segments at each occurrence of `.`.
655+
- Only expand when ALL resulting segments are IdentifierSegments (§1.9). Empty segments (from leading, trailing, or consecutive dots) are not IdentifierSegments, so such keys are not expanded.
656+
- Keys that do not meet the expansion criteria remain as literal keys.
657+
658+
Deep merge semantics:
659+
When multiple expanded keys construct overlapping object paths, the decoder MUST merge them recursively:
660+
- Object + Object: Deep merge recursively (recurse into nested keys and apply these rules).
661+
- Object + Non-object (array or primitive): This is a conflict. Apply conflict resolution policy.
662+
- Array + Array or Primitive + Primitive: This is a conflict. Apply conflict resolution policy. Arrays are never merged element-wise.
663+
- Key ordering: During expansion, newly created keys are inserted in encounter order (the order they appear in the document). When merging creates nested keys, keys from later lines are appended after existing keys at the same depth. This ensures deterministic, predictable key order in the resulting object.
664+
665+
Conflict resolution:
666+
- Conflict definition: A conflict occurs when expansion requires an object at a given path but finds a non-object value (array or primitive), or vice versa. A conflict also occurs when a final leaf key already exists with a non-object value that must be overwritten.
667+
- `strict=true` (default): Decoders MUST error on any conflict. This ensures data integrity and catches structural inconsistencies.
668+
- `strict=false`: Last-write-wins (LWW) conflict resolution: keys appearing later in document order (encounter order during parsing) overwrite earlier values. This provides deterministic behavior for lenient parsing.
669+
670+
Application order: Path expansion is applied AFTER all base parsing rules (§4–12) have been applied and BEFORE the final decoded value is returned to the caller. Structural validations enumerated in §14 (strict-mode errors for array counts, indentation, etc.) operate on the pre-expanded structure and remain unaffected by expansion.
671+
672+
Examples:
673+
- Input: `data.meta.items[2]: a,b` with `expandPaths="safe"` → Output: `{"data": {"meta": {"items": ["a", "b"]}}}`
674+
- Input: `user.name: Ada` with `expandPaths="off"` → Output: `{"user.name": "Ada"}`
675+
- Input: `a.b.c: 1` and `a.b.d: 2` and `a.e: 3` with `expandPaths="safe"` → Output: `{"a": {"b": {"c": 1, "d": 2}, "e": 3}}` (deep merge)
676+
- Input: `a.b: 1` then `a: 2` with `expandPaths="safe"` and `strict=true` → Error: "Expansion conflict at path 'a' (object vs primitive)"
677+
- Input: `a.b: 1` then `a: 2` with `expandPaths="safe"` and `strict=false` → Output: `{"a": 2}` (LWW)
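The expansion, deep-merge, and conflict rules can be sketched in Python as a post-processing step over an already decoded object whose keys are still literal. This is a hedged illustration, not the reference decoder: `expand_paths`, `ExpansionConflict`, and the helper `_merge` are hypothetical names, the error message is shortened relative to the example above, and visiting objects inside arrays is an assumption the text does not state.

```python
import re

# IdentifierSegment pattern from §1.9 (fullmatch() makes the anchors implicit).
IDENTIFIER_SEGMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


class ExpansionConflict(ValueError):
    """Raised on an expansion conflict when strict is True."""


def _merge(node, key, value, strict, path):
    """Insert value at node[key], deep-merging objects and resolving conflicts."""
    if key not in node:
        node[key] = value  # new keys are appended in encounter order
        return
    existing = node[key]
    if isinstance(existing, dict) and isinstance(value, dict):
        for k, v in value.items():  # Object + Object: recurse
            _merge(existing, k, v, strict, path + [k])
        return
    if strict:
        raise ExpansionConflict(f"Expansion conflict at path '{'.'.join(path)}'")
    node[key] = value  # strict=False: last write wins


def expand_paths(obj, strict=True):
    """Expand dotted keys into nested objects (safe mode)."""
    if isinstance(obj, list):
        return [expand_paths(item, strict) for item in obj]
    if not isinstance(obj, dict):
        return obj
    out = {}
    for key, value in obj.items():
        value = expand_paths(value, strict)
        segments = key.split(".")
        if len(segments) < 2 or not all(
            IDENTIFIER_SEGMENT.fullmatch(s) for s in segments
        ):
            segments = [key]  # not eligible: keep as a literal key
        # Wrap the value in nested objects, innermost segment first.
        for seg in reversed(segments[1:]):
            value = {seg: value}
        _merge(out, segments[0], value, strict, [segments[0]])
    return out


if __name__ == "__main__":
    print(expand_paths({"a.b.c": 1, "a.b.d": 2, "a.e": 3}))
    print(expand_paths({"a.b": 1, "a": 2}, strict=False))
```

Wrapping the value first and then merging at the top-level segment lets a single `_merge` routine cover both intermediate-path conflicts and leaf conflicts.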
678+
591679
### 13.1 Encoder Conformance Checklist
592680
593681
Conforming encoders MUST:
@@ -601,6 +689,8 @@ Conforming encoders MUST:
601689
- [ ] Convert -0 to 0 (§2)
602690
- [ ] Convert NaN/±Infinity to null (§3)
603691
- [ ] Emit no trailing spaces or trailing newline (§12)
692+
- [ ] When `keyFolding="safe"`, folding MUST comply with §13.4 (IdentifierSegment validation, no separator in segments, collision avoidance, no quoting required)
693+
- [ ] When `flattenDepth` is set, folding MUST stop at the configured segment count (§13.4)
604694
605695
### 13.2 Decoder Conformance Checklist
606696
@@ -609,9 +699,12 @@ Conforming decoders MUST:
609699
- [ ] Split inline arrays and tabular rows using active delimiter only (§11)
610700
- [ ] Unescape quoted strings with only valid escapes (§7.1)
611701
- [ ] Type unquoted primitives: true/false/null → booleans/null, numeric → number, else → string (§4)
612-
- [ ] Enforce strict-mode rules when strict=true (§14)
702+
- [ ] Enforce strict-mode rules when `strict=true` (§14)
613703
- [ ] Accept and ignore optional # length marker (§6)
614704
- [ ] Preserve array order and object key order (§2)
705+
- [ ] When `expandPaths="safe"`, expansion MUST follow §13.4 (IdentifierSegment-only segments, deep merge, conflict rules)
706+
- [ ] When `expandPaths="safe"` with `strict=true`, MUST error on expansion conflicts per §14.5
707+
- [ ] When `expandPaths="safe"` with `strict=false`, apply LWW conflict resolution (§13.4)
615708
616709
### 13.3 Validator Conformance Checklist
617710
@@ -650,7 +743,17 @@ When strict mode is enabled (default), decoders MUST error on the following cond
650743
651744
For root-form rules, including handling of empty documents, see §5.
652745
653-
### 14.5 Recommended Error Messages and Validator Diagnostics (Informative)
746+
### 14.5 Path Expansion Conflicts
747+
748+
When `expandPaths="safe"` is enabled:
749+
- With `strict=true` (default): Decoders MUST error on any expansion conflict.
750+
- With `strict=false`: Decoders MUST apply deterministic last-write-wins (LWW) resolution in document order. Implementations MUST resolve conflicts silently and MUST NOT emit diagnostics during normal decode operations.
751+
752+
See §13.4 for complete conflict definitions, deep-merge semantics, and examples.
753+
754+
Note (informative): Implementations MAY expose conflict diagnostics via out-of-band mechanisms (e.g., debug hooks, verbose CLI flags, or separate validation APIs), but such facilities are non-normative and MUST NOT affect default decode behavior or output.
755+
756+
### 14.6 Recommended Error Messages and Validator Diagnostics (Informative)
654757
655758
Validators SHOULD additionally report:
656759
- Trailing spaces, trailing newlines (encoding invariants).
@@ -972,6 +1075,74 @@ Quoted keys with arrays (keys requiring quoting per Section 7.3):
9721075
- id: 2
9731076
```
9741077

1078+
Key folding and path expansion (v1.5+):
1079+
1080+
Encoding - basic folding (safe mode, depth=Infinity):
1081+
1082+
Input: `{"a": {"b": {"c": 1}}}`
1083+
```
1084+
a.b.c: 1
1085+
```
1086+
1087+
Encoding - folding with inline array:
1088+
1089+
Input: `{"data": {"meta": {"items": ["x", "y"]}}}`
1090+
```
1091+
data.meta.items[2]: x,y
1092+
```
1093+
1094+
Encoding - folding with tabular array:
1095+
1096+
Input: `{"a": {"b": {"items": [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]}}}`
1097+
```
1098+
a.b.items[2]{id,name}:
1099+
  1,A
1099+
  2,B
1101+
```
1102+
1103+
Encoding - partial folding (flattenDepth=2):
1104+
1105+
Input: `{"a": {"b": {"c": {"d": 1}}}}`
1106+
```
1107+
a.b:
1108+
  c:
1109+
    d: 1
1110+
```
1111+
1112+
Decoding - basic expansion (safe mode round-trip):
1113+
1114+
Input: `data.meta.items[2]: a,b` with options `{expandPaths: "safe"}`
1115+
1116+
Output: `{"data": {"meta": {"items": ["a", "b"]}}}`
1117+
1118+
Decoding - deep merge (multiple expanded keys):
1119+
1120+
Input with options `{expandPaths: "safe"}`:
1121+
```
1122+
a.b.c: 1
1123+
a.b.d: 2
1124+
a.e: 3
1125+
```
1126+
Output: `{"a": {"b": {"c": 1, "d": 2}, "e": 3}}`
1127+
1128+
Decoding - conflict error (strict=true, default):
1129+
1130+
Input with options `{expandPaths: "safe", strict: true}`:
1131+
```
1132+
a.b: 1
1133+
a: 2
1134+
```
1135+
Result: Error - "Expansion conflict at path 'a' (object vs primitive)"
1136+
1137+
Decoding - conflict LWW (strict=false):
1138+
1139+
Input with options `{expandPaths: "safe", strict: false}`:
1140+
```
1141+
a.b: 1
1142+
a: 2
1143+
```
1144+
Output: `{"a": 2}`
1145+
9751146
## Appendix B: Parsing Helpers (Informative)
9761147

9771148
These sketches illustrate structure and common decoding helpers. They are informative; normative behavior is defined in Sections 4–12 and 14.
@@ -1060,6 +1231,15 @@ Note: Host-type normalization tests (e.g., BigInt, Date, Set, Map) are language-
10601231

10611232
## Appendix D: Document Changelog (Informative)
10621233

1234+
### v1.5 (2025-11-08)
1235+
1236+
- Added optional key folding for encoders: `keyFolding="safe"` mode with `flattenDepth` control (§13.4).
1237+
- Added optional path expansion for decoders: `expandPaths="safe"` mode with conflict resolution tied to existing `strict` option (§13.4).
1238+
- Defined safe-mode requirements for folding: IdentifierSegment validation, no path separator in segments, collision avoidance, no quoting required (§7.3, §13.4).
1239+
- Specified deep-merge semantics for expansion: recursive merge for objects; conflict policy (error in strict mode, LWW when strict=false) for non-objects (§13.4).
1240+
- Added strict-mode error category for path expansion conflicts (§14.5).
1241+
- Both features default to OFF; fully backward-compatible.
1242+
10631243
### v1.4 (2025-11-05)
10641244

10651245
- Removed JavaScript-specific normalization details; replaced with language-agnostic requirements (Section 3).
@@ -1249,6 +1429,7 @@ For a detailed version history, see Appendix D.
12491429

12501430
- Backward-compatible evolutions SHOULD preserve current headers, quoting rules, and indentation semantics.
12511431
- Reserved/structural characters (colon, brackets, braces, hyphen) MUST retain current meanings.
1432+
- The path separator (see §1.9) is fixed to `"."` in v1.5; future versions MAY make this configurable.
12521433
- Future work (non-normative): schemas, comments/annotations, additional delimiter profiles, optional \uXXXX escapes (if added, must be precisely defined).
12531434

12541435
## 21. Intellectual Property Considerations

VERSIONING.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ The TOON specification follows [Semantic Versioning](https://semver.org/) with a
1313
- **MAJOR version** - Incremented for breaking changes that are incompatible with previous versions
1414
- **MINOR version** - Incremented for backward-compatible additions, clarifications, or non-breaking changes
1515

16-
**Example:** Moving from v1.3 to v1.4 means your implementation keeps working. Moving from v1.3 to v2.0 means you'll likely need to update your code.
16+
**Example:** Moving from v1.5 to v1.6 means your implementation keeps working. Moving from v1.5 to v2.0 means you'll likely need to update your code.
1717

1818
## What Constitutes a Breaking Change
1919
