Unicode in whitespace #135

@moorereason

Description

The non-conforming TOML snippet

I'm using escape sequences to represent the Unicode code points of non-ASCII chars:

0=0
\u2000\u2000
1=1\u2000\u2000
2=2

As a hexdump:

$ echo "0=0\n\u2000\u2000\n1=1\u2000\u2000\n2=2" | hexdump -C
00000000  30 3d 30 0a e2 80 80 e2  80 80 0a 31 3d 31 e2 80  |0=0........1=1..|
00000010  80 e2 80 80 0a 32 3d 32  0a                       |.....2=2.|
00000019
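The `e2 80 80` byte triples in the dump are the UTF-8 encoding of U+2000. A minimal sketch that reproduces that encoding, assuming a hypothetical `encode_utf8` helper and `Utf8Bytes` type written for this illustration (neither is part of toml++ or go-toml):

```cpp
#include <cstddef>

// Hypothetical helper type for this sketch: up to four UTF-8 bytes plus a length.
struct Utf8Bytes {
    unsigned char bytes[4];
    std::size_t size;
};

// Encode a single code point as UTF-8.
// Assumes cp is a valid Unicode scalar value; surrogates and
// out-of-range values are not rejected here.
constexpr Utf8Bytes encode_utf8(char32_t cp) {
    Utf8Bytes out{{0, 0, 0, 0}, 0};
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out.bytes[0] = static_cast<unsigned char>(cp);
        out.size = 1;
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out.bytes[0] = static_cast<unsigned char>(0xC0 | (cp >> 6));
        out.bytes[1] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        out.size = 2;
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.bytes[0] = static_cast<unsigned char>(0xE0 | (cp >> 12));
        out.bytes[1] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
        out.bytes[2] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        out.size = 3;
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.bytes[0] = static_cast<unsigned char>(0xF0 | (cp >> 18));
        out.bytes[1] = static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F));
        out.bytes[2] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
        out.bytes[3] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        out.size = 4;
    }
    return out;
}
```

Calling `encode_utf8(0x2000)` yields the three bytes `e2 80 80`, so each run of six such bytes in the hexdump is the pair of U+2000 characters from the snippet.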

What you expected

I expected an error because \u2000 is not valid TOML whitespace: the spec allows only tab (U+0009) and space (U+0020).
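To make the expectation concrete, here is a rough sketch of the distinction a conforming parser has to draw. The function names are hypothetical and this is not how toml++ implements its checks internally; the second predicate lists the Unicode space separators (general category Zs) other than U+0020, which is the kind of set a lenient whitespace check might wrongly accept.

```cpp
// Whitespace as defined by the TOML spec: tab (U+0009) or space (U+0020).
constexpr bool is_toml_whitespace(char32_t cp) {
    return cp == U'\t' || cp == U' ';
}

// Unicode space separators (category Zs) that are NOT TOML whitespace.
// A parser that skips these between tokens accepts non-conforming input.
constexpr bool is_non_toml_unicode_space(char32_t cp) {
    return cp == 0x00A0                       // NO-BREAK SPACE
        || cp == 0x1680                       // OGHAM SPACE MARK
        || (cp >= 0x2000 && cp <= 0x200A)     // EN QUAD .. HAIR SPACE
        || cp == 0x202F                       // NARROW NO-BREAK SPACE
        || cp == 0x205F                       // MEDIUM MATHEMATICAL SPACE
        || cp == 0x3000;                      // IDEOGRAPHIC SPACE
}
```

Under this sketch, `is_toml_whitespace(0x2000)` is false and `is_non_toml_unicode_space(0x2000)` is true, so both the line consisting only of `\u2000\u2000` and the trailing `\u2000\u2000` after `1=1` should be rejected.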

What you got

$ echo "0=0\n\u2000\u2000\n1=1\u2000\u2000\n2=2" | tt_decoder
{"0":{"type":"integer","value":"0"},"1":{"type":"integer","value":"1"},"2":{"type":"integer","value":"2"}}

Environment

toml++ version and/or commit hash:
v3 cdf85a9

Any other useful information:
The fuzzer triggered on \u2000 (EN QUAD), which sits in the General Punctuation block and is a Unicode space separator from what I can tell (I'm not a Unicode guru). I didn't try any other code points.

Found while doing differential fuzzing against go-toml. The fuzzer is merciless. 😄

Labels

TOML spec: An issue relating to the library's TOML spec conformance.
bug: Something isn't working.
implemented in v3: Fixes + features which were implemented in v3 release.