Support supplementary CPs in Unicode identifiers #2522

dcodeIO · 2022-09-25T22:43:03Z

This came up in #2495: the tokenizer does not accept 4-byte UTF-8 characters (supplementary code points) as identifier starts or parts. TypeScript has maps for ES3, ES5 and ESNext, and I assume our current one matches the ES5 map. This PR adds a script we can use to generate the tables for ESNext (that is, from the current Unicode tables used by Node), which should be the only maps relevant to us.
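
For illustration, a generator along these lines could be sketched roughly as follows in TypeScript. This is not the script from this PR: the helper name `collectRanges`, the flat start/end pair layout, and skipping ASCII are assumptions. It relies on the engine's `\p{ID_Start}` and `\p{ID_Continue}` property escapes, so the result reflects whatever Unicode version the running Node ships with.

```typescript
// Sketch only: `collectRanges` is a hypothetical helper, not the PR's script.
// Returns a flat list of inclusive [start, end] pairs of matching code points.
function collectRanges(property: string): number[] {
  // Built via the constructor so the property escape is resolved at runtime
  // against the Unicode tables of the running engine.
  const pattern = new RegExp("^\\p{" + property + "}$", "u");
  const ranges: number[] = [];
  let start = -1;
  // Assumption: ASCII is handled separately by the tokenizer, so start at 0x80.
  for (let cp = 0x80; cp <= 0x10FFFF; ++cp) {
    if (pattern.test(String.fromCodePoint(cp))) {
      if (start < 0) start = cp; // open a new range
    } else if (start >= 0) {
      ranges.push(start, cp - 1); // close the current range
      start = -1;
    }
  }
  if (start >= 0) ranges.push(start, 0x10FFFF);
  return ranges;
}

const starts = collectRanges("ID_Start");
const parts = collectRanges("ID_Continue");
console.log(`const unicodeIdentifierStartMin = ${starts[0]};`);
console.log(`const unicodeIdentifierStartMax = ${starts[starts.length - 1]};`);
console.log(`// ${starts.length / 2} start ranges, ${parts.length / 2} part ranges`);
```

Storing ranges rather than individual code points keeps the tables small, and the last entry of each table lies above 0xFFFF, which is why the element type matters.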

On our end, the respective maps are here. For ESNext, we'll need to go from u16 to u32, since the values are code points and can exceed 0xFFFF.
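
To make the u16 vs. u32 point concrete: a supplementary code point is above 0xFFFF and appears in a UTF-16 string as a surrogate pair, so the tokenizer has to combine the pair before looking it up. A rough TypeScript sketch, where the names `combineSurrogates` and `isInRanges` and the flat pair layout are assumptions and not the actual code in this PR:

```typescript
// Combine a UTF-16 surrogate pair into the code point it encodes.
function combineSurrogates(high: number, low: number): number {
  return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}

// Binary search over a flat table of inclusive [start, end] pairs.
function isInRanges(cp: number, table: Uint32Array): boolean {
  let lo = 0;
  let hi = (table.length >>> 1) - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    if (cp < table[mid << 1]) hi = mid - 1;
    else if (cp > table[(mid << 1) + 1]) lo = mid + 1;
    else return true;
  }
  return false;
}

// U+20000 is encoded in UTF-16 as the pair 0xD840 0xDC00.
const cp = combineSurrogates(0xD840, 0xDC00);
// A 16-bit element silently truncates the value; a 32-bit element keeps it.
const truncated = new Uint16Array([cp])[0];
const kept = new Uint32Array([cp])[0];
// Illustrative two-range table, not the generated one.
const sampleTable = new Uint32Array([0xAA, 0xAA, 0x20000, 0x2A6DF]);
console.log(cp.toString(16), truncated, kept, isInRanges(cp, sampleTable));
// prints "20000 0 131072 true"
```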

I've read the contributing guidelines
I've added my name and email to the NOTICE file

src/util/text.ts

MaxGraey

Nice!

src/util/text.ts

MaxGraey · 2022-09-26T13:33:05Z

scripts/unicode-identifier.js

+console.log(`const unicodeIdentifierStartMin = ${starts[0]};`);
+console.log(`const unicodeIdentifierStartMax = ${starts[starts.length - 1]};\n`);


Suggested change

-console.log(`const unicodeIdentifierStartMin = ${starts[0]};`);
-console.log(`const unicodeIdentifierStartMax = ${starts[starts.length - 1]};\n`);
+console.log(`const UnicodeIdentifierStartMin = ${starts[0]};`);
+console.log(`const UnicodeIdentifierStartMax = ${starts[starts.length - 1]};\n`);

Using title case for SomeEnum.SomeValue seems fine, but just SomeValue conflicts visually with classes, enums and interfaces. Not sure it's preferable?

Add script to generate unicode identifier starts/parts

62e0f29

dcodeIO requested a review from MaxGraey September 25, 2022 22:43

simplify

5de629c

dcodeIO mentioned this pull request Sep 25, 2022

Add two to four bytes utf8 character tests #2495

Merged

2 tasks

dcodeIO added 10 commits September 26, 2022 03:44

handle max cp

efa019d

i -> cp

dd32010

generate comments

727927d

install tables

3714970

clean

dfdf900

fix

163d014

compact

5ceeb7c

safe

1a6e09e

indicate code point

8a01c8a

use i32

37a35b8

dcodeIO commented Sep 26, 2022

dcodeIO added 2 commits September 26, 2022 05:05

simplify

ce03268

use a utility function for clarity

31eeb71

MaxGraey approved these changes Sep 26, 2022
