
Commit ffb41fb

Compress Unicode data more in Base36 (#60)
* PoC: compress Unicode via Base36 string
* cleanup
* reduce code size
* bring back optimization
* shrink lookup table, adjust size and perf
* clean up module scripts
* update Hermes bundle stats
* update Yarn
* change module names
* add changeset
* add more benchmark records
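The first item, compressing Unicode data into a Base36 string, can be illustrated with a small round trip. This is a minimal sketch under stated assumptions, not the format unicode-segmenter actually ships. It assumes the data is a sorted list of inclusive code point ranges, and the helpers `encodeRanges` and `decodeRanges` are hypothetical. Each range is delta-encoded (gap from the previous range's end, then the span length), every number is written with `Number.prototype.toString(36)`, and decoding reverses it with `parseInt(s, 36)`.

```javascript
// Hypothetical sketch: NOT unicode-segmenter's actual data format.
// Input: sorted, non-overlapping, inclusive [start, end] code point ranges.

/**
 * Encode ranges as a compact Base36 string.
 * Each range becomes two numbers, the gap from the previous range's end
 * and the span (end - start), both written with toString(36).
 */
function encodeRanges(ranges) {
  const parts = [];
  let prevEnd = 0;
  for (const [start, end] of ranges) {
    parts.push((start - prevEnd).toString(36), (end - start).toString(36));
    prevEnd = end;
  }
  return parts.join(',');
}

/** Decode the Base36 string back into [start, end] pairs. */
function decodeRanges(encoded) {
  if (encoded === '') return [];
  // Arrow function avoids map passing the index as parseInt's radix.
  const nums = encoded.split(',').map((s) => parseInt(s, 36));
  const ranges = [];
  let prevEnd = 0;
  for (let i = 0; i < nums.length; i += 2) {
    const start = prevEnd + nums[i];
    const end = start + nums[i + 1];
    ranges.push([start, end]);
    prevEnd = end;
  }
  return ranges;
}

// Two example ranges: U+0300..U+036F and U+0483..U+0489.
const encoded = encodeRanges([[0x0300, 0x036f], [0x0483, 0x0489]]);
console.log(encoded); // lc,33,7o,6
console.log(decodeRanges(encoded)); // [ [ 768, 879 ], [ 1155, 1161 ] ]
```

Base36 uses the digits 0-9 and a-z, so each character carries more information than a decimal digit while the data stays a plain ASCII string literal; delta encoding keeps the numbers themselves small (here 768 becomes `lc`).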
1 parent 9e0feca commit ffb41fb

29 files changed (+1288, -749 lines)

.changeset/fast-singers-switch.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+---
+"unicode-segmenter": minor
+---
+
+Code size is significantly reduced; minified JS is now about half its previous size.
+
+There are also some performance improvements.
+They are modest, but improving size without sacrificing performance is a huge win.
+
+- Compressed Unicode data further using Base36.
+
+- Changed the internal representation to a TypedArray to improve its access pattern.
+
+- Shrank the grapheme lookup table.
+This does not impact performance except in some edge cases like Hindi and Demonic, but it does reduce the bundle size.
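The TypedArray item in the changeset can be sketched in isolation. Assuming (hypothetically) that decoded data is a sorted list of inclusive `[start, end]` code point ranges, one common layout flattens them into a single `Uint32Array` and binary-searches it. `toFlatTable` and `findRange` are illustrative names, not the library's API. `Uint32Array` is chosen because code points go up to U+10FFFF, which does not fit in 16 bits.

```javascript
// Hypothetical sketch: NOT unicode-segmenter's actual internals.
// Ranges are stored flat as [start0, end0, start1, end1, ...] (inclusive, sorted).

/** Flatten [start, end] pairs into one contiguous Uint32Array. */
function toFlatTable(ranges) {
  const table = new Uint32Array(ranges.length * 2);
  ranges.forEach(([start, end], i) => {
    table[2 * i] = start;
    table[2 * i + 1] = end;
  });
  return table;
}

/** Binary-search the flat table; returns the index of the matching range or -1. */
function findRange(table, cp) {
  let lo = 0;
  let hi = table.length / 2 - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    if (cp < table[2 * mid]) {
      hi = mid - 1; // code point is before this range
    } else if (cp > table[2 * mid + 1]) {
      lo = mid + 1; // code point is after this range
    } else {
      return mid; // start <= cp <= end
    }
  }
  return -1;
}

const table = toFlatTable([[0x0300, 0x036f], [0x0483, 0x0489], [0x0591, 0x05bd]]);
console.log(findRange(table, 0x0301)); // 0
console.log(findRange(table, 0x0485)); // 1
console.log(findRange(table, 0x0041)); // -1
```

Compared with an array of small `[start, end]` arrays, a flat typed array keeps all values in one contiguous buffer with a fixed element type, which is one plausible reason such a change improves the access pattern.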

.yarn/releases/yarn-4.4.1.cjs renamed to .yarn/releases/yarn-4.5.1.cjs

Lines changed: 367 additions & 358 deletions
Large diffs are not rendered by default.

.yarnrc.yml

Lines changed: 1 addition & 1 deletion
@@ -2,4 +2,4 @@ nmMode: hardlinks-global
 
 nodeLinker: node-modules
 
-yarnPath: .yarn/releases/yarn-4.4.1.cjs
+yarnPath: .yarn/releases/yarn-4.5.1.cjs

README.md

Lines changed: 2 additions & 2 deletions
@@ -231,7 +231,7 @@ Since [Hermes doesn't support the `Intl.Segmenter` API](https://github.com/faceb
 
 | Name | Unicode® | ESM? | Size | Size (min) | Size (min+gzip) | Size (min+br) |
 |------------------------------|----------|------|----------:|-----------:|----------------:|--------------:|
-| `unicode-segmenter/grapheme` | 16.0.0 | ✔️ | 28,330 | 24,351 | 6,395 | 4,300 |
+| `unicode-segmenter/grapheme` | 16.0.0 | ✔️ | 17,125 | 12,720 | 5,256 | 3,913 |
 | `graphemer` | 15.0.0 | ✖️ ️| 410,435 | 95,104 | 15,752 | 10,660 |
 | `grapheme-splitter` | 10.0.0 | ✖️ | 122,252 | 23,680 | 7,852 | 4,841 |
 | `@formatjs/intl-segmenter`* | 15.0.0 | ✖️ | 491,043 | 318,721 | 54,248 | 34,380 |
@@ -247,7 +247,7 @@ Since [Hermes doesn't support the `Intl.Segmenter` API](https://github.com/faceb
 
 | Name | Bytecode size | Bytecode size (gzip)* |
 |------------------------------|--------------:|----------------------:|
-| `unicode-segmenter/grapheme` | 35,074 | 13,366 |
+| `unicode-segmenter/grapheme` | 23,992 | 12,533 |
 | `graphemer` | 133,949 | 31,710 |
 | `grapheme-splitter` | 63,810 | 19,125 |
 | `@formatjs/intl-segmenter`* | 315,865 | 99,063 |
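To put the README changes in perspective, the percentage reductions for `unicode-segmenter/grapheme` can be computed directly from the before/after rows. The snippet only does arithmetic on the figures in these tables; the metric labels mirror the table headers, and `reductionPercent` is a helper introduced here for illustration.

```javascript
// Sizes in bytes for `unicode-segmenter/grapheme`, copied from the README tables in this commit.
const sizes = {
  'Size': [28330, 17125],
  'Size (min)': [24351, 12720],
  'Size (min+gzip)': [6395, 5256],
  'Size (min+br)': [4300, 3913],
  'Bytecode size': [35074, 23992],
  'Bytecode size (gzip)': [13366, 12533],
};

/** Percentage reduction from `before` to `after`, rounded to one decimal place. */
function reductionPercent(before, after) {
  return Number((((before - after) / before) * 100).toFixed(1));
}

for (const [metric, [before, after]] of Object.entries(sizes)) {
  console.log(`${metric}: ${before} -> ${after} bytes (-${reductionPercent(before, after)}%)`);
}
```

The minified build shrinks by about 47.8% (12,720 bytes is roughly 52% of 24,351), which matches the changeset's "about half" claim, while the compressed sizes shrink much less (about 17.8% with gzip and 9% with Brotli). Hermes bytecode drops by about 31.6%, or 6.2% after gzip. A likely explanation is that gzip and Brotli already remove much of the redundancy that the Base36 encoding targets.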
