Counts the number of tokens in the input text or chat. Use this method when you need to determine the number of tokens without checking against a limit.
The optional `encodeOptions` parameter allows you to specify custom sets of allowed or disallowed special tokens. If the input text contains a special token that is disallowed, an `Error` is thrown.
## Performance Optimization
### LRU Merge Cache
The tokenizer uses an LRU (Least Recently Used) cache to improve encoding performance for similar strings. By default, it stores up to 100,000 merged token pairs. You can adjust this value to optimize for your specific use case:
- Increasing the cache size will make encoding similar strings faster but consume more memory
- Setting it to 0 will disable caching completely
- For applications processing many unique strings, a smaller cache might be more efficient
You can modify the cache size using the `setMergeCacheSize` function:
```ts
import { setMergeCacheSize } from 'gpt-tokenizer'

// Set to 5000 entries
setMergeCacheSize(5000)

// Disable caching completely
setMergeCacheSize(0)
```
## Testing and Validation
`gpt-tokenizer` includes a set of test cases in the [TestPlans.txt](./data/TestPlans.txt) file to ensure its compatibility with OpenAI's Python `tiktoken` library. These test cases validate the functionality and behavior of `gpt-tokenizer`, providing a reliable reference for developers.