[](https://codesandbox.io/s/gpt-tokenizer-tjcjoz?fontsize=14&hidenavigation=1&theme=dark)
`gpt-tokenizer` is a Token Byte Pair Encoder/Decoder supporting all of OpenAI's models (including GPT-4o, o1, o3, o4, GPT-4.1, and older models like GPT-3.5 and GPT-4).
It's the [_fastest, smallest and lowest footprint_](#benchmarks) GPT tokenizer available for all JavaScript environments. It's written in TypeScript.
This library has been trusted by:
#### Features
It is the most feature-complete, open-source GPT tokenizer on NPM. This package is a port of OpenAI's [tiktoken](https://github.com/openai/tiktoken), with some additional, unique features sprinkled on top:
- Support for easily tokenizing chats thanks to the `encodeChat` function
- Support for all current OpenAI models (available encodings: `r50k_base`, `p50k_base`, `p50k_edit`, `cl100k_base` and `o200k_base`)
- Provides the ability to decode an asynchronous stream of data (using `decodeAsyncGenerator` and `decodeGenerator` with any iterable input)
- No global cache (no accidental memory leaks, as with the original GPT-3-Encoder implementation)
- Includes a highly performant `isWithinTokenLimit` function to assess token limit without encoding the entire text/chat
- Built-in cost estimation with the `estimateCost` function for calculating API usage costs
- Full library of OpenAI models with comprehensive pricing information (see [`src/models.ts`](./src/models.ts) and [`src/models.gen.ts`](./src/models.gen.ts))
- Improves overall performance by eliminating transitive arrays
- Type-safe (written in TypeScript)
- Works in the browser out-of-the-box
If you wish to use a custom encoding, fetch the relevant script.
Estimates the cost of processing a given number of tokens using the model's pricing data. This function calculates costs for different API usage types (main API, batch API) and cached tokens when available.
The function returns a `PriceData` object with the following structure:
- `main`: Main API pricing with `input`, `output`, `cached_input`, and `cached_output` costs
- `batch`: Batch API pricing with the same cost categories
All costs are calculated in USD based on the token count provided.
```ts
console.log('Main API input cost:', costEstimate.main?.input)
console.log('Main API output cost:', costEstimate.main?.output)
console.log('Batch API input cost:', costEstimate.batch?.input)
```
Note: The model spec must be available either through the model-specific import or by passing it as the second parameter. Cost information may not be available for all models.
357
+
329
358
## Special tokens
There are a few special tokens that are used by the GPT models.