README.md: 9 additions & 6 deletions
@@ -4,21 +4,22 @@
`gpt-tokenizer` is a highly optimized Token Byte Pair Encoder/Decoder for all of OpenAI's models (including those used by GPT-2, GPT-3, GPT-3.5 and GPT-4). It's written in TypeScript, and is fully compatible with all modern JavaScript environments.
+This package is a port of OpenAI's [tiktoken](https://github.com/openai/tiktoken), with some additional features sprinkled on top.
+
OpenAI's GPT models utilize byte pair encoding to transform text into a sequence of integers before feeding them into the model.
As of 2023, it is the most feature-complete, open-source GPT tokenizer on NPM. It implements some unique features, such as:
+- Support for easily tokenizing chats thanks to the `encodeChat` function
- Support for all current OpenAI models (available encodings: `r50k_base`, `p50k_base`, `p50k_edit` and `cl100k_base`)
-- Generator function versions of both the decoder and encoder
+- Generator function versions of both the decoder and encoder functions
- Provides the ability to decode an asynchronous stream of data (using `decodeAsyncGenerator` and `decodeGenerator` with any iterable input)
- No global cache (no accidental memory leaks, as with the original GPT-3-Encoder implementation)
-- Includes a highly performant `isWithinTokenLimit` function to assess token limit without encoding the entire text
+- Includes a highly performant `isWithinTokenLimit` function to assess token limit without encoding the entire text/chat
- Improves overall performance by eliminating transitive arrays
- Type-safe (written in TypeScript)
- Works in the browser out-of-the-box
-This package is a port of OpenAI's [tiktoken](https://github.com/openai/tiktoken), with some additional features sprinkled on top.
-
Thanks to @dmitry-brazhenko's [SharpToken](https://github.com/dmitry-brazhenko/SharpToken), whose code served as a reference for the port.
Historical note: This package started off as a fork of [latitudegames/GPT-3-Encoder](https://github.com/latitudegames/GPT-3-Encoder), but version 2.0 was rewritten from scratch.
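
The feature list above mentions generator versions of the encoder and a token-limit check that avoids encoding the entire text. As a rough illustration of why those two ideas fit together, here is a minimal, self-contained sketch. It does not use `gpt-tokenizer` and is not the package's implementation: `toyEncodeGenerator`, `countWithinLimit`, and `toyIsWithinTokenLimit` are hypothetical names, the "tokenizer" simply yields one number per whitespace-separated word, and the `number | false` return value is this sketch's own convention.

```typescript
// Hypothetical toy encoder: lazily yields one integer per whitespace-separated word.
// A real BPE encoder would yield vocabulary ids; the word length is only a placeholder.
function* toyEncodeGenerator(text: string): Generator<number> {
  const wordPattern = /\S+/g;
  let match: RegExpExecArray | null;
  while ((match = wordPattern.exec(text)) !== null) {
    yield match[0].length;
  }
}

// Counts tokens from any iterable and stops as soon as the limit is exceeded,
// so the rest of the input is never tokenized.
// Returns the token count if it fits within `limit`, otherwise false.
function countWithinLimit(tokens: Iterable<number>, limit: number): number | false {
  let count = 0;
  for (const _token of tokens) {
    count += 1;
    if (count > limit) return false; // early exit
  }
  return count;
}

// Convenience wrapper combining the two pieces above (assumed name, for illustration only).
function toyIsWithinTokenLimit(text: string, limit: number): number | false {
  return countWithinLimit(toyEncodeGenerator(text), limit);
}

console.log(toyIsWithinTokenLimit("the quick brown fox", 10)); // 4
console.log(toyIsWithinTokenLimit("the quick brown fox", 3)); // false
```

Because the encoder is a generator, the limit check pulls tokens one at a time and can return as soon as the count passes the limit, which is the property the README attributes to `isWithinTokenLimit`.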