
performance of countTokens #68

@pczekaj

Description

I'm comparing the performance of gpt-tokenizer 2.7.0 and tiktoken 1.0.17. On an Intel-based Mac with Node 22.11.0, I consistently get worse times for gpt-tokenizer than for tiktoken. Am I doing something wrong, or is this expected?

(screenshot of benchmark timings)

```ts
import { countTokens } from 'gpt-tokenizer';
import { encoding_for_model } from 'tiktoken';

const SAMPLE_TEXT = 'Occaecat est tempor incididunt voluptate exercitation irure quis aliqua sunt dolor. Anim nostrud incididunt eu aliquip quis culpa do incididunt eu. Magna qui dolor deserunt sit velit. Dolor anim laborum ut ad in et occaecat enim elit culpa commodo. Sit ut sit mollit adipisicing. Labore culpa do cillum proident incididunt et. Reprehenderit nisi excepteur culpa consectetur mollit consectetur laborum';

const LONG_MSG_REPEATS = 50000;
const EXPECTED_TOKENS = 86;

const gpt35Encoding = encoding_for_model('gpt-3.5-turbo');

describe('TokenizerService', () => {
  it('gpt-tokenizer short text', () => {
    const tokens = countTokens(SAMPLE_TEXT);
    expect(tokens).toBe(EXPECTED_TOKENS);
  });

  it('tiktoken short text', () => {
    const tokens = gpt35Encoding.encode(SAMPLE_TEXT).length;
    expect(tokens).toBe(EXPECTED_TOKENS);
  });

  it('gpt-tokenizer long text', () => {
    const tokens = countTokens(SAMPLE_TEXT.repeat(LONG_MSG_REPEATS));
    expect(tokens).toBe(EXPECTED_TOKENS * LONG_MSG_REPEATS);
  });

  it('tiktoken long text', () => {
    const tokens = gpt35Encoding.encode(SAMPLE_TEXT.repeat(LONG_MSG_REPEATS)).length;
    expect(tokens).toBe(EXPECTED_TOKENS * LONG_MSG_REPEATS);
  });
});
```
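The tests above only assert token counts; the timings come from the test runner's per-test durations. To measure the two libraries outside Jest, a small harness along these lines could be used (a minimal sketch; the `timeIt` helper and its demo call are hypothetical, not part of either library):

```javascript
// Minimal timing sketch (Node >= 16, where `performance` is a global).
// `timeIt` is a hypothetical helper, not part of gpt-tokenizer or tiktoken.
function timeIt(label, fn, runs = 5) {
  let best = Infinity;
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    const elapsed = performance.now() - start;
    if (elapsed < best) best = elapsed;
  }
  console.log(`${label}: ${best.toFixed(1)} ms (best of ${runs})`);
  return best;
}

// Usage sketch: drop in the real tokenizer calls from the test above, e.g.
//   timeIt('gpt-tokenizer', () => countTokens(longText));
//   timeIt('tiktoken', () => gpt35Encoding.encode(longText).length);
timeIt('demo', () => 'x'.repeat(1_000_000).length);
```

Reporting the best of several runs rather than a single run reduces noise from JIT warm-up and GC pauses, which can dominate one-shot measurements in Node.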
