Skip to content

Conversation

@overlookmotel
Copy link
Member

@overlookmotel overlookmotel commented Dec 13, 2025

Previously we used @typescript-eslint/typescript-estree parser to tokenize a source text. @typescript-eslint/typescript-estree uses TypeScript parser internally, and then builds the tokens array by traversing TypeScript's AST.

Instead, use TypeScript parser directly, and copy the code from TS-ESLint for converting tokens from TypeScript's format to TS-ESTree format.

Why?

This has a few advantages:

  1. Remove a dependency.
  2. TS-ESLint's tokens did not have start or end properties, which we want for consistency with our AST. By reimplementing ourselves, we can add those properties to tokens.
  3. TS-ESLint does a bunch of other work besides compiling the tokens array - it converts the AST too. We discard the AST that TS-ESLint produces, so this work is pointless. By cutting out TS-ESLint, we also cut out that unnecessary work, and do only what we need to - tokenizing.
  4. TS-ESLint would throw an error if TypeScript produces any errors (even non-fatal ones), and there is no option to tell TS-ESLint not to do that. This caused a bunch of conformance tests to fail, because ESLint's tests include invalid code.
  5. Can add workarounds for TypeScript bugs - e.g. not being able to parse JSX closing fragments containing line breaks (<>hello<\n/>).

Implementation detail

We no longer need to get comments from TypeScript. We use the comments produced by Oxc parser instead.

CommentToken type

A side effect is that there's no longer any need for separate Comment and CommentToken types - as we use our own comments now. So this PR removes CommentToken type.

Have marked this as a breaking change due to removal of this type.

Copy link
Member Author

overlookmotel commented Dec 13, 2025


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions github-actions bot added A-linter Area - Linter A-cli Area - CLI A-linter-plugins Area - Linter JS plugins labels Dec 13, 2025
@github-actions github-actions bot added the C-bug Category - Bug label Dec 13, 2025
@overlookmotel overlookmotel force-pushed the 12-13-test_linter_plugins_refactor_tokens_tests branch from c457c5c to bfbf14d Compare December 13, 2025 18:42
@overlookmotel overlookmotel force-pushed the 12-12-fix_linter_plugins_use_typescript_parser_directly_for_getting_tokens branch from 6821b23 to 3d24979 Compare December 13, 2025 18:42
@overlookmotel overlookmotel self-assigned this Dec 13, 2025
graphite-app bot pushed a commit that referenced this pull request Dec 13, 2025
Pure refactor of tests.

Unit tests for `isSpaceBetween` were using `@typescript-eslint/typescript-estree` to parse code snippets.

Instead use Oxc parser, and set up global state the same way as if the file was being linted, instead of mocking `source_code.ts` to get the source text into the linter's internals.

This brings the tests closer to real usage. It also prepares the way for changing how tokens are parsed in a following PR (#16805).
@graphite-app graphite-app bot changed the base branch from 12-13-test_linter_plugins_refactor_tokens_tests to graphite-base/16805 December 13, 2025 18:52
@graphite-app graphite-app bot force-pushed the graphite-base/16805 branch from bfbf14d to d734b86 Compare December 13, 2025 18:58
@graphite-app graphite-app bot force-pushed the 12-12-fix_linter_plugins_use_typescript_parser_directly_for_getting_tokens branch from 3d24979 to babcbb4 Compare December 13, 2025 18:58
@graphite-app graphite-app bot changed the base branch from graphite-base/16805 to main December 13, 2025 18:59
@graphite-app graphite-app bot force-pushed the 12-12-fix_linter_plugins_use_typescript_parser_directly_for_getting_tokens branch from babcbb4 to a211e0f Compare December 13, 2025 18:59
@overlookmotel overlookmotel changed the title fix(linter/plugins): use TypeScript parser directly for getting tokens fix(linter/plugins)!: use TypeScript parser directly for getting tokens Dec 13, 2025
@overlookmotel overlookmotel changed the base branch from main to graphite-base/16805 December 13, 2025 21:13
@overlookmotel overlookmotel force-pushed the 12-12-fix_linter_plugins_use_typescript_parser_directly_for_getting_tokens branch from a211e0f to 1973405 Compare December 13, 2025 21:13
@overlookmotel overlookmotel changed the base branch from graphite-base/16805 to 12-13-refactor_linter_plugins_stronger_debug_checks_for_tokens December 13, 2025 21:13
@overlookmotel overlookmotel marked this pull request as ready for review December 13, 2025 21:15
Copilot AI review requested due to automatic review settings December 13, 2025 21:15
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request replaces the @typescript-eslint/typescript-estree parser with direct TypeScript parser usage for tokenization. The change removes an external dependency, adds start and end properties to tokens for consistency with Oxc AST, and enables parsing of invalid code that TS-ESLint would reject.

Key changes:

  • Implements custom token parsing using TypeScript's parser directly instead of TS-ESLint
  • Removes CommentToken type (breaking change) in favor of using Oxc's Comment type
  • Uses Oxc parser's comments instead of TypeScript's comments
  • Adds start and end properties to tokens alongside existing range property

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
pnpm-lock.yaml Removes @typescript-eslint/typescript-estree and related dependencies
apps/oxlint/package.json Removes @typescript-eslint/typescript-estree from dependencies
apps/oxlint/tsdown.config.ts Updates bundle entry from ts_eslint.cjs to typescript.cjs and updates comments
apps/oxlint/src-js/plugins/typescript.cjs New bundle entry that exports TypeScript module directly
apps/oxlint/src-js/plugins/ts_eslint.cjs Deleted - replaced by typescript.cjs
apps/oxlint/src-js/plugins/tokens_parse.ts New file implementing token parsing using TypeScript parser, adapted from TS-ESLint's implementation
apps/oxlint/src-js/plugins/tokens.ts Refactors initTokens to use new parser, removes TS-ESLint lazy-loading, uses Oxc comments instead of TS-ESLint comments
apps/oxlint/src-js/plugins/types.ts Changes NodeOrToken type from Node | Token | CommentToken to Node | Token | Comment
apps/oxlint/src-js/plugins/source_code.ts Updates return type of tokensAndComments to use Comment instead of CommentToken
apps/oxlint/src-js/index.ts Removes CommentToken from public exports (breaking change)
apps/oxlint/test/tokens.test.ts Updates test setup to properly initialize parsing environment including context, buffers, and source text
apps/oxlint/test/fixtures/tokens/plugin.ts Adds assertions to verify tokens have start and end properties matching range
apps/oxlint/conformance/snapshot.md Shows improved test pass rates: 36 more tests passing overall, with 3 rules moving to fully passing
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported
Comments suppressed due to low confidence (1)

apps/oxlint/src-js/plugins/tokens.ts:211

  • The TODO comment indicates that range[0] should be replaced with start now that tokens have start property. However, this TODO has not been addressed in this PR. The function still uses token.range[0] and comment.range[0] instead of token.start and comment.start. According to the stored memory about token property access, the codebase convention is to use start and end properties for system-produced tokens. This inconsistency should be resolved for better code consistency and performance (property access is faster than array indexing).
  // TODO: Replace `range[0]` with `start` throughout this function
  // once we have our own tokens which have `start` property

  // Fast paths for file with no comments, or file which is only comments
  const commentsLength = comments.length;
  if (commentsLength === 0) {
    tokensAndComments = tokens;
    return;
  }

  const tokensLength = tokens.length;
  if (tokensLength === 0) {
    tokensAndComments = comments;
    return;
  }

  // File contains both tokens and comments.
  // Fill `tokensAndComments` with the 2 arrays interleaved in source order.
  tokensAndComments = [];

  let tokenIndex = 0,
    commentIndex = 0,
    token = tokens[0],
    comment = comments[0],
    tokenStart = token.range[0],
    commentStart = comment.range[0];

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@charliecreates charliecreates bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new TypeScript-native tokenization introduces a hard regression in debug/test builds: debugCheckValidRanges now throws on end === start, and the conformance snapshot shows multiple Invalid token range: … failures, indicating this is currently breaking suites. tokens_parse.ts depends on parent pointers for JSX token classification, so setParentNodes must be intentionally kept true to prevent subtle token-type regressions. Finally, the null as any + __proto__ approach in tokens_parse.ts weakens safety/maintainability and is worth tightening given this is a core parsing primitive.

Additional notes (1)
  • Maintainability | apps/oxlint/src-js/plugins/tokens.ts:168-187
    initTokensAndComments now sources comments from ast.comments (Oxc) but tokens come from TypeScript. You do validate ranges independently, but you are implicitly assuming both systems use the same indexing scheme and that both are based on the same exact sourceText.

Given the new snapshot failures and the presence of BOM handling in setupSourceForFile, it’s worth explicitly verifying that TS token offsets and Oxc comment offsets are aligned (especially around BOM and newline normalization). If they diverge, merging/sorting will produce incorrect order and potentially overlapping ranges.

Summary of changes

Summary of changes

Tokenization pipeline refactor

  • Replaced tokenization based on @typescript-eslint/typescript-estree with a new TypeScript-native tokenizer:
    • apps/oxlint/src-js/plugins/tokens.ts now calls parseTokens() instead of TS-ESLint parse().
    • Added apps/oxlint/src-js/plugins/tokens_parse.ts, copying/adapting TS-ESLint logic to walk a TS AST and convert TS tokens into the project’s Token shape (now including start/end and lazy loc).
    • Added a new lazy-loaded bundle entry apps/oxlint/src-js/plugins/typescript.cjs and updated apps/oxlint/tsdown.config.ts to build it.

Types and API surface changes

  • Removed the CommentToken type export and switched to using Comment from plugins/types.ts:
    • tokensAndComments is now typed as (Token | Comment)[].
    • NodeOrToken now includes Comment.
    • BaseToken now extends Span directly so tokens gain start/end.

Test and fixtures updates

  • Updated token fixture plugin to assert token.start/end === token.range[...].
  • Reworked tokens.test.ts to avoid mocking source_code.ts; it now uses real parse/buffer setup via parseRaw() and file context helpers.

Dependency changes

  • Removed @typescript-eslint/typescript-estree from apps/oxlint devDependencies and lockfile.

Conformance output changes

  • Snapshot updated for improved conformance results and new error outputs (notably Invalid token range: …).

@charliecreates charliecreates bot removed the request for review from CharlieHelps December 13, 2025 21:21
@overlookmotel overlookmotel force-pushed the 12-12-fix_linter_plugins_use_typescript_parser_directly_for_getting_tokens branch 2 times, most recently from eeff8a9 to e2d81cd Compare December 13, 2025 22:20
@graphite-app graphite-app bot added the 0-merge Merge with Graphite Merge Queue label Dec 13, 2025
@graphite-app
Copy link
Contributor

graphite-app bot commented Dec 13, 2025

Merge activity

…ns (#16805)

Previously we used `@typescript-eslint/typescript-estree` parser to tokenize a source text. `@typescript-eslint/typescript-estree` uses TypeScript parser internally, and then builds the tokens array by traversing TypeScript's AST.

Instead, use TypeScript parser directly, and copy the code from TS-ESLint for converting tokens from TypeScript's format to TS-ESTree format.

### Why?

This has a few advantages:

1. Remove a dependency.
2. TS-ESLint's tokens did not have `start` or `end` properties, which we want for consistency with our AST. By reimplementing ourselves, we can add those properties to tokens.
3. TS-ESLint does a bunch of other work besides compiling the tokens array - it converts the AST too. We discard the AST that TS-ESLint produces, so this work is pointless. By cutting out TS-ESLint, we also cut out that unnecessary work, and do only what we need to - tokenizing.
4. TS-ESLint would throw an error if TypeScript produces any errors (even non-fatal ones), and there is no option to tell TS-ESLint not to do that. This caused a bunch of conformance tests to fail, because ESLint's tests include invalid code.
5. Can add workarounds for TypeScript bugs - e.g. not being able to parse JSX closing fragments containing line breaks (`<>hello<\n/>`).

### Implementation detail

We no longer need to get comments from TypeScript. We use the comments produced by Oxc parser instead.

### `CommentToken` type

A side effect is that there's no longer any need for separate `Comment` and `CommentToken` types - as we use our own comments now. So this PR removes `CommentToken` type.

Have marked this as a breaking change due to removal of this type.
@graphite-app graphite-app bot force-pushed the 12-13-refactor_linter_plugins_stronger_debug_checks_for_tokens branch from d608f7c to c06d243 Compare December 13, 2025 22:55
@graphite-app graphite-app bot force-pushed the 12-12-fix_linter_plugins_use_typescript_parser_directly_for_getting_tokens branch from e2d81cd to f814c82 Compare December 13, 2025 22:55
graphite-app bot pushed a commit that referenced this pull request Dec 13, 2025
Pure refactor. `| Comment` in `NodeOrToken | Comment` is redundant. Since #16805, `NodeOrToken` = `Node | Token | Comment` anyway. Shorten to just `NodeOrToken`.
graphite-app bot pushed a commit that referenced this pull request Dec 13, 2025
…ens methods (#16808)

#16805 added `start` and `end` properties to tokens. Use them instead of the slower and more verbose `range[0]` and `range[1]` in all tokens methods.

Note: Only use `start` and `end` for tokens and comments, as we produce them and know for sure that these properties are present. Continue to use `range` for objects passed to methods by user.
Base automatically changed from 12-13-refactor_linter_plugins_stronger_debug_checks_for_tokens to main December 13, 2025 23:00
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Dec 13, 2025
@graphite-app graphite-app bot merged commit f814c82 into main Dec 13, 2025
20 checks passed
@graphite-app graphite-app bot deleted the 12-12-fix_linter_plugins_use_typescript_parser_directly_for_getting_tokens branch December 13, 2025 23:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-cli Area - CLI A-linter Area - Linter A-linter-plugins Area - Linter JS plugins C-bug Category - Bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants