-
-
Notifications
You must be signed in to change notification settings - Fork 763
fix(linter/plugins)!: use TypeScript parser directly for getting tokens #16805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(linter/plugins)!: use TypeScript parser directly for getting tokens #16805
Conversation
How to use the Graphite Merge QueueAdd either label to this PR to merge it via the merge queue:
You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite. Learn more about stacking. |
c457c5c to
bfbf14d
Compare
6821b23 to
3d24979
Compare
Pure refactor of tests. Unit tests for `isSpaceBetween` were using `@typescript-eslint/typescript-estree` to parse code snippets. Instead use Oxc parser, and set up global state the same way as if the file was being linted, instead of mocking `source_code.ts` to get the source text into the linter's internals. This brings the tests closer to real usage. It also prepares the way for changing how tokens are parsed in a following PR (#16805).
bfbf14d to
d734b86
Compare
3d24979 to
babcbb4
Compare
babcbb4 to
a211e0f
Compare
a211e0f to
1973405
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request replaces the @typescript-eslint/typescript-estree parser with direct TypeScript parser usage for tokenization. The change removes an external dependency, adds start and end properties to tokens for consistency with Oxc AST, and enables parsing of invalid code that TS-ESLint would reject.
Key changes:
- Implements custom token parsing using TypeScript's parser directly instead of TS-ESLint
- Removes
CommentTokentype (breaking change) in favor of using Oxc'sCommenttype - Uses Oxc parser's comments instead of TypeScript's comments
- Adds
startandendproperties to tokens alongside existingrangeproperty
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| pnpm-lock.yaml | Removes @typescript-eslint/typescript-estree and related dependencies |
| apps/oxlint/package.json | Removes @typescript-eslint/typescript-estree from dependencies |
| apps/oxlint/tsdown.config.ts | Updates bundle entry from ts_eslint.cjs to typescript.cjs and updates comments |
| apps/oxlint/src-js/plugins/typescript.cjs | New bundle entry that exports TypeScript module directly |
| apps/oxlint/src-js/plugins/ts_eslint.cjs | Deleted - replaced by typescript.cjs |
| apps/oxlint/src-js/plugins/tokens_parse.ts | New file implementing token parsing using TypeScript parser, adapted from TS-ESLint's implementation |
| apps/oxlint/src-js/plugins/tokens.ts | Refactors initTokens to use new parser, removes TS-ESLint lazy-loading, uses Oxc comments instead of TS-ESLint comments |
| apps/oxlint/src-js/plugins/types.ts | Changes NodeOrToken type from Node | Token | CommentToken to Node | Token | Comment |
| apps/oxlint/src-js/plugins/source_code.ts | Updates return type of tokensAndComments to use Comment instead of CommentToken |
| apps/oxlint/src-js/index.ts | Removes CommentToken from public exports (breaking change) |
| apps/oxlint/test/tokens.test.ts | Updates test setup to properly initialize parsing environment including context, buffers, and source text |
| apps/oxlint/test/fixtures/tokens/plugin.ts | Adds assertions to verify tokens have start and end properties matching range |
| apps/oxlint/conformance/snapshot.md | Shows improved test pass rates: 36 more tests passing overall, with 3 rules moving to fully passing |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
Comments suppressed due to low confidence (1)
apps/oxlint/src-js/plugins/tokens.ts:211
- The TODO comment indicates that
range[0]should be replaced withstartnow that tokens havestartproperty. However, this TODO has not been addressed in this PR. The function still usestoken.range[0]andcomment.range[0]instead oftoken.startandcomment.start. According to the stored memory about token property access, the codebase convention is to usestartandendproperties for system-produced tokens. This inconsistency should be resolved for better code consistency and performance (property access is faster than array indexing).
// TODO: Replace `range[0]` with `start` throughout this function
// once we have our own tokens which have `start` property
// Fast paths for file with no comments, or file which is only comments
const commentsLength = comments.length;
if (commentsLength === 0) {
tokensAndComments = tokens;
return;
}
const tokensLength = tokens.length;
if (tokensLength === 0) {
tokensAndComments = comments;
return;
}
// File contains both tokens and comments.
// Fill `tokensAndComments` with the 2 arrays interleaved in source order.
tokensAndComments = [];
let tokenIndex = 0,
commentIndex = 0,
token = tokens[0],
comment = comments[0],
tokenStart = token.range[0],
commentStart = comment.range[0];
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new TypeScript-native tokenization introduces a hard regression in debug/test builds: debugCheckValidRanges now throws on end === start, and the conformance snapshot shows multiple Invalid token range: … failures, indicating this is currently breaking suites. tokens_parse.ts depends on parent pointers for JSX token classification, so setParentNodes must be intentionally kept true to prevent subtle token-type regressions. Finally, the null as any + __proto__ approach in tokens_parse.ts weakens safety/maintainability and is worth tightening given this is a core parsing primitive.
Additional notes (1)
- Maintainability |
apps/oxlint/src-js/plugins/tokens.ts:168-187
initTokensAndCommentsnow sources comments fromast.comments(Oxc) but tokens come from TypeScript. You do validate ranges independently, but you are implicitly assuming both systems use the same indexing scheme and that both are based on the same exactsourceText.
Given the new snapshot failures and the presence of BOM handling in setupSourceForFile, it’s worth explicitly verifying that TS token offsets and Oxc comment offsets are aligned (especially around BOM and newline normalization). If they diverge, merging/sorting will produce incorrect order and potentially overlapping ranges.
Summary of changes
Summary of changes
Tokenization pipeline refactor
- Replaced tokenization based on
@typescript-eslint/typescript-estreewith a new TypeScript-native tokenizer:apps/oxlint/src-js/plugins/tokens.tsnow callsparseTokens()instead of TS-ESLintparse().- Added
apps/oxlint/src-js/plugins/tokens_parse.ts, copying/adapting TS-ESLint logic to walk a TS AST and convert TS tokens into the project’sTokenshape (now includingstart/endand lazyloc). - Added a new lazy-loaded bundle entry
apps/oxlint/src-js/plugins/typescript.cjsand updatedapps/oxlint/tsdown.config.tsto build it.
Types and API surface changes
- Removed the
CommentTokentype export and switched to usingCommentfromplugins/types.ts:tokensAndCommentsis now typed as(Token | Comment)[].NodeOrTokennow includesComment.BaseTokennow extendsSpandirectly so tokens gainstart/end.
Test and fixtures updates
- Updated token fixture plugin to assert
token.start/end === token.range[...]. - Reworked
tokens.test.tsto avoid mockingsource_code.ts; it now uses real parse/buffer setup viaparseRaw()and file context helpers.
Dependency changes
- Removed
@typescript-eslint/typescript-estreefromapps/oxlintdevDependencies and lockfile.
Conformance output changes
- Snapshot updated for improved conformance results and new error outputs (notably
Invalid token range: …).
eeff8a9 to
e2d81cd
Compare
Merge activity
|
…ns (#16805) Previously we used `@typescript-eslint/typescript-estree` parser to tokenize a source text. `@typescript-eslint/typescript-estree` uses TypeScript parser internally, and then builds the tokens array by traversing TypeScript's AST. Instead, use TypeScript parser directly, and copy the code from TS-ESLint for converting tokens from TypeScript's format to TS-ESTree format. ### Why? This has a few advantages: 1. Remove a dependency. 2. TS-ESLint's tokens did not have `start` or `end` properties, which we want for consistency with our AST. By reimplementing ourselves, we can add those properties to tokens. 3. TS-ESLint does a bunch of other work besides compiling the tokens array - it converts the AST too. We discard the AST that TS-ESLint produces, so this work is pointless. By cutting out TS-ESLint, we also cut out that unnecessary work, and do only what we need to - tokenizing. 4. TS-ESLint would throw an error if TypeScript produces any errors (even non-fatal ones), and there is no option to tell TS-ESLint not to do that. This caused a bunch of conformance tests to fail, because ESLint's tests include invalid code. 5. Can add workarounds for TypeScript bugs - e.g. not being able to parse JSX closing fragments containing line breaks (`<>hello<\n/>`). ### Implementation detail We no longer need to get comments from TypeScript. We use the comments produced by Oxc parser instead. ### `CommentToken` type A side effect is that there's no longer any need for separate `Comment` and `CommentToken` types - as we use our own comments now. So this PR removes `CommentToken` type. Have marked this as a breaking change due to removal of this type.
d608f7c to
c06d243
Compare
e2d81cd to
f814c82
Compare
Pure refactor. `| Comment` in `NodeOrToken | Comment` is redundant. Since #16805, `NodeOrToken` = `Node | Token | Comment` anyway. Shorten to just `NodeOrToken`.
…ens methods (#16808) #16805 added `start` and `end` properties to tokens. Use them instead of the slower and more verbose `range[0]` and `range[1]` in all tokens methods. Note: Only use `start` and `end` for tokens and comments, as we produce them and know for sure that these properties are present. Continue to use `range` for objects passed to methods by user.

Previously we used
@typescript-eslint/typescript-estreeparser to tokenize a source text.@typescript-eslint/typescript-estreeuses TypeScript parser internally, and then builds the tokens array by traversing TypeScript's AST.Instead, use TypeScript parser directly, and copy the code from TS-ESLint for converting tokens from TypeScript's format to TS-ESTree format.
Why?
This has a few advantages:
startorendproperties, which we want for consistency with our AST. By reimplementing ourselves, we can add those properties to tokens.<>hello<\n/>).Implementation detail
We no longer need to get comments from TypeScript. We use the comments produced by Oxc parser instead.
CommentTokentypeA side effect is that there's no longer any need for separate
CommentandCommentTokentypes - as we use our own comments now. So this PR removesCommentTokentype.Have marked this as a breaking change due to removal of this type.