Conversation

@ryan-m-walker
Contributor

Summary

This PR is related to #7994

Similar to the other listed PR, we need to increase the TokenKind limit to handle growing language grammars. The issue was caught while working on the CSS if feature for issue #6725: unit tests such as ok::grit_metavariable::metavar_css hit an "attempt to shift left with overflow" panic in crates/biome_parser/src/token_set.rs. This PR adds an extra item to the TokenSet array, raising the limit from 256 kinds to 384. I broke this work out into a smaller, separate PR since it is a deeper change with potentially wider impact, and I wanted it to be easier to review and scrutinize.
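For context, the panic comes from shifting a `u128` by an amount of 128 or more; splitting storage into 128-bit lanes keeps every shift in the 0..=127 range. A minimal standalone sketch of the idea (the `mask` helper and lane layout here are illustrative, not biome_parser's private internals):

```rust
// Illustrative sketch of the lane-splitting fix; not the crate's actual code.

/// Three 128-bit lanes give room for 3 * 128 = 384 token kinds.
fn mask(kind: u16) -> [u128; 3] {
    let mut lanes = [0u128; 3];
    match kind {
        // A single `1u128 << kind` would panic once kind reaches 128
        // ("attempt to shift left with overflow"), so each lane only
        // ever shifts by an offset in 0..=127.
        0..=127 => lanes[0] = 1u128 << kind,
        128..=255 => lanes[1] = 1u128 << (kind - 128),
        256..=383 => lanes[2] = 1u128 << (kind - 256),
        _ => panic!("token kind {kind} exceeds the 384-kind limit"),
    }
    lanes
}

fn main() {
    // Kind 300 lands in the third lane at bit offset 300 - 256 = 44.
    let m = mask(300);
    assert_eq!(m, [0, 0, 1u128 << 44]);
    println!("mask(300) = {m:?}");
}
```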

Test Plan

  • Added several unit tests to the file, exercising basic functionality across each item in the TokenSet bitfield array.
  • All other unit tests in the repo are passing.

Docs

n/a

AI Assistance Disclosure

Claude Code was used for basic iteration and code research, but almost all of this code was written directly by me and verified with unit tests that I also wrote.

@changeset-bot

changeset-bot bot commented Nov 5, 2025

⚠️ No Changeset found

Latest commit: 04f24bb

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types.


@coderabbitai
Contributor

coderabbitai bot commented Nov 5, 2025

Walkthrough

TokenSet storage expanded from two 128-bit lanes to three, increasing capacity. All constructors, operations, and the EMPTY constant updated to support the new three-lane representation. Union operations now combine three lanes instead of two. A new public contains method checks membership across all lanes with extended range handling (0..=127, 128..=255, 256..=383). The internal mask function was extended to distribute bits across three 128-bit words with boundary checks and panic on out-of-range. Comprehensive tests validate mask behaviour, contains logic, union operations, and boundary conditions.
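The three-lane shape described above can be sketched as follows. This is a standalone illustration; the field layout, `singleton` helper, and the plain `u16` kind parameter are assumptions for the example, not the crate's real signatures:

```rust
// Illustrative three-lane TokenSet; not biome_parser's actual type.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct TokenSet([u128; 3]);

impl TokenSet {
    /// All lanes zeroed.
    const EMPTY: TokenSet = TokenSet([0; 3]);

    /// Build a one-element set; panics for kinds >= 384.
    fn singleton(kind: u16) -> TokenSet {
        let mut lanes = [0u128; 3];
        match kind {
            0..=127 => lanes[0] = 1 << kind,
            128..=255 => lanes[1] = 1 << (kind - 128),
            256..=383 => lanes[2] = 1 << (kind - 256),
            _ => panic!("kind {kind} out of range"),
        }
        TokenSet(lanes)
    }

    /// Union combines all three lanes with bitwise OR.
    const fn union(self, other: TokenSet) -> TokenSet {
        TokenSet([
            self.0[0] | other.0[0],
            self.0[1] | other.0[1],
            self.0[2] | other.0[2],
        ])
    }

    /// Membership checks only the lane that owns the kind's bit.
    fn contains(&self, kind: u16) -> bool {
        match kind {
            0..=127 => self.0[0] & (1 << kind) != 0,
            128..=255 => self.0[1] & (1 << (kind - 128)) != 0,
            256..=383 => self.0[2] & (1 << (kind - 256)) != 0,
            _ => false,
        }
    }
}

fn main() {
    let set = TokenSet::singleton(5).union(TokenSet::singleton(300));
    assert!(set.contains(5) && set.contains(300));
    assert!(!set.contains(6));
    assert_eq!(TokenSet::EMPTY.union(set), set);
    println!("ok");
}
```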

Suggested labels

A-Core

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
  • Title check: Passed ✅. The title directly summarises the main change: expanding TokenSet storage capacity from two to three 128-bit lanes to support more token kinds.
  • Description check: Passed ✅. The description clearly relates to the changeset, explaining the motivation (CSS if feature overflow panic), the fix (increasing the TokenKind limit from 256 to 384), and test coverage.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 166be7c and 04f24bb.

📒 Files selected for processing (1)
  • crates/biome_parser/src/token_set.rs (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: Bench (biome_graphql_formatter)
  • GitHub Check: Bench (biome_graphql_parser)
  • GitHub Check: Documentation
  • GitHub Check: Test (depot-ubuntu-24.04-arm-16)
  • GitHub Check: End-to-end tests
  • GitHub Check: Test (depot-windows-2022-16)
  • GitHub Check: Test Node.js API
  • GitHub Check: Check Dependencies
  • GitHub Check: Lint project (depot-ubuntu-24.04-arm-16)
  • GitHub Check: Lint project (depot-windows-2022)
  • GitHub Check: autofix
  • GitHub Check: Bench (biome_json_formatter)
  • GitHub Check: Bench (biome_json_analyze)
  • GitHub Check: Bench (biome_json_parser)
  • GitHub Check: Bench (biome_css_formatter)
  • GitHub Check: Bench (biome_css_analyze)
  • GitHub Check: Bench (biome_css_parser)
  • GitHub Check: Bench (biome_js_analyze)
  • GitHub Check: Bench (biome_js_parser)
  • GitHub Check: Bench (biome_js_formatter)
🔇 Additional comments (1)
crates/biome_parser/src/token_set.rs (1)

126-220: Belt-and-braces boundary tests get a thumbs-up

These lane-by-lane checks make the widened bitfield feel bulletproof; great work covering every edge.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
crates/biome_parser/src/token_set.rs (1)

25-34: Add documentation for the public method.

The implementation is correct, but this new public method lacks a doc comment explaining its purpose and behaviour.

Apply this diff to add documentation:

+    /// Returns `true` if the set contains the specified token kind.
+    ///
+    /// # Examples
+    ///
+    /// ```
+    /// # use biome_parser::TokenSet;
+    /// # use biome_rowan::SyntaxKind;
+    /// // let set = token_set![Kind::Foo, Kind::Bar];
+    /// // assert!(set.contains(Kind::Foo));
+    /// ```
     pub fn contains(&self, kind: K) -> bool {

Optional: Consider making this const fn for consistency with union and to enable use in const contexts, though the current implementation is perfectly adequate.
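For what it's worth, a const-compatible `contains` is possible on recent Rust, since match, range patterns, and bit operations are all allowed in const fn. A sketch with assumed names (a plain `u16` kind rather than the crate's generic `K`):

```rust
// Illustrative const-fn membership check; not the crate's real API.
struct TokenSet([u128; 3]);

impl TokenSet {
    /// Const-compatible membership check across the three lanes.
    const fn contains(&self, kind: u16) -> bool {
        match kind {
            0..=127 => self.0[0] & (1 << kind) != 0,
            128..=255 => self.0[1] & (1 << (kind - 128)) != 0,
            256..=383 => self.0[2] & (1 << (kind - 256)) != 0,
            _ => false,
        }
    }
}

fn main() {
    // The payoff: the check is usable in const contexts.
    const SET: TokenSet = TokenSet([1 << 5, 0, 1 << 44]);
    const HAS_FIVE: bool = SET.contains(5);
    assert!(HAS_FIVE);
    assert!(SET.contains(300)); // 256 + 44
    assert!(!SET.contains(6));
    println!("const contains ok");
}
```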

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7278ed1 and 166be7c.

📒 Files selected for processing (1)
  • crates/biome_parser/src/token_set.rs (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: Test (depot-windows-2022-16)
  • GitHub Check: Bench (biome_graphql_parser)
  • GitHub Check: Bench (biome_graphql_formatter)
  • GitHub Check: Test Node.js API
  • GitHub Check: Bench (biome_js_parser)
  • GitHub Check: Bench (biome_js_formatter)
  • GitHub Check: Bench (biome_css_parser)
  • GitHub Check: autofix
  • GitHub Check: Bench (biome_js_analyze)
  • GitHub Check: Bench (biome_css_analyze)
  • GitHub Check: Bench (biome_css_formatter)
  • GitHub Check: Bench (biome_json_parser)
  • GitHub Check: Bench (biome_json_formatter)
  • GitHub Check: Bench (biome_json_analyze)
  • GitHub Check: Documentation
  • GitHub Check: Test (depot-ubuntu-24.04-arm-16)
  • GitHub Check: Check Dependencies
🔇 Additional comments (4)
crates/biome_parser/src/token_set.rs (4)

5-8: Struct expansion looks sound.

The three-lane storage correctly increases capacity to 384 token kinds, and the EMPTY constant properly initialises all lanes to zero.


14-23: Union operation correctly extended.

All three lanes are properly combined with bitwise OR.


48-56: Mask function correctly distributes across three lanes.

The bit shifting and offset calculations are accurate for each range (0..=127, 128..=255, 256..=383), with appropriate panic behaviour for invalid input.
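Those range edges can be exercised with a quick standalone check. The `mask` helper below is an illustrative re-implementation for the example, not the crate's private function:

```rust
use std::panic;

// Illustrative re-implementation of the lane-splitting mask.
fn mask(kind: u16) -> [u128; 3] {
    let mut lanes = [0u128; 3];
    match kind {
        0..=127 => lanes[0] = 1 << kind,
        128..=255 => lanes[1] = 1 << (kind - 128),
        256..=383 => lanes[2] = 1 << (kind - 256),
        _ => panic!("kind {kind} out of range for 384 kinds"),
    }
    lanes
}

fn main() {
    // Each lane's bottom and top bits sit exactly at the range edges.
    assert_eq!(mask(127)[0], 1u128 << 127);
    assert_eq!(mask(128)[1], 1);
    assert_eq!(mask(255)[1], 1u128 << 127);
    assert_eq!(mask(256)[2], 1);
    assert_eq!(mask(383)[2], 1u128 << 127);
    // Out-of-range input panics rather than silently wrapping.
    assert!(panic::catch_unwind(|| mask(384)).is_err());
    println!("boundaries ok");
}
```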


68-235: Excellent test coverage!

The test suite thoroughly validates all three lanes with boundary tests, membership checks, union operations, and panic behaviour. Well done on the comprehensive testing; this gives confidence that the three-lane expansion works correctly.

@codspeed-hq
Copy link

codspeed-hq bot commented Nov 5, 2025

CodSpeed Performance Report

Merging #7997 will not alter performance

Comparing ryan-m-walker:chore/increase-size-of-token-set (04f24bb) with main (0b28f5f)

Summary

✅ 53 untouched
⏩ 85 skipped¹

Footnotes

  1. 85 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@ryan-m-walker
Copy link
Contributor Author

Will fix the lint issue later today when I have time

@ematipico ematipico merged commit 5c55347 into biomejs:main Nov 5, 2025
24 checks passed
ematipico pushed a commit to hamirmahal/biome that referenced this pull request Nov 19, 2025
l0ngvh pushed a commit to l0ngvh/biome that referenced this pull request Dec 21, 2025

Labels

A-Parser Area: parser

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants