
Prepare Tree-sitter for Master Merge #3398

Draft
sdottaka wants to merge 80 commits into master from feature/tree-sitter-refactor



sdottaka (Member) commented Jun 2, 2026

The current feature/tree-sitter branch still has several issues that should be addressed before it can be merged into master.

This PR focuses on resolving the following items:

  • Move Tree-sitter-related parser files from CrystalEdit to WinMerge, since CrystalEdit itself does not use them. (Completed)

  • Improve syntax highlighting consistency with the existing CrystalEdit parsers.

  • Fix stability issues, including crashes.

  • Improve editing performance by avoiding full highlight cache rebuilds after every edit.

  • Rework the parser architecture to allow multiple parser implementations (e.g. CrystalEdit parsers and Tree-sitter) through a common language-services layer. This includes:

    • Introduce the LangServices namespace containing common types and interfaces such as TEXTBLOCK, TextDefinition, ITextBlock, ISyntaxParser, and ISyntaxParserFactory.
    • Move common syntax-highlighting types out of CrystalEdit-specific parser code so they can be shared by all parser implementations.
    • Decouple parser creation from CCrystalTextView through ISyntaxParserFactory.
    • Introduce SyntaxParserRegistry as a singleton to manage parser factory registration and parser creation.
    • When multiple parsers support the same language, parser selection priority is determined by the factory registration order.
    • Ensure the language-services layer can be used independently of CCrystalTextView and crystallineparser.h, enabling use in non-UI contexts such as comment-difference ignoring.
  • Replace the current "Enable Tree-sitter" checkbox with parser selection modes:

    1. Disable Tree-sitter
    2. Prefer CrystalEdit parsers and fall back to Tree-sitter
    3. Prefer Tree-sitter and fall back to CrystalEdit parsers

    These modes are implemented by changing the parser factory registration order.

  • Stop downloading and building Tree-sitter language modules during every release build. Instead, add dedicated Visual Studio projects (.vcxproj) for each supported language module and build them as part of the WinMerge solution.

  • Include Tree-sitter runtime DLLs and language-module DLLs in all installer packages (x86 IS5, x64 IS5, and x64 IS6).
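The registry behavior described above (parser priority determined by factory registration order, and the three selection modes implemented purely by reordering registrations) can be sketched in C++ as a minimal, hypothetical model. Only the names ISyntaxParser, ISyntaxParserFactory and SyntaxParserRegistry come from this PR. The methods Supports, Create, Register, Clear and CreateParser, the ParserMode enum, the ApplyParserMode helper, the NamedFactory toy factory, and the use of plain language-name strings are assumptions for illustration, not the actual WinMerge signatures.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified versions of the interfaces named in this PR.
struct ISyntaxParser {
    virtual ~ISyntaxParser() = default;
    virtual std::string Name() const = 0;
};

struct ISyntaxParserFactory {
    virtual ~ISyntaxParserFactory() = default;
    virtual bool Supports(const std::string& language) const = 0;
    virtual std::unique_ptr<ISyntaxParser> Create(const std::string& language) const = 0;
};

// Singleton registry: the first registered factory that supports a
// language wins, so registration order determines parser priority.
class SyntaxParserRegistry {
public:
    static SyntaxParserRegistry& Instance() {
        static SyntaxParserRegistry instance;
        return instance;
    }
    void Clear() { m_factories.clear(); }
    void Register(std::shared_ptr<ISyntaxParserFactory> factory) {
        m_factories.push_back(std::move(factory));
    }
    std::unique_ptr<ISyntaxParser> CreateParser(const std::string& language) const {
        for (const auto& f : m_factories)
            if (f->Supports(language))
                return f->Create(language);
        return nullptr; // no registered factory handles this language
    }
private:
    SyntaxParserRegistry() = default;
    std::vector<std::shared_ptr<ISyntaxParserFactory>> m_factories;
};

// Toy factory: supports a fixed set of languages; parsers report its name.
class NamedFactory : public ISyntaxParserFactory {
public:
    NamedFactory(std::string name, std::vector<std::string> langs)
        : m_name(std::move(name)), m_langs(std::move(langs)) {}
    bool Supports(const std::string& language) const override {
        for (const auto& l : m_langs)
            if (l == language) return true;
        return false;
    }
    std::unique_ptr<ISyntaxParser> Create(const std::string&) const override {
        struct P : ISyntaxParser {
            std::string n;
            explicit P(std::string s) : n(std::move(s)) {}
            std::string Name() const override { return n; }
        };
        return std::make_unique<P>(m_name);
    }
private:
    std::string m_name;
    std::vector<std::string> m_langs;
};

enum class ParserMode { DisableTreeSitter, PreferCrystalEdit, PreferTreeSitter };

// Each selection mode is only a different registration order.
void ApplyParserMode(ParserMode mode,
                     std::shared_ptr<ISyntaxParserFactory> crystal,
                     std::shared_ptr<ISyntaxParserFactory> treeSitter) {
    auto& reg = SyntaxParserRegistry::Instance();
    reg.Clear();
    switch (mode) {
    case ParserMode::DisableTreeSitter:
        reg.Register(crystal);
        break;
    case ParserMode::PreferCrystalEdit:
        reg.Register(crystal);
        reg.Register(treeSitter);
        break;
    case ParserMode::PreferTreeSitter:
        reg.Register(treeSitter);
        reg.Register(crystal);
        break;
    }
}
```

In this sketch fallback needs no mode-specific logic: CreateParser returns a parser from the first factory that accepts the language, so switching modes only means re-registering the factories in a different order.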

The goal is to make the Tree-sitter implementation stable, performant, maintainable, and ready for integration into the main branch.

Thorium and others added 30 commits March 26, 2026 19:49
Integrate tree-sitter as an optional syntax highlighting engine that
supplements the existing keyword-based CrystalEdit parsers. When a
grammar DLL and highlight query (.scm) are present in the
TreeSitterGrammars directory, tree-sitter provides full AST-based
highlighting; otherwise the existing parser runs unchanged.

Core components:
- TreeSitterParser.h/.cpp: CTreeSitterParser, CTreeSitterColorMap,
  CTreeSitterLanguage, and TreeSitterRegistry classes
- ParseLine virtual override in CMergeEditView for tree-sitter results
- Incremental parsing via ts_tree_edit() on each edit operation
- Lazy reparse with dirty flag (fires once per paint cycle)
- Status bar indicator showing [TS:language] in encoding pane
- Post-build step to copy grammar DLLs from Release to Debug/Test
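A minimal sketch of the lazy-reparse idea, assuming a hypothetical LazyReparser class (not the actual CTreeSitterParser interface): each edit only sets a dirty flag, and the first paint-time request runs a single reparse. In the real integration each edit would additionally be reported through ts_tree_edit() so that the following parse can reuse unchanged subtrees; here a caller-supplied callback stands in for the parse step.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical sketch of "lazy reparse with dirty flag": edits only mark
// the tree as stale; the expensive reparse runs at most once per paint.
class LazyReparser {
public:
    explicit LazyReparser(std::function<void()> reparse)
        : m_reparse(std::move(reparse)) {}

    // Called on every edit operation; deliberately cheap.
    void OnEdit() { m_dirty = true; }

    // Called at the start of each paint / ParseLine pass.
    void EnsureParsed() {
        if (!m_dirty) return;
        m_reparse();
        m_dirty = false;
    }

    bool IsDirty() const { return m_dirty; }

private:
    std::function<void()> m_reparse;
    bool m_dirty = true; // nothing has been parsed yet
};
```

However many edits occur between two paints, this sketch performs at most one reparse.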

Supported languages: bash, c, c-sharp, cpp, css, dtd, flow, fsharp,
fsharp_signature, go, html, java, javascript, json, php, php_only,
python, ruby, rust, tsx, typescript, xml.

Grammar DLLs are built separately via build-grammars.ps1.
- build-grammars.ps1: downloads and compiles grammar DLLs from GitHub
  releases using MSVC cl.exe/link.exe
- grammars.json: defines 17 grammar repos and release tags
- fsharp-highlights.scm: F# syntax highlight queries for tree-sitter
Wire in scope-aware highlighting (locals.scm) and language injection
(injections.scm) alongside the existing highlights.scm support.

- CTreeSitterLanguage: add LoadQuery() helper, load all three .scm files
- CTreeSitterParser: add RunLocalsQuery() for scope/def/ref tracking,
  RunInjectionQuery() for embedded language highlighting, GetSetProperty()
  for #set! predicate parsing; RunHighlightQuery() cross-references locals
- TreeSitterRegistry: add GetLanguageForName() for injection language lookup
- build-grammars.ps1: resolve and copy locals.scm and injections.scm files
- Fix type mismatch (RefInfo vs PendingRef) and remove dead code
- Add tree-sitter shared items to solution and projects
- Update SampleStatic project to include tree-sitter
- Fix build-grammars.ps1 to use Git Bash explicitly
- Add missing <algorithm> include
- Minor solution cleanup and add Italian translation
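Scope-aware highlighting from locals.scm can be illustrated with a simplified, hypothetical resolver. It assumes scopes and definitions have already been collected from @local.scope and @local.definition captures as byte ranges. The ScopeInfo and Definition types and the ResolveReference function are illustrative names, not the RunLocalsQuery() implementation. A reference takes the color of the nearest preceding definition with the same name, searching the innermost enclosing scope first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model of locals.scm results; ranges are byte offsets [start, end).
struct Scope { std::size_t start, end; };
struct Definition { std::string name; std::size_t pos; int colorIndex; };
struct ScopeInfo {
    Scope range;
    std::vector<Definition> defs;
};

// Returns the color index of the definition a reference resolves to, or -1.
int ResolveReference(const std::vector<ScopeInfo>& scopes,
                     const std::string& name, std::size_t refPos) {
    std::vector<const ScopeInfo*> enclosing;
    for (const auto& s : scopes)
        if (s.range.start <= refPos && refPos < s.range.end)
            enclosing.push_back(&s);
    // Innermost (smallest) scope first.
    std::sort(enclosing.begin(), enclosing.end(),
              [](const ScopeInfo* a, const ScopeInfo* b) {
                  return (a->range.end - a->range.start) <
                         (b->range.end - b->range.start);
              });
    for (const ScopeInfo* s : enclosing) {
        const Definition* found = nullptr;
        for (const auto& d : s->defs)
            if (d.name == name && d.pos < refPos &&
                (!found || d.pos > found->pos))
                found = &d; // a later preceding definition shadows earlier ones
        if (found) return found->colorIndex;
    }
    return -1; // unresolved: keep the plain highlights.scm capture
}
```

Only definitions that appear before the reference are considered, which mirrors processing captures in document order.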
* fix: bundle inherited tree-sitter queries for grammars

Agent-Logs-Url: https://github.com/Thorium/winmerge/sessions/234ce03d-a145-4b8c-b4c2-37eed3e33cf0

Co-authored-by: Thorium <229355+Thorium@users.noreply.github.com>

* refine tree-sitter query bundling helpers

Agent-Logs-Url: https://github.com/Thorium/winmerge/sessions/234ce03d-a145-4b8c-b4c2-37eed3e33cf0

Co-authored-by: Thorium <229355+Thorium@users.noreply.github.com>

* polish tree-sitter query bundle handling

Agent-Logs-Url: https://github.com/Thorium/winmerge/sessions/234ce03d-a145-4b8c-b4c2-37eed3e33cf0

Co-authored-by: Thorium <229355+Thorium@users.noreply.github.com>

* Earlier Copilot feedback addressed.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Thorium <229355+Thorium@users.noreply.github.com>
* Doc - Italian language - Updated (#3319)

* Update Italian.po

* Fix issue #3321: [BUG] Incorrect string used with beta releases

* Show error message when entering path in header bar (#3322)

* Prioritize explicitly selected plugins over archive detection (#3324)

* Prioritize explicitly selected plugins over archive detection

* Update Src/7zCommon.cpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update Src/7zCommon.cpp

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Use 7-Zip IsArc API for archive detection and refactor format guessing logic (#3323)

* Use 7-Zip IsArc API for archive detection and refactor format guessing logic

* Update ArchiveSupport/Merge7z/Merge7zCommon.cpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Restore extension-only fallback in GuessFormatEx and handle NEED_MORE result

Agent-Logs-Url: https://github.com/WinMerge/winmerge/sessions/47af4d0f-fc0a-4e33-ab81-8ec95c0f599e

Co-authored-by: sdottaka <98126+sdottaka@users.noreply.github.com>

* Use 7-Zip IsArc API for archive detection and refactor format guessing logic (2)

* Use 7-Zip IsArc API for archive detection and refactor format guessing logic (3)

* Prioritize explicitly selected plugins over archive detection

* Update Src/7zCommon.cpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update Src/7zCommon.cpp

* Update Merge7zCommon.cpp

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sdottaka <98126+sdottaka@users.noreply.github.com>

* Merge7z: Bump revision to 2600.1

* Merge7z: Bump revision to 2600.1 (2)

* Update French Manual (#3325)

* Refactor: unify open parameters and move recurse to OpenFolderParams (#3326)

* Update Manual/French.po

* Refactor: unify open parameters and move recurse to OpenFolderParams (#3326) (2)

(cherry picked from commit 83af229)

* Add Folder comparison mode with archive extraction support (#3320)

* Update Manual/French.po

* Update Brazilian.po (#3328)

Added translation for "Add Folder comparison mode with archive extraction support (#3320)"

* Update German.po (#3329)

* update zh-cn translation (#3331)

* Update Turkish.po (#3333)

New string entries

* Update Korean (#3334)

* Code review fixes for 5 oldest source files #3327 #1

* Code review fixes for 5 oldest source files #3327 #2

* Update Turkish.po

* Update TranslationsStatus

* Update ChangeLog&ReleaseNotes

* Italian language (#3335)

* Stabilize tree-sitter highlight precedence

Make overlapping captures resolve deterministically so syntax colors stay consistent across panes and languages. Also accept local.* capture prefixes so newer query conventions keep local symbol highlighting working.

* Unify tree-sitter block ordering

Use one parser-wide block order counter so injected-language highlights cannot collide with primary highlight ordering when the final precedence tie-breaker runs.
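The deterministic precedence rule can be sketched as a strict total order over captures. This is a hypothetical model: the Capture fields, the BlockOrderCounter class and the exact comparison sequence (start position, then higher priority, then lower order value) are assumptions; the commits only state that overlapping captures must resolve deterministically and that a single parser-wide order counter serves as the final tie-breaker. Because primary and injected queries draw from the same counter, no two captures share an order value, so the sorted result does not depend on the input sequence.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Capture {
    std::size_t start, end; // byte range
    int priority;           // higher wins
    int colorIndex;
    unsigned order;         // drawn from one parser-wide counter
};

// One counter for the whole parser, shared by the primary highlights query
// and every injected-language query, so order values never collide.
class BlockOrderCounter {
public:
    unsigned Next() { return m_next++; }
private:
    unsigned m_next = 0;
};

// Strict total order: position, then priority, then order as the final
// tie-breaker. Unique order values make the sort fully deterministic.
void SortCaptures(std::vector<Capture>& caps) {
    std::sort(caps.begin(), caps.end(), [](const Capture& a, const Capture& b) {
        if (a.start != b.start) return a.start < b.start;
        if (a.priority != b.priority) return a.priority > b.priority;
        return a.order < b.order;
    });
}
```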

---------

Co-authored-by: bovirus <1262554+bovirus@users.noreply.github.com>
Co-authored-by: Takashi Sawanaka <sdottaka@users.sourceforge.net>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: sdottaka <98126+sdottaka@users.noreply.github.com>
Co-authored-by: t3chnob0y <t3chnob0y@users.noreply.github.com>
Co-authored-by: Marcellomco <70959309+Marcellomco@users.noreply.github.com>
Co-authored-by: René T. Nicolaus <12006431+Havoc7891@users.noreply.github.com>
Co-authored-by: YG <1246410+yingang@users.noreply.github.com>
Co-authored-by: bilimiyorum <131397022+bilimiyorum@users.noreply.github.com>
Co-authored-by: VenusGirl❤ <venusgirl@outlook.com>
* Finish tree-sitter runtime integration for compare views

Wire the runtime grammar bundle, compare-view UI, and same-file navigation together so tree-sitter features are actually available in built binaries. This also updates the F# grammar bundle to include tags and disables Go to Definition when the symbol at the current caret position cannot be resolved.

* Fix tree-sitter follow-up packaging issues

Guard the WiX grammar component reference when harvested files are absent, and remove the redundant TreeSitterWrapper include to avoid the _T macro redefinition warning.
# Conflicts:
#	ArchiveSupport/Merge7z/BuildArc.cmd
#	Docs/Users/ChangeLog.html
#	Docs/Users/ChangeLog.md
#	Docs/Users/ReleaseNotes.html
#	Docs/Users/ReleaseNotes.md
#	DownloadDeps.cmd
#	Src/FilepathEdit.cpp
#	Src/Merge.vcxproj.filters
#	Src/res/new_folder.bmp
#	Translations/TranslationsStatus.md
#	Translations/WinMerge/Arabic.po
#	Translations/WinMerge/Basque.po
#	Translations/WinMerge/Brazilian.po
#	Translations/WinMerge/Bulgarian.po
#	Translations/WinMerge/Catalan.po
#	Translations/WinMerge/ChineseSimplified.po
#	Translations/WinMerge/ChineseTraditional.po
#	Translations/WinMerge/Corsican.po
#	Translations/WinMerge/Croatian.po
#	Translations/WinMerge/Czech.po
#	Translations/WinMerge/Danish.po
#	Translations/WinMerge/Dutch.po
#	Translations/WinMerge/English.pot
#	Translations/WinMerge/Finnish.po
#	Translations/WinMerge/French.po
#	Translations/WinMerge/Galician.po
#	Translations/WinMerge/German.po
#	Translations/WinMerge/Greek.po
#	Translations/WinMerge/Hebrew.po
#	Translations/WinMerge/Hungarian.po
#	Translations/WinMerge/Italian.po
#	Translations/WinMerge/Japanese.po
#	Translations/WinMerge/Korean.po
#	Translations/WinMerge/Lithuanian.po
#	Translations/WinMerge/Norwegian.po
#	Translations/WinMerge/Persian.po
#	Translations/WinMerge/Polish.po
#	Translations/WinMerge/Portuguese.po
#	Translations/WinMerge/Romanian.po
#	Translations/WinMerge/Russian.po
#	Translations/WinMerge/Serbian.po
#	Translations/WinMerge/Sinhala.po
#	Translations/WinMerge/Slovak.po
#	Translations/WinMerge/Slovenian.po
#	Translations/WinMerge/Spanish.po
#	Translations/WinMerge/Swedish.po
#	Translations/WinMerge/Tamil.po
#	Translations/WinMerge/Turkish.po
#	Translations/WinMerge/Ukrainian.po
#	Translations/WinMerge/Vietnamese.po
…s and FolderCompare projects are not yet buildable. MFC dependencies still need to be removed from TreeSitterParser.
* Fix tree-sitter go to definition from context menus

Update right-click navigation to resolve the symbol under the mouse and prefer tagged type definitions when the position-based lookup stays on the current line.

* Update tree-sitter context-menu definition handling
# Conflicts:
#	Src/Merge.vcxproj
#	Src/MergeDoc.cpp
#	Src/MergeDoc.h
Replace ITextBuffer* parameter in NotifyEdit with TextEdit struct.
Move notification to buffer layer (AddUndoRecord) for consistency.
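One possible shape for a TextEdit struct, as a hypothetical sketch: the field names, the (line, character) units and the MakeInsertEdit/MakeDeleteEdit helpers are assumptions. The three positions mirror the start, old-end and new-end points that tree-sitter's TSInputEdit expects, but conversion to byte offsets and CRLF line endings are deliberately left out.

```cpp
#include <cassert>
#include <string>

struct TextPos {
    int line;
    int ch;
    bool operator==(const TextPos& o) const { return line == o.line && ch == o.ch; }
};

// Hypothetical edit record passed to NotifyEdit instead of ITextBuffer*.
struct TextEdit {
    TextPos start;
    TextPos oldEnd; // end of the replaced range before the edit
    TextPos newEnd; // end of the inserted text after the edit
};

// An insertion replaces an empty range, so oldEnd == start.
TextEdit MakeInsertEdit(TextPos start, const std::string& text) {
    TextPos end = start;
    for (char c : text) {
        if (c == '\n') { ++end.line; end.ch = 0; }
        else           { ++end.ch; }
    }
    return TextEdit{start, start, end};
}

// A deletion collapses [start, end) to a single point.
TextEdit MakeDeleteEdit(TextPos start, TextPos end) {
    return TextEdit{start, end, start};
}
```

Recording the edit at the buffer layer in this form gives any parser the information it needs for incremental updates without holding a reference to the buffer.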
- Move TreeSitterParser and TreeSitterWrapper from Externals/crystaledit/editlib to Src/
- Move tree-sitter library from Externals/crystaledit/editlib/ to Externals/ (top-level)
- Remove TreeSitter references from editlibparsers.vcxitems (CrystalEdit shared items)
- Update include paths in WinMerge source files to reference local TreeSitter headers
- Update project files and solution configuration

This decouples tree-sitter from CrystalEdit, making CrystalEdit a pure text editor
library while keeping tree-sitter as a WinMerge-specific feature.
…esign

Remove stored buffer reference from CTreeSitterParser and pass ITextBuffer*
explicitly to methods that need it. This eliminates hidden state and makes
buffer dependencies explicit at call sites.

Changes:
- Remove m_pBuffer, SetBuffer(), and GetBuffer() from CTreeSitterParser
- Add ITextBuffer* parameter to FindDefinition() and TryGetTagDefinitionByNameAt()
- Introduce TreeSitterParseContext struct to hold both parser and buffer references
- Update MergeDoc to create and own TreeSitterParseContext instances
- Update ParseLineTreeSitter() to use context for lazy reparse with explicit buffer
- Update all call sites in MergeEditView to pass buffer parameter
Keep only the highest priority highlight when multiple captures match
the same token range, preventing conflicting color indices.
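Collapsing captures that cover exactly the same token range might look like the following hypothetical helper (KeepHighestPriority and Highlight are illustrative names). When two captures tie on priority, this sketch keeps the first one seen; that tie rule is an assumption, since the commit message only specifies keeping the highest priority.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct Highlight {
    std::size_t start, end;
    int priority;
    int colorIndex;
};

// Keep one capture per exact [start, end) range: the highest priority wins.
std::vector<Highlight> KeepHighestPriority(const std::vector<Highlight>& in) {
    std::map<std::pair<std::size_t, std::size_t>, Highlight> best;
    for (const auto& h : in) {
        auto key = std::make_pair(h.start, h.end);
        auto it = best.find(key);
        if (it == best.end())
            best.emplace(key, h);
        else if (h.priority > it->second.priority)
            it->second = h; // strictly higher priority replaces the stored one
    }
    std::vector<Highlight> out;
    out.reserve(best.size());
    for (const auto& kv : best) out.push_back(kv.second);
    return out; // ordered by (start, end) as a side effect of std::map
}
```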
- Add ISyntaxParser::FindMatchingBrace() with a default implementation that returns false
- Implement FindMatchingBrace() in CrystalLineParserAdapter using legacy logic
- Implement FindMatchingBrace() in TreeSitterParserAdapter using AST structure
- Refactor CCrystalTextView::OnMatchBrace() to delegate to the parser
- Add m_nCurrentTextType to track current parser type for UI state
- Remove redundant m_CurSourceDef->flags writes from menu handlers
- Update OnUpdateSourceType/OnToggleSourceHeader to use m_nCurrentTextType

This reduces CCrystalTextView's dependency on m_CurSourceDef and provides
cleaner abstraction for syntax-aware brace matching.
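The delegation pattern in this commit can be sketched in a single-line, simplified form. The signatures here are hypothetical (the real FindMatchingBrace() works on buffer positions rather than an index into a std::string), and LegacyMatchBrace is a naive depth-counting stand-in for the CrystalEdit logic that ignores strings and comments. The point is that the base ISyntaxParser returns false, so the view falls back to the legacy scan unless a parser such as the Tree-sitter adapter handles the request.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical single-line reduction of the interface in this commit.
struct ISyntaxParser {
    virtual ~ISyntaxParser() = default;
    // Default: this parser does not match braces.
    virtual bool FindMatchingBrace(const std::string& /*text*/, std::size_t /*pos*/,
                                   std::size_t& /*match*/) const {
        return false;
    }
};

// Naive fallback: count nesting depth forward from an opening bracket or
// backward from a closing one. Strings and comments are not skipped.
bool LegacyMatchBrace(const std::string& text, std::size_t pos, std::size_t& match) {
    static const std::string open = "([{", close = ")]}";
    if (pos >= text.size()) return false;
    const auto o = open.find(text[pos]);
    const auto k = close.find(text[pos]);
    if (o != std::string::npos) {
        int depth = 0;
        for (std::size_t i = pos; i < text.size(); ++i) {
            if (text[i] == open[o]) ++depth;
            else if (text[i] == close[o] && --depth == 0) { match = i; return true; }
        }
    } else if (k != std::string::npos) {
        int depth = 0;
        for (std::size_t i = pos + 1; i-- > 0;) { // i runs from pos down to 0
            if (text[i] == close[k]) ++depth;
            else if (text[i] == open[k] && --depth == 0) { match = i; return true; }
        }
    }
    return false;
}

// The view asks the parser first and falls back only when it declines.
bool OnMatchBrace(const ISyntaxParser* parser, const std::string& text,
                  std::size_t pos, std::size_t& match) {
    if (parser && parser->FindMatchingBrace(text, pos, match))
        return true;
    return LegacyMatchBrace(text, pos, match);
}
```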
Replace m_CurSourceDef->type with m_nCurrentTextType in UI update handlers

Update CopyProperties to use m_nCurrentTextType instead of m_CurSourceDef

Change OnMatchBrace fallback to read comment syntax from m_nCurrentTextType

Add null safety check to ParseLine legacy fallback

Document m_CurSourceDef as legacy-only (used when m_pSyntaxParser is null)

This further reduces dependency on m_CurSourceDef, confining it to legacy parser fallback scenarios only.
sdottaka added 25 commits June 13, 2026 19:54
…-refactor

# Conflicts:
#	Translations/WinMerge/Portuguese.po
# Conflicts:
#	Externals/crystaledit/editlib/ccrystaltextview.cpp
#	Externals/crystaledit/editlib/ccrystaltextview.h
#	Externals/crystaledit/editlib/editlib.vcxitems.filters
#	Externals/crystaledit/editlib/parsers/html.cpp
#	Src/DiffWrapper.cpp
#	Src/Merge.cpp
#	Src/Merge.vcxproj.filters
#	Src/MergeDoc.cpp
#	Src/MergeEditView.cpp
#	Src/SyntaxParserHelper.cpp
#	Testing/FolderCompare/FolderCompare.vcxproj.filters
#	Testing/GoogleTest/UnitTests/UnitTests.vcxproj.filters
#	Translations/WinMerge/Portuguese.po

Labels

None yet

Projects

None yet


3 participants