Skip to content

Support highlighting Markdown/RST in comments #665

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 8, 2025

Conversation

taminomara
Copy link
Contributor

@taminomara taminomara commented Jul 25, 2025

This PR adds an optional feature that highlights Markdown, MySt (a Markdown extension used with Sphinx), and RST syntax in comments and provides Go To Definition implementation for cross-references. This feature is disabled by default; it can be enabled by specifying doc.syntax in .emmyrc.json.

image

Caveats

  • We use Semantic Tokens to provide syntax highlighting. Since there are no Semantic Tokens for bold and italic text, those are not highlighted.

@taminomara
Copy link
Contributor Author

Note: this is a rather large PR, see individual commits to get more digestible diffs.

@CppCXY
Copy link
Member

CppCXY commented Jul 25, 2025

Sorry, but I do not agree with any changes to the AST, especially those that use regular expressions to match and modify the usage of regions.

@taminomara
Copy link
Contributor Author

especially those that use regular expressions to match and modify the usage of regions.

I can redo those without regexes, but might you tell me what exactly you disagree with and why?

  • requiring spaces after region markers,
  • adding new tokens,
  • adding new nodes,
  • the whole concept in general?

@taminomara
Copy link
Contributor Author

...

  • handling of whitespaces in beginning of comments?

@CppCXY
Copy link
Member

CppCXY commented Jul 25, 2025

You just want to highlight the documentation in comments, so you only need to parse the comments once during semantic token request and highlight them in the appropriate places, rather than modifying the entire structure of the syntax tree.

@taminomara
Copy link
Contributor Author

Hm, I see your point. I can implement this. Although there will be complications if we want to support Go To Definition in comments.

I see it like this:

  • make a cache for comment markup, in case if IDE requests semantic tokens twice;
  • in Go To Definition request, if the user clicked inside of the documentation comment, we can check the comment markup cache to see if there's a cross reference in that place;
  • in Document Selection Range request, we can either ignore comments, or use comment markup cache once again.

Would you agree with this implementation?

@CppCXY
Copy link
Member

CppCXY commented Jul 25, 2025

This is a multithreaded language server, and caching introduces data races

@CppCXY
Copy link
Member

CppCXY commented Jul 25, 2025

Even if you re-parse everything, it will still be faster than using a cache.

@taminomara
Copy link
Contributor Author

Comment markup cache doesn't need its own AST, it will only save highlighted ranges.

@taminomara
Copy link
Contributor Author

caching introduces data races

I can make cache async/multi-threaded, that's not a big issue.

Even if you re-parse everything, it will still be faster than using a cache.

That's possible, I'll need to benchmark this.

@taminomara taminomara marked this pull request as draft July 25, 2025 13:02
@taminomara
Copy link
Contributor Author

I'll start with re-parsing on every request, and see if caching is warranted later.

Two more questions:

  1. do you agree with changes in 06cc982? I've struggled with this behavior quite a bit, because no other language server does this.
  2. would you be interested in a separate PR that requires a space/end-of-line after "#region", provided I don't use regexes for that? Current behavior seems a bit too unspecific to me.

@taminomara taminomara force-pushed the doc-hl branch 2 times, most recently from 5cb40f6 to 13a4679 Compare July 29, 2025 17:06
@taminomara taminomara marked this pull request as ready for review July 29, 2025 17:06
@taminomara
Copy link
Contributor Author

taminomara commented Jul 29, 2025

Reworked this PR to avoid changes to AST.

Caching wasn't required, it's fast enough to re-parse docs on every relevant request.

@CppCXY
Copy link
Member

CppCXY commented Jul 30, 2025

When the code block area uses Lua syntax, should we use luaLexer to parse it and highlight keywords, strings, and numbers?

@taminomara
Copy link
Contributor Author

When the code block area uses Lua syntax, should we use luaLexer to parse it and highlight keywords, strings, and numbers?

Done:

image

@taminomara taminomara marked this pull request as draft July 31, 2025 02:03
@CppCXY
Copy link
Member

CppCXY commented Jul 31, 2025

Your changes to error have caused the error location to change.
image

local p

--- @return fun(): (integer?, any?, ...)
local function f(tasks)
end

@CppCXY
Copy link
Member

CppCXY commented Jul 31, 2025

I got error:


thread 'tokio-runtime-worker' panicked at crates\emmylua_parser\src\text\text_range.rs:18:9:
assertion failed: start_offset <= end_offset
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

thread 'tokio-runtime-worker' panicked at crates\emmylua_parser\src\text\text_range.rs:18:9:
assertion failed: start_offset <= end_offset

---```lua
---
--- local t = 213
---```
---
--- .. code-block:: lua
---
---    local t = 123
---    yes = 1123
local t = 123

@taminomara taminomara marked this pull request as ready for review July 31, 2025 16:39
@taminomara
Copy link
Contributor Author

Fixed bugs, added more tests, implemented autocomplete for ref targets, improved go to definition to handle modules.

@CppCXY
Copy link
Member

CppCXY commented Aug 1, 2025

image Why does the highlighting look like this?

@taminomara
Copy link
Contributor Author

based on conversations I've had with people, this will be polarising to a lot of people

Just for context, is that because highlighting in comments is distracting for some people, or are there other concerns?

Does it only parse comments with --- or all comments? If not I suggest this is limited to only --- comments.

I've disabled highlighting for any comment that doesn't start with three dashes. It should also detect comment blocks adorned with long dashed lines:

---------------   <- not highlighted
--- Comment       <- highlighted
---------------   <- highlighted
--- Comment       <- highlighted
---------------   <- not highlighted

---------------   <- not highlighted
-- Comment        <- not highlighted
---------------   <- not highlighted
-- Comment 2      <- not highlighted
---------------   <- not highlighted

Can we configure whether to use markdown or rst?

Yes. Also, std will use markdown even if user configured rst.

@lewis6991
Copy link
Collaborator

Just for context, is that because highlighting in comments is distracting for some people, or are there other concerns?

Yes, and I've also had someone say they want treesitter (which has more highlights and is faster) to handle this kind of highlighting, not the server.

@taminomara
Copy link
Contributor Author

Yes, and I've also had someone say they want treesitter (which has more highlights and is faster) to handle this kind of highlighting, not the server.

Then there should be a way to disable highlighting even if emmyrc specifies otherwise. I think I'll do it as a separate PR though.

@taminomara
Copy link
Contributor Author

I ran parsing and completions on my codebase, fixed some minor things. I believe I've done all features I wanted for this PR, it's ready for merge.

In next PR I'll add option to force-disable markdown parsing even if .emmyrc enables it; it will be coupled with updates to VSCode plugin to expose this new option to users.

@clason
Copy link

clason commented Aug 2, 2025

Please don't invent new off-spec options that only VS Code is set up to handle it. It's enough to make this opt-in via .emmrc.json (or the LSP configuration requests), anything beyond that makes it much more (and too) difficult to reason about.

If you want to do anything, split the doc.syntax option into two: one for enabling/disabling it, and one that controls the type of syntax. This way, the latter can be set per-configuration in a local .emmyrc.json while enabling this extension can be done in the client configuration.

I would also (strongly) recommend to name and structure the options in a way to make clear that they relate to semantic tokens. You probably also want to document the (off-spec?) tokens you send for documentation.

@taminomara
Copy link
Contributor Author

If you want to do anything, split the doc.syntax option into two

Do you think it's needed? I like this idea, but I don't have a strong opinion here.

I would also (strongly) recommend to name and structure the options in a way to make clear that they relate to semantic tokens.

It's not just semantic tokens, though, but also hovers, completion, text ranges, and go to definition. Plus, option names must reflect that they affect highlighting markup within descriptions, and don't affect highlighting @tags, types, and so on.

So, this is what I propose then (ideas for better naming are welcome):

  • doc.syntax for overall syntax configuration,
  • semantic_tokens.enableForDocumentationMarkup to enable/disable semantic tokens,
  • not sure if options for other sub-systems are necessary, but we can have hover.enableForDocumentationMarkup and others.

As for defaults, I want to avoid scenario where a user sets doc.syntax and doesn't see anything because semantic_tokens.enableForDocumentationMarkup is disabled. So we can either have doc.syntax = "none" and semantic_tokens.enableForDocumentationMarkup = true, or doc.syntax = "md" and semantic_tokens.enableForDocumentationMarkup = false. IMO, the latter option seems more reasonable.

Please don't invent new off-spec options that only VS Code is set up to handle it.

With settings being more granular this is indeed not necessary. Bear in mind, though, that there's already lots of off-spec things happening in VSCode extension, mostly because (1) VSCode doesn't support overlapping semantic tokens, and (2) VSCode doesn't allow customizing LSP configuration call the same way neovim does. So some changes will have to be made regardless.

You probably also want to document the (off-spec?) tokens you send for documentation.

Actually, no off-spec tokens so far. Maybe I'll add some for italic/bold text, but that will be later).

@clason
Copy link

clason commented Aug 2, 2025

I don't think the "active" LSP features need to be opt-in, as they need to be actively triggered on a documentation location anyway. (It would be different if references also included documentation locations -- which I feel they shouldn't either way.)

I still -- strongly -- recommend the option split, since the syntax format is a project choice while the highlighting is a user choice. This needs to be respected. If this requires making doc semantic tokens opt-out, so be it.

And I understand that the extension needs special support for this; that is fine. I amjust concerned with the server configuration not being driven by that -- everything should be (also) configurable through standard interfaces (config file and ConfigurationChanged notifications).

taminomara added a commit to taminomara/emmylua-analyzer-rust that referenced this pull request Aug 3, 2025
Right now, our LSP only requests settings when client is VSCode or Neovim. To make matters worse, it requests *different* settings for these clients:

- for VSCode, it requests VSCode-specific about files and extensions,
- for Neovim, it requests settings in scope `"Lua"`, and uses them as overrides for `.emmyrc`.

This PR aims to unify this behavior and make it easier for users to override settings without having to create `.emmyrc.json`. This is especially handy when you want to override a setting for a single project, but don't want to change project's `.emmyrc.json`. For example, switching documentation highlighting on per-project basis will become easier (see EmmyLuaLs#665).

# Changes

- When client is VSCode, LSP will request settings under `"emmylua.rc"` scope, as well as VSCode-specific settings mentioned above.

  I'll add `"emmylua.rc"` category to the plugin settings page in a separate PR when this is merged; until then, these changes will not affect user experience.

- When client is Neovim, LSP will request settings under `"emmylua.rc"` scope. If there's no settings under this scope, it will fall back to using the `"Lua"` scope.

  Currently, both LuaLs and EmmyLua use the same scope `"Lua"`. This is good for backwards compatibility and for users switching from one to another, but risks introducing forward incompatibilities in the long run.

  The new option will allow users to configure EmmyLua without affecting LuaLs.

  I'll send a PR to update nvim-lspconfig's documentation for EmmyLua when this is merged.

- For any other client, LSP will request settings under `"emmylua.rc"` scope.

  Currently, there are no clients that provide this scope (EmmyLua2 for Intellij doesn't provide any config scopes), so it shouldn't break anything, but it will provide future client implementations with a better way to override LSP configuration.
taminomara added a commit to taminomara/emmylua-analyzer-rust that referenced this pull request Aug 3, 2025
Right now, our LSP only requests settings when client is VSCode or Neovim. To make matters worse, it requests *different* settings for these clients:

- for VSCode, it requests VSCode-specific about files and extensions,
- for Neovim, it requests settings in scope `"Lua"`, and uses them as overrides for `.emmyrc`.

This PR aims to unify this behavior and make it easier for users to override settings without having to create `.emmyrc.json`. This is especially handy when you want to override a setting for a single project, but don't want to change project's `.emmyrc.json`. For example, switching documentation highlighting on per-project basis will become easier (see discussion in EmmyLuaLs#665).

# Changes

- When client is VSCode, LSP will request settings under `"emmylua"` scope, as well as VSCode-specific settings mentioned above.

  I'll add relevant settings to the plugin settings page in a separate PR when this is merged; until then, these changes will not affect user experience.

- When client is Neovim, LSP will request settings under `"emmylua"` scope. If there's no such scope, it will fall back to using the `"Lua"` scope.

  Currently, both LuaLs and EmmyLua use the same scope `"Lua"`. This is good for backwards compatibility and for users switching from one to another, but risks introducing forward incompatibilities in the long run.

  The new option will allow users to configure EmmyLua without affecting LuaLs.

  I'll send a PR to update nvim-lspconfig's documentation for EmmyLua when this is merged.

- For any other client, LSP will request settings under `"emmylua"` scope.

  Currently, there are no clients that provide this scope (EmmyLua2 for Intellij doesn't provide any config scopes), so it shouldn't break anything, but it will provide future client implementations with a better way to override LSP configuration.
@taminomara
Copy link
Contributor Author

taminomara commented Aug 3, 2025

Added semanticTokens.renderDocumentationMarkup, default is false.

The reason for this name is for consistency with possible future options.

For example, currently we don't provide documentation modifier for semantic tokens related to documentation comments. In future, we might implement this, but we'll need to make it opt-out because at least in VSCode, most themes render documentation comments in horrendously bright colors. Option name renderDocumentationAsComments will sound consistent with renderDocumentationMarkup.

And I understand that the extension needs special support for this; that is fine. I amjust concerned with the server configuration not being driven by that -- everything should be (also) configurable through standard interfaces (config file and ConfigurationChanged notifications).

I've created #680 to address this issue. When it's merged, VSCode and NeoVim (and other editors) will be able to configure things using the same mechanism.

taminomara added a commit to taminomara/emmylua-analyzer-rust that referenced this pull request Aug 3, 2025
Right now, our LSP only requests settings when client is VSCode or NeoVim. To make matters worse, it requests *different* settings for these clients:

- for VSCode, it requests VSCode-specific about files and extensions,
- for NeoVim, it requests settings in scope `"Lua"`, and uses them as overrides for `.emmyrc`.

This PR aims to unify this behavior and make it easier for users to override settings without having to create `.emmyrc.json`. This is especially handy when you want to override a setting for a single project, but don't want to change project's `.emmyrc.json`. For example, switching documentation highlighting on per-project basis will become easier (see discussion in EmmyLuaLs#665).

# Changes

- When client is VSCode, LSP will request settings under `"emmylua"` scope, as well as VSCode-specific settings mentioned above.

  I'll add relevant settings to the plugin settings page in a separate PR when this is merged; until then, these changes will not affect user experience.

- When client is NeoVim, LSP will request settings under `"emmylua"` scope. If there's no such scope, it will fall back to using the `"Lua"` scope.

  Currently, both LuaLs and EmmyLua use the same scope `"Lua"`. It works in NeoVim because options for every LSP server are independent. In VSCode, however, they all share the same namespace, meaning that we can't reuse `"Lua"`. I'd like to keep things consistent, hence this rename.

  I'll send a PR to update nvim-lspconfig's documentation for EmmyLua when this is merged.

- For any other client, LSP will request settings under `"emmylua"` scope.

  Currently, there are no clients that provide this scope (EmmyLua2 for Intellij doesn't provide any config scopes), so it shouldn't break anything, but it will provide future client implementations with a better way to override LSP configuration.

# Tests

Tested this with VSCode and NeoVim.
@CppCXY
Copy link
Member

CppCXY commented Aug 4, 2025

please donot change my local keyword color
now:
image

before:
image

@CppCXY
Copy link
Member

CppCXY commented Aug 6, 2025

please resolve conflicts

@CppCXY
Copy link
Member

CppCXY commented Aug 7, 2025

If everything works well, I will merge it before the release tomorrow, but please resolve the conflicts first.

Simplify reader's implementation, rename methods to be more consistent,
add 1-character lookbehind support, expose unconsumed (tail) range and text.
Fix lifetimes, fix `current_token_range` not being reset.
This will allow parsing code snippets in comments line-by-line.
See more info in PR or merge commit.
@taminomara
Copy link
Contributor Author

Rebased.

Since you want to release tomorrow, I've also added support for bold/italic highlighting via emmy annotator, see also EmmyLua/VSCode-EmmyLua#248.

@CppCXY CppCXY merged commit 66c5874 into EmmyLuaLs:main Aug 8, 2025
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants