Skip to content

Fix invalid JSXExpressions having identifier-ish things in their trivia, improve error messages for comma expressions in JSX #31480

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 26, 2019

Conversation

andrewbranch
Copy link
Member

I'm not totally sure why that debug assertion was there and specific to identifiers, but removing it only seems to require that the synthetically added identifier get its escapedText set.

Fixes #25487

@andrewbranch andrewbranch requested a review from sheetalkamat May 20, 2019 19:35
@weswigham
Copy link
Member

weswigham commented May 20, 2019

While that may be what's wrong with the node (and what causes a crash when the assertion is removed), based on the PR the assertion was added in ( #21581 ) there absolutely should not be an identifier within a trivia span as far as we know - meaning the node end position is incorrect for the preceding node (since it includes this identifier-scannable span). AFAIK we should only have a stream of tokens (eg, parens or keywords) when scanning over this range.

@andrewbranch
Copy link
Member Author

andrewbranch commented May 20, 2019

I mean, how else would you parse this?

<div>{<div />x}</div>

The parser hits the x when doing a parseExpected(SyntaxKind.CloseBraceToken) and then moves on. The end of the node is set to after x, so when the language service adds synthetic nodes, it scans x as an identifier. I can't see any reason to treat this differently than, say

<div>{<div />""}</div>

which parses the same, but since it's a token, doesn't hit the debug assertion.

This can cause a problem if we scan an Identifier because we don't give it any text here

So I gave it text here ¯\(ツ)

@weswigham
Copy link
Member

weswigham commented May 20, 2019

The parser hits the x when doing a parseExpected(SyntaxKind.CloseBraceToken) and then moves on.

parseExpected only advances if a match was found - the issue is the scanJsxText inside parseJsxExpression that's forcing the cursor forward without assigning the intervening text to a real node (because we do actually parse JSX text out). There's no discussion of it in #4228 (which introduced it) so it's a bit of an exercise in software archaeology.

(Also it looks like we usually more gracefully handle similar parse errors by synthesizing a binary expression with a comma operator - see parseJsxElementOrSelfClosingElementOrFragment)

@andrewbranch
Copy link
Member Author

synthesizing a binary expression with a comma operator

A-ha, this is exactly what I was missing. I can do that.

So as a general rule, what kinds of syntax is the parser allowed to skip over? To me, it feels just as odd for a string literal token to end up in another syntax's trivia as it does for an identifier. But the debug assertion only cares about identifiers. I'm trying to figure out how I could have known, without feedback, that this solution is better than “fixing” synthetic identifiers, which also does seem to solve the problem. Is it that the parser should try as hard as possible to create a node for everything it sees, even when invalid?

@weswigham
Copy link
Member

So as a general rule, what kinds of syntax is the parser allowed to skip over?

Keywords/punctuation, whitespace and comments. (There is still some punctuation we capture for transformer reasons.) Things we don't make real AST nodes for when parsing well-formed code.

But the debug assertion only cares about identifier

I think the assertion is just an early error to warn about the crash on access - finding a string literal or almost anything that would usually have a real AST node in the trivia is an error. The category of "trivia" is primarily just comments and whitespace and is expanded to include unparsed keywords and punctuation in this context.

Is it that the parser should try as hard as possible to create a node for everything it sees, even when invalid?

Yeah, usually I think we just stuff invalid junk we want to consume into a "missing node" which is marked appropriately for incremental reparsing.

I'm trying to figure out how I could have known, without feedback, that this solution is better than “fixing” synthetic identifiers, which also does seem to solve the problem.

Deep software archaeology. When editing code I didn't write, I'm in the habit of looking up the blames of relevant code (in this case, both the PR adding the assertion and the PR adding the scanning in jsx expression parsing) I want to change to find the justification for it being as it is and seeing if the current state is well-reasoned or apparently ad-hoc. In the case of the later, finding and fixing the root cause of the prior ad-hoc change is often able to create higher quality code than stacking more ad-hoc fixes (which is what I'm suggesting here).

@andrewbranch
Copy link
Member Author

🕵 related: #31510

Copy link
Member Author

@andrewbranch andrewbranch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RyanCavanaugh looks like all these horrifying baselines that changed were generated by you. Would you mind taking a look?

@@ -16,5 +16,4 @@ const elem = <div className={class1, class2}/>;
// This should be a parse error
var class1 = "foo";
var class2 = "bar";
var elem = <div className={class1} class2/>;
/>;;
var elem = <div className={class1, class2}/>;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This emit is invalid although slightly less so than the previous? We could potentially put these inside a ParenthesizedExpression, but we're already emitting a grammar error, and it seems like there's no requirement for the emit to be totally valid if the input wasn't totally valid.

declare namespace JSX { interface Element { } }

function foo() {
var x = <div> { </div>
~~~
!!! error TS17008: JSX element 'div' has no corresponding closing tag.
~~
!!! error TS1109: Expression expected.
}
// Shouldn't see any errors down here
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we don't. I think this gets significantly better.

@@ -123,7 +123,7 @@ var x = <div>one</div>, <div>two</div>;
var x = <div>one</div> /* intervening comment */, /* intervening comment */ <div>two</div>;
;
//// [20.jsx]
<a>{"str"}}</a>;
<a>{"str"};}</a>;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is weird, but barely weirder than the previous

@@ -28,18 +28,21 @@ var foo = /** @class */ (function () {
return foo;
}());
var x;
x = <any> {test} <any></any> };
x = <any> {test}: <any></any> };
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TypeScript generating this was x = <any> { test: <any></any> };. Garbage in, garbage out?

~~~~~~
!!! error TS2695: Left side of comma operator is unused and has no side effects.
~~~~~~~~~~~~~~
!!! error TS18007: JSX expressions may not use the comma operator. Did you mean to write an array?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two errors make a lot more sense to me than the previous three, although I could do without the first. I considered making the error range start on the comma operator instead of at the beginning of the expression. Thoughts?

// Only an AssignmentExpression is valid here per the JSX spec,
// but we can unambiguously parse a comma sequence and provide
// a better error message in grammar checking.
node.expression = parseExpression();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we successfully parse something we know is illegal, is that always a grammar error, or is there ever a reason why that should be a parse error? I know a parse error will prevent the node from being reused incrementally—in this case the tree should be well-formed, so I think reusing it isn’t a problem.

@andrewbranch andrewbranch changed the title Allow language service to parse identifiers in weird places instead of crashing Fix invalid JSXExpressions having identifier-ish things in their trivia, improve error messages for comma expressions in JSX May 21, 2019
@andrewbranch
Copy link
Member Author

Also it looks like we usually more gracefully handle similar parse errors by synthesizing a binary expression with a comma operator - see parseJsxElementOrSelfClosingElementOrFragment

Thinking this was going to be the way forward, I improved error messages for comma expressions as JSX expressions.

However, I don't think it's quite feasible to parse extra stuff in a JSXExpression as the right-hand side of a binary expression, because we have no idea how much extra stuff there might be, what kind of stuff it is, or if the closing } will indeed ever be found. The trick we do for <div /><div /> only works because it’s a special case where a JSX element follows a JSX element—that kind of logic wouldn't work very well if we needed to couple a JSX element and N other subsequent syntaxes.

By conditionally scanning for more JSX text based on whether the expected } was found, we still end up with a pretty gross parse tree, but in <div>{<div />x}</div> the x} becomes JSXText instead of trivia, so the crash is avoided and every token is accounted for.

@andrewbranch andrewbranch requested a review from weswigham June 13, 2019 21:58
@andrewbranch
Copy link
Member Author

@weswigham did you ever have a chance to look at my second iteration of this? I think it’s much more aligned with your initial recommendation.

@andrewbranch andrewbranch merged commit 871bdee into microsoft:master Jun 26, 2019
@andrewbranch andrewbranch deleted the bug/25487 branch June 26, 2019 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Debug Failure. Did not expect JsxExpression to have an Identifier in its trivia
4 participants