Parser rewriting #4313


Draft
wants to merge 3 commits into base: main
Conversation

Nikita-str
Contributor

This Pull Request (if it gets finished) closes #4301, #4089 & #1402 and lets us forget about stack overflow in the parser.

The main idea

The main idea is the following:

  • every parse node returns a ControlFlow with the variants Done or SubParse{parse_node, continuation_point}
  • a main loop handles the ControlFlows
  • instead of calling a sub-parse-node directly when necessary, a parse node returns the corresponding ControlFlow

So parsing will look like this:

main loop -> parse node A
main loop <- parse node A (cmd: call sub parse node B)
main loop -> parse node B
<... maybe here parse node B calls some sub nodes through main loop ...>
main loop <- parse node B (cmd: Done)
main loop -> parse node A
<... maybe here parse node A calls again sub node B or call another sub parse node C through main loop ...>
main loop <- parse node A (cmd: Done)
main loop -> return last result (parsed `StatementList` for `ScriptBody`)
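For readers unfamiliar with the pattern, the trace above is a classic trampoline: suspended frames live on an explicit heap stack together with a continuation point, instead of on the machine stack. A minimal self-contained sketch of the idea (all names here are hypothetical illustrations, not the PR's actual types):

```rust
// Trampoline sketch: a "node" never calls its sub-node directly; it returns
// a command, and the main loop drives an explicit stack of
// (node, continuation_point) frames. Factorial stands in for parsing.
enum Flow {
    Done(i64),
    SubParse { node: Node, resume_at: usize },
}

enum Node {
    Fact(i64),
}

fn step(node: &Node, resume_at: usize, last_result: i64) -> Flow {
    match node {
        Node::Fact(n) => {
            if *n <= 1 {
                Flow::Done(1)
            } else if resume_at == 0 {
                // First entry: ask the loop to run the sub-node Fact(n - 1).
                Flow::SubParse { node: Node::Fact(n - 1), resume_at: 1 }
            } else {
                // Continuation point 1: the sub-result is now available.
                Flow::Done(n * last_result)
            }
        }
    }
}

fn run(start: Node) -> i64 {
    // Explicit stack of suspended frames: (node, continuation point).
    let mut stack = vec![(start, 0usize)];
    let mut last_result = 0i64;
    while let Some((node, resume_at)) = stack.pop() {
        match step(&node, resume_at, last_result) {
            Flow::Done(v) => last_result = v,
            Flow::SubParse { node: sub, resume_at } => {
                stack.push((node, resume_at)); // suspend the current frame
                stack.push((sub, 0));          // run the sub-node first
            }
        }
    }
    last_result
}

fn main() {
    assert_eq!(run(Node::Fact(10)), 3_628_800);
    println!("{}", run(Node::Fact(10)));
}
```

The frame stack is a heap `Vec`, so nesting depth is bounded by available memory rather than by the machine stack, which is exactly the property the PR is after.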

Some more details & explanation

A simple rewrite case (actually this case could be simplified further by adding a Pass command to ControlFlow). This one:

fn parse(self, cursor: &mut Cursor<R>, interner: &mut Interner) -> ParseResult<Self::Output> {
    let tok = cursor.peek(0, interner).or_abrupt()?;
    match tok.kind() {
        TokenKind::Keyword((Keyword::Function | Keyword::Async | Keyword::Class, _)) => {
            HoistableDeclaration::new(self.allow_yield, self.allow_await, false)
                .parse(cursor, interner)
        }
        TokenKind::Keyword((Keyword::Const | Keyword::Let, _)) => {
            LexicalDeclaration::new(true, self.allow_yield, self.allow_await, false)
                .parse(cursor, interner)
                .map(Into::into)
        }
        _ => Err(Error::expected(
            [
                Keyword::Function.to_string(),
                Keyword::Async.to_string(),
                Keyword::Class.to_string(),
                Keyword::Const.to_string(),
                Keyword::Let.to_string(),
            ],
            tok.to_string(interner),
            tok.span(),
            "export declaration",
        )),
    }
}

Transforms to this:
fn parse_loop(&mut self, state: &mut ParseState<'_, R>, continue_point: usize) -> ParseResult<ControlFlow<R>> {
    if continue_point == 1 {
        parse_cmd![[POP LOCAL]: state => Empty];
        parse_cmd![[PEEK NODE]: state match Declaration]; // pass value up
        return Ok(ControlFlow::Done)
    } else if continue_point > 1 {
        return state.continue_point_error(continue_point)
    }
    let tok = state.cursor.peek(0, state.interner).or_abrupt()?;
    match tok.kind() {
        TokenKind::Keyword((Keyword::Function | Keyword::Async | Keyword::Class, _)) => {
            let node = HoistableDeclaration::new(self.allow_yield, self.allow_await, false);
            parse_cmd![[SUB PARSE]: node; state <= Empty (1)];
        }
        TokenKind::Keyword((Keyword::Const | Keyword::Let, _)) => {
            let node = LexicalDeclaration::new(true, self.allow_yield, self.allow_await, false);
            parse_cmd![[SUB PARSE]: node; state <= Empty (1)];
        }
        _ => return Err(Error::expected(
            [
                Keyword::Function.to_string(),
                Keyword::Async.to_string(),
                Keyword::Class.to_string(),
                Keyword::Const.to_string(),
                Keyword::Let.to_string(),
            ],
            tok.to_string(state.interner),
            tok.span(),
            "export declaration",
        )),
    }
}

On the first call of a parse node, continue_point is equal to 0, so we skip the first if and do everything as before, except that instead of going deeper we return from the parse call (via parse_cmd![[SUB PARSE]: node], which updates the state and returns a valid ControlFlow) back into the ParseLoop, and the ParseLoop then calls the sub-parsing node.


I'm not sure -- should I name the continue_points in the function, or leave them as magic numbers? (0, 1, 2, and so on; mostly only 0 and 1 are used, but each time they have a different meaning depending on which sub calls were made.)


Some small pieces of code are repeated often but cannot be moved into a function because they involve different enum variants, so I moved them into the macro_rules! macro parse_cmd!. I'm not sure whether such syntax is acceptable, so that's worth discussing too.
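The kind of repetition that pushes toward a macro here can be seen in a tiny sketch (my illustration, not the PR's actual parse_cmd! definition): the enum variant pushed onto the state differs at each call site, which a plain function cannot parameterize over.

```rust
// Hypothetical sketch of why a macro_rules! helper is handy: the Marker
// variant saved before suspending differs per call site, so a plain helper
// function would need one wrapper per variant. All names are illustrative.
#[derive(Debug, PartialEq)]
enum Marker { Empty, InBlock }

enum Cmd { Done, SubParse { resume_at: usize } }

struct State { locals: Vec<Marker> }

macro_rules! push_and_suspend {
    ($state:expr, $variant:ident, $resume:expr) => {{
        $state.locals.push(Marker::$variant);
        Cmd::SubParse { resume_at: $resume }
    }};
}

fn node_a(state: &mut State, continue_point: usize) -> Cmd {
    if continue_point == 1 {
        state.locals.pop(); // restore the local saved before the sub-parse
        return Cmd::Done;
    }
    push_and_suspend!(state, Empty, 1)
}

fn node_b(state: &mut State) -> Cmd {
    // Same boilerplate, different variant: only a macro covers both sites.
    push_and_suspend!(state, InBlock, 2)
}

fn main() {
    let mut st = State { locals: vec![] };
    match node_a(&mut st, 0) {
        Cmd::SubParse { resume_at } => assert_eq!(resume_at, 1),
        Cmd::Done => unreachable!(),
    }
    let _ = node_b(&mut st);
    assert_eq!(st.locals, vec![Marker::Empty, Marker::InBlock]);
    println!("ok");
}
```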


Right now the rewritten functions look like this:

if continue_point == 1 {
    <.. continue_point_1 case ..>
} else if continue_point == 2 {
    <.. continue_point_2 case ..>
} else if ... { <...> }

<.. continue_point_0 case (execution path on first call of a parse node) ..>

Maybe it would be easier to understand written as follows, I'm not sure:

if continue_point == 0 {
    ..
} else if continue_point == 1 {
    ..
} else if ... { .. }

⚠️ This approach will make it harder to find sub-parse-node calls, because you cannot search for where parse is called (it will always be called from the ParseLoop); to find a call you have to look at where the node is created.

Why is it a draft PR?

⚠️ This will solve the stack overflow problem only when all parse nodes are rewritten, because if any node calls a sub-parse-node outside of the ParseLoop, that node must be parsed the old way. So far only 3 parse nodes (out of ~100) have been rewritten, to check whether there would be any problems with this approach (it seems there will not be).

Once all nodes are rewritten with parse_loop, the parse function & trait impl will be removed.


I'm expecting some discussion around this approach, and I would prefer to wait for approval before proceeding with rewriting the rest of the nodes, since this looks like a task for several days.

@jasonwilliams
Member

jasonwilliams commented Jun 29, 2025

Thank you @Nikita-str for putting the time and effort into writing this up. I support moving to a trampoline style parsing with a loop and state machine. The recursive descent parsing doesn’t suit rust so much (no tail call optimisation) and causes multiple issues.

I understand that statically it’s not as easy to find sub parse nodes, but I hope we can improve debugging by having a debug_log which prints which parser is currently running when a flag is passed (we don’t have this today).

Do you know if this can be worked on incrementally? Or do you plan to just keep updating this? Maybe maintainers can help if they have access; we could have a check list at the top and keep rebasing as we go along (the parser doesn’t get touched as much these days so I don’t think it will conflict that often).

im interested to hear from the other maintainers.

@jasonwilliams jasonwilliams requested a review from HalidOdat June 29, 2025 08:44
@Nikita-str
Contributor Author

this can be worked on incrementally?

I think incremental is not an option: if any node A calls a sub-parse-node B outside of the ParseLoop, the Loop cannot handle any subsequent calls from node B (otherwise the Loop would resume handling at the wrong position), and therefore B has to be parsed the old way (this is why both parse and parse_loop are currently preserved for the rewritten nodes). So the change has to be done for all nodes to make sense.


maintainers can help if they have access

That would be good, actually. We could do it in parallel. It's a very easily separable task, and with a few extra hands it could be done in 1-2 days.

@jedel1043
Member

I feel like this may be over engineering something that should have a very simple solution.

Seeing the original issue, the parser works fine on release mode, so there must be some set of optimizations that the compiler is doing to reduce the amount of recursion. My suggestion would be that we should try to look at the assembly of a release build, then carefully try to force the compiler to apply the same optimizations in debug mode.

@Nikita-str
Contributor Author

My suggestion would be that we should try to look at the assembly of a release build, then carefully try to force the compiler to apply the same optimizations in debug mode.

@jedel1043
The problem here is that in debug mode (with opt-level = 0) the stack structures take too much memory and use up the stack too quickly. Also, with opt-level = 0 there is no automatic function splitting (to reduce stack allocation) and no automatic inlining through intermediate functions (if A calls B and B calls C, then with opt-level = 1 or higher this can be transformed into A effectively calling C directly).

To solve it temporarily, you actually need to split the functions manually (in addition to shrinking some structures -- which by itself seems fine to me; reducing structure sizes alone is not enough).

For more details see this PR: the first part of the commits seems ok to me, but the second part is ugly (because you need to split functions to solve the problem with opt-level = 0).


But maybe in the current PR something like that splitting is needed anyway (since you need to continue execution from a continue_point)... at least it would not be a temporary solution, but a long-term one.

@jedel1043
Member

jedel1043 commented Jun 29, 2025

The problem here is that in debug mode (with opt-level = 0) the stack structures take too much memory and use up the stack too quickly. Also, with opt-level = 0 there is no automatic function splitting (to reduce stack allocation) and no automatic inlining through intermediate functions (if A calls B and B calls C, then with opt-level = 1 or higher this can be transformed into A effectively calling C directly).

If the problem is the size of ast nodes, I think we should just stop having ast nodes in the stack, and use something like oxc_allocator to have an arena of all ast nodes. This way we don't need to deal with trampolining functions, and it also gives the nice bonus of making the drop of ast trees really cheap.

@jasonwilliams
Member

jasonwilliams commented Jun 29, 2025

The problem here is that in debug mode (with opt-level = 0) the stack structures take too much memory and use up the stack too quickly. Also, with opt-level = 0 there is no automatic function splitting (to reduce stack allocation) and no automatic inlining through intermediate functions (if A calls B and B calls C, then with opt-level = 1 or higher this can be transformed into A effectively calling C directly).

If the problem is the size of ast nodes, I think we should just stop having ast nodes in the stack, and use something like oxc_allocator to have an arena of all ast nodes. This way we don't need to deal with trampolining functions, and it also gives the nice bonus of making the drop of ast trees really cheap.

I don’t think having heap allocated nodes would help with performance of the parser as we need to visit them a lot so it would create a new problem with back and forth trips to the heap (this would need to be benchmarked).

The idea of specialized allocation for nodes on the AST also seems odd if we drop the whole tree anyway once we’ve moved onto the VM stage, unless I’m misunderstanding something.

Seeing the original issue, the parser works fine on release mode, so there must be some set of optimizations that the compiler is doing to reduce the amount of recursion

I’m guessing the compiler is aggressively inlining during release mode so we don’t end up with deep stacks. So if we wanted to recreate this in debug mode we would need to add inline(always) in a lot of places across the parser. I can’t say I’m sure if that’s a proper fix for this problem.

I’d be open to seeing how a loop & state machine fares in performance against what we have today.

@jedel1043
Member

I don’t think having heap allocated nodes would help with performance of the parser as we need to visit them a lot so it would create a new problem with back and forth trips to the heap (this would need to be benchmarked).

It's not for performance though. The problem that we have right now is that our AST nodes use too much space in the stack, which gives little room for function calls. Moving every AST node to the heap makes it such that more recursive calls can be made. We would obviously still need to rewrite the parser to be smarter about recursive operators ([[[[[]]]]] and such, maybe it can be done with a pratt parser), but it should be a smaller change than having to convert everything to trampolines.
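As a rough illustration of the Pratt-parser direction mentioned above (a generic sketch, not Boa's actual code): binding powers turn chained operators into a loop, so only genuine nesting, not chain length, consumes recursion depth.

```rust
// Minimal Pratt-parser sketch over + and *: `1 + 2 + 3 + ...` is handled
// by the loop in `expr`, so an arbitrarily long chain costs no extra stack
// frames; only nested sub-expressions recurse. All names are illustrative.
#[derive(Debug, Clone, Copy)]
enum Tok { Num(i64), Plus, Star, Eof }

struct Parser { toks: Vec<Tok>, pos: usize }

impl Parser {
    fn peek(&self) -> Tok { self.toks[self.pos] }
    fn bump(&mut self) -> Tok { let t = self.peek(); self.pos += 1; t }

    // Parses (and evaluates) an expression whose operators bind at least
    // as tightly as `min_bp` (the "binding power").
    fn expr(&mut self, min_bp: u8) -> i64 {
        let mut lhs = match self.bump() {
            Tok::Num(n) => n,
            t => panic!("expected number, got {t:?}"),
        };
        loop {
            let bp = match self.peek() {
                Tok::Plus => 1,
                Tok::Star => 2,
                _ => break,
            };
            if bp < min_bp { break; }
            let op = self.bump();
            let rhs = self.expr(bp + 1); // +1 makes operators left-associative
            lhs = match op {
                Tok::Plus => lhs + rhs,
                _ => lhs * rhs,
            };
        }
        lhs
    }
}

fn main() {
    // 1 + 2 * 3 + 4: the chain is consumed by the loop, not by recursion.
    let mut p = Parser {
        toks: vec![Tok::Num(1), Tok::Plus, Tok::Num(2), Tok::Star,
                   Tok::Num(3), Tok::Plus, Tok::Num(4), Tok::Eof],
        pos: 0,
    };
    let v = p.expr(0);
    assert_eq!(v, 11);
    println!("{v}");
}
```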

@jasonwilliams
Member

jasonwilliams commented Jun 29, 2025

I don’t think having heap allocated nodes would help with performance of the parser as we need to visit them a lot so it would create a new problem with back and forth trips to the heap (this would need to be benchmarked).

It's not for performance though. The problem that we have right now is that our AST nodes use too much space in the stack, which gives little room for function calls. Moving every AST node to the heap makes it such that more recursive calls can be made. We would obviously still need to rewrite the parser to be smarter about recursive operators ([[[[[]]]]] and such, maybe it can be done with a pratt parser), but it should be a smaller change than having to convert everything to trampolines.

Oh yeah I know the original issue isn’t about performance. I was saying that to move nodes to the heap could end up harming parser performance despite fixing this specific issue. The trade off wouldn’t be worth it.

instead, as you say we would be best finding ways to make the parser smarter around recursive operations. Or have a parser that wasn’t subject to these problems. The answer may be a bit of both.

@jedel1043
Member

jedel1043 commented Jun 29, 2025

Oh, yeah I know the original issue isn’t about performance. I was saying that to move nodes to the heap could end up harming parser performance despite fixing this specific issue. The trade off wouldn’t be worth it.

It could also improve performance! My thought is that right now any inner expression needs to be allocated on the heap, which distributes expressions throughout the heap, causing a lot of cache misses when you try to access related expressions. If we use bumpalo, related expressions will live near each other, making multiple accesses very cheap if the expressions fit in the same cache line.
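The locality argument can be illustrated with a hand-rolled index arena (a std-only stand-in for what bumpalo does with true references; none of these names are an actual Boa or oxc API): all nodes sit contiguously in one allocation, so related expressions tend to share cache lines, and dropping the whole tree frees a single buffer.

```rust
// Index-based arena sketch: children are indices into one contiguous Vec
// rather than Box pointers scattered across the heap.
#[derive(Debug)]
enum Expr {
    Num(i64),
    Add(ExprId, ExprId), // children are indices into the arena
}
type ExprId = usize;

struct Arena { nodes: Vec<Expr> }

impl Arena {
    fn alloc(&mut self, e: Expr) -> ExprId {
        self.nodes.push(e);
        self.nodes.len() - 1
    }
    fn eval(&self, id: ExprId) -> i64 {
        match &self.nodes[id] {
            Expr::Num(n) => *n,
            Expr::Add(l, r) => self.eval(*l) + self.eval(*r),
        }
    }
}

fn main() {
    let mut arena = Arena { nodes: Vec::new() };
    // (1 + 2) + 3, with every node in one contiguous allocation.
    let one = arena.alloc(Expr::Num(1));
    let two = arena.alloc(Expr::Num(2));
    let sum = arena.alloc(Expr::Add(one, two));
    let three = arena.alloc(Expr::Num(3));
    let root = arena.alloc(Expr::Add(sum, three));
    assert_eq!(arena.eval(root), 6);
    println!("{}", arena.eval(root));
} // dropping `arena` frees the whole tree in one deallocation
```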

Successfully merging this pull request may close these issues.

Parser overflows stack in WASM
3 participants