Add `--fail-on-error` option and fail exports on errors #99254

dbnicholson · 2024-11-14T22:34:02Z

This adds an OS error handler that optionally sets the exit code to EXIT_FAILURE. The option can be enabled with the --fail-on-error CLI flag, but it's also set when exporting a project. The error handling is a little funky to make sure that a script calling get_tree().quit() doesn't inadvertently set the exit code back to EXIT_SUCCESS. I'm also not really sure about the public is_error_occurred/set_error_occurred handlers, but they were necessary for testing.

Fixes: #94957

Calinou · 2024-11-15T18:49:45Z

core/os/os.cpp

Shouldn't this behavior only apply with --fail-on-error? This may break some user expectations otherwise, as the exit code can be specified as an optional parameter to get_tree().quit().

It sorta does. Currently _error_occurred is only set by the error handler when _fail_on_error is also true. So, implicitly _error_occurred also means _fail_on_error. Originally _error_occurred was just a private boolean, but I later added the accessors when writing the test so I could check and restore the value without failing the whole process. So, _error_occurred no longer guarantees _fail_on_error since some other part of Godot could call set_error_occurred(true). So, maybe the conditional should be expanded to check both flags to ensure this only happens with --fail-on-error and that an error was detected.

The whole point of this part is about how users call get_tree().quit(). Since Godot will carry on in the face of script errors, you could have a reasonable quit() or quit(0) in an unrelated location that would be executed and revert the exit code to 0. Here's a contrived example:

extends SceneTree

func _init():
	var a = Node.new()
	a.free()
	var b: Node = a

func _process(_delta):
	quit()

Without the above block, this script would exit 0 even with --fail-on-error. You may be wondering why _error_occurred is needed at all. If this only checked _fail_on_error, then once quit(1) was called, it couldn't be changed back with quit(0). In other words, _error_occurred is being used as a proxy for "the exit code was changed to a failure because an error was handled". It's trying to maintain flexibility for users calling quit() while preventing them from inadvertently changing it back to 0, since they have no way to know an error occurred.
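To make the guard easier to follow, here is a minimal standalone sketch of the logic described above. It is a model, not the code from the PR: the `GuardedOS` class and the two helper functions are hypothetical names, and only the behavior (an error handler that sets a failure code, and a `set_exit_code` that refuses to go back to success) follows the discussion.

```cpp
#include <cstdlib>

// Hypothetical stand-in for the engine's OS singleton; not the real class.
class GuardedOS {
	bool fail_on_error = false;
	bool error_occurred = false;
	int exit_code = EXIT_SUCCESS;

public:
	void set_fail_on_error(bool p_enabled) { fail_on_error = p_enabled; }

	// Models the error handler: only reacts when fail_on_error is enabled.
	void handle_error() {
		if (fail_on_error) {
			error_occurred = true;
			exit_code = EXIT_FAILURE;
		}
	}

	// Models what quit(code) ends up calling.
	void set_exit_code(int p_code) {
		if (fail_on_error && error_occurred && p_code == EXIT_SUCCESS) {
			return; // Keep the failure code set by the error handler.
		}
		exit_code = p_code;
	}

	int get_exit_code() const { return exit_code; }
};

// Mirrors the contrived script: an error is handled, then an unrelated
// bare quit() asks for exit code 0.
int run_contrived_script(bool p_fail_on_error) {
	GuardedOS os;
	os.set_fail_on_error(p_fail_on_error);
	os.handle_error();
	os.set_exit_code(EXIT_SUCCESS);
	return os.get_exit_code();
}

// With no handled error, quit(1) followed by quit(0) still ends at 0.
int run_explicit_quit_sequence() {
	GuardedOS os;
	os.set_fail_on_error(true);
	os.set_exit_code(1);
	os.set_exit_code(0);
	return os.get_exit_code();
}
```

In this model, `run_contrived_script(true)` keeps the failure code while `run_explicit_quit_sequence()` shows why checking `fail_on_error` alone would be too strict: a user who sets a failure code themselves can still change it back.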

It would be nice if there was a way to stop the main loop without also setting the exit code. For example, if the quit() default exit code was -1, then users could call a bare quit() and not influence the exit code. Then it would be less likely for this block to be needed since someone would have to explicitly be calling quit(0), and that seems less likely than quit(). Would likely still break someone's usage and it would change the API.
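The sentinel idea can be sketched as follows. This is purely illustrative: `QuitModel`, `EXIT_CODE_UNCHANGED`, and the helper functions are hypothetical names, and a `-1` default for `quit()` is only the proposal floated in this comment, not existing engine behavior.

```cpp
#include <cstdlib>

// Hypothetical sentinel meaning "stop the loop, leave the exit code alone".
constexpr int EXIT_CODE_UNCHANGED = -1;

// Hypothetical model of a main loop with a quit() method.
struct QuitModel {
	int exit_code = EXIT_SUCCESS;
	bool quit_requested = false;

	void quit(int p_exit_code = EXIT_CODE_UNCHANGED) {
		quit_requested = true;
		// Only an explicit code overrides whatever was set earlier.
		if (p_exit_code != EXIT_CODE_UNCHANGED) {
			exit_code = p_exit_code;
		}
	}
};

// A bare quit() keeps the exit code that was already in place.
int exit_code_after_bare_quit(int p_initial) {
	QuitModel tree;
	tree.exit_code = p_initial;
	tree.quit();
	return tree.exit_code;
}

// An explicit quit(code) still replaces it.
int exit_code_after_explicit_quit(int p_initial, int p_requested) {
	QuitModel tree;
	tree.exit_code = p_initial;
	tree.quit(p_requested);
	return tree.exit_code;
}
```

Under this assumption, a script that calls a bare `quit()` after an error would no longer reset the exit code, and only an explicit `quit(0)` would need the guard.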

Sorry, that was a big tangent about quit(). I agree that guarding set_exit_code should only happen when fail on error is enabled, and I've pushed a fixup with that change. I can squash it into the first commit if you agree.

I had one more thought here. Since fail_on_error and error_occurred are independent, then the error handler should always set error_occurred and set_error_occurred should check whether fail_on_error is set before changing the exit code. It's a minor change, but that way if something calls OS::set_error_occurred(true), it doesn't fail the process unless fail_on_error is true.

Similarly, if the error handler always sets error_occurred, then you can reliably query it from other places. I could see a future where the flags are exposed in the bound interface, and it would be really useful to check OS.error_occurred in a script to know if there was an error somewhere.

Does that make sense? I pushed a fixup commit with that change.
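A rough sketch of the split described in this comment, again as a standalone model rather than the PR's code (the `FlagOS` class, the `Outcome` struct, and `report_error` are hypothetical names): the error handler always records that an error occurred, and only the setter decides whether that affects the exit code.

```cpp
#include <cstdlib>

// Hypothetical stand-in for the engine's OS singleton; not the real class.
class FlagOS {
	bool fail_on_error = false;
	bool error_occurred = false;
	int exit_code = EXIT_SUCCESS;

public:
	void set_fail_on_error(bool p_enabled) { fail_on_error = p_enabled; }

	// Always records the flag; only touches the exit code when
	// fail_on_error is enabled.
	void set_error_occurred(bool p_occurred) {
		error_occurred = p_occurred;
		if (p_occurred && fail_on_error) {
			exit_code = EXIT_FAILURE;
		}
	}

	bool is_error_occurred() const { return error_occurred; }

	// Models the error handler: it unconditionally reports the error.
	void handle_error() { set_error_occurred(true); }

	int get_exit_code() const { return exit_code; }
};

struct Outcome {
	bool error_occurred;
	int exit_code;
};

// Reports one error and returns the resulting state.
Outcome report_error(bool p_fail_on_error) {
	FlagOS os;
	os.set_fail_on_error(p_fail_on_error);
	os.handle_error();
	return { os.is_error_occurred(), os.get_exit_code() };
}
```

With this arrangement the flag can be queried regardless of `fail_on_error`, while the process only fails when the option is enabled.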

Calinou

Tested locally on Linux, it works as expected. Code looks good to me.

However, this is always printed when --fail-on-error is used when SceneTree.quit() is called (since its default value is 0):

WARNING: Cannot set exit code to 0 after an error has occurred.
     at: set_exit_code (./core/os/os.cpp:221)

Code used:

func _ready() -> void:
	push_error("Error")
	get_tree().quit()

This does not occur if the engine is exited via other methods, e.g. --quit.

I agree we should re-add the OS.exit_code property and add SceneTree.quit(-1) to allow not overriding the exit code, but this should be done in a separate PR since it has a greater impact on the codebase (and potential compatibility quirks).

main/main.cpp

misc/dist/shell/_godot.zsh-completion

misc/dist/shell/godot.fish

dbnicholson · 2024-11-18T21:59:21Z

However, this is always printed when --fail-on-error is used when SceneTree.quit() is called (since its default value is 0):
WARNING: Cannot set exit code to 0 after an error has occurred.
     at: set_exit_code (./core/os/os.cpp:221)
Code used:
func _ready() -> void:
	push_error("Error")
	get_tree().quit()

Should I drop the warning? It was my minor attempt at highlighting to users why quit() may not be doing what they expect, but I'm not sure it's adding any value. I could also change it to a WARN_VERBOSE or put it under DEV_ENABLED or something.

This does not occur if the engine is exited via other methods, e.g. --quit.

It still exits unsuccessfully with --fail-on-error --quit, though, right?

Calinou · 2024-11-18T22:15:52Z

Should I drop the warning?

Yes, I'd move it to WARN_VERBOSE().

It still exits unsuccessfully with --fail-on-error --quit, though, right?

Yes, exit code is still 1 as evidenced by calling echo $? afterwards.

dbnicholson · 2024-11-18T22:25:43Z

I think all the comments are addressed. If it looks good to you, I'll squash the fixups.

core/os/os.h

dbnicholson · 2024-12-20T12:05:52Z

@Calinou ping. Anything I can do to help get this merged?

Calinou

Tested locally, it works great now 🙂

Currently the exit code will always be EXIT_SUCCESS so long as nothing explicitly sets it otherwise. That's generally good as you want the engine to report success in the face of non-fatal errors. However, when you want to fix errors, you want the process to exit with a failure when there are errors. Add a boolean `fail_on_error` OS setting that defaults to false to match the current semantics. When set to true, an error handler will set the exit code to EXIT_FAILURE. In order to keep an unknowing component from setting the exit code back to EXIT_SUCCESS, an `error_occurred` flag is also set from the error handler.

This allows setting the OS `fail_on_error` flag in all builds. The primary use case is running it with scripts in CI so that errors can be caught without scraping the process output.

Co-authored-by: Hugo Locurcio <[email protected]>

Exporting a project with errors should fail the process to help prevent users from publishing broken games.

dbnicholson · 2024-12-21T23:56:36Z

Tested locally, it works great now 🙂

Thanks for testing! I squashed all the fixups now, but there's no change.

akien-mga · 2025-01-14T16:33:24Z

Status update: I want to find some time to review this myself, as the error propagation in Main is pretty tricky and has been the source of multiple bugs in the past. We're entering the beta stage and feature freeze for 4.4 so this would be merged for 4.5 early on most likely, once I get to approve it.

dbnicholson requested review from a team as code owners November 14, 2024 22:34

dbnicholson changed the title from "Add --fail-on-error option" to "Add --fail-on-error option and fail exports on errors" Nov 14, 2024

dbnicholson force-pushed the fail-on-errors branch from 0c80aa2 to e79503d Compare November 14, 2024 23:36

AThousandShips added bug topic:editor topic:export labels Nov 15, 2024

AThousandShips added this to the 4.4 milestone Nov 15, 2024

Calinou reviewed Nov 15, 2024

dbnicholson mentioned this pull request Nov 18, 2024

Exit MainLoop script with a custom error code godotengine/godot-proposals#7557

Open

Calinou approved these changes Nov 18, 2024
