Skip to content

feat(clis,filler): Add trace types to allow trace analysis and gas optimizations #1979

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

marioevz
Copy link
Member

@marioevz marioevz commented Jul 31, 2025

🗒️ Description

Introduces a new flag to the test filler that allows re-running the specified tests with lower gas limits to find the lowest value which still produces the same test output, result, and the exact same traces.

- Trace Types (efdf63a)

Introduces trace types to use pydantic to parse the traces from the transition tool.

The types accept the formats used in evmone and execution-specs, and account for their slight differences.

They also implement "equivalence" check methods that compare two different traces:

  • The trace lines are compared and always expected to be equal.
  • For each trace line, values including the program counter, the gas cost, the memory size, the stack, depth, refund, opcode name and errors, are compared and expected to be exactly equal.
  • The remaining gas value is ignored with the reasoning that the remaining gas depends on the gas limit, and therefore changing it in the slightest would result in two traces that are impossible to compare.
  • A "post-processing" mode can be included where the top stack element is ignored if the previous opcode is the Op.GAS, which allows more flexibility because most tests only use Op.GAS to use it for a Op.CALL*.

Downsides:

  • If the gas stack value produced by the Op.GAS opcode is not consumed in the very next operation, the value propagates and the post processing is ineffective, but this should not be a big issue since most tests that use the gas opcode will consume it immediately on the next CALL* opcode.

- Gas optimization in state tests (7b1d0bd)

Implements a procedure to find the minimum gas-limit value where the test still works and yields the same result and traces.

The algorithm works as follows:

  • Execute the test with the current gas limit specified in the test.
  • Reduce the gas limit by one, re-execute the test: if the runs are still equivalent proceed, otherwise abort and deem the test impossible to lower its gas.
  • Perform a binary search by lowering the gas by half, re-executing the test, if the test fails set the new minimum to the last gas limit value plus 1, if the test passes set the new maximum to the last gas limit value.
  • Iterate until the difference between the minimum and maximum is zero, or the minimum is higher than 16,777,216

- Fill command's flags to enable and configure optimize-gas feature (d178bdb)

Flags --optimize-gas, --optimize-gas-output and --optimize-gas-post-processing are added to the fill command to enable and configure this new feature.

- modify_static_test_gas_limits.py command (dc4c5fc)

It's a simple command to take the output from the fill's optimize-gas mode and apply it to static tests.

The command has a dry-run mode and aborts if the static test contains more than one gas value, in order to not override incorrect values.

🔗 Related Issues or PRs

N/A.

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

@marioevz marioevz added type:feat type: Feature scope:fill Scope: fill command labels Jul 31, 2025
@marioevz marioevz requested a review from spencer-tb July 31, 2025 21:23
@@ -78,6 +78,7 @@ Users can select any of the artifacts depending on their testing needs for their
- 🔀 Disabled writing debugging information to the EVM "dump directory" to improve performance. To obtain debug output, the `--evm-dump-dir` flag must now be explicitly set. As a consequence, the now redundant `--skip-evm-dump` option was removed ([#1874](https://github.com/ethereum/execution-spec-tests/pull/1874)).
- ✨ Generate unique addresses with Python for compatible static tests, instead of using hard-coded addresses from legacy static test fillers ([#1781](https://github.com/ethereum/execution-spec-tests/pull/1781)).
- ✨ Added support for the `--benchmark-gas-values` flag in the `fill` command, allowing a single genesis file to be used across different gas limit settings when generating fixtures. ([#1895](https://github.com/ethereum/execution-spec-tests/pull/1895)).
- ✨ Added `--optimize-gas` flag that allows to binary search the minimum gas limit value for a transaction in a test that still yields the same test result ([#1979](https://github.com/ethereum/execution-spec-tests/pull/1979)).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the PR you wrote that the flags --optimize-gas, --optimize-gas-output and --optimize-gas-post-processing have been added. So they all should be mentioned in the changelog


## Post-Processing Mode

Enable post-processing to handle opcodes that put the current gas in the stack (like `GAS` opcode):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"opcodes that put the current gas in the stack (like GAS opcode)"

it would be better if you would put the full list of opcodes that are relevant here. it is also unclear to me what exactly this flag does (it "handles opcodes", but what does that mean). it's also unclear whether this is mandatory or not, because in a later paragraph you refer to it as "optional post-processing". is it even optional when GAS opcode is used? i feel like this flag maybe does not have to exist, can you not scan the code for opcodes and then dynamically toggle this when certain opcodes are found? less flags make this gas optimization feature easier to use

@felix314159
Copy link
Collaborator

Can you make it so that it logs into the terminal which optimize-gas-output file has been created? E.g. if i run uv run fill --optimize-gas ./tests/static/state_tests/stRandom/randomStatetest137Filler.json --clean --fill-static-tests --evm-bin=evmone-t8n i get no indication in the terminal that --optimize-gas has done anything. It would be better if above or below the Generated html report line in similar fashion would show sth like Generated optimize-gas-output.json (or whatever name was specified): <path to file>

## Limitations

- Only works with state tests (not blockchain tests)
- Requires trace collection to be enabled
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When reading this I thought that the command would fail if I don't provide --trace but that does not seem to be the case, so it seems to add that automatically?

@felix314159
Copy link
Collaborator

felix314159 commented Aug 6, 2025

Maybe I missed sth but from the output of the resulting optimize-gas-output.json file I can't tell whether the test is currently using the optimal gas limit or not. Is this intended? You probably then also have a separate script (not in this PR) which runs over this json file, compares it with the actual gas limit currently used in the test and then updates the test to use the determined minimal value? Maybe this question will be answered when im done with this PR and read through 1980

@felix314159
Copy link
Collaborator

When I run uv run fill --optimize-gas ./tests/static/state_tests/stRandom/randomStatetest135Filler.json --clean --fill-static-tests --evm-bin=evmone-t8n I get 2 failed with Exception: Impossible to compare. But from the error message alone it not clear what caused this problem. We should adjust the exception message so that it is clear that the error only occurred because --optimize-gas was toggled

Copy link
Collaborator

@felix314159 felix314159 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great future with a few rough edges, ty for taking the time to make this and sorry for the email spam :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
scope:fill Scope: fill command type:feat type: Feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants