Skip to content

Conversation

MariusVanDerWijden
Copy link
Member

During my benchmarks on Holesky, around 10% of all CPU time was spent in PUSH2

ROUTINE ======================== github.com/ethereum/go-ethereum/core/vm.newFrontierInstructionSet.makePush.func1 in github.com/ethereum/go-ethereum/core/vm/instructions.go
    16.38s     20.35s (flat, cum) 10.31% of Total
     740ms      740ms    976:	return func(pc *uint64, interpreter *EVMInterpreter, scope *ScopeContext) ([]byte, error) {
         .          .    977:		var (
      40ms       40ms    978:			codeLen = len(scope.Contract.Code)
     970ms      970ms    979:			start   = min(codeLen, int(*pc+1))
     200ms      200ms    980:			end     = min(codeLen, start+pushByteSize)
         .          .    981:		)
     670ms      2.39s    982:		a := new(uint256.Int).SetBytes(scope.Contract.Code[start:end])
         .          .    983:
         .          .    984:		// Missing bytes: pushByteSize - len(pushData)
     410ms      410ms    985:		if missing := pushByteSize - (end - start); missing > 0 {
         .          .    986:			a.Lsh(a, uint(8*missing))
         .          .    987:		}
    12.69s     14.94s    988:		scope.Stack.push2(*a)
      10ms       10ms    989:		*pc += size
     650ms      650ms    990:		return nil, nil
         .          .    991:	}
         .          .    992:}

Which is quite crazy. We have a handwritten encoder for PUSH1 already, this PR adds one for PUSH2.

PUSH2 is the second most used opcode as shown here: https://gist.github.com/shemnon/fb9b292a103abb02d98d64df6fbd35c8 since it is used by solidity quite significantly. Its used ~20 times as much as PUSH20 and PUSH32.

Benchmarks

BenchmarkPush/makePush-14         	94196547	        12.27 ns/op	       0 B/op	       0 allocs/op
BenchmarkPush/push-14             	429976924	         2.829 ns/op	       0 B/op	       0 allocs/op

This PR also adds a fuzzer to sanity check the handwritten opcode

@MariusVanDerWijden
Copy link
Member Author

Back of the envelope calculation, this will save us between 0.2 and 0.5 ms per block, which is quite significant for such a small change

integer = new(uint256.Int)
)
if *pc+2 < codeLen {
scope.Stack.push(integer.SetUint64(uint64(binary.BigEndian.Uint16(scope.Contract.Code[*pc+1 : *pc+3]))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty certain there is a way for uint256 to parse the bytes from scope.Contract.Code which does not go via binary.BigEndian. (If there isn't, I feel like there should be)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, well there's func (z *Int) SetBytes2(in []byte) *Int , but performance-wise it's about the same as what you do here. It would maybe look a little bit cleaner:

    scope.Stack.push(integer.SetBytes2(scope.contract.Code[*pc+1:]))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied, yep thats a cleaner way

holiman
holiman previously approved these changes Feb 26, 2025
Copy link
Contributor

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MariusVanDerWijden
Copy link
Member Author

Did another benchmark, a full block of PUSH2 POP instructions will go from 63745478 ns to 16348237 ns (63ms to 16ms) so a pretty significant improvement

s1na
s1na previously approved these changes Apr 8, 2025
Copy link
Contributor

@s1na s1na left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jwasinger
Copy link
Contributor

jwasinger commented Apr 8, 2025

It's pretty unfortunate that we have to resort to workarounds like this because the Go compiler is too restrictive to allow disabling the inliner for certain functions..

Copy link
Contributor

@fjl fjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clean up the tests

@fjl fjl self-assigned this Apr 8, 2025
@fjl fjl added this to the 1.15.8 milestone Apr 8, 2025
@fjl fjl merged commit 5cc9137 into ethereum:master Apr 8, 2025
3 of 4 checks passed
sivaratrisrinivas pushed a commit to sivaratrisrinivas/go-ethereum that referenced this pull request Apr 21, 2025
During my benchmarks on Holesky, around 10% of all CPU time was spent in
PUSH2
```
ROUTINE ======================== github.com/ethereum/go-ethereum/core/vm.newFrontierInstructionSet.makePush.func1 in github.com/ethereum/go-ethereum/core/vm/instructions.go
    16.38s     20.35s (flat, cum) 10.31% of Total
     740ms      740ms    976:	return func(pc *uint64, interpreter *EVMInterpreter, scope *ScopeContext) ([]byte, error) {
         .          .    977:		var (
      40ms       40ms    978:			codeLen = len(scope.Contract.Code)
     970ms      970ms    979:			start   = min(codeLen, int(*pc+1))
     200ms      200ms    980:			end     = min(codeLen, start+pushByteSize)
         .          .    981:		)
     670ms      2.39s    982:		a := new(uint256.Int).SetBytes(scope.Contract.Code[start:end])
         .          .    983:
         .          .    984:		// Missing bytes: pushByteSize - len(pushData)
     410ms      410ms    985:		if missing := pushByteSize - (end - start); missing > 0 {
         .          .    986:			a.Lsh(a, uint(8*missing))
         .          .    987:		}
    12.69s     14.94s    988:		scope.Stack.push2(*a)
      10ms       10ms    989:		*pc += size
     650ms      650ms    990:		return nil, nil
         .          .    991:	}
         .          .    992:}
```

Which is quite crazy. We have a handwritten encoder for PUSH1 already,
this PR adds one for PUSH2.

PUSH2 is the second most used opcode as shown here:
https://gist.github.com/shemnon/fb9b292a103abb02d98d64df6fbd35c8 since
it is used by solidity quite significantly. Its used ~20 times as much
as PUSH20 and PUSH32.

# Benchmarks

```
BenchmarkPush/makePush-14         	94196547	        12.27 ns/op	       0 B/op	       0 allocs/op
BenchmarkPush/push-14             	429976924	         2.829 ns/op	       0 B/op	       0 allocs/op
``` 

---------

Co-authored-by: jwasinger <[email protected]>
sduchesneau pushed a commit to streamingfast/go-ethereum that referenced this pull request May 22, 2025
During my benchmarks on Holesky, around 10% of all CPU time was spent in
PUSH2
```
ROUTINE ======================== github.com/ethereum/go-ethereum/core/vm.newFrontierInstructionSet.makePush.func1 in github.com/ethereum/go-ethereum/core/vm/instructions.go
    16.38s     20.35s (flat, cum) 10.31% of Total
     740ms      740ms    976:	return func(pc *uint64, interpreter *EVMInterpreter, scope *ScopeContext) ([]byte, error) {
         .          .    977:		var (
      40ms       40ms    978:			codeLen = len(scope.Contract.Code)
     970ms      970ms    979:			start   = min(codeLen, int(*pc+1))
     200ms      200ms    980:			end     = min(codeLen, start+pushByteSize)
         .          .    981:		)
     670ms      2.39s    982:		a := new(uint256.Int).SetBytes(scope.Contract.Code[start:end])
         .          .    983:
         .          .    984:		// Missing bytes: pushByteSize - len(pushData)
     410ms      410ms    985:		if missing := pushByteSize - (end - start); missing > 0 {
         .          .    986:			a.Lsh(a, uint(8*missing))
         .          .    987:		}
    12.69s     14.94s    988:		scope.Stack.push2(*a)
      10ms       10ms    989:		*pc += size
     650ms      650ms    990:		return nil, nil
         .          .    991:	}
         .          .    992:}
```

Which is quite crazy. We have a handwritten encoder for PUSH1 already,
this PR adds one for PUSH2.

PUSH2 is the second most used opcode as shown here:
https://gist.github.com/shemnon/fb9b292a103abb02d98d64df6fbd35c8 since
it is used by solidity quite significantly. Its used ~20 times as much
as PUSH20 and PUSH32.

# Benchmarks

```
BenchmarkPush/makePush-14         	94196547	        12.27 ns/op	       0 B/op	       0 allocs/op
BenchmarkPush/push-14             	429976924	         2.829 ns/op	       0 B/op	       0 allocs/op
``` 

---------

Co-authored-by: jwasinger <[email protected]>
jakub-freebit pushed a commit to fblch/go-ethereum that referenced this pull request Jul 3, 2025
During my benchmarks on Holesky, around 10% of all CPU time was spent in
PUSH2
```
ROUTINE ======================== github.com/ethereum/go-ethereum/core/vm.newFrontierInstructionSet.makePush.func1 in github.com/ethereum/go-ethereum/core/vm/instructions.go
    16.38s     20.35s (flat, cum) 10.31% of Total
     740ms      740ms    976:	return func(pc *uint64, interpreter *EVMInterpreter, scope *ScopeContext) ([]byte, error) {
         .          .    977:		var (
      40ms       40ms    978:			codeLen = len(scope.Contract.Code)
     970ms      970ms    979:			start   = min(codeLen, int(*pc+1))
     200ms      200ms    980:			end     = min(codeLen, start+pushByteSize)
         .          .    981:		)
     670ms      2.39s    982:		a := new(uint256.Int).SetBytes(scope.Contract.Code[start:end])
         .          .    983:
         .          .    984:		// Missing bytes: pushByteSize - len(pushData)
     410ms      410ms    985:		if missing := pushByteSize - (end - start); missing > 0 {
         .          .    986:			a.Lsh(a, uint(8*missing))
         .          .    987:		}
    12.69s     14.94s    988:		scope.Stack.push2(*a)
      10ms       10ms    989:		*pc += size
     650ms      650ms    990:		return nil, nil
         .          .    991:	}
         .          .    992:}
```

Which is quite crazy. We have a handwritten encoder for PUSH1 already,
this PR adds one for PUSH2.

PUSH2 is the second most used opcode as shown here:
https://gist.github.com/shemnon/fb9b292a103abb02d98d64df6fbd35c8 since
it is used by solidity quite significantly. Its used ~20 times as much
as PUSH20 and PUSH32.

# Benchmarks

```
BenchmarkPush/makePush-14         	94196547	        12.27 ns/op	       0 B/op	       0 allocs/op
BenchmarkPush/push-14             	429976924	         2.829 ns/op	       0 B/op	       0 allocs/op
``` 

---------

Co-authored-by: jwasinger <[email protected]>
howjmay pushed a commit to iotaledger/go-ethereum that referenced this pull request Aug 27, 2025
During my benchmarks on Holesky, around 10% of all CPU time was spent in
PUSH2
```
ROUTINE ======================== github.com/ethereum/go-ethereum/core/vm.newFrontierInstructionSet.makePush.func1 in github.com/ethereum/go-ethereum/core/vm/instructions.go
    16.38s     20.35s (flat, cum) 10.31% of Total
     740ms      740ms    976:	return func(pc *uint64, interpreter *EVMInterpreter, scope *ScopeContext) ([]byte, error) {
         .          .    977:		var (
      40ms       40ms    978:			codeLen = len(scope.Contract.Code)
     970ms      970ms    979:			start   = min(codeLen, int(*pc+1))
     200ms      200ms    980:			end     = min(codeLen, start+pushByteSize)
         .          .    981:		)
     670ms      2.39s    982:		a := new(uint256.Int).SetBytes(scope.Contract.Code[start:end])
         .          .    983:
         .          .    984:		// Missing bytes: pushByteSize - len(pushData)
     410ms      410ms    985:		if missing := pushByteSize - (end - start); missing > 0 {
         .          .    986:			a.Lsh(a, uint(8*missing))
         .          .    987:		}
    12.69s     14.94s    988:		scope.Stack.push2(*a)
      10ms       10ms    989:		*pc += size
     650ms      650ms    990:		return nil, nil
         .          .    991:	}
         .          .    992:}
```

Which is quite crazy. We have a handwritten encoder for PUSH1 already,
this PR adds one for PUSH2.

PUSH2 is the second most used opcode as shown here:
https://gist.github.com/shemnon/fb9b292a103abb02d98d64df6fbd35c8 since
it is used by solidity quite significantly. Its used ~20 times as much
as PUSH20 and PUSH32.

# Benchmarks

```
BenchmarkPush/makePush-14         	94196547	        12.27 ns/op	       0 B/op	       0 allocs/op
BenchmarkPush/push-14             	429976924	         2.829 ns/op	       0 B/op	       0 allocs/op
``` 

---------

Co-authored-by: jwasinger <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants