-
Notifications
You must be signed in to change notification settings - Fork 21.2k
core/vm: optimize push2 opcode #31267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Back of the envelope calculation, this will save us between 0.2 and 0.5 ms per block, which is quite significant for such a small change |
core/vm/instructions.go
Outdated
integer = new(uint256.Int) | ||
) | ||
if *pc+2 < codeLen { | ||
scope.Stack.push(integer.SetUint64(uint64(binary.BigEndian.Uint16(scope.Contract.Code[*pc+1 : *pc+3])))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty certain there is a way for uint256
to parse the bytes from scope.Contract.Code
which does not go via binary.BigEndian
. (If there isn't, I feel like there should be)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, well there's func (z *Int) SetBytes2(in []byte) *Int
, but performance-wise it's about the same as what you do here. It would maybe look a little bit cleaner:
scope.Stack.push(integer.SetBytes2(scope.contract.Code[*pc+1:]))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied, yep thats a cleaner way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: jwasinger <[email protected]>
Did another benchmark, a full block of PUSH2 POP instructions will go from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
It's pretty unfortunate that we have to resort to workarounds like this because the Go compiler is too restrictive to allow disabling the inliner for certain functions.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please clean up the tests
During my benchmarks on Holesky, around 10% of all CPU time was spent in PUSH2 ``` ROUTINE ======================== github.com/ethereum/go-ethereum/core/vm.newFrontierInstructionSet.makePush.func1 in github.com/ethereum/go-ethereum/core/vm/instructions.go 16.38s 20.35s (flat, cum) 10.31% of Total 740ms 740ms 976: return func(pc *uint64, interpreter *EVMInterpreter, scope *ScopeContext) ([]byte, error) { . . 977: var ( 40ms 40ms 978: codeLen = len(scope.Contract.Code) 970ms 970ms 979: start = min(codeLen, int(*pc+1)) 200ms 200ms 980: end = min(codeLen, start+pushByteSize) . . 981: ) 670ms 2.39s 982: a := new(uint256.Int).SetBytes(scope.Contract.Code[start:end]) . . 983: . . 984: // Missing bytes: pushByteSize - len(pushData) 410ms 410ms 985: if missing := pushByteSize - (end - start); missing > 0 { . . 986: a.Lsh(a, uint(8*missing)) . . 987: } 12.69s 14.94s 988: scope.Stack.push2(*a) 10ms 10ms 989: *pc += size 650ms 650ms 990: return nil, nil . . 991: } . . 992:} ``` Which is quite crazy. We have a handwritten encoder for PUSH1 already, this PR adds one for PUSH2. PUSH2 is the second most used opcode as shown here: https://gist.github.com/shemnon/fb9b292a103abb02d98d64df6fbd35c8 since it is used by solidity quite significantly. Its used ~20 times as much as PUSH20 and PUSH32. # Benchmarks ``` BenchmarkPush/makePush-14 94196547 12.27 ns/op 0 B/op 0 allocs/op BenchmarkPush/push-14 429976924 2.829 ns/op 0 B/op 0 allocs/op ``` --------- Co-authored-by: jwasinger <[email protected]>
During my benchmarks on Holesky, around 10% of all CPU time was spent in PUSH2 ``` ROUTINE ======================== github.com/ethereum/go-ethereum/core/vm.newFrontierInstructionSet.makePush.func1 in github.com/ethereum/go-ethereum/core/vm/instructions.go 16.38s 20.35s (flat, cum) 10.31% of Total 740ms 740ms 976: return func(pc *uint64, interpreter *EVMInterpreter, scope *ScopeContext) ([]byte, error) { . . 977: var ( 40ms 40ms 978: codeLen = len(scope.Contract.Code) 970ms 970ms 979: start = min(codeLen, int(*pc+1)) 200ms 200ms 980: end = min(codeLen, start+pushByteSize) . . 981: ) 670ms 2.39s 982: a := new(uint256.Int).SetBytes(scope.Contract.Code[start:end]) . . 983: . . 984: // Missing bytes: pushByteSize - len(pushData) 410ms 410ms 985: if missing := pushByteSize - (end - start); missing > 0 { . . 986: a.Lsh(a, uint(8*missing)) . . 987: } 12.69s 14.94s 988: scope.Stack.push2(*a) 10ms 10ms 989: *pc += size 650ms 650ms 990: return nil, nil . . 991: } . . 992:} ``` Which is quite crazy. We have a handwritten encoder for PUSH1 already, this PR adds one for PUSH2. PUSH2 is the second most used opcode as shown here: https://gist.github.com/shemnon/fb9b292a103abb02d98d64df6fbd35c8 since it is used by solidity quite significantly. Its used ~20 times as much as PUSH20 and PUSH32. # Benchmarks ``` BenchmarkPush/makePush-14 94196547 12.27 ns/op 0 B/op 0 allocs/op BenchmarkPush/push-14 429976924 2.829 ns/op 0 B/op 0 allocs/op ``` --------- Co-authored-by: jwasinger <[email protected]>
During my benchmarks on Holesky, around 10% of all CPU time was spent in PUSH2 ``` ROUTINE ======================== github.com/ethereum/go-ethereum/core/vm.newFrontierInstructionSet.makePush.func1 in github.com/ethereum/go-ethereum/core/vm/instructions.go 16.38s 20.35s (flat, cum) 10.31% of Total 740ms 740ms 976: return func(pc *uint64, interpreter *EVMInterpreter, scope *ScopeContext) ([]byte, error) { . . 977: var ( 40ms 40ms 978: codeLen = len(scope.Contract.Code) 970ms 970ms 979: start = min(codeLen, int(*pc+1)) 200ms 200ms 980: end = min(codeLen, start+pushByteSize) . . 981: ) 670ms 2.39s 982: a := new(uint256.Int).SetBytes(scope.Contract.Code[start:end]) . . 983: . . 984: // Missing bytes: pushByteSize - len(pushData) 410ms 410ms 985: if missing := pushByteSize - (end - start); missing > 0 { . . 986: a.Lsh(a, uint(8*missing)) . . 987: } 12.69s 14.94s 988: scope.Stack.push2(*a) 10ms 10ms 989: *pc += size 650ms 650ms 990: return nil, nil . . 991: } . . 992:} ``` Which is quite crazy. We have a handwritten encoder for PUSH1 already, this PR adds one for PUSH2. PUSH2 is the second most used opcode as shown here: https://gist.github.com/shemnon/fb9b292a103abb02d98d64df6fbd35c8 since it is used by solidity quite significantly. Its used ~20 times as much as PUSH20 and PUSH32. # Benchmarks ``` BenchmarkPush/makePush-14 94196547 12.27 ns/op 0 B/op 0 allocs/op BenchmarkPush/push-14 429976924 2.829 ns/op 0 B/op 0 allocs/op ``` --------- Co-authored-by: jwasinger <[email protected]>
During my benchmarks on Holesky, around 10% of all CPU time was spent in PUSH2 ``` ROUTINE ======================== github.com/ethereum/go-ethereum/core/vm.newFrontierInstructionSet.makePush.func1 in github.com/ethereum/go-ethereum/core/vm/instructions.go 16.38s 20.35s (flat, cum) 10.31% of Total 740ms 740ms 976: return func(pc *uint64, interpreter *EVMInterpreter, scope *ScopeContext) ([]byte, error) { . . 977: var ( 40ms 40ms 978: codeLen = len(scope.Contract.Code) 970ms 970ms 979: start = min(codeLen, int(*pc+1)) 200ms 200ms 980: end = min(codeLen, start+pushByteSize) . . 981: ) 670ms 2.39s 982: a := new(uint256.Int).SetBytes(scope.Contract.Code[start:end]) . . 983: . . 984: // Missing bytes: pushByteSize - len(pushData) 410ms 410ms 985: if missing := pushByteSize - (end - start); missing > 0 { . . 986: a.Lsh(a, uint(8*missing)) . . 987: } 12.69s 14.94s 988: scope.Stack.push2(*a) 10ms 10ms 989: *pc += size 650ms 650ms 990: return nil, nil . . 991: } . . 992:} ``` Which is quite crazy. We have a handwritten encoder for PUSH1 already, this PR adds one for PUSH2. PUSH2 is the second most used opcode as shown here: https://gist.github.com/shemnon/fb9b292a103abb02d98d64df6fbd35c8 since it is used by solidity quite significantly. Its used ~20 times as much as PUSH20 and PUSH32. # Benchmarks ``` BenchmarkPush/makePush-14 94196547 12.27 ns/op 0 B/op 0 allocs/op BenchmarkPush/push-14 429976924 2.829 ns/op 0 B/op 0 allocs/op ``` --------- Co-authored-by: jwasinger <[email protected]>
During my benchmarks on Holesky, around 10% of all CPU time was spent in PUSH2
Which is quite crazy. We have a handwritten encoder for PUSH1 already, this PR adds one for PUSH2.
PUSH2 is the second most used opcode as shown here: https://gist.github.com/shemnon/fb9b292a103abb02d98d64df6fbd35c8 since it is used by solidity quite significantly. Its used ~20 times as much as PUSH20 and PUSH32.
Benchmarks
This PR also adds a fuzzer to sanity check the handwritten opcode