perf(evm): bypass virtual stack and cache InterpreterExecContext for … #419

Open
starwarfan wants to merge 1 commit into DTVMStack:main from starwarfan:perf/bypass-virtual-stack

@starwarfan
Contributor

…interpreter mode

Interpreter mode manages call depth via InterpreterExecContext::FrameStack and never emits native code, so it does not need the virtual stack's mprotect/setjmp/longjmp round-trip that was designed for JIT stack guard pages. Skip it entirely when Mode == InterpMode.
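
A minimal sketch of the mode gate described above. `RunMode::JITMode`, `runOnVirtualStack()` and `runOnPhysicalStack()` are hypothetical names, not DTVM's actual API; the virtual-stack setup (stack allocation, guard-page mprotect, setjmp) is reduced to a counter so the path taken is observable.

```cpp
#include <cassert>

// Hypothetical run modes; only InterpMode is named in this PR.
enum class RunMode { InterpMode, JITMode };

// Counters so this sketch can observe which path was taken.
static int VirtualStackRuns = 0;
static int DirectRuns = 0;

// Stand-in for the virtual-stack path: the real runtime would allocate a
// stack, mprotect a guard page and setjmp before switching to it.
int runOnVirtualStack(int (*Body)()) {
  ++VirtualStackRuns;
  return Body();
}

// Interpreter path: run directly on the current (physical) stack.
int runOnPhysicalStack(int (*Body)()) {
  ++DirectRuns;
  return Body();
}

int execute(RunMode Mode, int (*Body)()) {
  if (Mode == RunMode::InterpMode)
    return runOnPhysicalStack(Body); // no guard page, no setjmp/longjmp
  return runOnVirtualStack(Body);    // JIT modes keep the guarded stack
}

// Trivial body used to exercise both paths.
int answer() { return 42; }
```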

Additionally, cache the InterpreterExecContext in a thread-local for top-level calls (depth == 0) to avoid re-allocating the ~33 KB EVMFrame on every callEVMInInterpMode invocation. Nested calls (CALL/CREATE re-entering via Host->call()) still create a fresh context for safety.
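
The caching scheme can be sketched as follows, assuming a hypothetical `ExecContext` in place of `evm::InterpreterExecContext` and an integer call id in place of the instance pointer. The first top-level call on a thread allocates the context, later top-level calls reset and reuse it, and a call that re-enters while the cached context is busy (as CALL/CREATE do through `Host->call()`) gets a fresh heap-allocated context.

```cpp
#include <cassert>
#include <memory>

static int Allocations = 0; // counts context constructions in this sketch

// Hypothetical stand-in for evm::InterpreterExecContext; the real type owns
// a ~33 KB EVMFrame, modelled here by a plain buffer.
struct ExecContext {
  unsigned char Frame[33 * 1024];
  int CallId;
  explicit ExecContext(int Id) : CallId(Id) { ++Allocations; }
  void resetForNewCall(int Id) { CallId = Id; }
};

// Sets the flag for the guard's lifetime and clears it on every exit path.
struct InUseGuard {
  bool &Flag;
  explicit InUseGuard(bool &F) : Flag(F) { Flag = true; }
  ~InUseGuard() { Flag = false; }
};

static thread_local std::unique_ptr<ExecContext> TLCtx;
static thread_local bool TLCtxInUse = false;

int runCall(int Id, int Nested);

// Body of one call: optionally re-enter for a nested call, as CALL/CREATE
// would through the host, then report which call id this context holds.
static int body(ExecContext &Ctx, int Nested) {
  if (Nested > 0)
    runCall(Ctx.CallId + 1, Nested - 1);
  return Ctx.CallId;
}

int runCall(int Id, int Nested) {
  if (TLCtxInUse) {
    // Re-entrant call: the cached context is busy, so use a fresh one.
    auto Fresh = std::make_unique<ExecContext>(Id);
    return body(*Fresh, Nested);
  }
  // Top-level call: allocate once per thread, then reset and reuse.
  if (!TLCtx)
    TLCtx = std::make_unique<ExecContext>(Id);
  else
    TLCtx->resetForNewCall(Id);
  InUseGuard Guard(TLCtxInUse);
  return body(*TLCtx, Nested);
}
```

Holding the cache in a `std::unique_ptr` lets it be freed when the thread exits, and the RAII guard clears the in-use flag even if the body throws.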

1. Does this PR affect any open issues? (Y/N) Add issue references (e.g. "fix #123", "re #123"):

  • N
  • Y

2. What is the scope of this PR (e.g. component or file name):

3. Provide a description of the PR (e.g. more details, effects, motivations or doc link):

  • Affects user behaviors
  • Contains CI/CD configuration changes
  • Contains documentation changes
  • Contains experimental features
  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Other

4. Are there any breaking changes? (Y/N) Describe the breaking changes (e.g. more details, motivations or doc link):

  • N
  • Y

5. Are there test cases for these changes? (Y/N) Select and add more details, references or doc links:

  • Unit test
  • Integration test
  • Benchmark (add benchmark stats below)
  • Manual test (add detailed scripts or steps below)
  • Other

6. Release note

None

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@github-actions

⚡ Performance Regression Check Results

✅ Performance Check Passed (interpreter)

Performance Benchmark Results (threshold: 25%)

Benchmark Baseline (us) Current (us) Change Status
total/main/blake2b_huff/8415nulls 1.35 1.33 -1.8% PASS
total/main/blake2b_huff/empty 0.02 0.02 +0.1% PASS
total/main/blake2b_shifts/8415nulls 9.53 9.64 +1.2% PASS
total/main/sha1_divs/5311 4.33 4.27 -1.3% PASS
total/main/sha1_divs/empty 0.05 0.05 -2.1% PASS
total/main/sha1_shifts/5311 2.27 2.26 -0.6% PASS
total/main/sha1_shifts/empty 0.03 0.03 -1.1% PASS
total/main/snailtracer/benchmark 47.46 47.60 +0.3% PASS
total/main/structarray_alloc/nfts_rank 0.92 0.91 -0.6% PASS
total/main/swap_math/insufficient_liquidity 0.00 0.00 -0.2% PASS
total/main/swap_math/received 0.00 0.00 -1.3% PASS
total/main/swap_math/spent 0.00 0.00 +1.2% PASS
total/main/weierstrudel/1 0.23 0.23 -0.1% PASS
total/main/weierstrudel/15 2.49 2.48 -0.5% PASS
total/micro/JUMPDEST_n0/empty 1.22 1.51 +23.7% PASS
total/micro/jump_around/empty 0.04 0.04 -0.5% PASS
total/micro/loop_with_many_jumpdests/empty 18.46 23.03 +24.7% PASS
total/micro/memory_grow_mload/by1 0.08 0.08 -2.1% PASS
total/micro/memory_grow_mload/by16 0.10 0.09 -2.0% PASS
total/micro/memory_grow_mload/by32 0.10 0.11 +3.2% PASS
total/micro/memory_grow_mload/nogrow 0.08 0.08 -5.1% PASS
total/micro/memory_grow_mstore/by1 0.09 0.09 -0.7% PASS
total/micro/memory_grow_mstore/by16 0.10 0.10 -1.5% PASS
total/micro/memory_grow_mstore/by32 0.11 0.11 -1.6% PASS
total/micro/memory_grow_mstore/nogrow 0.08 0.08 -1.4% PASS
total/micro/signextend/one 0.23 0.23 -0.6% PASS
total/micro/signextend/zero 0.23 0.22 -5.6% PASS
total/synth/ADD/b0 1.62 1.33 -18.1% PASS
total/synth/ADD/b1 1.23 1.19 -3.1% PASS
total/synth/ADDRESS/a0 7.68 6.00 -21.9% PASS
total/synth/ADDRESS/a1 7.92 6.14 -22.6% PASS
total/synth/AND/b0 1.56 1.26 -18.8% PASS
total/synth/AND/b1 1.08 1.05 -2.6% PASS
total/synth/BYTE/b0 4.64 4.60 -0.8% PASS
total/synth/BYTE/b1 3.71 3.67 -1.3% PASS
total/synth/CALLDATASIZE/a0 4.66 2.83 -39.3% PASS
total/synth/CALLDATASIZE/a1 4.90 2.95 -39.8% PASS
total/synth/CALLER/a0 7.72 6.01 -22.2% PASS
total/synth/CALLER/a1 7.92 6.14 -22.4% PASS
total/synth/CALLVALUE/a0 3.08 2.89 -6.4% PASS
total/synth/CALLVALUE/a1 3.09 2.89 -6.4% PASS
total/synth/CODESIZE/a0 4.88 3.37 -31.0% PASS
total/synth/CODESIZE/a1 5.22 3.39 -35.0% PASS
total/synth/DUP1/d0 1.07 0.71 -33.4% PASS
total/synth/DUP1/d1 0.84 0.65 -21.9% PASS
total/synth/DUP10/d0 0.85 0.72 -15.7% PASS
total/synth/DUP10/d1 0.84 0.61 -26.5% PASS
total/synth/DUP11/d0 0.86 0.72 -15.6% PASS
total/synth/DUP11/d1 0.84 0.62 -26.5% PASS
total/synth/DUP12/d0 0.84 0.71 -16.0% PASS
total/synth/DUP12/d1 0.84 0.62 -26.5% PASS
total/synth/DUP13/d0 0.85 0.71 -16.0% PASS
total/synth/DUP13/d1 0.84 0.61 -26.6% PASS
total/synth/DUP14/d0 0.86 0.71 -17.3% PASS
total/synth/DUP14/d1 0.84 0.62 -26.5% PASS
total/synth/DUP15/d0 0.86 0.71 -16.9% PASS
total/synth/DUP15/d1 0.84 0.61 -26.6% PASS
total/synth/DUP16/d0 0.85 0.72 -16.3% PASS
total/synth/DUP16/d1 0.84 0.61 -26.6% PASS
total/synth/DUP2/d0 0.92 0.72 -21.5% PASS
total/synth/DUP2/d1 0.84 0.62 -26.5% PASS
total/synth/DUP3/d0 0.83 0.71 -14.4% PASS
total/synth/DUP3/d1 0.84 0.61 -26.6% PASS
total/synth/DUP4/d0 0.85 0.71 -16.4% PASS
total/synth/DUP4/d1 0.84 0.62 -26.5% PASS
total/synth/DUP5/d0 0.85 0.71 -16.5% PASS
total/synth/DUP5/d1 0.84 0.62 -26.6% PASS
total/synth/DUP6/d0 0.84 0.71 -16.2% PASS
total/synth/DUP6/d1 0.84 0.61 -26.5% PASS
total/synth/DUP7/d0 0.85 0.71 -16.0% PASS
total/synth/DUP7/d1 0.84 0.62 -26.6% PASS
total/synth/DUP8/d0 0.85 0.71 -15.6% PASS
total/synth/DUP8/d1 0.84 0.62 -26.5% PASS
total/synth/DUP9/d0 0.85 0.71 -16.9% PASS
total/synth/DUP9/d1 0.84 0.62 -26.6% PASS
total/synth/EQ/b0 2.32 2.28 -1.4% PASS
total/synth/EQ/b1 1.28 1.26 -1.3% PASS
total/synth/GAS/a0 4.89 3.22 -34.1% PASS
total/synth/GAS/a1 5.21 3.28 -37.1% PASS
total/synth/GT/b0 2.20 2.20 -0.0% PASS
total/synth/GT/b1 1.00 0.98 -2.0% PASS
total/synth/ISZERO/u0 1.23 0.76 -37.8% PASS
total/synth/JUMPDEST/n0 1.22 1.51 +23.9% PASS
total/synth/LT/b0 2.18 2.17 -0.5% PASS
total/synth/LT/b1 0.97 0.97 -0.2% PASS
total/synth/MSIZE/a0 5.86 4.14 -29.4% PASS
total/synth/MSIZE/a1 6.18 4.31 -30.2% PASS
total/synth/MUL/b0 4.43 4.40 -0.5% PASS
total/synth/MUL/b1 4.78 4.64 -3.0% PASS
total/synth/NOT/u0 1.40 1.09 -22.2% PASS
total/synth/OR/b0 1.55 1.25 -19.3% PASS
total/synth/OR/b1 1.09 1.05 -3.4% PASS
total/synth/PC/a0 4.66 2.80 -40.0% PASS
total/synth/PC/a1 4.91 2.95 -39.9% PASS
total/synth/PUSH1/p0 1.11 0.88 -21.2% PASS
total/synth/PUSH1/p1 1.02 0.87 -14.9% PASS
total/synth/PUSH10/p0 1.12 0.89 -20.7% PASS
total/synth/PUSH10/p1 1.05 0.91 -13.8% PASS
total/synth/PUSH11/p0 1.12 0.89 -21.1% PASS
total/synth/PUSH11/p1 1.05 0.89 -14.8% PASS
total/synth/PUSH12/p0 1.13 0.89 -21.4% PASS
total/synth/PUSH12/p1 1.05 0.91 -13.4% PASS
total/synth/PUSH13/p0 1.12 0.89 -20.6% PASS
total/synth/PUSH13/p1 1.05 0.90 -14.3% PASS
total/synth/PUSH14/p0 1.11 0.88 -21.1% PASS
total/synth/PUSH14/p1 1.05 0.90 -14.1% PASS
total/synth/PUSH15/p0 1.13 0.89 -20.9% PASS
total/synth/PUSH15/p1 1.04 0.91 -12.2% PASS
total/synth/PUSH16/p0 1.13 0.89 -21.0% PASS
total/synth/PUSH16/p1 1.06 0.91 -14.5% PASS
total/synth/PUSH17/p0 1.13 0.89 -20.8% PASS
total/synth/PUSH17/p1 1.06 0.96 -9.6% PASS
total/synth/PUSH18/p0 1.12 0.89 -20.7% PASS
total/synth/PUSH18/p1 1.06 0.91 -14.2% PASS
total/synth/PUSH19/p0 1.13 0.89 -20.9% PASS
total/synth/PUSH19/p1 1.06 0.91 -14.4% PASS
total/synth/PUSH2/p0 1.11 0.87 -21.5% PASS
total/synth/PUSH2/p1 1.03 0.88 -14.7% PASS
total/synth/PUSH20/p0 1.13 0.89 -21.3% PASS
total/synth/PUSH20/p1 1.06 0.91 -14.1% PASS
total/synth/PUSH21/p0 1.13 0.90 -21.1% PASS
total/synth/PUSH21/p1 1.07 0.92 -14.0% PASS
total/synth/PUSH22/p0 1.12 0.89 -20.7% PASS
total/synth/PUSH22/p1 1.07 0.92 -13.8% PASS
total/synth/PUSH23/p0 1.15 0.90 -22.2% PASS
total/synth/PUSH23/p1 1.07 0.92 -14.6% PASS
total/synth/PUSH24/p0 1.14 0.90 -21.0% PASS
total/synth/PUSH24/p1 1.08 0.93 -14.0% PASS
total/synth/PUSH25/p0 1.14 0.90 -21.3% PASS
total/synth/PUSH25/p1 1.08 0.93 -13.6% PASS
total/synth/PUSH26/p0 1.13 0.90 -20.5% PASS
total/synth/PUSH26/p1 1.08 0.94 -13.3% PASS
total/synth/PUSH27/p0 1.14 0.90 -21.1% PASS
total/synth/PUSH27/p1 1.09 0.94 -13.2% PASS
total/synth/PUSH28/p0 1.14 0.90 -21.2% PASS
total/synth/PUSH28/p1 1.08 0.93 -13.6% PASS
total/synth/PUSH29/p0 1.14 0.90 -20.9% PASS
total/synth/PUSH29/p1 1.10 0.95 -13.6% PASS
total/synth/PUSH3/p0 1.14 0.88 -22.7% PASS
total/synth/PUSH3/p1 1.04 0.88 -14.7% PASS
total/synth/PUSH30/p0 1.12 0.89 -21.0% PASS
total/synth/PUSH30/p1 1.09 0.94 -13.9% PASS
total/synth/PUSH31/p0 1.14 0.91 -20.6% PASS
total/synth/PUSH31/p1 1.09 1.05 -3.7% PASS
total/synth/PUSH32/p0 1.14 0.90 -21.0% PASS
total/synth/PUSH32/p1 1.11 0.95 -14.8% PASS
total/synth/PUSH4/p0 1.12 0.88 -21.5% PASS
total/synth/PUSH4/p1 1.04 0.90 -13.7% PASS
total/synth/PUSH5/p0 1.12 0.88 -21.2% PASS
total/synth/PUSH5/p1 1.04 0.95 -9.0% PASS
total/synth/PUSH6/p0 1.10 0.88 -20.0% PASS
total/synth/PUSH6/p1 1.04 0.90 -14.0% PASS
total/synth/PUSH7/p0 1.12 0.88 -21.0% PASS
total/synth/PUSH7/p1 1.03 0.88 -14.9% PASS
total/synth/PUSH8/p0 1.12 0.89 -20.8% PASS
total/synth/PUSH8/p1 1.05 0.90 -13.9% PASS
total/synth/PUSH9/p0 1.13 0.88 -21.5% PASS
total/synth/PUSH9/p1 1.05 0.90 -14.4% PASS
total/synth/RETURNDATASIZE/a0 4.86 3.46 -28.8% PASS
total/synth/RETURNDATASIZE/a1 5.18 3.47 -33.0% PASS
total/synth/SAR/b0 3.78 3.81 +0.6% PASS
total/synth/SAR/b1 3.95 3.93 -0.3% PASS
total/synth/SGT/b0 2.21 2.20 -0.1% PASS
total/synth/SGT/b1 1.14 0.96 -16.0% PASS
total/synth/SHL/b0 2.79 2.76 -0.8% PASS
total/synth/SHL/b1 1.29 1.28 -0.8% PASS
total/synth/SHR/b0 2.83 2.79 -1.4% PASS
total/synth/SHR/b1 1.26 1.25 -0.8% PASS
total/synth/SIGNEXTEND/b0 3.22 3.03 -6.0% PASS
total/synth/SIGNEXTEND/b1 2.96 3.09 +4.2% PASS
total/synth/SLT/b0 2.19 2.20 +0.7% PASS
total/synth/SLT/b1 1.14 1.03 -9.3% PASS
total/synth/SUB/b0 1.61 1.34 -16.9% PASS
total/synth/SUB/b1 1.23 1.20 -2.5% PASS
total/synth/SWAP1/s0 1.85 1.08 -41.9% PASS
total/synth/SWAP10/s0 1.24 1.08 -13.1% PASS
total/synth/SWAP11/s0 1.25 1.08 -13.5% PASS
total/synth/SWAP12/s0 1.24 1.08 -13.0% PASS
total/synth/SWAP13/s0 1.25 1.08 -13.5% PASS
total/synth/SWAP14/s0 1.25 1.09 -12.9% PASS
total/synth/SWAP15/s0 1.24 1.08 -12.8% PASS
total/synth/SWAP16/s0 1.23 1.08 -12.4% PASS
total/synth/SWAP2/s0 1.39 1.08 -22.5% PASS
total/synth/SWAP3/s0 1.24 1.08 -13.3% PASS
total/synth/SWAP4/s0 1.24 1.08 -12.9% PASS
total/synth/SWAP5/s0 1.24 1.07 -14.2% PASS
total/synth/SWAP6/s0 1.24 1.08 -12.9% PASS
total/synth/SWAP7/s0 1.25 1.08 -13.9% PASS
total/synth/SWAP8/s0 1.25 1.08 -13.4% PASS
total/synth/SWAP9/s0 1.25 1.08 -13.6% PASS
total/synth/XOR/b0 1.31 1.00 -24.0% PASS
total/synth/XOR/b1 1.04 1.01 -2.9% PASS
total/synth/loop_v1 3.63 3.61 -0.7% PASS
total/synth/loop_v2 3.64 3.63 -0.4% PASS

Summary: 194 benchmarks, 0 regressions
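
As a sketch of how the Change and Status columns appear to relate, assuming Change is the relative difference against the baseline and a row counts as a regression only when it slows down by more than the 25% threshold (the CI script itself is not shown here, so `percentChange()` and `isRegression()` are hypothetical):

```cpp
#include <cassert>

// Relative change of Current against Baseline, in percent.
double percentChange(double BaselineUs, double CurrentUs) {
  return (CurrentUs - BaselineUs) / BaselineUs * 100.0;
}

// Assumed rule: regression only when the slowdown exceeds the threshold.
bool isRegression(double BaselineUs, double CurrentUs, double ThresholdPct) {
  if (BaselineUs <= 0.0)
    return false; // no meaningful relative change against a zero baseline
  return percentChange(BaselineUs, CurrentUs) > ThresholdPct;
}
```

Recomputing from the rounded table values gives about +24.76% for loop_with_many_jumpdests (18.46 → 23.03), close to the reported +24.7%; the small gap presumably comes from the CI working on unrounded timings. That row and the two JUMPDEST rows (+23.7%, +23.9%) pass only because they stay under the threshold.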


✅ Performance Check Passed (multipass)

Performance Benchmark Results (threshold: 25%)

Benchmark Baseline (us) Current (us) Change Status
total/main/blake2b_huff/8415nulls 1.53 1.53 -0.0% PASS
total/main/blake2b_huff/empty 0.07 0.07 -6.5% PASS
total/main/blake2b_shifts/8415nulls 6.41 6.38 -0.6% PASS
total/main/sha1_divs/5311 3.53 3.50 -0.8% PASS
total/main/sha1_divs/empty 0.05 0.05 -1.1% PASS
total/main/sha1_shifts/5311 3.71 3.71 +0.0% PASS
total/main/sha1_shifts/empty 0.05 0.05 -0.6% PASS
total/main/snailtracer/benchmark 57.83 53.52 -7.5% PASS
total/main/structarray_alloc/nfts_rank 0.31 0.30 -1.4% PASS
total/main/swap_math/insufficient_liquidity 0.02 0.02 -1.7% PASS
total/main/swap_math/received 0.02 0.02 -0.9% PASS
total/main/swap_math/spent 0.02 0.02 -1.5% PASS
total/main/weierstrudel/1 0.37 0.35 -3.5% PASS
total/main/weierstrudel/15 3.25 3.21 -1.1% PASS
total/micro/JUMPDEST_n0/empty 0.13 0.13 -0.4% PASS
total/micro/jump_around/empty 0.62 0.62 +0.0% PASS
total/micro/loop_with_many_jumpdests/empty 1.96 1.96 -0.3% PASS
total/micro/memory_grow_mload/by1 0.19 0.18 -3.2% PASS
total/micro/memory_grow_mload/by16 0.21 0.19 -6.8% PASS
total/micro/memory_grow_mload/by32 0.22 0.21 -3.8% PASS
total/micro/memory_grow_mload/nogrow 0.19 0.17 -6.4% PASS
total/micro/memory_grow_mstore/by1 0.19 0.18 -5.7% PASS
total/micro/memory_grow_mstore/by16 0.20 0.20 -3.3% PASS
total/micro/memory_grow_mstore/by32 0.22 0.21 -2.5% PASS
total/micro/memory_grow_mstore/nogrow 0.19 0.18 -5.4% PASS
total/micro/signextend/one 0.35 0.34 -2.6% PASS
total/micro/signextend/zero 0.35 0.34 -2.9% PASS
total/synth/ADD/b0 0.01 0.01 -2.9% PASS
total/synth/ADD/b1 0.01 0.01 -2.8% PASS
total/synth/ADDRESS/a0 0.16 0.16 -0.4% PASS
total/synth/ADDRESS/a1 0.16 0.16 -0.4% PASS
total/synth/AND/b0 0.01 0.01 -2.9% PASS
total/synth/AND/b1 0.01 0.01 -2.8% PASS
total/synth/BYTE/b0 1.95 1.95 -0.0% PASS
total/synth/BYTE/b1 2.32 2.29 -1.4% PASS
total/synth/CALLDATASIZE/a0 0.08 0.08 -1.1% PASS
total/synth/CALLDATASIZE/a1 0.08 0.08 -1.3% PASS
total/synth/CALLER/a0 0.16 0.16 -0.4% PASS
total/synth/CALLER/a1 0.16 0.16 -0.4% PASS
total/synth/CALLVALUE/a0 0.28 0.28 +0.1% PASS
total/synth/CALLVALUE/a1 0.28 0.28 -0.0% PASS
total/synth/CODESIZE/a0 0.08 0.08 -0.7% PASS
total/synth/CODESIZE/a1 0.08 0.08 -0.8% PASS
total/synth/DUP1/d0 0.01 0.01 -2.8% PASS
total/synth/DUP1/d1 0.01 0.01 -2.8% PASS
total/synth/DUP10/d0 0.01 0.01 -3.0% PASS
total/synth/DUP10/d1 0.01 0.01 -3.1% PASS
total/synth/DUP11/d0 0.01 0.01 -2.9% PASS
total/synth/DUP11/d1 0.01 0.01 -2.9% PASS
total/synth/DUP12/d0 0.01 0.01 -2.8% PASS
total/synth/DUP12/d1 0.01 0.01 -2.9% PASS
total/synth/DUP13/d0 0.01 0.01 -2.9% PASS
total/synth/DUP13/d1 0.01 0.01 -2.7% PASS
total/synth/DUP14/d0 0.01 0.01 -2.8% PASS
total/synth/DUP14/d1 0.01 0.01 -2.9% PASS
total/synth/DUP15/d0 0.01 0.01 -2.9% PASS
total/synth/DUP15/d1 0.01 0.01 -2.8% PASS
total/synth/DUP16/d0 0.01 0.01 -2.9% PASS
total/synth/DUP16/d1 0.01 0.01 -2.8% PASS
total/synth/DUP2/d0 0.01 0.01 -2.9% PASS
total/synth/DUP2/d1 0.01 0.01 -2.8% PASS
total/synth/DUP3/d0 0.01 0.01 -2.9% PASS
total/synth/DUP3/d1 0.01 0.01 -2.9% PASS
total/synth/DUP4/d0 0.01 0.01 -2.8% PASS
total/synth/DUP4/d1 0.01 0.01 -2.9% PASS
total/synth/DUP5/d0 0.01 0.01 -2.9% PASS
total/synth/DUP5/d1 0.01 0.01 -2.8% PASS
total/synth/DUP6/d0 0.01 0.01 -2.8% PASS
total/synth/DUP6/d1 0.01 0.01 -2.8% PASS
total/synth/DUP7/d0 0.01 0.01 -2.8% PASS
total/synth/DUP7/d1 0.01 0.01 -2.9% PASS
total/synth/DUP8/d0 0.01 0.01 -2.8% PASS
total/synth/DUP8/d1 0.01 0.01 -2.8% PASS
total/synth/DUP9/d0 0.01 0.01 -2.6% PASS
total/synth/DUP9/d1 0.01 0.01 -2.8% PASS
total/synth/EQ/b0 0.01 0.01 -3.2% PASS
total/synth/EQ/b1 0.01 0.01 -2.8% PASS
total/synth/GAS/a0 0.79 0.79 -0.1% PASS
total/synth/GAS/a1 0.79 0.79 -0.0% PASS
total/synth/GT/b0 0.01 0.01 -2.8% PASS
total/synth/GT/b1 0.01 0.01 -2.6% PASS
total/synth/ISZERO/u0 0.01 0.01 -5.4% PASS
total/synth/JUMPDEST/n0 0.13 0.13 -0.7% PASS
total/synth/LT/b0 0.01 0.01 -2.9% PASS
total/synth/LT/b1 0.01 0.01 -2.9% PASS
total/synth/MSIZE/a0 0.01 0.01 -5.4% PASS
total/synth/MSIZE/a1 0.01 0.01 -5.3% PASS
total/synth/MUL/b0 0.01 0.01 -2.9% PASS
total/synth/MUL/b1 0.01 0.01 -2.9% PASS
total/synth/NOT/u0 0.01 0.01 -5.4% PASS
total/synth/OR/b0 0.01 0.01 -2.9% PASS
total/synth/OR/b1 0.01 0.01 -2.9% PASS
total/synth/PC/a0 0.01 0.01 -5.4% PASS
total/synth/PC/a1 0.01 0.01 -5.4% PASS
total/synth/PUSH1/p0 0.01 0.01 -2.5% PASS
total/synth/PUSH1/p1 0.01 0.01 -2.8% PASS
total/synth/PUSH10/p0 0.01 0.01 -2.6% PASS
total/synth/PUSH10/p1 0.01 0.01 -2.8% PASS
total/synth/PUSH11/p0 0.01 0.01 -2.6% PASS
total/synth/PUSH11/p1 0.01 0.01 -2.7% PASS
total/synth/PUSH12/p0 0.01 0.01 -2.6% PASS
total/synth/PUSH12/p1 0.01 0.01 -2.8% PASS
total/synth/PUSH13/p0 0.01 0.01 -2.7% PASS
total/synth/PUSH13/p1 0.01 0.01 -2.8% PASS
total/synth/PUSH14/p0 0.01 0.01 -2.6% PASS
total/synth/PUSH14/p1 0.01 0.01 -2.7% PASS
total/synth/PUSH15/p0 0.01 0.01 -2.6% PASS
total/synth/PUSH15/p1 0.01 0.01 -2.8% PASS
total/synth/PUSH16/p0 0.01 0.01 -2.6% PASS
total/synth/PUSH16/p1 0.01 0.01 -2.8% PASS
total/synth/PUSH17/p0 0.01 0.01 -2.8% PASS
total/synth/PUSH17/p1 0.01 0.01 -2.7% PASS
total/synth/PUSH18/p0 0.01 0.01 -2.7% PASS
total/synth/PUSH18/p1 0.01 0.01 -2.8% PASS
total/synth/PUSH19/p0 0.01 0.01 -2.6% PASS
total/synth/PUSH19/p1 0.01 0.01 -2.7% PASS
total/synth/PUSH2/p0 0.01 0.01 -2.6% PASS
total/synth/PUSH2/p1 0.01 0.01 -2.8% PASS
total/synth/PUSH20/p0 0.01 0.01 -2.8% PASS
total/synth/PUSH20/p1 0.01 0.01 -3.4% PASS
total/synth/PUSH21/p0 0.01 0.01 -3.2% PASS
total/synth/PUSH21/p1 0.01 0.01 -2.5% PASS
total/synth/PUSH22/p0 1.32 1.08 -18.4% PASS
total/synth/PUSH22/p1 1.17 1.24 +5.6% PASS
total/synth/PUSH23/p0 1.23 1.08 -12.7% PASS
total/synth/PUSH23/p1 1.14 1.23 +7.8% PASS
total/synth/PUSH24/p0 1.30 1.08 -17.4% PASS
total/synth/PUSH24/p1 1.16 1.23 +6.4% PASS
total/synth/PUSH25/p0 1.32 1.08 -18.6% PASS
total/synth/PUSH25/p1 1.14 1.21 +6.8% PASS
total/synth/PUSH26/p0 1.30 0.82 -37.2% PASS
total/synth/PUSH26/p1 1.15 1.23 +7.1% PASS
total/synth/PUSH27/p0 1.21 1.08 -11.1% PASS
total/synth/PUSH27/p1 1.14 1.21 +6.0% PASS
total/synth/PUSH28/p0 1.32 1.08 -18.5% PASS
total/synth/PUSH28/p1 1.15 1.22 +5.6% PASS
total/synth/PUSH29/p0 1.32 1.08 -18.4% PASS
total/synth/PUSH29/p1 1.14 1.22 +6.6% PASS
total/synth/PUSH3/p0 0.01 0.01 -2.6% PASS
total/synth/PUSH3/p1 0.01 0.01 -2.8% PASS
total/synth/PUSH30/p0 1.32 1.09 -17.5% PASS
total/synth/PUSH30/p1 1.18 1.21 +3.1% PASS
total/synth/PUSH31/p0 1.30 1.08 -17.0% PASS
total/synth/PUSH31/p1 1.26 1.31 +3.4% PASS
total/synth/PUSH32/p0 1.19 1.08 -9.6% PASS
total/synth/PUSH32/p1 1.19 1.25 +5.4% PASS
total/synth/PUSH4/p0 0.01 0.01 -2.8% PASS
total/synth/PUSH4/p1 0.01 0.01 -2.7% PASS
total/synth/PUSH5/p0 0.01 0.01 -2.6% PASS
total/synth/PUSH5/p1 0.01 0.01 -2.8% PASS
total/synth/PUSH6/p0 0.01 0.01 -2.7% PASS
total/synth/PUSH6/p1 0.01 0.01 -2.7% PASS
total/synth/PUSH7/p0 0.01 0.01 -2.7% PASS
total/synth/PUSH7/p1 0.01 0.01 -2.7% PASS
total/synth/PUSH8/p0 0.01 0.01 -2.7% PASS
total/synth/PUSH8/p1 0.01 0.01 -2.7% PASS
total/synth/PUSH9/p0 0.01 0.01 -2.6% PASS
total/synth/PUSH9/p1 0.01 0.01 -2.7% PASS
total/synth/RETURNDATASIZE/a0 0.53 0.53 +0.3% PASS
total/synth/RETURNDATASIZE/a1 0.49 0.49 +0.1% PASS
total/synth/SAR/b0 3.79 3.78 -0.2% PASS
total/synth/SAR/b1 4.28 4.28 -0.2% PASS
total/synth/SGT/b0 0.01 0.01 -3.2% PASS
total/synth/SGT/b1 0.01 0.01 -2.8% PASS
total/synth/SHL/b0 3.07 3.05 -0.5% PASS
total/synth/SHL/b1 1.69 1.74 +2.9% PASS
total/synth/SHR/b0 3.10 3.09 -0.2% PASS
total/synth/SHR/b1 1.64 1.65 +0.6% PASS
total/synth/SIGNEXTEND/b0 3.11 3.50 +12.5% PASS
total/synth/SIGNEXTEND/b1 3.23 3.65 +12.8% PASS
total/synth/SLT/b0 0.01 0.01 -2.9% PASS
total/synth/SLT/b1 0.01 0.01 -2.8% PASS
total/synth/SUB/b0 0.01 0.01 -3.2% PASS
total/synth/SUB/b1 0.01 0.01 -2.9% PASS
total/synth/SWAP1/s0 0.01 0.01 +0.1% PASS
total/synth/SWAP10/s0 0.01 0.01 +0.0% PASS
total/synth/SWAP11/s0 0.01 0.01 +0.1% PASS
total/synth/SWAP12/s0 0.01 0.01 +0.1% PASS
total/synth/SWAP13/s0 0.01 0.01 +0.0% PASS
total/synth/SWAP14/s0 0.01 0.01 +0.0% PASS
total/synth/SWAP15/s0 0.01 0.01 +0.1% PASS
total/synth/SWAP16/s0 0.01 0.01 +0.0% PASS
total/synth/SWAP2/s0 0.01 0.01 -0.1% PASS
total/synth/SWAP3/s0 0.01 0.01 +0.0% PASS
total/synth/SWAP4/s0 0.01 0.01 +0.1% PASS
total/synth/SWAP5/s0 0.01 0.01 -0.0% PASS
total/synth/SWAP6/s0 0.01 0.01 +0.0% PASS
total/synth/SWAP7/s0 0.01 0.01 -0.0% PASS
total/synth/SWAP8/s0 0.01 0.01 -0.0% PASS
total/synth/SWAP9/s0 0.01 0.01 -0.0% PASS
total/synth/XOR/b0 0.01 0.01 -2.8% PASS
total/synth/XOR/b1 0.01 0.01 -2.8% PASS
total/synth/loop_v1 1.41 1.42 +0.5% PASS
total/synth/loop_v2 1.41 1.42 +1.1% PASS

Summary: 194 benchmarks, 0 regressions


Copilot AI left a comment

Pull request overview

This PR optimizes EVM execution in interpreter mode by removing virtual-stack overhead and reusing the interpreter execution context for top-level calls to reduce per-call allocations.

Changes:

  • Cache evm::InterpreterExecContext in thread-local storage for top-level interpreter calls to avoid repeated allocation of the large EVMFrame.
  • Bypass ZEN_ENABLE_VIRTUAL_STACK handling when RunMode::InterpMode is active, executing directly on the physical stack.
  • Add explanatory comments and performance rationale for interpreter-mode behavior.

Comment on lines +684 to +689
TLCtxInUse = true;
evm::BaseInterpreter Interpreter(*TLCtx);
TLCtx->allocTopFrame(&Msg);
Interpreter.interpret();
Result = std::move(const_cast<evmc::Result &>(TLCtx->getExeResult()));
TLCtxInUse = false;

Copilot AI Mar 24, 2026

TLCtxInUse is not exception-safe: if allocTopFrame() or interpret() throws, the flag will remain true for the rest of the thread lifetime. That can permanently disable reuse (or change reentrancy behavior) and makes later calls take the nested path unexpectedly. Consider using an RAII guard (scope-exit) or try { ... } catch (...) { TLCtxInUse = false; throw; } to ensure the flag is reset on all exit paths.

Suggested change
- TLCtxInUse = true;
- evm::BaseInterpreter Interpreter(*TLCtx);
- TLCtx->allocTopFrame(&Msg);
- Interpreter.interpret();
- Result = std::move(const_cast<evmc::Result &>(TLCtx->getExeResult()));
- TLCtxInUse = false;
+ struct TLCtxGuard {
+   bool &Flag;
+   explicit TLCtxGuard(bool &F) : Flag(F) { Flag = true; }
+   ~TLCtxGuard() { Flag = false; }
+ } Guard(TLCtxInUse);
+ evm::BaseInterpreter Interpreter(*TLCtx);
+ TLCtx->allocTopFrame(&Msg);
+ Interpreter.interpret();
+ Result = std::move(const_cast<evmc::Result &>(TLCtx->getExeResult()));
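
A small, self-contained check of the guard's behavior, using a hypothetical `interpretTopLevel()` in place of `allocTopFrame()`/`interpret()`. Because the destructor runs during stack unwinding, the flag is cleared on both the normal and the throwing path.

```cpp
#include <cassert>
#include <stdexcept>

static thread_local bool TLCtxInUse = false;

// Same shape as the suggested TLCtxGuard: the destructor runs on every exit
// path, including stack unwinding from an exception.
struct TLCtxGuard {
  bool &Flag;
  explicit TLCtxGuard(bool &F) : Flag(F) { Flag = true; }
  ~TLCtxGuard() { Flag = false; }
};

// Hypothetical stand-in for the top-level interpreter call that may throw.
void interpretTopLevel(bool ShouldThrow) {
  TLCtxGuard Guard(TLCtxInUse);
  assert(TLCtxInUse); // flag is set while the call is running
  if (ShouldThrow)
    throw std::runtime_error("interpreter failure");
}

// Returns true if the call threw; the caller inspects the flag afterwards.
bool callAndReport(bool ShouldThrow) {
  try {
    interpretTopLevel(ShouldThrow);
    return false;
  } catch (const std::runtime_error &) {
    return true;
  }
}
```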

Copilot uses AI. Check for mistakes.
Comment on lines +674 to +683
static thread_local evm::InterpreterExecContext *TLCtx = nullptr;
static thread_local bool TLCtxInUse = false;

if (!TLCtxInUse) {
  // Top-level call: reuse the cached context
  if (!TLCtx) {
    TLCtx = new evm::InterpreterExecContext(&Inst);
  } else {
    TLCtx->resetForNewCall(&Inst);
  }

Copilot AI Mar 24, 2026

The cached InterpreterExecContext is allocated with new into a thread_local raw pointer and is never freed. Even though it’s “only” once per thread, this is still an intentional leak and can matter for short-lived worker threads / fuzzing. Prefer static thread_local std::unique_ptr<evm::InterpreterExecContext> (or std::optional if default-constructible) so the context is reclaimed on thread exit and ownership is explicit.
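
A minimal sketch of the owning alternative, assuming a hypothetical default-constructible `ExecContext` stand-in and a hypothetical `getCachedContext()` accessor. Only the pointer lives in thread-local storage; the large context stays on the heap and is destroyed when the thread exits, which a short-lived worker thread makes observable.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

// Counts live contexts so the sketch can show that nothing leaks.
static std::atomic<int> LiveContexts{0};

// Hypothetical stand-in for evm::InterpreterExecContext.
struct ExecContext {
  ExecContext() { ++LiveContexts; }
  ~ExecContext() { --LiveContexts; }
};

// Owning thread-local cache: destroyed automatically at thread exit.
static thread_local std::unique_ptr<ExecContext> TLCtx;

ExecContext &getCachedContext() {
  if (!TLCtx)
    TLCtx = std::make_unique<ExecContext>();
  return *TLCtx;
}

// Runs a short-lived worker that uses the cache twice and returns how many
// contexts were alive inside the worker.
int runWorker() {
  int SeenInside = 0;
  std::thread Worker([&SeenInside] {
    ExecContext &A = getCachedContext();
    ExecContext &B = getCachedContext();
    assert(&A == &B); // second call reuses the cached context
    SeenInside = LiveContexts.load();
  });
  Worker.join(); // thread-local destructors have run by the time join returns
  return SeenInside;
}
```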

Comment on lines +736 to +741
// Interpreter mode does not need a virtual stack: it manages call depth
// via InterpreterExecContext::FrameStack and never emits native code that
// could overflow the physical stack in an unbounded way. Skipping the
// virtual-stack allocation/mprotect/setjmp round-trip on every call
// eliminates ~50 % of the per-execution overhead measured on ERC-20
// transfers.

Copilot AI Mar 24, 2026

The comment says interpreter mode “manages call depth via InterpreterExecContext::FrameStack”, but CALL/CREATE in the interpreter go through Host->call(NewMsg) (see CallHandler::doExecute()), i.e. they re-enter execution rather than pushing additional frames onto FrameStack. Since FrameStack is currently only used for the single top frame (allocTopFrame()/freeBackFrame()), this rationale seems inaccurate or misleading; please reword it to reflect the actual depth management mechanism (evmc_message.depth / host recursion).
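
To illustrate the mechanism the review describes, here is a sketch with a simplified, hypothetical `Message` type (the real `evmc_message` carries an `int32_t depth` field) and a hypothetical `hostCall()` standing in for `Host->call()`. Each nested CALL re-enters `execute()` as ordinary native recursion, and the recursion stops because, under the usual EVM rule, a frame already at depth 1024 cannot call further.

```cpp
#include <algorithm>
#include <cassert>

// Simplified message: only the depth matters for this sketch.
struct Message {
  int Depth;
};

constexpr int MaxCallDepth = 1024; // EVM call-depth limit

static int DeepestFrame = 0; // deepest depth reached, for observation

int execute(const Message &Msg);

// Stand-in for Host->call(): re-enters execute() natively with depth + 1,
// and fails instead of recursing once the limit has been reached.
bool hostCall(const Message &Parent) {
  if (Parent.Depth >= MaxCallDepth)
    return false;
  execute(Message{Parent.Depth + 1});
  return true;
}

// A contract that always tries to call itself.
int execute(const Message &Msg) {
  DeepestFrame = std::max(DeepestFrame, Msg.Depth);
  hostCall(Msg);
  return Msg.Depth;
}
```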

Suggested change
- // Interpreter mode does not need a virtual stack: it manages call depth
- // via InterpreterExecContext::FrameStack and never emits native code that
- // could overflow the physical stack in an unbounded way. Skipping the
- // virtual-stack allocation/mprotect/setjmp round-trip on every call
- // eliminates ~50 % of the per-execution overhead measured on ERC-20
- // transfers.
+ // Interpreter mode does not need a virtual stack: CALL/CREATE re-enter
+ // execution via the host with an incremented evmc_message.depth rather
+ // than pushing additional frames onto InterpreterExecContext::FrameStack,
+ // so call depth is bounded by the EVM depth limit rather than unbounded
+ // native recursion. Skipping the virtual-stack allocation/mprotect/setjmp
+ // round-trip on every call eliminates ~50 % of the per-execution overhead
+ // measured on ERC-20 transfers.


Labels

None yet

Projects

None yet

2 participants