perf(evm): bypass virtual stack and cache InterpreterExecContext for …#419
perf(evm): bypass virtual stack and cache InterpreterExecContext for …#419starwarfan wants to merge 1 commit intoDTVMStack:mainfrom
Conversation
…interpreter mode Interpreter mode manages call depth via InterpreterExecContext::FrameStack and never emits native code, so it does not need the virtual stack's mprotect/setjmp/longjmp round-trip that was designed for JIT stack guard pages. Skip it entirely when Mode == InterpMode. Additionally, cache the InterpreterExecContext in a thread-local for top-level calls (depth == 0) to avoid re-allocating the ~33 KB EVMFrame on every callEVMInInterpMode invocation. Nested calls (CALL/CREATE re-entering via Host->call()) still create a fresh context for safety. Co-Authored-By: Claude Opus 4.6 <[email protected]>
⚡ Performance Regression Check Results✅ Performance Check Passed (interpreter)Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions ✅ Performance Check Passed (multipass)Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions |
There was a problem hiding this comment.
Pull request overview
This PR optimizes EVM execution in interpreter mode by removing virtual-stack overhead and reusing interpreter execution context for top-level calls to reduce per-call allocations.
Changes:
- Cache
evm::InterpreterExecContextin thread-local storage for top-level interpreter calls to avoid repeated allocation of the largeEVMFrame. - Bypass
ZEN_ENABLE_VIRTUAL_STACKhandling whenRunMode::InterpModeis active, executing directly on the physical stack. - Add explanatory comments and performance rationale for interpreter-mode behavior.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| TLCtxInUse = true; | ||
| evm::BaseInterpreter Interpreter(*TLCtx); | ||
| TLCtx->allocTopFrame(&Msg); | ||
| Interpreter.interpret(); | ||
| Result = std::move(const_cast<evmc::Result &>(TLCtx->getExeResult())); | ||
| TLCtxInUse = false; |
There was a problem hiding this comment.
TLCtxInUse is not exception-safe: if allocTopFrame() or interpret() throws, the flag will remain true for the rest of the thread lifetime. That can permanently disable reuse (or change reentrancy behavior) and makes later calls take the nested path unexpectedly. Consider using an RAII guard (scope-exit) or try { ... } catch (...) { TLCtxInUse = false; throw; } to ensure the flag is reset on all exit paths.
| TLCtxInUse = true; | |
| evm::BaseInterpreter Interpreter(*TLCtx); | |
| TLCtx->allocTopFrame(&Msg); | |
| Interpreter.interpret(); | |
| Result = std::move(const_cast<evmc::Result &>(TLCtx->getExeResult())); | |
| TLCtxInUse = false; | |
| struct TLCtxGuard { | |
| bool &Flag; | |
| explicit TLCtxGuard(bool &F) : Flag(F) { Flag = true; } | |
| ~TLCtxGuard() { Flag = false; } | |
| } Guard(TLCtxInUse); | |
| evm::BaseInterpreter Interpreter(*TLCtx); | |
| TLCtx->allocTopFrame(&Msg); | |
| Interpreter.interpret(); | |
| Result = std::move(const_cast<evmc::Result &>(TLCtx->getExeResult())); |
| static thread_local evm::InterpreterExecContext *TLCtx = nullptr; | ||
| static thread_local bool TLCtxInUse = false; | ||
|
|
||
| if (!TLCtxInUse) { | ||
| // Top-level call: reuse the cached context | ||
| if (!TLCtx) { | ||
| TLCtx = new evm::InterpreterExecContext(&Inst); | ||
| } else { | ||
| TLCtx->resetForNewCall(&Inst); | ||
| } |
There was a problem hiding this comment.
The cached InterpreterExecContext is allocated with new into a thread_local raw pointer and is never freed. Even though it’s “only” once per thread, this is still an intentional leak and can matter for short-lived worker threads / fuzzing. Prefer static thread_local std::unique_ptr<evm::InterpreterExecContext> (or std::optional if default-constructible) so the context is reclaimed on thread exit and ownership is explicit.
| // Interpreter mode does not need a virtual stack: it manages call depth | ||
| // via InterpreterExecContext::FrameStack and never emits native code that | ||
| // could overflow the physical stack in an unbounded way. Skipping the | ||
| // virtual-stack allocation/mprotect/setjmp round-trip on every call | ||
| // eliminates ~50 % of the per-execution overhead measured on ERC-20 | ||
| // transfers. |
There was a problem hiding this comment.
The comment says interpreter mode “manages call depth via InterpreterExecContext::FrameStack”, but CALL/CREATE in the interpreter go through Host->call(NewMsg) (see CallHandler::doExecute()), i.e. they re-enter execution rather than pushing additional frames onto FrameStack. Since FrameStack is currently only used for the single top frame (allocTopFrame()/freeBackFrame()), this rationale seems inaccurate/misleading—please reword to reflect the actual depth management mechanism (evmc_message.depth / host recursion).
| // Interpreter mode does not need a virtual stack: it manages call depth | |
| // via InterpreterExecContext::FrameStack and never emits native code that | |
| // could overflow the physical stack in an unbounded way. Skipping the | |
| // virtual-stack allocation/mprotect/setjmp round-trip on every call | |
| // eliminates ~50 % of the per-execution overhead measured on ERC-20 | |
| // transfers. | |
| // Interpreter mode does not need a virtual stack: CALL/CREATE re-enter | |
| // execution via the host with an incremented evmc_message.depth rather | |
| // than pushing additional frames onto InterpreterExecContext::FrameStack, | |
| // so call depth is bounded by the EVM depth limit rather than unbounded | |
| // native recursion. Skipping the virtual-stack allocation/mprotect/setjmp | |
| // round-trip on every call eliminates ~50 % of the per-execution overhead | |
| // measured on ERC-20 transfers. |
…interpreter mode
Interpreter mode manages call depth via InterpreterExecContext::FrameStack and never emits native code, so it does not need the virtual stack's mprotect/setjmp/longjmp round-trip that was designed for JIT stack guard pages. Skip it entirely when Mode == InterpMode.
Additionally, cache the InterpreterExecContext in a thread-local for top-level calls (depth == 0) to avoid re-allocating the ~33 KB EVMFrame on every callEVMInInterpMode invocation. Nested calls (CALL/CREATE re-entering via Host->call()) still create a fresh context for safety.
1. Does this PR affect any open issues?(Y/N) and add issue references (e.g. "fix #123", "re #123".):
2. What is the scope of this PR (e.g. component or file name):
3. Provide a description of the PR(e.g. more details, effects, motivations or doc link):
4. Are there any breaking changes?(Y/N) and describe the breaking changes(e.g. more details, motivations or doc link):
5. Are there test cases for these changes?(Y/N) select and add more details, references or doc links:
6. Release note