perf(evm): add runtime known-u64 DIV fast path #429
ZR74 wants to merge 10 commits into DTVMStack:main from
Conversation
…ltipass JIT

Optimize dMIR generation when PUSH operands are u64 constants (PUSH1-PUSH8), which account for ~90% of all PUSH operations in typical EVM contracts.

- Add isConstU64() tracking to the Operand class for compile-time width detection
- Implement constant folding for all binary ops (ADD, SUB, MUL, AND, OR, XOR, EQ, LT, GT, SLT, SGT, ISZERO, NOT, SHL, SHR, SAR) when both operands are constants, reducing them to zero dMIR instructions
- Add u64 fast paths for AND/OR/XOR: skip the upper 3 limb operations using identity/annihilation properties (50-87% x86 reduction)
- Add u64 fast paths for MUL: 4x1 schoolbook (~75% savings) and 1x1 (~95% savings) specializations, dramatically reducing register pressure
- Add u64 fast paths for EQ/LT/GT: upper-limb OR-fold pattern (~50% savings)
- Add u64 fast paths for ADD/SUB: shared zero constant for upper limbs, preserving carry-chain correctness (~17-33% savings)

Benchmark results (evmone-bench, multipass mode):
- sha1_shifts: -14.1% (faster)
- weierstrudel/15: -2.7% (faster)
- blake2b_shifts: -2.3% (faster)
- snailtracer: -1.9% (faster)
- No regressions on non-qualifying workloads

Made-with: Cursor
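The AND/OR "identity/annihilation" fast paths above rest on simple limb-level algebra: with a u64 constant, the upper three limbs of the mask are all zero, so `x & 0 == 0` and `x | 0 == x`. A minimal sketch of those identities on a 4-limb value (the helper names `and_u64`/`or_u64` are illustrative, not from the PR):

```cpp
#include <array>
#include <cstdint>

using U256 = std::array<uint64_t, 4>; // little-endian limbs: [0] is least significant

// AND with a u64 constant: upper limbs are annihilated (x & 0 == 0),
// so a single 64-bit AND replaces four limb operations.
U256 and_u64(const U256 &x, uint64_t c) {
    return {x[0] & c, 0, 0, 0};
}

// OR with a u64 constant: upper limbs pass through unchanged (x | 0 == x),
// so the upper three limb operations can be skipped entirely.
U256 or_u64(const U256 &x, uint64_t c) {
    return {x[0] | c, x[1], x[2], x[3]};
}
```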
…fast paths

Replace expensive runtime calls for EVM DIV/SDIV/MOD/SMOD/ADDMOD/MULMOD with compile-time and inline alternatives when operand properties allow:

- Constant folding: evaluate at compile time when all operands are constants
- Power-of-2 divisor: DIV(x, 2^n) -> SHR(n, x), MOD(x, 2^n) -> AND(x, 2^n - 1)
- New evm_udiv128_by64/evm_urem128_by64 MIR intrinsics with x86 DIV64r lowering
- u64 divisor: cascading 4x DIV64r for inline u256/u64 division (no runtime call)
- u64 dividend: OR-fold + select for fast quotient/remainder when dividend < 2^64
- SDIV/SMOD with a u64 constant divisor: negate-divide-restore sign handling

Benchmarks show -3.7% snailtracer, -9.7% sha1_shifts, and -2% to -5% swap_math. All 1799 EVM state tests pass.

Made-with: Cursor
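The power-of-2 rewrite in the commit message is the standard strength reduction: for d = 2^n, division is a right shift by n and modulo is a mask with d - 1. A minimal single-limb sketch (helper names are illustrative; the real rewrite operates on u256 operands):

```cpp
#include <cassert>
#include <cstdint>

// True iff d is a nonzero power of two: exactly one bit set.
bool is_pow2(uint64_t d) { return d != 0 && (d & (d - 1)) == 0; }

// DIV(x, 2^n) -> SHR(x, n)
uint64_t div_pow2(uint64_t x, uint64_t d) {
    assert(is_pow2(d));
    unsigned n = 0;
    while ((d >> n) > 1) ++n; // n = log2(d)
    return x >> n;            // equivalent to x / d
}

// MOD(x, 2^n) -> AND(x, 2^n - 1)
uint64_t mod_pow2(uint64_t x, uint64_t d) {
    assert(is_pow2(d));
    return x & (d - 1);       // equivalent to x % d
}
```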
Co-authored-by: Copilot <[email protected]>
Pull request overview
This PR adds a specialized codegen path for EVM 256-bit division/modulo when the divisor is known to fit in u64, by introducing new MIR ops that map directly to x86-64’s DIV instruction and wiring these ops through MIR visiting and x86 lowering. It also extends the EVM MIR builder with additional constant folding and several u64-oriented fast paths around arithmetic/bitwise/compare/shift operations to reduce runtime helper calls and unnecessary multi-limb work.
Changes:
- Add new MIR opcodes/instructions (evm_udiv128_by64, evm_urem128_by64) and route them through the MIR visitor infrastructure and CGIR lowering dispatch.
- Implement x86 lowering for the new 128/64 division + remainder extraction using DIV64r (RDX:RAX semantics).
- Enhance EVMMirBuilder with constant folding and u64 fast paths, including an inline cascading 128/64 division algorithm for DIV/MOD when the divisor is a constant u64.
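The "inline cascading 128/64 division" in the last bullet is classic long division over limbs: each step divides a 128-bit (remainder:limb) pair by the 64-bit divisor, and the invariant remainder < divisor guarantees every step's quotient fits in 64 bits, which is exactly the precondition x86 DIV64r needs. A sketch under the assumption of GCC/Clang `__int128` support (function name is illustrative):

```cpp
#include <array>
#include <cstdint>

using U256 = std::array<uint64_t, 4>; // little-endian limbs

// Divide a u256 by a nonzero u64 via four cascaded 128/64 divisions,
// most significant limb first. Each step keeps remainder < d, so the
// 128/64 quotient always fits in 64 bits (no DIV64r overflow trap).
U256 udiv256_by_u64(const U256 &a, uint64_t d, uint64_t *rem_out) {
    U256 q{};
    uint64_t rem = 0;
    for (int i = 3; i >= 0; --i) {
        unsigned __int128 n = ((unsigned __int128)rem << 64) | a[i];
        q[i] = (uint64_t)(n / d);
        rem = (uint64_t)(n % d);
    }
    if (rem_out) *rem_out = rem;
    return q;
}
```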
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/compiler/target/x86/x86lowering.h | Declare new x86 lowering entrypoints and add remainder-register tracking map. |
| src/compiler/target/x86/x86lowering.cpp | Lower new MIR div/rem ops via DIV64r, returning quotient and caching remainder. |
| src/compiler/mir/pass/visitor.h | Add visitor dispatch + operand visitation for new MIR instruction kinds. |
| src/compiler/mir/opcodes.def | Define new opcode names for the MIR opcode enum generation. |
| src/compiler/mir/opcode.h | Extend OP_OTHER_EXPR_END to include the new ops in the opcode range. |
| src/compiler/mir/instructions.h | Define new MIR instruction classes for 128/64 udiv and remainder extraction. |
| src/compiler/mir/instructions.cpp | Add printing support for the new MIR instructions. |
| src/compiler/mir/instruction.h | Add new MInstruction kinds for the div/rem ops. |
| src/compiler/evm_frontend/evm_mir_compiler.h | Add const-u64 detection, intx<->U256 helpers, and declarations for new fast-path helpers. |
| src/compiler/evm_frontend/evm_mir_compiler.cpp | Implement cascading 128/64 div/mod, div/mod/sdiv/smod constant folding, and multiple u64 fast paths. |
| src/compiler/cgir/lowering.h | Dispatch lowering for the new MIR instruction kinds into target lowering. |
```cpp
CgRegister
X86CgLowering::lowerEvmUdiv128By64Expr(const EvmUdiv128By64Instruction &Inst) {
  const MInstruction *Hi = Inst.getOperand<0>();
  const MInstruction *Lo = Inst.getOperand<1>();
  const MInstruction *Divisor = Inst.getOperand<2>();

  CgRegister HiReg = lowerExpr(*Hi);
  CgRegister LoReg = lowerExpr(*Lo);
  CgRegister DivReg = lowerExpr(*Divisor);

  SmallVector<CgOperand, 2> CopyToRDX{
      CgOperand::createRegOperand(X86::RDX, true),
      CgOperand::createRegOperand(HiReg, false),
  };
  MF->createCgInstruction(*CurBB, TII.get(TargetOpcode::COPY), CopyToRDX);

  SmallVector<CgOperand, 2> CopyToRAX{
      CgOperand::createRegOperand(X86::RAX, true),
      CgOperand::createRegOperand(LoReg, false),
  };
  MF->createCgInstruction(*CurBB, TII.get(TargetOpcode::COPY), CopyToRAX);

  // DIV64r: RDX:RAX / r64 -> quotient in RAX, remainder in RDX
  SmallVector<CgOperand, 1> DIVOperands{
      CgOperand::createRegOperand(DivReg, false),
  };
  MF->createCgInstruction(*CurBB, TII.get(X86::DIV64r), DIVOperands);
```
DIV64r will raise a hardware #DE (SIGFPE) if the divisor is 0 or if the 128-bit dividend in RDX:RAX produces a quotient that doesn’t fit in 64 bits (i.e., Hi >= Divisor). This lowering assumes those preconditions but doesn’t enforce them; if MIR construction ever violates them, the compiled code can crash the process instead of following EVM semantics (DIV/MOD by 0 => 0). Consider adding a debug assertion / verifier rule that Divisor != 0 and Hi < Divisor for EvmUdiv128By64Instruction, or alternatively lower to a safe slow-path when the preconditions aren’t provably satisfied.
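The precondition the reviewer describes can be made concrete with a checked software model of the DIV64r contract: the quotient of (hi:lo) / d fits in 64 bits exactly when hi < d, and d must be nonzero. A sketch under the assumption of GCC/Clang `__int128` support (helper name is illustrative, not from the PR):

```cpp
#include <cassert>
#include <cstdint>

// Checked model of the x86 DIV64r contract: computes (hi:lo) / d.
// The hardware raises #DE when d == 0 or when the quotient overflows
// 64 bits, which happens exactly when hi >= d; the assertions make
// the MIR-level preconditions explicit.
uint64_t udiv128_by_64(uint64_t hi, uint64_t lo, uint64_t d, uint64_t *rem) {
    assert(d != 0 && "DIV64r would raise #DE on divisor 0");
    assert(hi < d && "quotient would not fit in 64 bits (#DE)");
    unsigned __int128 n = ((unsigned __int128)hi << 64) | lo;
    if (rem) *rem = (uint64_t)(n % d);
    return (uint64_t)(n / d);
}
```

A verifier rule for EvmUdiv128By64Instruction could assert the same two conditions whenever both operands are known constants.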
```cpp
typename EVMMirBuilder::Operand EVMMirBuilder::handleDiv(Operand DividendOp,
                                                         Operand DivisorOp) {
  if (DividendOp.isConstant() && DivisorOp.isConstant()) {
    intx::uint256 D = u256ValueToIntx(DivisorOp.getConstValue());
    if (D == 0)
      return Operand(U256Value{0, 0, 0, 0});
    intx::uint256 N = u256ValueToIntx(DividendOp.getConstValue());
    return Operand(intxToU256Value(N / D));
  }

  // DIV(x, 2^n) -> SHR(x, n)
  if (DivisorOp.isConstant()) {
    intx::uint256 D = u256ValueToIntx(DivisorOp.getConstValue());
    if (D != 0 && (D & (D - 1)) == 0) {
      unsigned ShiftAmt = 0;
      intx::uint256 Tmp = D;
      while (Tmp > 1) {
        Tmp >>= 1;
        ++ShiftAmt;
      }
      Operand ShiftOp(U256Value{ShiftAmt, 0, 0, 0});
      return handleShift<BinaryOperator::BO_SHR_U>(ShiftOp, DividendOp);
    }
  }

  // u64 divisor: inline cascading 128/64 division
  if (DivisorOp.isConstU64()) {
    uint64_t D = DivisorOp.getConstValue()[0];
    if (D != 0) {
      if (!DividendOp.isConstant()) {
        U256Inst A = extractU256Operand(DividendOp);
        MType *I64Type = &Ctx.I64Type;
        MInstruction *Zero = createIntConstInstruction(I64Type, 0);

        MInstruction *UpperAny = createInstruction<BinaryInstruction>(
            false, OP_or, I64Type, A[1],
            createInstruction<BinaryInstruction>(false, OP_or, I64Type, A[2],
                                                 A[3]));
        MInstruction *HasUpper = createInstruction<CmpInstruction>(
            false, CmpInstruction::ICMP_NE, I64Type, UpperAny, Zero);

        U256Var ResultVars = {};
        for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I) {
          ResultVars[I] = CurFunc->createVariable(I64Type);
        }

        auto storeResult = [&](const U256Inst &Values) {
          for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I) {
            createInstruction<DassignInstruction>(
                true, &(Ctx.VoidType), Values[I], ResultVars[I]->getVarIdx());
          }
        };

        auto loadResult = [&]() -> U256Inst {
          U256Inst Values = {};
          for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I) {
            Values[I] = loadVariable(ResultVars[I]);
          }
          return Values;
        };

        MBasicBlock *KnownU64BB = createBasicBlock();
        MBasicBlock *SlowBB = createBasicBlock();
        MBasicBlock *AfterBB = createBasicBlock();
        createInstruction<BrIfInstruction>(true, Ctx, HasUpper, SlowBB,
                                           KnownU64BB);
        addSuccessor(SlowBB);
        addSuccessor(KnownU64BB);

        setInsertBlock(KnownU64BB);
        MInstruction *DivConst = createIntConstInstruction(I64Type, D);
        MInstruction *Quotient = createInstruction<BinaryInstruction>(
            false, OP_udiv, I64Type, A[0], DivConst);
        U256Inst FastResult = {Quotient, Zero, Zero, Zero};
        storeResult(FastResult);
        createInstruction<BrInstruction>(true, Ctx, AfterBB);
        addSuccessor(AfterBB);

        setInsertBlock(SlowBB);
        U256Inst SlowResult =
            extractU256Operand(handleDivU64Divisor(DividendOp, D));
        storeResult(SlowResult);
        createInstruction<BrInstruction>(true, Ctx, AfterBB);
        addSuccessor(AfterBB);

        setInsertBlock(AfterBB);
        return Operand(loadResult(), EVMType::UINT256);
      }
      return handleDivU64Divisor(DividendOp, D);
    }
  }

  // u64 dividend: OR-fold + select
  if (DividendOp.isConstU64()) {
    uint64_t A = DividendOp.getConstValue()[0];
    return handleDivU64Dividend(A, DivisorOp);
  }
```
This PR introduces multiple new constant-folding and u64 fast paths for DIV/MOD/SDIV/SMOD (including new MIR ops evm_udiv128_by64/evm_urem128_by64 and new control-flow for the “known-u64 dividend” branch). There are no accompanying tests in this change set to validate correctness for edge cases (e.g., divisor=0, power-of-two divisor rewrite, dividends with high limbs set, signed negative dividends). Please add targeted tests (unit or existing EVM fixture-based) that exercise these new fast paths and compare results against the runtime slow path.
⚡ Performance Regression Check Results

✅ Performance Check Passed (interpreter), threshold 25%
Summary: 194 benchmarks, 1 regression

✅ Performance Check Passed (multipass), threshold 25%
Summary: 194 benchmarks, 0 regressions
conflict
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
```cpp
// If divisor has any upper limb set, b > a, so DIV = 0
MInstruction *Upper = createInstruction<BinaryInstruction>(
    false, OP_or, I64Type, B[1],
    createInstruction<BinaryInstruction>(false, OP_or, I64Type, B[2], B[3]));
MInstruction *HasUpper = createInstruction<CmpInstruction>(
    false, CmpInstruction::ICMP_NE, &Ctx.I64Type, Upper, Zero);

MInstruction *A0 = createIntConstInstruction(I64Type, Dividend);
MInstruction *Q64 =
    createInstruction<BinaryInstruction>(false, OP_udiv, I64Type, A0, B[0]);
MInstruction *DivResult =
    createInstruction<SelectInstruction>(false, I64Type, HasUpper, Zero, Q64);
```
handleDivU64Dividend() can generate an OP_udiv by B[0] without guarding against the runtime case where the divisor is 0 (and all upper limbs are 0). On x86 lowering this becomes DIV64r, which traps on division-by-zero, but EVM DIV(x,0) must evaluate to 0. Add an explicit B == 0 check (e.g., OR-fold all limbs) and select 0 when the divisor is zero before emitting OP_udiv.
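The guard the reviewer asks for can be sketched in plain C++: fold the upper divisor limbs, then check the low limb for zero before any hardware division runs, since EVM defines DIV(x, 0) = 0 rather than a trap. The helper name below is illustrative, not from the PR:

```cpp
#include <cstdint>

// EVM semantics for a u64 dividend a divided by a u256 divisor b
// (little-endian limbs): DIV(x, 0) == 0, never a hardware trap.
uint64_t evm_div_u64_dividend(uint64_t a, const uint64_t b[4]) {
    uint64_t upper = b[1] | b[2] | b[3];
    if (upper != 0) return 0; // divisor > dividend (which fits in u64)
    if (b[0] == 0) return 0;  // explicit zero-divisor guard: EVM DIV by 0 is 0
    return a / b[0];          // safe: divisor proven nonzero
}
```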
```cpp
MInstruction *Upper = createInstruction<BinaryInstruction>(
    false, OP_or, I64Type, B[1],
    createInstruction<BinaryInstruction>(false, OP_or, I64Type, B[2], B[3]));
MInstruction *HasUpper = createInstruction<CmpInstruction>(
    false, CmpInstruction::ICMP_NE, &Ctx.I64Type, Upper, Zero);

MInstruction *A0 = createIntConstInstruction(I64Type, Dividend);
MInstruction *R64 =
    createInstruction<BinaryInstruction>(false, OP_urem, I64Type, A0, B[0]);
MInstruction *ModResult =
    createInstruction<SelectInstruction>(false, I64Type, HasUpper, A0, R64);
```
handleModU64Dividend() can emit OP_urem by B[0] without guarding against the runtime case where the divisor is 0 (and upper limbs are 0). On x86 this lowers to DIV64r-based remainder and will trap, while EVM MOD(x,0) must evaluate to 0. Add a B == 0 check and select 0 as the result when the divisor is zero before evaluating OP_urem.
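The MOD case differs from DIV in one branch: when the divisor's upper limbs are set, the divisor exceeds the u64 dividend and the remainder is the dividend itself. A sketch of the guarded semantics (helper name is illustrative, not from the PR):

```cpp
#include <cstdint>

// EVM semantics for a u64 dividend a modulo a u256 divisor b
// (little-endian limbs): MOD(x, 0) == 0, never a hardware trap.
uint64_t evm_mod_u64_dividend(uint64_t a, const uint64_t b[4]) {
    uint64_t upper = b[1] | b[2] | b[3];
    if (upper != 0) return a; // divisor > dividend, remainder is the dividend
    if (b[0] == 0) return 0;  // explicit zero-divisor guard: EVM MOD by 0 is 0
    return a % b[0];          // safe: divisor proven nonzero
}
```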
```cpp
OPCODE(evm_udiv128_by64) // RDX:RAX / r64 -> quotient (RAX)
OPCODE(evm_urem128_by64) // extract remainder (RDX) from evm_udiv128_by64
```
The new MIR opcode comments for evm_udiv128_by64 / evm_urem128_by64 describe x86 register behavior (RDX:RAX / RAX,RDX). Since MIR is target-independent, this is misleading and may hide important invariants (e.g., x86 DIV traps if high limb >= divisor). Consider rewording these comments to describe the abstract operation (128/64 udiv and extracting remainder) and, if needed, document x86-specific lowering constraints in the x86 backend instead.
Suggested change:

```cpp
OPCODE(evm_udiv128_by64) // unsigned 128-bit (hi:lo) divided by 64-bit divisor -> 64-bit quotient
OPCODE(evm_urem128_by64) // unsigned remainder from the same 128/64 division as evm_udiv128_by64
```
```cpp
  uint64_t D = DivisorOp.getConstValue()[0];
  if (D != 0) {
    if (!DividendOp.isConstant()) {
```
Line 1506 has the check `if (DividendOp.isConstant() && DivisorOp.isConstant())`, which returns early when both operands are constant; since the divisor is constant on this path, the condition `!DividendOp.isConstant()` will always be true here.
1. Does this PR affect any open issues?(Y/N) and add issue references (e.g. "fix #123", "re #123".):
2. What is the scope of this PR (e.g. component or file name):
3. Provide a description of the PR(e.g. more details, effects, motivations or doc link):
4. Are there any breaking changes?(Y/N) and describe the breaking changes(e.g. more details, motivations or doc link):
5. Are there test cases for these changes?(Y/N) select and add more details, references or doc links:
6. Release note