
perf(evm): add runtime known-u64 DIV fast path#429

Open
ZR74 wants to merge 10 commits into DTVMStack:main from ZR74:perf/known-u64-div

Conversation


@ZR74 ZR74 commented Mar 23, 2026

1. Does this PR affect any open issues?(Y/N) and add issue references (e.g. "fix #123", "re #123".):

  • N
  • Y

2. What is the scope of this PR (e.g. component or file name):

3. Provide a description of the PR(e.g. more details, effects, motivations or doc link):

  • Affects user behaviors
  • Contains CI/CD configuration changes
  • Contains documentation changes
  • Contains experimental features
  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Other

4. Are there any breaking changes?(Y/N) and describe the breaking changes(e.g. more details, motivations or doc link):

  • N
  • Y

5. Are there test cases for these changes?(Y/N) select and add more details, references or doc links:

  • Unit test
  • Integration test
  • Benchmark (add benchmark stats below)
  • Manual test (add detailed scripts or steps below)
  • Other

6. Release note

None

starwarfan and others added 7 commits March 16, 2026 12:56
…ltipass JIT

Optimize dMIR generation when PUSH operands are u64 constants (PUSH1-PUSH8),
which account for ~90% of all PUSH operations in typical EVM contracts.

- Add isConstU64() tracking to Operand class for compile-time width detection
- Implement constant folding for all binary ops (ADD, SUB, MUL, AND, OR, XOR,
  EQ, LT, GT, SLT, SGT, ISZERO, NOT, SHL, SHR, SAR) when both operands are
  constants, reducing to zero dMIR instructions
- Add u64 fast paths for AND/OR/XOR: skip upper 3 limb operations using
  identity/annihilation properties (50-87% x86 reduction)
- Add u64 fast paths for MUL: 4x1 schoolbook (~75% savings) and 1x1 (~95%
  savings) specializations, dramatically reducing register pressure
- Add u64 fast paths for EQ/LT/GT: upper-limb OR-fold pattern (~50% savings)
- Add u64 fast paths for ADD/SUB: shared zero constant for upper limbs,
  preserving carry chain correctness (~17-33% savings)
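
As an illustration of the 4x1 schoolbook MUL specialization listed above, here is a minimal sketch in plain C++ rather than dMIR. It assumes little-endian 64-bit limbs and the `unsigned __int128` extension of GCC/Clang; the names `U256` and `mulU256ByU64` are hypothetical and do not come from the PR.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Little-endian limbs: limb 0 holds the least significant 64 bits.
using U256 = std::array<uint64_t, 4>;

// Multiply a 256-bit value by a 64-bit value, wrapping modulo 2^256
// as EVM MUL does. One 64x64 -> 128 multiply per limb, four in total.
U256 mulU256ByU64(const U256 &a, uint64_t b) {
  U256 r{};
  uint64_t carry = 0;
  for (std::size_t i = 0; i < 4; ++i) {
    // a[i] * b + carry cannot overflow 128 bits.
    unsigned __int128 p = static_cast<unsigned __int128>(a[i]) * b + carry;
    r[i] = static_cast<uint64_t>(p);         // low half is the result limb
    carry = static_cast<uint64_t>(p >> 64);  // high half feeds the next limb
  }
  return r;  // the carry out of limb 3 is discarded (mod 2^256)
}
```

Because the second operand has a single limb, only one running carry has to stay live, which is consistent with the register-pressure reduction the commit message describes.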

Benchmark results (evmone-bench, multipass mode):
- sha1_shifts: -14.1% (faster)
- weierstrudel/15: -2.7% (faster)
- blake2b_shifts: -2.3% (faster)
- snailtracer: -1.9% (faster)
- No regressions on non-qualifying workloads

Made-with: Cursor
…fast paths

Replace expensive runtime calls for EVM DIV/SDIV/MOD/SMOD/ADDMOD/MULMOD with
compile-time and inline alternatives when operand properties allow:

- Constant folding: evaluate at compile time when all operands are constants
- Power-of-2 divisor: DIV(x,2^n)->SHR(n,x), MOD(x,2^n)->AND(x,2^n-1)
- New evm_udiv128_by64/evm_urem128_by64 MIR intrinsics with x86 DIV64r lowering
- u64 divisor: cascading 4x DIV64r for inline u256/u64 division (no runtime call)
- u64 dividend: OR-fold + select for fast quotient/remainder when dividend < 2^64
- SDIV/SMOD with u64 const divisor: negate-divide-restore sign handling
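
The cascading u64-divisor step above can be sketched in plain C++. This is only an illustration under stated assumptions: little-endian 64-bit limbs, the `unsigned __int128` extension of GCC/Clang, and hypothetical names (`U256`, `DivMod`, `divU256ByU64`) that are not taken from the PR. Each loop iteration stands for one 128/64 division of the kind the PR lowers to DIV64r.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Little-endian limbs: limb 0 holds the least significant 64 bits.
using U256 = std::array<uint64_t, 4>;

struct DivMod {
  U256 quotient;
  uint64_t remainder;
};

// Divide a 256-bit value by a non-zero 64-bit divisor, limb by limb from
// the most significant end. The running remainder is always < d, so every
// 128/64 step produces a quotient that fits in 64 bits.
DivMod divU256ByU64(const U256 &a, uint64_t d) {
  assert(d != 0 && "EVM DIV by zero (result 0) is handled by the caller");
  DivMod out{};
  uint64_t rem = 0;
  for (int i = 3; i >= 0; --i) {
    unsigned __int128 n = (static_cast<unsigned __int128>(rem) << 64) | a[i];
    out.quotient[i] = static_cast<uint64_t>(n / d);
    rem = static_cast<uint64_t>(n % d);
  }
  out.remainder = rem;
  return out;
}
```

The invariant `rem < d` is what makes a hardware 128/64 divide safe at each step: the high half of the dividend is always smaller than the divisor.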

Benchmarks show -3.7% on snailtracer, -9.7% on sha1_shifts, and -2% to -5% on swap_math.
All 1799 EVM state tests pass.

Made-with: Cursor
Copilot AI review requested due to automatic review settings March 23, 2026 11:10

Copilot AI left a comment


Pull request overview

This PR adds a specialized codegen path for EVM 256-bit division/modulo when the divisor is known to fit in u64, by introducing new MIR ops that map directly to x86-64’s DIV instruction and wiring these ops through MIR visiting and x86 lowering. It also extends the EVM MIR builder with additional constant folding and several u64-oriented fast paths around arithmetic/bitwise/compare/shift operations to reduce runtime helper calls and unnecessary multi-limb work.

Changes:

  • Add new MIR opcodes/instructions (evm_udiv128_by64, evm_urem128_by64) and route them through MIR visitor infrastructure and CGIR lowering dispatch.
  • Implement x86 lowering for the new 128/64 division + remainder extraction using DIV64r (RDX:RAX semantics).
  • Enhance EVMMirBuilder with constant folding and u64 fast paths, including an inline cascading 128/64 division algorithm for DIV/MOD when the divisor is a constant u64.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/compiler/target/x86/x86lowering.h | Declare new x86 lowering entrypoints and add remainder-register tracking map. |
| src/compiler/target/x86/x86lowering.cpp | Lower new MIR div/rem ops via DIV64r, returning quotient and caching remainder. |
| src/compiler/mir/pass/visitor.h | Add visitor dispatch + operand visitation for new MIR instruction kinds. |
| src/compiler/mir/opcodes.def | Define new opcode names for the MIR opcode enum generation. |
| src/compiler/mir/opcode.h | Extend OP_OTHER_EXPR_END to include the new ops in the opcode range. |
| src/compiler/mir/instructions.h | Define new MIR instruction classes for 128/64 udiv and remainder extraction. |
| src/compiler/mir/instructions.cpp | Add printing support for the new MIR instructions. |
| src/compiler/mir/instruction.h | Add new MInstruction kinds for the div/rem ops. |
| src/compiler/evm_frontend/evm_mir_compiler.h | Add const-u64 detection, intx<->U256 helpers, and declarations for new fast-path helpers. |
| src/compiler/evm_frontend/evm_mir_compiler.cpp | Implement cascading 128/64 div/mod, div/mod/sdiv/smod constant folding, and multiple u64 fast paths. |
| src/compiler/cgir/lowering.h | Dispatch lowering for the new MIR instruction kinds into target lowering. |


Comment on lines +1209 to +1236
CgRegister
X86CgLowering::lowerEvmUdiv128By64Expr(const EvmUdiv128By64Instruction &Inst) {
  const MInstruction *Hi = Inst.getOperand<0>();
  const MInstruction *Lo = Inst.getOperand<1>();
  const MInstruction *Divisor = Inst.getOperand<2>();

  CgRegister HiReg = lowerExpr(*Hi);
  CgRegister LoReg = lowerExpr(*Lo);
  CgRegister DivReg = lowerExpr(*Divisor);

  SmallVector<CgOperand, 2> CopyToRDX{
      CgOperand::createRegOperand(X86::RDX, true),
      CgOperand::createRegOperand(HiReg, false),
  };
  MF->createCgInstruction(*CurBB, TII.get(TargetOpcode::COPY), CopyToRDX);

  SmallVector<CgOperand, 2> CopyToRAX{
      CgOperand::createRegOperand(X86::RAX, true),
      CgOperand::createRegOperand(LoReg, false),
  };
  MF->createCgInstruction(*CurBB, TII.get(TargetOpcode::COPY), CopyToRAX);

  // DIV64r: RDX:RAX / r64 -> quotient in RAX, remainder in RDX
  SmallVector<CgOperand, 1> DIVOperands{
      CgOperand::createRegOperand(DivReg, false),
  };
  MF->createCgInstruction(*CurBB, TII.get(X86::DIV64r), DIVOperands);

Copilot AI Mar 23, 2026

DIV64r will raise a hardware #DE (SIGFPE) if the divisor is 0 or if the 128-bit dividend in RDX:RAX produces a quotient that doesn’t fit in 64 bits (i.e., Hi >= Divisor). This lowering assumes those preconditions but doesn’t enforce them; if MIR construction ever violates them, the compiled code can crash the process instead of following EVM semantics (DIV/MOD by 0 => 0). Consider adding a debug assertion / verifier rule that Divisor != 0 and Hi < Divisor for EvmUdiv128By64Instruction, or alternatively lower to a safe slow-path when the preconditions aren’t provably satisfied.
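
To make the two preconditions concrete, here is a small C++ model of a checked 128/64 division. It is a sketch only: `checkedUdiv128By64` and `QuotRem` are hypothetical names, not part of the PR, and the arithmetic relies on the `unsigned __int128` extension of GCC/Clang instead of the actual DIV64r lowering.

```cpp
#include <cstdint>
#include <optional>

struct QuotRem {
  uint64_t quotient;
  uint64_t remainder;
};

// Models hi:lo / divisor. Returns std::nullopt in exactly the cases where
// the x86 DIV instruction would raise #DE: a zero divisor, or a quotient
// that does not fit in 64 bits (which happens if and only if hi >= divisor).
std::optional<QuotRem> checkedUdiv128By64(uint64_t hi, uint64_t lo,
                                          uint64_t divisor) {
  if (divisor == 0 || hi >= divisor)
    return std::nullopt;
  unsigned __int128 n = (static_cast<unsigned __int128>(hi) << 64) | lo;
  return QuotRem{static_cast<uint64_t>(n / divisor),
                 static_cast<uint64_t>(n % divisor)};
}
```

A debug assertion or verifier rule of the kind suggested above would check the same condition, `Divisor != 0 && Hi < Divisor`, before the instruction is emitted.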

Comment on lines 1504 to +1601
typename EVMMirBuilder::Operand EVMMirBuilder::handleDiv(Operand DividendOp,
                                                         Operand DivisorOp) {
  if (DividendOp.isConstant() && DivisorOp.isConstant()) {
    intx::uint256 D = u256ValueToIntx(DivisorOp.getConstValue());
    if (D == 0)
      return Operand(U256Value{0, 0, 0, 0});
    intx::uint256 N = u256ValueToIntx(DividendOp.getConstValue());
    return Operand(intxToU256Value(N / D));
  }

  // DIV(x, 2^n) -> SHR(x, n)
  if (DivisorOp.isConstant()) {
    intx::uint256 D = u256ValueToIntx(DivisorOp.getConstValue());
    if (D != 0 && (D & (D - 1)) == 0) {
      unsigned ShiftAmt = 0;
      intx::uint256 Tmp = D;
      while (Tmp > 1) {
        Tmp >>= 1;
        ++ShiftAmt;
      }
      Operand ShiftOp(U256Value{ShiftAmt, 0, 0, 0});
      return handleShift<BinaryOperator::BO_SHR_U>(ShiftOp, DividendOp);
    }
  }

  // u64 divisor: inline cascading 128/64 division
  if (DivisorOp.isConstU64()) {
    uint64_t D = DivisorOp.getConstValue()[0];
    if (D != 0) {
      if (!DividendOp.isConstant()) {
        U256Inst A = extractU256Operand(DividendOp);
        MType *I64Type = &Ctx.I64Type;
        MInstruction *Zero = createIntConstInstruction(I64Type, 0);

        MInstruction *UpperAny = createInstruction<BinaryInstruction>(
            false, OP_or, I64Type, A[1],
            createInstruction<BinaryInstruction>(false, OP_or, I64Type, A[2],
                                                 A[3]));
        MInstruction *HasUpper = createInstruction<CmpInstruction>(
            false, CmpInstruction::ICMP_NE, I64Type, UpperAny, Zero);

        U256Var ResultVars = {};
        for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I) {
          ResultVars[I] = CurFunc->createVariable(I64Type);
        }

        auto storeResult = [&](const U256Inst &Values) {
          for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I) {
            createInstruction<DassignInstruction>(
                true, &(Ctx.VoidType), Values[I], ResultVars[I]->getVarIdx());
          }
        };

        auto loadResult = [&]() -> U256Inst {
          U256Inst Values = {};
          for (size_t I = 0; I < EVM_ELEMENTS_COUNT; ++I) {
            Values[I] = loadVariable(ResultVars[I]);
          }
          return Values;
        };

        MBasicBlock *KnownU64BB = createBasicBlock();
        MBasicBlock *SlowBB = createBasicBlock();
        MBasicBlock *AfterBB = createBasicBlock();
        createInstruction<BrIfInstruction>(true, Ctx, HasUpper, SlowBB,
                                           KnownU64BB);
        addSuccessor(SlowBB);
        addSuccessor(KnownU64BB);

        setInsertBlock(KnownU64BB);
        MInstruction *DivConst = createIntConstInstruction(I64Type, D);
        MInstruction *Quotient = createInstruction<BinaryInstruction>(
            false, OP_udiv, I64Type, A[0], DivConst);
        U256Inst FastResult = {Quotient, Zero, Zero, Zero};
        storeResult(FastResult);
        createInstruction<BrInstruction>(true, Ctx, AfterBB);
        addSuccessor(AfterBB);

        setInsertBlock(SlowBB);
        U256Inst SlowResult =
            extractU256Operand(handleDivU64Divisor(DividendOp, D));
        storeResult(SlowResult);
        createInstruction<BrInstruction>(true, Ctx, AfterBB);
        addSuccessor(AfterBB);

        setInsertBlock(AfterBB);
        return Operand(loadResult(), EVMType::UINT256);
      }
      return handleDivU64Divisor(DividendOp, D);
    }
  }

  // u64 dividend: OR-fold + select
  if (DividendOp.isConstU64()) {
    uint64_t A = DividendOp.getConstValue()[0];
    return handleDivU64Dividend(A, DivisorOp);
  }

Copilot AI Mar 23, 2026

This PR introduces multiple new constant-folding and u64 fast paths for DIV/MOD/SDIV/SMOD (including new MIR ops evm_udiv128_by64/evm_urem128_by64 and new control-flow for the “known-u64 dividend” branch). There are no accompanying tests in this change set to validate correctness for edge cases (e.g., divisor=0, power-of-two divisor rewrite, dividends with high limbs set, signed negative dividends). Please add targeted tests (unit or existing EVM fixture-based) that exercise these new fast paths and compare results against the runtime slow path.
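
One of the edge cases named above, the power-of-two divisor rewrite, can be checked with a very small differential helper. The sketch below is restricted to 64-bit values for brevity (the PR operates on 256-bit operands), and all function names are hypothetical; a real test would run against the JIT output and the runtime slow path.

```cpp
#include <cstdint>

// Same power-of-two test the rewrite relies on: exactly one bit set.
bool isPowerOfTwo(uint64_t d) { return d != 0 && (d & (d - 1)) == 0; }

// Shift amount n such that d == 2^n, computed with the same
// shift-and-count loop as the handleDiv hunk. Only meaningful for
// power-of-two inputs.
unsigned log2Exact(uint64_t d) {
  unsigned shift = 0;
  while (d > 1) {
    d >>= 1;
    ++shift;
  }
  return shift;
}

// Returns true when DIV(x, 2^n) == SHR(n, x) and MOD(x, 2^n) == AND(x, 2^n - 1)
// agree with plain division. Divisors that are not powers of two are outside
// the rewrite, so they trivially pass.
bool powerOfTwoRewriteAgrees(uint64_t x, uint64_t d) {
  if (!isPowerOfTwo(d))
    return true;
  unsigned n = log2Exact(d);
  return (x >> n) == x / d && (x & (d - 1)) == x % d;
}
```

Sweeping `d` over every `1ULL << n` together with boundary dividends such as `0`, `d - 1`, `d`, and `~0ULL` covers the cases where an off-by-one in the shift amount would show up.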


github-actions bot commented Mar 23, 2026

⚡ Performance Regression Check Results

✅ Performance Check Passed (interpreter)

Performance Benchmark Results (threshold: 25%)

Benchmark Baseline (us) Current (us) Change Status
total/main/blake2b_huff/8415nulls 1.50 1.52 +1.2% PASS
total/main/blake2b_huff/empty 0.02 0.02 +1.8% PASS
total/main/blake2b_shifts/8415nulls 11.64 11.46 -1.6% PASS
total/main/sha1_divs/5311 5.10 5.07 -0.5% PASS
total/main/sha1_divs/empty 0.06 0.06 -0.9% PASS
total/main/sha1_shifts/5311 2.86 2.84 -0.8% PASS
total/main/sha1_shifts/empty 0.04 0.04 -0.3% PASS
total/main/snailtracer/benchmark 52.44 52.69 +0.5% PASS
total/main/structarray_alloc/nfts_rank 1.00 1.03 +2.9% PASS
total/main/swap_math/insufficient_liquidity 0.00 0.00 +1.9% PASS
total/main/swap_math/received 0.01 0.01 +1.7% PASS
total/main/swap_math/spent 0.00 0.00 +1.6% PASS
total/main/weierstrudel/1 0.29 0.29 +0.4% PASS
total/main/weierstrudel/15 3.15 3.16 +0.5% PASS
total/micro/JUMPDEST_n0/empty 2.12 2.12 -0.0% PASS
total/micro/jump_around/empty 0.09 0.10 +1.0% PASS
total/micro/loop_with_many_jumpdests/empty 24.90 31.62 +27.0% REGRESSED
total/micro/memory_grow_mload/by1 0.09 0.09 -0.8% PASS
total/micro/memory_grow_mload/by16 0.10 0.10 +2.5% PASS
total/micro/memory_grow_mload/by32 0.11 0.11 +0.5% PASS
total/micro/memory_grow_mload/nogrow 0.09 0.09 -3.0% PASS
total/micro/memory_grow_mstore/by1 0.09 0.10 +3.2% PASS
total/micro/memory_grow_mstore/by16 0.11 0.11 -1.3% PASS
total/micro/memory_grow_mstore/by32 0.13 0.12 -1.5% PASS
total/micro/memory_grow_mstore/nogrow 0.09 0.09 -0.1% PASS
total/micro/signextend/one 0.25 0.24 -6.0% PASS
total/micro/signextend/zero 0.26 0.24 -6.3% PASS
total/synth/ADD/b0 3.52 3.53 +0.3% PASS
total/synth/ADD/b1 1.99 2.00 +0.2% PASS
total/synth/ADDRESS/a0 4.82 4.82 +0.0% PASS
total/synth/ADDRESS/a1 5.29 5.28 -0.2% PASS
total/synth/AND/b0 1.64 1.64 +0.0% PASS
total/synth/AND/b1 1.69 1.70 +0.3% PASS
total/synth/BYTE/b0 6.03 6.03 +0.0% PASS
total/synth/BYTE/b1 4.73 4.73 -0.1% PASS
total/synth/CALLDATASIZE/a0 3.20 3.20 +0.0% PASS
total/synth/CALLDATASIZE/a1 4.05 4.04 -0.3% PASS
total/synth/CALLER/a0 4.82 4.83 +0.1% PASS
total/synth/CALLER/a1 5.39 5.38 -0.2% PASS
total/synth/CALLVALUE/a0 3.02 3.02 -0.0% PASS
total/synth/CALLVALUE/a1 3.37 3.36 -0.3% PASS
total/synth/CODESIZE/a0 3.43 3.42 -0.1% PASS
total/synth/CODESIZE/a1 3.77 3.77 -0.2% PASS
total/synth/DUP1/d0 1.06 0.82 -22.8% PASS
total/synth/DUP1/d1 1.23 1.23 +0.1% PASS
total/synth/DUP10/d0 1.07 0.90 -15.6% PASS
total/synth/DUP10/d1 1.15 1.16 +0.3% PASS
total/synth/DUP11/d0 0.91 0.83 -9.6% PASS
total/synth/DUP11/d1 1.15 1.16 +0.3% PASS
total/synth/DUP12/d0 1.07 0.83 -22.7% PASS
total/synth/DUP12/d1 0.92 1.16 +25.6% PASS
total/synth/DUP13/d0 1.07 1.07 -0.1% PASS
total/synth/DUP13/d1 1.15 1.16 +0.3% PASS
total/synth/DUP14/d0 1.07 0.90 -15.5% PASS
total/synth/DUP14/d1 1.15 1.16 +0.3% PASS
total/synth/DUP15/d0 1.07 1.07 -0.1% PASS
total/synth/DUP15/d1 1.15 1.16 +0.2% PASS
total/synth/DUP16/d0 1.07 0.90 -15.5% PASS
total/synth/DUP16/d1 1.15 1.16 +0.2% PASS
total/synth/DUP2/d0 1.07 0.90 -15.1% PASS
total/synth/DUP2/d1 0.92 1.16 +25.7% PASS
total/synth/DUP3/d0 1.07 0.90 -15.4% PASS
total/synth/DUP3/d1 1.15 1.16 +0.3% PASS
total/synth/DUP4/d0 1.07 1.07 +0.1% PASS
total/synth/DUP4/d1 1.15 0.92 -20.4% PASS
total/synth/DUP5/d0 0.90 0.91 +0.5% PASS
total/synth/DUP5/d1 0.91 0.92 +0.3% PASS
total/synth/DUP6/d0 1.07 0.83 -22.8% PASS
total/synth/DUP6/d1 1.15 1.16 +0.3% PASS
total/synth/DUP7/d0 0.91 1.07 +17.7% PASS
total/synth/DUP7/d1 1.15 1.15 -0.0% PASS
total/synth/DUP8/d0 1.07 0.83 -22.4% PASS
total/synth/DUP8/d1 1.15 1.16 +0.3% PASS
total/synth/DUP9/d0 1.07 0.90 -15.4% PASS
total/synth/DUP9/d1 1.15 1.16 +0.2% PASS
total/synth/EQ/b0 2.73 2.73 +0.0% PASS
total/synth/EQ/b1 1.39 1.39 +0.2% PASS
total/synth/GAS/a0 3.68 3.67 -0.4% PASS
total/synth/GAS/a1 3.71 3.68 -0.8% PASS
total/synth/GT/b0 2.60 2.61 +0.0% PASS
total/synth/GT/b1 1.39 1.39 +0.3% PASS
total/synth/ISZERO/u0 0.99 0.99 -0.0% PASS
total/synth/JUMPDEST/n0 1.63 2.05 +25.4% PASS
total/synth/LT/b0 2.67 2.67 +0.0% PASS
total/synth/LT/b1 1.39 1.39 +0.2% PASS
total/synth/MSIZE/a0 4.25 4.25 +0.1% PASS
total/synth/MSIZE/a1 4.66 4.66 -0.1% PASS
total/synth/MUL/b0 5.29 5.29 +0.0% PASS
total/synth/MUL/b1 5.29 5.29 +0.0% PASS
total/synth/NOT/u0 1.68 1.68 +0.2% PASS
total/synth/OR/b0 1.63 1.63 +0.1% PASS
total/synth/OR/b1 1.71 1.71 +0.0% PASS
total/synth/PC/a0 3.18 3.18 -0.0% PASS
total/synth/PC/a1 4.03 4.03 +0.0% PASS
total/synth/PUSH1/p0 0.83 0.83 +0.0% PASS
total/synth/PUSH1/p1 1.14 1.14 +0.1% PASS
total/synth/PUSH10/p0 1.04 0.83 -20.0% PASS
total/synth/PUSH10/p1 1.20 1.23 +2.6% PASS
total/synth/PUSH11/p0 1.07 0.85 -20.6% PASS
total/synth/PUSH11/p1 1.23 1.21 -1.7% PASS
total/synth/PUSH12/p0 1.07 0.85 -20.4% PASS
total/synth/PUSH12/p1 1.23 1.24 +0.3% PASS
total/synth/PUSH13/p0 1.07 0.85 -20.4% PASS
total/synth/PUSH13/p1 1.20 1.21 +0.9% PASS
total/synth/PUSH14/p0 0.94 0.94 +0.2% PASS
total/synth/PUSH14/p1 1.23 1.20 -2.9% PASS
total/synth/PUSH15/p0 0.96 1.07 +11.2% PASS
total/synth/PUSH15/p1 1.30 1.30 -0.0% PASS
total/synth/PUSH16/p0 1.07 0.85 -20.5% PASS
total/synth/PUSH16/p1 1.22 0.97 -20.3% PASS
total/synth/PUSH17/p0 1.07 0.84 -21.0% PASS
total/synth/PUSH17/p1 1.22 1.20 -1.6% PASS
total/synth/PUSH18/p0 1.07 0.85 -20.6% PASS
total/synth/PUSH18/p1 1.23 1.23 -0.1% PASS
total/synth/PUSH19/p0 1.06 0.86 -19.4% PASS
total/synth/PUSH19/p1 1.21 1.24 +2.0% PASS
total/synth/PUSH2/p0 1.07 1.07 +0.0% PASS
total/synth/PUSH2/p1 1.15 1.15 -0.4% PASS
total/synth/PUSH20/p0 1.06 1.07 +0.1% PASS
total/synth/PUSH20/p1 1.24 1.24 -0.0% PASS
total/synth/PUSH21/p0 1.07 0.85 -20.2% PASS
total/synth/PUSH21/p1 1.23 1.24 +0.5% PASS
total/synth/PUSH22/p0 1.01 1.07 +5.5% PASS
total/synth/PUSH22/p1 1.23 1.23 -0.1% PASS
total/synth/PUSH23/p0 1.00 1.07 +6.3% PASS
total/synth/PUSH23/p1 1.25 1.24 -0.5% PASS
total/synth/PUSH24/p0 1.07 1.07 -0.0% PASS
total/synth/PUSH24/p1 1.23 1.23 +0.2% PASS
total/synth/PUSH25/p0 0.84 0.85 +0.1% PASS
total/synth/PUSH25/p1 1.21 1.21 -0.0% PASS
total/synth/PUSH26/p0 1.07 1.07 +0.1% PASS
total/synth/PUSH26/p1 1.24 1.24 -0.3% PASS
total/synth/PUSH27/p0 1.07 1.07 +0.0% PASS
total/synth/PUSH27/p1 1.24 1.24 +0.3% PASS
total/synth/PUSH28/p0 1.07 1.07 +0.1% PASS
total/synth/PUSH28/p1 1.23 1.24 +0.4% PASS
total/synth/PUSH29/p0 1.07 0.85 -20.7% PASS
total/synth/PUSH29/p1 1.23 1.25 +1.9% PASS
total/synth/PUSH3/p0 0.83 1.07 +28.2% PASS
total/synth/PUSH3/p1 1.21 1.21 +0.5% PASS
total/synth/PUSH30/p0 1.08 0.98 -9.3% PASS
total/synth/PUSH30/p1 1.24 1.24 -0.3% PASS
total/synth/PUSH31/p0 1.07 1.07 +0.2% PASS
total/synth/PUSH31/p1 1.34 1.37 +2.5% PASS
total/synth/PUSH32/p0 0.85 1.07 +25.7% PASS
total/synth/PUSH32/p1 1.24 1.26 +1.1% PASS
total/synth/PUSH4/p0 1.07 0.84 -21.0% PASS
total/synth/PUSH4/p1 1.20 1.21 +0.3% PASS
total/synth/PUSH5/p0 1.03 1.07 +3.8% PASS
total/synth/PUSH5/p1 1.21 1.21 -0.2% PASS
total/synth/PUSH6/p0 1.07 1.07 -0.1% PASS
total/synth/PUSH6/p1 1.22 1.23 +0.7% PASS
total/synth/PUSH7/p0 0.86 0.85 -1.7% PASS
total/synth/PUSH7/p1 1.24 1.21 -1.8% PASS
total/synth/PUSH8/p0 1.07 0.85 -20.1% PASS
total/synth/PUSH8/p1 1.23 1.20 -2.5% PASS
total/synth/PUSH9/p0 1.07 1.07 -0.0% PASS
total/synth/PUSH9/p1 1.21 1.24 +2.1% PASS
total/synth/RETURNDATASIZE/a0 3.34 3.27 -2.4% PASS
total/synth/RETURNDATASIZE/a1 3.61 3.60 -0.2% PASS
total/synth/SAR/b0 3.76 3.76 -0.0% PASS
total/synth/SAR/b1 4.27 4.27 -0.0% PASS
total/synth/SGT/b0 2.59 2.59 +0.0% PASS
total/synth/SGT/b1 1.63 1.63 -0.0% PASS
total/synth/SHL/b0 3.05 3.05 +0.0% PASS
total/synth/SHL/b1 1.63 1.63 +0.2% PASS
total/synth/SHR/b0 3.10 3.10 +0.0% PASS
total/synth/SHR/b1 1.51 1.51 +0.1% PASS
total/synth/SIGNEXTEND/b0 3.11 3.37 +8.3% PASS
total/synth/SIGNEXTEND/b1 3.55 3.55 -0.0% PASS
total/synth/SLT/b0 2.58 2.58 -0.0% PASS
total/synth/SLT/b1 1.63 1.63 +0.0% PASS
total/synth/SUB/b0 1.94 1.94 +0.1% PASS
total/synth/SUB/b1 1.97 1.98 +0.3% PASS
total/synth/SWAP1/s0 1.49 1.49 -0.0% PASS
total/synth/SWAP10/s0 1.51 1.50 -0.3% PASS
total/synth/SWAP11/s0 1.50 1.50 -0.0% PASS
total/synth/SWAP12/s0 1.50 1.50 -0.0% PASS
total/synth/SWAP13/s0 1.51 1.51 -0.0% PASS
total/synth/SWAP14/s0 1.51 1.51 +0.0% PASS
total/synth/SWAP15/s0 1.51 1.51 -0.0% PASS
total/synth/SWAP16/s0 1.51 1.51 -0.1% PASS
total/synth/SWAP2/s0 1.49 1.49 -0.0% PASS
total/synth/SWAP3/s0 1.49 1.49 -0.0% PASS
total/synth/SWAP4/s0 1.49 1.49 -0.1% PASS
total/synth/SWAP5/s0 1.49 1.49 -0.1% PASS
total/synth/SWAP6/s0 1.49 1.49 +0.0% PASS
total/synth/SWAP7/s0 1.50 1.50 +0.1% PASS
total/synth/SWAP8/s0 1.50 1.50 -0.2% PASS
total/synth/SWAP9/s0 1.50 1.50 +0.1% PASS
total/synth/XOR/b0 1.55 1.54 -0.0% PASS
total/synth/XOR/b1 1.55 1.55 +0.2% PASS
total/synth/loop_v1 4.75 4.74 -0.3% PASS
total/synth/loop_v2 4.74 4.74 -0.2% PASS

Summary: 194 benchmarks, 1 regressions


✅ Performance Check Passed (multipass)

Performance Benchmark Results (threshold: 25%)

Benchmark Baseline (us) Current (us) Change Status
total/main/blake2b_huff/8415nulls 1.54 1.53 -0.8% PASS
total/main/blake2b_huff/empty 0.07 0.07 +0.4% PASS
total/main/blake2b_shifts/8415nulls 5.23 5.28 +1.1% PASS
total/main/sha1_divs/5311 1.89 1.88 -0.8% PASS
total/main/sha1_divs/empty 0.03 0.03 -0.9% PASS
total/main/sha1_shifts/5311 2.76 2.74 -0.8% PASS
total/main/sha1_shifts/empty 0.04 0.04 -0.2% PASS
total/main/snailtracer/benchmark 52.48 52.75 +0.5% PASS
total/main/structarray_alloc/nfts_rank 0.29 0.30 +1.8% PASS
total/main/swap_math/insufficient_liquidity 0.02 0.02 -0.3% PASS
total/main/swap_math/received 0.02 0.02 -0.7% PASS
total/main/swap_math/spent 0.02 0.02 +0.1% PASS
total/main/weierstrudel/1 0.36 0.36 +0.2% PASS
total/main/weierstrudel/15 3.23 3.23 +0.3% PASS
total/micro/JUMPDEST_n0/empty 0.13 0.13 -0.1% PASS
total/micro/jump_around/empty 0.62 0.62 +1.4% PASS
total/micro/loop_with_many_jumpdests/empty 1.96 1.95 -0.0% PASS
total/micro/memory_grow_mload/by1 0.18 0.18 -0.9% PASS
total/micro/memory_grow_mload/by16 0.19 0.19 +0.5% PASS
total/micro/memory_grow_mload/by32 0.21 0.21 +0.2% PASS
total/micro/memory_grow_mload/nogrow 0.18 0.18 -0.1% PASS
total/micro/memory_grow_mstore/by1 0.19 0.19 -1.3% PASS
total/micro/memory_grow_mstore/by16 0.21 0.20 -1.4% PASS
total/micro/memory_grow_mstore/by32 0.22 0.22 -0.3% PASS
total/micro/memory_grow_mstore/nogrow 0.18 0.18 +0.6% PASS
total/micro/signextend/one 0.35 0.35 +0.2% PASS
total/micro/signextend/zero 0.35 0.35 +0.3% PASS
total/synth/ADD/b0 0.01 0.01 +0.3% PASS
total/synth/ADD/b1 0.01 0.01 +0.1% PASS
total/synth/ADDRESS/a0 0.16 0.16 +0.0% PASS
total/synth/ADDRESS/a1 0.16 0.16 +0.0% PASS
total/synth/AND/b0 0.01 0.01 -0.1% PASS
total/synth/AND/b1 0.01 0.01 +0.1% PASS
total/synth/BYTE/b0 1.96 1.95 -0.2% PASS
total/synth/BYTE/b1 2.33 2.32 -0.1% PASS
total/synth/CALLDATASIZE/a0 0.08 0.08 -1.1% PASS
total/synth/CALLDATASIZE/a1 0.08 0.08 -0.7% PASS
total/synth/CALLER/a0 0.16 0.16 +0.0% PASS
total/synth/CALLER/a1 0.16 0.16 +0.1% PASS
total/synth/CALLVALUE/a0 0.27 0.27 -0.0% PASS
total/synth/CALLVALUE/a1 0.28 0.28 -0.0% PASS
total/synth/CODESIZE/a0 0.08 0.08 +0.0% PASS
total/synth/CODESIZE/a1 0.08 0.08 +0.0% PASS
total/synth/DUP1/d0 0.01 0.01 +0.1% PASS
total/synth/DUP1/d1 0.01 0.01 +0.0% PASS
total/synth/DUP10/d0 0.01 0.01 +0.1% PASS
total/synth/DUP10/d1 0.01 0.01 +0.2% PASS
total/synth/DUP11/d0 0.01 0.01 +0.1% PASS
total/synth/DUP11/d1 0.01 0.01 +0.0% PASS
total/synth/DUP12/d0 0.01 0.01 +0.1% PASS
total/synth/DUP12/d1 0.01 0.01 -0.0% PASS
total/synth/DUP13/d0 0.01 0.01 -0.1% PASS
total/synth/DUP13/d1 0.01 0.01 +0.4% PASS
total/synth/DUP14/d0 0.01 0.01 -0.0% PASS
total/synth/DUP14/d1 0.01 0.01 +0.1% PASS
total/synth/DUP15/d0 0.01 0.01 -0.0% PASS
total/synth/DUP15/d1 0.01 0.01 +0.1% PASS
total/synth/DUP16/d0 0.01 0.01 -0.0% PASS
total/synth/DUP16/d1 0.01 0.01 +0.1% PASS
total/synth/DUP2/d0 0.01 0.01 -0.0% PASS
total/synth/DUP2/d1 0.01 0.01 +0.1% PASS
total/synth/DUP3/d0 0.01 0.01 +0.1% PASS
total/synth/DUP3/d1 0.01 0.01 +0.1% PASS
total/synth/DUP4/d0 0.01 0.01 +0.1% PASS
total/synth/DUP4/d1 0.01 0.01 +0.1% PASS
total/synth/DUP5/d0 0.01 0.01 +0.1% PASS
total/synth/DUP5/d1 0.01 0.01 +0.1% PASS
total/synth/DUP6/d0 0.01 0.01 +0.1% PASS
total/synth/DUP6/d1 0.01 0.01 +0.1% PASS
total/synth/DUP7/d0 0.01 0.01 +0.0% PASS
total/synth/DUP7/d1 0.01 0.01 +0.0% PASS
total/synth/DUP8/d0 0.01 0.01 +0.0% PASS
total/synth/DUP8/d1 0.01 0.01 +0.0% PASS
total/synth/DUP9/d0 0.01 0.01 +0.1% PASS
total/synth/DUP9/d1 0.01 0.01 +0.3% PASS
total/synth/EQ/b0 0.01 0.01 +0.0% PASS
total/synth/EQ/b1 0.01 0.01 +0.1% PASS
total/synth/GAS/a0 0.80 0.80 +0.1% PASS
total/synth/GAS/a1 0.76 0.76 +0.0% PASS
total/synth/GT/b0 0.01 0.01 -0.0% PASS
total/synth/GT/b1 0.01 0.01 +0.0% PASS
total/synth/ISZERO/u0 0.01 0.01 +0.0% PASS
total/synth/JUMPDEST/n0 0.13 0.13 +0.1% PASS
total/synth/LT/b0 0.01 0.01 +0.0% PASS
total/synth/LT/b1 0.01 0.01 +0.1% PASS
total/synth/MSIZE/a0 0.01 0.01 +0.1% PASS
total/synth/MSIZE/a1 0.01 0.01 +0.2% PASS
total/synth/MUL/b0 0.01 0.01 +0.1% PASS
total/synth/MUL/b1 0.01 0.01 +0.0% PASS
total/synth/NOT/u0 0.01 0.01 +0.1% PASS
total/synth/OR/b0 0.01 0.01 -0.0% PASS
total/synth/OR/b1 0.01 0.01 +0.2% PASS
total/synth/PC/a0 0.01 0.01 +0.2% PASS
total/synth/PC/a1 0.01 0.01 +0.0% PASS
total/synth/PUSH1/p0 0.01 0.01 +0.0% PASS
total/synth/PUSH1/p1 0.01 0.01 +0.2% PASS
total/synth/PUSH10/p0 0.01 0.01 +0.2% PASS
total/synth/PUSH10/p1 0.01 0.01 +0.1% PASS
total/synth/PUSH11/p0 0.01 0.01 +0.1% PASS
total/synth/PUSH11/p1 0.01 0.01 +0.1% PASS
total/synth/PUSH12/p0 0.01 0.01 +0.0% PASS
total/synth/PUSH12/p1 0.01 0.01 +0.1% PASS
total/synth/PUSH13/p0 0.01 0.01 +0.2% PASS
total/synth/PUSH13/p1 0.01 0.01 +0.1% PASS
total/synth/PUSH14/p0 0.01 0.01 +0.2% PASS
total/synth/PUSH14/p1 0.01 0.01 +0.2% PASS
total/synth/PUSH15/p0 0.01 0.01 +0.1% PASS
total/synth/PUSH15/p1 0.01 0.01 -0.0% PASS
total/synth/PUSH16/p0 0.01 0.01 +0.1% PASS
total/synth/PUSH16/p1 0.01 0.01 +0.1% PASS
total/synth/PUSH17/p0 0.01 0.01 +0.1% PASS
total/synth/PUSH17/p1 0.01 0.01 +0.0% PASS
total/synth/PUSH18/p0 0.01 0.01 -0.0% PASS
total/synth/PUSH18/p1 0.01 0.01 +0.1% PASS
total/synth/PUSH19/p0 0.01 0.01 +0.1% PASS
total/synth/PUSH19/p1 0.01 0.01 -0.1% PASS
total/synth/PUSH2/p0 0.01 0.01 -0.0% PASS
total/synth/PUSH2/p1 0.01 0.01 -0.0% PASS
total/synth/PUSH20/p0 0.01 0.01 -0.1% PASS
total/synth/PUSH20/p1 0.01 0.01 -0.0% PASS
total/synth/PUSH21/p0 0.01 0.01 -0.1% PASS
total/synth/PUSH21/p1 0.01 0.01 -0.1% PASS
total/synth/PUSH22/p0 1.08 1.08 -0.2% PASS
total/synth/PUSH22/p1 1.25 1.24 -0.6% PASS
total/synth/PUSH23/p0 1.08 1.08 -0.0% PASS
total/synth/PUSH23/p1 1.25 1.23 -2.1% PASS
total/synth/PUSH24/p0 1.08 1.08 +0.1% PASS
total/synth/PUSH24/p1 1.23 1.22 -0.8% PASS
total/synth/PUSH25/p0 1.08 1.08 +0.1% PASS
total/synth/PUSH25/p1 1.22 1.25 +2.0% PASS
total/synth/PUSH26/p0 0.84 0.84 -0.1% PASS
total/synth/PUSH26/p1 1.29 1.24 -3.7% PASS
total/synth/PUSH27/p0 1.08 1.08 +0.2% PASS
total/synth/PUSH27/p1 1.26 1.22 -2.5% PASS
total/synth/PUSH28/p0 1.08 1.08 +0.1% PASS
total/synth/PUSH28/p1 1.29 1.25 -3.4% PASS
total/synth/PUSH29/p0 1.08 1.08 +0.2% PASS
total/synth/PUSH29/p1 1.23 1.23 -0.2% PASS
total/synth/PUSH3/p0 0.01 0.01 +0.1% PASS
total/synth/PUSH3/p1 0.01 0.01 +0.1% PASS
total/synth/PUSH30/p0 1.09 1.08 -0.2% PASS
total/synth/PUSH30/p1 1.24 1.24 +0.3% PASS
total/synth/PUSH31/p0 1.08 1.08 +0.1% PASS
total/synth/PUSH31/p1 1.33 1.32 -1.4% PASS
total/synth/PUSH32/p0 1.08 1.08 -0.0% PASS
total/synth/PUSH32/p1 1.24 1.23 -0.1% PASS
total/synth/PUSH4/p0 0.01 0.01 +0.0% PASS
total/synth/PUSH4/p1 0.01 0.01 -0.0% PASS
total/synth/PUSH5/p0 0.01 0.01 +0.1% PASS
total/synth/PUSH5/p1 0.01 0.01 -0.0% PASS
total/synth/PUSH6/p0 0.01 0.01 -0.1% PASS
total/synth/PUSH6/p1 0.01 0.01 +0.2% PASS
total/synth/PUSH7/p0 0.01 0.01 +0.1% PASS
total/synth/PUSH7/p1 0.01 0.01 +0.0% PASS
total/synth/PUSH8/p0 0.01 0.01 -0.2% PASS
total/synth/PUSH8/p1 0.01 0.01 +0.1% PASS
total/synth/PUSH9/p0 0.01 0.01 -0.0% PASS
total/synth/PUSH9/p1 0.01 0.01 +0.0% PASS
total/synth/RETURNDATASIZE/a0 0.53 0.53 -0.2% PASS
total/synth/RETURNDATASIZE/a1 0.49 0.49 +0.0% PASS
total/synth/SAR/b0 3.78 3.77 -0.1% PASS
total/synth/SAR/b1 4.27 4.28 +0.2% PASS
total/synth/SGT/b0 0.01 0.01 +0.1% PASS
total/synth/SGT/b1 0.01 0.01 +0.1% PASS
total/synth/SHL/b0 3.06 3.06 -0.0% PASS
total/synth/SHL/b1 1.64 1.64 +0.2% PASS
total/synth/SHR/b0 3.11 3.11 +0.0% PASS
total/synth/SHR/b1 1.52 1.53 +0.4% PASS
total/synth/SIGNEXTEND/b0 3.13 3.12 -0.1% PASS
total/synth/SIGNEXTEND/b1 3.64 3.54 -2.5% PASS
total/synth/SLT/b0 0.01 0.01 +0.1% PASS
total/synth/SLT/b1 0.01 0.01 +0.1% PASS
total/synth/SUB/b0 0.01 0.01 +0.0% PASS
total/synth/SUB/b1 0.01 0.01 +0.3% PASS
total/synth/SWAP1/s0 0.01 0.01 +0.0% PASS
total/synth/SWAP10/s0 0.01 0.01 +0.1% PASS
total/synth/SWAP11/s0 0.01 0.01 +0.1% PASS
total/synth/SWAP12/s0 0.01 0.01 +0.1% PASS
total/synth/SWAP13/s0 0.01 0.01 +0.1% PASS
total/synth/SWAP14/s0 0.01 0.01 +0.2% PASS
total/synth/SWAP15/s0 0.01 0.01 +0.1% PASS
total/synth/SWAP16/s0 0.01 0.01 +0.1% PASS
total/synth/SWAP2/s0 0.01 0.01 +0.1% PASS
total/synth/SWAP3/s0 0.01 0.01 +0.1% PASS
total/synth/SWAP4/s0 0.01 0.01 +0.0% PASS
total/synth/SWAP5/s0 0.01 0.01 +0.1% PASS
total/synth/SWAP6/s0 0.01 0.01 +0.0% PASS
total/synth/SWAP7/s0 0.01 0.01 +0.1% PASS
total/synth/SWAP8/s0 0.01 0.01 +0.1% PASS
total/synth/SWAP9/s0 0.01 0.01 +0.0% PASS
total/synth/XOR/b0 0.01 0.01 +0.1% PASS
total/synth/XOR/b1 0.01 0.01 +0.1% PASS
total/synth/loop_v1 1.41 1.42 +0.7% PASS
total/synth/loop_v2 1.33 1.34 +0.7% PASS

Summary: 194 benchmarks, 0 regressions


@zoowii zoowii requested a review from Copilot March 24, 2026 02:50

zoowii commented Mar 24, 2026

conflict


Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.



Comment on lines +1366 to +1378
  // If divisor has any upper limb set, b > a, so DIV = 0
  MInstruction *Upper = createInstruction<BinaryInstruction>(
      false, OP_or, I64Type, B[1],
      createInstruction<BinaryInstruction>(false, OP_or, I64Type, B[2], B[3]));
  MInstruction *HasUpper = createInstruction<CmpInstruction>(
      false, CmpInstruction::ICMP_NE, &Ctx.I64Type, Upper, Zero);

  MInstruction *A0 = createIntConstInstruction(I64Type, Dividend);
  MInstruction *Q64 =
      createInstruction<BinaryInstruction>(false, OP_udiv, I64Type, A0, B[0]);
  MInstruction *DivResult =
      createInstruction<SelectInstruction>(false, I64Type, HasUpper, Zero, Q64);

Copilot AI Mar 24, 2026

handleDivU64Dividend() can generate an OP_udiv by B[0] without guarding against the runtime case where the divisor is 0 (and all upper limbs are 0). On x86 lowering this becomes DIV64r, which traps on division-by-zero, but EVM DIV(x,0) must evaluate to 0. Add an explicit B == 0 check (e.g., OR-fold all limbs) and select 0 when the divisor is zero before emitting OP_udiv.
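
A plain C++ model of the guarded behavior might look like the sketch below. The names `U256` and `divU64Dividend` are hypothetical. The sketch also assumes that a select evaluates both of its arms, so it swaps in a harmless divisor of 1 before dividing instead of relying on the select alone; whether that holds for this MIR's SelectInstruction is an assumption, not something the PR states.

```cpp
#include <array>
#include <cstdint>

// Little-endian limbs: limb 0 holds the least significant 64 bits.
using U256 = std::array<uint64_t, 4>;

// EVM DIV where the dividend is known to fit in 64 bits.
//  - any upper limb of b set: b > a, so the quotient is 0
//  - b == 0:                  EVM defines DIV(a, 0) = 0
//  - otherwise:               ordinary 64-bit division
uint64_t divU64Dividend(uint64_t a, const U256 &b) {
  const bool hasUpper = (b[1] | b[2] | b[3]) != 0;
  const bool lowIsZero = b[0] == 0;
  // Never divide by zero, even on the path whose result is discarded.
  const uint64_t safeDivisor = lowIsZero ? 1 : b[0];
  const uint64_t q = a / safeDivisor;
  return (hasUpper || lowIsZero) ? 0 : q;
}
```

When the low limb is zero and an upper limb is set, the divisor is a non-zero multiple of 2^64, so the result is still 0 and the combined condition remains correct.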

Comment on lines +1391 to +1402
  MInstruction *Upper = createInstruction<BinaryInstruction>(
      false, OP_or, I64Type, B[1],
      createInstruction<BinaryInstruction>(false, OP_or, I64Type, B[2], B[3]));
  MInstruction *HasUpper = createInstruction<CmpInstruction>(
      false, CmpInstruction::ICMP_NE, &Ctx.I64Type, Upper, Zero);

  MInstruction *A0 = createIntConstInstruction(I64Type, Dividend);
  MInstruction *R64 =
      createInstruction<BinaryInstruction>(false, OP_urem, I64Type, A0, B[0]);
  MInstruction *ModResult =
      createInstruction<SelectInstruction>(false, I64Type, HasUpper, A0, R64);

Copilot AI Mar 24, 2026

handleModU64Dividend() can emit OP_urem by B[0] without guarding against the runtime case where the divisor is 0 (and upper limbs are 0). On x86 this lowers to DIV64r-based remainder and will trap, while EVM MOD(x,0) must evaluate to 0. Add a B == 0 check and select 0 as the result when the divisor is zero before evaluating OP_urem.
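
The MOD case has three outcomes rather than two, which a small C++ model makes explicit. As before this is a sketch with hypothetical names (`U256`, `modU64Dividend`), and it assumes that both arms of a select may be evaluated, so the remainder is computed against a divisor that is never zero.

```cpp
#include <array>
#include <cstdint>

// Little-endian limbs: limb 0 holds the least significant 64 bits.
using U256 = std::array<uint64_t, 4>;

// EVM MOD where the dividend is known to fit in 64 bits.
//  - b == 0 (all limbs):      EVM defines MOD(a, 0) = 0
//  - any upper limb of b set: b > a, so the remainder is a itself
//  - otherwise:               ordinary 64-bit remainder
uint64_t modU64Dividend(uint64_t a, const U256 &b) {
  const bool hasUpper = (b[1] | b[2] | b[3]) != 0;
  const bool lowIsZero = b[0] == 0;
  const bool divisorIsZero = !hasUpper && lowIsZero;
  // Never take a remainder by zero, even if the result is discarded.
  const uint64_t safeDivisor = lowIsZero ? 1 : b[0];
  const uint64_t r = a % safeDivisor;
  if (divisorIsZero)
    return 0;
  return hasUpper ? a : r;
}
```

Unlike DIV, the zero check here must look at all four limbs: a divisor with a zero low limb but a non-zero upper limb is a valid divisor larger than the dividend, and the result is then `a`, not 0.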

Comment on lines +72 to +73
OPCODE(evm_udiv128_by64) // RDX:RAX / r64 -> quotient (RAX)
OPCODE(evm_urem128_by64) // extract remainder (RDX) from evm_udiv128_by64

Copilot AI Mar 24, 2026

The new MIR opcode comments for evm_udiv128_by64 / evm_urem128_by64 describe x86 register behavior (RDX:RAX / RAX,RDX). Since MIR is target-independent, this is misleading and may hide important invariants (e.g., x86 DIV traps if high limb >= divisor). Consider rewording these comments to describe the abstract operation (128/64 udiv and extracting remainder) and, if needed, document x86-specific lowering constraints in the x86 backend instead.

Suggested change
- OPCODE(evm_udiv128_by64) // RDX:RAX / r64 -> quotient (RAX)
- OPCODE(evm_urem128_by64) // extract remainder (RDX) from evm_udiv128_by64
+ OPCODE(evm_udiv128_by64) // unsigned 128-bit (hi:lo) divided by 64-bit divisor -> 64-bit quotient
+ OPCODE(evm_urem128_by64) // unsigned remainder from the same 128/64 division as evm_udiv128_by64

  uint64_t D = DivisorOp.getConstValue()[0];
- if (D != 0)
+ if (D != 0) {
+   if (!DividendOp.isConstant()) {

Line 1506 already checks if (DividendOp.isConstant() && DivisorOp.isConstant()) and returns, so the condition !DividendOp.isConstant() is always true here.
