perf(compiler): optimize x86 u256 mul with bmi2 adx #428
abmcar wants to merge 14 commits into DTVMStack:main
Pull request overview
This PR improves x86_64 EVM U256 MUL codegen by detecting adx/bmi2 support from the host feature string and selecting a dedicated ADX+BMI2 lowering path that uses MULX plus ADCX/ADOX carry chains.
Changes:
- Add host feature detection for `adx` and `bmi2` in the x86 feature string.
- Split `lowerEvmU256MulExpr()` into legacy and ADX+BMI2 implementations.
- Introduce helpers for emitting `MULX`, `ADCX`, `ADOX`, and carry-chain setup/collection.
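As a reference for what the lowering has to compute, here is a minimal portable sketch of a row-wise 256-bit multiply truncated to 256 bits (EVM `MUL` is taken modulo 2^256). It is not the PR's code: the `U256` alias, the limb layout, and the name `mulU256Low` are assumptions made for illustration, and it relies on the `unsigned __int128` extension available in GCC and Clang. On hardware, `MULX` produces the same 64x64 to 128-bit product without modifying flags, while `ADCX` and `ADOX` carry through CF and OF respectively, which is what lets two independent carry chains be interleaved.

```cpp
#include <array>
#include <cstdint>

// Four 64-bit limbs, least significant first (hypothetical layout).
using U256 = std::array<std::uint64_t, 4>;

// Low 256 bits of A * B, computed one row (one limb of A) at a time.
// Requires unsigned __int128 (GCC/Clang extension).
inline U256 mulU256Low(const U256 &A, const U256 &B) {
  U256 R{};
  for (int I = 0; I < 4; ++I) {
    std::uint64_t Carry = 0;
    // Limbs at index >= 4 are dropped because the result is mod 2^256.
    for (int J = 0; I + J < 4; ++J) {
      // 64x64 -> 128-bit product plus the running limb and carry.
      // The sum cannot overflow 128 bits.
      unsigned __int128 T =
          static_cast<unsigned __int128>(A[I]) * B[J] + R[I + J] + Carry;
      R[I + J] = static_cast<std::uint64_t>(T);
      Carry = static_cast<std::uint64_t>(T >> 64);
    }
  }
  return R;
}
```

A sketch like this can serve as an oracle when comparing the legacy and ADX+BMI2 paths, since both must agree with it on every input.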
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/compiler/target/x86/x86lowering.h | Declares new helpers and the legacy vs ADX lowering split for U256 MUL. |
| src/compiler/target/x86/x86lowering.cpp | Implements ADX+BMI2 lowering using MULX + ADCX/ADOX scheduling and gates it on subtarget features. |
| src/compiler/context.cpp | Extends host x86 feature-string construction to include adx and bmi2 as optional features. |
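
The feature gating described in the table can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `context.cpp`: the `HostFeature` struct and the functions `buildFeatureString` and `hasFeature` are hypothetical, and the `+name,+name` format is assumed to follow the LLVM-style convention for target feature strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one optional host feature.
struct HostFeature {
  std::string Name; // e.g. "adx" or "bmi2"
  bool Present;     // outcome of a host CPU query (not shown here)
};

// Build a "+name,+name" string from the features the host reports.
// The format is an assumption (LLVM-style feature strings).
inline std::string buildFeatureString(const std::vector<HostFeature> &Fs) {
  std::string Out;
  for (const HostFeature &F : Fs) {
    if (!F.Present)
      continue; // optional feature: omit it when the host lacks it
    if (!Out.empty())
      Out += ',';
    Out += '+';
    Out += F.Name;
  }
  return Out;
}

// Return true if the comma-separated string contains the exact token "+Name".
inline bool hasFeature(const std::string &Features, const std::string &Name) {
  const std::string Token = "+" + Name;
  std::size_t Pos = 0;
  while (Pos <= Features.size()) {
    std::size_t End = Features.find(',', Pos);
    if (End == std::string::npos)
      End = Features.size();
    if (Features.compare(Pos, End - Pos, Token) == 0)
      return true;
    Pos = End + 1;
  }
  return false;
}
```

With helpers like these, the ADX path would be chosen only when both `hasFeature(S, "bmi2")` and `hasFeature(S, "adx")` hold, mirroring the `hasBMI2() && hasADX()` check in the lowering.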
⚡ Performance Regression Check Results
✅ Performance Check Passed (interpreter)
Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 1 regressions
✅ Performance Check Passed (multipass)
Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions
    CgRegister
    X86CgLowering::lowerEvmU256MulExpr(const EvmU256MulInstruction &Inst) {
      if (Subtarget->hasBMI2() && Subtarget->hasADX()) {
        return lowerEvmU256MulExprAdx(Inst);
This feature is only enabled with EVMJIT.
Okay, I'll add a comment for this. Besides that, do we need any additional constraints?
Summary
- Detect `adx` and `bmi2` in the x86 compiler target feature string
- Switch `U256 MUL` to a BMI2+ADX row-wise `MULX` + `ADCX`/`ADOX` schedule on supported x86_64 hosts

Validation
- `tools/format.sh check`
- `cmake --build /home/abmcar/DTVM-mulx-adx-opt/build-perf -j 20 --target libdtvmapi.so`
- `evmc run --bench` checks against `main`; gas and output matched in all cases

Performance
Local performance was re-measured with two distinct runtime 256-bit operands so the benchmark exercises the generic non-square `U256 MUL` path.

| Benchmark | This PR | main | Improvement |
|---|---|---|---|
| generic-mul-once | 182 ns | 183 ns | +0.55% |
| generic-mul-loop-64 | 502 ns | 543 ns | +7.55% |
| generic-mul-loop-255 | 1228 ns | 1372 ns | +10.50% |

The single-shot case stays within noise, but the hot generic MUL loop shows a stable improvement of about 8% to 10%.

Notes