perf(compiler): optimize x86 u256 mul with bmi2 adx by abmcar · Pull Request #428 · DTVMStack/DTVM

abmcar · 2026-03-23T08:13:18Z

Summary

detect adx and bmi2 in the x86 compiler target feature string
lower generic EVM U256 MUL to a BMI2+ADX row-wise MULX + ADCX/ADOX schedule on supported x86_64 hosts
trim reverted square / MULX-only experiment paths and drop the temporary benchmark helper from the final branch
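
As a point of reference, the result any U256 MUL lowering has to reproduce can be modelled portably. The sketch below is not code from this PR: `U256` and `mulU256Truncated` are hypothetical names, and `unsigned __int128` (a GCC/Clang extension) stands in for a 64x64 to 128-bit multiply. It walks the operands row by row and drops everything above bit 255, since EVM MUL is defined modulo 2^256.

```cpp
#include <array>
#include <cstdint>

// Four 64-bit limbs, least significant limb first (hypothetical layout).
using U256 = std::array<std::uint64_t, 4>;

// Reference model of a truncated 256-bit multiply (result mod 2^256).
// Row i multiplies limb a[i] by b[0..3-i] and accumulates into r[i..3];
// partial products that would land in limb 4 or above are never formed.
U256 mulU256Truncated(const U256 &a, const U256 &b) {
  U256 r{0, 0, 0, 0};
  for (int i = 0; i < 4; ++i) {
    std::uint64_t carry = 0;
    for (int j = 0; i + j < 4; ++j) {
      // a[i] * b[j] + r[i+j] + carry always fits in 128 bits.
      unsigned __int128 t = static_cast<unsigned __int128>(a[i]) * b[j];
      t += r[i + j];
      t += carry;
      r[i + j] = static_cast<std::uint64_t>(t);
      carry = static_cast<std::uint64_t>(t >> 64);
    }
    // The carry out of limb 3 is discarded (truncation to 256 bits).
  }
  return r;
}
```

A model like this is useful mainly as an oracle when comparing the output of an optimized lowering against the legacy one.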

Validation

tools/format.sh check
cmake --build /home/abmcar/DTVM-mulx-adx-opt/build-perf -j 20 --target libdtvmapi.so
local evmc run --bench checks against main; gas and output matched in all cases

Performance

Local performance was re-measured with two distinct runtime 256-bit operands so the benchmark exercises the generic non-square U256 MUL path.

Case	Current	Main	Improvement vs. main
`generic-mul-once`	`182 ns`	`183 ns`	`+0.55%`
`generic-mul-loop-64`	`502 ns`	`543 ns`	`+7.55%`
`generic-mul-loop-255`	`1228 ns`	`1372 ns`	`+10.50%`

The single-shot case stays close to noise, but the hot generic MUL loop shows a stable improvement of about 8% to 10%.

Notes

the final branch intentionally keeps only the generic BMI2+ADX path as the optimization path over legacy lowering
square-specialized and MULX-only experimental paths are not part of the final PR state

Copilot

Pull request overview

This PR improves x86_64 EVM U256 MUL codegen by detecting adx/bmi2 support from the host feature string and selecting a dedicated ADX+BMI2 lowering path that uses MULX plus ADCX/ADOX carry chains.

Changes:

Add host feature detection for adx and bmi2 in the x86 feature string.
Split lowerEvmU256MulExpr() into legacy and ADX+BMI2 implementations.
Introduce helpers for emitting MULX, ADCX, ADOX, and carry-chain setup/collection.
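
To illustrate why MULX pairs well with ADCX/ADOX, here is a software model of the idea, not the schedule this PR actually emits. MULX produces a 128-bit product without touching flags, ADCX adds with carry through CF only, and ADOX adds with carry through OF only, so the low and high halves of each product can ride two independent carry chains. All names below (`Flags`, `mulx`, `adcx`, `adox`, `mulU256DualChain`) are hypothetical, and `unsigned __int128` is a GCC/Clang extension.

```cpp
#include <array>
#include <cstdint>

using U256 = std::array<std::uint64_t, 4>;  // least significant limb first

// Software stand-ins for the two x86 carry flags used by ADCX and ADOX.
struct Flags {
  bool cf = false;
  bool of = false;
};

// Models MULX: full 64x64 -> 128-bit product, flags untouched.
inline void mulx(std::uint64_t x, std::uint64_t y, std::uint64_t &hi,
                 std::uint64_t &lo) {
  unsigned __int128 p = static_cast<unsigned __int128>(x) * y;
  lo = static_cast<std::uint64_t>(p);
  hi = static_cast<std::uint64_t>(p >> 64);
}

// Models ADCX: dst + src + CF, writing only CF.
inline std::uint64_t adcx(std::uint64_t dst, std::uint64_t src, Flags &f) {
  unsigned __int128 t = static_cast<unsigned __int128>(dst) + src + (f.cf ? 1 : 0);
  f.cf = (t >> 64) != 0;
  return static_cast<std::uint64_t>(t);
}

// Models ADOX: dst + src + OF, writing only OF.
inline std::uint64_t adox(std::uint64_t dst, std::uint64_t src, Flags &f) {
  unsigned __int128 t = static_cast<unsigned __int128>(dst) + src + (f.of ? 1 : 0);
  f.of = (t >> 64) != 0;
  return static_cast<std::uint64_t>(t);
}

// Row-wise truncated multiply (mod 2^256) using the two carry chains.
U256 mulU256DualChain(const U256 &a, const U256 &b) {
  U256 r{0, 0, 0, 0};
  for (int i = 0; i < 4; ++i) {
    Flags f;  // both chains start cleared at the top of each row
    for (int j = 0; i + j < 4; ++j) {
      std::uint64_t hi = 0, lo = 0;
      mulx(a[i], b[j], hi, lo);
      r[i + j] = adcx(r[i + j], lo, f);  // low halves ride the CF chain
      if (i + j + 1 < 4) {
        r[i + j + 1] = adox(r[i + j + 1], hi, f);  // high halves ride OF
      }
    }
    // Carries out of limb 3 on either chain are dropped (truncation).
  }
  return r;
}
```

On real hardware the benefit comes from the two chains not serializing on a single flag; the model above only demonstrates that the bookkeeping yields the correct truncated product.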

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
src/compiler/target/x86/x86lowering.h	Declares new helpers and the legacy vs ADX lowering split for U256 MUL.
src/compiler/target/x86/x86lowering.cpp	Implements ADX+BMI2 lowering using `MULX` + `ADCX`/`ADOX` scheduling and gates it on subtarget features.
src/compiler/context.cpp	Extends host x86 feature-string construction to include `adx` and `bmi2` as optional features.
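
The detection and gating described above can be pictured roughly as follows. This is a sketch under assumptions, not the code in `context.cpp` or `x86lowering.cpp`: it assumes an LLVM-style feature string of comma-separated `+name` entries, and `buildFeatureString`, `hasFeature`, and `useAdxMulLowering` are hypothetical helpers introduced only for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical: emit "+name" for each optional feature the host reports.
std::string buildFeatureString(const std::map<std::string, bool> &host,
                               const std::vector<std::string> &optional) {
  std::string out;
  for (const std::string &name : optional) {
    auto it = host.find(name);
    if (it == host.end() || !it->second) continue;  // absent or disabled
    if (!out.empty()) out += ',';
    out += '+';
    out += name;
  }
  return out;
}

// Hypothetical: true if "+name" appears as a whole comma-separated entry.
bool hasFeature(const std::string &features, const std::string &name) {
  const std::string token = "+" + name;
  std::size_t pos = 0;
  while (pos <= features.size()) {
    std::size_t end = features.find(',', pos);
    if (end == std::string::npos) end = features.size();
    if (features.compare(pos, end - pos, token) == 0) return true;
    pos = end + 1;
  }
  return false;
}

// Mirrors the gate in the PR: the new path needs both BMI2 and ADX.
bool useAdxMulLowering(const std::string &features) {
  return hasFeature(features, "bmi2") && hasFeature(features, "adx");
}
```

Requiring both features matters because MULX belongs to BMI2 while ADCX/ADOX belong to ADX; a host with only one of them has to stay on the legacy lowering.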

src/compiler/target/x86/x86lowering.cpp

github-actions · 2026-03-23T08:49:31Z

⚡ Performance Regression Check Results

✅ Performance Check Passed (interpreter)

Performance Benchmark Results (threshold: 25%)

Benchmark	Baseline (us)	Current (us)	Change	Status
total/main/blake2b_huff/8415nulls	1.52	1.50	-1.1%	PASS
total/main/blake2b_huff/empty	0.02	0.02	-1.6%	PASS
total/main/blake2b_shifts/8415nulls	11.98	11.89	-0.7%	PASS
total/main/sha1_divs/5311	5.23	5.27	+0.8%	PASS
total/main/sha1_divs/empty	0.07	0.06	-5.4%	PASS
total/main/sha1_shifts/5311	3.11	2.85	-8.4%	PASS
total/main/sha1_shifts/empty	0.04	0.04	-7.8%	PASS
total/main/snailtracer/benchmark	57.59	53.03	-7.9%	PASS
total/main/structarray_alloc/nfts_rank	1.07	1.03	-3.9%	PASS
total/main/swap_math/insufficient_liquidity	0.00	0.00	+0.6%	PASS
total/main/swap_math/received	0.01	0.01	-1.7%	PASS
total/main/swap_math/spent	0.00	0.00	-1.4%	PASS
total/main/weierstrudel/1	0.29	0.29	-0.9%	PASS
total/main/weierstrudel/15	3.16	3.14	-0.8%	PASS
total/micro/JUMPDEST_n0/empty	1.31	1.63	+24.9%	PASS
total/micro/jump_around/empty	0.09	0.10	+2.6%	PASS
total/micro/loop_with_many_jumpdests/empty	19.96	27.52	+37.9%	REGRESSED
total/micro/memory_grow_mload/by1	0.09	0.10	+1.3%	PASS
total/micro/memory_grow_mload/by16	0.12	0.10	-11.8%	PASS
total/micro/memory_grow_mload/by32	0.11	0.11	-2.6%	PASS
total/micro/memory_grow_mload/nogrow	0.09	0.09	+7.1%	PASS
total/micro/memory_grow_mstore/by1	0.09	0.09	-2.1%	PASS
total/micro/memory_grow_mstore/by16	0.11	0.14	+25.5%	PASS
total/micro/memory_grow_mstore/by32	0.12	0.12	-1.2%	PASS
total/micro/memory_grow_mstore/nogrow	0.09	0.09	-0.3%	PASS
total/micro/signextend/one	0.24	0.23	-2.7%	PASS
total/micro/signextend/zero	0.24	0.23	-3.1%	PASS
total/synth/ADD/b0	3.72	1.98	-46.8%	PASS
total/synth/ADD/b1	1.98	1.98	-0.2%	PASS
total/synth/ADDRESS/a0	4.84	4.83	-0.3%	PASS
total/synth/ADDRESS/a1	5.41	5.26	-2.7%	PASS
total/synth/AND/b0	1.64	1.63	-0.8%	PASS
total/synth/AND/b1	1.71	1.71	-0.0%	PASS
total/synth/BYTE/b0	6.27	6.13	-2.2%	PASS
total/synth/BYTE/b1	4.82	4.76	-1.2%	PASS
total/synth/CALLDATASIZE/a0	3.18	3.02	-5.1%	PASS
total/synth/CALLDATASIZE/a1	3.73	3.82	+2.3%	PASS
total/synth/CALLER/a0	4.85	4.82	-0.5%	PASS
total/synth/CALLER/a1	5.45	5.26	-3.5%	PASS
total/synth/CALLVALUE/a0	3.26	3.10	-4.9%	PASS
total/synth/CALLVALUE/a1	3.36	3.44	+2.4%	PASS
total/synth/CODESIZE/a0	3.51	3.42	-2.3%	PASS
total/synth/CODESIZE/a1	3.85	3.76	-2.3%	PASS
total/synth/DUP1/d0	5.23	1.14	-78.1%	PASS
total/synth/DUP1/d1	1.16	1.23	+6.0%	PASS
total/synth/DUP10/d0	5.21	1.15	-78.0%	PASS
total/synth/DUP10/d1	1.16	1.23	+6.8%	PASS
total/synth/DUP11/d0	5.21	1.15	-78.0%	PASS
total/synth/DUP11/d1	1.15	1.00	-13.3%	PASS
total/synth/DUP12/d0	5.19	0.91	-82.5%	PASS
total/synth/DUP12/d1	1.16	1.00	-13.6%	PASS
total/synth/DUP13/d0	5.21	0.91	-82.6%	PASS
total/synth/DUP13/d1	1.39	1.00	-28.3%	PASS
total/synth/DUP14/d0	5.21	1.22	-76.5%	PASS
total/synth/DUP14/d1	1.16	1.00	-13.5%	PASS
total/synth/DUP15/d0	5.21	1.22	-76.6%	PASS
total/synth/DUP15/d1	1.16	1.00	-13.7%	PASS
total/synth/DUP16/d0	5.22	1.15	-78.0%	PASS
total/synth/DUP16/d1	1.23	1.00	-18.6%	PASS
total/synth/DUP2/d0	5.23	1.15	-78.1%	PASS
total/synth/DUP2/d1	1.39	1.00	-28.5%	PASS
total/synth/DUP3/d0	5.21	1.15	-78.0%	PASS
total/synth/DUP3/d1	1.17	1.00	-15.0%	PASS
total/synth/DUP4/d0	5.23	1.15	-78.0%	PASS
total/synth/DUP4/d1	1.16	1.00	-14.1%	PASS
total/synth/DUP5/d0	5.23	0.91	-82.7%	PASS
total/synth/DUP5/d1	1.15	1.00	-13.7%	PASS
total/synth/DUP6/d0	5.24	0.99	-81.2%	PASS
total/synth/DUP6/d1	1.39	1.00	-28.5%	PASS
total/synth/DUP7/d0	5.24	1.15	-78.1%	PASS
total/synth/DUP7/d1	1.16	1.00	-14.1%	PASS
total/synth/DUP8/d0	5.22	1.15	-78.0%	PASS
total/synth/DUP8/d1	1.16	1.00	-13.7%	PASS
total/synth/DUP9/d0	5.22	1.15	-78.0%	PASS
total/synth/DUP9/d1	1.16	1.00	-13.7%	PASS
total/synth/EQ/b0	2.69	2.76	+2.5%	PASS
total/synth/EQ/b1	1.32	1.39	+5.7%	PASS
total/synth/GAS/a0	3.75	3.67	-2.3%	PASS
total/synth/GAS/a1	3.77	3.69	-2.3%	PASS
total/synth/GT/b0	2.61	2.63	+0.5%	PASS
total/synth/GT/b1	1.56	1.47	-5.2%	PASS
total/synth/ISZERO/u0	1.14	0.98	-14.0%	PASS
total/synth/JUMPDEST/n0	1.31	1.66	+27.1%	PASS
total/synth/LT/b0	2.61	2.67	+2.3%	PASS
total/synth/LT/b1	1.56	1.47	-5.7%	PASS
total/synth/MSIZE/a0	4.24	4.24	-0.1%	PASS
total/synth/MSIZE/a1	4.83	4.65	-3.7%	PASS
total/synth/MUL/b0	5.33	5.29	-0.7%	PASS
total/synth/MUL/b1	5.37	5.29	-1.6%	PASS
total/synth/NOT/u0	1.67	1.66	-0.6%	PASS
total/synth/OR/b0	1.64	1.63	-0.5%	PASS
total/synth/OR/b1	1.71	1.71	+0.0%	PASS
total/synth/PC/a0	3.10	3.02	-2.6%	PASS
total/synth/PC/a1	4.17	3.74	-10.2%	PASS
total/synth/PUSH1/p0	1.24	1.07	-13.8%	PASS
total/synth/PUSH1/p1	1.33	1.16	-12.6%	PASS
total/synth/PUSH10/p0	1.31	1.07	-18.9%	PASS
total/synth/PUSH10/p1	1.11	1.21	+8.5%	PASS
total/synth/PUSH11/p0	1.31	0.99	-25.0%	PASS
total/synth/PUSH11/p1	1.36	1.20	-11.3%	PASS
total/synth/PUSH12/p0	1.31	0.86	-34.3%	PASS
total/synth/PUSH12/p1	1.35	1.20	-11.3%	PASS
total/synth/PUSH13/p0	1.31	0.85	-35.2%	PASS
total/synth/PUSH13/p1	1.35	1.20	-10.9%	PASS
total/synth/PUSH14/p0	1.31	0.99	-24.4%	PASS
total/synth/PUSH14/p1	1.36	1.22	-10.6%	PASS
total/synth/PUSH15/p0	1.31	0.85	-35.4%	PASS
total/synth/PUSH15/p1	1.21	1.31	+8.5%	PASS
total/synth/PUSH16/p0	1.07	0.85	-20.5%	PASS
total/synth/PUSH16/p1	1.35	1.22	-9.6%	PASS
total/synth/PUSH17/p0	1.31	1.07	-18.7%	PASS
total/synth/PUSH17/p1	1.36	1.21	-11.2%	PASS
total/synth/PUSH18/p0	1.31	0.85	-35.0%	PASS
total/synth/PUSH18/p1	1.36	1.20	-11.9%	PASS
total/synth/PUSH19/p0	1.26	0.99	-22.1%	PASS
total/synth/PUSH19/p1	1.36	1.22	-10.4%	PASS
total/synth/PUSH2/p0	1.31	0.83	-36.7%	PASS
total/synth/PUSH2/p1	1.33	1.17	-12.0%	PASS
total/synth/PUSH20/p0	1.23	0.85	-30.5%	PASS
total/synth/PUSH20/p1	1.37	1.22	-11.5%	PASS
total/synth/PUSH21/p0	1.23	1.07	-13.6%	PASS
total/synth/PUSH21/p1	1.11	1.22	+9.8%	PASS
total/synth/PUSH22/p0	1.25	0.85	-31.6%	PASS
total/synth/PUSH22/p1	1.36	1.22	-10.5%	PASS
total/synth/PUSH23/p0	1.23	1.07	-13.5%	PASS
total/synth/PUSH23/p1	1.13	1.22	+7.4%	PASS
total/synth/PUSH24/p0	1.31	0.83	-37.0%	PASS
total/synth/PUSH24/p1	1.12	1.22	+9.2%	PASS
total/synth/PUSH25/p0	1.32	1.07	-18.9%	PASS
total/synth/PUSH25/p1	1.12	1.24	+11.0%	PASS
total/synth/PUSH26/p0	1.07	0.85	-20.4%	PASS
total/synth/PUSH26/p1	1.36	1.23	-9.1%	PASS
total/synth/PUSH27/p0	1.07	0.85	-20.7%	PASS
total/synth/PUSH27/p1	1.12	1.24	+10.6%	PASS
total/synth/PUSH28/p0	1.31	1.07	-18.5%	PASS
total/synth/PUSH28/p1	1.14	1.24	+8.8%	PASS
total/synth/PUSH29/p0	1.31	0.85	-35.4%	PASS
total/synth/PUSH29/p1	1.37	1.22	-10.5%	PASS
total/synth/PUSH3/p0	1.31	0.83	-36.6%	PASS
total/synth/PUSH3/p1	1.12	1.19	+6.7%	PASS
total/synth/PUSH30/p0	1.24	1.01	-18.5%	PASS
total/synth/PUSH30/p1	1.36	1.22	-10.1%	PASS
total/synth/PUSH31/p0	1.31	0.84	-36.2%	PASS
total/synth/PUSH31/p1	1.46	1.36	-6.8%	PASS
total/synth/PUSH32/p0	1.31	1.07	-18.7%	PASS
total/synth/PUSH32/p1	1.14	1.24	+8.6%	PASS
total/synth/PUSH4/p0	1.23	0.83	-32.3%	PASS
total/synth/PUSH4/p1	1.12	1.19	+6.1%	PASS
total/synth/PUSH5/p0	1.31	0.84	-36.3%	PASS
total/synth/PUSH5/p1	1.34	1.20	-10.9%	PASS
total/synth/PUSH6/p0	1.32	1.07	-19.0%	PASS
total/synth/PUSH6/p1	1.13	1.20	+6.3%	PASS
total/synth/PUSH7/p0	1.31	0.86	-34.6%	PASS
total/synth/PUSH7/p1	1.11	1.21	+8.7%	PASS
total/synth/PUSH8/p0	1.31	0.83	-36.7%	PASS
total/synth/PUSH8/p1	1.37	1.21	-11.7%	PASS
total/synth/PUSH9/p0	1.23	0.80	-34.7%	PASS
total/synth/PUSH9/p1	1.11	1.20	+7.6%	PASS
total/synth/RETURNDATASIZE/a0	3.59	3.42	-4.6%	PASS
total/synth/RETURNDATASIZE/a1	3.69	3.76	+1.8%	PASS
total/synth/SAR/b0	3.77	3.77	+0.1%	PASS
total/synth/SAR/b1	4.27	4.26	-0.1%	PASS
total/synth/SGT/b0	2.59	2.59	-0.3%	PASS
total/synth/SGT/b1	1.72	1.55	-9.6%	PASS
total/synth/SHL/b0	3.05	3.04	-0.4%	PASS
total/synth/SHL/b1	1.75	1.75	+0.1%	PASS
total/synth/SHR/b0	3.10	3.08	-0.8%	PASS
total/synth/SHR/b1	1.68	1.59	-5.0%	PASS
total/synth/SIGNEXTEND/b0	3.35	3.52	+5.2%	PASS
total/synth/SIGNEXTEND/b1	3.43	3.63	+5.9%	PASS
total/synth/SLT/b0	2.60	2.62	+0.7%	PASS
total/synth/SLT/b1	1.73	1.64	-5.2%	PASS
total/synth/SUB/b0	1.97	1.98	+0.5%	PASS
total/synth/SUB/b1	1.99	1.97	-0.6%	PASS
total/synth/SWAP1/s0	1.80	1.49	-17.2%	PASS
total/synth/SWAP10/s0	1.84	1.50	-18.1%	PASS
total/synth/SWAP11/s0	1.81	1.50	-17.1%	PASS
total/synth/SWAP12/s0	1.81	1.51	-16.7%	PASS
total/synth/SWAP13/s0	1.81	1.51	-17.0%	PASS
total/synth/SWAP14/s0	1.82	1.51	-17.2%	PASS
total/synth/SWAP15/s0	1.82	1.51	-16.9%	PASS
total/synth/SWAP16/s0	1.82	1.51	-17.1%	PASS
total/synth/SWAP2/s0	1.80	1.49	-17.4%	PASS
total/synth/SWAP3/s0	1.80	1.49	-17.3%	PASS
total/synth/SWAP4/s0	1.80	1.49	-17.3%	PASS
total/synth/SWAP5/s0	1.81	1.49	-17.5%	PASS
total/synth/SWAP6/s0	1.81	1.49	-17.4%	PASS
total/synth/SWAP7/s0	1.81	1.50	-17.1%	PASS
total/synth/SWAP8/s0	1.81	1.50	-17.1%	PASS
total/synth/SWAP9/s0	1.81	1.50	-17.1%	PASS
total/synth/XOR/b0	1.55	1.54	-0.2%	PASS
total/synth/XOR/b1	1.55	1.55	-0.0%	PASS
total/synth/loop_v1	4.80	4.76	-0.7%	PASS
total/synth/loop_v2	4.79	4.75	-0.7%	PASS

Summary: 194 benchmarks, 1 regressions

✅ Performance Check Passed (multipass)

Performance Benchmark Results (threshold: 25%)

Benchmark	Baseline (us)	Current (us)	Change	Status
total/main/blake2b_huff/8415nulls	1.42	1.60	+12.4%	PASS
total/main/blake2b_huff/empty	0.07	0.07	+2.4%	PASS
total/main/blake2b_shifts/8415nulls	5.95	5.33	-10.5%	PASS
total/main/sha1_divs/5311	3.27	1.87	-42.9%	PASS
total/main/sha1_divs/empty	0.04	0.03	-32.2%	PASS
total/main/sha1_shifts/5311	3.44	2.76	-19.8%	PASS
total/main/sha1_shifts/empty	0.05	0.04	-16.3%	PASS
total/main/snailtracer/benchmark	53.51	53.28	-0.4%	PASS
total/main/structarray_alloc/nfts_rank	0.28	0.29	+5.0%	PASS
total/main/swap_math/insufficient_liquidity	0.02	0.02	+2.3%	PASS
total/main/swap_math/received	0.02	0.02	-1.6%	PASS
total/main/swap_math/spent	0.02	0.02	-1.4%	PASS
total/main/weierstrudel/1	0.33	0.35	+5.9%	PASS
total/main/weierstrudel/15	2.99	3.31	+10.8%	PASS
total/micro/JUMPDEST_n0/empty	0.12	0.13	+8.2%	PASS
total/micro/jump_around/empty	0.58	0.62	+7.5%	PASS
total/micro/loop_with_many_jumpdests/empty	1.81	1.98	+9.4%	PASS
total/micro/memory_grow_mload/by1	0.18	0.18	+3.1%	PASS
total/micro/memory_grow_mload/by16	0.21	0.20	-7.6%	PASS
total/micro/memory_grow_mload/by32	0.20	0.24	+19.7%	PASS
total/micro/memory_grow_mload/nogrow	0.18	0.21	+17.0%	PASS
total/micro/memory_grow_mstore/by1	0.18	0.21	+19.0%	PASS
total/micro/memory_grow_mstore/by16	0.19	0.23	+19.7%	PASS
total/micro/memory_grow_mstore/by32	0.23	0.21	-7.3%	PASS
total/micro/memory_grow_mstore/nogrow	0.18	0.18	+0.7%	PASS
total/micro/signextend/one	0.33	0.38	+13.7%	PASS
total/micro/signextend/zero	0.33	0.34	+4.7%	PASS
total/synth/ADD/b0	0.01	0.01	+5.2%	PASS
total/synth/ADD/b1	0.01	0.01	+5.0%	PASS
total/synth/ADDRESS/a0	0.15	0.16	+7.4%	PASS
total/synth/ADDRESS/a1	0.15	0.16	+7.8%	PASS
total/synth/AND/b0	0.01	0.01	+5.5%	PASS
total/synth/AND/b1	0.01	0.01	+5.0%	PASS
total/synth/BYTE/b0	1.81	1.95	+8.3%	PASS
total/synth/BYTE/b1	2.12	2.33	+9.6%	PASS
total/synth/CALLDATASIZE/a0	0.07	0.08	+10.5%	PASS
total/synth/CALLDATASIZE/a1	0.07	0.08	+10.3%	PASS
total/synth/CALLER/a0	0.15	0.16	+7.5%	PASS
total/synth/CALLER/a1	0.15	0.16	+7.5%	PASS
total/synth/CALLVALUE/a0	0.25	0.27	+7.8%	PASS
total/synth/CALLVALUE/a1	0.25	0.28	+10.6%	PASS
total/synth/CODESIZE/a0	0.07	0.08	+6.7%	PASS
total/synth/CODESIZE/a1	0.07	0.08	+6.8%	PASS
total/synth/DUP1/d0	0.01	0.01	+4.7%	PASS
total/synth/DUP1/d1	0.01	0.01	+5.3%	PASS
total/synth/DUP10/d0	0.01	0.01	+9.9%	PASS
total/synth/DUP10/d1	0.01	0.01	+5.0%	PASS
total/synth/DUP11/d0	0.01	0.01	+4.5%	PASS
total/synth/DUP11/d1	0.01	0.01	+4.4%	PASS
total/synth/DUP12/d0	0.01	0.01	+4.8%	PASS
total/synth/DUP12/d1	0.01	0.01	+4.8%	PASS
total/synth/DUP13/d0	0.01	0.01	+4.6%	PASS
total/synth/DUP13/d1	0.01	0.01	+4.7%	PASS
total/synth/DUP14/d0	0.01	0.01	+5.0%	PASS
total/synth/DUP14/d1	0.01	0.01	+4.0%	PASS
total/synth/DUP15/d0	0.01	0.01	+4.9%	PASS
total/synth/DUP15/d1	0.01	0.01	+4.8%	PASS
total/synth/DUP16/d0	0.01	0.01	+5.0%	PASS
total/synth/DUP16/d1	0.01	0.01	+5.0%	PASS
total/synth/DUP2/d0	0.01	0.01	+4.9%	PASS
total/synth/DUP2/d1	0.01	0.01	+5.1%	PASS
total/synth/DUP3/d0	0.01	0.01	+4.9%	PASS
total/synth/DUP3/d1	0.01	0.01	+4.9%	PASS
total/synth/DUP4/d0	0.01	0.01	+5.0%	PASS
total/synth/DUP4/d1	0.01	0.01	+5.2%	PASS
total/synth/DUP5/d0	0.01	0.01	+5.0%	PASS
total/synth/DUP5/d1	0.01	0.01	+4.9%	PASS
total/synth/DUP6/d0	0.01	0.01	+5.1%	PASS
total/synth/DUP6/d1	0.01	0.01	+4.8%	PASS
total/synth/DUP7/d0	0.01	0.01	+4.9%	PASS
total/synth/DUP7/d1	0.01	0.01	+4.8%	PASS
total/synth/DUP8/d0	0.01	0.01	+4.7%	PASS
total/synth/DUP8/d1	0.01	0.01	+5.0%	PASS
total/synth/DUP9/d0	0.01	0.01	+5.1%	PASS
total/synth/DUP9/d1	0.01	0.01	+5.0%	PASS
total/synth/EQ/b0	0.01	0.01	+4.7%	PASS
total/synth/EQ/b1	0.01	0.01	+4.6%	PASS
total/synth/GAS/a0	0.73	0.80	+9.7%	PASS
total/synth/GAS/a1	0.73	0.76	+5.1%	PASS
total/synth/GT/b0	0.01	0.01	+5.0%	PASS
total/synth/GT/b1	0.01	0.01	+4.9%	PASS
total/synth/ISZERO/u0	0.01	0.01	+2.0%	PASS
total/synth/JUMPDEST/n0	0.13	0.14	+8.0%	PASS
total/synth/LT/b0	0.01	0.01	+4.8%	PASS
total/synth/LT/b1	0.01	0.01	+5.0%	PASS
total/synth/MSIZE/a0	0.01	0.01	+2.1%	PASS
total/synth/MSIZE/a1	0.01	0.01	+1.8%	PASS
total/synth/MUL/b0	0.01	0.01	+4.6%	PASS
total/synth/MUL/b1	0.01	0.01	+5.1%	PASS
total/synth/NOT/u0	0.01	0.01	+1.7%	PASS
total/synth/OR/b0	0.01	0.01	+4.9%	PASS
total/synth/OR/b1	0.01	0.01	+4.9%	PASS
total/synth/PC/a0	0.01	0.01	+2.4%	PASS
total/synth/PC/a1	0.01	0.01	+2.0%	PASS
total/synth/PUSH1/p0	0.01	0.01	+5.2%	PASS
total/synth/PUSH1/p1	0.01	0.01	+5.1%	PASS
total/synth/PUSH10/p0	0.01	0.01	+5.1%	PASS
total/synth/PUSH10/p1	0.01	0.01	+4.7%	PASS
total/synth/PUSH11/p0	0.01	0.01	+5.1%	PASS
total/synth/PUSH11/p1	0.01	0.01	+5.1%	PASS
total/synth/PUSH12/p0	0.01	0.01	+5.2%	PASS
total/synth/PUSH12/p1	0.01	0.01	+5.4%	PASS
total/synth/PUSH13/p0	0.01	0.01	+5.1%	PASS
total/synth/PUSH13/p1	0.01	0.01	+5.0%	PASS
total/synth/PUSH14/p0	0.01	0.01	+6.6%	PASS
total/synth/PUSH14/p1	0.01	0.01	+5.0%	PASS
total/synth/PUSH15/p0	0.01	0.01	+5.3%	PASS
total/synth/PUSH15/p1	0.01	0.01	+4.7%	PASS
total/synth/PUSH16/p0	0.01	0.01	+5.4%	PASS
total/synth/PUSH16/p1	0.01	0.01	+4.9%	PASS
total/synth/PUSH17/p0	0.01	0.01	+5.2%	PASS
total/synth/PUSH17/p1	0.01	0.01	+4.9%	PASS
total/synth/PUSH18/p0	0.01	0.01	+5.2%	PASS
total/synth/PUSH18/p1	0.01	0.01	+5.1%	PASS
total/synth/PUSH19/p0	0.01	0.01	+5.8%	PASS
total/synth/PUSH19/p1	0.01	0.01	+4.5%	PASS
total/synth/PUSH2/p0	0.01	0.01	+5.0%	PASS
total/synth/PUSH2/p1	0.01	0.01	+5.2%	PASS
total/synth/PUSH20/p0	0.01	0.01	+5.1%	PASS
total/synth/PUSH20/p1	0.01	0.01	+4.6%	PASS
total/synth/PUSH21/p0	0.01	0.01	+5.0%	PASS
total/synth/PUSH21/p1	0.01	0.01	+5.1%	PASS
total/synth/PUSH22/p0	1.22	1.11	-9.2%	PASS
total/synth/PUSH22/p1	1.06	1.23	+16.1%	PASS
total/synth/PUSH23/p0	1.23	1.08	-12.1%	PASS
total/synth/PUSH23/p1	1.08	1.22	+12.7%	PASS
total/synth/PUSH24/p0	1.23	1.08	-12.0%	PASS
total/synth/PUSH24/p1	1.07	1.25	+17.5%	PASS
total/synth/PUSH25/p0	1.26	1.11	-11.6%	PASS
total/synth/PUSH25/p1	1.07	1.22	+13.6%	PASS
total/synth/PUSH26/p0	0.96	0.83	-12.9%	PASS
total/synth/PUSH26/p1	1.05	1.21	+15.0%	PASS
total/synth/PUSH27/p0	1.22	1.08	-11.7%	PASS
total/synth/PUSH27/p1	1.06	1.26	+18.0%	PASS
total/synth/PUSH28/p0	1.22	1.08	-11.9%	PASS
total/synth/PUSH28/p1	1.07	1.23	+15.2%	PASS
total/synth/PUSH29/p0	1.23	1.08	-11.9%	PASS
total/synth/PUSH29/p1	1.08	1.21	+12.5%	PASS
total/synth/PUSH3/p0	0.01	0.01	+5.3%	PASS
total/synth/PUSH3/p1	0.01	0.01	+5.2%	PASS
total/synth/PUSH30/p0	1.27	1.11	-12.2%	PASS
total/synth/PUSH30/p1	1.12	1.24	+10.7%	PASS
total/synth/PUSH31/p0	1.26	1.08	-14.3%	PASS
total/synth/PUSH31/p1	1.22	1.37	+11.9%	PASS
total/synth/PUSH32/p0	1.25	1.08	-13.9%	PASS
total/synth/PUSH32/p1	1.06	1.29	+21.4%	PASS
total/synth/PUSH4/p0	0.01	0.01	+5.1%	PASS
total/synth/PUSH4/p1	0.01	0.01	+4.9%	PASS
total/synth/PUSH5/p0	0.01	0.01	+5.0%	PASS
total/synth/PUSH5/p1	0.01	0.01	+4.9%	PASS
total/synth/PUSH6/p0	0.01	0.01	+4.9%	PASS
total/synth/PUSH6/p1	0.01	0.01	+5.2%	PASS
total/synth/PUSH7/p0	0.01	0.01	+4.7%	PASS
total/synth/PUSH7/p1	0.01	0.01	+4.9%	PASS
total/synth/PUSH8/p0	0.01	0.01	+5.4%	PASS
total/synth/PUSH8/p1	0.01	0.01	+5.1%	PASS
total/synth/PUSH9/p0	0.01	0.01	+5.3%	PASS
total/synth/PUSH9/p1	0.01	0.01	+5.2%	PASS
total/synth/RETURNDATASIZE/a0	0.49	0.53	+8.7%	PASS
total/synth/RETURNDATASIZE/a1	0.45	0.49	+8.4%	PASS
total/synth/SAR/b0	3.51	3.82	+8.9%	PASS
total/synth/SAR/b1	3.97	4.29	+8.1%	PASS
total/synth/SGT/b0	0.01	0.01	+4.8%	PASS
total/synth/SGT/b1	0.01	0.01	+4.7%	PASS
total/synth/SHL/b0	2.86	3.05	+6.8%	PASS
total/synth/SHL/b1	1.60	1.71	+7.1%	PASS
total/synth/SHR/b0	2.87	3.13	+9.0%	PASS
total/synth/SHR/b1	1.55	1.61	+3.8%	PASS
total/synth/SIGNEXTEND/b0	2.92	3.34	+14.2%	PASS
total/synth/SIGNEXTEND/b1	3.17	3.66	+15.4%	PASS
total/synth/SLT/b0	0.01	0.01	+4.8%	PASS
total/synth/SLT/b1	0.01	0.01	+5.0%	PASS
total/synth/SUB/b0	0.01	0.01	+5.2%	PASS
total/synth/SUB/b1	0.01	0.01	+5.0%	PASS
total/synth/SWAP1/s0	0.01	0.01	+8.2%	PASS
total/synth/SWAP10/s0	0.01	0.01	+8.1%	PASS
total/synth/SWAP11/s0	0.01	0.01	+8.1%	PASS
total/synth/SWAP12/s0	0.01	0.01	+8.0%	PASS
total/synth/SWAP13/s0	0.01	0.01	+8.0%	PASS
total/synth/SWAP14/s0	0.01	0.01	+8.0%	PASS
total/synth/SWAP15/s0	0.01	0.01	+7.9%	PASS
total/synth/SWAP16/s0	0.01	0.01	+8.1%	PASS
total/synth/SWAP2/s0	0.01	0.01	+8.1%	PASS
total/synth/SWAP3/s0	0.01	0.01	+8.0%	PASS
total/synth/SWAP4/s0	0.01	0.01	+8.3%	PASS
total/synth/SWAP5/s0	0.01	0.01	+8.2%	PASS
total/synth/SWAP6/s0	0.01	0.01	+8.1%	PASS
total/synth/SWAP7/s0	0.01	0.01	+7.8%	PASS
total/synth/SWAP8/s0	0.01	0.01	+8.1%	PASS
total/synth/SWAP9/s0	0.01	0.01	+7.9%	PASS
total/synth/XOR/b0	0.01	0.01	+5.2%	PASS
total/synth/XOR/b1	0.01	0.01	+4.9%	PASS
total/synth/loop_v1	1.31	1.41	+8.0%	PASS
total/synth/loop_v2	1.31	1.33	+1.9%	PASS

Summary: 194 benchmarks, 0 regressions

zoowii · 2026-03-24T02:32:56Z

src/compiler/target/x86/x86lowering.cpp

 CgRegister
 X86CgLowering::lowerEvmU256MulExpr(const EvmU256MulInstruction &Inst) {
+  if (Subtarget->hasBMI2() && Subtarget->hasADX()) {
+    return lowerEvmU256MulExprAdx(Inst);


This feature is only enabled under EVMJIT.

abmcar · Mar 24, 2026

Okay, I'll add a comment for this. Besides that, do we need any additional constraints?

abmcar added 11 commits March 20, 2026 14:51

perf(compiler): add bmi2 adx u256 mul lowering

ba772eb

perf(compiler): tighten adx u256 mul schedule

49f494d

perf(compiler): revert row-wise adx u256 mul schedule

dbbeb3d

perf(compiler): optimize u256 mul mulx/adx lowering

7d2d44a

perf(tools): stabilize u256 mul benchmark comparisons

f0e67e2

perf(tools): validate u256 mul benchmark results

9fd99b7

refactor(compiler): share x86 u256 mul lowering helpers

5e95ab7

perf(compiler): regroup x86 u256 mul carry schedule

d6f062d

refactor(compiler): factor x86 u256 mul product helpers

dac16bc

perf(compiler): reduce adx u256 mul spill pressure

82d0149

refactor(compiler): trim x86 u256 mul lowering paths

0e0f817

Copilot AI review requested due to automatic review settings March 23, 2026 08:13

Copilot started reviewing on behalf of abmcar March 23, 2026 08:13

Copilot AI reviewed Mar 23, 2026

docs(compiler): clarify truncated adx carry handling

bcfb9e3

zoowii reviewed Mar 24, 2026

abmcar added 2 commits March 24, 2026 13:31

docs(compiler): note evm jit-only mul lowering

1f7cc95

ci(ci): restart PR checks

dab8a1e

abmcar wants to merge 14 commits into DTVMStack:main from abmcar:perf/compiler-u256-mulx-adx

3 participants
