fix(compiler): add large-offset 64-bit address path to handleStoreImpl#426
fix(compiler): add large-offset 64-bit address path to handleStoreImpl#426cmgCr wants to merge 2 commits intoDTVMStack:mainfrom
Conversation
⚡ Performance Regression Check Results✅ Performance Check Passed (interpreter)Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions ✅ Performance Check Passed (multipass)Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions |
0093b92 to
d5a999c
Compare
There was a problem hiding this comment.
Pull request overview
Fixes a singlepass x86-64 JIT addressing bug for WASM memory ops with very large memarg.offset values (≥ 2GiB) by materializing the full 64-bit effective address instead of relying on disp32 encodings that sign-extend.
Changes:
- Add an explicit 64-bit effective-address computation path for large offsets in x64 singlepass load/store codegen.
- Add a new spec-extra WAST regression test covering large-offset in-bounds behavior and a large-offset OOB trap.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
src/singlepass/x64/codegen.h |
Emits register-based address computation for large offsets to avoid disp32 sign-extension misaddressing. |
tests/wast/spec_extra/store_large_offset_false_positive_oob.wast |
Adds a regression test for large-offset loads/stores near the top of a large memory. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (!Base.isImm() && (Offset >= (uint32_t)INT32_MAX)) { | ||
| auto MemAddrReg = Layout.getScopedTemp<AddrType, ScopedTempReg2>(); | ||
| _ mov(X64Reg::getRegRef<X64::I32>(MemAddrReg), Offset); | ||
| _ add(X64Reg::getRegRef<X64::I64>(MemAddrReg), | ||
| X64Reg::getRegRef<X64::I64>(RegNum)); | ||
| _ add(X64Reg::getRegRef<X64::I64>(MemAddrReg), ABI.getMemoryBaseReg()); | ||
| Addr = asmjit::x86::Mem(X64Reg::getRegRef<X64::I64>(MemAddrReg), 0, | ||
| getWASMTypeSize<Type>()); | ||
| } |
There was a problem hiding this comment.
Like the load path, this large-offset fix only runs when !Base.isImm(). Immediate-base memory ops with offset>=0x80000000 still rely on a disp32 encoding (or the existing INT32_MAX clamping), which can misaddress memory or incorrectly trap on valid accesses. It would be safer to apply the same explicit 64-bit address materialization for the Base.isImm() case too.
| if (!Base.isImm() && (Offset >= (uint32_t)INT32_MAX)) { | ||
| auto MemAddrReg = Layout.getScopedTemp<AddrType, ScopedTempReg2>(); | ||
| _ mov(X64Reg::getRegRef<X64::I32>(MemAddrReg), Offset); | ||
| _ add(X64Reg::getRegRef<X64::I64>(MemAddrReg), | ||
| X64Reg::getRegRef<X64::I64>(RegNum)); |
There was a problem hiding this comment.
Same as above: Offset >= (uint32_t)INT32_MAX triggers the explicit-address slow path one value too early. The disp32 sign bit flips at 0x80000000 (Offset > INT32_MAX), while 0x7fffffff is still a safe positive disp32.
| @@ -0,0 +1,44 @@ | |||
| ;; Test: f64.store with offset >= INT32_MAX on large memory should NOT trap. | |||
There was a problem hiding this comment.
The opening comment says "offset >= INT32_MAX" but the disp32 sign-extension hazard starts at 0x80000000 (i.e., INT32_MAX + 1). Consider updating the wording to avoid implying that 0x7fffffff is problematic.
| ;; Test: f64.store with offset >= INT32_MAX on large memory should NOT trap. | |
| ;; Test: f64.store with offset >= 0x80000000 (INT32_MAX + 1) on large memory should NOT trap. |
| ;; dynamic base (memory.size result) + large offset: in-bounds store | ||
| (func (export "f64_store_dynamic_base_large_offset") | ||
| memory.size | ||
| f64.const -5.44203 | ||
| f64.store offset=4268353288) | ||
|
|
||
| ;; dynamic base via parameter + large offset: in-bounds store | ||
| (func (export "f64_store_param_base_large_offset") (param i32) | ||
| local.get 0 | ||
| f64.const 1.0 | ||
| f64.store offset=4268353288) |
There was a problem hiding this comment.
This test exercises large offsets with dynamic bases; to fully cover the disp32 sign-extension regression, consider also adding a case where the base is an immediate constant (e.g., i32.const 0) with an offset>=0x80000000. That hits the Base.isImm() addressing form, which can behave differently in the JIT.
| if (!Base.isImm() && (Offset >= (uint32_t)INT32_MAX)) { | ||
| auto MemAddrReg = Layout.getScopedTemp<AddrType, ScopedTempReg2>(); | ||
| _ mov(X64Reg::getRegRef<X64::I32>(MemAddrReg), Offset); | ||
| _ add(X64Reg::getRegRef<X64::I64>(MemAddrReg), | ||
| X64Reg::getRegRef<X64::I64>(BaseReg)); | ||
| _ add(X64Reg::getRegRef<X64::I64>(MemAddrReg), ABI.getMemoryBaseReg()); | ||
| Addr = asmjit::x86::Mem(X64Reg::getRegRef<X64::I64>(MemAddrReg), 0, | ||
| getWASMTypeSize(SrcType)); | ||
| getWASMTypeSize<SrcType>()); | ||
| } |
There was a problem hiding this comment.
The large-offset handling is gated on !Base.isImm(), but the disp32 sign-extension problem also affects the Base.isImm() addressing form (e.g., i32.const 0; *.load/store offset>=0x80000000). Additionally, the current Base.isImm() path clamps Offset to INT32_MAX, which would miscompile valid in-bounds accesses above 2GiB. Consider extending the 64-bit explicit-address computation to cover immediate bases as well (materialize the full effective address in a GP register and use [reg]), and remove the clamping-as-invalid approach.
| if (!Base.isImm() && (Offset >= (uint32_t)INT32_MAX)) { | ||
| auto MemAddrReg = Layout.getScopedTemp<AddrType, ScopedTempReg2>(); | ||
| _ mov(X64Reg::getRegRef<X64::I32>(MemAddrReg), Offset); | ||
| _ add(X64Reg::getRegRef<X64::I64>(MemAddrReg), | ||
| X64Reg::getRegRef<X64::I64>(BaseReg)); |
There was a problem hiding this comment.
The Offset >= (uint32_t)INT32_MAX threshold is slightly off for the disp32 sign-extension issue: sign-extension becomes negative starting at 0x80000000 (i.e., Offset > INT32_MAX). Using >= INT32_MAX will unnecessarily take the slower explicit-address path for Offset == 0x7fffffff, which is still representable as a positive disp32.
1. Does this PR affect any open issues?(Y/N) and add issue references (e.g. "fix #123", "re #123".):
2. What is the scope of this PR (e.g. component or file name):
3. Provide a description of the PR(e.g. more details, effects, motivations or doc link):
4. Are there any breaking changes?(Y/N) and describe the breaking changes(e.g. more details, motivations or doc link):
5. Are there test cases for these changes?(Y/N) select and add more details, references or doc links:
6. Release note