-
Notifications
You must be signed in to change notification settings - Fork 5.2k
JIT: fix crash in LSRA seen on VMR build on AVX-512 machines #119206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
During the second stage bootstrap build VMR on an AVX-512 capable machine, we end up in `try_SPILL_COST` looking at a K-reg spill candidate without an assigned interval, and crash. This happens because the preceding heuristic `try_REG_ORDER` fails to find a register when it should, because mask register numbers are greater than 63 and we shift 1ULL by this amount to build a mask, which is undefined behavior. The fix is to always look up the mask via table fetch, which is set up to handle mask register numbers properly. Fixes the crash seen in dotnet#119070.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a JIT crash that occurs during VMR builds on AVX-512 capable machines. The crash happens in the LSRA (Linear Scan Register Allocator) when processing mask registers with numbers greater than 63.
- Removes the AMD64-specific optimization that uses bit shifting to generate register masks
- Standardizes on table lookup for all architectures to handle mask register numbers properly
|
@dotnet/jit-contrib PTAL The handling of registers numbered 64 or higher in the allocator seems pretty fragile. The bug here was probably overlooked as we added extra registers to ARM64 first and only more recently to Intel, and before this PR this method behaved differently on ARM64. I know we went back and for on this approach for quite a while, but I wish there was a bit more sanity checking that we're not mixing up what we're looking at. |
👍. I'm still of the opinion that having one big "register set" for all register files isn't the best long term approach; in part due to this fragility and extra expense. I think that it would be much better to have n separate register sets, one for each "register file". This keeps each at no more than 32-bits per "file" and makes it most of the handling much less error prone. The major issue with that is it requires a larger refactoring. I do think, however, that it will improve perf and maintainability long term since most instructions and scenarios only touch a single register file. There will be a couple exceptions where we have some instruction that mixes register files (such as |
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
|
/backport to release/10.0 |
|
Started backporting to release/10.0: https://github.com/dotnet/runtime/actions/runs/17406514522 |
During the second stage bootstrap build VMR on an AVX-512 capable machine, we end up in
try_SPILL_COSTlooking at a K-reg spill candidate without an assigned interval, and crash.This happens because the preceding heuristic
try_REG_ORDERfails to find a register when it should, because mask register numbers are greater than 63 and we shift 1ULL by this amount to build a mask, which is undefined behavior.The fix is to always look up the mask via table fetch, which is set up to handle mask register numbers properly.
Fixes the crash seen in #119070.