@AndyAyersMS
Member

During the second stage bootstrap build of the VMR on an AVX-512 capable machine, we end up in `try_SPILL_COST` looking at a K-reg spill candidate without an assigned interval, and crash.

This happens because the preceding heuristic `try_REG_ORDER` fails to find a register when it should: mask register numbers are greater than 63, and we shift `1ULL` by this amount to build a mask, which is undefined behavior.

The fix is to always look up the mask via table fetch, which is set up to handle mask register numbers properly.
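
To illustrate the failure mode and the table-based alternative, here is a minimal sketch. It is not the JIT's actual code: the register numbering, the two-word `RegSet` type, and the names `maskByShift` and `maskByTable` are all assumptions made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical numbering: registers 0..63 fit in one 64-bit word,
// and mask (K) registers are numbered 64 and up.
constexpr unsigned kRegCount = 72;

// Hypothetical two-word register set.
struct RegSet
{
    uint64_t low;  // bits for registers 0..63
    uint64_t high; // bits for registers 64 and up
};

// Shift-based mask. Shifting a 64-bit value by 64 or more is undefined
// behavior in C++, so this is only valid for reg < 64.
inline uint64_t maskByShift(unsigned reg)
{
    assert(reg < 64); // sanity check: K-reg numbers must never reach here
    return 1ULL << reg;
}

// Builds the table entry for one register, picking the right word.
constexpr RegSet makeEntry(unsigned reg)
{
    return reg < 64 ? RegSet{1ULL << reg, 0} : RegSet{0, 1ULL << (reg - 64)};
}

// Table with one precomputed entry per register number.
struct MaskTable
{
    RegSet entries[kRegCount];

    constexpr MaskTable() : entries{}
    {
        for (unsigned r = 0; r < kRegCount; ++r)
        {
            entries[r] = makeEntry(r);
        }
    }
};

constexpr MaskTable kMaskTable{};

// Table-based mask: well defined for every register number in range.
inline RegSet maskByTable(unsigned reg)
{
    assert(reg < kRegCount);
    return kMaskTable.entries[reg];
}
```

In this sketch, `maskByTable(64)` yields a set with only the lowest bit of `high` set, whereas `1ULL << 64` has no defined result at all, so a compiler is free to produce a mask that matches nothing.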

Fixes the crash seen in #119070.

Copilot AI review requested due to automatic review settings August 29, 2025 14:46
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Aug 29, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes a JIT crash that occurs during VMR builds on AVX-512 capable machines. The crash happens in the LSRA (Linear Scan Register Allocator) when processing mask registers with numbers greater than 63.

  • Removes the AMD64-specific optimization that uses bit shifting to generate register masks
  • Standardizes on table lookup for all architectures to handle mask register numbers properly

@AndyAyersMS
Member Author

@dotnet/jit-contrib PTAL

The handling of registers numbered 64 or higher in the allocator seems pretty fragile. The bug here was probably overlooked as we added extra registers to ARM64 first and only more recently to Intel, and before this PR this method behaved differently on ARM64.

I know we went back and forth on this approach for quite a while, but I wish there was a bit more sanity checking that we're not mixing up what we're looking at.

@tannergooding
Member

> The handling of registers numbered 64 or higher in the allocator seems pretty fragile.

👍. I'm still of the opinion that having one big "register set" for all register files isn't the best long term approach; in part due to this fragility and extra expense.

I think that it would be much better to have n separate register sets, one for each "register file". This keeps each at no more than 32 bits per "file" and makes most of the handling much less error prone.

The major issue with that is it requires a larger refactoring. I do think, however, that it will improve perf and maintainability long term since most instructions and scenarios only touch a single register file. There will be a couple exceptions where we have some instruction that mixes register files (such as float input and int output), but these should be rare and have ways to reduce their cost as well.
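
As a rough illustration of the "one set per register file" idea, here is a hypothetical sketch. The `RegFile`, `RegRef`, and `PerFileRegSets` names are invented for this example and do not correspond to existing JIT types; the only detail taken from the comment above is that each file stays within 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register files; the real set depends on the target.
enum class RegFile : unsigned
{
    Int   = 0,
    Float = 1,
    Mask  = 2,
};

constexpr unsigned kRegFileCount = 3;

// A register is identified by its file plus an index within that file.
struct RegRef
{
    RegFile  file;
    unsigned index; // assumed to be < 32
};

// One 32-bit set per register file, so no shift count can reach the word width.
struct PerFileRegSets
{
    uint32_t sets[kRegFileCount] = {0, 0, 0};

    void add(RegRef r)
    {
        assert(r.index < 32);
        sets[static_cast<unsigned>(r.file)] |= (uint32_t{1} << r.index);
    }

    bool contains(RegRef r) const
    {
        assert(r.index < 32);
        return (sets[static_cast<unsigned>(r.file)] & (uint32_t{1} << r.index)) != 0;
    }
};
```

Because the file is part of the register's identity here, a mask register with index 1 cannot be confused with integer register 1, which is the kind of mix-up a single flat numbering makes possible.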

@AndyAyersMS
Member Author

Waiting to be sure it fixes things for @omajid as well.

No diffs. Shows as a small TP win which is probably misleading.

@AndyAyersMS AndyAyersMS added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Aug 30, 2025
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS AndyAyersMS merged commit f579745 into dotnet:main Sep 2, 2025
115 checks passed
@AndyAyersMS
Member Author

/backport to release/10.0

@github-actions
Contributor

github-actions bot commented Sep 2, 2025

Started backporting to release/10.0: https://github.com/dotnet/runtime/actions/runs/17406514522

@github-actions github-actions bot locked and limited conversation to collaborators Oct 3, 2025