[clr-interp] Fix additional iteration in Copy_Ref implementation on arm64 #120879

BrzVlad · 2025-10-19T13:12:29Z

After each loop iteration, the code subtracts 16 to obtain the remaining byte count, and runs another iteration if that count is greater than 0. This is wrong: another 16-byte copy is only safe if at least 16 bytes remain. To be correct, the code would have to branch back to the very beginning of the macro. As written, a negative result makes us fall back to the 8-byte copy, but a negative result means we have already overcopied.

With the fix, subtracting 16 after an iteration no longer yields the current remaining count; it yields the count that would remain if we did another 16-byte copy. If this value is non-negative, another iteration is safe. If it is negative, we cannot copy 16 bytes at a time and fall back to the next case.
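To make the counter arithmetic concrete, here is a minimal C model of the two schemes described above. It is only a sketch: the function names are hypothetical, the initial "fewer than 16 bytes" guard in the first function is an assumption based on the description, and the real code is ARM64 assembly rather than C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the loop as described before the fix: the counter
   holds the bytes still to copy, and the loop repeats while it is > 0.
   Returns the number of bytes the 16-byte stage would copy. */
static int64_t bytes_copied_by16_before(int64_t remaining)
{
    int64_t copied = 0;
    if (remaining < 16)          /* assumed guard at the top of the macro */
        return copied;
    do {
        copied += 16;            /* one 16-byte pair copy */
        remaining -= 16;         /* current remaining count */
    } while (remaining > 0);     /* models bgt: too permissive */
    return copied;
}

/* Hypothetical model of the fixed scheme: the counter is biased by -16, so
   after each subtraction it holds what would remain after one more copy. */
static int64_t bytes_copied_by16_after(int64_t remaining, int64_t *left_over)
{
    int64_t copied = 0;
    remaining -= 16;             /* pre-decrement before entering the loop */
    while (remaining >= 0) {     /* models bge on the biased counter */
        copied += 16;
        remaining -= 16;
    }
    remaining += 16;             /* undo the bias for the later stages */
    assert(remaining >= 0 && remaining < 16);
    *left_over = remaining;
    return copied;
}
```

For a 24-byte copy, the first model reports 32 bytes copied by the 16-byte stage, while the second reports 16 bytes copied with 8 left over for the 8-byte stage.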

The PR also removes the redundant loop in the 8-byte copy path, which can be hit at most once.

Copy_Ref copies memory from one location to another. The implementation moves 16-byte chunks, then 8-byte chunks, and finally one byte at a time. This commit removes the loop over the 8-byte chunks because a second iteration can never happen: once the memory has been copied in 16-byte chunks, fewer than 16 bytes remain, so at most one 8-byte copy is possible.
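The staged copy can be sketched in C as follows. This is an illustrative model under stated assumptions, not the macro itself: `copy_ref_model` and `copy_stats` are hypothetical names, and the per-stage counters exist only to show that the 8-byte stage runs at most once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bookkeeping: how many times each stage ran. */
typedef struct {
    size_t chunks16;
    size_t chunks8;
    size_t bytes1;
} copy_stats;

/* Hypothetical C model of a 16 / 8 / 1 byte staged copy. */
static copy_stats copy_ref_model(unsigned char *dst, const unsigned char *src,
                                 size_t n)
{
    copy_stats s = {0, 0, 0};

    while (n >= 16) {            /* 16-byte stage */
        memcpy(dst, src, 16);
        dst += 16; src += 16; n -= 16;
        s.chunks16++;
    }
    assert(n < 16);              /* so a second 8-byte copy is impossible */

    if (n >= 8) {                /* 8-byte stage: a plain if, not a loop */
        memcpy(dst, src, 8);
        dst += 8; src += 8; n -= 8;
        s.chunks8++;
    }

    while (n > 0) {              /* byte-at-a-time tail */
        *dst++ = *src++;
        n--;
        s.bytes1++;
    }
    return s;
}
```

Copying 31 bytes with this model takes one 16-byte chunk, one 8-byte chunk, and 7 single bytes.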

Copilot

Pull Request Overview

This PR fixes a logic error in the Copy_Ref macro implementation for ARM64 assembly that could cause an additional unintended loop iteration and potential over-copying of memory. The fix adjusts the loop counter calculation to prevent copying beyond the intended boundary by computing the potential remaining counter before deciding whether to continue the loop.

Key changes:

Pre-decrement the counter before entering the 16-byte copy loop to correctly determine if another iteration is safe
Change loop condition from bgt (greater than) to bge (greater or equal) to align with the new counter logic
Remove redundant 8-byte copy loop since it can execute at most once

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
src/coreclr/vm/arm64/asmhelpers.asm	Fixes Copy_Ref macro loop logic for Windows ARM64 assembly
src/coreclr/vm/arm64/asmhelpers.S	Applies identical fix to Copy_Ref macro for Unix ARM64 assembly

BrzVlad · 2025-10-19T13:13:47Z

This could also be fixed with more straightforward code that does a separate compare:

.macro Copy_Ref argReg
    cmp x11, #16
    blt LOCAL_LABEL(CopyBy8\argReg)
LOCAL_LABEL(RefCopyLoop16\argReg):
    ldp x13, x14, [\argReg], #16
    stp x13, x14, [x9], #16
    sub x11, x11, #16
    cmp x11, #16
    bge LOCAL_LABEL(RefCopyLoop16\argReg)
LOCAL_LABEL(CopyBy8\argReg):

am11 · 2025-10-19T14:54:31Z

This could also be fixed with more straightforward code that does a separate compare:

nit: it has one more instruction in the loop than the current version (sub+cmp+bge vs. subs+bge, plus an add after the loop). We can probably use bhs (branch if higher or same, i.e. unsigned >=) without needing to rectify after the loop.

.macro Copy_Ref argReg
    cmp x11, #16
    blt LOCAL_LABEL(CopyBy8\argReg)
LOCAL_LABEL(RefCopyLoop16\argReg):
    ldp x13, x14, [\argReg], #16
    stp x13, x14, [x9], #16
    subs x11, x11, #16
    bge LOCAL_LABEL(RefCopyLoop16\argReg)
LOCAL_LABEL(CopyBy8\argReg):

BrzVlad · 2025-10-19T20:34:53Z

@am11 Yeah, the extra instruction is why I left the diff as is, but I don't believe there is any meaningful difference between the two implementations. I'm not sure I understand the example you provided, since it exhibits the issue that I was trying to fix. For example, for copy size 24, you do the pair copy once (16 bytes written so far), then you subtract 16 from x11 (24), leaving 8. Because the result is non-negative, bge takes the branch and the loop runs one more time, bringing the total to 32 bytes copied, which is incorrect. Did you mean to add some other instructions to the sample code?
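The size-24 walkthrough can be checked with a small C model of the two snippets posted in this thread. This is a sketch only: the function names are hypothetical, and each function mirrors just the branch conditions of the corresponding snippet, returning how many bytes the 16-byte loop would copy.

```c
#include <assert.h>
#include <stdint.h>

/* Models the "separate compare" snippet: sub, then cmp #16, then bge. */
static int64_t copied_with_separate_compare(int64_t n)
{
    int64_t copied = 0;
    assert(n >= 0);
    if (n < 16)                  /* cmp x11, #16 / blt */
        return copied;
    do {
        copied += 16;            /* ldp / stp */
        n -= 16;                 /* sub x11, x11, #16 */
    } while (n >= 16);           /* cmp x11, #16 / bge */
    return copied;
}

/* Models the subs + bge snippet: branch while the result is >= 0. */
static int64_t copied_with_subs_bge(int64_t n)
{
    int64_t copied = 0;
    assert(n >= 0);
    if (n < 16)                  /* cmp x11, #16 / blt */
        return copied;
    do {
        copied += 16;            /* ldp / stp */
        n -= 16;                 /* subs x11, x11, #16 */
    } while (n >= 0);            /* bge on the subs result */
    return copied;
}
```

With these models, a 24-byte copy gives 16 bytes for the separate-compare variant and 32 bytes for the subs + bge variant, matching the overcopy described above.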

BrzVlad · 2025-10-20T13:12:03Z

cc @janvorli The gh bot tagging subscribers for the area seems to be down

janvorli

LGTM, thank you!

BrzVlad added 2 commits October 19, 2025 14:03

Copilot AI review requested due to automatic review settings October 19, 2025 13:12

dotnet-policy-service bot assigned BrzVlad Oct 19, 2025

github-actions bot added the needs-area-label label Oct 19, 2025

Copilot AI reviewed Oct 19, 2025

BrzVlad added area-CodeGen-Interpreter-coreclr and removed needs-area-label labels Oct 19, 2025

janvorli reviewed Oct 20, 2025

janvorli simplification

093a5ca

build-analysis bot mentioned this pull request Oct 21, 2025

Test failure: baseservices/exceptions/stackoverflow/stackoverflowtester/stackoverflowtester.cmd #110173

Open

janvorli approved these changes Oct 21, 2025

janvorli merged commit 6549d43 into dotnet:main Oct 21, 2025
97 checks passed

dotnet-maestro bot mentioned this pull request Oct 21, 2025

[main] Source code updates from dotnet/runtime dotnet/dotnet#3024

Merged
