Skip to content

Conversation

@BrzVlad
Copy link
Member

@BrzVlad BrzVlad commented Oct 19, 2025

After a loop iteration is done, the code is subtracting 16, obtaining the current remaining counter. If the remaining counter is bigger than 0 then we do another iteration. This is wrong, because we can only do another 16 byte copy iteration if the remaining counter is bigger or equal to 16. In order to be correct, the code would have to branch to the very beginning of the macro. In case of negative result, we would fallback to 8 byte copy, but, if the result is negative, it means we already overcopied, so this implementation is wrong.

The fix makes it such that when we subtract 16 after an iteration, we don't obtain the current remaining counter, rather the potential remaining counter, in case we did another copy iteration. If this counter is positive, then it is safe to do another iteration, if it is negative then we can't copy 16 bytes at a time, fallback to the next case.

The PR also removes redundant looping for 8 bytes copy scenario, which can be hit at most once.

Copy_Ref copies memory from one location to another. The implementation moves 16 bytes chunks, then 8 byte chunks and finally one byte at a time. This commit removes the loop over the 8 byte chunks because it can never happen. We can copy at most once 8 bytes at a time, given we already copied the memory previously in 16 byte chunks.
…rm64

After a loop iteration is done, the code is subtracting 16, obtaining the current remaining counter. If the remaining counter is bigger than 0 then we do another iteration. This is wrong, because we can only do another 16byte copy iteration if the remaining counter is bigger or equal to 16. In order to be correct, the code would have to branch to the very beginning of the macro. In case of negative result, we would fallback to 8byte copy, but, if the result is negative, it means we already overcopied, so this implementation is wrong.

The fix makes it such that when we subtract 16 after an iteration, we don't obtain the current remaining counter, rather the potential remaining counter, in case we did another copy iteration. If this counter is positive, then it is safe to do another iteration, if it is negative then we can't copy 16 bytes at a time, fallback to the next case.
@Copilot Copilot AI review requested due to automatic review settings October 19, 2025 13:12
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Oct 19, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a logic error in the Copy_Ref macro implementation for ARM64 assembly that could cause an additional unintended loop iteration and potential over-copying of memory. The fix adjusts the loop counter calculation to prevent copying beyond the intended boundary by computing the potential remaining counter before deciding whether to continue the loop.

Key changes:

  • Pre-decrement the counter before entering the 16-byte copy loop to correctly determine if another iteration is safe
  • Change loop condition from bgt (greater than) to bge (greater or equal) to align with the new counter logic
  • Remove redundant 8-byte copy loop since it can execute at most once

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/coreclr/vm/arm64/asmhelpers.asm Fixes Copy_Ref macro loop logic for Windows ARM64 assembly
src/coreclr/vm/arm64/asmhelpers.S Applies identical fix to Copy_Ref macro for Unix ARM64 assembly

@BrzVlad
Copy link
Member Author

BrzVlad commented Oct 19, 2025

This could also be fixed with more straightforward code that does separate compare:

.macro Copy_Ref argReg
    cmp x11, #16
    blt LOCAL_LABEL(CopyBy8\argReg)
LOCAL_LABEL(RefCopyLoop16\argReg):
    ldp x13, x14, [\argReg], #16
    stp x13, x14, [x9], #16
    sub x11, x11, #16
    cmp x11, #16
    bge LOCAL_LABEL(RefCopyLoop16\argReg)
LOCAL_LABEL(CopyBy8\argReg):

@BrzVlad BrzVlad added area-CodeGen-Interpreter-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Oct 19, 2025
@am11
Copy link
Member

am11 commented Oct 19, 2025

This could also be fixed with more straightforward code that does separate compare:

nit: it has one-more instruction in the loop compared to the current (sub+cmp+bge vs. subs+bge and add after the loop). We can probably use bhs (branch if higher or same: unsigned >= 0) without needing to rectify post loop.

.macro Copy_Ref argReg
    cmp x11, #16
    blt LOCAL_LABEL(CopyBy8\argReg)
LOCAL_LABEL(RefCopyLoop16\argReg):
    ldp x13, x14, [\argReg], #16
    stp x13, x14, [x9], #16
    subs x11, x11, #16
    bge LOCAL_LABEL(RefCopyLoop16\argReg)
LOCAL_LABEL(CopyBy8\argReg):

@BrzVlad
Copy link
Member Author

BrzVlad commented Oct 19, 2025

@am11 Yeah, the extra instruction is why I left the diff as is, but I don't believe there is any meaningful difference between the two implementations. I'm not sure I understand the example you provided, since it exhibits the issue that I was trying to fix. For example, for copy size 24, you do the pair copy once (16 bytes written so far), then you subtract x11 (24) with 16. Because the comparison result is greater than, then you run the loop one more time, copying already 32 bytes, which is incorrect. Did you mean to add some other instructions to the sample code ?

@BrzVlad
Copy link
Member Author

BrzVlad commented Oct 20, 2025

cc @janvorli The gh bot tagging subscribers for the area seems to be down

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@janvorli janvorli merged commit 6549d43 into dotnet:main Oct 21, 2025
97 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants