-
Notifications
You must be signed in to change notification settings - Fork 5.2k
[clr-interp] Fix additional iteration in Copy_Ref implementation on arm64 #120879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Copy_Ref copies memory from one location to another. The implementation moves 16 bytes chunks, then 8 byte chunks and finally one byte at a time. This commit removes the loop over the 8 byte chunks because it can never happen. We can copy at most once 8 bytes at a time, given we already copied the memory previously in 16 byte chunks.
…rm64 After a loop iteration is done, the code is subtracting 16, obtaining the current remaining counter. If the remaining counter is bigger than 0 then we do another iteration. This is wrong, because we can only do another 16byte copy iteration if the remaining counter is bigger or equal to 16. In order to be correct, the code would have to branch to the very beginning of the macro. In case of negative result, we would fallback to 8byte copy, but, if the result is negative, it means we already overcopied, so this implementation is wrong. The fix makes it such that when we subtract 16 after an iteration, we don't obtain the current remaining counter, rather the potential remaining counter, in case we did another copy iteration. If this counter is positive, then it is safe to do another iteration, if it is negative then we can't copy 16 bytes at a time, fallback to the next case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a logic error in the Copy_Ref macro implementation for ARM64 assembly that could cause an additional unintended loop iteration and potential over-copying of memory. The fix adjusts the loop counter calculation to prevent copying beyond the intended boundary by computing the potential remaining counter before deciding whether to continue the loop.
Key changes:
- Pre-decrement the counter before entering the 16-byte copy loop to correctly determine if another iteration is safe
- Change loop condition from
bgt(greater than) tobge(greater or equal) to align with the new counter logic - Remove redundant 8-byte copy loop since it can execute at most once
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/coreclr/vm/arm64/asmhelpers.asm | Fixes Copy_Ref macro loop logic for Windows ARM64 assembly |
| src/coreclr/vm/arm64/asmhelpers.S | Applies identical fix to Copy_Ref macro for Unix ARM64 assembly |
|
This could also be fixed with more straightforward code that does separate compare: |
nit: it has one-more instruction in the loop compared to the current ( .macro Copy_Ref argReg
cmp x11, #16
blt LOCAL_LABEL(CopyBy8\argReg)
LOCAL_LABEL(RefCopyLoop16\argReg):
ldp x13, x14, [\argReg], #16
stp x13, x14, [x9], #16
subs x11, x11, #16
bge LOCAL_LABEL(RefCopyLoop16\argReg)
LOCAL_LABEL(CopyBy8\argReg): |
|
@am11 Yeah, the extra instruction is why I left the diff as is, but I don't believe there is any meaningful difference between the two implementations. I'm not sure I understand the example you provided, since it exhibits the issue that I was trying to fix. For example, for copy size 24, you do the pair copy once (16 bytes written so far), then you subtract x11 (24) with 16. Because the comparison result is greater than, then you run the loop one more time, copying already 32 bytes, which is incorrect. Did you mean to add some other instructions to the sample code ? |
|
cc @janvorli The gh bot tagging subscribers for the area seems to be down |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
After a loop iteration is done, the code is subtracting 16, obtaining the current remaining counter. If the remaining counter is bigger than 0 then we do another iteration. This is wrong, because we can only do another 16 byte copy iteration if the remaining counter is bigger or equal to 16. In order to be correct, the code would have to branch to the very beginning of the macro. In case of negative result, we would fallback to 8 byte copy, but, if the result is negative, it means we already overcopied, so this implementation is wrong.
The fix makes it such that when we subtract 16 after an iteration, we don't obtain the current remaining counter, rather the potential remaining counter, in case we did another copy iteration. If this counter is positive, then it is safe to do another iteration, if it is negative then we can't copy 16 bytes at a time, fallback to the next case.
The PR also removes redundant looping for 8 bytes copy scenario, which can be hit at most once.