-
Notifications
You must be signed in to change notification settings - Fork 410
Z: Implement vreductionand, vreductionor, and vreductionxor #7998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Z: Implement vreductionand, vreductionor, and vreductionxor #7998
Conversation
0961079 to
f9ec52f
Compare
|
Please remove the "This commit" from your commit message. Just say what it does in the imperative voice (per the CG). |
Add support for the vreductionand, vreductionor, and vreductionxor opcodes on the IBM Z platform. These opcodes perform bitwise reduction across all lanes of a vector, producing a scalar result. signed-off-by: Ehsan Kiani Far <[email protected]>
7922204 to
a400467
Compare
hzongaro
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of minor comments and one technical question.
Also, may I ask you to update the commit comment and the pull request title and description to use the correct case for the various opcodes: vreductionAnd, vreductionOr and vreductionXor?
| // In each iteration, the vector is split in half: the first half stays in | ||
| // sourceReg, and the second half moves to the scratch register. The operation | ||
| // is then performed. This process repeats until all vector elements are combined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not exactly true that the vector is split in half, is it? It might be better to say that the second doubleword element of the source register is copied to the scratch register and the operation is performed placing the intermediate result in the source register. Then the second word-sized element of the source is copied to the scratch register and the operation is performed placing the next intermediate result in the source register, and so on until the desired element size is reached, leaving the final result in element zero of the source register.
Or perhaps a small example would help to illustrate how this works. For instance, for TR::Int32 vreductionOr, where x indicates nibble values that we aren't interested in:
Initial sourceReg: 00000000 11111111 22222222 44444444
scratchReg after first VREP: 22222222 44444444 22222222 44444444
After first VO, sourceReg: 22222222 55555555 xxxxxxxx xxxxxxxx
scratchReg after second VREP: 55555555 55555555 55555555 55555555
After second VO, sourceReg: 77777777 xxxxxxxx xxxxxxxx xxxxxxxx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hzongaro,
I will update the comment as you suggested.
| } | ||
| TR::Register *resultReg = cg->allocateRegister(); | ||
| // The final result is stored in element 0 of the source register. | ||
| generateVRScInstruction(cg, TR::InstOpCode::VLGV, node, resultReg, sourceReg, generateS390MemoryReference(0, cg), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the GPR need to be sign extended? I'm assuming that if the result of a TR::Int16 vreductionAnd was FFFF, say, that the GPR is expected to contain either the 32-bit value FFFFFFFF or the 64-bit value FFFFFFFF FFFFFFFF, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently We are not sign extending the value as the returned datatype is similar to lane size. A casting node (e.g. i2l) is expected if sign extension is required. I see that it can produce bugs if we miss the casting and perform arithmetic operations on this value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether your response means that you will modify this to ensure sign extension happens or it means that you don't think sign extension is necessary, as it's expected that an s2i would happen if necessary. Let me give an example where a vreductionAnd produces a TR::Int16 result 0xFFFF. The value that is loaded into the result register by the VLGV at the end of the code generated for the vreductionAnd will be 32767 (0x0000FFFF) if I understand correctly. Now suppose that the vreductionAnd is the operand of an sshr (shift short integer arithmetically) operation:
sshr
vreductionAnd
...
iconst 15
I believe that the expected result of this operation is the TR::Int16 result -1 (0xFFFF). Will z code generation sign extend the value in the register before performing a 32-bit SRA operation to ensure that result is produced, or does it expect the value in the 32-bit register to have already been sign extended, which would yield an incorrect result of 1 (0x0001) in this case?
@r30shah, I would appreciate your insights into what expectations z code generation has about sign extension of intermediate values of type TR::Int16 and TR::Int8 that are being held in registers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. I work on instructions that sign extend the result and will check the created trees.
|
I convert this PR to WIP until investigating the sign extension and updating the code. |
This commit adds support for the vreductionand, vreductionor, and vreductionxor opcodes on the IBM Z platform. These opcodes perform bitwise reduction across all lanes of a vector, producing a scalar result.