Skip to content

Conversation

@ehsankianifar
Copy link
Contributor

This commit adds support for the vreductionand, vreductionor, and vreductionxor opcodes on the IBM Z platform. These opcodes perform bitwise reduction across all lanes of a vector, producing a scalar result.

@ehsankianifar ehsankianifar marked this pull request as draft October 24, 2025 21:09
@ehsankianifar
Copy link
Contributor Author

Hi @r30shah @hzongaro ,
This PR implements vreductionand, vreductionor, and vreductionxor opcodes. I'll remove the draft tag when tests pass.

@0xdaryl
Copy link
Contributor

0xdaryl commented Oct 27, 2025

Please remove the "This commit" from your commit message. Just say what it does in the imperative voice (per the CG).

Add support for the vreductionand, vreductionor, and vreductionxor
opcodes on the IBM Z platform. These opcodes perform bitwise reduction
across all lanes of a vector, producing a scalar result.

signed-off-by: Ehsan Kiani Far <[email protected]>
@ehsankianifar ehsankianifar force-pushed the Z_ImplementReductionOpcodes branch from 7922204 to a400467 Compare October 27, 2025 20:51
@ehsankianifar ehsankianifar marked this pull request as ready for review October 27, 2025 20:53
Copy link
Contributor

@hzongaro hzongaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of minor comments and one technical question.

Also, may I ask you to update the commit comment and the pull request title and description to use the correct case for the various opcodes: vreductionAnd, vreductionOr and vreductionXor?

Comment on lines +1523 to +1525
// In each iteration, the vector is split in half: the first half stays in
// sourceReg, and the second half moves to the scratch register. The operation
// is then performed. This process repeats until all vector elements are combined.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not exactly true that the vector is split in half, is it? It might be better to say that the second doubleword element of the source register is copied to the scratch register and the operation is performed placing the intermediate result in the source register. Then the second word-sized element of the source is copied to the scratch register and the operation is performed placing the next intermediate result in the source register, and so on until the desired element size is reached, leaving the final result in element zero of the source register.

Or perhaps a small example would help to illustrate how this works. For instance, for TR::Int32 vreductionOr, where x indicates nibble values that we aren't interested in:

Initial sourceReg:             00000000 11111111 22222222 44444444
scratchReg after first VREP:   22222222 44444444 22222222 44444444
After first VO, sourceReg:     22222222 55555555 xxxxxxxx xxxxxxxx
scratchReg after second VREP:  55555555 55555555 55555555 55555555
After second VO, sourceReg:    77777777 xxxxxxxx xxxxxxxx xxxxxxxx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hzongaro,
I will update the comment as you suggested.

}
TR::Register *resultReg = cg->allocateRegister();
// The final result is stored in element 0 of the source register.
generateVRScInstruction(cg, TR::InstOpCode::VLGV, node, resultReg, sourceReg, generateS390MemoryReference(0, cg),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the GPR need to be sign extended? I'm assuming that if the result of a TR::Int16 vreductionAnd was FFFF, say, that the GPR is expected to contain either the 32-bit value FFFFFFFF or the 64-bit value FFFFFFFF FFFFFFFF, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently We are not sign extending the value as the returned datatype is similar to lane size. A casting node (e.g. i2l) is expected if sign extension is required. I see that it can produce bugs if we miss the casting and perform arithmetic operations on this value.

Copy link
Contributor

@hzongaro hzongaro Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether your response means that you will modify this to ensure sign extension happens or it means that you don't think sign extension is necessary, as it's expected that an s2i would happen if necessary. Let me give an example where a vreductionAnd produces a TR::Int16 result 0xFFFF. The value that is loaded into the result register by the VLGV at the end of the code generated for the vreductionAnd will be 32767 (0x0000FFFF) if I understand correctly. Now suppose that the vreductionAnd is the operand of an sshr (shift short integer arithmetically) operation:

sshr
   vreductionAnd  
      ...
   iconst 15

I believe that the expected result of this operation is the TR::Int16 result -1 (0xFFFF). Will z code generation sign extend the value in the register before performing a 32-bit SRA operation to ensure that result is produced, or does it expect the value in the 32-bit register to have already been sign extended, which would yield an incorrect result of 1 (0x0001) in this case?

@r30shah, I would appreciate your insights into what expectations z code generation has about sign extension of intermediate values of type TR::Int16 and TR::Int8 that are being held in registers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. I work on instructions that sign extend the result and will check the created trees.

@hzongaro hzongaro self-assigned this Oct 30, 2025
@ehsankianifar ehsankianifar marked this pull request as draft November 5, 2025 19:16
@ehsankianifar
Copy link
Contributor Author

I convert this PR to WIP until investigating the sign extension and updating the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants