Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

End to end support for bfp16 mac/mul intrinsics #279

Merged
merged 2 commits into from
Jan 23, 2025

Conversation

niwinanto
Copy link
Collaborator

No description provided.

case Intrinsic::aie2p_BFP576_BFP1152_ACC2048_mul_conf:
case Intrinsic::aie2p_BFP576_BFP1152_ACC2048_negmul_conf:
MI = MIB.buildInstr(TII.getOpCode(I), {DstReg},
{Src1Reg, Src2Reg, /*Mode*/ I.getOperand(8).getReg()});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clang-tidy can help with those comments, like: /*Mode=*/I.getOperand(8).getReg()}

.addImm(AIE2P::sub_bfp576_lo)
.addReg(Src2RegSub1)
.addImm(AIE2P::sub_bfp576_hi);
constrainOperandRegClass(*MF, TRI, MRI, TII, RBI, *RegSeq1,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we have a lambda where we pass something like constrainRegSeq(RegSeq1, AIE2P::VEC512RegClass, 1); ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@@ -18,6 +18,8 @@
#include "AIE2PTargetMachine.h"
#include "AIEBaseInstructionSelector.h"
#include "MCTargetDesc/aie2p/AIE2PMCTargetDesc.h"
#include "llvm/ADT/ArrayRef.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those includes are not necessary here.

@niwinanto niwinanto force-pushed the niwin.bfp16.mac_mul2 branch 2 times, most recently from 879d252 to 94481ed Compare January 20, 2025 11:20
Copy link
Collaborator

@khallouh khallouh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add register bank select tests for these intrinsics? No need to test every single type, only each intrinsic would be enough, as all vector types follow the same code path.

@niwinanto
Copy link
Collaborator Author

Could you add register bank select tests for these intrinsics? No need to test every single type, only each intrinsic would be enough, as all vector types follow the same code path.

Done!

@@ -38827,6 +38827,376 @@ addmsc_elem_64_conf(v64bfloat16 a, int sgn_x, v64bfloat16 b, int sgn_y,
conf);
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: 2025 for this file?

@@ -301,6 +301,45 @@ def : PatInaccessibleMem<(int_aie2p_I512_I512_ACC512_bf_mac_conf VEC512:$s1, VEC
def : PatInaccessibleMem<(int_aie2p_I512_I512_ACC512_bf_mul_conf VEC512:$s1, VEC512:$s2, eR:$acc),
(EXTRACT_SUBREG (VMUL_f_vmul_bf_vmul_bf_core_X_X VEC512:$s1, VEC512:$s2, eR:$acc), sub_512_acc_lo)>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: 2024-2025 in the header

ConstrainRegSeq(*RegSeq2, AIE2P::EXPVEC64RegClass, 3);
ConstrainRegSeq(*RegSeq3, AIE2P::VEC512RegClass, 1);
ConstrainRegSeq(*RegSeq3, AIE2P::EXPVEC64RegClass, 3);
ConstrainRegSeq(*RegSeq4, AIE2P::mEXaRegClass, 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: VEC576 instead of mEXa. You could add this if not already there.

// BFP16 MAC MUL
class AIE2PMUL_BFP16V64BFP16
: Intrinsic<[llvm_v64i32_ty], [llvm_v64i8_ty, llvm_v8i8_ty, llvm_v64i8_ty, llvm_v8i8_ty, llvm_i32_ty],
[IntrNoMem]>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These intrinsics read control registers. To model this correctly, we set the following attributes [IntrReadMem, IntrInaccessibleMemOnly] Could you update?

@@ -2973,6 +2984,88 @@ AIE2PInstructionSelector::getCombinedOpcodeSRSUPS(
return {};
}

bool AIE2PInstructionSelector ::selectVMAC_BFP(MachineInstr &I,
MachineRegisterInfo &MRI) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have patterns for all these intrinsics. I tried this one and it works after defining the 2 higher composed sub-register indices. See below:

def : PatInaccessibleMem<(int_aie2p_BFP576_BFP1152_ACC2048_mul_conf VEC512:$s1, EXPVEC64:$s2, VEC512:$s3, VEC512:$s4, EXPVEC64:$s5, EXPVEC64:$s6, eR:$acc),
           (VMUL_f_vmul_bfp_vmul_bfp_core_EX_EY 
            (REG_SEQUENCE mEXv, VEC512:$s1, sub_bfp16_x, EXPVEC64:$s2, sub_bfp16_e), 
            (REG_SEQUENCE eEY, 
              VEC512:$s3, sub_bfp16_x, EXPVEC64:$s5, sub_bfp16_e, VEC512:$s4, sub_bfp576_hi_bfp16_x, EXPVEC64:$s6, sub_bfp576_hi_bfp16_e), 
            eR:$acc)>;

In AIE2PRegisterInfo.td:

def sub_bfp576_hi_bfp16_e : ComposedSubRegIndex<sub_bfp576_hi, sub_bfp16_e>;
def sub_bfp576_hi_bfp16_x : ComposedSubRegIndex<sub_bfp576_hi, sub_bfp16_x>;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@khallouh This is really nice. I could not make it working with REG_SEQUENCE. Thanks for spending time on it.

@niwinanto niwinanto force-pushed the niwin.bfp16.mac_mul2 branch 2 times, most recently from 08814d5 to 050b95c Compare January 23, 2025 07:55
@niwinanto niwinanto force-pushed the niwin.bfp16.mac_mul2 branch from 050b95c to 535eba3 Compare January 23, 2025 09:39
Copy link
Collaborator

@khallouh khallouh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the intrinsics to the isAccIntrinsic in RegBankSelect? And to test this store the result of the MAC operations instead of copying it into an accumulator register.
The rest looks good, thanks for addressing the comments.

@niwinanto niwinanto force-pushed the niwin.bfp16.mac_mul2 branch from 535eba3 to e66e24e Compare January 23, 2025 10:51
@niwinanto
Copy link
Collaborator Author

Could you add the intrinsics to the isAccIntrinsic in RegBankSelect? And to test this store the result of the MAC operations instead of copying it into an accumulator register. The rest looks good, thanks for addressing the comments.

Done. For the register bank tests instead of copying to accumulator register, returning the virtual register. This will do the trick.

Copy link
Collaborator

@khallouh khallouh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@niwinanto niwinanto merged commit 81f4c1f into aie-public Jan 23, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants