[AIE2P] Legalize and select VMUL.f from G_FMUL #360
base: aie-public
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -40,6 +40,8 @@ class VecConf { | |
int BMODE_16x16_b = 1; | ||
int BMODE_32x16 = 0; | ||
Review comment: Funny to have aliases here.
||
|
||
int VARIANT_BF16xBF16_1_elem_1 = 1; | ||
Review comment: Sounds as if there are more variants. List them all in one go?
Reply: I could, but I'm not sure we will ever be able to use all of them in any patterns.
Reply: It looks like the translation of a hardware enumeration into TableGen speak. I'm hoping that one day we'll have a single point of definition for these, and the full list would make them more recognisable.
||
|
||
bits<1> dynZeroAccum = 0; // 0 – Use default first accumulator input to the post-adder. 1 – Replace default first accumulator with zeros. | ||
bits<2> amode = 0; // Accumulator width (see above) | ||
bits<2> bmode = 0; // Multiplication precision (see above) | ||
|
@@ -59,6 +61,7 @@ class VecConf { | |
} | ||
|
||
def accfp32_vecconf : VecConf { let amode = AMODE_FP32; let bmode = BMODE_16x16; } | ||
def mulbf16_vecconf : VecConf { let amode = AMODE_FP32; let bmode = BMODE_16x16; let cmode = VARIANT_BF16xBF16_1_elem_1; } | ||
Review comment: Since this is a local definition, I wouldn't mind using CMODE as prefix.
||
|
||
/// Generic pattern classes | ||
class PatGpr<SDPatternOperator OpNode, AIE2PInst Inst, ValueType type> | ||
|
@@ -222,6 +225,26 @@ def : Pat<(fadd ACC2048:$acc1, ACC2048:$acc2), | |
def : Pat<(fsub ACC2048:$acc1, ACC2048:$acc2), | ||
(VSUB_f_vmac_cm2_add_reg ACC2048:$acc1, ACC2048:$acc2, (i32 accfp32_vecconf.ConfBits))>; | ||
|
||
// MUL | ||
def : Pat<(v64bf16 (fmul v64bf16:$vec1, v64bf16:$vec2)), | ||
Review comment: Check: we are performing the same multiplication twice, once to extract lo and once to extract hi. I guess we cannot express an optimized reuse of the same VMUL here, right?
||
(v64bf16 (REG_SEQUENCE VEC1024, | ||
(VCONV_bf16_fp32_mv_x_srs_bf | ||
(EXTRACT_SUBREG | ||
(VMUL_f_vmul_bf_vmul_bf_core_Y_Y VEC1024:$vec1, VEC1024:$vec2, (i32 mulbf16_vecconf.ConfBits)), | ||
sub_1024_acc_lo)), | ||
sub_512_lo, | ||
(VCONV_bf16_fp32_mv_x_srs_bf | ||
(EXTRACT_SUBREG | ||
(VMUL_f_vmul_bf_vmul_bf_core_Y_Y VEC1024:$vec1, VEC1024:$vec2, (i32 mulbf16_vecconf.ConfBits)), | ||
sub_1024_acc_hi)), | ||
sub_512_hi))>; | ||
|
||
def : Pat<(v32bf16 (fmul v32bf16:$vec1, v32bf16:$vec2)), | ||
Review comment: This isn't a standard legalization?
Reply: For this case, I don't know any but for the wider
||
(VCONV_bf16_fp32_mv_x_srs_bf | ||
(EXTRACT_SUBREG | ||
(VMUL_f_vmul_bf_vmul_bf_core_X_X VEC512:$vec1, VEC512:$vec2, (i32 mulbf16_vecconf.ConfBits)), | ||
sub_1024_acc_lo))>; | ||
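As a reading aid for the two patterns above, here is a small host-side C++ sketch of what the 64-lane selection appears to compute: an element-wise bf16 multiply into fp32 accumulator lanes, followed by a conversion of each accumulator half back to bf16 and a concatenation of the two halves. The helper names (`widen`, `narrow`, `vmulF`, `vconvHalf`, `fmul64`) are hypothetical and not part of the patch. Round-to-nearest-even and the mapping of lanes 0..31 to the low half are assumptions; on the target the rounding mode comes from `$crrnd`, and NaN handling is omitted here.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// bf16 is modelled as the upper 16 bits of an IEEE-754 binary32 value.
using bf16 = std::uint16_t;

// Widen bf16 to fp32 by placing it in the high half of the word.
static float widen(bf16 v) {
  std::uint32_t bits = static_cast<std::uint32_t>(v) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Narrow fp32 to bf16 with round-to-nearest-even (assumed mode, no NaN care).
static bf16 narrow(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  bits += 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<bf16>(bits >> 16);
}

using Vec64 = std::array<bf16, 64>;   // stands in for a VEC1024 operand
using Acc64 = std::array<float, 64>;  // stands in for the 2048-bit accumulator
using Half32 = std::array<bf16, 32>;  // stands in for one 512-bit result half

// Stands in for the VMUL.f step: bf16 x bf16 products kept as fp32 lanes.
static Acc64 vmulF(const Vec64 &a, const Vec64 &b) {
  Acc64 acc{};
  for (std::size_t i = 0; i < acc.size(); ++i)
    acc[i] = widen(a[i]) * widen(b[i]);
  return acc;
}

// Stands in for VCONV on one accumulator half (32 fp32 lanes -> 32 bf16).
static Half32 vconvHalf(const Acc64 &acc, std::size_t firstLane) {
  Half32 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = narrow(acc[firstLane + i]);
  return out;
}

// Stands in for the v64bf16 fmul pattern: convert lo and hi, then join them.
static Vec64 fmul64(const Vec64 &a, const Vec64 &b) {
  Acc64 acc = vmulF(a, b);            // computed once in this sketch
  Half32 lo = vconvHalf(acc, 0);      // sub_1024_acc_lo -> sub_512_lo
  Half32 hi = vconvHalf(acc, 32);     // sub_1024_acc_hi -> sub_512_hi
  Vec64 out{};
  std::copy(lo.begin(), lo.end(), out.begin());
  std::copy(hi.begin(), hi.end(), out.begin() + 32);
  return out;
}
```

The sketch computes the product once and converts both halves from it, whereas the TableGen pattern spells out the VMUL twice, which is the point raised in the review comment on the v64bf16 pattern.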
|
||
// VMUL/VMAC Intrinsics | ||
|
||
def : Pat<(int_aie2p_I1024_I1024_ACC2048_addmac_conf VEC1024:$s1, VEC1024:$s2, ACC2048:$acc1, ACC2048:$acc2, eR:$acc), | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -225,12 +225,17 @@ AIE2PLegalizerInfo::AIE2PLegalizerInfo(const AIE2PSubtarget &ST) | |
|
||
getActionDefinitionsBuilder(G_FABS).customFor({S16, S32, S64}).scalarize(0); | ||
|
||
getActionDefinitionsBuilder(G_FMUL) | ||
.legalFor({V64S16, V32S16}) | ||
.customFor({S16}) | ||
Review comment: Do we need to retain
Reply: We have custom legalization for S16 now, no need to clamp it to S32/S64. Any other scalar should be illegal.
Reply: I mean
Reply: But why? The only float type under 16 bits we have is bfloat (aka S16).
Reply: True. Just pointing out that we deviate from the old behavior, s128 to s64 or s8 to s32. But you are right, it does not make sense for these types.
Review comment: It would be nice to have a comment to explain why we would customize this for
Reply: We don't have an instruction to multiply bf16 scalars, so instead of using an inefficient and potentially unsafe libcall (e.g. in the case of hardware loops) we need custom legalization by inserting the bf16 scalar into a vector, performing the element-wise multiplication with
Reply: It's the same as for FADD / FSUB. We implement a scalar multiplication by a full element-by-element vector mul.
||
.libcallFor({S32, S64}); | ||
|
||
getActionDefinitionsBuilder({G_FADD, G_FSUB}) | ||
.legalFor({AccV64S32}) | ||
.customFor({S16}) | ||
.libcallFor({S32, S64}); | ||
|
||
- getActionDefinitionsBuilder({G_FMUL, G_FDIV, G_FREM}) | ||
+ getActionDefinitionsBuilder({G_FDIV, G_FREM}) | ||
.clampScalar(0, S32, S64) | ||
.libcallFor({S32, S64}); | ||
|
||
|
@@ -723,6 +728,8 @@ bool AIE2PLegalizerInfo::legalizeCustom( | |
case TargetOpcode::G_FADD: | ||
case TargetOpcode::G_FSUB: | ||
return AIEHelper.legalizeG_FADD_G_FSUB(Helper, MI); | ||
case TargetOpcode::G_FMUL: | ||
return AIEHelper.legalizeG_FMUL(Helper, MI); | ||
case TargetOpcode::G_BUILD_VECTOR: | ||
return AIEHelper.legalizeG_BUILD_VECTOR(Helper, MI); | ||
case TargetOpcode::G_UNMERGE_VALUES: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,7 +5,6 @@ | |
# (c) Copyright 2024 Advanced Micro Devices, Inc. or its affiliates | ||
|
||
# RUN: llc -mtriple aie2 -run-pass=legalizer %s -verify-machineinstrs -o - | FileCheck -DVER=2 --check-prefix=COMMON --check-prefix=AIE2 %s | ||
- # RUN: llc -mtriple aie2p -run-pass=legalizer %s -verify-machineinstrs -o - | FileCheck -DVER=2p --check-prefix=COMMON --check-prefix=AIE2P %s | ||
Review comment: We still have an AIE2P check line in the test. You could also remove
||
|
||
--- | ||
name: test_fmul_bfloat16 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py | ||
# | ||
# This file is licensed under the Apache License v2.0 with LLVM Exceptions. | ||
# See https://llvm.org/LICENSE.txt for license information. | ||
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
# | ||
# (c) Copyright 2025 Advanced Micro Devices, Inc. or its affiliates | ||
|
||
# RUN: llc -mtriple aie2p -run-pass=instruction-select %s -o - | FileCheck %s | ||
|
||
|
||
--- | ||
name: test_fmul_1024 | ||
legalized: true | ||
regBankSelected: true | ||
tracksRegLiveness: true | ||
body: | | ||
bb.1.entry: | ||
liveins: $y0, $y1 | ||
; CHECK-LABEL: name: test_fmul_1024 | ||
; CHECK: liveins: $y0, $y1 | ||
; CHECK-NEXT: {{ $}} | ||
; CHECK-NEXT: [[COPY:%[0-9]+]]:vec1024 = COPY $y0 | ||
; CHECK-NEXT: [[COPY1:%[0-9]+]]:vec1024 = COPY $y1 | ||
; CHECK-NEXT: [[MOV_RLC_imm11_pseudo:%[0-9]+]]:er = MOV_RLC_imm11_pseudo 60 | ||
; CHECK-NEXT: [[VMUL_f_vmul_bf_vmul_bf_core_Y_Y:%[0-9]+]]:edm = VMUL_f_vmul_bf_vmul_bf_core_Y_Y [[COPY]], [[COPY1]], [[MOV_RLC_imm11_pseudo]], implicit-def dead $srfpflags, implicit $crfpmask | ||
; CHECK-NEXT: [[MOV_RLC_imm11_pseudo1:%[0-9]+]]:er = MOV_RLC_imm11_pseudo 60 | ||
; CHECK-NEXT: [[VMUL_f_vmul_bf_vmul_bf_core_Y_Y1:%[0-9]+]]:edm = VMUL_f_vmul_bf_vmul_bf_core_Y_Y [[COPY]], [[COPY1]], [[MOV_RLC_imm11_pseudo1]], implicit-def dead $srfpflags, implicit $crfpmask | ||
; CHECK-NEXT: [[COPY2:%[0-9]+]]:ecmh = COPY [[VMUL_f_vmul_bf_vmul_bf_core_Y_Y1]].sub_1024_acc_hi | ||
; CHECK-NEXT: [[VCONV_bf16_fp32_mv_x_srs_bf:%[0-9]+]]:exo = VCONV_bf16_fp32_mv_x_srs_bf [[COPY2]], implicit-def dead $srf2fflags, implicit $crf2fmask, implicit $crrnd | ||
; CHECK-NEXT: [[COPY3:%[0-9]+]]:ecml = COPY [[VMUL_f_vmul_bf_vmul_bf_core_Y_Y]].sub_1024_acc_lo | ||
; CHECK-NEXT: [[VCONV_bf16_fp32_mv_x_srs_bf1:%[0-9]+]]:exe = VCONV_bf16_fp32_mv_x_srs_bf [[COPY3]], implicit-def dead $srf2fflags, implicit $crf2fmask, implicit $crrnd | ||
; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:vec1024 = REG_SEQUENCE [[VCONV_bf16_fp32_mv_x_srs_bf1]], %subreg.sub_512_lo, [[VCONV_bf16_fp32_mv_x_srs_bf]], %subreg.sub_512_hi | ||
; CHECK-NEXT: PseudoRET implicit $lr, implicit [[REG_SEQUENCE]] | ||
%0:vregbank(<64 x s16>) = COPY $y0 | ||
%1:vregbank(<64 x s16>) = COPY $y1 | ||
%2:vregbank(<64 x s16>) = G_FMUL %0, %1 | ||
PseudoRET implicit $lr, implicit %2 | ||
... | ||
|
||
--- | ||
name: test_fmul_512 | ||
legalized: true | ||
regBankSelected: true | ||
tracksRegLiveness: true | ||
body: | | ||
bb.1.entry: | ||
liveins: $x0, $x1 | ||
; CHECK-LABEL: name: test_fmul_512 | ||
; CHECK: liveins: $x0, $x1 | ||
; CHECK-NEXT: {{ $}} | ||
; CHECK-NEXT: [[COPY:%[0-9]+]]:vec512 = COPY $x0 | ||
; CHECK-NEXT: [[COPY1:%[0-9]+]]:vec512 = COPY $x1 | ||
; CHECK-NEXT: [[MOV_RLC_imm11_pseudo:%[0-9]+]]:er = MOV_RLC_imm11_pseudo 60 | ||
; CHECK-NEXT: [[VMUL_f_vmul_bf_vmul_bf_core_X_X:%[0-9]+]]:edm = VMUL_f_vmul_bf_vmul_bf_core_X_X [[COPY]], [[COPY1]], [[MOV_RLC_imm11_pseudo]], implicit-def dead $srfpflags, implicit $crfpmask | ||
; CHECK-NEXT: [[COPY2:%[0-9]+]]:ecml = COPY [[VMUL_f_vmul_bf_vmul_bf_core_X_X]].sub_1024_acc_lo | ||
; CHECK-NEXT: [[VCONV_bf16_fp32_mv_x_srs_bf:%[0-9]+]]:vec512 = VCONV_bf16_fp32_mv_x_srs_bf [[COPY2]], implicit-def dead $srf2fflags, implicit $crf2fmask, implicit $crrnd | ||
; CHECK-NEXT: PseudoRET implicit $lr, implicit [[VCONV_bf16_fp32_mv_x_srs_bf]] | ||
%0:vregbank(<32 x s16>) = COPY $x0 | ||
%1:vregbank(<32 x s16>) = COPY $x1 | ||
%2:vregbank(<32 x s16>) = G_FMUL %0, %1 | ||
PseudoRET implicit $lr, implicit %2 | ||
... |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4 | ||
# This file is licensed under the Apache License v2.0 with LLVM Exceptions. | ||
# See https://llvm.org/LICENSE.txt for license information. | ||
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
# | ||
# (c) Copyright 2024 Advanced Micro Devices, Inc. or its affiliates | ||
|
||
# RUN: llc -mtriple aie2p -run-pass=legalizer %s -verify-machineinstrs -o - | FileCheck %s | ||
Review comment: Would be nice to include the libcall tests as well.
Reply: We didn't have them before either, but I will add them while at it.
||
|
||
--- | ||
name: test_fmul_s16 | ||
body: | | ||
bb.0: | ||
liveins: $r1, $r2 | ||
; CHECK-LABEL: name: test_fmul_s16 | ||
; CHECK: liveins: $r1, $r2 | ||
; CHECK-NEXT: {{ $}} | ||
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $r1 | ||
; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32) | ||
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $r2 | ||
; CHECK-NEXT: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32) | ||
; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0 | ||
; CHECK-NEXT: [[DEF:%[0-9]+]]:_(<32 x s16>) = G_IMPLICIT_DEF | ||
; CHECK-NEXT: [[AIE_INSERT_VECTOR_ELT:%[0-9]+]]:_(<32 x s16>) = G_AIE_INSERT_VECTOR_ELT [[DEF]], [[TRUNC]](s16), [[C]](s32) | ||
; CHECK-NEXT: [[AIE_INSERT_VECTOR_ELT1:%[0-9]+]]:_(<32 x s16>) = G_AIE_INSERT_VECTOR_ELT [[DEF]], [[TRUNC1]](s16), [[C]](s32) | ||
; CHECK-NEXT: [[FMUL:%[0-9]+]]:_(<32 x s16>) = G_FMUL [[AIE_INSERT_VECTOR_ELT]], [[AIE_INSERT_VECTOR_ELT1]] | ||
; CHECK-NEXT: [[AIE_SEXT_EXTRACT_VECTOR_ELT:%[0-9]+]]:_(s32) = G_AIE_SEXT_EXTRACT_VECTOR_ELT [[FMUL]](<32 x s16>), [[C]](s32) | ||
; CHECK-NEXT: [[ASSERT_SEXT:%[0-9]+]]:_(s32) = G_ASSERT_SEXT [[AIE_SEXT_EXTRACT_VECTOR_ELT]], 16 | ||
; CHECK-NEXT: $r0 = COPY [[ASSERT_SEXT]](s32) | ||
; CHECK-NEXT: PseudoRET implicit $lr, implicit $r0 | ||
%0:_(s32) = COPY $r1 | ||
%1:_(s16) = G_TRUNC %0(s32) | ||
%2:_(s32) = COPY $r2 | ||
%3:_(s16) = G_TRUNC %2(s32) | ||
%4:_(s16) = G_FMUL %1, %3 | ||
%5:_(s32) = G_ANYEXT %4(s16) | ||
$r0 = COPY %5(s32) | ||
PseudoRET implicit $lr, implicit $r0 | ||
... | ||
|
||
--- | ||
name: test_fmul_vec_1024 | ||
body: | | ||
bb.0: | ||
liveins: $dm0, $dm1 | ||
; CHECK-LABEL: name: test_fmul_vec_1024 | ||
; CHECK: liveins: $dm0, $dm1 | ||
; CHECK-NEXT: {{ $}} | ||
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(<64 x s16>) = COPY $cml0 | ||
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(<64 x s16>) = COPY $cml1 | ||
; CHECK-NEXT: [[FMUL:%[0-9]+]]:_(<64 x s16>) = G_FMUL [[COPY]], [[COPY1]] | ||
; CHECK-NEXT: $cml0 = COPY [[FMUL]](<64 x s16>) | ||
; CHECK-NEXT: PseudoRET implicit $lr, implicit $cml0 | ||
%0:_(<64 x s16>) = COPY $cml0 | ||
%1:_(<64 x s16>) = COPY $cml1 | ||
%2:_(<64 x s16>) = G_FMUL %0, %1 | ||
$cml0 = COPY %2(<64 x s16>) | ||
PseudoRET implicit $lr, implicit $cml0 | ||
... | ||
|
||
--- | ||
name: test_fmul_vec_512 | ||
body: | | ||
bb.0: | ||
liveins: $dm0, $dm1 | ||
; CHECK-LABEL: name: test_fmul_vec_512 | ||
; CHECK: liveins: $dm0, $dm1 | ||
; CHECK-NEXT: {{ $}} | ||
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(<32 x s16>) = COPY $bmll0 | ||
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(<32 x s16>) = COPY $bmll1 | ||
; CHECK-NEXT: [[FMUL:%[0-9]+]]:_(<32 x s16>) = G_FMUL [[COPY]], [[COPY1]] | ||
; CHECK-NEXT: $bmll0 = COPY [[FMUL]](<32 x s16>) | ||
; CHECK-NEXT: PseudoRET implicit $lr, implicit $bmll0 | ||
%0:_(<32 x s16>) = COPY $bmll0 | ||
%1:_(<32 x s16>) = COPY $bmll1 | ||
%2:_(<32 x s16>) = G_FMUL %0, %1 | ||
$bmll0 = COPY %2(<32 x s16>) | ||
PseudoRET implicit $lr, implicit $bmll0 | ||
... | ||
|
Review comment: Wouldn't it be cheaper to broadcast? Or is this picked up by a push.lo?
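For readers following the S16 discussion, the custom legalization checked in `test_fmul_s16` (insert each scalar into lane 0 of an otherwise undefined `<32 x s16>` vector, do a vector `G_FMUL`, extract lane 0) can be modelled on the host roughly as follows. This is a minimal sketch, not the patch's implementation: the names `bf16ToFloat`, `floatToBf16`, `vectorFMul` and `scalarFMul` are hypothetical, round-to-nearest-even is an assumed rounding mode, NaN handling is omitted, and the undefined lanes are simply zero-filled.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// bf16 is modelled as the upper 16 bits of an IEEE-754 binary32 value.
using bf16 = std::uint16_t;
using Vec32 = std::array<bf16, 32>;  // stands in for <32 x s16>

// Widen bf16 to fp32 by placing it in the high half of the word.
float bf16ToFloat(bf16 v) {
  std::uint32_t bits = static_cast<std::uint32_t>(v) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Narrow fp32 to bf16 with round-to-nearest-even (assumed mode, no NaN care).
bf16 floatToBf16(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  bits += 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<bf16>(bits >> 16);
}

// Stands in for the legal vector G_FMUL on <32 x s16>.
Vec32 vectorFMul(const Vec32 &a, const Vec32 &b) {
  Vec32 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = floatToBf16(bf16ToFloat(a[i]) * bf16ToFloat(b[i]));
  return r;
}

// Stands in for the custom S16 legalization: scalar -> lane 0 -> scalar.
bf16 scalarFMul(bf16 x, bf16 y) {
  Vec32 a{};  // G_IMPLICIT_DEF; lanes 1..31 are don't-care, zero here
  Vec32 b{};
  a[0] = x;   // G_AIE_INSERT_VECTOR_ELT at index 0
  b[0] = y;   // G_AIE_INSERT_VECTOR_ELT at index 0
  Vec32 product = vectorFMul(a, b);
  return product[0];  // extract of element 0
}
```

With this model, `scalarFMul(0x3FC0, 0x4000)` multiplies 1.5 by 2.0 and yields `0x4040`, the bf16 encoding of 3.0. Only lane 0 carries meaning, which is why the question of broadcasting versus inserting is about cost rather than about the result.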