[AMDGPU] Add the support for `v_cvt_f32_bf16` on gfx1250 #145632

shiltian · 2025-06-25T03:36:45Z

Co-authored-by: Shilei Tian [email protected]

shiltian · 2025-06-25T03:37:09Z

This stack of pull requests is managed by Graphite.

shiltian · 2025-06-25T03:37:55Z

@rampitec For some reason, the instruction encoding works, but the decoder doesn't work. Those dpp decoding tests failed. It looks like we missed something here.

rampitec · 2025-06-25T07:56:14Z

> @rampitec For some reason, the instruction encoding works, but the decoder doesn't work. Those dpp decoding tests failed. It looks like we missed something here.

Check the generated lib/Target/AMDGPU/AMDGPUGenDisassemblerTables.inc and search for the instruction name to see which decoder table it ended up in. Then go to AMDGPUDisassembler.cpp and add a line that processes that table. DecoderTableGFX125064, perhaps?

I.e., this?

static const uint8_t DecoderTableGFX125064[] = {
...
/* 220 */     MCD::OPC_Decode, 138, 179, 2, 144, 3, // Opcode: V_CVT_F32_BF16V_CVT_F32_BF16_gfx1250_t16_dpp_gfx1250, DecodeIdx: 400
/* 226 */     MCD::OPC_CheckPredicateOrFail, 191, 2,
/* 229 */     MCD::OPC_Decode, 137, 179, 2, 145, 3, // Opcode: V_CVT_F32_BF16V_CVT_F32_BF16_gfx1250_t16_dpp8_gfx1250, DecodeIdx: 401
/* 235 */     MCD::OPC_FilterValue, 115, 36, 0, 0, // Skip to: 276

and this:

      if (isGFX1250() &&
          tryDecodeInst(DecoderTableGFX125064, DecoderTableGFX1250_FAKE1664, MI,
                        QW, Address, CS))
        break;

rampitec · 2025-06-25T09:03:02Z

Note that the FAKE16 table might not exist yet. I already have a couple of cases with TODO comments, because we cannot upstream everything at once, and we cannot roll back the t16 support that is already upstream.
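To illustrate why encoding can work while decoding fails, here is a minimal sketch of the "try each table in order" idea. It is a toy model, not the actual disassembler code: `DecoderTable`, `Attempt`, and `decode` are hypothetical names, and a real generated table is a byte-coded state machine rather than a map. The only point is that an instruction living in a table the disassembler never consults cannot be decoded.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a generated decoder table: encoding -> mnemonic.
using DecoderTable = std::unordered_map<uint64_t, std::string>;

// One entry in the disassembler's ordered list of decode attempts.
struct Attempt {
  bool SubtargetMatches;     // result of an isGFX1250()-style check (assumed)
  const DecoderTable *Table; // table consulted when the check passes
};

// Try each table in order; the first one that knows the encoding wins.
// Returns nullptr when no consulted table contains the encoding.
const std::string *decode(uint64_t Enc, const std::vector<Attempt> &Order) {
  for (const Attempt &A : Order) {
    if (!A.SubtargetMatches)
      continue;
    auto It = A.Table->find(Enc);
    if (It != A.Table->end())
      return &It->second;
  }
  return nullptr;
}
```

In this model, if the 64-bit DPP form of `v_cvt_f32_bf16` sits only in a GFX1250 table and the attempt list contains only the GFX12 table, `decode` returns `nullptr`. Adding a GFX1250 attempt ahead of the GFX12 one makes the same encoding decode.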

llvmbot · 2025-06-25T15:59:42Z

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-mc

Author: Shilei Tian (shiltian)

Changes

Co-authored-by: Shilei Tian <[email protected]>

Patch is 56.70 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145632.diff

21 Files Affected:

(modified) llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp (+12-3)
(modified) llvm/lib/Target/AMDGPU/VOP1Instructions.td (+23-2)
(added) llvm/test/MC/AMDGPU/gfx1250_asm_vop1-fake16.s (+47)
(added) llvm/test/MC/AMDGPU/gfx1250_asm_vop1.s (+50)
(added) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16-fake16.s (+59)
(added) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16.s (+63)
(added) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8-fake16.s (+15)
(added) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8.s (+19)
(added) llvm/test/MC/AMDGPU/gfx1250_asm_vop1_err.s (+16)
(added) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1-fake16.s (+77)
(added) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1.s (+80)
(added) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1_dpp16-fake16.s (+47)
(added) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1_dpp16.s (+51)
(added) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1_dpp8-fake16.s (+7)
(added) llvm/test/MC/AMDGPU/gfx1250_asm_vop3_from_vop1_dpp8.s (+11)
(added) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop1.txt (+51)
(added) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop1_dpp16.txt (+48)
(added) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop1_dpp8.txt (+15)
(added) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3_from_vop1.txt (+49)
(added) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3_from_vop1_dpp16.txt (+51)
(added) llvm/test/MC/Disassembler/AMDGPU/gfx1250_dasm_vop3_from_vop1_dpp8.txt (+11)

diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index 07a4292ef28bc..59c72fcbff18a 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -599,6 +599,11 @@ DecodeStatus AMDGPUDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
                         DecW, Address, CS))
         break;
 
+      if (isGFX1250() &&
+          tryDecodeInst(DecoderTableGFX125096, DecoderTableGFX1250_FAKE1696, MI,
+                        DecW, Address, CS))
+        break;
+
       if (isGFX12() &&
           tryDecodeInst(DecoderTableGFX1296, DecoderTableGFX12_FAKE1696, MI,
                         DecW, Address, CS))
@@ -661,6 +666,11 @@ DecodeStatus AMDGPUDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
       if (isGFX10() && tryDecodeInst(DecoderTableGFX1064, MI, QW, Address, CS))
         break;
 
+      if (isGFX1250() &&
+          tryDecodeInst(DecoderTableGFX125064, DecoderTableGFX1250_FAKE1664, MI,
+                        QW, Address, CS))
+        break;
+
       if (isGFX12() &&
           tryDecodeInst(DecoderTableGFX1264, DecoderTableGFX12_FAKE1664, MI, QW,
                         Address, CS))
@@ -717,10 +727,9 @@ DecodeStatus AMDGPUDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
                         Address, CS))
         break;
 
-      // FIXME: Should use DecoderTableGFX1250_FAKE1632, but it is not generated
-      //        yet.
       if (isGFX1250() &&
-          tryDecodeInst(DecoderTableGFX125032, MI, DW, Address, CS))
+          tryDecodeInst(DecoderTableGFX125032, DecoderTableGFX1250_FAKE1632, MI,
+                        DW, Address, CS))
         break;
 
       if (isGFX12() &&
diff --git a/llvm/lib/Target/AMDGPU/VOP1Instructions.td b/llvm/lib/Target/AMDGPU/VOP1Instructions.td
index 02b912bcfb9e0..4ef7a34261b6f 100644
--- a/llvm/lib/Target/AMDGPU/VOP1Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP1Instructions.td
@@ -259,6 +259,12 @@ foreach vt = Reg32Types.types in {
   >;
 }
 
+let HasOMod = 0, HasClamp = 0 in {
+  def VOPProfile_CVT_F32_BF16_gfx1250_t16 : VOPProfile_True16 <VOP_F32_BF16>;
+  let HasOpSel = 1, EmitDstSel = 0 in
+  def VOPProfile_CVT_F32_BF16_gfx1250_fake16 : VOPProfile_Fake16 <VOP_F32_BF16>;
+} // End HasOMod = 0, HasClamp = 0
+
 let isReMaterializable = 1 in {
 let SchedRW = [WriteDoubleCvt] in {
 // OMod clears exceptions when set in this instruction
@@ -309,8 +315,14 @@ let OtherPredicates = [UseRealTrue16Insts] in
 let OtherPredicates = [UseFakeTrue16Insts] in
   defm V_CVT_F32_F16_fake16 : VOP1Inst <"v_cvt_f32_f16_fake16", VOPProfile_Fake16<VOP_F32_F16>, any_fpextend>;
 
-let SubtargetPredicate = HasBF16ConversionInsts in
-defm V_CVT_F32_BF16 : VOP1Inst_t16 <"v_cvt_f32_bf16", VOP_F32_BF16>;
+let SubtargetPredicate = HasGFX950Insts, OtherPredicates = [HasBF16ConversionInsts] in {
+  defm V_CVT_F32_BF16 : VOP1Inst_t16 <"v_cvt_f32_bf16", VOP_F32_BF16>;
+}
+let SubtargetPredicate = isGFX1250Plus, OtherPredicates = [HasBF16ConversionInsts] in {
+  defm V_CVT_F32_BF16_gfx1250 : VOP1Inst_t16_with_profiles <"v_cvt_f32_bf16_gfx1250", VOP_F32_BF16,
+                                                            VOPProfile_CVT_F32_BF16_gfx1250_t16,
+                                                            VOPProfile_CVT_F32_BF16_gfx1250_fake16>;
+}
 
 let ReadsModeReg = 0, mayRaiseFPException = 0 in {
 defm V_CVT_RPI_I32_F32 : VOP1Inst <"v_cvt_rpi_i32_f32", VOP_I32_F32, cvt_rpi_i32_f32>;
@@ -980,6 +992,13 @@ multiclass VOP1_Real_NO_DPP_OP_SEL_with_name<GFXGen Gen, bits<9> op,
   VOP1_Real_e32_with_name<Gen, op, opName, asmName>,
   VOP3_Real_with_name<Gen, {0, 1, 1, op{6-0}}, opName, asmName>;
 
+multiclass VOP1_Real_FULL_t16_and_fake16_gfx1250<
+    bits<9> op, string asmName = !tolower(NAME), string opName = NAME> {
+  defm opName#"_t16" :
+       VOP1_Real_FULL_with_name<GFX1250Gen, op, opName#"_t16", asmName>;
+  defm opName#"_fake16":
+       VOP1_Real_FULL_with_name<GFX1250Gen, op, opName#"_fake16", asmName>;
+}
 
 defm V_CVT_F32_FP8      : VOP1_Real_FULL_with_name<GFX12Gen, 0x06c, "V_CVT_F32_FP8_OP_SEL", "v_cvt_f32_fp8">;
 defm V_CVT_F32_BF8      : VOP1_Real_FULL_with_name<GFX12Gen, 0x06d, "V_CVT_F32_BF8_OP_SEL", "v_cvt_f32_bf8">;
@@ -1042,6 +1061,8 @@ defm V_CVT_NORM_U16_F16      : VOP1_Real_FULL_t16_and_fake16_gfx11_gfx12<0x064>;
 defm V_CVT_F16_F32           : VOP1_Real_FULL_t16_and_fake16_gfx11_gfx12<0x00a>;
 defm V_CVT_F32_F16           : VOP1_Real_FULL_t16_and_fake16_gfx11_gfx12<0x00b>;
 
+defm V_CVT_F32_BF16          : VOP1_Real_FULL_t16_and_fake16_gfx1250<0x072, "v_cvt_f32_bf16", "V_CVT_F32_BF16_gfx1250">;
+
 //===----------------------------------------------------------------------===//
 // GFX10.
 //===----------------------------------------------------------------------===//
diff --git a/llvm/test/MC/AMDGPU/gfx1250_asm_vop1-fake16.s b/llvm/test/MC/AMDGPU/gfx1250_asm_vop1-fake16.s
new file mode 100644
index 0000000000000..9b2e506d4e043
--- /dev/null
+++ b/llvm/test/MC/AMDGPU/gfx1250_asm_vop1-fake16.s
@@ -0,0 +1,47 @@
+// NOTE: Assertions have been autogenerated by utils/update_mc_test_checks.py UTC_ARGS: --version 5
+// RUN: llvm-mc -triple=amdgcn -mcpu=gfx1250 -mattr=-real-true16 -show-encoding %s | FileCheck --check-prefix=GFX1250 %s
+
+v_cvt_f32_bf16 v5, v1
+// GFX1250: v_cvt_f32_bf16_e32 v5, v1               ; encoding: [0x01,0xe5,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, v127
+// GFX1250: v_cvt_f32_bf16_e32 v5, v127             ; encoding: [0x7f,0xe5,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, s1
+// GFX1250: v_cvt_f32_bf16_e32 v5, s1               ; encoding: [0x01,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, s105
+// GFX1250: v_cvt_f32_bf16_e32 v5, s105             ; encoding: [0x69,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, vcc_lo
+// GFX1250: v_cvt_f32_bf16_e32 v5, vcc_lo           ; encoding: [0x6a,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, vcc_hi
+// GFX1250: v_cvt_f32_bf16_e32 v5, vcc_hi           ; encoding: [0x6b,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, ttmp15
+// GFX1250: v_cvt_f32_bf16_e32 v5, ttmp15           ; encoding: [0x7b,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, m0
+// GFX1250: v_cvt_f32_bf16_e32 v5, m0               ; encoding: [0x7d,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, exec_lo
+// GFX1250: v_cvt_f32_bf16_e32 v5, exec_lo          ; encoding: [0x7e,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, exec_hi
+// GFX1250: v_cvt_f32_bf16_e32 v5, exec_hi          ; encoding: [0x7f,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, null
+// GFX1250: v_cvt_f32_bf16_e32 v5, null             ; encoding: [0x7c,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, -1
+// GFX1250: v_cvt_f32_bf16_e32 v5, -1               ; encoding: [0xc1,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, 0.5
+// GFX1250: v_cvt_f32_bf16_e32 v5, 0.5              ; encoding: [0xf0,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, src_scc
+// GFX1250: v_cvt_f32_bf16_e32 v5, src_scc          ; encoding: [0xfd,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v127, 0x8000
+// GFX1250: v_cvt_f32_bf16_e32 v127, 0x8000         ; encoding: [0xff,0xe4,0xfe,0x7e,0x00,0x80,0x00,0x00]
diff --git a/llvm/test/MC/AMDGPU/gfx1250_asm_vop1.s b/llvm/test/MC/AMDGPU/gfx1250_asm_vop1.s
new file mode 100644
index 0000000000000..1d1badc4f009b
--- /dev/null
+++ b/llvm/test/MC/AMDGPU/gfx1250_asm_vop1.s
@@ -0,0 +1,50 @@
+// NOTE: Assertions have been autogenerated by utils/update_mc_test_checks.py UTC_ARGS: --version 5
+// RUN: llvm-mc -triple=amdgcn -mcpu=gfx1250 -mattr=+real-true16 -show-encoding %s | FileCheck --check-prefix=GFX1250 %s
+
+v_cvt_f32_bf16 v5, v1
+// GFX1250: v_cvt_f32_bf16_e32 v5, v1               ; encoding: [0x01,0xe5,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, v127
+// GFX1250: v_cvt_f32_bf16_e32 v5, v127             ; encoding: [0x7f,0xe5,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, s1
+// GFX1250: v_cvt_f32_bf16_e32 v5, s1               ; encoding: [0x01,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, s105
+// GFX1250: v_cvt_f32_bf16_e32 v5, s105             ; encoding: [0x69,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, vcc_lo
+// GFX1250: v_cvt_f32_bf16_e32 v5, vcc_lo           ; encoding: [0x6a,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, vcc_hi
+// GFX1250: v_cvt_f32_bf16_e32 v5, vcc_hi           ; encoding: [0x6b,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, ttmp15
+// GFX1250: v_cvt_f32_bf16_e32 v5, ttmp15           ; encoding: [0x7b,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, m0
+// GFX1250: v_cvt_f32_bf16_e32 v5, m0               ; encoding: [0x7d,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, exec_lo
+// GFX1250: v_cvt_f32_bf16_e32 v5, exec_lo          ; encoding: [0x7e,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, exec_hi
+// GFX1250: v_cvt_f32_bf16_e32 v5, exec_hi          ; encoding: [0x7f,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, null
+// GFX1250: v_cvt_f32_bf16_e32 v5, null             ; encoding: [0x7c,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, -1
+// GFX1250: v_cvt_f32_bf16_e32 v5, -1               ; encoding: [0xc1,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, 0.5
+// GFX1250: v_cvt_f32_bf16_e32 v5, 0.5              ; encoding: [0xf0,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v5, src_scc
+// GFX1250: v_cvt_f32_bf16_e32 v5, src_scc          ; encoding: [0xfd,0xe4,0x0a,0x7e]
+
+v_cvt_f32_bf16 v127, 0x8000
+// GFX1250: v_cvt_f32_bf16_e32 v127, 0x8000         ; encoding: [0xff,0xe4,0xfe,0x7e,0x00,0x80,0x00,0x00]
+
+v_cvt_f32_bf16 v5, v1.h
+// GFX1250: v_cvt_f32_bf16_e32 v5, v1.h             ; encoding: [0x81,0xe5,0x0a,0x7e]
diff --git a/llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16-fake16.s b/llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16-fake16.s
new file mode 100644
index 0000000000000..18b6c91e2fb8c
--- /dev/null
+++ b/llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16-fake16.s
@@ -0,0 +1,59 @@
+// NOTE: Assertions have been autogenerated by utils/update_mc_test_checks.py UTC_ARGS: --version 5
+// RUN: llvm-mc -triple=amdgcn -mcpu=gfx1250 -mattr=-real-true16 -show-encoding %s | FileCheck --check-prefixes=GFX1250 %s
+// RUN: not llvm-mc -triple=amdgcn -mcpu=gfx1200 -mattr=-real-true16 -show-encoding %s 2>&1 | FileCheck --check-prefix=GFX12-ERR --implicit-check-not=error: --strict-whitespace %s
+
+v_cvt_f32_bf16 v5, v1 quad_perm:[3,2,1,0]
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 quad_perm:[3,2,1,0] row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x1b,0x00,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 quad_perm:[0,1,2,3]
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 quad_perm:[0,1,2,3] row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0xe4,0x00,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_mirror
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_mirror row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x40,0x01,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_half_mirror
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_half_mirror row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x41,0x01,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_shl:1
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_shl:1 row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x01,0x01,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_shl:15
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_shl:15 row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x0f,0x01,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_shr:1
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_shr:1 row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x11,0x01,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_shr:15
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_shr:15 row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x1f,0x01,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_ror:1
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_ror:1 row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x21,0x01,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_ror:15
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_ror:15 row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x2f,0x01,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_share:0 row_mask:0xf bank_mask:0xf
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_share:0 row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x50,0x01,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_share:15 row_mask:0x0 bank_mask:0x1
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_share:15 row_mask:0x0 bank_mask:0x1 ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x5f,0x01,0x01]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_xmask:0 row_mask:0x1 bank_mask:0x3 bound_ctrl:1 fi:0
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_xmask:0 row_mask:0x1 bank_mask:0x3 bound_ctrl:1 ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x60,0x09,0x13]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v127, -|v127| row_xmask:15 row_mask:0x3 bank_mask:0x0 bound_ctrl:0 fi:1
+// GFX1250: v_cvt_f32_bf16_dpp v127, -|v127| row_xmask:15 row_mask:0x3 bank_mask:0x0 fi:1 ; encoding: [0xfa,0xe4,0xfe,0x7e,0x7f,0x6f,0x35,0x30]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
diff --git a/llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16.s b/llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16.s
new file mode 100644
index 0000000000000..0f71c46eb4725
--- /dev/null
+++ b/llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp16.s
@@ -0,0 +1,63 @@
+// NOTE: Assertions have been autogenerated by utils/update_mc_test_checks.py UTC_ARGS: --version 5
+// RUN: llvm-mc -triple=amdgcn -mcpu=gfx1250 -mattr=+real-true16 -show-encoding %s | FileCheck --check-prefixes=GFX1250 %s
+// RUN: not llvm-mc -triple=amdgcn -mcpu=gfx1200 -mattr=+real-true16 -show-encoding %s 2>&1 | FileCheck --check-prefix=GFX12-ERR --implicit-check-not=error: --strict-whitespace %s
+
+v_cvt_f32_bf16 v5, v1 quad_perm:[3,2,1,0]
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 quad_perm:[3,2,1,0] row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x1b,0x00,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 quad_perm:[0,1,2,3]
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 quad_perm:[0,1,2,3] row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0xe4,0x00,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_mirror
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_mirror row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x40,0x01,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_half_mirror
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_half_mirror row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x41,0x01,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_shl:1
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_shl:1 row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x01,0x01,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_shl:15
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_shl:15 row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x0f,0x01,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_shr:1
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_shr:1 row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x11,0x01,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_shr:15
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_shr:15 row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x1f,0x01,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_ror:1
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_ror:1 row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x21,0x01,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_ror:15
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_ror:15 row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x2f,0x01,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_share:0 row_mask:0xf bank_mask:0xf
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_share:0 row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x50,0x01,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_share:15 row_mask:0x0 bank_mask:0x1
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_share:15 row_mask:0x0 bank_mask:0x1 ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x5f,0x01,0x01]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 row_xmask:0 row_mask:0x1 bank_mask:0x3 bound_ctrl:1 fi:0
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 row_xmask:0 row_mask:0x1 bank_mask:0x3 bound_ctrl:1 ; encoding: [0xfa,0xe4,0x0a,0x7e,0x01,0x60,0x09,0x13]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v127, -|v127| row_xmask:15 row_mask:0x3 bank_mask:0x0 bound_ctrl:0 fi:1
+// GFX1250: v_cvt_f32_bf16_dpp v127, -|v127| row_xmask:15 row_mask:0x3 bank_mask:0x0 fi:1 ; encoding: [0xfa,0xe4,0xfe,0x7e,0x7f,0x6f,0x35,0x30]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1.h quad_perm:[3,2,1,0]
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1.h quad_perm:[3,2,1,0] row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xe4,0x0a,0x7e,0x81,0x1b,0x00,0xff]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
diff --git a/llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8-fake16.s b/llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8-fake16.s
new file mode 100644
index 0000000000000..c973022dbeca6
--- /dev/null
+++ b/llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8-fake16.s
@@ -0,0 +1,15 @@
+// NOTE: Assertions have been autogenerated by utils/update_mc_test_checks.py UTC_ARGS: --version 5
+// RUN: llvm-mc -triple=amdgcn -mcpu=gfx1250 -mattr=-real-true16 -show-encoding %s | FileCheck --check-prefixes=GFX1250 %s
+// RUN: not llvm-mc -triple=amdgcn -mcpu=gfx1200 -mattr=-real-true16 -show-encoding %s 2>&1 | FileCheck --check-prefix=GFX12-ERR --implicit-check-not=error: --strict-whitespace %s
+
+v_cvt_f32_bf16 v5, v1 dpp8:[7,6,5,4,3,2,1,0]
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 dpp8:[7,6,5,4,3,2,1,0] ; encoding: [0xe9,0xe4,0x0a,0x7e,0x01,0x77,0x39,0x05]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v5, v1 dpp8:[7,6,5,4,3,2,1,0] fi:1
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 dpp8:[7,6,5,4,3,2,1,0] fi:1 ; encoding: [0xea,0xe4,0x0a,0x7e,0x01,0x77,0x39,0x05]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
+
+v_cvt_f32_bf16 v127, v127 dpp8:[0,0,0,0,0,0,0,0] fi:0
+// GFX1250: v_cvt_f32_bf16_dpp v127, v127 dpp8:[0,0,0,0,0,0,0,0] ; encoding: [0xe9,0xe4,0xfe,0x7e,0x7f,0x00,0x00,0x00]
+// GFX12-ERR: :[[@LINE-2]]:1: error: instruction not supported on this GPU
diff --git a/llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8.s b/llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8.s
new file mode 100644
index 0000000000000..8a5f3cba2fbc0
--- /dev/null
+++ b/llvm/test/MC/AMDGPU/gfx1250_asm_vop1_dpp8.s
@@ -0,0 +1,19 @@
+// NOTE: Assertions have been autogenerated by utils/update_mc_test_checks.py UTC_ARGS: --version 5
+// RUN: llvm-mc -triple=amdgcn -mcpu=gfx1250 -mattr=+real-true16 -show-encoding %s | FileCheck --check-prefixes=GFX1250 %s
+// RUN: not llvm-mc -triple=amdgcn -mcpu=gfx1200 -mattr=+real-true16 -show-encoding %s 2>&1 | FileCheck --check-prefix=GFX12-ERR --implicit-check-not=error: --strict-whitespace %s
+
+v_cvt_f32_bf16 v5, v1 dpp8:[7,6,5,4,3,2,1,0]
+// GFX1250: v_cvt_f32_bf16_dpp v5, v1 dpp8:[7,6,5,4,3,2,1,0] ; encoding: [0xe9,0xe4,0x0a,0x7e,0x01,0x77,0x39,0x05]
...
[truncated]
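As a cross-check of the expected encodings in the tests above, the following sketch splits a 32-bit VOP1 word into its fields. It assumes the commonly documented VOP1 layout (SRC0 in bits [8:0], OP in bits [16:9], VDST in bits [24:17], and a fixed 0x3f pattern in bits [31:25]); `Vop1Fields`, `wordFromBytes`, and `splitVop1` are names made up for this illustration and are not part of the patch.

```cpp
#include <cstdint>

// Fields of a 32-bit VOP1 word, using the layout assumed in the lead-in.
struct Vop1Fields {
  uint32_t Src0;     // bits [8:0]
  uint32_t Op;       // bits [16:9]
  uint32_t VDst;     // bits [24:17]
  uint32_t Encoding; // bits [31:25]
};

// Assemble the little-endian bytes shown in the "encoding:" check lines.
constexpr uint32_t wordFromBytes(uint8_t B0, uint8_t B1, uint8_t B2,
                                 uint8_t B3) {
  return uint32_t(B0) | (uint32_t(B1) << 8) | (uint32_t(B2) << 16) |
         (uint32_t(B3) << 24);
}

// Extract each field by shifting and masking.
constexpr Vop1Fields splitVop1(uint32_t W) {
  return {W & 0x1ff, (W >> 9) & 0xff, (W >> 17) & 0xff, W >> 25};
}
```

For `v_cvt_f32_bf16 v5, v1`, the bytes `[0x01,0xe5,0x0a,0x7e]` split into Op = 0x72, which matches the `0x072` opcode in the `VOP1Instructions.td` change, VDst = 5, and Src0 = 0x101 (VGPR operands are encoded as 256 + n). In the tests, the `v1.h` form differs only in Src0 (0x181), and the DPP16 and DPP8 forms put 0xfa and 0xe9/0xea in the Src0 byte.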

shiltian · 2025-06-25T20:01:08Z

Merge activity

Jun 25, 8:01 PM UTC: A user started a stack merge that includes this pull request via Graphite.
Jun 25, 8:02 PM UTC: @shiltian merged this pull request with Graphite.

shiltian requested a review from rampitec June 25, 2025 03:36

shiltian force-pushed the users/shiltian/v_cvt_f32_bf16 branch from 983f052 to 7484261 Compare June 25, 2025 15:57

shiltian marked this pull request as ready for review June 25, 2025 15:59

llvmbot added the backend:AMDGPU and mc (Machine (object) code) labels Jun 25, 2025

shiltian force-pushed the users/shiltian/v_cvt_f32_bf16 branch from 7484261 to a905b76 Compare June 25, 2025 16:01

shiltian changed the title ~~[AMDGPU] Add the support for v_cvt_f32_bf16 on gfx1250~~ [AMDGPU] Add the support for v_cvt_f32_bf16 on gfx1250 Jun 25, 2025

shiltian force-pushed the users/shiltian/v_cvt_f32_bf16 branch 2 times, most recently from 2e6b53e to c653b0c Compare June 25, 2025 17:40

This was referenced Jun 25, 2025

[AMDGPU] Add support for v_cvt_pk_f16_fp8 on gfx1250 #145747

Merged

[AMDGPU] Add support for v_cvt_pk_f16_bf8 on gfx1250 #145753

Merged

shiltian requested review from jayfoad, changpeng and kosarev June 25, 2025 18:32

[AMDGPU] Add the support for v_cvt_f32_bf16 on gfx1250

aeadc62

Co-authored-by: Shilei Tian <[email protected]>

shiltian force-pushed the users/shiltian/v_cvt_f32_bf16 branch from c653b0c to aeadc62 Compare June 25, 2025 18:33

rampitec approved these changes Jun 25, 2025

shiltian merged commit 473f992 into main Jun 25, 2025
7 checks passed

shiltian deleted the users/shiltian/v_cvt_f32_bf16 branch June 25, 2025 20:02

anthonyhatran pushed a commit to anthonyhatran/llvm-project that referenced this pull request Jun 26, 2025

[AMDGPU] Add the support for v_cvt_f32_bf16 on gfx1250 (llvm#145632)

0f859f6

Co-authored-by: Shilei Tian <[email protected]>
