remaining instructions #27

Open
Qazalin opened this issue Oct 14, 2024 · 0 comments

Total: ~490 instructions

Categories by engineering effort:

  1. Easy: about 5 minutes max per instruction, 61%
  2. Medium: up to 1 hr per instruction, 9%
  3. Hard: 1 week or more, 30% (I haven't seen the AMD compiler emit these very often either)

Easy encodings

remu already has all the infrastructure to support these; each diff would look like eaa86fb.
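As a rough illustration of why these count as easy: most of them are pure functions of their operands plus SCC. The sketch below models three SOP2 opcodes from the list as standalone Rust functions returning `(result, scc)`. The function names and signatures are hypothetical and are not remu's actual code; the semantics are paraphrased from the RDNA3 ISA pseudocode, and saturating the S_BFE_U32 mask when the width field is 32 or more is an assumption.

```rust
// Hypothetical sketch: each SOP2 op as a pure function returning (D, SCC).
// These names do not come from remu; they only illustrate the semantics.

/// S_ADD_U32: D = S0 + S1; SCC = unsigned carry-out.
fn s_add_u32(s0: u32, s1: u32) -> (u32, bool) {
    s0.overflowing_add(s1)
}

/// S_LSHL2_ADD_U32: D = (S0 << 2) + S1, computed in 64 bits;
/// SCC = 1 if the 64-bit sum does not fit in 32 bits.
fn s_lshl2_add_u32(s0: u32, s1: u32) -> (u32, bool) {
    let tmp = ((s0 as u64) << 2) + s1 as u64;
    (tmp as u32, tmp > u32::MAX as u64)
}

/// S_BFE_U32: offset = S1[4:0], width = S1[22:16]; SCC = (D != 0).
fn s_bfe_u32(s0: u32, s1: u32) -> (u32, bool) {
    let offset = s1 & 0x1f;
    let width = (s1 >> 16) & 0x7f;
    // Assumption: widths >= 32 select every bit instead of overflowing the shift.
    let mask = if width >= 32 { u32::MAX } else { (1u32 << width) - 1 };
    let d = (s0 >> offset) & mask;
    (d, d != 0)
}

fn main() {
    assert_eq!(s_add_u32(u32::MAX, 1), (0, true));
    assert_eq!(s_lshl2_add_u32(3, 4), (16, false));
    // Extract 8 bits starting at bit 8: width 8 in S1[22:16], offset 8 in S1[4:0].
    assert_eq!(s_bfe_u32(0x0000_ab00, (8 << 16) | 8), (0xab, true));
}
```

Keeping each opcode as a pure function like this makes it straightforward to unit-test against values captured from real hardware.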

opcode list
| Encoding | Op | # |
| --- | --- | --- |
| SOP2 | S_ADD_U32 | 0 |
| SOP2 | S_SUB_U32 | 1 |
| SOP2 | S_ADD_I32 | 2 |
| SOP2 | S_SUB_I32 | 3 |
| SOP2 | S_ADDC_U32 | 4 |
| SOP2 | S_SUBB_U32 | 5 |
| SOP2 | S_ABSDIFF_I32 | 6 |
| SOP2 | S_LSHL_B32 | 8 |
| SOP2 | S_LSHL_B64 | 9 |
| SOP2 | S_LSHR_B32 | 10 |
| SOP2 | S_LSHR_B64 | 11 |
| SOP2 | S_ASHR_I32 | 12 |
| SOP2 | S_ASHR_I64 | 13 |
| SOP2 | S_LSHL1_ADD_U32 | 14 |
| SOP2 | S_LSHL2_ADD_U32 | 15 |
| SOP2 | S_LSHL3_ADD_U32 | 16 |
| SOP2 | S_LSHL4_ADD_U32 | 17 |
| SOP2 | S_MIN_I32 | 18 |
| SOP2 | S_MIN_U32 | 19 |
| SOP2 | S_MAX_I32 | 20 |
| SOP2 | S_MAX_U32 | 21 |
| SOP2 | S_AND_B32 | 22 |
| SOP2 | S_AND_B64 | 23 |
| SOP2 | S_OR_B32 | 24 |
| SOP2 | S_OR_B64 | 25 |
| SOP2 | S_XOR_B32 | 26 |
| SOP2 | S_XOR_B64 | 27 |
| SOP2 | S_NAND_B32 | 28 |
| SOP2 | S_NAND_B64 | 29 |
| SOP2 | S_NOR_B32 | 30 |
| SOP2 | S_NOR_B64 | 31 |
| SOP2 | S_XNOR_B32 | 32 |
| SOP2 | S_XNOR_B64 | 33 |
| SOP2 | S_AND_NOT1_B32 | 34 |
| SOP2 | S_AND_NOT1_B64 | 35 |
| SOP2 | S_OR_NOT1_B32 | 36 |
| SOP2 | S_OR_NOT1_B64 | 37 |
| SOP2 | S_BFE_U32 | 38 |
| SOP2 | S_BFE_I32 | 39 |
| SOP2 | S_BFE_U64 | 40 |
| SOP2 | S_BFE_I64 | 41 |
| SOP2 | S_BFM_B32 | 42 |
| SOP2 | S_BFM_B64 | 43 |
| SOP2 | S_MUL_I32 | 44 |
| SOP2 | S_MUL_HI_U32 | 45 |
| SOP2 | S_MUL_HI_I32 | 46 |
| SOP2 | S_CSELECT_B32 | 48 |
| SOP2 | S_CSELECT_B64 | 49 |
| SOP2 | S_PACK_LL_B32_B16 | 50 |
| SOP2 | S_PACK_LH_B32_B16 | 51 |
| SOP2 | S_PACK_HH_B32_B16 | 52 |
| SOP2 | S_PACK_HL_B32_B16 | 53 |
| SOPK | S_MOVK_I32 | 0 |
| SOPK | S_VERSION | 1 |
| SOPK | S_CMOVK_I32 | 2 |
| SOPK | S_CMPK_EQ_I32 | 3 |
| SOPK | S_CMPK_LG_I32 | 4 |
| SOPK | S_CMPK_GT_I32 | 5 |
| SOPK | S_CMPK_GE_I32 | 6 |
| SOPK | S_CMPK_LT_I32 | 7 |
| SOPK | S_CMPK_LE_I32 | 8 |
| SOPK | S_CMPK_EQ_U32 | 9 |
| SOPK | S_CMPK_LG_U32 | 10 |
| SOPK | S_CMPK_GT_U32 | 11 |
| SOPK | S_CMPK_GE_U32 | 12 |
| SOPK | S_CMPK_LT_U32 | 13 |
| SOPK | S_CMPK_LE_U32 | 14 |
| SOPK | S_ADDK_I32 | 15 |
| SOPK | S_MULK_I32 | 16 |
| SOPK | S_GETREG_B32 | 17 |
| SOPK | S_SETREG_B32 | 18 |
| SOPK | S_SETREG_IMM32_B32 | 19 |
| SOPK | S_CALL_B64 | 20 |
| SOPK | S_WAITCNT_VSCNT | 24 |
| SOPK | S_WAITCNT_VMCNT | 25 |
| SOPK | S_WAITCNT_EXPCNT | 26 |
| SOPK | S_WAITCNT_LGKMCNT | 27 |
| SOPC | S_CMP_EQ_I32 | 0 |
| SOPC | S_CMP_LG_I32 | 1 |
| SOPC | S_CMP_GT_I32 | 2 |
| SOPC | S_CMP_GE_I32 | 3 |
| SOPC | S_CMP_LT_I32 | 4 |
| SOPC | S_CMP_LE_I32 | 5 |
| SOPC | S_CMP_EQ_U32 | 6 |
| SOPC | S_CMP_LG_U32 | 7 |
| SOPC | S_CMP_GT_U32 | 8 |
| SOPC | S_CMP_GE_U32 | 9 |
| SOPC | S_CMP_LT_U32 | 10 |
| SOPC | S_CMP_LE_U32 | 11 |
| SOPC | S_BITCMP0_B32 | 12 |
| SOPC | S_BITCMP1_B32 | 13 |
| SOPC | S_BITCMP0_B64 | 14 |
| SOPC | S_BITCMP1_B64 | 15 |
| SOPC | S_CMP_EQ_U64 | 16 |
| SOPC | S_CMP_LG_U64 | 17 |
| SOPP | S_NOP | 0 |
| SOPP | S_SETKILL | 1 |
| SOPP | S_SETHALT | 2 |
| SOPP | S_SLEEP | 3 |
| SOPP | S_SET_INST_PREFETCH_DISTANCE | 4 |
| SOPP | S_CLAUSE | 5 |
| SOPP | S_DELAY_ALU | 7 |
| SOPP | S_ROUND_MODE | 17 |
| SOPP | S_DENORM_MODE | 18 |
| SOPP | S_CBRANCH_VCCNZ | 36 |
| SOPP | S_CBRANCH_EXECZ | 37 |
| SOPP | S_CBRANCH_EXECNZ | 38 |
| SOPP | S_CBRANCH_CDBGSYS | 39 |
| SOPP | S_CBRANCH_CDBGUSER | 40 |
| SOPP | S_CBRANCH_CDBGSYS_OR_USER | 41 |
| SOPP | S_CBRANCH_CDBGSYS_AND_USER | 42 |
| SOPP | S_SETPRIO | 53 |
| SOPP | S_SENDMSG | 54 |
| SOPP | S_SENDMSGHALT | 55 |
| SOPP | S_INCPERFLEVEL | 56 |
| SOPP | S_DECPERFLEVEL | 57 |
| SOPP | S_ICACHE_INV | 60 |
| SMEM | S_LOAD_B32 | 0 |
| SMEM | S_LOAD_B64 | 1 |
| SMEM | S_LOAD_B128 | 2 |
| SMEM | S_LOAD_B256 | 3 |
| SMEM | S_LOAD_B512 | 4 |
| SMEM | S_BUFFER_LOAD_B32 | 8 |
| SMEM | S_BUFFER_LOAD_B64 | 9 |
| SMEM | S_BUFFER_LOAD_B128 | 10 |
| SMEM | S_BUFFER_LOAD_B256 | 11 |
| SMEM | S_BUFFER_LOAD_B512 | 12 |
| SMEM | S_GL1_INV | 32 |
| SMEM | S_DCACHE_INV | 33 |
| VOP2 | V_CNDMASK_B32 | 1 |
| VOP2 | V_DOT2ACC_F32_F16 | 2 |
| VOP2 | V_ADD_F32 | 3 |
| VOP2 | V_SUB_F32 | 4 |
| VOP2 | V_SUBREV_F32 | 5 |
| VOP2 | V_FMAC_DX9_ZERO_F32 | 6 |
| VOP2 | V_MUL_DX9_ZERO_F32 | 7 |
| VOP2 | V_MUL_F32 | 8 |
| VOP2 | V_MUL_I32_I24 | 9 |
| VOP2 | V_MUL_HI_I32_I24 | 10 |
| VOP2 | V_MUL_U32_U24 | 11 |
| VOP2 | V_MUL_HI_U32_U24 | 12 |
| VOP2 | V_MIN_F32 | 15 |
| VOP2 | V_MAX_F32 | 16 |
| VOP2 | V_MIN_I32 | 17 |
| VOP2 | V_MAX_I32 | 18 |
| VOP2 | V_MIN_U32 | 19 |
| VOP2 | V_MAX_U32 | 20 |
| VOP2 | V_LSHLREV_B32 | 24 |
| VOP2 | V_LSHRREV_B32 | 25 |
| VOP2 | V_ASHRREV_I32 | 26 |
| VOP2 | V_AND_B32 | 27 |
| VOP2 | V_OR_B32 | 28 |
| VOP2 | V_XOR_B32 | 29 |
| VOP2 | V_XNOR_B32 | 30 |
| VOP2 | V_ADD_CO_CI_U32 | 32 |
| VOP2 | V_SUB_CO_CI_U32 | 33 |
| VOP2 | V_SUBREV_CO_CI_U32 | 34 |
| VOP2 | V_ADD_NC_U32 | 37 |
| VOP2 | V_SUB_NC_U32 | 38 |
| VOP2 | V_SUBREV_NC_U32 | 39 |
| VOP2 | V_FMAC_F32 | 43 |
| VOP2 | V_FMAMK_F32 | 44 |
| VOP2 | V_FMAAK_F32 | 45 |
| VOP2 | V_CVT_PK_RTZ_F16_F32 | 47 |
| VOP2 | V_ADD_F16 | 50 |
| VOP2 | V_SUB_F16 | 51 |
| VOP2 | V_SUBREV_F16 | 52 |
| VOP2 | V_MUL_F16 | 53 |
| VOP2 | V_FMAC_F16 | 54 |
| VOP2 | V_FMAMK_F16 | 55 |
| VOP2 | V_FMAAK_F16 | 56 |
| VOP2 | V_MAX_F16 | 57 |
| VOP2 | V_MIN_F16 | 58 |
| VOP2 | V_LDEXP_F16 | 59 |
| VOP2 | V_PK_FMAC_F16 | 60 |
| VOPC | V_CMP_F_F16 | 0 |
| VOPC | V_CMP_LT_F16 | 1 |
| VOPC | V_CMP_EQ_F16 | 2 |
| VOPC | V_CMP_LE_F16 | 3 |
| VOPC | V_CMP_GT_F16 | 4 |
| VOPC | V_CMP_LG_F16 | 5 |
| VOPC | V_CMP_GE_F16 | 6 |
| VOPC | V_CMP_O_F16 | 7 |
| VOPC | V_CMP_U_F16 | 8 |
| VOPC | V_CMP_NGE_F16 | 9 |
| VOPC | V_CMP_NLG_F16 | 10 |
| VOPC | V_CMP_NGT_F16 | 11 |
| VOPC | V_CMPX_F_F16 | 128 |
| VOPC | V_CMPX_LT_F16 | 129 |
| VOPC | V_CMPX_EQ_F16 | 130 |
| VOPC | V_CMPX_LE_F16 | 131 |
| VOPC | V_CMPX_GT_F16 | 132 |
| VOPC | V_CMPX_LG_F16 | 133 |
| VOPC | V_CMPX_GE_F16 | 134 |
| VOPC | V_CMPX_O_F16 | 135 |
| VOPC | V_CMPX_U_F16 | 136 |
| VOPC | V_CMPX_NGE_F16 | 137 |
| VOPC | V_CMPX_NLG_F16 | 138 |
| VOPC | V_CMPX_NGT_F16 | 139 |
| VOPC | V_CMPX_NE_U32 | 205 |
| VOPC | V_CMPX_GE_U32 | 206 |
| VOPC | V_CMPX_T_U32 | 207 |
| VOPC | V_CMPX_F_I64 | 208 |
| VOPC | V_CMPX_LT_I64 | 209 |
| VOPC | V_CMPX_EQ_I64 | 210 |
| VOPC | V_CMPX_LE_I64 | 211 |
| VOPC | V_CMPX_GT_I64 | 212 |
| VOPC | V_CMPX_NE_I64 | 213 |
| VOPC | V_CMPX_GE_I64 | 214 |
| VOPC | V_CMPX_T_I64 | 215 |
| VOPC | V_CMPX_F_U64 | 216 |
| VOPC | V_CMPX_LT_U64 | 217 |
| VOPC | V_CMPX_EQ_U64 | 218 |
| VOPC | V_CMPX_LE_U64 | 219 |
| VOPC | V_CMPX_GT_U64 | 220 |
| VOPC | V_CMPX_NE_U64 | 221 |
| VOPC | V_CMPX_GE_U64 | 222 |
| VOPC | V_CMPX_T_U64 | 223 |
| VOPC | V_CMPX_CLASS_F16 | 253 |
| VOPC | V_CMPX_CLASS_F32 | 254 |
| VOPC | V_CMPX_CLASS_F64 | 255 |
| VOP3 | V_NOP | 384 |
| VOP3 | V_MOV_B32 | 385 |
| VOP3 | V_READFIRSTLANE_B32 | 386 |
| VOP3 | V_CVT_I32_F64 | 387 |
| VOP3 | V_DOT2_F16_F16 | 614 |
| VOP3 | V_DOT2_BF16_BF16 | 615 |
| VOP3 | V_ADD_NC_U16 | 771 |
| VOP3 | V_SUB_NC_U16 | 772 |
| VOP3 | V_MUL_LO_U16 | 773 |
| VOP3 | V_CVT_PK_I16_F32 | 774 |
| VOP3 | V_CVT_PK_U16_F32 | 775 |
| VOP3 | V_MAX_U16 | 777 |
| VOP3 | V_MAX_I16 | 778 |
| VOP3 | V_MIN_U16 | 779 |
| VOP3 | V_MIN_I16 | 780 |
| VOP3 | V_ADD_NC_I16 | 781 |
| VOP3 | V_SUB_NC_I16 | 782 |
| VOP3 | V_PACK_B32_F16 | 785 |
| VOP3 | V_CVT_PK_NORM_I16_F16 | 786 |
| VOP3 | V_CVT_PK_NORM_U16_F16 | 787 |
| VOP3 | V_LDEXP_F32 | 796 |
| VOP3 | V_BFM_B32 | 797 |
| VOP3 | V_BCNT_U32_B32 | 798 |
| VOP3 | V_MBCNT_LO_U32_B32 | 799 |
| VOP3 | V_MBCNT_HI_U32_B32 | 800 |
| VOP3 | V_CVT_PK_NORM_I16_F32 | 801 |
| VOP3 | V_CVT_PK_NORM_U16_F32 | 802 |
| VOP3 | V_CVT_PK_U16_U32 | 803 |
| VOP3 | V_CVT_PK_I16_I32 | 804 |
| VOP3 | V_SUB_NC_I32 | 805 |
| VOP3 | V_ADD_NC_I32 | 806 |
| VOP3P | V_PK_MAD_I16 | 0 |
| VOP3P | V_PK_MUL_LO_U16 | 1 |
| VOP3P | V_PK_ADD_I16 | 2 |
| VOP3P | V_PK_SUB_I16 | 3 |
| VOP3P | V_PK_LSHLREV_B16 | 4 |
| VOP3P | V_PK_LSHRREV_B16 | 5 |
| VOP3P | V_PK_ASHRREV_I16 | 6 |
| VOP3P | V_PK_MAX_I16 | 7 |
| VOP3P | V_PK_MIN_I16 | 8 |
| VOP3P | V_PK_MAD_U16 | 9 |
| VOP3P | V_PK_ADD_U16 | 10 |
| VOP3P | V_PK_SUB_U16 | 11 |
| VOP3P | V_PK_MAX_U16 | 12 |
| VOP3P | V_PK_MIN_U16 | 13 |
| VOP3P | V_PK_FMA_F16 | 14 |
| VOP3P | V_PK_ADD_F16 | 15 |
| VOP3P | V_PK_MUL_F16 | 16 |
| VOP3P | V_PK_MIN_F16 | 17 |
| VOP3P | V_PK_MAX_F16 | 18 |
| VOP3P | V_DOT2_F32_F16 | 19 |
| VOP3P | V_DOT4_I32_IU8 | 22 |
| VOP3P | V_DOT4_U32_U8 | 23 |
| VOP3P | V_DOT8_I32_IU4 | 24 |
| VOP3P | V_DOT8_U32_U4 | 25 |
| VOP3P | V_DOT2_F32_BF16 | 26 |
| VOP3P | V_FMA_MIX_F32 | 32 |
| VOP3P | V_FMA_MIXLO_F16 | 33 |
| VOP3P | V_FMA_MIXHI_F16 | 34 |
| VOP3P | V_WMMA_F32_16X16X16_F16 | 64 |
| VOP3P | V_WMMA_F32_16X16X16_BF16 | 65 |
| VOP3P | V_WMMA_F16_16X16X16_F16 | 66 |
| VOP3P | V_WMMA_BF16_16X16X16_BF16 | 67 |
| VOP3P | V_WMMA_I32_16X16X16_IU8 | 68 |
| VOP3P | V_WMMA_I32_16X16X16_IU4 | 69 |

Medium effort

Some infrastructure for LDS already exists, but these may need additional work in state.rs.
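Most DS opcodes in this group reduce to a per-lane read-modify-write on LDS. A minimal sketch, assuming LDS is modeled as a flat little-endian byte array and that the caller has already computed each lane's byte address (VGPR address plus instruction offset). `Lds`, `atomic_u32`, and the `ds_*` helpers are hypothetical names, not the state.rs API. The shared helper returns the pre-op value, which is what the `_RTN` variants would write back to the destination VGPR.

```rust
/// Hypothetical LDS model: a flat little-endian byte array.
struct Lds {
    mem: Vec<u8>,
}

impl Lds {
    fn new(size: usize) -> Self {
        Lds { mem: vec![0; size] }
    }

    fn read_u32(&self, addr: usize) -> u32 {
        let mut b = [0u8; 4];
        b.copy_from_slice(&self.mem[addr..addr + 4]);
        u32::from_le_bytes(b)
    }

    fn write_u32(&mut self, addr: usize, val: u32) {
        self.mem[addr..addr + 4].copy_from_slice(&val.to_le_bytes());
    }

    /// Read-modify-write shared by the atomics; returns the old value
    /// (the value an _RTN variant would return to the VGPR).
    fn atomic_u32(&mut self, addr: usize, f: impl FnOnce(u32) -> u32) -> u32 {
        let old = self.read_u32(addr);
        self.write_u32(addr, f(old));
        old
    }
}

/// DS_ADD_U32: MEM = MEM + DATA (wrapping).
fn ds_add_u32(lds: &mut Lds, addr: usize, data: u32) {
    lds.atomic_u32(addr, |m| m.wrapping_add(data));
}

/// DS_INC_U32: MEM = (MEM >= DATA) ? 0 : MEM + 1.
fn ds_inc_u32(lds: &mut Lds, addr: usize, data: u32) {
    lds.atomic_u32(addr, |m| if m >= data { 0 } else { m + 1 });
}

/// DS_MSKOR_B32: MEM = (MEM & !DATA0) | DATA1.
fn ds_mskor_b32(lds: &mut Lds, addr: usize, mask: u32, bits: u32) {
    lds.atomic_u32(addr, |m| (m & !mask) | bits);
}

/// DS_STORE_2ADDR_B32: two stores with 8-bit offsets scaled by 4 bytes,
/// or by 256 bytes (4 * 64) for the STRIDE64 variant.
fn ds_store_2addr_b32(lds: &mut Lds, addr: usize, off0: u8, off1: u8, d0: u32, d1: u32, stride64: bool) {
    let scale = if stride64 { 256 } else { 4 };
    lds.write_u32(addr + off0 as usize * scale, d0);
    lds.write_u32(addr + off1 as usize * scale, d1);
}

fn main() {
    let mut lds = Lds::new(64 * 1024);
    ds_add_u32(&mut lds, 0, 5);
    ds_add_u32(&mut lds, 0, 7);
    assert_eq!(lds.read_u32(0), 12);
    ds_inc_u32(&mut lds, 4, 1); // 0 -> 1
    ds_inc_u32(&mut lds, 4, 1); // 1 >= 1, so it wraps to 0
    assert_eq!(lds.read_u32(4), 0);
    ds_store_2addr_b32(&mut lds, 16, 0, 1, 0xaa, 0xbb, false);
    assert_eq!((lds.read_u32(16), lds.read_u32(20)), (0xaa, 0xbb));
    ds_mskor_b32(&mut lds, 16, 0x0f, 0x05);
    assert_eq!(lds.read_u32(16), 0xa5);
}
```

The 64-bit and `_RTN` variants could follow the same pattern with a `u64` helper; the sketch leaves out exec-mask iteration over lanes.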

opcode list
| Encoding | Op | # |
| --- | --- | --- |
| DS | DS_ADD_U32 | 0 |
| DS | DS_SUB_U32 | 1 |
| DS | DS_RSUB_U32 | 2 |
| DS | DS_INC_U32 | 3 |
| DS | DS_DEC_U32 | 4 |
| DS | DS_MIN_I32 | 5 |
| DS | DS_MAX_I32 | 6 |
| DS | DS_MIN_U32 | 7 |
| DS | DS_MAX_U32 | 8 |
| DS | DS_AND_B32 | 9 |
| DS | DS_OR_B32 | 10 |
| DS | DS_XOR_B32 | 11 |
| DS | DS_MSKOR_B32 | 12 |
| DS | DS_STORE_B32 | 13 |
| DS | DS_STORE_2ADDR_B32 | 14 |
| DS | DS_STORE_2ADDR_STRIDE64_B32 | 15 |
| DS | DS_CMPSTORE_B32 | 16 |
| DS | DS_CMPSTORE_F32 | 17 |
| DS | DS_MIN_F32 | 18 |
| DS | DS_MAX_F32 | 19 |
| DS | DS_NOP | 20 |
| DS | DS_ADD_F32 | 21 |
| DS | DS_SUB_U64 | 65 |
| DS | DS_RSUB_U64 | 66 |
| DS | DS_INC_U64 | 67 |
| DS | DS_DEC_U64 | 68 |
| DS | DS_MIN_I64 | 69 |
| DS | DS_MAX_I64 | 70 |
| DS | DS_MIN_U64 | 71 |
| DS | DS_MAX_U64 | 72 |
| DS | DS_AND_B64 | 73 |
| DS | DS_OR_B64 | 74 |
| DS | DS_XOR_B64 | 75 |
| DS | DS_MSKOR_B64 | 76 |
| DS | DS_STORE_B64 | 77 |
| DS | DS_STORE_2ADDR_B64 | 78 |
| DS | DS_STORE_2ADDR_STRIDE64_B64 | 79 |
| DS | DS_CMPSTORE_B64 | 80 |
| DS | DS_CMPSTORE_F64 | 81 |
| DS | DS_MIN_F64 | 82 |
| DS | DS_MAX_F64 | 83 |
| DS | DS_ADD_RTN_U64 | 96 |
| DS | DS_SUB_RTN_U64 | 97 |
| DS | DS_RSUB_RTN_U64 | 98 |
| DS | DS_INC_RTN_U64 | 99 |

Needs more work

Little to no infrastructure exists for the BUF and IMG instructions; they need very good tests and new abstractions in state.rs.
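One possible shape for the new buffer abstraction is a decoded resource descriptor plus an address helper. The sketch below assumes the RDNA3 buffer descriptor (V#) layout: base address in bits [47:0], stride in [61:48], num_records in [95:64]. It also assumes the linear, non-swizzled address formula. Swizzling, data formats, and out-of-bounds handling are deliberately not modeled, although `num_records` is decoded so a bounds check could be added later. `BufferRsrc`, `from_sgprs`, and `address` are hypothetical names.

```rust
/// Hypothetical decoded buffer resource descriptor (V#). Only the fields
/// needed for a linear, non-swizzled address calculation are modeled.
#[derive(Debug, PartialEq)]
struct BufferRsrc {
    base: u64,        // bits [47:0]
    stride: u32,      // bits [61:48], 14 bits
    num_records: u32, // bits [95:64]
}

impl BufferRsrc {
    /// Decode from the four consecutive SGPRs holding the descriptor.
    fn from_sgprs(d: [u32; 4]) -> Self {
        let base = d[0] as u64 | (((d[1] & 0xffff) as u64) << 32);
        let stride = (d[1] >> 16) & 0x3fff; // upper two bits of d[1] (swizzle) are ignored
        BufferRsrc { base, stride, num_records: d[2] }
    }

    /// Linear address: base + soffset + inst_offset + voffset + stride * vindex.
    /// No bounds check is performed here.
    fn address(&self, soffset: u32, inst_offset: u32, voffset: u32, vindex: u32) -> u64 {
        self.base
            + soffset as u64
            + inst_offset as u64
            + voffset as u64
            + self.stride as u64 * vindex as u64
    }
}

fn main() {
    // base = 0x1_0000_1000, stride = 16, num_records = 256 (illustrative values)
    let rsrc = BufferRsrc::from_sgprs([0x0000_1000, (16 << 16) | 0x1, 256, 0]);
    assert_eq!(rsrc, BufferRsrc { base: 0x1_0000_1000, stride: 16, num_records: 256 });
    // Element 3 of a structured buffer, 4 bytes into the element.
    assert_eq!(rsrc.address(0, 0, 4, 3), 0x1_0000_1000 + 4 + 48);
}
```

Keeping descriptor decoding separate from the per-opcode code would let the MTBUF loads and stores share one well-tested address path.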

opcode list
| Encoding | Op | # |
| --- | --- | --- |
| VINTERP | V_INTERP_P10_F32 | 0 |
| VINTERP | V_INTERP_P2_F32 | 1 |
| VINTERP | V_INTERP_P10_F16_F32 | 2 |
| VINTERP | V_INTERP_P2_F16_F32 | 3 |
| VINTERP | V_INTERP_P10_RTZ_F16_F32 | 4 |
| VINTERP | V_INTERP_P2_RTZ_F16_F32 | 5 |
| MTBUF | TBUFFER_LOAD_FORMAT_X | 0 |
| MTBUF | TBUFFER_LOAD_FORMAT_XY | 1 |
| MTBUF | TBUFFER_LOAD_FORMAT_XYZ | 2 |
| MTBUF | TBUFFER_LOAD_FORMAT_XYZW | 3 |
| MTBUF | TBUFFER_STORE_FORMAT_X | 4 |
| MTBUF | TBUFFER_STORE_FORMAT_XY | 5 |
| MTBUF | TBUFFER_STORE_FORMAT_XYZ | 6 |
| MTBUF | TBUFFER_STORE_FORMAT_XYZW | 7 |
| MTBUF | TBUFFER_LOAD_D16_FORMAT_X | 8 |
| MTBUF | TBUFFER_LOAD_D16_FORMAT_XY | 9 |
| MTBUF | TBUFFER_LOAD_D16_FORMAT_XYZ | 10 |
| MTBUF | TBUFFER_LOAD_D16_FORMAT_XYZW | 11 |
| MTBUF | TBUFFER_STORE_D16_FORMAT_X | 12 |
| MTBUF | TBUFFER_STORE_D16_FORMAT_XY | 13 |
| MTBUF | TBUFFER_STORE_D16_FORMAT_XYZ | 14 |
| MTBUF | TBUFFER_STORE_D16_FORMAT_XYZW | 15 |
| MIMG | IMAGE_LOAD | 0 |
| MIMG | IMAGE_LOAD_MIP | 1 |
| MIMG | IMAGE_LOAD_PCK | 2 |
| MIMG | IMAGE_LOAD_PCK_SGN | 3 |
| MIMG | IMAGE_LOAD_MIP_PCK | 4 |
| MIMG | IMAGE_LOAD_MIP_PCK_SGN | 5 |
| MIMG | IMAGE_STORE | 6 |
| MIMG | IMAGE_STORE_MIP | 7 |
| MIMG | IMAGE_STORE_PCK | 8 |
| MIMG | IMAGE_STORE_MIP_PCK | 9 |
| MIMG | IMAGE_ATOMIC_SWAP | 10 |
| MIMG | IMAGE_ATOMIC_CMPSWAP | 11 |
| MIMG | IMAGE_ATOMIC_ADD | 12 |
| MIMG | IMAGE_ATOMIC_SUB | 13 |
| MIMG | IMAGE_ATOMIC_SMIN | 14 |
| MIMG | IMAGE_ATOMIC_UMIN | 15 |
| MIMG | IMAGE_ATOMIC_SMAX | 16 |
| MIMG | IMAGE_ATOMIC_UMAX | 17 |
| MIMG | IMAGE_ATOMIC_AND | 18 |
| MIMG | IMAGE_ATOMIC_OR | 19 |
| MIMG | IMAGE_ATOMIC_XOR | 20 |
| MIMG | IMAGE_ATOMIC_INC | 21 |
| MIMG | IMAGE_ATOMIC_DEC | 22 |
| MIMG | IMAGE_GET_RESINFO | 23 |
| MIMG | IMAGE_MSAA_LOAD | 24 |
| MIMG | IMAGE_BVH_INTERSECT_RAY | 25 |
| MIMG | IMAGE_BVH64_INTERSECT_RAY | 26 |
| MIMG | IMAGE_SAMPLE | 27 |
| MIMG | IMAGE_SAMPLE_D | 28 |
| MIMG | IMAGE_SAMPLE_L | 29 |
| MIMG | IMAGE_SAMPLE_B | 30 |
| MIMG | IMAGE_SAMPLE_LZ | 31 |
| MIMG | IMAGE_SAMPLE_C | 32 |
| MIMG | IMAGE_SAMPLE_C_O | 42 |
| MIMG | IMAGE_SAMPLE_C_D_O | 43 |
| MIMG | IMAGE_SAMPLE_C_L_O | 44 |
| MIMG | IMAGE_SAMPLE_C_B_O | 45 |
| MIMG | IMAGE_SAMPLE_C_LZ_O | 46 |
| MIMG | IMAGE_GATHER4 | 47 |
| MIMG | IMAGE_GATHER4_L | 48 |
| MIMG | IMAGE_GATHER4_B | 49 |
| MIMG | IMAGE_GATHER4_LZ | 50 |
| MIMG | IMAGE_GATHER4_C | 51 |
| MIMG | IMAGE_GATHER4_C_LZ | 52 |
| MIMG | IMAGE_GATHER4_O | 53 |
| MIMG | IMAGE_GATHER4_LZ_O | 54 |
| MIMG | IMAGE_GATHER4_C_LZ_O | 55 |
| MIMG | IMAGE_GET_LOD | 56 |
| MIMG | IMAGE_SAMPLE_D_G16 | 57 |
| MIMG | IMAGE_SAMPLE_C_D_G16 | 58 |
| MIMG | IMAGE_SAMPLE_D_O_G16 | 59 |
| MIMG | IMAGE_SAMPLE_C_D_O_G16 | 60 |
| MIMG | IMAGE_SAMPLE_CL | 64 |
| MIMG | IMAGE_SAMPLE_D_CL | 65 |
| MIMG | IMAGE_SAMPLE_B_CL | 66 |
| MIMG | IMAGE_SAMPLE_C_CL | 67 |
| MIMG | IMAGE_SAMPLE_C_D_CL | 68 |
| MIMG | IMAGE_SAMPLE_C_B_CL | 69 |
| MIMG | IMAGE_SAMPLE_CL_O | 70 |
| MIMG | IMAGE_SAMPLE_D_CL_O | 71 |
| MIMG | IMAGE_SAMPLE_B_CL_O | 72 |
| MIMG | IMAGE_SAMPLE_C_CL_O | 73 |
| MIMG | IMAGE_SAMPLE_C_D_CL_O | 74 |
| MIMG | IMAGE_SAMPLE_C_B_CL_O | 75 |
| MIMG | IMAGE_SAMPLE_C_D_CL_G16 | 84 |
| MIMG | IMAGE_SAMPLE_D_CL_O_G16 | 85 |
| FLAT | FLAT_LOAD_U8 | 16 |
| FLAT | FLAT_LOAD_I8 | 17 |
| FLAT | FLAT_LOAD_U16 | 18 |
| FLAT | FLAT_LOAD_I16 | 19 |
| FLAT | FLAT_LOAD_B32 | 20 |
| FLAT | FLAT_LOAD_B64 | 21 |
| FLAT | FLAT_LOAD_B96 | 22 |
| FLAT | FLAT_LOAD_B128 | 23 |
| FLAT | FLAT_STORE_B8 | 24 |
| FLAT | FLAT_STORE_B16 | 25 |
| FLAT | FLAT_STORE_B32 | 26 |
| FLAT | FLAT_STORE_B64 | 27 |
| FLAT | FLAT_STORE_B96 | 28 |
| FLAT | FLAT_STORE_B128 | 29 |
| FLAT | FLAT_LOAD_D16_U8 | 30 |
| FLAT | FLAT_LOAD_D16_I8 | 31 |
| FLAT | FLAT_LOAD_D16_B16 | 32 |
| FLAT | FLAT_LOAD_D16_HI_U8 | 33 |
| FLAT | FLAT_LOAD_D16_HI_I8 | 34 |
| FLAT | FLAT_LOAD_D16_HI_B16 | 35 |
| FLAT | FLAT_STORE_D16_HI_B8 | 36 |
| FLAT | FLAT_STORE_D16_HI_B16 | 37 |
| FLAT | FLAT_ATOMIC_SWAP_B32 | 51 |
| FLAT | FLAT_ATOMIC_CMPSWAP_B32 | 52 |
| FLAT | FLAT_ATOMIC_ADD_U32 | 53 |
| FLAT | FLAT_ATOMIC_SUB_U32 | 54 |
| FLAT | FLAT_ATOMIC_MIN_I32 | 56 |
| FLAT | FLAT_ATOMIC_MIN_U32 | 57 |
| FLAT | FLAT_ATOMIC_MAX_I32 | 58 |
| FLAT | FLAT_ATOMIC_MAX_U32 | 59 |
| FLAT | FLAT_ATOMIC_AND_B32 | 60 |
| FLAT | FLAT_ATOMIC_OR_B32 | 61 |
| FLAT | FLAT_ATOMIC_XOR_B32 | 62 |
| FLAT | FLAT_ATOMIC_INC_U32 | 63 |
| FLAT | FLAT_ATOMIC_DEC_U32 | 64 |
| FLAT | FLAT_ATOMIC_SWAP_B64 | 65 |
| FLAT | FLAT_ATOMIC_CMPSWAP_B64 | 66 |
| FLAT | FLAT_ATOMIC_ADD_U64 | 67 |
| FLAT | FLAT_ATOMIC_SUB_U64 | 68 |
| FLAT | FLAT_ATOMIC_MIN_I64 | 69 |
| FLAT | FLAT_ATOMIC_MIN_U64 | 70 |
| FLAT | FLAT_ATOMIC_MAX_I64 | 71 |
| FLAT | FLAT_ATOMIC_MAX_U64 | 72 |
| FLAT | FLAT_ATOMIC_AND_B64 | 73 |
| FLAT | FLAT_ATOMIC_OR_B64 | 74 |
| FLAT | FLAT_ATOMIC_XOR_B64 | 75 |
| FLAT | FLAT_ATOMIC_INC_U64 | 76 |
| FLAT | FLAT_ATOMIC_DEC_U64 | 77 |
| FLAT | FLAT_ATOMIC_CMPSWAP_F32 | 80 |
| FLAT | FLAT_ATOMIC_MIN_F32 | 81 |
| FLAT | FLAT_ATOMIC_MAX_F32 | 82 |
| FLAT | FLAT_ATOMIC_ADD_F32 | 86 |
Qazalin pinned this issue Oct 14, 2024
Qazalin referenced this issue Oct 14, 2024
9d14a36 support all vopp
4f0ce39 v_pk_mad_i16