Question on the vector add example #2099

pbchekin · 2024-09-03T16:02:43Z

Received this:

A question after analyzing the generated LLVM IR and comparing it against the SYCL vector add equivalent. In the normal SYCL program, we tend to load the input data as a vector of size 4 and compute the addition on that vector. In the Triton case, we load a vector from memory but then extract each element, add the elements as scalars, and insert the results back. Attached are the shader dumps from the Triton vector add run, which include the input to the IGC compiler before unification and the IGC-optimized LLVM IR.

OCL_asm67166d3621db5283_beforeUnification.zip
OCL_asm67166d3621db5283_optimized.zip

etiotto · 2024-09-18T23:18:30Z

IGC has a pass that scalarizes the vector addition. Before that pass the LLVM IR is:

  %42 = fadd <4 x float> %bc3, %bc9, !dbg !373
  %43 = fadd <4 x float> %bc3, %bc9, !dbg !373
  %44 = fadd <4 x float> %bc3, %bc9, !dbg !373
  %45 = shufflevector <4 x float> %43, <4 x float> %44, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef>, !dbg !375
  %46 = fadd <4 x float> %bc3, %bc9, !dbg !373
  %47 = shufflevector <4 x float> %45, <4 x float> %46, <4 x i32> <i32 0, i32 1, i32 6, i32 undef>, !dbg !375
  %48 = shufflevector <4 x float> %47, <4 x float> %42, <4 x i32> <i32 0, i32 1, i32 2, i32 7>, !dbg !375
  %49 = sext i32 %9 to i64, !dbg !374
  %50 = getelementptr float, float addrspace(1)* %2, i64 %49, !dbg !374
  %51 = bitcast float addrspace(1)* %50 to <4 x float> addrspace(1)*, !dbg !375
  store <4 x float> %48, <4 x float> addrspace(1)* %51, align 16, !dbg !375

and after that pass the vector add is scalarized:


59:                                               ; preds = %52, %51
  %bc1226 = phi float [ %55, %52 ], [ 0.000000e+00, %51 ], !dbg !371
  %bc1227 = phi float [ %56, %52 ], [ 0.000000e+00, %51 ], !dbg !371
  %bc1228 = phi float [ %57, %52 ], [ 0.000000e+00, %51 ], !dbg !371
  %bc1229 = phi float [ %58, %52 ], [ 0.000000e+00, %51 ], !dbg !371
  %60 = fadd float %bc618, %bc1226, !dbg !372
  %61 = fadd float %bc619, %bc1227, !dbg !372
  %62 = fadd float %bc620, %bc1228, !dbg !372
  %63 = fadd float %bc621, %bc1229, !dbg !372
  %64 = fadd float %bc618, %bc1226, !dbg !372
  %65 = fadd float %bc619, %bc1227, !dbg !372
  %66 = fadd float %bc620, %bc1228, !dbg !372
  %67 = fadd float %bc621, %bc1229, !dbg !372
  %68 = fadd float %bc618, %bc1226, !dbg !372
  %69 = fadd float %bc619, %bc1227, !dbg !372
  %70 = fadd float %bc620, %bc1228, !dbg !372
  %71 = fadd float %bc621, %bc1229, !dbg !372
  %72 = fadd float %bc618, %bc1226, !dbg !372
  %73 = fadd float %bc619, %bc1227, !dbg !372
  %74 = fadd float %bc620, %bc1228, !dbg !372
  %75 = fadd float %bc621, %bc1229, !dbg !372
  %76 = getelementptr float, float addrspace(1)* %2, i64 %21, !dbg !373
  br i1 %19, label %77, label %101, !dbg !374

So this is a transformation performed by IGC; Triton generates the vector code. It is unclear at this point why the SYCL program is not scalarized. @pbchekin who is the contact, and can we get the SYCL code reproducer along with the compilation command?
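To make the difference between the two IR dumps concrete, here is a minimal Python sketch of what scalarization means semantically. It is only a model: the function names `vector_fadd` and `scalarized_fadd` are hypothetical, lists stand in for `<4 x float>` values, and nothing here reflects how the IGC pass is actually implemented.

```python
from typing import List, Sequence


def vector_fadd(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Model of `fadd <4 x float> %a, %b`: one operation over all lanes."""
    if len(a) != len(b):
        raise ValueError("vector operands must have the same number of lanes")
    return [x + y for x, y in zip(a, b)]


def scalarized_fadd(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Model of the scalarized form: per lane, extract both elements,
    do a scalar fadd, and insert the result back into the output vector."""
    if len(a) != len(b):
        raise ValueError("vector operands must have the same number of lanes")
    result = [0.0] * len(a)
    for lane in range(len(a)):
        x = a[lane]            # models extractelement on the first operand
        y = b[lane]            # models extractelement on the second operand
        result[lane] = x + y   # models scalar fadd followed by insertelement
    return result


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [10.0, 20.0, 30.0, 40.0]
    # Both forms compute the same lane values.
    assert vector_fadd(a, b) == scalarized_fadd(a, b)
    print(vector_fadd(a, b))
```

Both forms produce the same values in every lane; what changes is the shape of the IR (one vector `fadd` versus one scalar `fadd` per lane) that later compiler stages see.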

whitneywhtsang · 2024-09-18T23:40:58Z

@etiotto Can you give open-linux-driver-ci-dev_igc-17737 a try? It contains a recent change which makes that IGC pass more restrictive.

alexbaden · 2024-09-18T23:44:18Z

Are we confusing vector types and vectorization? SYCL has a vec4 type which is syntactic sugar for unpacking a struct. https://developer.codeplay.com/products/computecpp/ce/2.11.0/api-reference/vec__types__defines_8h.html

etiotto · 2024-09-18T23:54:31Z

> Are we confusing vector types and vectorization? SYCL has a vec4 type which is syntactic sugar for unpacking a struct. https://developer.codeplay.com/products/computecpp/ce/2.11.0/api-reference/vec__types__defines_8h.html

I don't have the SYCL program. However, from the original question I am guessing that the LLVM IR generated by SYCL contains vector adds and that, for some reason, IGC doesn't scalarize them. When we get the SYCL program we can check the LLVM IR it generates.

etiotto · 2024-09-19T23:41:45Z

@pbchekin do you have the contact info for the person who asked the original question?

etiotto · 2024-09-30T13:34:24Z

I have asked for the SYCL program but have not yet received it. Moving to the next iteration.

whitneywhtsang · 2024-12-12T14:02:40Z

The IGC team is working on the implementation of vector emission for fadd.

pbchekin added the question and community labels Sep 3, 2024

vlad-penkin added the codegen: mlir label Sep 5, 2024

vlad-penkin added this to the 0.3 [Triton] Language and Runtime milestone Sep 5, 2024

vlad-penkin assigned etiotto Sep 13, 2024
