Question on the vector add example #2099

Open
pbchekin opened this issue Sep 3, 2024 · 7 comments
Labels
codegen: mlir community question Further information is requested


@pbchekin
Contributor

pbchekin commented Sep 3, 2024

Received this:

A question after analyzing the generated LLVM IR and comparing it against the equivalent SYCL vector add. In the normal SYCL program, we tend to load the input data as a vector of size 4 and compute the addition on that vector. In the Triton case, we load a vector from memory but then extract each element, add the elements as scalars, and insert the results back. Attached are the shader dumps from the Triton vector add run, which include the input to the IGC compiler before unification and the IGC-optimized LLVM IR.

OCL_asm67166d3621db5283_beforeUnification.zip
OCL_asm67166d3621db5283_optimized.zip
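
To make the two code shapes in the question concrete, here is a minimal Python sketch. It is only a model: plain lists stand in for `<4 x float>` values, and the function names are hypothetical and do not correspond to any Triton, SYCL, or IGC API. It shows that the "vector add" form and the "extract, scalar add, insert" form compute the same lanes; the difference under discussion is in how the operation is expressed in the IR, not in the result.

```python
# Illustrative model only: Python lists stand in for <4 x float> values.
# Function names are hypothetical, not part of Triton, SYCL, or IGC.

def vector_fadd(a, b):
    """Model of a single `fadd <4 x float>`: one operation over all lanes."""
    return [x + y for x, y in zip(a, b)]


def scalarized_fadd(a, b):
    """Model of the scalarized form: per-lane extract, scalar add, insert."""
    result = [None] * len(a)
    for lane in range(len(a)):
        x = a[lane]            # models extractelement on the first operand
        y = b[lane]            # models extractelement on the second operand
        result[lane] = x + y   # models `fadd float` followed by insertelement
    return result


a = [1.0, 2.0, 3.0, 4.0]        # made-up input values
b = [10.0, 20.0, 30.0, 40.0]

# Both forms produce the same four lanes.
assert vector_fadd(a, b) == scalarized_fadd(a, b)
```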

@etiotto
Contributor

etiotto commented Sep 18, 2024

IGC has a pass that scalarizes the vector addition. Before that pass the LLVM IR is:

  %42 = fadd <4 x float> %bc3, %bc9, !dbg !373
  %43 = fadd <4 x float> %bc3, %bc9, !dbg !373
  %44 = fadd <4 x float> %bc3, %bc9, !dbg !373
  %45 = shufflevector <4 x float> %43, <4 x float> %44, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef>, !dbg !375
  %46 = fadd <4 x float> %bc3, %bc9, !dbg !373
  %47 = shufflevector <4 x float> %45, <4 x float> %46, <4 x i32> <i32 0, i32 1, i32 6, i32 undef>, !dbg !375
  %48 = shufflevector <4 x float> %47, <4 x float> %42, <4 x i32> <i32 0, i32 1, i32 2, i32 7>, !dbg !375
  %49 = sext i32 %9 to i64, !dbg !374
  %50 = getelementptr float, float addrspace(1)* %2, i64 %49, !dbg !374
  %51 = bitcast float addrspace(1)* %50 to <4 x float> addrspace(1)*, !dbg !375
  store <4 x float> %48, <4 x float> addrspace(1)* %51, align 16, !dbg !375

and after that pass the vector add is scalarized:


59:                                               ; preds = %52, %51
  %bc1226 = phi float [ %55, %52 ], [ 0.000000e+00, %51 ], !dbg !371
  %bc1227 = phi float [ %56, %52 ], [ 0.000000e+00, %51 ], !dbg !371
  %bc1228 = phi float [ %57, %52 ], [ 0.000000e+00, %51 ], !dbg !371
  %bc1229 = phi float [ %58, %52 ], [ 0.000000e+00, %51 ], !dbg !371
  %60 = fadd float %bc618, %bc1226, !dbg !372
  %61 = fadd float %bc619, %bc1227, !dbg !372
  %62 = fadd float %bc620, %bc1228, !dbg !372
  %63 = fadd float %bc621, %bc1229, !dbg !372
  %64 = fadd float %bc618, %bc1226, !dbg !372
  %65 = fadd float %bc619, %bc1227, !dbg !372
  %66 = fadd float %bc620, %bc1228, !dbg !372
  %67 = fadd float %bc621, %bc1229, !dbg !372
  %68 = fadd float %bc618, %bc1226, !dbg !372
  %69 = fadd float %bc619, %bc1227, !dbg !372
  %70 = fadd float %bc620, %bc1228, !dbg !372
  %71 = fadd float %bc621, %bc1229, !dbg !372
  %72 = fadd float %bc618, %bc1226, !dbg !372
  %73 = fadd float %bc619, %bc1227, !dbg !372
  %74 = fadd float %bc620, %bc1228, !dbg !372
  %75 = fadd float %bc621, %bc1229, !dbg !372
  %76 = getelementptr float, float addrspace(1)* %2, i64 %21, !dbg !373
  br i1 %19, label %77, label %101, !dbg !374

So this is a transformation performed by IGC; Triton generates the vector code. It is unclear at this point why the SYCL program is not scalarized. @pbchekin who is the contact, and can we get the SYCL code reproducer along with the compilation command?
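
For readers following the pre-pass IR in this comment, the `shufflevector` chain can be traced with a small Python model. This is a sketch under stated assumptions: lists stand in for `<4 x float>` values, `None` stands in for `undef`, the input values are made up, and the helper names are hypothetical. It suggests that `%48` takes lane 0 from `%43`, lane 1 from `%44`, lane 2 from `%46`, and lane 3 from `%42`, and since all four are the same `fadd` of `%bc3` and `%bc9`, the stored value equals one vector add. That would also be consistent with the 16 scalar `fadd` instructions in the post-pass dump (four duplicated vector adds, four lanes each).

```python
# Illustrative model only: lists stand in for <4 x float>, None for `undef`.
# Helper names and input values are hypothetical.

UNDEF = None


def shufflevector(v1, v2, mask):
    """Model of LLVM shufflevector: mask indices select from v1 followed by v2."""
    combined = v1 + v2
    return [UNDEF if i is UNDEF else combined[i] for i in mask]


def fadd(a, b):
    """Model of `fadd <4 x float>`."""
    return [x + y for x, y in zip(a, b)]


bc3 = [1.0, 2.0, 3.0, 4.0]        # made-up contents for %bc3
bc9 = [10.0, 20.0, 30.0, 40.0]    # made-up contents for %bc9

# Mirrors %42 .. %48 from the pre-pass IR.
v42 = fadd(bc3, bc9)
v43 = fadd(bc3, bc9)
v44 = fadd(bc3, bc9)
v45 = shufflevector(v43, v44, [0, 5, UNDEF, UNDEF])
v46 = fadd(bc3, bc9)
v47 = shufflevector(v45, v46, [0, 1, 6, UNDEF])
v48 = shufflevector(v47, v42, [0, 1, 2, 7])

# The value that gets stored is the same as a single vector add.
assert v48 == fadd(bc3, bc9)
```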

@whitneywhtsang
Contributor

@etiotto Can you give open-linux-driver-ci-dev_igc-17737 a try? It contains a recent change which makes that IGC pass more restrictive.

@alexbaden
Contributor

Are we confusing vector types and vectorization? SYCL has a vec4 type which is syntactic sugar for unpacking a struct. https://developer.codeplay.com/products/computecpp/ce/2.11.0/api-reference/vec__types__defines_8h.html

@etiotto
Contributor

etiotto commented Sep 18, 2024

> Are we confusing vector types and vectorization? SYCL has a vec4 type which is syntactic sugar for unpacking a struct. https://developer.codeplay.com/products/computecpp/ce/2.11.0/api-reference/vec__types__defines_8h.html

I don't have the SYCL program; however, from the original question I am guessing that the LLVM IR generated by SYCL contains vector adds and that, for some reason, IGC doesn't scalarize them. When we get the SYCL program we can check the LLVM IR it generates.

@etiotto
Contributor

etiotto commented Sep 19, 2024

@pbchekin do you have the contact info for the person that asked the original question?

@etiotto
Contributor

etiotto commented Sep 30, 2024

I have asked for the SYCL program but have not yet received it. Moving to the next iteration.

@whitneywhtsang
Contributor

IGC team is working on the implementation of vector emission for fadd.
