Fine tune sub-group transpose bank conflict prevention for PVC #2797

victor-eds · 2024-11-22T09:17:49Z

As of now, sub-group transpose bank conflict prevention leaves one empty slot in every 17 (sub-group size of 16, plus 1 padding slot, marked X below) to avoid bank conflicts:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 X
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 X
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 X
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 X
...
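
The layout above can be read as an index mapping. Here is a minimal sketch, assuming a sub-group size of 16 and one padding slot per row as described; `padded_offset` is a hypothetical helper name used only for illustration, not a function from the code base.

```python
SUB_GROUP_SIZE = 16  # sub-group size quoted in the issue
PADDING = 1          # one empty slot ("X") after every row


def padded_offset(row: int, lane: int) -> int:
    """Slot index of element (row, lane) in the padded layout (hypothetical helper)."""
    if not 0 <= lane < SUB_GROUP_SIZE:
        raise ValueError("lane out of range")
    return row * (SUB_GROUP_SIZE + PADDING) + lane


# The first elements of consecutive rows end up 17 slots apart.
print([padded_offset(r, 0) for r in range(4)])
```

With this mapping, one slot in every 17 is never written.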

This is too conservative and can be improved by taking into account PVC's SLM configuration for parallel accesses (64 banks, each providing access to 8 B). For elements of X bytes, the ideal mechanism would be:

Store (64 banks * 8 B/bank / X B/element) elements
Leave (1 bank * 8 B/bank / X B/element) empty spots
Store (64 banks * 8 B/bank / X B/element) elements
...
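
The two formulas above can be evaluated for a few element sizes. This is a minimal sketch that assumes only the figures quoted in this issue (64 banks, 8 B per bank); `proposed_layout` is a hypothetical name, and element sizes that do not divide the bank width are rejected because the issue does not cover them.

```python
NUM_BANKS = 64        # figure quoted in the issue for PVC SLM
BANK_WIDTH_BYTES = 8  # figure quoted in the issue for PVC SLM


def proposed_layout(element_bytes: int) -> tuple[int, int]:
    """Return (elements stored per run, empty slots after each run)."""
    if element_bytes <= 0 or BANK_WIDTH_BYTES % element_bytes != 0:
        raise ValueError("element size must divide the bank width")
    run = NUM_BANKS * BANK_WIDTH_BYTES // element_bytes
    pad = BANK_WIDTH_BYTES // element_bytes
    return run, pad


for size in (2, 4, 8):
    run, pad = proposed_layout(size)
    overhead = pad / (run + pad)
    print(f"{size} B elements: store {run}, skip {pad}, overhead {overhead:.2%}")
```

Under these assumptions the padding takes 1/65 of the buffer (about 1.5%) whatever the element size, against 1/17 (about 5.9%) for the current scheme.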

Assuming fp32 elements (4 B each, so 128 elements followed by 2 empty spots, marked X):

0 1 2 3 4 5 ... 127 X X
0 1 2 3 4 5 ... 127 X X
0 1 2 3 4 5 ... 127 X X

Again, for fp32, in terms of code:

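; Store untransposed: each block write covers 8 rows x 16 lanes = 128 floats
call spir_func void @intel_sub_group_block_write8(ptr addrspace(3) %ptr0, <8 x float> %data)
; Skip the 128 stored elements plus the 2 empty ones
%ptr1 = getelementptr inbounds float, ptr addrspace(3) %ptr0, i32 130
; ...
; Load transposed
%vec0 = load <8 x float>, ptr addrspace(3) %ptrwi0
%ptrwi1 = getelementptr inbounds <8 x float>, ptr addrspace(3) %ptrwi0, i32 1
; ...
; Take into account empty elements: 8 loaded floats plus the 2 empty ones
%ptrwi16 = getelementptr inbounds float, ptr addrspace(3) %ptrwi15, i32 10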
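
The pointer steps in the snippet (130 on the store side, 10 on the load side) can be checked with a small index calculation. This is a sketch under stated assumptions: fp32 elements, runs of 128 elements followed by 2 empty slots, and a work-item whose first vector load sits at the start of a run. `slm_offset` is a hypothetical helper, not part of the code base.

```python
RUN = 128  # fp32 elements per run (64 banks * 8 B / 4 B)
PAD = 2    # empty fp32 slots after each run (8 B / 4 B)
VEC = 8    # elements per <8 x float> access


def slm_offset(logical_index: int) -> int:
    """Offset, in fp32 slots, of a logical element once padding is inserted."""
    return logical_index + (logical_index // RUN) * PAD


# Store side: consecutive block writes start one padded run apart.
store_step = slm_offset(RUN) - slm_offset(0)

# Load side: start offsets of the first 17 vector loads, assuming the
# first load is aligned to the start of a run (assumption for this sketch).
load_starts = [slm_offset(k * VEC) for k in range(17)]
load_steps = [b - a for a, b in zip(load_starts, load_starts[1:])]

print(store_step, load_steps)
```

The first 15 load steps are 8 slots (one vector) and the 16th is 10, which corresponds to the step from `%ptrwi15` to `%ptrwi16` above.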

victor-eds · 2024-12-02T12:31:21Z

Postponed, as #2890 looks like a more promising approach.

sommerlukas · 2025-02-03T08:36:59Z

The approach described in #2890 performs the reduction directly in registers, so SLM is not involved and no tuning for bank conflicts is required. Closing this issue.

victor-eds added the performance and codegen: attention labels Nov 22, 2024

victor-eds self-assigned this Nov 22, 2024

victor-eds mentioned this issue Nov 22, 2024

[XPU][TritonGPUToLLVM] Avoid bank conflicts in sub-group transposes #2769

Merged

vlad-penkin added this to the 4.0 [Performance] Core milestone Dec 2, 2024

sommerlukas closed this as completed Feb 3, 2025
