[GEMM perf] Poor GEMM performance on A770 #1765

Egor-Krivov · 2024-08-02T12:51:58Z

When I run the GEMM benchmark on A770, I get about 0.3 TFLOPS, while the 1550 reaches about 250 TFLOPS.

Performance table:

File with triton cache from the run (cache is in cache folder):
benchmark-reports (6).zip

My run, just in case:
https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10215632110/job/28265440574

alexbaden · 2024-08-02T13:17:04Z

A770 does not support DPAS 16, so the kernel is likely a fully unrolled loop.

Egor-Krivov · 2024-08-02T13:18:40Z

What's our timeline for supporting fast GEMM on A770?

aregm · 2024-08-02T14:32:10Z

> What's our timeline for supporting fast GEMM on A770?

What does a fast GEMM on A770 mean? It doesn't have DPAS, so it will lag behind. Do you mean efficiency?

Egor-Krivov · 2024-08-02T15:29:50Z

I think the current performance is lower than could be expected. Here is another GEMM benchmark (in milliseconds) comparing our Triton matmul implementation against IPEX torch (oneDNN). We get about a 100x slowdown when using Triton vs. IPEX torch.
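For readers comparing the millisecond timings with the TFLOPS figures quoted earlier in the thread, here is a minimal sketch of the conversion, assuming the usual 2*M*N*K FLOP count for a dense matmul. The helper names, matrix sizes, and timings are hypothetical and chosen only for illustration; they are not the measurements from this benchmark.

```python
def gemm_tflops(m: int, n: int, k: int, time_ms: float) -> float:
    """Throughput in TFLOPS of an (m x k) @ (k x n) matmul.

    Assumes the conventional count of 2*m*n*k floating-point operations
    (one multiply and one add per inner-product term).
    """
    flops = 2.0 * m * n * k
    seconds = time_ms * 1e-3
    return flops / seconds / 1e12


def slowdown(candidate_ms: float, baseline_ms: float) -> float:
    """How many times slower the candidate is than the baseline."""
    return candidate_ms / baseline_ms


if __name__ == "__main__":
    # Hypothetical problem size and timings, NOT taken from the issue.
    m = n = k = 4096
    baseline_ms = 10.0     # stand-in for the IPEX torch timing
    candidate_ms = 1000.0  # stand-in for the Triton timing

    print(f"baseline:  {gemm_tflops(m, n, k, baseline_ms):.2f} TFLOPS")
    print(f"candidate: {gemm_tflops(m, n, k, candidate_ms):.2f} TFLOPS")
    print(f"slowdown:  {slowdown(candidate_ms, baseline_ms):.0f}x")
```

Because the FLOP count is fixed for a given shape, the slowdown ratio in milliseconds equals the inverse ratio in TFLOPS, so either unit can be used to compare the two implementations.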

alexbaden · 2024-08-02T15:38:22Z

Torch does not use Triton for GEMM - neither for XPU nor for CUDA. There is an existing, performant solution for GEMM in PyTorch on A770. Why do we need Triton to be competitive?
We have line of sight to very good GEMM performance on hardware with DPAS instructions. On A770, we would be effectively starting over. What consumer demand justifies such resource-intensive work?

vlad-penkin · 2024-08-04T13:46:00Z

@alexbaden as per @whitneywhtsang's comments in the issue

[DPAS] Support low precision DPAS on A770 with sub-group-size=8. #991

DPAS8 is supported via a different OpenCL built-in.

Egor-Krivov added the performance label Aug 2, 2024

vlad-penkin added this to the 5.0 [DGPU] Initial enabling milestone Aug 2, 2024

vlad-penkin added the enhancement New feature or request label Aug 2, 2024

LiyangLingIntel mentioned this issue Nov 14, 2024

test_dot3d tests now pass on PVC, but fails on A770 #2607

Open
