Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable scaled_dot tests for client platform #3234

Open
anmyachev opened this issue Jan 22, 2025 · 10 comments · May be fixed by #3403
Open

Enable scaled_dot tests for client platform #3234

anmyachev opened this issue Jan 22, 2025 · 10 comments · May be fixed by #3403
Assignees
Labels
bug Something isn't working tests: ut

Comments

@anmyachev
Copy link
Contributor

@whitneywhtsang
Copy link
Contributor

@whitneywhtsang what device properties should we look at first? Could you please advise?

The reason why scaled_dot tests are not working is likely because DPAS is not used. Below are some reasons why DPAS is not used:

@etiotto
Copy link
Contributor

etiotto commented Jan 27, 2025

@whitneywhtsang what device properties should we look at first? Could you please advise?

The reason why scaled_dot tests are not working is likely because DPAS is not used. Below are some reasons why DPAS is not used:

Surprising if that is the case. For lowering tt.dot we should already use DPAS and 2D block reads and we have unit tests to check the codegen.

@alexbaden
Copy link
Contributor

A770 and MTL do not have DPAS (A770 only supports 8 threads per warp, MTL does not have the hardware support at all).

@whitneywhtsang
Copy link
Contributor

@LiyangLingIntel Can you please look into the xe2 scaled_dot failures?

@LiyangLingIntel
Copy link
Contributor

I tried to reproduce the scaled_dot failures on BMG, but only one fail test/unit/language/test_core.py::test_scaled_dot[128-32-64-False-False-True-e4m3-bf16-4-16-1] which is the known issue in #2968.
Just removed test_scaled_dot from xe2 skiplist and triggered CI, https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/13279132310/job/37073967686 shows they are all passed.
It seems no extra work needs to be done.

@anmyachev can you help confirm this?

@LiyangLingIntel LiyangLingIntel linked a pull request Feb 12, 2025 that will close this issue
@LiyangLingIntel
Copy link
Contributor

There is no DPAS support for A770 and MTL, we should keep it in the skiplist.

Created PR(#3403) to remove the most scaled_dot cases from the Xe2 skiplist, the remaining one will be covered in #2968.
If there is no further comment, #3403 will close this issue.

@anmyachev
Copy link
Contributor Author

Hi @LiyangLingIntel

Just removed test_scaled_dot from xe2 skiplist and triggered CI, https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/13279132310/job/37073967686 shows they are all passed.

This CI doesn't use Xe2 skiplist and runs on PVC, right? To check, you need to run CI on LNL (also uses this skiplist) or on BMG, this can be done in the internal repository. On the other hand, if you manually checked that the tests are working, then it is quite possible that there could be some instability in the runner.

There is no DPAS support for A770 and MTL, we should keep it in the skiplist.

DPAS is about performance, right? Why isn't it expected that these tests will at least work correctly on these devices?

@LiyangLingIntel
Copy link
Contributor

DPAS is about performance, right? Why isn't it expected that these tests will at least work correctly on these devices?

Not exactly, the most of our features are enabled on DPAS. If not going to the DPAS path, we have to go to the FMA Dot which only support the basic dot, and this should be same to NV and AMD.
If we want to enable these on FMA path, corresponding changes should also be made on the common files.

@anmyachev
Copy link
Contributor Author

DPAS is about performance, right? Why isn't it expected that these tests will at least work correctly on these devices?

Not exactly, the most of our features are enabled on DPAS. If not going to the DPAS path, we have to go to the FMA Dot which only support the basic dot, and this should be same to NV and AMD. If we want to enable these on FMA path, corresponding changes should also be made on the common files.

I see

@anmyachev
Copy link
Contributor Author

It seems no extra work needs to be done.

@anmyachev can you help confirm this?

These tests are still falling in CI.

@whitneywhtsang whitneywhtsang linked a pull request Feb 13, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working tests: ut
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants