Sharktank Data-Dependent tests reporting workgroup distribution verification errors #1007

renxida · 2025-02-25T17:41:50Z

No description provided.

renxida · 2025-02-25T17:42:21Z

CI run: https://github.com/nod-ai/shark-ai/actions/runs/13519729609/job/37776128410?pr=1003

Error message:

2025-02-25T11:21:31.1576148Z E               /shark-dev/tmp/tmph4kwxa_mFluxTest/model.mlir:2644:12: error: 'linalg.generic' op write affecting operations on global resources are restricted to workgroup distributed contexts.
2025-02-25T11:21:31.1576863Z E                   %923 = torch.aten.add.Tensor %922, %17, %int1_131 : !torch.vtensor<[1,18432],f32>, !torch.vtensor<[18432],f32>, !torch.int -> !torch.vtensor<[1,18432],f32>
2025-02-25T11:21:31.1577246Z E                          ^
2025-02-25T11:21:31.1577607Z E               /shark-dev/tmp/tmph4kwxa_mFluxTest/model.mlir:2642:12: error: 'func.func' op failed on workgroup distribution verification
2025-02-25T11:21:31.1578123Z E                   %922 = torch.aten.mm %920, %921 : !torch.vtensor<[1,3072],f32>, !torch.vtensor<[3072,18432],f32> -> !torch.vtensor<[1,18432],f32>
2025-02-25T11:21:31.1578444Z E                          ^
2025-02-25T11:21:31.1580897Z E               /shark-dev/tmp/tmph4kwxa_mFluxTest/model.mlir:2642:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"rocm", "rocm-hsaco-fb", {abi = "hip", iree.gpu.target = #iree_gpu.target<arch = "gfx942", features = "", wgp = <compute =  fp64|fp32|fp16|int64|int32|int16|int8, storage =  b64|b32|b16|b8, subgroup =  shuffle|arithmetic, dot =  dp4xi8toi32, mma = [<MFMA_F32_16x16x4_F32>, <MFMA_F32_16x16x16_F16>, <MFMA_F32_32x32x8_F16>, <MFMA_F64_16x16x4_F64>, <MFMA_F32_16x16x16_BF16>, <MFMA_F32_32x32x8_BF16>, <MFMA_F32_16x16x32_F8E5M2FNUZ>, <MFMA_F32_16x16x32_F8E5M2FNUZ_F8E4M3FNUZ>, <MFMA_F32_16x16x32_F8E4M3FNUZ>, <MFMA_F32_16x16x32_F8E4M3FNUZ_F8E5M2FNUZ>, <MFMA_F32_32x32x16_F8E5M2FNUZ>, <MFMA_F32_32x32x16_F8E5M2FNUZ_F8E4M3FNUZ>, <MFMA_F32_32x32x16_F8E4M3FNUZ>, <MFMA_F32_32x32x16_F8E4M3FNUZ_F8E5M2FNUZ>, <MFMA_I32_16x16x32_I8>, <MFMA_I32_32x32x16_I8>], subgroup_size_choices = [64], max_workgroup_sizes = [1024, 1024, 1024], max_thread_count_per_workgroup = 1024, max_workgroup_memory_bytes = 65536, max_workgroup_counts = [2147483647, 2147483647, 2147483647], max_load_instruction_bits = 128, simds_per_wgp = 4, vgpr_space_bits = 16384>>, ukernels = "none", waves_per_eu = 2 : i64}>
2025-02-25T11:21:31.1583572Z E                   %922 = torch.aten.mm %920, %921 : !torch.vtensor<[1,3072],f32>, !torch.vtensor<[3072,18432],f32> -> !torch.vtensor<[1,18432],f32>
2025-02-25T11:21:31.1583906Z E                          ^

renxida · 2025-02-25T17:45:54Z

@MaheshRavishankar might be able to help. I found iree-org/iree#20063 in the IREE commit history. Do I need to specify a flag to fix this error?

renxida mentioned this issue Feb 25, 2025

[tracking] iree bump to 0225 failures #1004

Closed

4 tasks

renxida mentioned this issue Feb 25, 2025

Bump IREE requirement pins to 3.3.0rc20250225 #1003

Open
