What happened?

See the error reported in nod-ai/shark-ai#994 (comment).

Input MLIR: https://sharkpublic.blob.core.windows.net/sharkpublic/chi/llama/fp8_32_kv16.mlir
Detailed bug log: llama_fp8_kv16_compile_bug.txt

Compilation fails with the following command:

```shell
iree-compile /sharedfile/32/fp8_32.mlir \
  --iree-hip-target=gfx942 \
  -o=/sharedfile/32/fp8_32.vmfb \
  --iree-hal-target-device=hip \
  --iree-dispatch-creation-enable-aggressive-fusion=true \
  --iree-global-opt-propagate-transposes=true \
  --iree-opt-aggressively-propagate-transposes=true \
  --iree-opt-data-tiling=false \
  --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' \
  --iree-hal-indirect-command-buffers=true \
  --iree-stream-resource-memory-model=discrete \
  --iree-hal-memoization=true \
  --iree-opt-strip-assertions
```

Error output:

```
/sharedfile/32/fp8_32_kv16.mlir:9019:13: error: 'func.func' op failed on workgroup distribution verification
  %3349 = torch.aten.index_put %3347, %3348, %3343, %false_2832 : !torch.vtensor<[?,32,8,128],bf16>, !torch.list<optional<vtensor>>, !torch.vtensor<[?,32,8,128],bf16>, !torch.bool -> !torch.vtensor<[?,32,8,128],bf16>
          ^
/sharedfile/32/fp8_32_kv16.mlir:9019:13: note: see current operation:
/sharedfile/32/fp8_32_kv16.mlir:28291:14: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"rocm", "rocm-hsaco-fb", {abi = "hip", iree.gpu.target = #iree_gpu.target<arch = "gfx942", features = "", wgp = <compute = fp64|fp32|fp16|int64|int32|int16|int8, storage = b64|b32|b16|b8, subgroup = shuffle|arithmetic, dot = dp4xi8toi32, mma = [<MFMA_F32_16x16x4_F32>, <MFMA_F32_16x16x16_F16>, <MFMA_F32_32x32x8_F16>, <MFMA_F64_16x16x4_F64>, <MFMA_F32_16x16x16_BF16>, <MFMA_F32_32x32x8_BF16>, <MFMA_F32_16x16x32_F8E5M2FNUZ>, <MFMA_F32_16x16x32_F8E5M2FNUZ_F8E4M3FNUZ>, <MFMA_F32_16x16x32_F8E4M3FNUZ>, <MFMA_F32_16x16x32_F8E4M3FNUZ_F8E5M2FNUZ>, <MFMA_F32_32x32x16_F8E5M2FNUZ>, <MFMA_F32_32x32x16_F8E5M2FNUZ_F8E4M3FNUZ>, <MFMA_F32_32x32x16_F8E4M3FNUZ>, <MFMA_F32_32x32x16_F8E4M3FNUZ_F8E5M2FNUZ>, <MFMA_I32_16x16x32_I8>, <MFMA_I32_32x32x16_I8>], subgroup_size_choices = [64], max_workgroup_sizes = [1024, 1024, 1024], max_thread_count_per_workgroup = 1024, max_workgroup_memory_bytes = 65536, max_workgroup_counts = [2147483647, 2147483647, 2147483647], max_load_instruction_bits = 128, simds_per_wgp = 4, vgpr_space_bits = 16384>>, ukernels = "none"}>
  %10763 = torch.aten.index_put %10761, %10762, %10757, %false_11018 : !torch.vtensor<[?,32,8,128],bf16>, !torch.list<optional<vtensor>>, !torch.vtensor<[?,32,8,128],bf16>, !torch.bool -> !torch.vtensor<[?,32,8,128],bf16>
```

Versions:

```
iree-base-compiler 3.3.0rc20250223
iree-base-runtime  3.3.0rc20250223
iree-turbine       3.3.0rc20250223
```

Steps to reproduce your issue

The input.mlir can be downloaded from https://sharkpublic.blob.core.windows.net/sharkpublic/chi/llama/fp8_32_kv16.mlir or exported per nod-ai/shark-ai#994:

```shell
python3 -m sharktank.examples.export_paged_llm_v1 --irpa-file=/sharedfile/llama3_8b_fp8.irpa \
  --output-mlir=/sharedfile/32/fp8_32_kv16.mlir \
  --output-config=/sharedfile/32/config_32.json \
  --bs=1 --attention-kernel torch \
  --attention-dtype=bfloat16 \
  --kv-cache-dtype=bfloat16 \
  --activation-dtype=bfloat16 \
  --use-hf
```

Then compile:

```shell
wget https://sharkpublic.blob.core.windows.net/sharkpublic/chi/llama/fp8_32_kv16.mlir
iree-compile /sharedfile/32/fp8_32_kv16.mlir \
  --iree-hip-target=gfx942 \
  -o=/sharedfile/32/fp8_32_kv16.vmfb \
  --iree-hal-target-device=hip \
  --iree-dispatch-creation-enable-aggressive-fusion=true \
  --iree-global-opt-propagate-transposes=true \
  --iree-opt-aggressively-propagate-transposes=true \
  --iree-opt-data-tiling=false \
  --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' \
  --iree-hal-indirect-command-buffers=true \
  --iree-stream-resource-memory-model=discrete \
  --iree-hal-memoization=true \
  --iree-opt-strip-assertions
```

What component(s) does this issue relate to?

Compiler

Version information

iree-base-compiler 3.3.0rc20250223

Additional context

No response
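For context, `torch.aten.index_put` (the op both diagnostics point at) scatters a tensor of values into a destination tensor at the positions given by an index list, which is why IREE lowers it to a scatter dispatch. A minimal pure-Python sketch of those semantics — the names, shapes, and data below are illustrative, not taken from the model:

```python
# Pure-Python sketch of torch.aten.index_put semantics on a 2-D "tensor"
# (list of rows). The final boolean operand of the op (%false_2832 above)
# is the accumulate flag: overwrite when False, add when True.

def index_put(dest, indices, values, accumulate=False):
    """Write values[i] into dest[indices[i]]; add element-wise if accumulate."""
    out = [row[:] for row in dest]  # copy so the input stays untouched
    for i, idx in enumerate(indices):
        if accumulate:
            out[idx] = [a + b for a, b in zip(out[idx], values[i])]
        else:
            out[idx] = values[i][:]
    return out

dest = [[0, 0], [0, 0], [0, 0]]
indices = [2, 0]
values = [[1, 1], [2, 2]]
print(index_put(dest, indices, values))  # [[2, 2], [0, 0], [1, 1]]
```

In the failing module the destination is the `[?,32,8,128]` bf16 KV-cache tensor and accumulate is false, i.e. a plain overwrite-scatter.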
It's a problem with a scatter dispatch (see https://gist.github.com/IanWood1/5ac5ed8c1a9ee2b37b9727b93d42b028): the config is being set on the bit-extend `linalg.generic` op instead of the scatter.
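For illustration, the "bit-extend `linalg.generic`" the comment refers to is an elementwise op that widens narrow float values (here, to bf16). A schematic sketch — not copied from the gist, and the types are assumptions:

```mlir
// Schematic only: an elementwise widening generic of the sort described
// above. Per the comment, the lowering config ended up attached to this
// op rather than to the scatter that actually drives the dispatch.
#id = affine_map<(d0) -> (d0)>
%widened = linalg.generic
    {indexing_maps = [#id, #id], iterator_types = ["parallel"]}
    ins(%src : tensor<?xf8E4M3FNUZ>) outs(%init : tensor<?xbf16>) {
^bb0(%in: f8E4M3FNUZ, %out: bf16):
  %0 = arith.extf %in : f8E4M3FNUZ to bf16
  linalg.yield %0 : bf16
} -> tensor<?xbf16>
```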
Referenced commit: e5cab84