
llama 8b fp8 kv16 model compile failed #20077

Closed
AmosLewis opened this issue Feb 24, 2025 · 1 comment · Fixed by #20080
Labels
bug 🐞 Something isn't working

Comments


AmosLewis commented Feb 24, 2025

What happened?

See the error reported in nod-ai/shark-ai#994 (comment).

Input MLIR: https://sharkpublic.blob.core.windows.net/sharkpublic/chi/llama/fp8_32_kv16.mlir

Detailed bug log: llama_fp8_kv16_compile_bug.txt

iree-compile /sharedfile/32/fp8_32.mlir \
  --iree-hip-target=gfx942 \
  -o=/sharedfile/32/fp8_32.vmfb \
  --iree-hal-target-device=hip \
  --iree-dispatch-creation-enable-aggressive-fusion=true \
  --iree-global-opt-propagate-transposes=true \
  --iree-opt-aggressively-propagate-transposes=true \
  --iree-opt-data-tiling=false \
  --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' \
  --iree-hal-indirect-command-buffers=true \
  --iree-stream-resource-memory-model=discrete \
  --iree-hal-memoization=true \
  --iree-opt-strip-assertions

/sharedfile/32/fp8_32_kv16.mlir:9019:13: error: 'func.func' op failed on workgroup distribution verification
    %3349 = torch.aten.index_put %3347, %3348, %3343, %false_2832 : !torch.vtensor<[?,32,8,128],bf16>, !torch.list<optional<vtensor>>, !torch.vtensor<[?,32,8,128],bf16>, !torch.bool -> !torch.vtensor<[?,32,8,128],bf16>
            ^
/sharedfile/32/fp8_32_kv16.mlir:9019:13: note: see current operation:

/sharedfile/32/fp8_32_kv16.mlir:28291:14: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"rocm", "rocm-hsaco-fb", {abi = "hip", iree.gpu.target = #iree_gpu.target<arch = "gfx942", features = "", wgp = <compute =  fp64|fp32|fp16|int64|int32|int16|int8, storage =  b64|b32|b16|b8, subgroup =  shuffle|arithmetic, dot =  dp4xi8toi32, mma = [<MFMA_F32_16x16x4_F32>, <MFMA_F32_16x16x16_F16>, <MFMA_F32_32x32x8_F16>, <MFMA_F64_16x16x4_F64>, <MFMA_F32_16x16x16_BF16>, <MFMA_F32_32x32x8_BF16>, <MFMA_F32_16x16x32_F8E5M2FNUZ>, <MFMA_F32_16x16x32_F8E5M2FNUZ_F8E4M3FNUZ>, <MFMA_F32_16x16x32_F8E4M3FNUZ>, <MFMA_F32_16x16x32_F8E4M3FNUZ_F8E5M2FNUZ>, <MFMA_F32_32x32x16_F8E5M2FNUZ>, <MFMA_F32_32x32x16_F8E5M2FNUZ_F8E4M3FNUZ>, <MFMA_F32_32x32x16_F8E4M3FNUZ>, <MFMA_F32_32x32x16_F8E4M3FNUZ_F8E5M2FNUZ>, <MFMA_I32_16x16x32_I8>, <MFMA_I32_32x32x16_I8>], subgroup_size_choices = [64], max_workgroup_sizes = [1024, 1024, 1024], max_thread_count_per_workgroup = 1024, max_workgroup_memory_bytes = 65536, max_workgroup_counts = [2147483647, 2147483647, 2147483647], max_load_instruction_bits = 128, simds_per_wgp = 4, vgpr_space_bits = 16384>>, ukernels = "none"}>
    %10763 = torch.aten.index_put %10761, %10762, %10757, %false_11018 : !torch.vtensor<[?,32,8,128],bf16>, !torch.list<optional<vtensor>>, !torch.vtensor<[?,32,8,128],bf16>, !torch.bool -> !torch.vtensor<[?,32,8,128],bf16>
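For context, both failing ops are torch.aten.index_put writes into a [?,32,8,128] bf16 tensor, which presumably come from the paged KV-cache update in the exported model. A minimal PyTorch sketch of that pattern is below; the shapes are taken from the error message, and the function and variable names are illustrative, not the actual sharktank code.

import torch

# Illustrative stand-in for the paged KV-cache write that lowers to
# torch.aten.index_put on a [?, 32, 8, 128] bf16 tensor (shapes from the
# error above); not the actual sharktank implementation.
def write_kv_cache(cache, page_ids, values):
    # cache:    [num_pages, 32, 8, 128] bf16 paged cache
    # page_ids: [n] int64 indices into the page dimension
    # values:   [n, 32, 8, 128] bf16 new key/value slices
    return torch.index_put(cache, (page_ids,), values, accumulate=False)

cache = torch.zeros(16, 32, 8, 128, dtype=torch.bfloat16)
values = torch.randn(4, 32, 8, 128, dtype=torch.bfloat16)
page_ids = torch.tensor([0, 3, 5, 7])
cache = write_kv_cache(cache, page_ids, values)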

Steps to reproduce your issue

iree-base-compiler       3.3.0rc20250223
iree-base-runtime        3.3.0rc20250223
iree-turbine             3.3.0rc20250223

The input MLIR can be downloaded from https://sharkpublic.blob.core.windows.net/sharkpublic/chi/llama/fp8_32_kv16.mlir
or exported as described in nod-ai/shark-ai#994:

 python3 -m sharktank.examples.export_paged_llm_v1 --irpa-file=/sharedfile/llama3_8b_fp8.irpa  \
  --output-mlir=/sharedfile/32/fp8_32_kv16.mlir \
  --output-config=/sharedfile/32/config_32.json \
  --bs=1 --attention-kernel torch \
  --attention-dtype=bfloat16 \
  --kv-cache-dtype=bfloat16 \
  --activation-dtype=bfloat16 \
  --use-hf
wget  https://sharkpublic.blob.core.windows.net/sharkpublic/chi/llama/fp8_32_kv16.mlir

iree-compile /sharedfile/32/fp8_32_kv16.mlir \
  --iree-hip-target=gfx942 \
  -o=/sharedfile/32/fp8_32_kv16.vmfb \
  --iree-hal-target-device=hip \
  --iree-dispatch-creation-enable-aggressive-fusion=true \
  --iree-global-opt-propagate-transposes=true \
  --iree-opt-aggressively-propagate-transposes=true \
  --iree-opt-data-tiling=false \
  --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' \
  --iree-hal-indirect-command-buffers=true \
  --iree-stream-resource-memory-model=discrete \
  --iree-hal-memoization=true \
  --iree-opt-strip-assertions

What component(s) does this issue relate to?

Compiler

Version information

iree-base-compiler 3.3.0rc20250223

Additional context

No response

AmosLewis added the bug 🐞 Something isn't working label on Feb 24, 2025
IanWood1 (Contributor) commented:

It's a problem with a scatter dispatch (see https://gist.github.com/IanWood1/5ac5ed8c1a9ee2b37b9727b93d42b028) where the config is being set on the bit-extend linalg.generic op.
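As a rough illustration of the pattern being described (hypothetical names and dtypes, my reading of the gist rather than the actual model or dispatch IR): a bit-extending cast produces the bf16 values that feed the index_put/scatter, and both end up fused into the single scatter dispatch whose config lands on the extend op.

import torch

# Hypothetical producer chain for the scatter dispatch: a widening cast
# ("bit extend") feeding the index_put ("scatter"). The narrower source
# dtype (fp8 here) is an assumption based on the model being fp8.
def extend_and_scatter(cache_bf16, page_ids, values_f8):
    extended = values_f8.to(torch.bfloat16)                    # bit extend
    return torch.index_put(cache_bf16, (page_ids,), extended)  # scatter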
