
llama 8b fp8 kv16 model compile failed #20077

Closed
AmosLewis opened this issue Feb 24, 2025 · 1 comment · Fixed by #20080
Labels
bug 🐞 Something isn't working

Comments


AmosLewis commented Feb 24, 2025

What happened?

See the error reported in nod-ai/shark-ai#994 (comment).

Input MLIR: https://sharkpublic.blob.core.windows.net/sharkpublic/chi/llama/fp8_32_kv16.mlir

Detailed bug log: llama_fp8_kv16_compile_bug.txt

iree-compile /sharedfile/32/fp8_32.mlir \
  --iree-hip-target=gfx942 \
  -o=/sharedfile/32/fp8_32.vmfb \
  --iree-hal-target-device=hip \
  --iree-dispatch-creation-enable-aggressive-fusion=true \
  --iree-global-opt-propagate-transposes=true \
  --iree-opt-aggressively-propagate-transposes=true \
  --iree-opt-data-tiling=false \
  --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' \
  --iree-hal-indirect-command-buffers=true \
  --iree-stream-resource-memory-model=discrete \
  --iree-hal-memoization=true \
  --iree-opt-strip-assertions

/sharedfile/32/fp8_32_kv16.mlir:9019:13: error: 'func.func' op failed on workgroup distribution verification
    %3349 = torch.aten.index_put %3347, %3348, %3343, %false_2832 : !torch.vtensor<[?,32,8,128],bf16>, !torch.list<optional<vtensor>>, !torch.vtensor<[?,32,8,128],bf16>, !torch.bool -> !torch.vtensor<[?,32,8,128],bf16>
            ^
/sharedfile/32/fp8_32_kv16.mlir:9019:13: note: see current operation:

/sharedfile/32/fp8_32_kv16.mlir:28291:14: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"rocm", "rocm-hsaco-fb", {abi = "hip", iree.gpu.target = #iree_gpu.target<arch = "gfx942", features = "", wgp = <compute =  fp64|fp32|fp16|int64|int32|int16|int8, storage =  b64|b32|b16|b8, subgroup =  shuffle|arithmetic, dot =  dp4xi8toi32, mma = [<MFMA_F32_16x16x4_F32>, <MFMA_F32_16x16x16_F16>, <MFMA_F32_32x32x8_F16>, <MFMA_F64_16x16x4_F64>, <MFMA_F32_16x16x16_BF16>, <MFMA_F32_32x32x8_BF16>, <MFMA_F32_16x16x32_F8E5M2FNUZ>, <MFMA_F32_16x16x32_F8E5M2FNUZ_F8E4M3FNUZ>, <MFMA_F32_16x16x32_F8E4M3FNUZ>, <MFMA_F32_16x16x32_F8E4M3FNUZ_F8E5M2FNUZ>, <MFMA_F32_32x32x16_F8E5M2FNUZ>, <MFMA_F32_32x32x16_F8E5M2FNUZ_F8E4M3FNUZ>, <MFMA_F32_32x32x16_F8E4M3FNUZ>, <MFMA_F32_32x32x16_F8E4M3FNUZ_F8E5M2FNUZ>, <MFMA_I32_16x16x32_I8>, <MFMA_I32_32x32x16_I8>], subgroup_size_choices = [64], max_workgroup_sizes = [1024, 1024, 1024], max_thread_count_per_workgroup = 1024, max_workgroup_memory_bytes = 65536, max_workgroup_counts = [2147483647, 2147483647, 2147483647], max_load_instruction_bits = 128, simds_per_wgp = 4, vgpr_space_bits = 16384>>, ukernels = "none"}>
    %10763 = torch.aten.index_put %10761, %10762, %10757, %false_11018 : !torch.vtensor<[?,32,8,128],bf16>, !torch.list<optional<vtensor>>, !torch.vtensor<[?,32,8,128],bf16>, !torch.bool -> !torch.vtensor<[?,32,8,128],bf16>
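For context, both failing ops are torch.aten.index_put writes into a [?,32,8,128] bf16 tensor, which presumably come from the paged KV-cache update in the exported model. A minimal PyTorch sketch of that pattern is below; the shapes are taken from the error message, and the function and variable names are illustrative, not the actual sharktank code.

import torch

# Illustrative stand-in for the paged KV-cache write that lowers to
# torch.aten.index_put on a [?, 32, 8, 128] bf16 tensor (shapes from the
# error above); not the actual sharktank implementation.
def write_kv_cache(cache, page_ids, values):
    # cache:    [num_pages, 32, 8, 128] bf16 paged cache
    # page_ids: [n] int64 indices into the page dimension
    # values:   [n, 32, 8, 128] bf16 new key/value slices
    return torch.index_put(cache, (page_ids,), values, accumulate=False)

cache = torch.zeros(16, 32, 8, 128, dtype=torch.bfloat16)
values = torch.randn(4, 32, 8, 128, dtype=torch.bfloat16)
page_ids = torch.tensor([0, 3, 5, 7])
cache = write_kv_cache(cache, page_ids, values)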

Steps to reproduce your issue

iree-base-compiler       3.3.0rc20250223
iree-base-runtime        3.3.0rc20250223
iree-turbine             3.3.0rc20250223

The input MLIR can be downloaded from https://sharkpublic.blob.core.windows.net/sharkpublic/chi/llama/fp8_32_kv16.mlir
or exported as described in nod-ai/shark-ai#994:

 python3 -m sharktank.examples.export_paged_llm_v1 --irpa-file=/sharedfile/llama3_8b_fp8.irpa  \
  --output-mlir=/sharedfile/32/fp8_32_kv16.mlir \
  --output-config=/sharedfile/32/config_32.json \
  --bs=1 --attention-kernel torch \
  --attention-dtype=bfloat16 \
  --kv-cache-dtype=bfloat16 \
  --activation-dtype=bfloat16 \
  --use-hf
wget  https://sharkpublic.blob.core.windows.net/sharkpublic/chi/llama/fp8_32_kv16.mlir

iree-compile /sharedfile/32/fp8_32_kv16.mlir \
  --iree-hip-target=gfx942 \
  -o=/sharedfile/32/fp8_32_kv16.vmfb \
  --iree-hal-target-device=hip \
  --iree-dispatch-creation-enable-aggressive-fusion=true \
  --iree-global-opt-propagate-transposes=true \
  --iree-opt-aggressively-propagate-transposes=true \
  --iree-opt-data-tiling=false \
  --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' \
  --iree-hal-indirect-command-buffers=true \
  --iree-stream-resource-memory-model=discrete \
  --iree-hal-memoization=true \
  --iree-opt-strip-assertions

What component(s) does this issue relate to?

Compiler

Version information

iree-base-compiler 3.3.0rc20250223

Additional context

No response

AmosLewis added the bug 🐞 Something isn't working label on Feb 24, 2025
IanWood1 (Contributor) commented:

It's a problem with a scatter dispatch (see https://gist.github.com/IanWood1/5ac5ed8c1a9ee2b37b9727b93d42b028) where the config is being set on the bit-extend linalg.generic op.
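As a rough illustration of the pattern being described (hypothetical names and dtypes, my reading of the gist rather than the actual model or dispatch IR): a bit-extending cast produces the bf16 values that feed the index_put/scatter, and both end up fused into the single scatter dispatch whose config lands on the extend op.

import torch

# Hypothetical producer chain for the scatter dispatch: a widening cast
# ("bit extend") feeding the index_put ("scatter"). The narrower source
# dtype (fp8 here) is an assumption based on the model being fp8.
def extend_and_scatter(cache_bf16, page_ids, values_f8):
    extended = values_f8.to(torch.bfloat16)                    # bit extend
    return torch.index_put(cache_bf16, (page_ids,), extended)  # scatter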
