GEMM example fails verification on NPU (output is all zeros) #112

@hungryDodo

Description

When running the example_gemm/gemm_bf16.py script on a Ryzen AI 9 HX 370 processor, the project compiles successfully, but the final verification step fails with 916200 errors. In every mismatch shown in the execution log, the NPU output buffer (bufC) holds zero while the expected output (srcVec2) is non-zero. This suggests that the NPU kernel either did not execute correctly or never wrote its results to the output buffer.

I found a potentially related issue, #70, which also reports verification errors. However, there seems to be a key difference. In issue #70, the errors are minor floating-point discrepancies (e.g., 472.356537!=472.356232). In my case, the output is entirely zero (e.g., 8.000000!=0.000000), which points to a more fundamental problem than a precision issue.
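To make the distinction between the two failure modes concrete, here is a minimal sketch of a check that separates an all-zero output from small floating-point drift. It is not part of the Aries host code; the function name `classify_mismatch` and the `loose_rel_tol` threshold are assumptions chosen for illustration.

```python
import math

def classify_mismatch(expected, actual, loose_rel_tol=1e-3):
    """Classify a verification result (hypothetical helper, not from Aries).

    Returns one of: "pass", "all_zero_output", "precision", "other".
    """
    if len(expected) != len(actual):
        raise ValueError("buffers differ in length")

    # Collect every element pair that is not bit-for-bit equal.
    mismatches = [(e, a) for e, a in zip(expected, actual) if e != a]
    if not mismatches:
        return "pass"

    # An output buffer that is zero everywhere suggests the kernel
    # never wrote its results, as in this report.
    if all(a == 0.0 for a in actual):
        return "all_zero_output"

    # Every mismatch within a loose relative tolerance looks like
    # rounding drift, as in issue #70.
    if all(math.isclose(e, a, rel_tol=loose_rel_tol) for e, a in mismatches):
        return "precision"

    return "other"
```

With values taken from the two reports, `classify_mismatch([8.0, 11.0, 0.0, -5.0], [0.0, 0.0, 0.0, 0.0])` falls into the all-zero category, while `classify_mismatch([472.356537], [472.356232])` falls into the precision category.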

Any help in diagnosing this issue would be greatly appreciated. Thank you!

My Environment

|Item |Details |
|----------------|-----------|
|CPU |AMD Ryzen AI 9 HX 370 |
|Operating System |Ubuntu 24.04.2 LTS |
|Aries Project Version |c54706b |
|Toolchain Info |mlir_aie and llvm-aie were installed using the Aries/utils/quick_setup.sh script. |
|Linux Kernel |6.14.0-28-generic |
|Vitis Version |2024.2 |
|XRT Version |2.20.0 |
|NPU Firmware Version |255.0.2.7 |

Full xbutil examine output:

System Configuration
 OS Name : Linux
 Release : 6.14.0-28-generic
 Machine : x86_64
 CPU Cores : 24
 Memory : 23640 MB
 Distribution : Ubuntu 24.04.2 LTS
 GLIBC : 2.39
 Model : AI Series
 BIOS Vendor : American Megatrends International, LLC.
 BIOS Version : 1.04

XRT
 Version : 2.20.0
 Branch : HEAD
 Hash : a62adc1020c901af79529457c46f210aa05f15a3
 Hash Date : 2025-08-22 19:59:38
 amdxdna : 2.20.0_20250822, e9d2788a884784e3531e95d65b923c2252a1132e
 virtio-pci : unknown, unknown
 NPU Firmware Version : 255.0.2.7

Device(s) Present
|BDF |Name |
|----------------|-----------|
|[0000:c5:00.1] |NPU Strix |

Full Log

Here is the complete log from the execution of the script with make run.

mkdir -p build
/home/ai/Aries/my_install/llvm-aie/bin/clang++ -O2 -v -std=c++20 --target=aie2-none-unknown-elf -Wno-parentheses -Wno-attributes -Wno-macro-redefined -DNDEBUG -I /home/ai/Aries/example_new/example_NPU/example_gemm/../../../templates/aie2/origin/common -I /home/ai/Aries/my_install/mlir_aie/include -c aie/kernel_gemm.cc -o build/kernel_gemm.o
mkdir -p .
cd build && /home/ai/Aries/my_install/mlir_aie/bin/aiecc.py \
		--alloc-scheme=basic-sequential \
		--aie-generate-cdo \
		--no-compile-host \
		--xclbin-name=gemm.xclbin \
		--no-xchesscc \
		--no-xbridge \
		--peano /home/ai/Aries/my_install/llvm-aie \
		--aie-generate-npu --npu-insts-name=insts.txt ../gemm.adf.mlir


****** Bootgen v2024.2
  **** Build date : Nov  8 2024-16:21:57
    ** Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
    ** Copyright 2022-2024 Advanced Micro Devices, Inc. All Rights Reserved.


[INFO]   : Bootimage generated successfully

Info: Embedded Metadata section is missing project.platform.device.core element, adding it.
Found xchesscc at /tools/Xilinx/Vitis/2024.2/aietools
 AIE Compilation: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00 17/17 4 Workers
Generating: /home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/build/gemm.adf.mlir.prj/aie_cdo_elfs.bin
Generating: /home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/build/gemm.adf.mlir.prj/aie_cdo_init.bin
Generating: /home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/build/gemm.adf.mlir.prj/aie_cdo_enable.bin
rm -rf _build
mkdir -p _build
cd _build &&  cmake ../ -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DMLIR_AIE_DIR=/home/ai/Aries/my_install/mlir_aie/../../externals/mlir-aie/ -D CMAKE_C_COMPILER=gcc-13 -D CMAKE_CXX_COMPILER=g++-13 -DTARGET_NAME=hostexe -Dsubdir=host \
					&&  cmake --build . --config Release
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc-13 - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++-13 - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.83.0/BoostConfig.cmake (found version "1.83.0")
-- Configuring done (0.3s)
-- Generating done (0.0s)
-- Build files have been written to: /home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build
gmake[1]: Entering directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
gmake[2]: Entering directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
gmake[3]: Entering directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
gmake[3]: Leaving directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
gmake[3]: Entering directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
[ 33%] Building CXX object CMakeFiles/hostexe.dir/home/ai/Aries/externals/mlir-aie/runtime_lib/test_lib/test_utils.cpp.o
[ 66%] Building CXX object CMakeFiles/hostexe.dir/host/host.cpp.o
[100%] Linking CXX executable hostexe
gmake[3]: Leaving directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
[100%] Built target hostexe
gmake[2]: Leaving directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
gmake[1]: Leaving directory '/home/ai/Aries/example_new/example_NPU/example_gemm/my_project_bf16/project/_build'
cp -r _build/hostexe ./
./hostexe -x build/gemm.xclbin -i build/insts.txt -k MLIR_AIE -v 2 --verify 1 --warmup 10 --iters 20
Sequence instr count: 1092
Loading xclbin: build/gemm.xclbin
Kernel opcode: MLIR_AIE
Name: MLIR_AIE
Registering xclbin: build/gemm.xclbin
Getting hardware context.
Getting handle to kernel:MLIR_AIE
Warmup Kernel.
Running Kernel.
NPU execution time: 4.096s
Error found srcVec2[0]!=bufC[0], 8.000000!=0.000000 
Error found srcVec2[1]!=bufC[1], 11.000000!=0.000000 
Error found srcVec2[3]!=bufC[3], -5.000000!=0.000000 
Error found srcVec2[4]!=bufC[4], 1.000000!=0.000000 
Error found srcVec2[5]!=bufC[5], -2.000000!=0.000000 
Error found srcVec2[6]!=bufC[6], 3.000000!=0.000000 
Error found srcVec2[7]!=bufC[7], 2.000000!=0.000000 
Error found srcVec2[8]!=bufC[8], -6.000000!=0.000000 
Error found srcVec2[9]!=bufC[9], 1.000000!=0.000000 
Error found srcVec2[10]!=bufC[10], 2.000000!=0.000000 
...
...
Error found srcVec2[1048565]!=bufC[1048565], 4.000000!=0.000000 
Error found srcVec2[1048566]!=bufC[1048566], -1.000000!=0.000000 
Error found srcVec2[1048567]!=bufC[1048567], 5.000000!=0.000000 
Error found srcVec2[1048568]!=bufC[1048568], -4.000000!=0.000000 
Error found srcVec2[1048570]!=bufC[1048570], -3.000000!=0.000000 
Error found srcVec2[1048571]!=bufC[1048571], -2.000000!=0.000000 
Error found srcVec2[1048572]!=bufC[1048572], -2.000000!=0.000000 
Error found srcVec2[1048573]!=bufC[1048573], -1.000000!=0.000000 
Error found srcVec2[1048574]!=bufC[1048574], -1.000000!=0.000000 
Error found srcVec2[1048575]!=bufC[1048575], 4.000000!=0.000000 
TEST failed with 916200 errors
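
The reported error count (916200) is lower than the highest index plus one (1048576), and the log skips some indices such as 2 and 1048569. That would be consistent with the expected value being zero at those positions, so a zeroed bufC matches there. As a rough way to confirm that no reported mismatch carries a non-zero NPU value, the log can be summarized with a short script. This is only a sketch: the function name `summarize_errors` is hypothetical, and it assumes the output of make run has been saved to a text file.

```python
import re

# Matches lines of the form:
#   Error found srcVec2[0]!=bufC[0], 8.000000!=0.000000
ERROR_LINE = re.compile(
    r"Error found srcVec2\[(\d+)\]!=bufC\[\1\], (-?\d+\.\d+)!=(-?\d+\.\d+)"
)

def summarize_errors(log_text):
    """Count reported mismatches and how many have a non-zero bufC value."""
    reported = 0
    nonzero_actual = 0
    for match in ERROR_LINE.finditer(log_text):
        reported += 1
        # group(3) is the value read back from the NPU output buffer.
        if float(match.group(3)) != 0.0:
            nonzero_actual += 1
    return {"reported": reported, "nonzero_actual": nonzero_actual}
```

Passing the saved log text to `summarize_errors` and getting `nonzero_actual == 0` would support the reading that the output buffer was never written, rather than written with wrong values.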
