
[Bug] gemma-2b for Android. OpenCL Error Code=-54: CL_INVALID_WORK_GROUP_SIZE #1844

Closed
qc903113684 opened this issue Feb 27, 2024 · 7 comments
Labels
bug Confirmed bugs

Comments

@qc903113684
Contributor

🐛 Bug

Compiled Gemma-2b for Android in q4f16_0. The model loads successfully, but chat fails with: OpenCL Error Code=-54: CL_INVALID_WORK_GROUP_SIZE. Stack trace: File "/home/chaoqin/mlcllm/3rdparty/tvm/src/runtime/opencl/opencl_module.cc", line 90
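For context: OpenCL returns -54 (CL_INVALID_WORK_GROUP_SIZE) from clEnqueueNDRangeKernel when the requested local work size exceeds what the device or kernel supports, and TVM's (e == CL_SUCCESS) check at opencl_module.cc line 90 surfaces that as the error above. A minimal standalone C sketch, using a hypothetical kernel and a deliberately oversized launch (nothing to do with MLC's actual generated kernels), that reproduces the code:

```c
// Sketch: how an OpenCL runtime produces error -54. Enqueueing with a local
// work size above the device/kernel limit makes clEnqueueNDRangeKernel fail
// with CL_INVALID_WORK_GROUP_SIZE, which TVM's CL_SUCCESS check then raises.
#define CL_TARGET_OPENCL_VERSION 120
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id plat;
    cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    // Hypothetical no-op kernel, stands in for a generated model kernel.
    const char *src = "__kernel void noop(__global float *x) { x[0] = 0.0f; }";
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "noop", NULL);
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, sizeof(float), NULL, NULL);
    clSetKernelArg(k, 0, sizeof(cl_mem), &buf);

    // 4096 threads per work group exceeds the limit on virtually every GPU
    // (Adreno parts typically report 1024 or less), so this returns -54.
    size_t global = 4096, local = 4096;
    cl_int e = clEnqueueNDRangeKernel(q, k, 1, NULL, &global, &local,
                                      0, NULL, NULL);
    printf("clEnqueueNDRangeKernel returned %d\n", e);  // -54 = CL_INVALID_WORK_GROUP_SIZE
    return 0;
}
```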

To Reproduce

Steps to reproduce the behavior:

  1. Compile gemma-2b with q4f16_0, targeting Android
  2. Compile the Android JAR
  3. Build the app in Android Studio

Expected behavior

Environment

  • Platform (e.g. WebGPU/Vulkan/iOS/Android/CUDA): Android
  • Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu
  • Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): Android Qualcomm Snapdragon 865
  • How you installed MLC-LLM (conda, source): conda
  • How you installed TVM-Unity (pip, source): pip
  • Python version (e.g. 3.10): 3.10
  • GPU driver version (if applicable): 535.86.05
  • CUDA/cuDNN version (if applicable): CUDA 11.8
  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
    USE_NVTX: OFF
    USE_GTEST: AUTO
    SUMMARIZE: OFF
    USE_IOS_RPC: OFF
    USE_MSC: OFF
    USE_ETHOSU:
    CUDA_VERSION: NOT-FOUND
    USE_LIBBACKTRACE: AUTO
    DLPACK_PATH: 3rdparty/dlpack/include
    USE_TENSORRT_CODEGEN: OFF
    USE_THRUST: OFF
    USE_TARGET_ONNX: OFF
    USE_AOT_EXECUTOR: ON
    BUILD_DUMMY_LIBTVM: OFF
    USE_CUDNN: OFF
    USE_TENSORRT_RUNTIME: OFF
    USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
    USE_CCACHE: AUTO
    USE_ARM_COMPUTE_LIB: OFF
    USE_CPP_RTVM:
    USE_OPENCL_GTEST: /path/to/opencl/gtest
    USE_MKL: OFF
    USE_PT_TVMDSOOP: OFF
    MLIR_VERSION: NOT-FOUND
    USE_CLML: OFF
    USE_STACKVM_RUNTIME: OFF
    USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
    ROCM_PATH: /opt/rocm
    USE_DNNL: OFF
    USE_VITIS_AI: OFF
    USE_MLIR: OFF
    USE_RCCL: OFF
    USE_LLVM: llvm-config --ignore-libllvm --link-static
    USE_VERILATOR: OFF
    USE_TF_TVMDSOOP: OFF
    USE_THREADS: ON
    USE_MSVC_MT: OFF
    BACKTRACE_ON_SEGFAULT: OFF
    USE_GRAPH_EXECUTOR: ON
    USE_NCCL: OFF
    USE_ROCBLAS: OFF
    GIT_COMMIT_HASH: 79991133c17bb8685185e1f03cc2f688ea37c974
    USE_VULKAN: ON
    USE_RUST_EXT: OFF
    USE_CUTLASS: OFF
    USE_CPP_RPC: OFF
    USE_HEXAGON: OFF
    USE_CUSTOM_LOGGING: OFF
    USE_UMA: OFF
    USE_FALLBACK_STL_MAP: OFF
    USE_SORT: ON
    USE_RTTI: ON
    GIT_COMMIT_TIME: 2024-02-21 22:31:30 -0500
    USE_HEXAGON_SDK: /path/to/sdk
    USE_BLAS: none
    USE_ETHOSN: OFF
    USE_LIBTORCH: OFF
    USE_RANDOM: ON
    USE_CUDA: OFF
    USE_COREML: OFF
    USE_AMX: OFF
    BUILD_STATIC_RUNTIME: OFF
    USE_CMSISNN: OFF
    USE_KHRONOS_SPIRV: OFF
    USE_CLML_GRAPH_EXECUTOR: OFF
    USE_TFLITE: OFF
    USE_HEXAGON_GTEST: /path/to/hexagon/gtest
    PICOJSON_PATH: 3rdparty/picojson
    USE_OPENCL_ENABLE_HOST_PTR: OFF
    INSTALL_DEV: OFF
    USE_PROFILER: ON
    USE_NNPACK: OFF
    LLVM_VERSION: 15.0.7
    USE_MRVL: OFF
    USE_OPENCL: OFF
    COMPILER_RT_PATH: 3rdparty/compiler-rt
    RANG_PATH: 3rdparty/rang/include
    USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
    USE_OPENMP: OFF
    USE_BNNS: OFF
    USE_CUBLAS: OFF
    USE_METAL: OFF
    USE_MICRO_STANDALONE_RUNTIME: OFF
    USE_HEXAGON_EXTERNAL_LIBS: OFF
    USE_ALTERNATIVE_LINKER: AUTO
    USE_BYODT_POSIT: OFF
    USE_HEXAGON_RPC: OFF
    USE_MICRO: OFF
    DMLC_PATH: 3rdparty/dmlc-core/include
    INDEX_DEFAULT_I64: ON
    USE_RELAY_DEBUG: OFF
    USE_RPC: ON
    USE_TENSORFLOW_PATH: none
    TVM_CLML_VERSION:
    USE_MIOPEN: OFF
    USE_ROCM: OFF
    USE_PAPI: OFF
    USE_CURAND: OFF
    TVM_CXX_COMPILER_PATH: /opt/rh/gcc-toolset-11/root/usr/bin/c++
    HIDE_PRIVATE_SYMBOLS: ON
  • Any other relevant information:

Additional context

  1. Gemma-2b with the same code and environment works successfully on a Qualcomm 8 Gen 2, but chat fails on the Snapdragon 865.
  2. Qwen-1.8b compiled for the Snapdragon 865 works successfully.
    I think this error is related to Gemma's implementation.
qc903113684 added the bug (Confirmed bugs) label on Feb 27, 2024
@bulutthecat

[Quotes the original issue report above in full.]

I am having this issue as well, but with all of the 7B models.
It cannot possibly be a memory issue, as 12 GB should be more than enough RAM for any of these models (and it is not an allocation or out-of-range error), so I suspect it is some form of matrix-multiplication issue in the library being used (OpenCL), where a kernel is launched with more work items than its maximum allowed work-group size.
I haven't looked at opencl_module.cc yet, but my suspicion is that some dynamic allocation is happening that is messing with a function call.

I cannot think of why this would be happening, but I might pull the source and see what I can do about it. For now my recommendation would be to try different models and see if any of them work for you; I have found that all models other than the 7B ones work for me.
It might be different on your end.
Hopefully this gets patched.
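One way to check this theory on-device is to compare the limits the driver actually reports against the launch configuration; Adreno GPUs report different work-group limits across generations, which would be consistent with the same library working on an 8 Gen 2 but failing on a Snapdragon 865. A hedged C sketch using standard OpenCL device queries (not part of MLC's code):

```c
// Sketch: query the work-group limits the OpenCL driver reports. If a
// compiled model library enqueues a kernel with a local work size above
// these limits, the enqueue fails with -54 exactly as in this issue.
#define CL_TARGET_OPENCL_VERSION 120
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id plat;
    cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);

    size_t max_wg = 0, item_sizes[3] = {0};
    clGetDeviceInfo(dev, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                    sizeof(max_wg), &max_wg, NULL);
    clGetDeviceInfo(dev, CL_DEVICE_MAX_WORK_ITEM_SIZES,
                    sizeof(item_sizes), item_sizes, NULL);

    printf("CL_DEVICE_MAX_WORK_GROUP_SIZE: %zu\n", max_wg);
    printf("CL_DEVICE_MAX_WORK_ITEM_SIZES: %zu x %zu x %zu\n",
           item_sizes[0], item_sizes[1], item_sizes[2]);
    // For a specific kernel, CL_KERNEL_WORK_GROUP_SIZE (queried via
    // clGetKernelWorkGroupInfo) can be lower still, so a launch that
    // respects the device-wide limit can still fail with -54.
    return 0;
}
```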

@CharlieFRuan
Contributor

Hi @bulutthecat @qc903113684 apologies for the inconvenience. Could you check whether #1955 was included when you ran into this issue? Or perhaps try again with the latest package? I suspect that this is fixed via #1955. Thank you!

@Kartik14
Contributor

@qc903113684 Unfortunately, I am unable to reproduce it on my end. Can you please build tvm and mlc again after fetching the latest changes and then recompile the model library?

@bulutthecat

Hi @bulutthecat @qc903113684 apologies for the inconvenience. Could you check whether #1955 was included when you ran into this issue? Or perhaps try again with the latest package? I suspect that this is fixed via #1955. Thank you!

Thanks for letting me know, I will get back to you if it works.

@qc903113684
Contributor Author

This PR may fix the problem; I haven't had time to test it yet: #1850

@CharlieFRuan
Contributor

Hi @qc903113684, #1850 is superseded by #1822, which was merged 3 weeks ago.

i.e., #1822 and #1955 could both be potential fixes for the problem described in this issue.

@sinaSPOGames

Got the same error:

MLCChat failed

Stack trace:
org.apache.tvm.Base$TVMError: InternalError: Check failed: (e == CL_SUCCESS) is false: OpenCL Error, code=-54: CL_INVALID_WORK_GROUP_SIZE
Stack trace:
File "/Users/kartik/mlc/tvm/src/runtime/opencl/opencl_module.cc", line 90

at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at ai.mlc.mlcllm.ChatModule.prefill(ChatModule.java:54)
at ai.mlc.mlcchat.AppViewModel$ChatState$requestGenerate$1$1.invoke(AppViewModel.kt:666)
at ai.mlc.mlcchat.AppViewModel$ChatState$requestGenerate$1$1.invoke(AppViewModel.kt:666)
at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:548)
at ai.mlc.mlcchat.AppViewModel$ChatState.requestGenerate$lambda$4(AppViewModel.kt:666)
at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$lluIrcsPALEW5nCb2tohZYadhTY(Unknown Source:0)
at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda3.run(Unknown Source:6)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:462)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.lang.Thread.run(Thread.java:919)

Error message:
InternalError: Check failed: (e == CL_SUCCESS) is false: OpenCL Error, code=-54: CL_INVALID_WORK_GROUP_SIZE
Stack trace:
File "/Users/kartik/mlc/tvm/src/runtime/opencl/opencl_module.cc", line 90

tqchen closed this as completed on Aug 3, 2024