kernel benchmark #16

Dujianhua1008 · 2025-03-04T12:54:40Z

Hello, guys,

Thank you for your work on FlatQuant. I am currently testing the kernel performance and have encountered some issues:

Kernel Testing Errors:

While qattention.py runs successfully, the other tests fail with errors such as:

 FlatQuant/third-party/cutlass/include/cutlass/integer_subbyte.h:96: cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(int) [with int Bits = 4; __nv_bool Signed = true]: block: [45,5,0], thread: [30,7,0] Assertion `value < upper_bound` failed.
Assertion failed: Error Internal
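For context, the assertion fires in the CUTLASS constructor for a signed 4-bit integer, which suggests that a value outside the representable range reached the kernel. Below is a minimal sketch for checking this on the host side before launching the kernel, assuming the signed 4-bit range is [-8, 7]. The helper names are hypothetical and are not part of FlatQuant or CUTLASS.

```python
# Signed 4-bit integers can represent values from -8 to 7 inclusive.
INT4_MIN = -(1 << 3)
INT4_MAX = (1 << 3) - 1


def out_of_range_int4(values):
    """Return (index, value) pairs that do not fit in a signed 4-bit integer."""
    return [(i, v) for i, v in enumerate(values) if not INT4_MIN <= v <= INT4_MAX]


def clamp_int4(values):
    """Clamp every value into the signed 4-bit range."""
    return [max(INT4_MIN, min(INT4_MAX, v)) for v in values]


# Hypothetical quantized values; 8 and 15 would not fit in signed int4.
sample = [-8, -3, 0, 7, 8, 15]
print(out_of_range_int4(sample))
print(clamp_int4(sample))
```

If the check reports any out-of-range entries, the quantization step that produced the inputs is a likely place to look, though the root cause may also lie elsewhere.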

Import Statement Clarification:
In the deploy module, there is an import statement: import deploy._CUDA.
I am unable to locate the corresponding module or understand its implementation.
Could you please provide guidance on resolving the kernel testing errors and clarify the purpose and implementation of the deploy._CUDA module?
Looking forward to your reply.
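One way to see whether deploy._CUDA resolves to anything in the current environment is to ask Python's import system where it would load it from. This is a rough sketch, under the assumption (not confirmed in this thread) that deploy._CUDA is a compiled extension produced when the package is built rather than a .py source file. The helper name locate_module is hypothetical.

```python
import importlib.util


def locate_module(name):
    """Return the file a module would be loaded from, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package is missing or fails to import.
        return None
    if spec is None:
        return None
    return spec.origin


# Prints a path (for a compiled extension, typically a .so or .pyd file) or None.
print(locate_module("deploy._CUDA"))
```

A result of None would indicate that the module has not been built or installed in this environment, which would explain why no source file for it can be found.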


han65487312 · 2025-03-07T02:57:59Z

It looks like the CUTLASS kernel is accessing illegal memory. Please try testing the kernel on an RTX 3090, following our environment setup guide. It would also help if you could share more details about your environment, such as the GPU model, CUDA version, and pip list output.
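The environment details requested above can be gathered with a short script. This is a minimal sketch, assuming PyTorch is the framework in use. The function name collect_env is hypothetical, and the script falls back gracefully if torch is not installed.

```python
import platform
import sys


def collect_env():
    """Collect basic environment details that are useful in a bug report."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    try:
        import torch
    except ImportError:
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["torch_cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


for key, value in collect_env().items():
    print(f"{key}: {value}")
```

Pasting this output together with the output of pip list should cover most of what is needed to reproduce the setup.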
