kernel benchmark #16

Open
Dujianhua1008 opened this issue Mar 4, 2025 · 1 comment

Comments

Dujianhua1008 commented Mar 4, 2025

Hello, guys,

Thank you for your work on FlatQuant. I am currently testing the kernel performance and have encountered some issues:

Kernel Testing Errors:

While qattention.py runs successfully, the other tests fail with errors such as:

 FlatQuant/third-party/cutlass/include/cutlass/integer_subbyte.h:96: cutlass::integer_subbyte<Bits, Signed>::integer_subbyte(int) [with int Bits = 4; __nv_bool Signed = true]: block: [45,5,0], thread: [30,7,0] Assertion `value < upper_bound` failed.
Assertion failed: Error Internal
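For context, the failed assertion is raised while converting an int to CUTLASS's signed 4-bit type (`Bits = 4`, `Signed = true`). A signed 4-bit integer can only hold values in [-8, 7], so `value < upper_bound` failing suggests that a value of 8 or more reached the int4 conversion (this reading assumes `upper_bound` is the exclusive bound 2^(Bits-1) = 8). Below is a minimal host-side sketch in plain Python for checking quantized values before they are packed and for clamping them during quantization. The helper names (`signed_range`, `find_out_of_range`, `quantize_int4`) are hypothetical; this is not FlatQuant's quantizer.

```python
def signed_range(bits: int) -> tuple[int, int]:
    """Inclusive (min, max) of a two's-complement integer with `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def find_out_of_range(values, bits: int = 4):
    """Return (index, value) pairs that would not fit in a signed `bits`-bit integer."""
    lo, hi = signed_range(bits)
    return [(i, v) for i, v in enumerate(values) if v < lo or v > hi]


def quantize_int4(xs, scale: float):
    """Round x / scale and clamp, so every output fits in the int4 range [-8, 7]."""
    lo, hi = signed_range(4)
    return [max(lo, min(hi, round(x / scale))) for x in xs]


# Hypothetical usage: 8 and -9 are exactly the kind of values that trip the assertion.
print(find_out_of_range([0, 7, 8, -8, -9]))          # [(2, 8), (4, -9)]
print(quantize_int4([10.0, -10.0, 0.0], scale=1.0))  # [7, -8, 0]
```

Clamping guarantees the representable range, but if values outside [-8, 7] appear at all, it may be worth checking how the scale was computed upstream.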

Import Statement Clarification:
In the deploy module there is an import statement, `import deploy._CUDA`, but I cannot locate the corresponding module or understand how it is implemented.
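One possible explanation, which is an assumption and not something confirmed in this thread: modules with a leading underscore such as `deploy._CUDA` are often compiled C++/CUDA extensions produced by the package's build step, so there may be no `.py` file for them in the source tree. The sketch below uses only the standard library (`importlib.util.find_spec` and `importlib.machinery.EXTENSION_SUFFIXES`) to report where `import deploy._CUDA` would load from in the current environment and whether that file is a native extension. The helper names `locate_module` and `is_compiled_extension` are hypothetical.

```python
import importlib.machinery
import importlib.util


def locate_module(name: str):
    """Return the file `import name` would load, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package (here `deploy`) is itself not importable.
        return None
    return spec.origin if spec is not None else None


def is_compiled_extension(path) -> bool:
    """True if `path` ends with one of this platform's native-extension suffixes."""
    return path is not None and path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


if __name__ == "__main__":
    origin = locate_module("deploy._CUDA")
    print("deploy._CUDA ->", origin)
    print("compiled extension:", is_compiled_extension(origin))
```

If this prints `None`, the extension has most likely not been built or installed in the active environment; other `ImportError`s (for example, a missing shared-library symbol) are deliberately left uncaught so they surface.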
Could you please provide guidance on resolving the kernel testing errors and clarify the purpose and implementation of the deploy._CUDA module?
Looking forward to your reply.

@han65487312

It seems that the CUTLASS kernel is accessing illegal memory. You could try testing the kernel on an RTX 3090, following our environment setup guide. You can also share more details about your environment, such as the GPU model, pip list, and CUDA version.
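As a sketch of how the requested details could be gathered in one go, the snippet below records the Python version, platform, PyTorch version and the CUDA version it was built with, the first visible GPU and its compute capability, and the `pip list` output. The helper name `collect_env` is hypothetical, and PyTorch is treated as optional rather than assumed to be installed. PyTorch also ships its own `python -m torch.utils.collect_env` report, which covers similar ground.

```python
import platform
import subprocess
import sys


def collect_env() -> dict:
    """Gather environment details that are useful when reporting kernel failures."""
    info = {"python": sys.version.split()[0], "platform": platform.platform()}

    try:
        import torch  # optional: only reported if installed
        info["torch"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda  # None for CPU-only builds
        if torch.cuda.is_available():
            info["gpu"] = torch.cuda.get_device_name(0)
            info["compute_capability"] = torch.cuda.get_device_capability(0)
    except ImportError:
        info["torch"] = None

    try:
        result = subprocess.run(
            [sys.executable, "-m", "pip", "list", "--format=freeze"],
            capture_output=True, text=True, check=True,
        )
        info["pip_list"] = result.stdout.splitlines()
    except (OSError, subprocess.CalledProcessError):
        info["pip_list"] = None  # pip not available for this interpreter

    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

For comparison, an RTX 3090 reports compute capability `(8, 6)`, so a different value here is a quick indication that the kernels are running on different hardware than the suggested setup.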
