RuntimeError: CUDA error: an illegal memory access was encountered 

Recently, we have frequently been hitting this error when running the following notebook:
https://github.com/huggingface/huggingface-llama-recipes/blob/main/local_inference/fp8-405B.ipynb

We can easily trigger the illegal memory access with smaller quantized models, such as an 8B model (which has far fewer layers than the 405B), when using FBGEMM.