We have recently been hitting this issue often when following the fp8-405B notebook (https://github.com/huggingface/huggingface-llama-recipes/blob/main/local_inference/fp8-405B.ipynb). It is easy to trigger an illegal memory access for models such as the 8B model (shallow layers) quantized with FBGEMM.