Despite the small model size, inference is slow. I am getting an RTF of 2 even on an A100 GPU. Is something wrong with my setup?

<img width="1573" height="282" alt="Image" src="https://github.com/user-attachments/assets/64bb6840-747f-4965-95b5-b6083d935ba0" />