I quantized a model using the torchtune package. The test log shows:
INFO:torchtune.utils._logging:Time for inference: 66.56 sec total, 4.51 tokens/sec
4.51 tokens/sec is even lower than the throughput of the unquantized model.
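For context, here is a minimal sketch of the throughput arithmetic behind that log line; the token count is an assumption on my part (inferred from rate × time, not read from the log):

```python
# Reproduce the reported tokens/sec from the logged total time.
# num_tokens is hypothetical: ~300 is what 4.51 tok/s * 66.56 s implies.
num_tokens = 300   # assumed number of generated tokens (not from the log)
elapsed_s = 66.56  # "Time for inference" from the torchtune log

tokens_per_sec = num_tokens / elapsed_s
print(f"{tokens_per_sec:.2f} tokens/sec")  # ~4.51, matching the log
```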
Is this normal for QAT? Thank you very much if someone is willing to answer.