According to [a comment on this mlx-lm PR](https://github.com/ml-explore/mlx-lm/pull/990#issuecomment-4410052832), passing `--dtype float16` lets M1/M2-class processors take proper advantage of MTP. oQ should be able to convert MTP with FP16 for M1/M2 as well.
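
As a rough illustration of the request, the dtype choice could be keyed off the chip name. This is only a sketch: `pick_mtp_dtype` is a hypothetical helper, not an existing oQ or mlx-lm function, and the `bfloat16` fallback for other chips is an assumption rather than something stated in the linked comment.

```python
import re


def pick_mtp_dtype(chip_name: str) -> str:
    """Return the dtype string to use when converting MTP weights.

    Hypothetical helper: M1/M2-class chips get float16, as suggested in
    the linked mlx-lm comment. Everything else falls back to bfloat16,
    which is an assumed default for illustration only.
    """
    # Matches "M1", "M2", "M1 Pro", "M2 Ultra", but not "M3" or "M10".
    if re.search(r"\bM[12]\b", chip_name):
        return "float16"
    return "bfloat16"


if __name__ == "__main__":
    for chip in ("Apple M1 Pro", "Apple M2 Ultra", "Apple M3 Max"):
        print(chip, pick_mtp_dtype(chip))
```

The returned string would then be passed through to whatever dtype option the conversion step exposes, in the same way `--dtype float16` is passed in the mlx-lm case.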