No halting in evaluation

https://github.com/SamsungSAILMontreal/TinyRecursiveModels/blob/7de0d20c8f26df706e2c7b3a21ceaf0b3542c953/models/recursive_reasoning/trm.py#L275

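In the training loop, the TRM model uses `q_head` to introduce halting, so it can stop before reaching $N_{sup}=16$ supervision steps. In evaluation, however, there is no halting, so the model keeps running past the point where it would have stopped in training. Evaluation should halt at the same converged step as training; otherwise the model will overthink.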
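
To make the request concrete, here is a minimal sketch of what I mean by halting in evaluation. It is not the repository's code: `evaluate_with_halting`, `step_fn`, `use_halting`, and `toy_step` are hypothetical names, and the rule "halt when the halting logit is positive" is an assumption about how the `q_head` output would be thresholded.

```python
from typing import Callable, Tuple


def evaluate_with_halting(
    step_fn: Callable[[float], Tuple[float, float]],
    carry: float,
    n_sup: int = 16,
    use_halting: bool = True,
) -> Tuple[float, int]:
    """Run up to n_sup supervision steps and return (carry, steps_used).

    step_fn is a hypothetical stand-in for one supervision step of the model.
    It returns the updated carry and a halting logit (the role q_head plays
    in training). Assumption: a positive logit means "halt now".
    """
    steps_used = 0
    for _ in range(n_sup):
        carry, q_halt_logit = step_fn(carry)
        steps_used += 1
        # Stop as soon as the halting signal fires, as in training.
        if use_halting and q_halt_logit > 0.0:
            break
    return carry, steps_used


def toy_step(carry: float) -> Tuple[float, float]:
    """Toy step: the halting logit turns positive once carry reaches 3."""
    carry = carry + 1.0
    return carry, carry - 2.5


if __name__ == "__main__":
    # With halting, evaluation stops at the step where the signal fires.
    print(evaluate_with_halting(toy_step, 0.0, use_halting=True))
    # Without halting, evaluation always runs all n_sup steps.
    print(evaluate_with_halting(toy_step, 0.0, use_halting=False))
```

With `use_halting=False` the loop behaves like the current evaluation as I understand it, always spending all `n_sup` steps; with `use_halting=True` it stops at the step the halting signal picks.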