Hello! I attempted to reproduce the code with the qwen2.5-vl-3b-instruct model, changing the hidden-layer dimension to 2048; the loss curve is shown below. Evaluation was run with the lora_stage234_merged checkpoint, but the outputs do not contain any of the expected special tokens such as think, answer, or sam_pad, as shown in the figure. My modifications were: running the computation on an Ascend NPU, and disabling flash_attn and the GPU-related parameters. However, this did not resolve the issue of the answer output not following the preset format. Could you please advise on the possible causes?