I am a beginner trying to reproduce the code. During the pre-training phase, the log shows "Abnormal instance: 0" and "(AAFN loss) second value is nan", and as a result my final results do not match those reported in the paper. Where might the problem lie? I would greatly appreciate any help!