How do you get the same performance on a previous task?

Hi, thanks for the great work. I am trying to use your code in my own project, but I find it hard to obtain the same accuracy on a previously trained task after training on new ones. I suspect the main reason is that the masked parameters do not include the batch norm layers, so their weights, biases and running statistics are still shared across tasks and get overwritten by later training. Do you have any idea about this behaviour?
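One way to test this hypothesis is to snapshot every BatchNorm layer's affine parameters and running statistics right after finishing a task, then restore them (and call `model.eval()`) before evaluating that task again. The sketch below assumes a plain PyTorch model. `save_bn_state` and `load_bn_state` are hypothetical helpers I wrote for illustration, not part of this repository, and the random "task A" / "task B" inputs are made up just to show the running statistics drifting.

```python
import copy

import torch
import torch.nn as nn

BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)


def save_bn_state(model: nn.Module) -> dict:
    """Deep-copy weight, bias, running_mean, running_var of every BN layer (hypothetical helper)."""
    return {
        name: copy.deepcopy(module.state_dict())
        for name, module in model.named_modules()
        if isinstance(module, BN_TYPES)
    }


def load_bn_state(model: nn.Module, state: dict) -> None:
    """Copy a snapshot from save_bn_state back into the matching BN layers (hypothetical helper)."""
    modules = dict(model.named_modules())
    for name, bn_state in state.items():
        modules[name].load_state_dict(bn_state)


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU(), nn.Linear(8, 2))

# Made-up "task A": in train mode, every forward pass updates the BN running stats.
model.train()
for _ in range(10):
    model(torch.randn(16, 4))
task_a_bn = save_bn_state(model)

# Made-up "task B" with shifted inputs: the shared BN stats drift away from task A's.
for _ in range(10):
    model(torch.randn(16, 4) + 3.0)
drifted = not torch.allclose(model[1].running_mean, task_a_bn["1"]["running_mean"])

# Restore task A's BN state and switch to eval mode so the stored stats are used.
load_bn_state(model, task_a_bn)
model.eval()
restored = torch.allclose(model[1].running_mean, task_a_bn["1"]["running_mean"])
print(drifted, restored)
```

If accuracy on the old task recovers after restoring its BN state, that would suggest the unmasked BN layers are the culprit; keeping one BN snapshot per task is cheap compared with the rest of the network.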