Training with own dataset gives poor results #107
I've also encountered this problem. I suggest you check whether the data are properly normalized, and try adding LayerNorm layers before calculating the attention scores.
May I ask how you normalized it? I normalized the vertex values to [0, 1] using min-max scaling. Did you end up getting good results?
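For reference, the min-max scaling described above would look roughly like this (a sketch only; `vertices` is a hypothetical array of mesh vertex coordinates, and the statistics should be computed on the training split):

```python
import numpy as np

# Min-max scaling of vertex coordinates to [0, 1].
# `vertices` is a hypothetical (num_frames, num_vertices * 3) array;
# compute the min/max on the training data and reuse them for
# validation and test data.
v_min = vertices.min()
v_max = vertices.max()
vertices_scaled = (vertices - v_min) / (v_max - v_min)
```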
Besides raw-data normalization, you could try adding nn.LayerNorm in the original FaceFormer network before the transformer decoder layer, or simply passing norm_first=True to faceformer.decoder_layer during initialization. I haven't tested this on FaceFormer, but I recently fixed a very similar problem with this technique.
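A minimal sketch of the norm_first suggestion, assuming the decoder is built with PyTorch's nn.TransformerDecoderLayer (the hyperparameter values below are illustrative, not FaceFormer's actual configuration):

```python
import torch.nn as nn

# Pre-norm variant: norm_first=True applies LayerNorm *before* the
# self-attention, cross-attention, and feed-forward blocks instead of
# after them, which often stabilizes transformer training.
# norm_first requires PyTorch >= 1.10.
decoder_layer = nn.TransformerDecoderLayer(
    d_model=64,           # illustrative feature dimension
    nhead=4,              # illustrative number of attention heads
    dim_feedforward=128,  # illustrative feed-forward width
    batch_first=True,
    norm_first=True,
)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=1)
```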
Also, try different normalization techniques when dealing with raw data, e.g. z-score normalization.
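A sketch of the z-score alternative, again assuming hypothetical `train_vertices` / `test_vertices` arrays; fitting the statistics on the training split only avoids leakage into validation and test data:

```python
import numpy as np

# Z-score normalization: zero mean, unit variance per coordinate.
mean = train_vertices.mean(axis=0)
std = train_vertices.std(axis=0) + 1e-8  # epsilon guards zero-variance dims
train_vertices_norm = (train_vertices - mean) / std
test_vertices_norm = (test_vertices - mean) / std  # reuse training stats
```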
Thank you very much for your suggestion. I tried adding LayerNorm before the transformer decoder layer in the original FaceFormer network, but found that it had no effect. I will try z-score normalization later.
I am currently using my own prepared dataset of 48 videos, each of a different person. Of these, 40 are for training, 3 for validation, and 5 for testing. But my current result is that the mouth opening on the generated face is very small; the mouth barely moves.