Training with own dataset gives poor results #107
I've also encountered this problem. I suggest you check whether the data are properly normalized, and try adding LayerNorm layers before calculating the attention scores.
May I ask how you normalized it? I normalized the vertex values to [0, 1] using min-max scaling. Did you end up getting good results?
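For reference, the min-max scaling described above would look roughly like this (a sketch only; `vertices` is a hypothetical array of mesh vertex coordinates, and the statistics should be computed on the training split):

```python
import numpy as np

# Min-max scaling of vertex coordinates to [0, 1].
# `vertices` is a hypothetical (num_frames, num_vertices * 3) array;
# compute the min/max on the training data and reuse them for
# validation and test data.
v_min = vertices.min()
v_max = vertices.max()
vertices_scaled = (vertices - v_min) / (v_max - v_min)
```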
Besides raw-data normalization, you could try adding nn.LayerNorm in the original FaceFormer network before the transformer decoder layer, or simply passing norm_first=True to faceformer.decoder_layer during initialization. I haven't tested this on FaceFormer, but I recently fixed a very similar problem with this technique.
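A minimal sketch of the norm_first suggestion, assuming the decoder is built with PyTorch's nn.TransformerDecoderLayer (the hyperparameter values below are illustrative, not FaceFormer's actual configuration):

```python
import torch.nn as nn

# Pre-norm variant: norm_first=True applies LayerNorm *before* the
# self-attention, cross-attention, and feed-forward blocks instead of
# after them, which often stabilizes transformer training.
# norm_first requires PyTorch >= 1.10.
decoder_layer = nn.TransformerDecoderLayer(
    d_model=64,           # illustrative feature dimension
    nhead=4,              # illustrative number of attention heads
    dim_feedforward=128,  # illustrative feed-forward width
    batch_first=True,
    norm_first=True,
)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=1)
```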
Also, try different normalization techniques when dealing with raw data, e.g. z-score normalization.
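A sketch of the z-score alternative, again assuming hypothetical `train_vertices` / `test_vertices` arrays; fitting the statistics on the training split only avoids leakage into validation and test data:

```python
import numpy as np

# Z-score normalization: zero mean, unit variance per coordinate.
mean = train_vertices.mean(axis=0)
std = train_vertices.std(axis=0) + 1e-8  # epsilon guards zero-variance dims
train_vertices_norm = (train_vertices - mean) / std
test_vertices_norm = (test_vertices - mean) / std  # reuse training stats
```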
Thank you very much for your suggestion. I tried adding LayerNorm before the transformer decoder layer in the original FaceFormer network, but found that it had no effect. I will try z-score normalization later.
I am currently using my own prepared dataset of 48 videos, each of a different person. Of these, 40 are for training, 3 for validation, and 5 for testing. But my current result is that the mouth opening on the generated face is very small; the mouth barely moves.