rotate face to neutral pose first #3
Comments
Actually, a simpler solution than rotating the landmarks would be to project the points onto a plane defined by some local axes.
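A minimal sketch of that projection, assuming the landmarks arrive as a (468, 3) NumPy array; the reference indices used to define the local frame are only illustrative:

```python
import numpy as np

def project_to_local_plane(landmarks, origin_idx, x_idx, y_idx):
    """Project 3D landmarks onto a plane spanned by two local axes.

    landmarks: (N, 3) array of 3D points.
    origin_idx, x_idx, y_idx: indices of landmarks defining the local frame
    (illustrative choices; pick stable points such as nose/eye corners).
    """
    origin = landmarks[origin_idx]
    x_axis = landmarks[x_idx] - origin
    x_axis /= np.linalg.norm(x_axis)
    # Build a second in-plane axis orthogonal to x_axis.
    y_ref = landmarks[y_idx] - origin
    y_axis = y_ref - np.dot(y_ref, x_axis) * x_axis
    y_axis /= np.linalg.norm(y_axis)
    # 2D coordinates of every landmark in the local plane.
    rel = landmarks - origin
    return np.stack([rel @ x_axis, rel @ y_axis], axis=-1)
```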
I actually got those points in a better way, but haven't had the time to implement it properly yet.
Currently, the code uses metric landmarks or normalized landmarks (image pixel space) to calculate blendshape values. I tried both methods and the results look awful. However, both approaches ignore face identity, and different people have very different faces. I even tried a rigid transformation to map my metric landmarks onto the canonical face provided by MediaPipe, but even neutral faces look different in the transformed (canonical) space. Do you have any suggestions? I am also working on a data-driven blendshape solver (deep learning, by collecting enough MetaHuman faces and their blendshape values).
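For reference, the rigid transformation mentioned above can be done with a standard Kabsch alignment; this is only a sketch, assuming corresponding (N, 3) arrays for the detected metric landmarks and the canonical face mesh:

```python
import numpy as np

def rigid_align(source, target):
    """Least-squares rigid alignment (Kabsch) of source onto target.

    source, target: (N, 3) arrays of corresponding landmarks, e.g. the
    detected metric landmarks and MediaPipe's canonical face mesh.
    Returns rotation R and translation t so that source @ R.T + t ~ target.
    """
    src_c = source - source.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    H = src_c.T @ tgt_c
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = target.mean(axis=0) - source.mean(axis=0) @ R.T
    return R, t
```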
A deep-learning-based approach seems OK, but it requires a lot of paired data for training.
Yes. It needs massive paired data, such as hundreds of faces. Luckily, MetaHuman is realistic enough to substitute for collecting real human faces. I am working on a MetaHuman project that receives blendshape values and saves the results as images.
I am no longer at NetEase, but image-blendshape pairs from MetaHuman can be acquired easily if you are familiar with UE.
Some updates. Method: training a neural network mapping synthesized MetaHuman faces to 52 blendshape values. Result: the network converges well on the synthesized dataset, and testing on synthesized data works well. However, it does not generalize to real human faces.
@qhanson I suggest training the model to predict blendshape values directly from MediaPipe landmarks. To generate the ground-truth blendshape values for the dataset, you'll have to use something like LiveLinkFace, mentioned in the README. This MediaPipe -> blendshape model is the missing piece for replacing LiveLinkFace.
In my experiment, directly learning the mapping (468*3 -> 52) with a 4-layer MLP does not work well. With L1 loss, the output stays the same; with L2 loss, the mouth can open and close, but the eyes stay open all the time. This reminds me of the mesh classification problem: passing the rendered mesh or the point cloud of 468 landmarks may work, although that way we cannot exploit the pretrained weights of MediaPipe. I also do not know the minimum number of paired image-to-blendshape samples needed. Note: I have not tested this approach.
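For context, a sketch of the 4-layer MLP described above, in PyTorch; the hidden width and the sigmoid output (to keep blendshape weights in [0, 1]) are assumptions:

```python
import torch
import torch.nn as nn

class Landmark2Blendshape(nn.Module):
    """4-layer MLP mapping 468 3D landmarks to 52 blendshape values."""

    def __init__(self, n_landmarks=468, n_blendshapes=52, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_landmarks * 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_blendshapes),
            nn.Sigmoid(),  # blendshape weights live in [0, 1]
        )

    def forward(self, landmarks):
        # landmarks: (batch, 468, 3), flattened before the MLP
        return self.net(landmarks.flatten(start_dim=1))

# model = Landmark2Blendshape()
# pred = model(torch.randn(8, 468, 3))   # -> (8, 52)
```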
I would try a smaller input: you don't need all 468 keypoints. I would start with the ones I'm using in my config file and slowly add more (by looking at the ones that really matter for facial expressions). That way you will need far less training data (and training time).
Can you share your data? I want to use it to train a mediapipe2blendshape network; if it works well, I will share the network with you.
For simple experiments, you do not need these datasets to train a model. You can try https://github.com/yeemachine/kalidokit
What loss function did you use to train this network? There is another morphable head model named FLAME, which offers a tool to generate a 3D mesh from its 100 expression parameters (something like blendshapes). With such a tool we could build loss functions by mapping the parameters back to the mesh/image space (3D -> 2D) and comparing the landmarks of the face. But it seems ARKit lacks this kind of tool for the mapping. If you use a plain L1 loss or similar, it only measures similarity of the numbers, not similarity of the actual expressions. I guess that's why your model is not generalizing well.
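As a sketch of that idea without FLAME or ARKit tooling: if a differentiable decoder from blendshape weights to landmark positions is available (a plain linear blendshape basis is assumed here purely for illustration), the loss can be computed in landmark space instead of parameter space:

```python
import torch

def landmark_space_loss(pred_bs, target_bs, neutral, basis):
    """Compare expressions in landmark space rather than parameter space.

    pred_bs, target_bs: (batch, 52) blendshape weights.
    neutral: (N, 3) neutral-pose vertices.
    basis: (52, N, 3) per-blendshape vertex offsets (linear model assumed).
    """
    pred_mesh = neutral + torch.einsum('bk,knd->bnd', pred_bs, basis)
    target_mesh = neutral + torch.einsum('bk,knd->bnd', target_bs, basis)
    return torch.mean(torch.abs(pred_mesh - target_mesh))
```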
I noticed that the results vary with face rotation / head tilt, since the values used are tuned to the neutral, upright face orientation. I think you should first rotate the landmarks into a neutral pose before doing the calculations, so that the results are rotation-invariant. Are there plans to add this feature?
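A sketch of that neutralization step, assuming a stored copy of a few roughly expression-rigid landmarks in the neutral/canonical pose is available; the landmark indices below are only illustrative:

```python
import numpy as np

# Landmarks that barely move with expressions (illustrative picks:
# nose bridge, nose tip, outer eye corners).
RIGID_IDX = [6, 4, 33, 263]

def neutralize_rotation(landmarks, canonical_rigid):
    """Rotate landmarks so the head matches the neutral/upright orientation.

    landmarks: (468, 3) detected landmarks.
    canonical_rigid: (len(RIGID_IDX), 3) same points in the neutral pose.
    """
    src = landmarks[RIGID_IDX] - landmarks[RIGID_IDX].mean(axis=0)
    tgt = canonical_rigid - canonical_rigid.mean(axis=0)
    U, _, Vt = np.linalg.svd(src.T @ tgt)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # current pose -> neutral pose
    center = landmarks.mean(axis=0)
    return (landmarks - center) @ R.T + center
```

The de-rotated landmarks could then be fed into the existing blendshape calculations so the tuned thresholds see a roughly upright face regardless of head tilt.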