Is there a bug in the syncnet training code? #113
Comments
This training method is wrong.
Feeding the syncnet only a single image frame is also questionable: a 16-frame-long audio feature is paired with just one image frame.
The ex img is never used. Could it be that a randomly sampled audio feature is not necessarily a true negative? Its mouth shape might be similar to the positive sample's, which would actually make things worse, so maybe that's why the author left it out.
There is little difference with or without syncnet, and even switching to the wav2lip approach doesn't change much.
+1. The model only needs to learn to ignore its inputs and always output two identical vectors. Nothing can be learned from this SyncNet.
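The collapse described in these comments can be demonstrated numerically: if every sample is labeled 1, an encoder that ignores its input and emits a constant vector yields cosine similarity 1 and near-zero BCE loss, so gradient descent has nothing left to learn. A minimal sketch (the helper functions here are illustrative, not from the repo):

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def bce(p, y, eps=1e-7):
    # Binary cross-entropy for a single prediction p with target y.
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A "collapsed" encoder: ignores its input, returns a constant vector.
def collapsed_encoder(_):
    return [1.0, 0.0, 0.0]

# With every label fixed at 1, the collapsed encoder is a perfect solution.
audio_emb = collapsed_encoder("any audio features")
image_emb = collapsed_encoder("any image frame")
sim = cosine_sim(audio_emb, image_emb)  # exactly 1.0
loss = bce(sim, 1)                      # ~1e-7, effectively zero
```

This matches the observation in the issue that the BCELoss quickly drops to 0.000xxx during training.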
In the data-loading code at https://github.com/anliyuan/Ultralight-Digital-Human/blob/762e3b6de9e82b6927ce7cf414dcef67dd533ff3/syncnet.py#L84C5-L95C31
y is set to 1 every time and the ex img is never used. Doesn't that mean only synchronized pairs are ever fed to the model? The model then just needs to blindly output two identical vectors, and the resulting loss is tiny.
During training, the BCELoss quickly drops to 0.000xxx.
This doesn't seem right.
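A common remedy, and roughly what Wav2Lip's data loader does, is to sample mismatched audio/image pairs and label them 0 so the loss actually discriminates synced from unsynced pairs. A minimal sketch, assuming hypothetical `image_frames` and `audio_feats` arrays in which index i is time-aligned (all names here are illustrative, not from the repo); the `min_offset` guard addresses the earlier comment that a nearby random window may still have a similar mouth shape:

```python
import random

class SyncDataset:
    """Illustrative sketch: audio_feats[i] is the 16-frame audio
    window aligned with image_frames[i]."""

    def __init__(self, image_frames, audio_feats, min_offset=5):
        self.image_frames = image_frames
        self.audio_feats = audio_feats
        # Negatives must come from at least this many frames away,
        # reducing the chance of an accidental near-positive.
        self.min_offset = min_offset

    def __len__(self):
        return len(self.image_frames)

    def __getitem__(self, idx):
        img = self.image_frames[idx]
        if random.random() < 0.5:
            # Positive pair: aligned audio window, label 1.
            return img, self.audio_feats[idx], 1.0
        # Negative pair: audio window from a clearly different time, label 0.
        while True:
            j = random.randrange(len(self.audio_feats))
            if abs(j - idx) >= self.min_offset:
                break
        return img, self.audio_feats[j], 0.0
```

With roughly half the batch labeled 0, outputting two identical vectors for every input no longer minimizes the BCE loss, so the degenerate solution described in the comments is ruled out.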