[Training] Stops with an error : The algorithm failed to converge because the input matrix contained non-finite values. #69
Comments
Hi @Yashaswini-Srirangarajan. I am asking @billl-jiang for any support with debugging this issue. |
UPDATE:
|
@zybermonk Thanks for the input. How did you debug for NaNs? It looks like none of my files in new_joint_vecs and new_joints contain NaNs. Am I missing a step in generating the HumanML3D dataset? Thanks a lot!
|
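One way to double-check the dataset for NaNs is to scan every file for non-finite values. The following is a minimal sketch, assuming the files in new_joint_vecs and new_joints are float arrays saved as .npy; the helper name find_non_finite and the relative directory paths are hypothetical and should be adjusted to your dataset root.

```python
from pathlib import Path

import numpy as np


def find_non_finite(data_dir):
    """Return the names of .npy files in data_dir that contain NaN or inf."""
    bad_files = []
    for path in sorted(Path(data_dir).glob("*.npy")):
        arr = np.load(path)  # assumes a plain float array, not a pickled object
        if not np.isfinite(arr).all():
            bad_files.append(path.name)
    return bad_files


if __name__ == "__main__":
    # Directory names are taken from the thread; the location is an assumption.
    for folder in ("new_joint_vecs", "new_joints"):
        bad = find_non_finite(folder)
        print(f"{folder}: {len(bad)} file(s) with non-finite values")
        for name in bad:
            print("  ", name)
```

np.isfinite flags both NaN and infinity, so this also catches files that contain inf but no NaN. A clean scan only rules out the stored data; it says nothing about values computed later during training or evaluation.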
I tried this approach as well, but I seem to be getting a different error, shown below. Have you faced this before? Thanks!
|
@Yashaswini-Srirangarajan I hit the same issue and using scipy==1.11.1 solved my problem, although I'm not sure which version is mathematically more correct. See: |
Hi @Yashaswini-Srirangarajan, sorry for the late response. Evidently, the ... You will find that the following files also have faulty data, as encountered previously after using the 2nd notebook from HumanML3D. The next step would be to delete these files in ...
|
If anyone has any input on which version is more mathematically correct, that would be great. |
Just adding to this question: changing these library versions indirectly requires finding a compatible numpy version as well. |
At least a partial fix has come through at scipy/scipy#20212. We recommend trying again once SciPy 1.13.0 is released, to see whether the problems are gone. |
@lucascolley, This fix now works for me :) thanks !! |
fantastic - 1.13.0 should be out within the next few weeks |
It was just released. |
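To see whether an environment matches one of the SciPy versions reported as working in this thread, a small check like the one below may help. It is only a sketch: the function names are hypothetical, and the "reported OK" rule encodes nothing more than what was said above (scipy==1.11.1 worked for one user, and the fix ships in 1.13.0). Other versions are simply unverified here, not known to be broken.

```python
import re


def parse_version(version):
    """Extract (major, minor, patch) from a version string such as '1.13.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def reported_ok(version):
    """True if this thread reported the version as working (1.11.1 or >= 1.13.0)."""
    parsed = parse_version(version)
    # Pre-release suffixes are ignored, so '1.13.0rc1' is treated like '1.13.0'.
    return parsed == (1, 11, 1) or parsed >= (1, 13, 0)


if __name__ == "__main__":
    try:
        import scipy
    except ImportError:
        print("SciPy is not installed in this environment")
    else:
        status = "reported OK in this thread" if reported_ok(scipy.__version__) else "unverified"
        print(f"SciPy {scipy.__version__}: {status}")
```

As noted earlier in the thread, changing the SciPy version may also require a compatible numpy version.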
After setting up the database, running python -m train --cfg configs/config_h3d_stage1.yaml --nodebug trains for 9 epochs and then runs into the error below.
How do we fix this?