Hello, I ran into an error when using the Sophia optimizer to train GPT-3 with Megatron. The problem is that the `grad` passed into the optimizer does not carry `requires_grad=True`, so the second derivative cannot be computed. Do you know how to solve this?
File "/root/miniconda3/envs/torch18/lib/python3.7/site-packages/torch/autograd/__init__.py", line 277, in grad allow_unused, accumulate_grad=False) # Calls into the C++ engine to run the backward pass RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.
class HutchinsonEstimator(HessianEstimator):
    def estimate(self, p, grad):
        u = torch.randn_like(grad)
        grad_dot_u = torch.sum(grad * u)
        print(f"grad_dot_u requires grad: {grad_dot_u.requires_grad}")  # -> False
        # ↓ RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.
        hessian_vector_product = torch.autograd.grad(
            grad_dot_u, p, retain_graph=True)[0]
        return u * hessian_vector_product
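A sketch of the usual fix, under the assumption that the gradients can be produced inside the training step (the names `model`, `loss_fn`, `batch`, and `estimator` are hypothetical stand-ins, not Megatron APIs): the backward pass that produces `grad` has to be run with `create_graph=True`, so the gradient stays attached to the graph and the Hutchinson estimator can differentiate it again.

```python
import torch

def hutchinson_step(model, loss_fn, batch, estimator):
    # `model`, `loss_fn`, `batch`, and `estimator` are placeholders for the
    # real Megatron objects; this only illustrates the create_graph=True idea.
    loss = loss_fn(model, batch)
    params = [p for p in model.parameters() if p.requires_grad]

    # create_graph=True keeps the gradients attached to the autograd graph,
    # so each grad has a grad_fn and can be differentiated a second time.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    hessian_estimates = [estimator.estimate(p, g) for p, g in zip(params, grads)]
    return loss, grads, hessian_estimates
```

Note that Megatron's mixed-precision and distributed-optimizer paths typically copy or all-reduce gradients into separate buffers before the optimizer sees them, so the `grad` handed to Sophia may already be detached even if the backward used `create_graph=True`. In that case the Hessian-vector product has to be computed from the gradients returned by `torch.autograd.grad` directly (as above), or you could consider Sophia's Gauss-Newton-Bartlett estimator, which only needs an extra first-order backward pass.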