Issue on grad #7

Open
phusroyal opened this issue May 25, 2023 · 3 comments

phusroyal commented May 25, 2023

  File "/home/phu/Desktop/gatedtabtransformer/sophia_custom.py", line 46, in step
    hessian_estimate = self.hutchinson(p, grad)
  File "/home/phu/Desktop/gatedtabtransformer/sophia_custom.py", line 61, in hutchinson
    hessian_vector_product = torch.autograd.grad(grad.dot(u), p, retain_graph=True)[0]
  File "/home/phu/miniconda3/envs/ner-py38-conda-env/lib/python3.8/site-packages/torch/autograd/__init__.py", line 303, in grad
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

I also tried torch.sum(grad * u) in place of grad.dot(u), but that did not work either!
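For readers hitting the same traceback: the issue does not show how `grad` is produced in `sophia_custom.py`, so the following is only a sketch of one common cause, not a diagnosis of this code. If `grad` comes from `p.grad` after a plain `loss.backward()`, it is detached from the autograd graph, and `torch.autograd.grad(grad.dot(u), p)` raises exactly this `RuntimeError`. Swapping `grad.dot(u)` for `torch.sum(grad * u)` would not help in that case, because the input tensor still has no `grad_fn`. The parameter `p`, the probe vector `u`, and `loss_fn` below are hypothetical stand-ins chosen for illustration.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy parameter and loss, standing in for a real model.
p = torch.randn(3, requires_grad=True)
u = torch.randn(3)  # random probe vector, as used by a Hutchinson-style estimator


def loss_fn(param):
    return (param ** 3).sum()


# Case 1: a plain backward() stores p.grad without a grad_fn, so trying to
# differentiate through it reproduces the RuntimeError from the traceback.
loss_fn(p).backward()
try:
    torch.autograd.grad(p.grad.dot(u), p, retain_graph=True)
    reproduced = False
except RuntimeError as err:
    reproduced = "does not require grad" in str(err)

# Case 2: compute the gradient with create_graph=True so that it stays part
# of the graph, then take the Hessian-vector product.
p.grad = None
(grad,) = torch.autograd.grad(loss_fn(p), p, create_graph=True)
hvp = torch.autograd.grad(grad.dot(u), p)[0]

# For this toy loss the Hessian is diag(6 * p), so H @ u equals 6 * p * u.
expected = 6 * p.detach() * u
```

If the gradient has to come from `p.grad` inside an optimizer's `step`, calling `loss.backward(create_graph=True)` in the training loop is another way to keep the graph alive, at the cost of extra memory. Whether either change fits this repository's optimizer is an assumption to verify against its code.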

@kyegomez
Owner

I just updated the package with the original implementation. Upgrade with pip and try again!

dhbrojas commented May 28, 2023

I'm experiencing the same issue with DecoupledSophia on the current main version: git+https://github.com/kyegomez/Sophia.git@a4db3506fffdab3a06cd4dd07ff54fb311450980

@Kingsleyandher

I ran into the same problem in Megatron when training a distributed model...
