
Gradient shape unexpected #3

Open · snykral opened this issue May 25, 2023 · 7 comments

Comments


snykral commented May 25, 2023

```
File ~\Python\PyTorch\RL\utils\optim.py:61, in Sophia.hutchinson(self, p, grad)
     59 def hutchinson(self, p, grad):
     60     u = torch.randn_like(grad)
---> 61     hessian_vector_product = torch.autograd.grad(grad.dot(u), p, retain_graph=True)[0]
     62     return u * hessian_vector_product

RuntimeError: 1D tensors expected, but got 4D and 4D tensors
```
Does it run on any network architecture?
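(For context: `torch.dot` is defined only for 1D tensors, so this Hutchinson step fails for any parameter whose gradient has more dimensions, e.g. a Conv2d weight. A minimal repro, with a made-up gradient shape:)

```python
import torch

# Hypothetical 4D gradient, shaped like a Conv2d weight (out_ch, in_ch, kH, kW).
grad = torch.randn(8, 3, 3, 3)
u = torch.randn_like(grad)

torch.dot(grad, u)
# RuntimeError: 1D tensors expected, but got 4D and 4D tensors
```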


bbbxyz commented May 25, 2023

Looks like a bug. torch.dot() only works on 1D vectors. You could try using torch.sum(grad * u) instead.
Unless you need this urgently, I'd suggest waiting for the official implementation to be released tomorrow.
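For reference, a minimal sketch of `hutchinson` with that substitution; `torch.sum(grad * u)` is the flattened inner product of `grad` and `u` and works for tensors of any shape. This assumes `grad` is still attached to the autograd graph (i.e. it was produced with `create_graph=True`); it is not the repo's official fix.

```python
import torch

def hutchinson(self, p, grad):
    # Random probe vector with the same shape as the gradient.
    u = torch.randn_like(grad)
    # sum(grad * u) == <grad, u> for any tensor shape (torch.dot is 1D-only).
    # grad must itself require grad, i.e. be computed with create_graph=True.
    hessian_vector_product = torch.autograd.grad(
        torch.sum(grad * u), p, retain_graph=True
    )[0]
    # u * (H u) is a one-sample Hutchinson estimate of the Hessian diagonal.
    return u * hessian_vector_product
```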

kyegomez (Owner) commented:

Just fixed it, try upgrading to the new version please!

kyegomez (Owner) commented:

> Looks like a bug. torch.dot() only works on 1D vectors. You could try using torch.sum(grad * u) instead. Unless you need this urgently, I'd suggest waiting for the official implementation to be released tomorrow.

"we are aiming to release tomorrow" -Lol aiming

snykral (Author) commented May 25, 2023

> Just fixed it, try upgrading to the new version please!

It worked, but now I'm facing the same issue as #7. Somehow, `grad.requires_grad` is `False` when it arrives at the optimizer.

Also, I had to comment out some lines in `__init__.py`, because the referenced files didn't ship with the library:

```python
#from experiments.training import trainer
#from Sophia.decoupled_sophia.decoupled_sophia import DecoupledSophia
```
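On the `requires_grad` point: a plain `loss.backward()` stores gradients that are detached from the graph, so a Hessian-vector product inside the optimizer has nothing to differentiate through. A sketch of a training step that keeps the backward graph alive; the model, the data, and the `Sophia` constructor call are placeholders, not the repo's documented API:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(10, 1)    # placeholder model
opt = Sophia(model.parameters())  # assumes the Sophia class from this repo

x, y = torch.randn(32, 10), torch.randn(32, 1)

opt.zero_grad()
loss = F.mse_loss(model(x), y)
# create_graph=True keeps the backward graph, so each p.grad has
# requires_grad=True and the optimizer's HVP can differentiate through it.
loss.backward(create_graph=True)
opt.step()
```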


bbbxyz commented May 25, 2023

FYI https://github.com/Liuhong99/Sophia

kyegomez (Owner) commented:

I've upgraded it; now try upgrading with pip 😊

Kingsleyandher commented:

Did you solve this issue? I'm also hitting the same problem in Megatron...
