APOLLO optimizer #2175

Open
5 tasks done
fblgit opened this issue Dec 11, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@fblgit

fblgit commented Dec 11, 2024

⚠️ Please check that this feature request hasn't been suggested before.

  • I searched previous Ideas in Discussions and didn't find any similar feature requests.
  • I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

Any plan or timeline for APOLLO optimizer?

https://arxiv.org/abs/2412.05270

It looks very interesting.

✔️ Solution

Implementing https://arxiv.org/abs/2412.05270

❓ Alternatives

buy me 4x gpu :D

📝 Additional Context

No response

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this feature has not been requested yet.
  • I have provided enough information for the maintainers to understand and evaluate this request.
fblgit added the enhancement label Dec 11, 2024
@winglian
Collaborator

Thanks @fblgit! We'll look into this and see how feasible it is without a reference implementation.

@fizzAI

fizzAI commented Jan 21, 2025

There's a reference implementation now: https://github.com/zhuhanqing/APOLLO/tree/main/apollo_torch
Would be cool to see :3

@ehartford
Collaborator

FYI, implementing an optimizer is largely independent of Axolotl. It's a Hugging Face Transformers thing.

Check out how I implemented grokAdamW

huggingface/transformers#32521

@fizzAI

fizzAI commented Jan 22, 2025

Yes, but Axolotl already has some optimizers of its own that it patches into Transformers, e.g. ADOPT, and it's probably much easier to get a PR doing that working here than to deal with upstream Transformers.
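
To illustrate the general idea of adding an optimizer locally rather than upstream, here is a minimal, dependency-free sketch of a name-to-class registry. This is not Axolotl's or Transformers' actual mechanism: `OPTIMIZERS`, `register_optimizer`, `build_optimizer`, and the toy `SGD` / `SignSGD` classes are all hypothetical names made up for this example, and the optimizers work on plain Python floats rather than tensors.

```python
from typing import Callable, Dict, Iterable, List


class SGD:
    """Toy stand-in for an upstream optimizer (plain floats, illustration only)."""

    def __init__(self, params: List[float], lr: float = 0.1):
        self.params = params
        self.lr = lr

    def step(self, grads: Iterable[float]) -> None:
        # Plain gradient descent: p <- p - lr * g
        for i, g in enumerate(grads):
            self.params[i] -= self.lr * g


# Hypothetical registry mapping a config string to an optimizer class.
OPTIMIZERS: Dict[str, Callable[..., object]] = {"sgd": SGD}


def register_optimizer(name: str):
    """Decorator that makes a new optimizer class selectable by `name`."""
    def decorator(cls):
        OPTIMIZERS[name] = cls
        return cls
    return decorator


@register_optimizer("sign_sgd")
class SignSGD(SGD):
    """Locally added optimizer; the 'upstream' SGD class is left untouched."""

    def step(self, grads: Iterable[float]) -> None:
        # Move each parameter by lr in the direction opposite the gradient sign.
        for i, g in enumerate(grads):
            sign = (g > 0) - (g < 0)
            self.params[i] -= self.lr * sign


def build_optimizer(name: str, params: List[float], **kwargs):
    """Look up an optimizer by its config name and instantiate it."""
    try:
        return OPTIMIZERS[name](params, **kwargs)
    except KeyError:
        raise ValueError(f"unknown optimizer: {name!r}") from None


if __name__ == "__main__":
    params = [1.0, -2.0]
    opt = build_optimizer("sign_sgd", params, lr=0.5)
    opt.step([3.0, -0.2])
    print(params)
```

In this sketch a new optimizer only needs a decorated class in the local codebase, which is the kind of shortcut the comment above argues for; a real integration would wrap a `torch.optim.Optimizer` such as the linked reference implementation instead of these toy classes.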
