Open
Description
There's an inconsistency in how p.ndim is used in optimize.py. In the get_optimizer function, parameters are selected with the condition p.ndim >= 2, but the Muon class constructor asserts p.ndim == 2 (assert p.ndim == 2, p.ndim), which only allows 2D parameters.
As a result, any parameter with more than 2 dimensions (for example, a 4D convolution weight) passes the filter in get_optimizer and then fails the assertion in the Muon constructor with an AssertionError.
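To make the mismatch concrete, here is a minimal sketch in plain Python (no PyTorch needed). Only the two conditions are taken from optimize.py. The helper names passes_get_optimizer_filter and passes_muon_assert, and the example parameter names and dimensionalities, are hypothetical and only for illustration.

```python
def passes_get_optimizer_filter(name, ndim):
    # Same condition as the list comprehension in get_optimizer.
    return ndim >= 2 and "classifiers" not in name and "embedding" not in name


def passes_muon_assert(ndim):
    # Same condition as the assertion in the Muon constructor.
    return ndim == 2


# Hypothetical parameter names and dimensionalities, for illustration only.
examples = {
    "encoder.linear.weight": 2,
    "encoder.conv1d.weight": 3,
    "encoder.conv2d.weight": 4,
    "encoder.linear.bias": 1,
}

# Parameters that get_optimizer would select but Muon would reject.
mismatched = [
    name
    for name, ndim in examples.items()
    if passes_get_optimizer_filter(name, ndim) and not passes_muon_assert(ndim)
]
print(mismatched)
```

In this sketch, the 3D and 4D entries are the ones that would be handed to Muon and then trip the assertion.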
In get_optimizer:
    muon_params = [
        p
        for name, p in model.named_parameters()
        if p.ndim >= 2 and "classifiers" not in name and "embedding" not in name
    ]
In the Muon class constructor:
    for p in muon_params:
        # Use Muon for every parameter in muon_params which is >= 2D and doesn't look like an embedding or head layer
        assert p.ndim == 2, p.ndim
        self.state[p]["use_muon"] = True
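One possible way to make the two places agree, sketched under assumptions: select only strictly 2D parameters for Muon and send everything else to whichever optimizer group already handles the non-Muon parameters. The helper split_params below is hypothetical and not part of optimize.py. It uses stand-in objects that expose only .ndim, so it runs without PyTorch.

```python
from types import SimpleNamespace


def split_params(named_params):
    """Split (name, param) pairs into Muon-eligible and fallback lists.

    Hypothetical helper: it uses ndim == 2 so that the selection matches
    the assertion in the Muon constructor.
    """
    muon_params, other_params = [], []
    for name, p in named_params:
        if p.ndim == 2 and "classifiers" not in name and "embedding" not in name:
            muon_params.append(p)
        else:
            other_params.append(p)
    return muon_params, other_params


# Stand-ins for model.named_parameters(); names and shapes are made up.
named = [
    ("encoder.linear.weight", SimpleNamespace(ndim=2)),
    ("encoder.conv2d.weight", SimpleNamespace(ndim=4)),
    ("encoder.linear.bias", SimpleNamespace(ndim=1)),
    ("text_embedding.weight", SimpleNamespace(ndim=2)),
]

muon_params, other_params = split_params(named)
print(len(muon_params), len(other_params))
```

The alternative would be to relax the assertion to p.ndim >= 2 and flatten higher-dimensional parameters to 2D inside the optimizer step, as some Muon implementations do for convolution filters. Whether that is appropriate here depends on how this repository's Muon update is written, so I am not sure which direction is intended.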
Could you give any suggestions?