Inconsistent p.ndim usage in Muon optimizer #37

@N-damo

There's an inconsistency in how p.ndim is used in optimize.py. In the get_optimizer function, parameters are selected for Muon with the condition p.ndim >= 2, but the Muon class constructor contains the assertion assert p.ndim == 2, p.ndim, which only allows parameters that are exactly 2D.

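As a result, any selected parameter with more than two dimensions (such as a convolution weight, if the model has one) passes the filter in get_optimizer but fails the assertion when the Muon optimizer is constructed.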

In get_optimizer:
muon_params = [
    p
    for name, p in model.named_parameters()
    if p.ndim >= 2 and "classifiers" not in name and "embedding" not in name
]
In the Muon class constructor:
for p in muon_params:
    # Use Muon for every parameter in muon_params which is >= 2D and doesn't look like an embedding or head layer
    assert p.ndim == 2, p.ndim
    self.state[p]["use_muon"] = True
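
To make the mismatch concrete, here is a minimal sketch of one possible way to bring the two conditions into agreement. It is not the repository's code: select_muon_params, its exact_2d flag, flattened_2d_shape, and the parameter names and shapes are all hypothetical, and the stand-in objects only mimic the .ndim and .shape attributes of real parameters. With exact_2d=True the filter matches the constructor's assertion, so parameters with more than two dimensions would have to be handled by whichever optimizer takes the remaining parameters. The alternative is to relax the assertion and merge the trailing dimensions into one before the Muon update, which is what flattened_2d_shape illustrates; whether that is appropriate here depends on the model.

```python
from math import prod
from types import SimpleNamespace


def select_muon_params(named_params, exact_2d=True):
    """Pick parameters for Muon from (name, param) pairs.

    Only relies on `param.ndim`, so it should work with the pairs yielded
    by model.named_parameters() as well as with simple stand-ins.
    """
    selected = []
    for name, p in named_params:
        if "classifiers" in name or "embedding" in name:
            continue  # same name-based exclusions as in get_optimizer
        # exact_2d=True mirrors the constructor's `assert p.ndim == 2`;
        # exact_2d=False keeps the original `p.ndim >= 2` behaviour.
        keep = (p.ndim == 2) if exact_2d else (p.ndim >= 2)
        if keep:
            selected.append(p)
    return selected


def flattened_2d_shape(shape):
    """Shape a >= 2D tensor would have if all trailing dims were merged."""
    return (shape[0], prod(shape[1:]))


# Stand-in parameters (hypothetical names and shapes, for illustration only).
params = [
    ("encoder.linear.weight", SimpleNamespace(ndim=2, shape=(64, 32))),
    ("encoder.conv.weight", SimpleNamespace(ndim=4, shape=(16, 8, 3, 3))),
    ("encoder.linear.bias", SimpleNamespace(ndim=1, shape=(64,))),
    ("embedding.weight", SimpleNamespace(ndim=2, shape=(1000, 64))),
]

strict = select_muon_params(params)                 # 2D linear weight only
loose = select_muon_params(params, exact_2d=False)  # linear and conv weights

print(len(strict), len(loose))             # 1 2
print(flattened_2d_shape((16, 8, 3, 3)))   # (16, 72)
```

In the stand-in data, the 4D conv weight is the parameter that passes the p.ndim >= 2 filter but would fail assert p.ndim == 2.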

Could you give any suggestions?
