First, I would like to express my gratitude to the researchers who transparently shared their outstanding work.
I have a question regarding the optimizer interchangeability discussed in Section 3.5.1 of the technical report. The report states that the best training results were observed when Muon was used as the optimizer in both pre-training and SFT. I am curious whether the same finding held when moving beyond SFT to RL, that is, whether keeping Muon as the optimizer during RL training also produced the best results.