feat(trainer): symmetric DPPO-Binary TV default loss (no KL, no advantage conditioning) #2434
Closed
samsja wants to merge 5 commits into
Commits
Commits on May 8, 2026
Commits on May 9, 2026