
[WiP] DDPO video WM trainer #16

Open
pmcurtin wants to merge 9 commits into Overworldai:main from pmcurtin:main

@pmcurtin

For now there is no reward function, since that will depend on the teacher model's reward architecture. I'll test with random rewards or something similar. Ignore the resources/ dir: it contains a reference implementation of DDPO from the original ddpo-pytorch repo and will be removed.
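Until the teacher-model reward exists, here is a minimal sketch of how "random rewards" could plug into a DDPO-style update. It is plain Python for readability (the real trainer would operate on tensors), and `random_rewards`, `normalize_advantages` and `ddpo_clipped_loss` are hypothetical names, not code from this PR. The loss is the PPO-style clipped surrogate that DDPO uses, one term per sample and denoising step. The `clip_range` and `adv_clip_max` defaults are illustrative assumptions. Per-prompt reward statistics are omitted; advantages are normalized over the whole batch.

```python
import math
import random
from statistics import mean, pstdev


def random_rewards(n, seed=0):
    # Hypothetical placeholder: uniform noise in [0, 1) standing in for the
    # teacher-model reward, which does not exist yet.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def normalize_advantages(rewards, eps=1e-8):
    # Batch-level normalization: (r - mean) / (std + eps).
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def ddpo_clipped_loss(logp_new, logp_old, advantages, clip_range=1e-4, adv_clip_max=5.0):
    # PPO-style clipped surrogate, one term per (sample, denoising step).
    # logp_new / logp_old: log-probs of the sampled transition x_t -> x_{t-1}
    # under the current policy and the policy that generated the sample.
    losses = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        adv = max(-adv_clip_max, min(adv_clip_max, adv))
        ratio = math.exp(lp_new - lp_old)
        clipped_ratio = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
        # Keep the pessimistic (larger) of the two negated objectives.
        losses.append(max(-adv * ratio, -adv * clipped_ratio))
    return mean(losses)


rewards = random_rewards(8)
advantages = normalize_advantages(rewards)
logp_old = [-1.3] * 8
print(ddpo_clipped_loss(logp_old, logp_old, advantages))
```

With `logp_new == logp_old` every ratio is 1, so the loss reduces to the negative mean advantage, which is about zero after normalization. That makes a cheap sanity check while rewards are still random.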

@wendlerc

wendlerc commented Jul 5, 2025

To get rid of my vibe-fix of the log probs, we might be able to make use of https://github.com/yifan123/flow_grpo/blob/main/flow_grpo/diffusers_patch/sd3_sde_with_logprob.py#L71 (thanks @fairy for digging this out!)
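For reference, this is what such a per-step log prob amounts to. Once a stochastic sampler step is written as x_{t-1} ~ N(mean, std²), the log prob is the Gaussian log-density of the sample actually drawn, reduced over the latent's elements. The sketch below assumes the step's mean and std are already computed; the linked flow_grpo file derives them from the flow-matching velocity and a noise schedule, which is not reproduced here. `gaussian_step_log_prob` and `sample_step` are hypothetical names. Averaging (rather than summing) over elements is an assumption of this sketch.

```python
import math
import random


def gaussian_step_log_prob(prev_sample, prev_sample_mean, std):
    # Mean over elements of log N(x; mu, std^2) for one sampler step.
    # prev_sample: the x_{t-1} actually drawn; prev_sample_mean: the step mean.
    if std <= 0:
        raise ValueError("std must be positive")
    if len(prev_sample) != len(prev_sample_mean):
        raise ValueError("prev_sample and prev_sample_mean differ in length")
    log_norm = math.log(std) + 0.5 * math.log(2 * math.pi)
    total = 0.0
    for x, mu in zip(prev_sample, prev_sample_mean):
        total += -((x - mu) ** 2) / (2 * std ** 2) - log_norm
    return total / len(prev_sample)


def sample_step(prev_sample_mean, std, rng):
    # Draw x_{t-1} = mean + std * eps with eps ~ N(0, 1), elementwise.
    return [mu + std * rng.gauss(0.0, 1.0) for mu in prev_sample_mean]


rng = random.Random(0)
mean_t = [0.2, -0.1, 0.5]
std_t = 0.3
x_prev = sample_step(mean_t, std_t, rng)
print(gaussian_step_log_prob(x_prev, mean_t, std_t))
```

In DDPO this value is stored for every step during sampling and recomputed under the current weights during training. The difference between the two feeds the importance ratio in the clipped loss.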

3 participants