[WiP] DDPO video WM trainer by pmcurtin · Pull Request #16 · Overworldai/owl-wms

pmcurtin · 2025-06-20T17:48:34Z

For now there is no reward function, as that will depend on the teacher model reward architecture. I will test with random rewards or something similar. Ignore the resources/ dir: it contains a reference implementation of DDPO from the original ddpo-pytorch repo and will be removed.
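A minimal sketch of what a random-reward placeholder could look like while the real reward model is undecided. The names `random_reward` and `normalize` are hypothetical and not taken from this PR; the normalization step assumes the trainer turns raw rewards into zero-mean, unit-variance advantages before the policy update.

```python
import random


def random_reward(num_samples, seed=None):
    """Hypothetical stand-in reward: one uniform score in [0, 1) per sample."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(num_samples)]


def normalize(rewards, eps=1e-8):
    """Turn raw rewards into zero-mean, unit-variance advantages (assumed step)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


rewards = random_reward(8, seed=0)   # one score per sampled rollout
advantages = normalize(rewards)      # what the policy update would consume
```

With random rewards the policy should not improve on any real metric, so this is only useful for checking that the training loop runs end to end.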


wendlerc · 2025-07-05T22:51:02Z

To get rid of my vibe-fix of the log probs, we might be able to use https://github.com/yifan123/flow_grpo/blob/main/flow_grpo/diffusers_patch/sd3_sde_with_logprob.py#L71 (thanks @fairy for digging this out!)
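For context, here is a generic sketch of the quantity under discussion: the log-probability of one sampled step under a diagonal Gaussian with a given mean and standard deviation. This is not the code from the linked file or from this PR; `gaussian_step_log_prob` is a hypothetical name, and averaging over elements rather than summing is an assumption.

```python
import math


def gaussian_step_log_prob(x_next, mean, std):
    """Mean per-element log N(x_next | mean, std^2) over a flat list of values."""
    # Log of the Gaussian normalization constant, shared by every element.
    log_norm = math.log(std) + 0.5 * math.log(2.0 * math.pi)
    total = 0.0
    for x, m in zip(x_next, mean):
        total += -((x - m) ** 2) / (2.0 * std ** 2) - log_norm
    return total / len(x_next)


# A sample at the predicted mean is more likely than one far away from it.
lp_near = gaussian_step_log_prob([0.0, 0.0], [0.0, 0.0], std=1.0)
lp_far = gaussian_step_log_prob([3.0, 3.0], [0.0, 0.0], std=1.0)
```

A policy-gradient update of the DDPO kind needs this value for each sampling step, which is why the step has to be stochastic with a known variance.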

ychu12 and others added 9 commits June 20, 2025 13:45

claude

4fb9428

basic_ddpo

15ab85c

update conf

8e3ab2b

getting vaes working

134c1e7

got running

1d0dbc9

gitignore and comment out debug print

40051e3

more ignores

5ccb5dd

simple reward func and tweaks

5f5a118

vibe-fixed with claude; now we pretend our rectified flow model is a …

f8b916c

…diffusion model and sample fresh noise at each step, also we estimate the variances for the schedule

pmcurtin wants to merge 9 commits into Overworldai:main from pmcurtin:main
