TODO:
Data:
- See owl_vaes/data/video_dir_loader.py
- Create an equivalent for VR data
- I'm currently assuming the VR data is laid out so that each data instance has its own folder containing a recording and its controls
- The new loader should go in owl_vaes/data/vr_video_dir_loader.py
- Some specs for the new loader:
- It should have randomization to ensure different workers get different videos (the existing loader takes rank and world_size)
- It should have target_size as Tuple[int,int] to control the per-eye resolution from the video.
- It should have window_length as int to control the number of frames that are sampled (for a video autoencoder)
- Note that the return shape is [b,c,h,w] in the window_length=1 case and [b,n,c,h,w] otherwise, with c=6. This assumes a channel-wise concat of both eyes' views, which I think is fine. Feel free to push back on this if you disagree.
- There may be a need for other VR-specific kwargs for sanity checks or debugging. If you feel these are needed, feel free to add them.
- Once the dataloader and its get_loader function are created, add it to owl_vaes/data/__init__.py
- In owl_vaes/data/vr_video_dir_loader.py you can add a testing function to ensure the loader works
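The loader specs above can be sketched without touching the real codebase. This is a minimal, stdlib-only sketch and not the existing loader's implementation: shard_videos, sample_window and expected_shape are hypothetical names, the strided split across ranks is only one assumed way of using rank and world_size, and target_size is assumed to be ordered (height, width).

```python
import random
from typing import List, Sequence, Tuple


def shard_videos(video_dirs: Sequence[str], rank: int, world_size: int,
                 seed: int = 0) -> List[str]:
    """Give each worker a disjoint, shuffled subset of the instance folders.

    Assumption: every rank shuffles with the same seed, then takes a
    strided slice, so no folder is seen by two ranks.
    """
    dirs = sorted(video_dirs)          # same starting order on every rank
    random.Random(seed).shuffle(dirs)  # same permutation on every rank
    return dirs[rank::world_size]


def sample_window(num_frames: int, window_length: int,
                  rng: random.Random) -> range:
    """Pick a random run of `window_length` consecutive frame indices."""
    if window_length > num_frames:
        raise ValueError("video is shorter than window_length")
    start = rng.randint(0, num_frames - window_length)  # inclusive on both ends
    return range(start, start + window_length)


def expected_shape(batch_size: int, window_length: int,
                   target_size: Tuple[int, int]) -> Tuple[int, ...]:
    """Batch shape from the notes: c=6 is both eyes' RGB, concatenated channel-wise."""
    h, w = target_size  # assumed (height, width) ordering
    if window_length == 1:
        return (batch_size, 6, h, w)
    return (batch_size, window_length, 6, h, w)


if __name__ == "__main__":
    dirs = [f"instance_{i:03d}" for i in range(10)]
    for r in range(4):
        print(r, shard_videos(dirs, r, world_size=4))
    print(list(sample_window(100, 8, random.Random(0))))
    print(expected_shape(2, 1, (256, 256)), expected_shape(2, 8, (256, 256)))
```

A testing function in the real loader file could check the same two properties: shards from different ranks never overlap, and the batch shape matches window_length.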
Logging:
- See owl_vaes/utils/logging.py
- Admittedly it's getting a bit cluttered, so it's best to put new logging code in owl_vaes/utils/vr_logging.py
- You can see how these are used in example trainers (will list later)
- to_wandb_vr should be a function similar to to_wandb, which takes two [b,6,h,w] images and puts them side by side (original and reconstruction)
- to_wandb_vr_video should be a function similar to to_wandb_video_sidebyside
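As a rough illustration of the side-by-side layout for to_wandb_vr, here is a NumPy-only sketch of the array manipulation. It is not the existing to_wandb code: vr_side_by_side is a hypothetical helper, and it assumes channels 0-2 hold the left eye and channels 3-5 the right eye.

```python
import numpy as np


def vr_side_by_side(original: np.ndarray, reconstruction: np.ndarray) -> np.ndarray:
    """Turn two [b,6,h,w] batches into one [b,3,h,4w] RGB batch.

    Assumption: channels 0-2 are the left eye and 3-5 the right eye.
    Layout per sample: orig left | orig right | rec left | rec right.
    """
    if original.shape != reconstruction.shape:
        raise ValueError("original and reconstruction must have the same shape")
    if original.ndim != 4 or original.shape[1] != 6:
        raise ValueError("expected [b,6,h,w] inputs")

    def eyes_to_rgb(x: np.ndarray) -> np.ndarray:
        left, right = x[:, :3], x[:, 3:]
        return np.concatenate([left, right], axis=-1)  # join the eyes along width

    # Place the original pair to the left of the reconstructed pair.
    return np.concatenate(
        [eyes_to_rgb(original), eyes_to_rgb(reconstruction)], axis=-1
    )


if __name__ == "__main__":
    orig = np.zeros((2, 6, 4, 5), dtype=np.float32)
    rec = np.ones((2, 6, 4, 5), dtype=np.float32)
    print(vr_side_by_side(orig, rec).shape)  # (2, 3, 4, 20)
```

Each sample of the result could then be moved to [h,w,c] and wrapped for wandb the same way to_wandb does; value scaling and clamping are left out here because they depend on how the existing function handles them.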
Modelling:
- For the time being nothing fancy is needed on the modelling side, just use dcae with a channel count of 6 for both eyes. This is mostly intended to get you started with the codebase
Trainer:
- You should be able to copy owl_vaes/trainers/rec.py or owl_vaes/trainers/video_rec.py and just replace the logging function. It might also make sense to just add logging information to the config so that this can all be wrapped into existing trainers.
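One possible shape for the config-driven option is a small registry that maps a config string to a logging function. This is only a sketch: the log_fn config key, LOG_FNS and get_log_fn are all hypothetical, and the stubs stand in for the real functions from owl_vaes/utils/logging.py and owl_vaes/utils/vr_logging.py.

```python
from typing import Any, Callable, Dict

LogFn = Callable[[Any, Any], Dict[str, Any]]


def _stub(name: str) -> LogFn:
    """Build a placeholder that only records which logger was chosen."""
    def log(original: Any, reconstruction: Any) -> Dict[str, Any]:
        return {"logger": name}
    return log


# Hypothetical registry: the real entries would be the actual logging
# functions rather than these stubs.
LOG_FNS: Dict[str, LogFn] = {
    "to_wandb": _stub("to_wandb"),
    "to_wandb_vr": _stub("to_wandb_vr"),
}


def get_log_fn(train_cfg: Dict[str, Any]) -> LogFn:
    """Look up the logging function named by the (assumed) `log_fn` config key."""
    name = train_cfg.get("log_fn", "to_wandb")  # fall back to the non-VR logger
    if name not in LOG_FNS:
        raise ValueError(f"unknown log_fn {name!r}; choose from {sorted(LOG_FNS)}")
    return LOG_FNS[name]


if __name__ == "__main__":
    log_fn = get_log_fn({"log_fn": "to_wandb_vr"})
    print(log_fn(None, None))
```

With something like this, an existing trainer would call get_log_fn once at setup instead of importing a fixed logging function, so no VR-specific trainer copy would be needed.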
Configs and Launching a Training Job:
- See example configs in configs/waypoint_1/owl_vae_f16_c16.yml
- You launch train runs with python -m train --config_path path_to_config.yml or torchrun --nproc_per_node=8 -m train --config_path path_to_config.yml
- Use skypilot configs for multinode jobs, but keep in mind that for an image VAE you don't need more than one node