Hi, this is a very interesting paper and thank you for providing the code!
I wanted to ask whether you have tested your method with HunyuanVideo, as it is one of the state-of-the-art open-source video generation models.
If you have used it, could you share any insights or results from your experiments?
If not, was there a specific reason for not including it (e.g., performance issues, compatibility constraints, or other limitations)?
What kind of difference do you see between U-Net and DiT-based models (say, AnimateDiff vs. CogVideoX-2B)? Is the increased VRAM usage justified by the better results?
We have now released Light-A-Video with a Wan2.1 backbone. Wan2.1 is one of the best DiT-based video foundation models.
At a resolution of 512×512, AnimateDiff requires approximately 23 GB of GPU memory to generate a 16-frame video, which makes it more suitable for consumer-grade GPUs such as the RTX 3090. In contrast, Wan2.1 consumes around 36 GB of GPU memory to generate a 49-frame video at the same resolution.
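For some intuition on the VRAM gap, here is a rough back-of-envelope sketch comparing attention sequence lengths. The assumptions are mine, not from the repo: an SD1.5-style VAE with 8× spatial downsampling for AnimateDiff, and Wan2.1's causal VAE with 8× spatial / 4× temporal compression plus a 2×2 spatial patchify in the DiT. The actual memory footprint also depends on model size, the IC-Light UNet, and the attention implementation, so treat this purely as a sketch:

```python
# Rough comparison of attention sequence lengths (assumptions noted above).

def animatediff_attention_lengths(frames=16, h=512, w=512):
    # Assumed SD1.5-style factorization: spatial attention sees one frame at a
    # time (at the highest-resolution UNet attention level), while the temporal
    # module sees one spatial position at a time.
    spatial_len = (h // 8) * (w // 8)   # 4096 tokens per frame
    temporal_len = frames               # 16 tokens per spatial position
    return spatial_len, temporal_len

def wan_attention_length(frames=49, h=512, w=512, spatial_patch=2):
    # Assumed Wan-VAE behavior: keep the first frame and compress the rest 4x
    # in time, so 49 frames -> 13 latent frames; the DiT then attends jointly
    # over every spatio-temporal token in a single sequence.
    latent_frames = (frames - 1) // 4 + 1
    tokens_per_frame = (h // 8 // spatial_patch) * (w // 8 // spatial_patch)
    return latent_frames * tokens_per_frame

if __name__ == "__main__":
    s, t = animatediff_attention_lengths()
    print(f"AnimateDiff: spatial attention over {s} tokens, "
          f"temporal attention over {t} tokens")
    joint = wan_attention_length()
    print(f"Wan2.1 DiT: joint attention over {joint} tokens "
          f"({joint**2:,} query-key pairs per head per layer)")
```

Under these assumptions, each joint attention call in the DiT spans roughly 13k tokens versus 4k tokens per spatial attention call in the U-Net, which is one reason peak activation memory grows even at the same output resolution.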
Actually, for relighting quality, AnimateDiff may be the more suitable VDM backbone. The current IC-Light model we are using is based on a U-Net architecture, which is more closely aligned with the Stable Diffusion model underlying AnimateDiff, and this alignment ensures greater consistency and coherence in the relighting process.
A DiT-based VDM, on the other hand, preserves finer details and supports longer videos and more diverse resolutions.