Hi, this is a very interesting paper and thank you for providing the code!
I wanted to ask whether you have tested your method with HunyuanVideo, as it is one of the state-of-the-art open-source video generation models.
If you have used it, could you share any insights or results from your experiments?
If not, was there a specific reason for not including it (e.g., performance issues, compatibility constraints, or other limitations)?
What kind of difference do you see between U-Net and DiT-based models (say, AnimateDiff vs. CogVideoX-2B)? Is the increased VRAM usage justified by the better results?
We have now released Light-A-Video with a Wan2.1 backbone. Wan2.1 is one of the best DiT-based video foundation models.
At a resolution of 512×512, AnimateDiff requires approximately 23 GB of GPU memory to generate a 16-frame video, which makes it more suitable for consumer-grade GPUs such as the RTX 3090. In contrast, Wan2.1 consumes around 36 GB of GPU memory to generate a 49-frame video at the same resolution.
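For some intuition on the VRAM gap, here is a rough back-of-envelope sketch comparing attention sequence lengths. The assumptions are mine, not from the repo: an SD1.5-style VAE with 8× spatial downsampling for AnimateDiff, and Wan2.1's causal VAE with 8× spatial / 4× temporal compression plus a 2×2 spatial patchify in the DiT. The actual memory footprint also depends on model size, the IC-Light UNet, and the attention implementation, so treat this purely as a sketch:

```python
# Rough comparison of attention sequence lengths (assumptions noted above).

def animatediff_attention_lengths(frames=16, h=512, w=512):
    # Assumed SD1.5-style factorization: spatial attention sees one frame at a
    # time (at the highest-resolution UNet attention level), while the temporal
    # module sees one spatial position at a time.
    spatial_len = (h // 8) * (w // 8)   # 4096 tokens per frame
    temporal_len = frames               # 16 tokens per spatial position
    return spatial_len, temporal_len

def wan_attention_length(frames=49, h=512, w=512, spatial_patch=2):
    # Assumed Wan-VAE behavior: keep the first frame and compress the rest 4x
    # in time, so 49 frames -> 13 latent frames; the DiT then attends jointly
    # over every spatio-temporal token in a single sequence.
    latent_frames = (frames - 1) // 4 + 1
    tokens_per_frame = (h // 8 // spatial_patch) * (w // 8 // spatial_patch)
    return latent_frames * tokens_per_frame

if __name__ == "__main__":
    s, t = animatediff_attention_lengths()
    print(f"AnimateDiff: spatial attention over {s} tokens, "
          f"temporal attention over {t} tokens")
    joint = wan_attention_length()
    print(f"Wan2.1 DiT: joint attention over {joint} tokens "
          f"({joint**2:,} query-key pairs per head per layer)")
```

Under these assumptions, each joint attention call in the DiT spans roughly 13k tokens versus 4k tokens per spatial attention call in the U-Net, which is one reason peak activation memory grows even at the same output resolution.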
Actually, for relighting quality, AnimateDiff may be the more suitable VDM backbone. The current IC-Light model we are using is based on a U-Net architecture, which is more closely aligned with the Stable Diffusion model underlying AnimateDiff, and this alignment ensures greater consistency and coherence in the relighting process.
A DiT-based VDM, on the other hand, preserves finer details and supports longer videos and more diverse resolutions.