Static Noise at ~193rd Frame on more than 8 sec #392
I have to admit I never fully understood what they mean by that. I suppose all they mean is that it helps reduce VRAM use enough to allow such high frame counts, and in general doing that many frames is slow. I have also noticed the soft cap in frame count and how the output turns into noise after that with this model; so far I haven't found a way around it. I'm not sure it's that useful anyway. I think a better approach would be to simply continue from the last frame, as the results already loop at 201 frames. |
I've noticed that sometimes it loops, but sometimes it continues as if the noise didn't happen. The main issue with using the last frame is that the motion then isn't continuous, so I'm trying to figure out why their repo implies they've had no problems up to 12 seconds, as that'd be preferable to using the last frame. |
I think I found the cause of the 192-frame limit. It's in the VAE, specifically autoencoder_kl_hunyuan_video.py. The original code runs into an out-of-bounds access at the 25th tile (the 193rd frame, at 8 frames per tile). Find:
Replace:
Is this something you can test? I'm not sure where the fix should be applied in the ComfyUI context. |
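To make the arithmetic above concrete, here is a minimal, self-contained sketch of tiled temporal indexing and the off-by-one it can hit. This is an illustration only, not the actual autoencoder_kl_hunyuan_video.py code; the function name, tile size, and overlap parameter are assumptions:

```python
# Illustrative sketch -- NOT the real diffusers/ComfyUI tiling code.
def tile_ranges(num_frames: int, tile: int = 8, overlap: int = 0):
    """Return (start, end) index pairs covering the temporal axis."""
    stride = tile - overlap
    ranges = []
    for start in range(0, num_frames, stride):
        end = start + tile
        # Without this clamp, the final tile indexes past the end of the
        # tensor -- the kind of out-of-bounds that would first bite at
        # tile 25 (frames 193+) when the frame count isn't tile-aligned.
        end = min(end, num_frames)
        ranges.append((start, end))
        if end == num_frames:
            break
    return ranges
```

For 200 frames at 8 frames per tile this yields 25 tiles, with the 25th tile covering frames 192-199 (0-indexed), matching the "25th tile / 193rd frame" boundary described above.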
But decoding with the base model was never an issue at any frame count, though? |
I never took an interest in the base model (lack of i2v) but also maybe no one bothered to go for more than 8 seconds. I can double check later today. |
I've done over 500 frames before with context windows and that decodes fine, and you can also see the noisy frames using the VHS nodes' live preview feature in Comfy, even before decoding. It's pretty weird, as it's just like 2-3 latents' worth of frames after 193... but then it continues fine after. |
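On the "2-3 latents' worth of frames" point: HunyuanVideo-style causal VAEs are commonly described as encoding the first pixel frame alone and then compressing time 4x, so pixel frames = 4*(latent_frames - 1) + 1. A quick sketch of that mapping (treat the exact compression factor as an assumption about this VAE):

```python
def latent_frames(pixel_frames: int, t_compress: int = 4) -> int:
    # HunyuanVideo-style causal VAE: first frame encoded on its own,
    # then groups of `t_compress` pixel frames per latent frame.
    assert (pixel_frames - 1) % t_compress == 0, "frame count must be 4k+1"
    return (pixel_frames - 1) // t_compress + 1

# 193 pixel frames -> 49 latent frames; 201 -> 51,
# so the glitch window spans roughly 2 latent frames.
```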
What node is that for previewing before the VAE decode? I'd like to check as well. |
Video Helper Suite options have a toggle for live previews; then, in ComfyUI Manager, enable the latent2rgb preview mode (or auto). I think the issue is more about the image condition and the looping that would normally occur around that frame count. |
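For anyone curious what latent2rgb previews actually do: they are just a fixed linear projection from latent channels to RGB, with no VAE involved. A minimal sketch, assuming a 16-channel Hunyuan-style latent; the projection weights here are random placeholders, not the tuned coefficients ComfyUI ships:

```python
import numpy as np

def latent2rgb(latent: np.ndarray, proj: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """latent: (C, H, W) -> rough (H, W, 3) preview scaled into [0, 1]."""
    rgb = np.einsum("chw,cr->hwr", latent, proj) + bias
    # Normalize for display; real previews use fixed per-model scaling.
    rgb = (rgb - rgb.min()) / (np.ptp(rgb) + 1e-8)
    return rgb

rng = np.random.default_rng(0)
preview = latent2rgb(rng.normal(size=(16, 32, 32)),   # fake latent
                     rng.normal(size=(16, 3)) * 0.1,  # placeholder weights
                     np.zeros(3))
```

Because it skips the VAE entirely, noise visible in this preview (as described above) means the corruption is already in the latents, not introduced by decoding.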
Is there some way to not use the Hunyuan VAE or use a different one? See if that eliminates the issue. Sorry if it's a weird question - just diving into this for the first time so I'm not as familiar. |
The only way without training one is the latent2rgb method, which at best can only give an approximate preview. What would make this issue clear would be using their repo to do the 201 frames and seeing if it still happens; if it doesn't, then it's something that can be fixed here as well. |
May have something for this now; this project was just released: https://github.com/thu-ml/RIFLEx It's a really simple addition to the rotary positional embed code. I've made a node for the native implementation to use it and have been able to do 253 frames without looping or noise: SkyReelHyVidComfyNative_00008.2.mp4 |
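The core RIFLEx idea is to stretch one temporal RoPE frequency (the "intrinsic" one responsible for repetition) so that its full period covers the extended video length. A hedged sketch of that tweak, not the KJNodes implementation; the function name and the way the index `k` is chosen are assumptions (RIFLEx identifies it empirically per model):

```python
import math

def riflex_freqs(freqs, test_len, k):
    """RIFLEx-style adjustment (sketch): if the k-th temporal RoPE
    frequency completes a full cycle before `test_len` frames, shrink
    it so one period spans the whole extended length, which is what
    suppresses the looping seen past the training frame count."""
    out = list(freqs)
    period = 2 * math.pi / freqs[k]
    if period < test_len:
        out[k] = 2 * math.pi / test_len
    return out
```

All other frequency components are left untouched, which is why the change is such a small patch to the rotary embedding code.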
Nice, I look forward to trying that when you have it available. I was in the middle of trying to patch the Python code for the Hunyuan VAE at runtime with a node and went down a deep rabbit hole lol. If you're curious: I got stuck on the VAE instance at runtime not actually matching the autoencoder_kl_hunyuan_video.py I was trying to patch (it was AutoencoderKLCausal3D instead):
|
If you want to share the python code for the node, I can also try testing how far it can extend - if it goes all the way until your memory runs out. I have an 8-GPU server currently rented so it'd cost nothing to throw this test on top haha |
It's already available in KJNodes, for native Comfy Hunyuan that is. |
Awesome - I'll give it a shot and let you know how it goes |
I was just about to ask - thanks - I prefer the wrapper workflow haha |
Our solutions ended up being quite similar, at least according to Grok. Quite an interesting read haha
|
Uhm... this still has nothing to do with VAE... |
My understanding is that it's trimming before it gets to the VAE, which would be another way to avoid the overflow I think is causing the frame limit from before. Is that not the case? I guess we'll see. I'm testing something on the main SkyReels repo to see if it fixes the frame limit problem there too, which would confirm whether or not the VAE is the problem. |
No, like I said before, then it would fail to decode latents from any model, not just SkyReels I2V; the latent space and VAE are the same across all these models. |
I'm trying to test the frame limit of this on an H100. I ran into an issue where the video would be fine most of the way, except at roughly the 193rd frame, where it'd suddenly become a burst of static for a moment before continuing just fine again. I posted this on Hugging Face as well: https://huggingface.co/Kijai/SkyReels-V1-Hunyuan_comfy/discussions/5
I found this on the SkyReels github:
"At maximum VRAM capacity, a 544px × 960px, 289-frame (12 s) video can be produced (using --sequence_batch, taking ~1.5h on one RTX 4090; adding GPUs greatly reduces time)."
This seems to imply that it can handle more than 200 frames if the --sequence_batch argument is passed to SkyReels. Is there a way to do this in ComfyUI nodes?