Seeking Optimization for SkyReels I2V: Addressing Speed and Quality Challenges with Negative Prompts and TeaCache Acceleration #378

Open
ptmaster opened this issue Feb 19, 2025 · 11 comments

Comments

@ptmaster

Today everyone has successfully run I2V, thanks! But the primary issue remains speed. Including negative prompts has doubled the computation time, and while enabling TeaCache acceleration improves speed, it unfortunately compromises visual quality. These two factors, one positive and one negative, make for a difficult trade-off. On average, generating a 97-frame video takes approximately 15 minutes on a single 4090. I kindly request your assistance in optimizing this process at your earliest convenience. Thank you very much! :)

@kijai
Owner

kijai commented Feb 19, 2025

Losing CFG distillation really hurts speed, indeed. The best way to mitigate that is to find out how many steps we really need CFG for, as the nodes already support scheduling it over time. My default workflow only runs CFG for half the steps, speeding things up considerably with little quality loss. However, it's hard to judge both the value of CFG itself and how many steps to run it for; this just requires testing.
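For illustration, here is a minimal sketch of what step-scheduled CFG looks like in a generic sampling loop. This is not the wrapper's actual code; the denoiser, sigma schedule, and Euler update below are placeholder assumptions, but the core idea is the same: skip the negative-prompt pass after a chosen fraction of the steps.

```python
import torch

def sample_with_scheduled_cfg(model, latents, sigmas, cond, uncond,
                              cfg_scale=6.0, cfg_end_ratio=0.5):
    """Run classifier-free guidance only for the first fraction of the steps.

    After cfg_end_ratio * num_steps, the negative-prompt (unconditional)
    pass is skipped, roughly halving the cost of the remaining steps.
    """
    num_steps = len(sigmas) - 1
    cfg_end_step = int(num_steps * cfg_end_ratio)
    for i in range(num_steps):
        sigma = sigmas[i]
        denoised_cond = model(latents, sigma, cond)          # positive-prompt pass
        if i < cfg_end_step and cfg_scale > 1.0:
            denoised_uncond = model(latents, sigma, uncond)  # negative-prompt pass
            denoised = denoised_uncond + cfg_scale * (denoised_cond - denoised_uncond)
        else:
            denoised = denoised_cond                         # CFG disabled: single pass
        # simple Euler update in sigma space, purely illustrative
        d = (latents - denoised) / sigma
        latents = latents + d * (sigmas[i + 1] - sigmas[i])
    return latents

# toy usage with a dummy denoiser, just to show the call shape
dummy = lambda x, sigma, c: x * 0.9
latents = torch.randn(1, 16, 8, 8)
sigmas = torch.linspace(1.0, 0.02, 31)
out = sample_with_scheduled_cfg(dummy, latents, sigmas, cond=None, uncond=None)
```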

@ObiLeek

ObiLeek commented Feb 19, 2025

15 minutes is too much. On my RTX 4090 the times for 97 frames are as follows (including model loading):

attention_mode: sageattn_varlen
teacache: 0.15
model quantization: fp8_e4m3fn
time: 156 seconds

attention_mode: sageattn_varlen
teacache: disabled
model quantization: fp8_e4m3fn
time: 186 seconds

attention_mode: sageattn_varlen
teacache: 0.15
model quantization: disabled
time: 197 seconds

BlockSwap set to fill VRAM to 95%.

I would recommend checking your VRAM fill to see if it is at 97 percent or more. If it is, pull it down to 95 percent or less.
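If you want to check the fill level programmatically rather than eyeballing a monitor, one quick way is torch.cuda.mem_get_info, which reports free and total bytes on the current CUDA device (assumes a CUDA build of PyTorch):

```python
import torch

# free/total VRAM in bytes on the current CUDA device
free, total = torch.cuda.mem_get_info()
used_pct = 100 * (total - free) / total
print(f"VRAM used: {used_pct:.1f}% "
      f"({(total - free) / 1e9:.1f} / {total / 1e9:.1f} GB)")
```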

@wwwffbf

wwwffbf commented Feb 19, 2025

@ObiLeek what is your resolution? So fast!

He is using 544x960, 97 frames, 30 steps, I guess.

@ptmaster
Author

15 minutes is too much. On my RTX 4090 the times for 97 frames are as follows (including model loading):

attention_mode: sageattn_varlen
teacache: 0.15
model quantization: fp8_e4m3fn
time: 156 seconds

attention_mode: sageattn_varlen
teacache: disabled
model quantization: fp8_e4m3fn
time: 186 seconds

attention_mode: sageattn_varlen
teacache: 0.15
model quantization: disabled
time: 197 seconds

BlockSwap set to fill VRAM to 95%.

I would recommend checking VRAM fill to see if it is at 97 percent or more. If it is, then pull it down to 95 or less.

At 960x544? Perhaps you should take a look at the previous post, which documented a method to avoid running out of memory and having to enable blockswap.
#372


@ObiLeek

ObiLeek commented Feb 19, 2025

Resolution is 544x960, 30 steps. All other parameters are the same as in the example workflow.

But auto_cpu_offload is not the same as blockswap. Blockswapping is much faster, at least in my environment.
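For anyone unfamiliar with the distinction: whole-model CPU offload moves the entire model off the GPU between uses, while block swapping keeps only a chosen number of transformer blocks in CPU RAM and streams each one onto the GPU just for its forward pass. A minimal illustrative sketch of that idea follows; it is not the wrapper's actual implementation, and the class and parameter names are made up:

```python
import torch
import torch.nn as nn

class BlockSwapRunner(nn.Module):
    """Keep the last `blocks_to_swap` blocks on CPU and stream each one to
    the GPU only while it runs, trading PCIe transfer time for VRAM headroom."""

    def __init__(self, blocks: nn.ModuleList, blocks_to_swap: int, device="cuda"):
        super().__init__()
        self.blocks = blocks
        self.blocks_to_swap = blocks_to_swap
        self.device = device
        keep = len(blocks) - blocks_to_swap
        for i, blk in enumerate(blocks):
            blk.to(device if i < keep else "cpu")

    def forward(self, x):
        keep = len(self.blocks) - self.blocks_to_swap
        for i, blk in enumerate(self.blocks):
            if i >= keep:
                blk.to(self.device)   # stream this block in
            x = blk(x)
            if i >= keep:
                blk.to("cpu")         # and back out, freeing VRAM for the next one
        return x

# toy usage: 8 small blocks, swap the last 4
device = "cuda" if torch.cuda.is_available() else "cpu"
blocks = nn.ModuleList(nn.Sequential(nn.Linear(64, 64), nn.GELU()) for _ in range(8))
runner = BlockSwapRunner(blocks, blocks_to_swap=4, device=device)
out = runner(torch.randn(2, 64, device=device))
```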

@wwwffbf

wwwffbf commented Feb 19, 2025

Resolution is 544x960, 30 steps. All other parameters are the same as in the example workflow.

But auto_cpu_offload is not the same as blockswap. Blockswapping is much more faster. At least in my environment.

Incredible speed! Could you share your workflow?

@ptmaster
Author

Resolution is 544x960, 30 steps. All other parameters are the same as in the example workflow.

But auto_cpu_offload is not the same as blockswap. Blockswapping is much more faster. At least in my environment.

I understand that purely from a speed perspective your numbers seem credible, and you do own a real 4090. However, we can't overlook image quality and how dynamic the results are. While a flowmatch sampler with TeaCache acceleration can easily reach around three minutes, it's hard to get satisfactory results that way. Could we please avoid focusing solely on raw speed? It feels a bit misleading and could give KJ the wrong impression of what needs updating; I hope you understand my concern. :)

@ObiLeek

ObiLeek commented Feb 19, 2025

Sure, I understand. TeaCache is a bit of a quality lottery with this model, but with the fixed seed I used for the tests, the quality was great. In any case, here is the result without it:

attention_mode: sageattn_varlen
teacache: disabled
model quantization: fp8_e4m3fn
time: 186 seconds
scheduler: SDE-DPMSolverMultistepScheduler
resolution: 544x960
steps: 30
frames: 97

It's not a problem for me to run any test if there is interest. The workflow is identical to hyvideo_skyreel_img2vid_example_01 in the latest version, but with the above parameters and modified blockswapping.
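For context, the teacache value (0.15 above) acts as a change threshold: the idea behind TeaCache is to track how much the transformer's input has changed since the last full evaluation and, while the accumulated relative change stays under the threshold, reuse the previously computed residual instead of running the transformer again. A rough sketch of that caching decision, purely illustrative and not the wrapper's actual code:

```python
import torch

class TeaCacheLikeWrapper:
    """Caching heuristic in the spirit of TeaCache: skip the expensive
    transformer call while the accumulated relative L1 change of its input
    stays below a threshold, reusing the last cached residual instead."""

    def __init__(self, rel_l1_thresh=0.15):
        self.rel_l1_thresh = rel_l1_thresh
        self.prev_input = None
        self.cached_residual = None
        self.accum = 0.0

    def __call__(self, transformer, x):
        if self.prev_input is not None and self.cached_residual is not None:
            rel_change = ((x - self.prev_input).abs().mean()
                          / (self.prev_input.abs().mean() + 1e-8)).item()
            self.accum += rel_change
            if self.accum < self.rel_l1_thresh:
                self.prev_input = x
                return x + self.cached_residual   # cheap path: reuse residual
        out = transformer(x)                      # full path: recompute
        self.cached_residual = out - x
        self.prev_input = x
        self.accum = 0.0
        return out

# toy usage with a dummy transformer over a few "steps"
cache = TeaCacheLikeWrapper(rel_l1_thresh=0.15)
dummy_transformer = lambda x: x * 0.95
x = torch.randn(1, 16, 32)
for _ in range(5):
    x = cache(dummy_transformer, x)
```

In a sketch like this, a higher threshold skips more transformer calls (faster, but riskier for quality), while 0 never takes the cheap path.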

@wwwffbf

wwwffbf commented Feb 19, 2025

blockswapping.

How did you set up the block swap? When using fp8_e4m3fn and Triton, even if I minimize the block swap, or remove it entirely, VRAM usage never goes above 90 percent.

@ganicus

ganicus commented Feb 19, 2025

Where is TeaCache in the workflow? Is it part of the Torch Compile node somewhere?

@pftq

pftq commented Feb 22, 2025

It's a new node called "Hunyuan TeaCache" that connects to the teacache_args input on the Hunyuan Sampler.
