This is the official implementation of the paper Promptus: Can Prompts Streaming Replace Video Streaming with Stable Diffusion, which represents real-world videos as a series of "prompts" for delivery and employs Stable Diffusion to generate pixel-aligned videos at the receiver.
*To start, it is recommended to run the 'Real-time Generation' part with the provided pre-trained prompts, as it is the simplest way to experience Promptus.
*The inversion code will be open-sourced immediately after publication. If you need it before then, please email [email protected] with the following information:
- Your name, title, affiliation and advisor (if you are currently a student)
- Your intended use of the code
I will promptly send you the inversion code. Note that even without the inversion code, the current repository is enough to experience real-time generation.
Clone this repository, enter the 'Promptus' folder and create the local environment:
$ conda env create -f environment.yml
$ conda activate promptus
Alternatively, you can also configure the environment manually as follows:
$ conda create -n promptus
$ conda activate promptus
$ conda install python=3.10.14
$ conda install pytorch=2.5.1 torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
$ pip install tensorrt==10.7.0
$ pip install tensorrt-cu12-bindings==10.7.0
$ pip install tensorrt-cu12-libs==10.7.0
$ pip install diffusers==0.26.1
$ pip install opencv-python==4.10.0.84
$ pip install polygraphy==0.49.9
$ conda install onnx=1.17.0
$ pip install onnx_graphsurgeon==0.5.2
$ pip install cuda-python==12.6.2.post1
# At this point, the environment is ready to run the real-time generation.
$ pip install torchmetrics==1.3.0.post0
$ pip install huggingface_hub==0.25.0
$ pip install streamlit==1.31.0
$ pip install einops==0.7.0
$ pip install invisible-watermark
$ pip install omegaconf==2.3.0
$ pip install pytorch-lightning==2.0.1
$ pip install kornia==0.6.9
$ pip install open-clip-torch==2.24.0
$ pip install transformers==4.37.2
$ pip install openai-clip==1.0.1
$ pip install scipy==1.12.0
$ pip install accelerate
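As an optional sanity check (our suggestion, not an official setup step), you can verify that PyTorch sees your GPU and that the TensorRT bindings import correctly:
$ python -c "import torch, tensorrt; print(torch.__version__, torch.cuda.is_available(), tensorrt.__version__)"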
If you only want to experience real-time generation, please skip to the 'Real-time Generation' part. We provide some pre-trained prompts for testing, allowing you to generate directly without inversion.
Download the official SD Turbo model 'sd_turbo.safetensors' from here, and place it in the 'checkpoints' folder.
As a demo, we provide two example videos ('sky' and 'uvg') in the 'data' folder, which you can test directly.
You can also use your own videos, as long as they are organized in the same format as the example above.
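For reference, here is a minimal sketch of splitting your own video into frames with the OpenCV package installed above (our example, not a repo script; the output path and filename pattern are assumptions, so mirror the naming used in the provided examples):

# extract_frames.py -- minimal sketch: splits a video into numbered frames
# under the 'data' folder, in the layout the inversion script expects.
import os
import cv2

os.makedirs("data/my_video", exist_ok=True)         # hypothetical video name
cap = cv2.VideoCapture("my_video.mp4")
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(f"data/my_video/{idx}.png", frame)  # assumed filename pattern
    idx += 1
cap.release()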
$ python inversion.py -frame_path "data/sky" -max_id 140 -rank 8 -interval 10
Here, '-frame_path' refers to the video folder and '-max_id' is the largest frame index; '-rank' and '-interval' together determine the target bitrate (please refer to the paper for details). In this example, the inverse prompts are saved in the 'data/sky/results/rank8_interval10' folder.
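As a rough empirical check of the resulting bitrate, you can divide the total size of the saved prompts by the video duration. A minimal sketch (our example, not a repo script; it assumes a 30 FPS source and that the folder contains only the prompt files):

# bitrate_check.py -- rough sketch: estimates the prompt bitrate from file sizes.
import os

prompt_dir = "data/sky/results/rank8_interval10"
fps = 30           # assumption: set this to your source video's frame rate
num_frames = 141   # frames 0..140 from the inversion command above
total_bits = sum(
    os.path.getsize(os.path.join(prompt_dir, f)) * 8
    for f in os.listdir(prompt_dir)
)
print(f"~{total_bits / (num_frames / fps) / 1000:.0f} kbps")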
After training, you can generate videos from the inverse prompts. For example:
$ python generation.py -frame_path "data/sky" -rank 8 -interval 10
The generated frames are saved in the 'data/sky/results/rank8_interval10' folder.
*Note that generation through generation.py is primarily for debugging and accuracy evaluation; its generation speed is slow (due to being implemented in PyTorch). For real-time generation, please refer to the 'Real-time Generation' part later on.
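For a quick accuracy check against the ground-truth frames, here is a minimal sketch using the torchmetrics package installed above (our example, not a repo script; the frame filenames are assumptions, so adapt them to your layout):

# psnr_check.py -- minimal sketch: PSNR between a ground-truth frame
# and its generated counterpart.
import cv2
import torch
from torchmetrics.image import PeakSignalNoiseRatio

psnr = PeakSignalNoiseRatio(data_range=255.0)
gt = torch.from_numpy(cv2.imread("data/sky/0.png")).float()   # assumed filename
gen = torch.from_numpy(cv2.imread("data/sky/results/rank8_interval10/0.png")).float()
print(f"PSNR: {psnr(gen, gt):.2f} dB")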
We provide pre-trained prompts (at 225 kbps) for the 'sky' and 'uvg' examples, allowing you to generate directly without training.
We release the real-time generation engines.
If your GPU is an NVIDIA GeForce RTX 4090D, the compatible engines can be downloaded directly. Please download the engines from here, and place 'denoise_batch_10.engine' and 'decoder_batch_10.engine' in the 'engine' folder.
If you use a different GPU, Promptus will automatically build engines for your machine. Please download the 'denoise_batch_10.onnx' and 'decoder_batch_10.onnx' files from here, and place them in the 'engine' folder.
In this case, please wait a few minutes during the first run for the engines to be built.
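Promptus handles this build automatically, but if you prefer to pre-build the engines yourself, an equivalent manual conversion should be possible with the polygraphy CLI installed above (a sketch with default builder settings; the repository's automatic build may use additional options):
$ polygraphy convert engine/denoise_batch_10.onnx --convert-to trt -o engine/denoise_batch_10.engine
$ polygraphy convert engine/decoder_batch_10.onnx --convert-to trt -o engine/decoder_batch_10.engine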
We provide pre-trained prompts (at 225 kbps) for the 'sky' and 'uvg' examples, allowing you to generate directly without inversion.
For example:
$ python realtime_demo.py -prompt_dir "data/sky/results/rank8_interval10" -batch 10 -visualize True
The generated frames are saved in the 'data/sky/results/rank8_interval10' folder.
You can also train your own videos as described above and use the generation engines for real-time generation.
On a single NVIDIA GeForce RTX 4090D, the generation speed reaches 170 FPS. The following video shows an example:
Promptus is integrated into a browser-side video streaming platform: Puffer. Within the media server, we replace 'video chunks' with 'inverse prompts'.
Inverse prompts have multiple bitrate levels and are requested by the browser client.
At the client, the received prompts are forwarded to the Promptus process. Within the Promptus process, the real-time engine and a GPU are invoked to generate videos. The generated videos are played via the browser's Media Source Extensions (MSE).
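The actual IPC between the browser client and the Promptus process is internal to this integration; purely as an illustration (the socket path, framing, and payloads below are all assumptions, not the real Puffer code), forwarding prompts to a local process could look like:

# forward_prompts.py -- illustrative sketch only (not the actual integration):
# sends length-prefixed prompt payloads to a local Promptus process.
import socket

received_prompts = [b"\x00" * 1024]     # placeholder prompt payloads
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
    sock.connect("/tmp/promptus.sock")  # hypothetical socket path
    for payload in received_prompts:
        # length-prefixed framing (an assumption, not the actual protocol)
        sock.sendall(len(payload).to_bytes(4, "big") + payload)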
The following videos show examples:
Promptus is built on top of these repositories:
@article{wu2024promptus,
  title={Promptus: Can Prompts Streaming Replace Video Streaming with Stable Diffusion},
  author={Wu, Jiangkai and Liu, Liming and Tan, Yunpeng and Hao, Junlin and Zhang, Xinggong},
  journal={arXiv preprint arXiv:2405.20032},
  year={2024}
}