
Promptus: Representing Real-World Video as Prompts for Video Streaming

This is the official implementation of the paper Promptus: Can Prompts Streaming Replace Video Streaming with Stable Diffusion, which represents real-world videos with a series of "prompts" for delivery and employs Stable Diffusion to generate pixel-aligned videos at the receiver.


The original video   VS   The video regenerated from inverse prompts (ours)

 

*To start, we recommend running the 'Real-time Generation' part with the provided pre-trained prompts, as this is the simplest way to experience Promptus.

*The inversion code will be open-sourced immediately after publication. If you need it before that, please email [email protected] with the following information:

  • Your name, title, affiliation, and advisor (if you are currently a student)
  • Your intended use of the code

I will promptly send you the inversion code. Note that even without it, the code in the current repository is enough to experience real-time generation.

Inversion

(0) Getting Started

Clone this repository, enter the 'Promptus' folder, and create the local environment:

$ conda env create -f environment.yml
$ conda activate promptus

Alternatively, you can configure the environment manually as follows:

$ conda create -n promptus
$ conda activate promptus
$ conda install python=3.10.14
$ conda install pytorch=2.5.1 torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
$ pip install tensorrt==10.7.0
$ pip install tensorrt-cu12-bindings==10.7.0
$ pip install tensorrt-cu12-libs==10.7.0
$ pip install diffusers==0.26.1
$ pip install opencv-python==4.10.0.84
$ pip install polygraphy==0.49.9
$ conda install onnx=1.17.0
$ pip install onnx_graphsurgeon==0.5.2
$ pip install cuda-python==12.6.2.post1
# At this point, the environment is ready to run the real-time generation.
$ pip install torchmetrics==1.3.0.post0
$ pip install huggingface_hub==0.25.0
$ pip install streamlit==1.31.0
$ pip install einops==0.7.0
$ pip install invisible-watermark
$ pip install omegaconf==2.3.0
$ pip install pytorch-lightning==2.0.1
$ pip install kornia==0.6.9
$ pip install open-clip-torch==2.24.0
$ pip install transformers==4.37.2
$ pip install openai-clip==1.0.1
$ pip install scipy==1.12.0
$ pip install accelerate
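
Before moving on, a quick import check can confirm the core packages are visible; this is only a sanity check, not a required step:

# quick sanity check that the core packages import and see the GPU
import torch
import tensorrt
import diffusers

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("tensorrt:", tensorrt.__version__)
print("diffusers:", diffusers.__version__)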

If you only want to experience real-time generation, please skip to the 'Real-time Generation' part. We provide some pre-trained prompts for testing, allowing you to generate directly without inversion.

(1) Stable Diffusion Model

Download the official SD Turbo model 'sd_turbo.safetensors' from here, and place it in the 'checkpoints' folder.
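
Optionally, you can verify the checkpoint is readable by loading it with the safetensors package (installed as a dependency of diffusers):

# optional: verify the downloaded checkpoint loads cleanly
from safetensors.torch import load_file

state_dict = load_file("checkpoints/sd_turbo.safetensors")
print(f"loaded {len(state_dict)} tensors")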

(2) Data preparation

As a demo, we provide two example videos ('sky' and 'uvg') in the 'data' folder, which you can test directly.

You can also use your own videos, as long as they are organized in the same format as the example above.
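
If your source material is a video file, a sketch like the one below can split it into a frame folder with OpenCV. The zero-indexed '<id>.png' naming is an assumption here; mirror the layout of the provided 'sky' and 'uvg' folders for your own data:

# minimal sketch: split a video into a per-frame folder with OpenCV
import os
import cv2

video_path = "my_video.mp4"        # hypothetical input video
out_dir = "data/my_video"
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break                      # end of video
    cv2.imwrite(os.path.join(out_dir, f"{idx}.png"), frame)
    idx += 1
cap.release()
print(f"wrote {idx} frames to {out_dir}")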

(3) Training

$ python inversion.py -frame_path "data/sky" -max_id 140 -rank 8 -interval 10

Here, '-frame_path' is the video folder and '-max_id' is the largest frame index. '-rank' and '-interval' together determine the target bitrate (please refer to the paper for details).
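
For a rough intuition of how rank and interval scale the bitrate, consider the back-of-the-envelope sketch below. The 77x1024 prompt embedding shape (SD Turbo's text-embedding size), fp16 storage, and 30 fps are assumptions made here, not the paper's exact coding scheme:

# back-of-the-envelope sketch of the rank/interval bitrate trade-off; the
# paper's actual quantization (e.g. for the 225 kbps presets) differs
rank, interval = 8, 10
fps = 30.0
bits_per_value = 16                      # fp16 assumed

# a 77x1024 prompt embedding factored into U (77 x rank) and V (rank x 1024)
values_per_prompt = (77 + 1024) * rank
bits_per_prompt = values_per_prompt * bits_per_value
prompts_per_second = fps / interval
print(f"~{bits_per_prompt * prompts_per_second / 1000:.0f} kbps")  # illustrative only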

In this example, the inverse prompts are saved in the 'data/sky/results/rank8_interval10' folder.

(4) Testing

After training, you can generate videos from the inverse prompts. For example:

$ python generation.py -frame_path "data/sky" -rank 8 -interval 10

The generated frames are saved in the 'data/sky/results/rank8_interval10' folder.

*Note that generation via generation.py is intended primarily for debugging and accuracy evaluation; it is slow because it is implemented in PyTorch. For real-time generation, please refer to the 'Real-time Generation' part later on.
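
For accuracy evaluation, the regenerated frames can be compared against the originals, for example with the PSNR metric from torchmetrics (already installed above). The folder layout and matching-by-filename below are assumptions; adapt them to your data:

# accuracy-evaluation sketch: mean PSNR of generated frames vs. originals
import glob
import os
import cv2
import torch
from torchmetrics.image import PeakSignalNoiseRatio

orig_dir = "data/sky"
gen_dir = "data/sky/results/rank8_interval10"
psnr = PeakSignalNoiseRatio(data_range=1.0)

def to_tensor(img):
    # HWC uint8 (BGR) -> 1CHW float in [0, 1]
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).float() / 255.0

for path in sorted(glob.glob(os.path.join(gen_dir, "*.png"))):
    ref = cv2.imread(os.path.join(orig_dir, os.path.basename(path)))
    if ref is None:
        continue                         # no matching original frame
    psnr.update(to_tensor(cv2.imread(path)), to_tensor(ref))

print(f"mean PSNR: {psnr.compute().item():.2f} dB")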

We provide pre-trained prompts (at 225 kbps) for the 'sky' and 'uvg' examples, allowing you to generate directly without training.

Real-time Generation

(0) Getting real-time engines

We release the real-time generation engines.

If your GPU is an NVIDIA GeForce 4090D, the compatible engines can be downloaded directly. Please download the engines from here, and place 'denoise_batch_10.engine' and 'decoder_batch_10.engine' in the 'engine' folder.

If you use a different GPU, Promptus will automatically build the engines for your machine. Please download the 'denoise_batch_10.onnx' and 'decoder_batch_10.onnx' files from here, and place them in the 'engine' folder. In this case, please allow a few minutes during the first run for the engines to be built.
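
For reference, building a TensorRT engine from an ONNX file follows roughly the pattern below. This is a minimal sketch of the standard TensorRT 10 Python API, not Promptus's exact build code:

# minimal sketch: parse an ONNX file and serialize an engine for this GPU
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)      # explicit batch is the default in TRT 10
parser = trt.OnnxParser(network, logger)

with open("engine/denoise_batch_10.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)    # fp16 assumed; pick flags per model
serialized = builder.build_serialized_network(network, config)

with open("engine/denoise_batch_10.engine", "wb") as f:
    f.write(serialized)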

(1) Real-time generation

We provide pre-trained prompts (at 225 kbps) for the 'sky' and 'uvg' examples, allowing you to generate directly without inversion. For example:

$ python realtime_demo.py -prompt_dir "data/sky/results/rank8_interval10" -batch 10 -visualize True

The generated frames are saved in the 'data/sky/results/rank8_interval10' folder.

You can also train prompts for your own videos as described above and use the generation engines for real-time generation.

On a single NVIDIA GeForce 4090D, the generation speed reaches 170 FPS. The following video shows an example:

Real-time Demo: Generation at 170 FPS
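
To measure generation speed on your own hardware, a simple timing harness like the one below can be used; 'generate_batch' is a hypothetical stand-in for one batched engine invocation, not an API from this repository:

# generic FPS timing harness (hypothetical 'generate_batch' callable)
import time
import torch

def measure_fps(generate_batch, batch=10, iters=20):
    generate_batch()                     # warm-up run
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        generate_batch()
    torch.cuda.synchronize()             # wait for all GPU work to finish
    return iters * batch / (time.perf_counter() - start)

# usage (hypothetical): fps = measure_fps(lambda: run_one_engine_batch())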

Integration into browsers and video streaming platforms

Promptus is integrated into a browser-side video streaming platform: Puffer.

Media Server

Within the media server, we replace 'video chunks' with 'inverse prompts'. The inverse prompts are available at multiple bitrate levels and are requested by the browser client.
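
For illustration only, a client-side bitrate choice could look like the following minimal sketch; the levels and the policy are hypothetical, not Puffer's actual ABR logic:

# illustrative sketch: pick the highest prompt bitrate level that fits the
# client's throughput estimate (levels are hypothetical rank/interval presets)
LEVELS_KBPS = [100, 225, 450, 900]

def pick_level(throughput_kbps):
    feasible = [b for b in LEVELS_KBPS if b <= throughput_kbps]
    return max(feasible) if feasible else min(LEVELS_KBPS)

print(pick_level(300.0))                 # -> 225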

Browser Player

At the client, the received prompts are forwarded to the Promptus process, which invokes the real-time engines on the GPU to generate the video. The generated video is played back via the browser's Media Source Extensions (MSE).

The following videos show examples:

Promptus in Browser-side Video Streaming

Promptus under Real-world Network Traces

Acknowledgement

Promptus is built upon these repositories:

  • pytorch-quantization-demo
  • generative-models
  • StreamDiffusion
  • DiffDVR
  • taesd
  • puffer

Citation

@article{wu2024promptus,
  title={Promptus: Can Prompts Streaming Replace Video Streaming with Stable Diffusion},
  author={Wu, Jiangkai and Liu, Liming and Tan, Yunpeng and Hao, Junlin and Zhang, Xinggong},
  journal={arXiv preprint arXiv:2405.20032},
  year={2024}
}
