This is a simple yet powerful web application built with Streamlit that allows you to generate short videos from text prompts using the state-of-the-art Damo-VILAB Text-to-Video Diffusion Model.
-
Text-to-Video Generation: Convert descriptive text prompts into dynamic video clips.
-
Intuitive UI: Easy-to-use interface powered by Streamlit.
-
Video Playback: Directly view generated videos within the application.
-
Download Option: Download your generated videos for offline use.
-
GPU Accelerated: Leverages CUDA for faster video generation (requires compatible NVIDIA GPU).
Before you begin, ensure you have the following installed:
-
Python 3.8+: Recommended version.
-
pip: Python package installer.
-
NVIDIA GPU (CUDA Compatible): Highly recommended for performance. The model runs on
torch.float16
and is moved to"cuda"
, so a GPU is essential for practical use. Without a GPU, generation will be extremely slow or may not work due to memory constraints.
Follow these steps to set up and run the application locally:
-
Clone the Repository (or save the code):
If your code is part of a larger project, navigate to your project directory. If it's a standalone file, save it as app.py.
-
Create a Virtual Environment (Recommended):
It's good practice to use a virtual environment to manage dependencies.
python -m venv venv venv\Scripts\activate # On Windows
-
Install Dependencies:
Install the required libraries using pip. The torch installation specifically points to a CUDA-enabled version. Adjust cu118 to match your CUDA version if different (e.g., cu121).
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118 pip install diffusers accelerate streamlit
- Note on
torch
: If you do not have a CUDA-compatible GPU or encounter installation issues, you might need to install a CPU-only version of PyTorch, but expect significantly slower performance. Refer to the PyTorch website for specific installation commands based on your system.
- Note on
-
Run the Streamlit Application:
Navigate to the directory containing your main.py file in your terminal and execute:
streamlit run main.py
-
Access the App:
Your web browser will automatically open to the Streamlit application (usually at http://localhost:8501).
-
Generate Videos:
-
Enter your desired text prompt in the input box.
-
Click the "Generate Video" button.
-
The app will display a "Generating video..." message. This process can take several minutes depending on your GPU, internet speed (for model download), and the prompt complexity.
-
Once generated, the video will appear on the page, and a "Download Video" button will be available.
-
-
Performance: Video generation is computationally intensive. A powerful NVIDIA GPU with sufficient VRAM (e.g., 8GB+) is strongly recommended for reasonable generation times.
-
First Run: The first time you run the application, the model will be downloaded, which can take a while depending on your internet connection. Subsequent runs will use the cached model.
-
Video Length: The current setup generates a fixed number of frames per iteration (
num_iterations = 4
). You can experiment with this value in the code, but keep in mind that more frames will increase generation time and VRAM usage.