Video Composer Agent

Setup

pip install uv

uv sync

Alternatively:

uv add -r requirements.txt

Environment Variables

To run Video Composer Agent, set the environment variables defined in .env.example. Vercel Environment Variables are recommended for this, but a local .env file is sufficient.

Note: Do not commit your .env file. It contains secrets that would let others access your OpenAI and authentication provider accounts.
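
Before launching the agent, it can help to confirm that every variable listed in .env.example is actually set. The following is a minimal sketch using only the standard library. It is not part of the repository: the helper names `required_keys`, `missing_keys`, and `check_env` are hypothetical, and it assumes .env.example uses plain `KEY=value` lines.

```python
import os


def required_keys(example_text: str) -> list[str]:
    """Return the variable names declared in .env.example-style text."""
    keys = []
    for line in example_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        keys.append(line.split("=", 1)[0].strip())
    return keys


def missing_keys(example_text: str, env=None) -> list[str]:
    """Return declared names that are unset or empty in the environment."""
    env = os.environ if env is None else env
    return [key for key in required_keys(example_text) if not env.get(key)]


def check_env(example_path: str = ".env.example") -> None:
    """Raise if any variable named in the example file is missing."""
    with open(example_path, encoding="utf-8") as handle:
        missing = missing_keys(handle.read())
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `check_env()` at the top of main.py would fail early with a list of the unset names, rather than partway through a run.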

Run Agent

To run the main script:

uv run main.py

Feel free to modify the main.py script to add new tools and modify the agent's behavior.

Demo

agent_launch.mp4

Documentation Search

The documentation search system provides semantic search capabilities for Diffusion Studio's documentation.

Usage

from src.tools.docs_search import DocsSearchTool

# Initialize search tool
docs_search = DocsSearchTool()

# Basic search
results = docs_search.forward(query="how to add text overlay")

# With reranking for more accurate results
results = docs_search.forward(query="how to add text overlay", rerank_results=True)

# Limit number of results
results = docs_search.forward(query="how to add text overlay", limit=10)

# With filters
results = docs_search.forward(
    query="video transitions",
    filter_conditions={"section": "video-effects"}
)

The search tool:

Uses vector embeddings for fast semantic search
Supports optional semantic reranking for higher accuracy
Allows filtering by documentation sections
Auto-embeds documentation from configured URL
Maintains embedding cache with hash checking
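
The hash-checked embedding cache mentioned in the list above can be illustrated roughly as follows. This is a sketch of the general idea, not the actual DocsSearchTool implementation: `load_or_embed`, the JSON cache layout, and the `embed_fn` callback are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def content_hash(text: str) -> str:
    """Return a stable SHA-256 fingerprint of the documentation text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_or_embed(text: str, cache_path, embed_fn):
    """Return (embeddings, recomputed).

    Embeddings are reused when the stored hash matches the current text;
    otherwise embed_fn (a hypothetical callback) is called and the cache
    file is rewritten.
    """
    cache_path = Path(cache_path)
    digest = content_hash(text)
    if cache_path.exists():
        cached = json.loads(cache_path.read_text(encoding="utf-8"))
        if cached.get("hash") == digest:
            return cached["embeddings"], False
    embeddings = embed_fn(text)
    payload = {"hash": digest, "embeddings": embeddings}
    cache_path.write_text(json.dumps(payload), encoding="utf-8")
    return embeddings, True
```

With this pattern, the documentation is only re-embedded when its content changes, so repeated startups skip the expensive embedding call.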

Development

See CONTRIBUTING.md for development setup and guidelines.

ToDos (PRs Welcome)

Make python agent fully async
Add TS implementation of agent
Stream the console logs of browser back to the agent
Add feedback support for more modalities, such as audio
- Speech to text to remove certain sentences
- Waveform analysis to sync audio to video
- Moderation analysis to remove certain phrases
Add MCP integration

MCP is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.
Add BM25 to DocsSearchTool to enable hybrid search
Add support for video understanding models like VideoLLaMA
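
For anyone picking up the BM25 item above, hybrid search could start from something like the following sketch. It scores documents with the standard Okapi BM25 formula and merges a keyword ranking with a vector ranking using reciprocal rank fusion. Nothing here exists in DocsSearchTool today: the function names, the whitespace tokenizer, and the parameter defaults (`k1=1.5`, `b=0.75`, `k=60`) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one Okapi BM25 score per document for the given query."""
    tokenized = [tokenize(doc) for doc in docs]
    n = len(tokenized)
    if n == 0:
        return []
    avgdl = sum(len(tokens) for tokens in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge several best-first rankings of document ids into one ranking."""
    fused: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)
```

Rank fusion is used here because it needs only the two orderings, so BM25 scores and vector similarities never have to be put on a common scale.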
