pip install uv
uv sync
Or, alternatively, add the dependencies listed in requirements.txt:
uv add -r requirements.txt
To run Video Composer Agent, set the environment variables defined in .env.example. Vercel Environment Variables are recommended for this, but a .env file is sufficient.
Note: Do not commit your .env file. Doing so exposes secrets that would let others control access to your OpenAI and authentication provider accounts.
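As a quick sanity check before running the agent, the sketch below reads the variable names from .env.example and reports any that are unset in the current process. The helper names `parse_env_keys` and `missing_env_vars` are hypothetical and not part of this repository. The check only inspects the process environment, so values stored in a .env file must already have been loaded by whatever mechanism you use.

```python
import os
from pathlib import Path


def parse_env_keys(text: str) -> list[str]:
    """Return variable names from dotenv-style text (KEY=value lines)."""
    keys = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Tolerate an optional leading "export ".
        if line.startswith("export "):
            line = line[len("export "):]
        if "=" in line:
            keys.append(line.split("=", 1)[0].strip())
    return keys


def missing_env_vars(example_path: str = ".env.example") -> list[str]:
    """List variables named in the example file that are unset or empty."""
    path = Path(example_path)
    if not path.exists():
        # Nothing to compare against; treat as "nothing missing".
        return []
    keys = parse_env_keys(path.read_text(encoding="utf-8"))
    return [key for key in keys if not os.environ.get(key)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All variables from .env.example are set.")
```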
To run the main script:
uv run main.py
Feel free to modify the main.py script to add new tools and modify the agent's behavior.
The documentation search system provides semantic search capabilities for Diffusion Studio's documentation:
from src.tools.docs_search import DocsSearchTool
# Initialize search tool
docs_search = DocsSearchTool()
# Basic search
results = docs_search.forward(query="how to add text overlay")
# With reranking for more accurate results
results = docs_search.forward(query="how to add text overlay", rerank_results=True)
# Limit number of results
results = docs_search.forward(query="how to add text overlay", limit=10)
# With filters
results = docs_search.forward(
    query="video transitions",
    filter_conditions={"section": "video-effects"},
)
The search tool:
- Uses vector embeddings for fast semantic search
- Supports optional semantic reranking for higher accuracy
- Allows filtering by documentation sections
- Auto-embeds documentation from configured URL
- Maintains embedding cache with hash checking
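To illustrate the last point, here is a minimal sketch of a hash-checked embedding cache. It is not the actual DocsSearchTool implementation. The function names, the JSON cache layout, and the `embed` callable are assumptions made for illustration. The idea is to re-embed only when the content hash of the documentation changes.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def content_hash(text: str) -> str:
    """Return a SHA-256 hex digest of the documentation text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_or_embed(
    text: str,
    cache_path: Path,
    embed: Callable[[str], list[float]],  # hypothetical embedding function
) -> list[float]:
    """Reuse the cached embedding if the text is unchanged, else recompute."""
    digest = content_hash(text)
    if cache_path.exists():
        cached = json.loads(cache_path.read_text(encoding="utf-8"))
        # A matching hash means the source text has not changed.
        if cached.get("hash") == digest:
            return cached["embedding"]
    # Cache miss or stale cache: embed again and overwrite the cache file.
    embedding = embed(text)
    payload = {"hash": digest, "embedding": embedding}
    cache_path.write_text(json.dumps(payload), encoding="utf-8")
    return embedding
```

With this layout, editing the documentation changes the digest, so the next call recomputes the embedding and overwrites the stale entry.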
See CONTRIBUTING.md for development setup and guidelines.
- Make the Python agent fully async
- Add a TypeScript implementation of the agent
- Stream the browser's console logs back to the agent
- Add support for feedback from more modalities, such as audio:
  - Speech-to-text to remove certain sentences
  - Waveform analysis to sync audio to video
  - Moderation analysis to remove certain phrases
- Add MCP integration
MCP (Model Context Protocol) is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.
- Add BM25 to DocsSearchTool to enable hybrid search
- Add support for video understanding models like VideoLLaMA
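For the BM25 roadmap item, the sketch below shows one way hybrid search could work: score documents with Okapi BM25, then merge the keyword ranking with a vector-search ranking. None of this exists in DocsSearchTool today. The function names, the whitespace tokenizer, and the choice of reciprocal rank fusion as the merge step are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(
    query: str, docs: list[str], k1: float = 1.5, b: float = 0.75
) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    # Naive whitespace tokenization; a real system would use a proper tokenizer.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: how many documents contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Length normalization penalizes longer-than-average documents.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge several rankings of document ids into a single ranking."""
    fused: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


if __name__ == "__main__":
    docs = [
        "add text overlay to a clip",
        "video transitions and effects",
        "export the final video",
    ]
    scores = bm25_scores("text overlay", docs)
    keyword_ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    vector_ranking = [2, 0, 1]  # hypothetical output of the embedding search
    print(reciprocal_rank_fusion([keyword_ranking, vector_ranking]))
```

Rank fusion is used here because it needs only the order of results from each retriever, so BM25 scores and vector similarities never have to be put on a common scale.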