Video Composer Agent



Setup

pip install uv
uv sync

or alternatively:

uv add -r requirements.txt

Environment Variables

You will need the environment variables defined in .env.example to run the Video Composer Agent. Copy .env.example to .env and fill in your values; a .env file is all that is necessary.

Note: Do not commit your .env file. It contains secrets (such as your OpenAI and other provider API keys) that would let others access your accounts.
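The agent only needs these variables to be present in the process environment. If you'd rather not add a dependency, a minimal stdlib loader could look like the sketch below (load_env is illustrative — a library such as python-dotenv does the same job):

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Load KEY=VALUE pairs from a .env file into os.environ.

    Existing environment variables are not overwritten, so values
    exported in the shell take precedence over the file.
    """
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"').strip("'")
        os.environ.setdefault(key, value)
```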

Run Agent

To run the main script:

uv run main.py

Feel free to modify the main.py script to add new tools and modify the agent's behavior.
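If you add a tool of your own, keeping the same forward() calling convention as the built-in DocsSearchTool keeps things consistent. The class below is purely illustrative — ClipDurationTool is not part of the repository, and the attributes may need adapting to the agent framework main.py actually uses:

```python
# A hypothetical custom tool following the same forward() convention
# as DocsSearchTool. Names and attributes here are illustrative only.
class ClipDurationTool:
    name = "clip_duration"
    description = "Returns a clip's duration in seconds from frame count and fps."

    def forward(self, frames: int, fps: float = 30.0) -> float:
        # Duration in seconds = number of frames / frames per second.
        if fps <= 0:
            raise ValueError("fps must be positive")
        return frames / fps
```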

Demo

agent_launch.mp4

Documentation Search

The documentation search system provides semantic search capabilities for Diffusion Studio's documentation:

Usage

from src.tools.docs_search import DocsSearchTool

# Initialize search tool
docs_search = DocsSearchTool()

# Basic search
results = docs_search.forward(query="how to add text overlay")

# With reranking for more accurate results
results = docs_search.forward(query="how to add text overlay", rerank_results=True)

# Limit number of results
results = docs_search.forward(query="how to add text overlay", limit=10)

# With filters
results = docs_search.forward(
    query="video transitions",
    filter_conditions={"section": "video-effects"}
)

The search tool:

  • Uses vector embeddings for fast semantic search
  • Supports optional semantic reranking for higher accuracy
  • Allows filtering by documentation sections
  • Auto-embeds documentation from configured URL
  • Maintains embedding cache with hash checking
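The hash checking in the last bullet can be sketched as follows. The function names are hypothetical, not the actual DocsSearchTool internals, but the idea is the same: re-embed a page only when its content hash changes.

```python
import hashlib
import json
from pathlib import Path

def content_hash(text: str) -> str:
    """Stable fingerprint of a documentation page's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def needs_reembedding(doc_id: str, text: str, cache_file: Path) -> bool:
    """Return True if the doc changed since its embedding was cached."""
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    return cache.get(doc_id) != content_hash(text)

def record_embedding(doc_id: str, text: str, cache_file: Path) -> None:
    """Remember the hash of the content that was just embedded."""
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    cache[doc_id] = content_hash(text)
    cache_file.write_text(json.dumps(cache))
```

Embedding is the expensive step, so skipping unchanged pages makes re-indexing the documentation cheap.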

Development

See CONTRIBUTING.md for development setup and guidelines.

ToDos (PRs Welcome)

  • Make the Python agent fully async
  • Add a TypeScript implementation of the agent
  • Stream the browser's console logs back to the agent
  • Add feedback support for more modalities, such as audio
    • Speech-to-text to remove certain sentences
    • Waveform analysis to sync audio to video
    • Moderation analysis to remove certain phrases
  • Add MCP integration

    MCP is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.

  • Add BM25 to DocsSearchTool to enable hybrid search
  • Add support for video understanding models like VideoLLaMA
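For the hybrid-search item above, a self-contained BM25 scorer might start from the sketch below (this is not part of DocsSearchTool today). A hybrid ranker would blend these lexical scores with the existing vector similarities, so exact keyword matches that embeddings miss still rank highly:

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with BM25.

    k1 controls term-frequency saturation; b controls length
    normalization. The defaults are the usual textbook values.
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n
    terms = query.lower().split()
    # Document frequency of each query term.
    df = {t: sum(1 for doc in tokenized if t in doc) for t in terms}
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for t in terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            freq = tf[t]
            score += idf * freq * (k1 + 1) / (
                freq + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores
```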

About

The agentic video editing framework
