@raveeshbhalla

Goal

  • Add first-class video inputs to DSPy so users can run video analysis with Gemini (currently the only LLM with video support in LiteLLM).

Approach

  • Introduce a dspy.Video custom type that follows DSPy’s existing media-type pattern (Image/Audio), so signatures and adapters can serialize video content into OpenAI-style blocks that LiteLLM maps to Gemini’s Files API.
  • Support multiple sources in one abstraction: local files (<20MB inline via data URI), remote HTTP(S) URLs, YouTube URLs (Gemini-native), raw bytes, and pre-uploaded Gemini file_ids.
  • Provide upload helpers that call litellm.create_file with custom_llm_provider="gemini" and GEMINI_API_KEY, handling MIME detection and a 20MB inline guard.
  • Update docs (API + tutorial) to describe usage, limits, and Gemini model fit; wire navigation and API doc generation to include Video.
  • Add tests to cover constructors, formatting, signature integration (single/list/optional/mixed media), and save/load flows.
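
To make the MIME detection and the 20MB inline guard described above concrete, here is a minimal standalone sketch using only the standard library. It is not the code in dspy/adapters/types/video.py: the helper names `guess_video_mime` and `to_inline_data_uri` are hypothetical, and reading "20MB" as 20 * 1024 * 1024 bytes is an assumption.

```python
import base64
import mimetypes
from pathlib import Path

# Assumption: "20MB" is read as 20 * 1024 * 1024 bytes; the PR text does not say which.
MAX_INLINE_BYTES = 20 * 1024 * 1024


def guess_video_mime(path: str, default: str = "video/mp4") -> str:
    """Guess a MIME type from the file extension; use `default` when unknown."""
    mime, _ = mimetypes.guess_type(path)
    return mime or default


def to_inline_data_uri(path: str, max_bytes: int = MAX_INLINE_BYTES) -> str:
    """Return a base64 data URI for a small file, or raise if it is too big to inline."""
    size = Path(path).stat().st_size
    if size >= max_bytes:
        # Files at or above the limit should go through the upload path instead.
        raise ValueError(f"{path} is {size} bytes; the inline limit is {max_bytes} bytes")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{guess_video_mime(path)};base64,{encoded}"
```

Checking the size before reading means an oversized file is rejected without loading it into memory.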

Files Touched (why)

  • Core
    • dspy/adapters/types/video.py: New Video type, MIME helpers, inline size guard, upload methods (path and in-memory), formatting to {"type": "file", "file": ...} blocks, YouTube handling, repr/str.
    • dspy/__init__.py, dspy/adapters/__init__.py, dspy/adapters/types/__init__.py: Export Video publicly.
  • Docs
    • docs/docs/api/primitives/Video.md: API reference for dspy.Video.
    • docs/docs/tutorials/video/index.md: Gemini-focused tutorial and recipes.
    • docs/docs/learn/programming/signatures.md: Example showing dspy.Video; update special-type list.
    • docs/mkdocs.yml: Add tutorial and API nav entries.
    • docs/scripts/generate_api_docs.py: Include dspy.Video in generated API.
    • Minor API doc member additions for Audio/Embeddings pages to surface relevant members.
  • Tests
    • tests/adapters/test_video.py: Unit tests for helpers, constructors, formatting, immutability, encode helper.
    • tests/signatures/test_adapter_video.py: Signature integration (single/list/optional), mixed media with Image, save/load, ChainOfThought.
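
For reviewers unfamiliar with the {"type": "file", "file": ...} shape, the sketch below shows roughly what such a content block could look like. The function `make_file_block` is hypothetical and is not the PR's format() method. The `file_data` and `format` keys come from this description; the `file_id` key follows the OpenAI-style file block and is an assumption about how pre-uploaded files are referenced here.

```python
from typing import Any, Optional


def make_file_block(
    file_data: Optional[str] = None,
    file_id: Optional[str] = None,
    mime_type: Optional[str] = None,
) -> dict[str, Any]:
    """Build an OpenAI-style file content block from exactly one source."""
    if (file_data is None) == (file_id is None):
        raise ValueError("provide exactly one of file_data or file_id")
    inner: dict[str, Any] = {}
    if file_data is not None:
        inner["file_data"] = file_data  # URL or data URI
    else:
        inner["file_id"] = file_id  # assumed key for a pre-uploaded file
    if mime_type is not None:
        inner["format"] = mime_type  # optional MIME hint, e.g. "video/mp4"
    return {"type": "file", "file": inner}
```

Leaving `format` out when no MIME type is known avoids implying a container type the caller never stated.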

Test Runs

  • pytest tests/adapters/test_video.py tests/signatures/test_adapter_video.py
    • Result: PASS (84 tests). Only warning: pytest-asyncio default loop scope deprecation.

Reviewer Validation Guide

  1. Re-run tests: pytest tests/adapters/test_video.py tests/signatures/test_adapter_video.py.
  2. Smoke-test Gemini integration (needs GEMINI_API_KEY and LiteLLM with Files API):
    export GEMINI_API_KEY=your-key
    python - <<'PY'
    import dspy
    dspy.configure(lm=dspy.LM("gemini/gemini-2.0-flash"))
    result = dspy.Predict("video: dspy.Video -> summary: str")(
        video=dspy.Video.from_url("https://storage.googleapis.com/generativeai-downloads/data/quickdraw.mp4")
    )
    print(result.summary)
    PY
    Expect a file content block in the request and a text summary back.
  3. Large-file path: dspy.Video.upload_from_path("<path>", mime_type="video/mp4") should return a Video with file_id; run a simple predict using that instance.
  4. Docs spot-check (optional): mkdocs serve and confirm Video tutorial and API pages appear in nav.

Rationale for the dspy.Video Approach

  • Aligns with DSPy’s existing custom-type pipeline so video can flow through signatures, adapters, and content splitting without bespoke code.
  • LiteLLM/Gemini currently require the Files API; encapsulating upload/formatting in one type avoids users juggling file IDs, MIME types, and inline limits.
  • Immutable Pydantic model ensures constructed videos don’t mutate mid-call, reducing adapter surprises.

Perspective and Potential Follow-ups

  • Validate LiteLLM payload shape: format() uses {"type": "file", "file": {"file_data": <url|data>, ...}} and sets format: "video/mp4" for YouTube. Confirm against live Gemini whether file_uri vs file_data is preferred and whether to omit format for YouTube to avoid implying mp4.
  • LiteLLM Files API is in beta for create_file; see their docs: https://docs.litellm.ai/docs/providers/google_ai_studio/files. Cross-check with Google’s Gemini video understanding guidance: https://ai.google.dev/gemini-api/docs/video-understanding#upload-video.
  • Remote upload safety: upload() will download arbitrary URLs with requests.get; consider requiring local path/data URI for upload or making remote fetch opt-in with a size check.
  • Native LM adaptation: Could override adapt_to_native_lm_feature to flag Gemini video capability if we want more automatic behavior.
  • Large uploads: upload_from_path reads whole files; streaming/chunking would be safer for very large videos.
  • Dependency clarity: If requests is absent, remote-upload path will fail; document or guard accordingly.
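
As one possible direction for the remote-fetch size check and the chunked-read follow-ups above, here is a small sketch that reads any binary stream in chunks and stops once a byte cap is exceeded. The name `read_capped` and its parameters are hypothetical; nothing like it exists in this PR.

```python
import io
from typing import BinaryIO


def read_capped(stream: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read `stream` in chunks, raising as soon as more than `max_bytes` arrive."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise ValueError(f"stream exceeded the {max_bytes}-byte limit")
    return bytes(buf)


# Usage with an in-memory stream; a file opened in "rb" mode works the same way.
data = read_capped(io.BytesIO(b"small payload"), max_bytes=1024)
```

The same helper could wrap a streamed HTTP response (for example, the `raw` attribute of a `requests` response opened with `stream=True`), so an oversized download is abandoned early instead of being fully buffered.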

@raveeshbhalla
Author

Here's an example notebook with the video files: https://gist.github.com/raveeshbhalla/e64d2bf401f75b01a7af2591b53fface

@raveeshbhalla raveeshbhalla marked this pull request as ready for review December 8, 2025 16:06