Added support for video via Gemini and LiteLLM #9117

raveeshbhalla · 2025-12-08T01:19:38Z

Goal

Add first-class video inputs to DSPy so users can run video analysis with Gemini (currently the only LLM with video support in LiteLLM).

Approach

Introduce a dspy.Video custom type that follows DSPy’s existing media-type pattern (Image/Audio), so signatures and adapters can serialize video content into OpenAI-style blocks that LiteLLM maps to Gemini’s Files API.
Support multiple sources in one abstraction: local files (<20MB inline via data URI), remote HTTP(S) URLs, YouTube URLs (Gemini-native), raw bytes, and pre-uploaded Gemini file_ids.
Provide upload helpers that call litellm.create_file with custom_llm_provider="gemini" and GEMINI_API_KEY, handling MIME detection and a 20MB inline guard.
Update docs (API + tutorial) to describe usage, limits, and Gemini model fit; wire navigation and API doc generation to include Video.
Add tests to cover constructors, formatting, signature integration (single/list/optional/mixed media), and save/load flows.
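
For reviewers who want a concrete picture of the MIME detection and 20MB inline guard described above, here is a minimal standalone sketch. It is not the code in dspy/adapters/types/video.py; the names `guess_video_mime`, `encode_inline`, and `MAX_INLINE_BYTES` are hypothetical, and the `video/mp4` fallback is an assumption.

```python
import base64
import mimetypes
from pathlib import Path

# Assumed inline limit, taken from the "<20MB inline via data URI" note above.
MAX_INLINE_BYTES = 20 * 1024 * 1024


def guess_video_mime(path: str, default: str = "video/mp4") -> str:
    """Guess a video MIME type from the file extension (hypothetical helper)."""
    mime, _ = mimetypes.guess_type(path)
    if mime and mime.startswith("video/"):
        return mime
    # Assumption: fall back to a default when the extension is unknown.
    return default


def encode_inline(path: str, max_bytes: int = MAX_INLINE_BYTES) -> str:
    """Return a base64 data URI for a small local video, or raise if too large."""
    file = Path(path)
    size = file.stat().st_size
    if size >= max_bytes:
        # Files at or above the limit should go through an upload path instead.
        raise ValueError(f"{path} is {size} bytes; inline limit is {max_bytes}")
    payload = base64.b64encode(file.read_bytes()).decode("ascii")
    return f"data:{guess_video_mime(path)};base64,{payload}"
```

Checking the size with `stat()` before reading keeps an oversized file from being loaded into memory just to be rejected.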

Files Touched (why)

Core
- dspy/adapters/types/video.py: New Video type, MIME helpers, inline size guard, upload methods (path and in-memory), formatting to {"type": "file", "file": ...} blocks, YouTube handling, repr/str.
- dspy/__init__.py, dspy/adapters/__init__.py, dspy/adapters/types/__init__.py: Export Video publicly.
Docs
- docs/docs/api/primitives/Video.md: API reference for dspy.Video.
- docs/docs/tutorials/video/index.md: Gemini-focused tutorial and recipes.
- docs/docs/learn/programming/signatures.md: Example showing dspy.Video; update special-type list.
- docs/mkdocs.yml: Add tutorial and API nav entries.
- docs/scripts/generate_api_docs.py: Include dspy.Video in generated API.
- Minor API doc member additions for Audio/Embeddings pages to surface relevant members.
Tests
- tests/adapters/test_video.py: Unit tests for helpers, constructors, formatting, immutability, encode helper.
- tests/signatures/test_adapter_video.py: Signature integration (single/list/optional), mixed media with Image, save/load, ChainOfThought.

Test Runs

pytest tests/adapters/test_video.py tests/signatures/test_adapter_video.py
- Result: PASS (84 tests). Only warning: pytest-asyncio default loop scope deprecation.

Reviewer Validation Guide

Re-run tests: pytest tests/adapters/test_video.py tests/signatures/test_adapter_video.py.

Smoke-test Gemini integration (needs GEMINI_API_KEY and LiteLLM with Files API):

export GEMINI_API_KEY=your-key
python - <<'PY'
import dspy
dspy.configure(lm=dspy.LM("gemini/gemini-2.0-flash"))
result = dspy.Predict("video: dspy.Video -> summary: str")(
    video=dspy.Video.from_url("https://storage.googleapis.com/generativeai-downloads/data/quickdraw.mp4")
)
print(result.summary)
PY

Expect a file content block in the request and a text summary back.

Large-file path: dspy.Video.upload_from_path("<path>", mime_type="video/mp4") should return a Video with file_id; run a simple predict using that instance.
Docs spot-check (optional): mkdocs serve and confirm Video tutorial and API pages appear in nav.

Rationale for the `dspy.Video` Approach

Aligns with DSPy’s existing custom-type pipeline so video can flow through signatures, adapters, and content splitting without bespoke code.
LiteLLM/Gemini currently require the Files API; encapsulating upload/formatting in one type avoids users juggling file IDs, MIME types, and inline limits.
Immutable Pydantic model ensures constructed videos don’t mutate mid-call, reducing adapter surprises.
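
To illustrate what "encapsulating upload/formatting in one type" saves the user from, here is a rough sketch of a single function that maps either inline data or a pre-uploaded file reference to the `{"type": "file", "file": ...}` block mentioned in this description. The function name `to_file_block` is hypothetical, and the `file_id` and `format` keys are assumptions about the payload shape rather than a copy of the PR's format() method.

```python
from typing import Optional


def to_file_block(
    *,
    file_data: Optional[str] = None,
    file_id: Optional[str] = None,
    mime_type: Optional[str] = None,
) -> dict:
    """Build an OpenAI-style file content block (hypothetical helper)."""
    # Exactly one source must be given: inline/URL data or an uploaded file id.
    if (file_data is None) == (file_id is None):
        raise ValueError("provide exactly one of file_data or file_id")

    if file_data is not None:
        file_part = {"file_data": file_data}
    else:
        # Assumption: pre-uploaded files are referenced under a "file_id" key.
        file_part = {"file_id": file_id}

    # Assumption: the MIME type travels in an optional "format" field.
    if mime_type:
        file_part["format"] = mime_type

    return {"type": "file", "file": file_part}
```

Leaving `mime_type` unset produces a block without `format`, which is one way to handle sources whose container type is not known up front.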

Perspective and Potential Follow-ups

Validate LiteLLM payload shape: format() uses {"type": "file", "file": {"file_data": <url|data>, ...}} and sets format: "video/mp4" for YouTube. Confirm against live Gemini whether file_uri vs file_data is preferred and whether to omit format for YouTube to avoid implying mp4.
LiteLLM Files API is in beta for create_file; see their docs: https://docs.litellm.ai/docs/providers/google_ai_studio/files. Cross-check with Google’s Gemini video understanding guidance: https://ai.google.dev/gemini-api/docs/video-understanding#upload-video.
Remote upload safety: upload() will download arbitrary URLs with requests.get; consider requiring local path/data URI for upload or making remote fetch opt-in with a size check.
Native LM adaptation: Could override adapt_to_native_lm_feature to flag Gemini video capability if we want more automatic behavior.
Large uploads: upload_from_path reads whole files; streaming/chunking would be safer for very large videos.
Dependency clarity: If requests is absent, remote-upload path will fail; document or guard accordingly.
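
As a starting point for the remote-upload safety and large-upload follow-ups, the sketch below reads any binary stream in chunks and aborts once a size cap is exceeded. It uses only the standard library; the names `iter_chunks` and `read_capped` and the 1 MiB chunk size are hypothetical choices, not part of this PR.

```python
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def read_capped(stream: BinaryIO, max_bytes: int, chunk_size: int = 1024 * 1024) -> bytes:
    """Read a stream fully, raising as soon as more than max_bytes have arrived."""
    buffer = bytearray()
    for chunk in iter_chunks(stream, chunk_size):
        buffer.extend(chunk)
        if len(buffer) > max_bytes:
            # Stop early instead of downloading or reading an unbounded payload.
            raise ValueError(f"stream exceeds the {max_bytes}-byte limit")
    return bytes(buffer)
```

Because it only relies on `read()`, the same function could wrap a local file opened in binary mode or an HTTP response body such as the object returned by `urllib.request.urlopen`.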

raveeshbhalla · 2025-12-08T01:24:01Z

Here's an example notebook with the video files: https://gist.github.com/raveeshbhalla/e64d2bf401f75b01a7af2591b53fface

raveeshbhalla added 2 commits December 7, 2025 16:14

Tested and working for YT and uploaded videos

ed6806b

Fixed YouTube mimeType error

ef19120

raveeshbhalla marked this pull request as ready for review December 8, 2025 16:06

added google cloud storage support

8e3bace
