@adampiispanen (Contributor)

Add Google Gemini MCP Server

Adds an MCP server for Google's Gemini models: multimodal AI with a 2M-token context window, supporting text, image, video, audio, and PDF analysis.

Features

  • Text Generation - Gemini 1.5 Pro, Flash, and 2.0 experimental models
  • Multi-Turn Chat - Conversational AI with context retention
  • Vision Analysis - Image understanding and description
  • Video Analysis - Frame-by-frame video content analysis
  • PDF Processing - Extract and analyze PDF documents
  • Function Calling - Tool use and structured outputs
  • Text Embeddings - text-embedding-004 for semantic search
  • Streaming - Real-time response streaming
  • Token Counting - Estimate costs before generation
  • Batch Generation - Parallel processing for efficiency
  • JSON Mode - Structured output generation
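To illustrate the multi-turn chat feature, here is a sketch of how a conversation turn is assembled using the Gemini REST API's `contents` shape (`{role, parts}`). The helper name `buildChatRequest` is illustrative, not part of this server:

```typescript
// Shapes matching the Gemini REST API's request body for chat.
type Part = { text: string };
type Content = { role: "user" | "model"; parts: Part[] };

// Append the next user message to an existing conversation history,
// producing the `contents` array Gemini expects for context retention.
function buildChatRequest(
  history: Content[],
  userMessage: string,
): { contents: Content[] } {
  return {
    contents: [...history, { role: "user", parts: [{ text: userMessage }] }],
  };
}

// Example: a prior exchange plus a follow-up question.
const history: Content[] = [
  { role: "user", parts: [{ text: "What is MCP?" }] },
  { role: "model", parts: [{ text: "The Model Context Protocol is..." }] },
];
const request = buildChatRequest(history, "How does Gemini fit in?");
```

The history is copied rather than mutated, so the caller can retry or branch a conversation without corrupting earlier turns.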

Configuration

  • Authentication: Simple API key (no OAuth)
  • Transport: streamable-http
  • Resources: 512Mi memory, 500m CPU
  • Free Tier: 15 RPM, 1M tokens/minute, 1500 requests/day
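Given the free tier's 15 RPM cap, a client may want to throttle before hitting the server. A minimal sliding-window sketch (the class name and structure are assumptions, derived only from the limits listed above):

```typescript
// Client-side throttle for the free tier's 15 requests/minute limit.
class RequestThrottle {
  private timestamps: number[] = [];

  constructor(
    private readonly maxRequests = 15, // free tier: 15 RPM
    private readonly windowMs = 60_000, // one-minute sliding window
  ) {}

  // Returns true if a request may be sent now, recording it if so.
  tryAcquire(now: number = Date.now()): boolean {
    // Drop timestamps that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length >= this.maxRequests) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

Before each request, check `tryAcquire()` and queue or back off when it returns false; the same pattern extends to the tokens/minute and requests/day limits.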

Use Cases

  • Multimodal AI applications
  • Document analysis and extraction
  • Video content understanding
  • Image captioning and analysis
  • Long-context document processing (2M tokens!)
  • Function calling for agentic workflows
  • Semantic search with embeddings

Models

  • gemini-1.5-pro - Highest quality, 2M context
  • gemini-1.5-flash - Fast and efficient
  • gemini-2.0-flash-exp - Latest experimental features
  • text-embedding-004 - Text embeddings
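For the semantic-search use case, retrieved documents are typically ranked by cosine similarity over embedding vectors. A sketch, assuming the `text-embedding-004` vectors have already been fetched (the short vectors below stand in for real embeddings):

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank documents by similarity to a query embedding, highest first.
function rank(
  query: number[],
  docs: { id: string; embedding: number[] }[],
): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```

Embed the corpus once, store the vectors, then embed each query at search time and call `rank` — only one embedding request per query.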

Key Advantages

  • 2 million token context window (among the largest available)
  • True multimodal support (text, image, video, audio, PDF)
  • Built-in function calling
  • Competitive with GPT-4 and Claude

Validation

✅ All validations pass: npm run validate-servers
