Releases: dannwaneri/vectorize-mcp-worker

V3: Multimodal Search with Vision

20 Jan 03:05

🎉 V3: Multimodal Search is Here!

Your RAG system can now "see": upload images, search by visual content, and extract text from them automatically.

✨ What's New

Multimodal Features:

  • 📸 Image ingestion with Llama 4 Scout vision
  • 🔍 OCR text extraction (1,000+ characters from a single image in testing)
  • 🖼️ Reverse image search
  • 📊 Search screenshots, receipts, diagrams

Performance Optimizations:

  • ⚡ 60-second cache (repeat searches within the window return in ~0ms)
  • 🚀 Batch embeddings (3x faster ingestion)
  • 📄 Pagination support
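
To illustrate the 60-second cache listed above, here is a minimal sketch of a time-to-live (TTL) cache in TypeScript. It is not taken from the repo: the names `TtlCache` and `cachedSearch` are hypothetical, and the actual worker may cache differently (for example, keyed on more than the query string, or stored outside memory). The clock is injectable so the expiry logic can be checked without waiting.

```typescript
// Hypothetical sketch of a 60-second TTL cache; names are illustrative only.
type Entry<V> = { value: V; expiresAt: number };

class TtlCache<V> {
  private entries = new Map<string, Entry<V>>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so tests can control time; defaults to the wall clock.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined if missing or expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // drop stale entries lazily on read
      return undefined;
    }
    return entry.value;
  }

  // Stores a value that expires ttlMs milliseconds from now.
  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Wraps any async search function: serve from cache when possible,
// otherwise run the search and remember the result.
async function cachedSearch<V>(
  cache: TtlCache<V>,
  query: string,
  search: (q: string) => Promise<V>,
): Promise<V> {
  const hit = cache.get(query);
  if (hit !== undefined) return hit;
  const result = await search(query);
  cache.set(query, result);
  return result;
}
```

A cache like this would be created once, e.g. `new TtlCache<string[]>(60_000)`, and passed to `cachedSearch` on each request. Note that in-memory state in a Worker lives per isolate and can be evicted, so hits of this kind are best-effort.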

Real-World Tested:

  • ✅ Financial receipts (Access Bank: 1,043 chars extracted)
  • ✅ Dashboard screenshots (semantic + OCR matching)
  • ✅ Technical diagrams (architecture patterns)

📊 Performance

  • Image ingestion: ~7.9s (vision + OCR + embedding)
  • First search: ~900ms
  • Cached search: 0ms
  • Cost: Still $5/month

🚀 Try It

Live Demo: vectorize-mcp-worker.fpl-test.workers.dev/dashboard

Deploy:

git clone https://github.com/dannwaneri/vectorize-mcp-worker.git
cd vectorize-mcp-worker
npm install
npx wrangler deploy

📖 Read More

Full article: https://medium.com/@danielnwaneri41/i-added-image-search-to-my-5-ai-system-openai-charges-100-f9d51549875f

🙏 Credits

Built with:

  • Cloudflare Workers AI
  • Llama 4 Scout (Meta)
  • BGE embeddings (BAAI)
  • Vectorize + D1

⭐ Star the repo if this helps your project!