Releases · dannwaneri/vectorize-mcp-worker · GitHub

20 Jan 03:05

dannwaneri

V3: Multimodal Search with Vision

🎉 V3: Multimodal Search is Here!

Your RAG system can now "see": upload images, search by visual content, and extract text automatically.

✨ What's New

Multimodal Features:

📸 Image ingestion with Llama 4 Scout vision
🔍 OCR text extraction (1,000+ characters from a single image in testing)
🖼️ Reverse image search
📊 Search screenshots, receipts, diagrams

Performance Optimizations:

⚡ 60-second cache (0ms cached searches)
🚀 Batch embeddings (3x faster ingestion)
📄 Pagination support
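
To illustrate the batch-embeddings idea above, here is a minimal sketch, not the project's actual code. It assumes a hypothetical `embedBatch` function that accepts many texts and returns one vector per text; the names `chunk`, `embedAll`, and the default batch size of 32 are illustrative choices, not values taken from this release.

```typescript
// Hypothetical embedder signature (an assumption for this sketch):
// takes many texts, returns one vector per text, in the same order.
type EmbedBatch = (texts: string[]) => Promise<number[][]>;

// Split an array into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Embed all texts with one call per batch instead of one call per text.
// The batch size of 32 is an arbitrary illustrative default.
async function embedAll(
  texts: string[],
  embedBatch: EmbedBatch,
  batchSize: number = 32,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of chunk(texts, batchSize)) {
    const result = await embedBatch(batch);
    if (result.length !== batch.length) {
      throw new Error("embedder returned an unexpected number of vectors");
    }
    vectors.push(...result);
  }
  return vectors;
}
```

Fewer round trips to the embedding model is the usual reason batching speeds up ingestion; the actual gain depends on the model and its batch limits.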

Real-World Tested:

✅ Financial receipts (Access Bank: 1,043 chars extracted)
✅ Dashboard screenshots (semantic + OCR matching)
✅ Technical diagrams (architecture patterns)

📊 Performance

Image ingestion: ~7.9s (vision + OCR + embedding)
First search: ~900ms
Cached search: 0ms ✨
Cost: Still $5/month
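
The gap between the first search and a cached search comes from skipping the embedding and vector lookup on a cache hit. The release does not say how the cache is implemented, so the following is only a sketch of one common approach: an in-memory map with a 60-second time-to-live. The `TtlCache` class and its injectable clock are hypothetical names introduced for illustration.

```typescript
// One cached value plus the timestamp (ms) at which it stops being valid.
interface CacheEntry<V> {
  value: V;
  expiresAt: number;
}

// Minimal in-memory TTL cache. The clock is injectable so that expiry
// can be exercised in tests without waiting for real time to pass.
class TtlCache<V> {
  private entries: Map<string, CacheEntry<V>> = new Map();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value, or undefined if it is missing or expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // drop stale entries lazily on read
      return undefined;
    }
    return entry.value;
  }

  // Store a value that expires ttlMs milliseconds from now.
  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Usage: cache search results for 60 seconds, keyed by the query string.
const searchCache = new TtlCache<string[]>(60_000);
searchCache.set("access bank receipt", ["doc-1", "doc-2"]);
```

A repeated query inside the 60-second window is answered from memory, which is why it can be reported as roughly 0ms; after expiry the next search pays the full cost again.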

🚀 Try It

Live Demo: vectorize-mcp-worker.fpl-test.workers.dev/dashboard

Deploy:

git clone https://github.com/dannwaneri/vectorize-mcp-worker.git
cd vectorize-mcp-worker
npm install
wrangler deploy

📖 Read More

Full article: https://medium.com/@danielnwaneri41/i-added-image-search-to-my-5-ai-system-openai-charges-100-f9d51549875f

🙏 Credits

Built with:

Cloudflare Workers AI
Llama 4 Scout (Meta)
BGE embeddings (BAAI)
Vectorize + D1

⭐ Star the repo if this helps your project!
