
Local Vision Bridge (OpenWebUI Function)

Give your text-only LLMs the ability to "see" using a secondary local Vision model.

This OpenWebUI Function intercepts image uploads, sends them to a local Vision-Language Model (like Qwen2.5-VL running on llama.cpp), and seamlessly injects detailed text descriptions into the chat context.

This allows you to use massive, high-intelligence text-only models (like a large Mixture-of-Experts model) while still enjoying multimodal capabilities via a smaller, faster dedicated vision model.
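A minimal sketch of that interception flow, assuming a llama.cpp server exposing an OpenAI-compatible /v1/chat/completions endpoint on localhost:8080; the valve names, endpoint, prompt, and framing text here are illustrative placeholders, not the function's actual configuration:

```python
from pydantic import BaseModel, Field
import requests


class Filter:
    class Valves(BaseModel):
        # Hypothetical defaults; point these at your local vision server.
        vision_api_url: str = Field(default="http://localhost:8080/v1/chat/completions")
        vision_model: str = Field(default="qwen2.5-vl")

    def __init__(self):
        self.valves = self.Valves()

    def _describe(self, image_url: str) -> str:
        # Ask the local vision model for a detailed description of one image.
        resp = requests.post(
            self.valves.vision_api_url,
            json={
                "model": self.valves.vision_model,
                "messages": [{
                    "role": "user",
                    "content": [
                        {"type": "text", "text": "Describe this image in detail."},
                        {"type": "image_url", "image_url": {"url": image_url}},
                    ],
                }],
            },
            timeout=120,
        )
        return resp.json()["choices"][0]["message"]["content"]

    def inlet(self, body: dict, __user__: dict = None) -> dict:
        # Walk every message, not just the latest one, so images from
        # earlier turns stay "visible" in multi-turn chats.
        for message in body.get("messages", []):
            content = message.get("content")
            if not isinstance(content, list):
                continue  # plain-text message, nothing to intercept
            texts, descriptions = [], []
            for part in content:
                if part.get("type") == "image_url":
                    descriptions.append(self._describe(part["image_url"]["url"]))
                elif part.get("type") == "text":
                    texts.append(part["text"])
            if descriptions:
                # Frame as tool output so the model knows it is "seeing"
                # the image rather than reading user-typed text.
                framed = "\n".join(
                    f"[System Tool Output: image description]\n{d}"
                    for d in descriptions
                )
                message["content"] = framed + "\n\n" + "\n".join(texts)
        return body
```

Rewriting the message content in place means the downstream text-only model receives an ordinary text prompt and needs no vision support at all.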

Features

  • Zero-Latency Caching: Hashes images so you only pay the "GPU tax" once; subsequent turns in the chat are instant (see the caching sketch after this list).
  • History Aware: Scans the full conversation context to ensure the model doesn't "forget" images in multi-turn chats.
  • Model Agnostic: Works with any text-only model in OpenWebUI.
  • Universal Compatibility: Handles both modern OpenAI-format image uploads and legacy/Ollama formats.
  • System Framing: Injects descriptions as "System Tool Output" so the model knows it is seeing the image, rather than thinking the user typed the description.
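To illustrate the caching idea, here is a minimal sketch, assuming descriptions are keyed by a SHA-256 digest of the image's data URL; the helper name describe_cached and the in-memory dict are hypothetical, and the real function may scope or persist its cache differently:

```python
import hashlib
from typing import Callable

# Hypothetical module-level cache: image digest -> description.
_description_cache: dict[str, str] = {}


def describe_cached(image_data_url: str, describe: Callable[[str], str]) -> str:
    # Hash the full data URL (covering the base64 payload), so identical
    # images map to the same digest and the vision model runs at most
    # once per unique image: you pay the "GPU tax" only on first sight.
    key = hashlib.sha256(image_data_url.encode("utf-8")).hexdigest()
    if key not in _description_cache:
        _description_cache[key] = describe(image_data_url)
    return _description_cache[key]
```

Because the cache is keyed by content rather than by message position, re-scanning the full history on every turn stays cheap: previously seen images resolve to their stored descriptions without touching the GPU.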
