Open WebUI Tools Collection


🚀 A modular collection of tools, function pipes, and filters to supercharge your Open WebUI experience.

Transform your Open WebUI instance into a powerful AI workstation with this comprehensive toolkit. From academic research and image generation to music creation and autonomous agents, this collection provides everything you need to extend your AI capabilities.

✨ What's Inside

This repository contains 20+ specialized tools and functions designed to enhance your Open WebUI experience:

🛠️ Tools

  • arXiv Search - Academic paper discovery (no API key required!)
  • Perplexica Search - Web search using Perplexica API with citations
  • Pexels Media Search - High-quality photos and videos from Pexels API
  • YouTube Search & Embed - Search YouTube and play videos in embedded player
  • Native Image Generator - Direct Open WebUI image generation with Ollama model management
  • Hugging Face Image Generator - AI-powered image creation
  • ComfyUI Image-to-Image (Qwen Edit 2509) - Advanced image editing with multi-image support
  • ComfyUI ACE Step Audio - Advanced music generation
  • ComfyUI Text-to-Video - Generate short videos from text using ComfyUI (default WAN 2.2 workflow)
  • Flux Kontext ComfyUI - Professional image editing

🔄 Function Pipes

  • Planner Agent v2 - Advanced autonomous agent with specialized models, interactive guidance, and comprehensive execution management
  • arXiv Research MCTS - Advanced research with Monte Carlo Tree Search
  • Multi Model Conversations - Multi-agent discussions
  • Resume Analyzer - Professional resume analysis
  • Mopidy Music Controller - Music server management
  • Letta Agent - Autonomous agent integration
  • Google Veo Text-to-Video & Image-to-Video - Generate videos from text or a single image using Google Veo (only one image supported as input)

🔧 Filters

  • Prompt Enhancer - Automatic prompt improvement
  • Semantic Router - Intelligent model selection
  • Full Document - File processing capabilities
  • Clean Thinking Tags - Conversation cleanup
  • OpenRouter WebSearch Citations - Enable web search for OpenRouter models with citation handling

🚀 Quick Start

Option 1: Open WebUI Hub (Recommended)

  1. Visit https://openwebui.com/u/haervwe
  2. Browse the collection and click "Get" for desired tools
  3. Follow the installation prompts in your Open WebUI instance

Option 2: Manual Installation

  1. Copy .py files from tools/, functions/, or filters/ directories
  2. Navigate to Open WebUI Workspace > Tools/Functions/Filters
  3. Paste the code, provide a name and description, then save

🎯 Key Features

  • 🔌 Plug-and-Play: Most tools work out of the box with minimal configuration
  • 🎨 Visual Integration: Seamless integration with ComfyUI workflows
  • 🤖 AI-Powered: Advanced features like MCTS research and autonomous planning
  • 📚 Academic Focus: arXiv integration for research and academic work
  • 🎵 Creative Tools: Music generation and image editing capabilities
  • 🔍 Smart Routing: Intelligent model selection and conversation management
  • 📄 Document Processing: Full document analysis and resume processing

📋 Prerequisites

  • Open WebUI: Version 0.6.0+ recommended
  • Python: 3.8 or higher
  • Optional Dependencies:
    • ComfyUI (for image/music generation tools)
    • Mopidy (for music controller)
    • Various API keys (Hugging Face, Tavily, etc.)

🔧 Configuration

Most tools are designed to work with minimal configuration. Key configuration areas:

  • API Keys: Required for some tools (Hugging Face, Tavily, etc.)
  • ComfyUI Integration: For image and music generation tools
  • Model Selection: Choose appropriate models for your use case
  • Filter Setup: Enable filters in your model configuration

📖 Detailed Documentation

Table of Contents

  1. arXiv Search Tool
  2. Perplexica Search Tool
  3. Pexels Media Search Tool
  4. YouTube Search & Embed Tool
  5. Native Image Generator
  6. Hugging Face Image Generator
  7. Cloudflare Workers AI Image Generator
  8. SearxNG Image Search Tool
  9. ComfyUI Image-to-Image Tool (Qwen Image Edit 2509)
  10. ComfyUI ACE Step Audio Tool
  11. ComfyUI Text-to-Video Tool
  12. Flux Kontext ComfyUI Pipe
  13. Google Veo Text-to-Video & Image-to-Video Pipe
  14. Planner Agent v2
  15. arXiv Research MCTS Pipe
  16. Multi Model Conversations Pipe
  17. Resume Analyzer Pipe
  18. Mopidy Music Controller
  19. Letta Agent Pipe
  20. OpenRouter Image Pipe
  21. OpenRouter WebSearch Citations Filter
  22. Prompt Enhancer Filter
  23. Semantic Router Filter
  24. Full Document Filter
  25. Clean Thinking Tags Filter
  26. Using the Provided ComfyUI Workflows
  27. Installation
  28. Contributing
  29. License
  30. Credits
  31. Support

🧪 Tools

arXiv Search Tool

Description

Search arXiv.org for relevant academic papers on any topic. No API key required!

Configuration

  • No configuration required. Works out of the box.

Usage

  • Example:

    Search for recent papers about "tree of thought"
  • Returns up to 5 most relevant papers, sorted by most recent.

arXiv Search Example Example arXiv search result in Open WebUI


Perplexica Search Tool

Description

Search the web for factual information, current events, or specific topics using the Perplexica API. This tool provides comprehensive search results with citations and sources, making it ideal for research and information gathering. Perplexica is an open-source AI-powered search engine and alternative to Perplexity AI that must be self-hosted locally. It uses advanced language models to provide accurate, contextual answers with proper source attribution.

Configuration

  • BASE_URL (str): Base URL for the Perplexica API (default: http://host.docker.internal:3001)
  • OPTIMIZATION_MODE (str): Search optimization mode - "speed" or "balanced" (default: balanced)
  • CHAT_MODEL (str): Default chat model for search processing (default: llama3.1:latest)
  • EMBEDDING_MODEL (str): Default embedding model for search (default: bge-m3:latest)
  • OLLAMA_BASE_URL (str): Base URL for Ollama API (default: http://host.docker.internal:11434)

Prerequisites: You must have Perplexica installed and running locally at the configured URL. Perplexica is a self-hosted open-source search engine that requires Ollama with the specified chat and embedding models. Follow the installation instructions in the Perplexica repository to set up your local instance.

Usage

  • Example:

    Search for "latest developments in AI safety research 2024"
  • Returns comprehensive search results with proper citations

  • Automatically emits citations for source tracking in Open WebUI

  • Provides both summary and individual source links

Features

  • Web Search Integration: Direct access to current web information
  • Citation Support: Automatic citation generation for Open WebUI
  • Model Flexibility: Configurable chat and embedding models
  • Real-time Status: Progress updates during search execution
  • Source Tracking: Individual source citations with metadata

Pexels Media Search Tool

Description

Search and retrieve high-quality photos and videos from the Pexels API. This tool provides access to Pexels' extensive collection of free stock photos and videos, with comprehensive search capabilities, automatic citation generation, and direct image display in chat. Perfect for finding professional-quality media for presentations, content creation, or creative projects.

Configuration

  • PEXELS_API_KEY (str): Free Pexels API key from https://www.pexels.com/api/ (required)
  • DEFAULT_PER_PAGE (int): Default number of results per search (default: 5, recommended for LLMs)
  • MAX_RESULTS_PER_PAGE (int): Maximum allowed results per page (default: 15, prevents overwhelming LLMs)
  • DEFAULT_ORIENTATION (str): Default photo orientation - "all", "landscape", "portrait", or "square" (default: "all")
  • DEFAULT_SIZE (str): Default minimum photo size - "all", "large" (24MP), "medium" (12MP), or "small" (4MP) (default: "all")

Prerequisites: Get a free API key from Pexels API and configure it in the tool's Valves settings.

Usage

  • Photo Search Example:

    Search for photos of "modern office workspace"
  • Video Search Example:

    Search for videos of "ocean waves at sunset"
  • Curated Photos Example:

    Get curated photos from Pexels

Features

  • Three Search Functions: search_photos, search_videos, and get_curated_photos
  • Direct Image Display: Images are automatically formatted with markdown for immediate display in chat
  • Advanced Filtering: Filter by orientation, size, color, and quality
  • Attribution Support: Automatic citation generation with photographer credits
  • Rate Limit Handling: Built-in error handling for API limits and invalid keys
  • LLM Optimized: Results are limited and formatted to prevent overwhelming language models
  • Real-time Status: Progress updates during search execution
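
For reference, here is a minimal sketch of how the three functions listed above might be called directly. The function names come from this documentation, but the parameter names are assumptions, so check the tool's docstrings for the exact signatures:

    # Hypothetical direct calls to the Pexels tool functions listed above.
    # Parameter names are assumptions; the tool's docstrings are authoritative.
    photos = await tools.search_photos(query="modern office workspace", per_page=5)
    videos = await tools.search_videos(query="ocean waves at sunset", per_page=3)
    curated = await tools.get_curated_photos(per_page=5)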

YouTube Search & Embed Tool

Description

Search YouTube for videos and display them in a beautiful embedded player directly in your Open WebUI chat. This tool provides comprehensive YouTube search capabilities with automatic citation generation, detailed video information, and a custom-styled embedded player. Perfect for finding tutorials, music videos, educational content, or any video content you need.

Configuration

  • YOUTUBE_API_KEY (str): YouTube Data API v3 key from https://console.cloud.google.com/apis/credentials (required)
  • MAX_RESULTS (int): Maximum number of search results to return (default: 5, range: 1-10)
  • SHOW_EMBEDDED_PLAYER (bool): Show embedded YouTube player for the first result (default: True)
  • REGION_CODE (str): Region code for search results, e.g., "US", "GB", "JP" (default: "US")
  • SAFE_SEARCH (str): Safe search filter - "none", "moderate", or "strict" (default: "moderate")

Prerequisites: Get a free YouTube Data API v3 key from Google Cloud Console and enable the YouTube Data API v3 in your project.

Usage

  • Search for Videos:

    Search YouTube for "python tutorial for beginners"
  • Play Specific Video:

    Play YouTube video dQw4w9WgXcQ
  • Search with Custom Results:

    Search YouTube for "cooking recipes" with 10 results

Features

  • Two Main Functions: search_youtube for searching and play_video for playing specific video IDs
  • Embedded Player: Beautiful custom-styled YouTube player embedded directly in chat with responsive design
  • Safe Search: Built-in content filtering options
  • Region Support: Localized search results based on region code
  • Direct Links: Provides YouTube links and "Watch on YouTube" buttons
  • Rate Limit Handling: Proper error handling for API quota limits
  • Real-time Status: Progress updates during search and loading
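
As a quick reference, the two functions named above might be invoked like this (parameter names are assumptions; consult the tool's docstrings for the exact signatures):

    # Hypothetical direct invocations of the two YouTube tool functions.
    results = await tools.search_youtube(query="python tutorial for beginners", max_results=5)
    player = await tools.play_video(video_id="dQw4w9WgXcQ")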

Getting Started

  1. Get a YouTube API Key:

    • Visit Google Cloud Console
    • Create a new project or select an existing one
    • Enable the "YouTube Data API v3"
    • Create credentials (API Key)
    • Copy the API key
  2. Configure the Tool:

    • Open the tool's Valves settings in Open WebUI
    • Paste your API key into the YOUTUBE_API_KEY field
    • Adjust other settings as desired (region, max results, etc.)
  3. Start Searching:

    • Use natural language: "Search YouTube for [topic]"
    • Or use the function directly: search_youtube("topic")

YouTube Embedded Player Example Example of YouTube video embedded in Open WebUI chat


Native Image Generator

Description

Generate images using Open WebUI's native image generation middleware configured in admin settings. This tool leverages whatever image generation backend you have configured (such as AUTOMATIC1111, ComfyUI, or OpenAI DALL-E) through Open WebUI's built-in image generation system, with optional Ollama model management to free up VRAM when needed.

Configuration

  • unload_ollama_models (bool): Whether to unload all Ollama models from VRAM before generating images (default: False)
  • ollama_url (str): Ollama API URL for model management (default: http://host.docker.internal:11434)
  • emit_embeds (bool): Whether to emit HTML image embeds via the embeds event so generated images are displayed inline in the chat (default: True). When False, the tool will skip emitting embeds and only return bare download URLs. If emit_embeds is True but no event emitter is available, images cannot be displayed inline and only the URLs will be returned.

Prerequisites: You must have image generation configured in Open WebUI's admin settings under Settings > Images. This tool works with any image generation backend you have set up (AUTOMATIC1111, ComfyUI, OpenAI, etc.).

Usage

  • Example:

    Generate an image of "a serene mountain landscape at sunset"
  • Uses whatever image generation backend is configured in Open WebUI admin settings

  • Automatically manages model resources if Ollama unloading is enabled

  • Returns markdown-formatted image links for immediate display

Features

  • Native Integration: Uses Open WebUI's native image generation middleware without external dependencies
  • Backend Agnostic: Works with any image generation backend configured in admin settings (AUTOMATIC1111, ComfyUI, OpenAI, etc.)
  • Memory Management: Optional Ollama model unloading to optimize VRAM usage
  • Flexible Model Support: You can prompt the agent to switch the image generation model, provided you give it the model name.
  • Real-time Status: Provides generation progress updates via event emitter
  • Error Handling: Comprehensive error reporting and recovery

Hugging Face Image Generator

Description

Generate high-quality images from text descriptions using Hugging Face's Stable Diffusion models.

Configuration

  • API Key (Required): Obtain a Hugging Face API key from your HuggingFace account and set it in the tool's configuration in Open WebUI.
  • API URL (Optional): Uses Stability AI's SD 3.5 Turbo model as default. Can be customized to use other HF text-to-image model endpoints.

Usage

  • Example:

    Create an image of "beautiful horse running free"
  • Multiple image format options: Square, Landscape, Portrait, etc.

Image Generation Example Example image generated with Hugging Face tool


Cloudflare Workers AI Image Generator

Description

Generate images using Cloudflare Workers AI text-to-image models, including FLUX, Stable Diffusion XL, SDXL Lightning, and DreamShaper LCM. This tool provides model-specific prompt preprocessing, parameter optimization, and direct image display in chat. It supports fast and high-quality image generation with minimal configuration.

Configuration

  • cloudflare_api_token (str): Your Cloudflare API Token (required)
  • cloudflare_account_id (str): Your Cloudflare Account ID (required)
  • default_model (str): Default model to use (e.g., @cf/black-forest-labs/flux-1-schnell)

Prerequisites: Obtain a Cloudflare API Token and Account ID from your Cloudflare dashboard. No additional dependencies beyond requests.

Usage

  • Example:

    # Generate an image with a prompt
    await tools.generate_image(prompt="A futuristic cityscape at sunset, vibrant colors")
  • Returns a markdown-formatted image link for immediate display in chat.

Features

  • Multiple Models: Supports FLUX, SDXL, SDXL Lightning, DreamShaper LCM
  • Prompt Optimization: Automatic prompt enhancement for best results per model
  • Parameter Handling: Smart handling of steps, guidance, negative prompts, and size
  • Direct Image Display: Returns markdown image links for chat
  • Error Handling: Comprehensive error and status reporting
  • Real-time Status: Progress updates via event emitter

SearxNG Image Search Tool

Description

Search and retrieve images from the web using a self-hosted SearxNG instance. This tool provides privacy-respecting, multi-engine image search with direct image display in chat. Ideal for finding diverse images from multiple sources without tracking or ads.

Configuration

  • SEARXNG_ENGINE_API_BASE_URL (str): The base URL for the SearxNG search engine API (default: http://searxng:4000/search)
  • MAX_RESULTS (int): Maximum number of images to return per search (default: 5)

Prerequisites: You must have a running SearxNG instance. See SearxNG documentation for setup instructions.

Usage

  • Example:

    # Search for images of cats
    await tools.search_images(query="cats", max_results=3)
  • Returns a list of markdown-formatted image links for immediate display in chat.

Features

  • Privacy-Respecting: No tracking, ads, or profiling
  • Multi-Engine: Aggregates results from multiple search engines
  • Direct Image Display: Images are formatted for chat display
  • Customizable: Choose engines, result count, and more
  • Error Handling: Handles connection and search errors gracefully

ComfyUI Image-to-Image Tool (Qwen Image Edit 2509)

Description

Edit and transform images using ComfyUI workflows with AI-powered image editing. Features the Qwen Image Edit 2509 model as default, supporting up to 3 images for advanced editing with context, style transfer, and multi-image blending. Also includes Flux Kontext workflow for artistic transformations. Images are automatically extracted from message attachments and rendered as beautiful HTML embeds.

Configuration

  • comfyui_api_url (str): ComfyUI HTTP API endpoint (default: http://localhost:8188)
  • workflow_type (str): Choose your workflow—"Flux_Kontext", "QWen_Edit", or "Custom" (default: QWen_Edit)
  • custom_workflow (Dict): Custom ComfyUI workflow JSON (only used when workflow_type='Custom')
  • max_wait_time (int): Maximum wait time in seconds for job completion (default: 600)
  • unload_ollama_models (bool): Automatically unload Ollama models from VRAM before generating images (default: False)
  • ollama_api_url (str): Ollama API URL for model management (default: http://localhost:11434)
  • return_html_embed (bool): Return a beautiful HTML image embed with comparison view (default: True)

Prerequisites: You must have ComfyUI installed and running with the required models and custom nodes:

  • For Flux Kontext: Flux Dev model, Flux Kontext LoRA, and required ComfyUI nodes
  • For Qwen Edit 2509: Qwen Image Edit 2509 model, Qwen CLIP, VAE, and ETN_LoadImageBase64 custom node
  • See the Extras folder for workflow JSON files: flux_context_owui_api_v1.json and image_qwen_image_edit_2509_api_owui.json

Usage

  • Example:

    # Attach image(s) and provide editing instructions
    "Remove the background"
    "Change car to red"
    "Apply lighting from first image to second image"

Features

  • Qwen Edit 2509 (Default): State-of-the-art image editing with precise control and instruction-following
  • Multi-Image Support: Qwen Edit workflow accepts 1-3 images for advanced editing with context and style transfer
  • Dual Workflow Support: Switch to Flux Kontext for artistic transformations and creative reimagining
  • Automatic Image Handling: Images are extracted from messages and passed to the AI automatically
  • VRAM Management: Optional Ollama model unloading to free GPU memory before generation
  • Beautiful HTML Embeds: Displays results with elegant before/after comparison view
  • OpenWebUI Integration: Automatically uploads generated images to OpenWebUI storage
  • Flexible Workflows: Use built-in workflows or provide your own custom ComfyUI JSON

Workflow Details

Qwen Edit 2509 (Default):

  • Supports 1-3 images with multi-image context and style transfer
  • Lightning-fast 4-step generation
  • Best for: precise edits, object manipulation, style transfer

Flux Kontext (Alternative):

  • Single image input (multi-image support planned)
  • 20-step high-quality generation
  • Best for: artistic transformations, creative reimagining

Custom Workflow:

  • Bring your own ComfyUI workflow JSON
  • Full flexibility for advanced users

Getting Started

  1. Set up ComfyUI:

    • Install ComfyUI
    • Download required models (Flux Dev, Qwen Edit 2509, etc.)
    • Install necessary custom nodes (especially ETN_LoadImageBase64 for Qwen workflow)
  2. Import workflows:

    • Load Extras/flux_context_owui_api_v1.json or Extras/image_qwen_image_edit_2509_api_owui.json in ComfyUI
    • Verify all nodes are recognized (install missing custom nodes if needed)
  3. Configure the tool:

    • Set comfyui_api_url to your ComfyUI server address
    • Choose your preferred workflow type
    • Optionally enable Ollama model unloading if you have limited VRAM
  4. Start editing:

    • Attach an image (or up to 3 for multi-image editing) to your message
    • Describe your desired transformation in natural language
    • Watch the magic happen!

Note for Custom Workflows: If you're using a custom workflow with different capabilities (e.g., single-image only or different prompting requirements), you should modify the edit_image function's docstring in the tool code. The docstring instructs the AI on how to use the tool and what prompting strategies work best. Adjust it to match your workflow's specific capabilities and requirements.

Multi-Image Support Status:

  • Qwen Edit 2509: Full support for 1-3 images (default workflow)
  • Flux Kontext: Single image currently; multi-image support planned for future release
  • Custom workflows: Depends on your workflow implementation

Qwen Image Edit Example Example of Qwen Image Edit 2509 transforming a cyberpunk dolphin into a natural mountain scene


ComfyUI ACE Step Audio Tool

Description

Generate music using the ACE Step AI model via ComfyUI. This tool lets you create songs from tags and lyrics, with full control over the workflow JSON and node numbers. Features a beautiful, transparent custom audio player with play/pause controls, progress tracking, volume adjustment, and a clean scrollable lyrics display. Designed for advanced music generation and can be customized for different genres and moods.

Configuration

  • comfyui_api_url (str): ComfyUI API endpoint (e.g., http://localhost:8188)
  • model_name (str): Model checkpoint to use (default: ACE_STEP/ace_step_v1_3.5b.safetensors)
  • workflow_json (str): Full ACE Step workflow JSON as a string. Use {tags}, {lyrics}, and {model_name} as placeholders (see the abridged fragment after this list).
  • tags_node (str): Node number for the tags input (default: "14")
  • lyrics_node (str): Node number for the lyrics input (default: "14")
  • model_node (str): Node number for the model checkpoint input (default: "40")
  • save_local (bool): Copy the generated song to Open WebUI storage backend (default: True)
  • owui_base_url (str): Your Open WebUI base URL (default: http://localhost:3000)
  • show_player_embed (bool): Show the embedded audio player. If false, only returns download link (default: True)
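
ComfyUI API-format workflows are JSON objects keyed by node number, and the {tags}, {lyrics}, and {model_name} placeholders in workflow_json are substituted into the matching nodes before submission. The fragment below is heavily abridged and purely illustrative: the node numbers mirror the tags_node/lyrics_node/model_node defaults above, but the class_type values and input field names are placeholders rather than the real contents of extras/ace_step_api.json:

    {
      "14": { "class_type": "...", "inputs": { "tags": "{tags}", "lyrics": "{lyrics}" } },
      "40": { "class_type": "...", "inputs": { "ckpt_name": "{model_name}" } }
    }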

Usage

  1. Import the ACE Step workflow:
    • In ComfyUI, go to the workflow import section and load extras/ace_step_api.json.
    • Adjust nodes as needed for your setup.
  2. Configure the tool in Open WebUI:
    • Set the comfyui_api_url to your ComfyUI backend.
    • Paste the workflow JSON (from the file or your own) into workflow_json.
    • Set the correct node numbers if you modified the workflow.
  3. Generate music:
    • Provide a song title, tags, and (optionally) lyrics.
    • The tool will return either an embedded audio player or a download link based on your configuration.
  • Example:

      Generate a song about AI and humanity's friendship

ACE Step Audio Player The sleek, transparent audio player embedded in Open WebUI chat

Features

  • Custom Audio Player: Beautiful, semi-transparent player with blur effects
  • Full Playback Controls: Play/pause, seek, volume control with SVG icons
  • Song Title Display: User-defined song titles prominently shown
  • Scrollable Lyrics: Clean lyrics display with custom scrollbar (max 120px height)
  • Transparent UI: Integrates seamlessly with any Open WebUI theme
  • Toggle Player: Option to show/hide player embed and just return download links
  • Local Storage: Optionally saves songs to Open WebUI cache for persistence

Returns an embedded audio player with download link or just the link, depending on configuration. Advanced users can fully customize the workflow for different genres, moods, or creative experiments.


ComfyUI Text-to-Video Tool

Description

Generate short videos from text prompts using a ComfyUI workflow that defaults to the WAN 2.2 text-to-video models. This tool wraps the ComfyUI HTTP + WebSocket API, waits for the job to complete, extracts the produced video, and (optionally) uploads it to Open WebUI storage so it can be embedded in chat.

The default workflow file included in this repository is extras/video_wan2_2_14B_t2v.json and the tool implementation lives at tools/text_to_video_comfyui_tool.py.
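
For context, the ComfyUI interaction the tool wraps boils down to injecting your prompt into the workflow JSON, queuing it via the /prompt endpoint, and then waiting on the WebSocket for the matching prompt_id. A bare-bones sketch is shown below; the WebSocket wait loop and error handling are omitted, and the "text" input name on node "89" is an assumption about the bundled workflow:

    import json
    import uuid

    import requests

    client_id = str(uuid.uuid4())
    with open("extras/video_wan2_2_14B_t2v.json") as f:
        workflow = json.load(f)

    # Inject the prompt into the configured prompt node (default "89");
    # the "text" field name is an assumption about that node's inputs.
    workflow["89"]["inputs"]["text"] = "A cyberpunk panda skating through neon city streets"

    # Queue the job; the tool then listens on ws://<host>/ws?clientId=<client_id>
    # for execution events before fetching the produced video.
    resp = requests.post("http://localhost:8188/prompt", json={"prompt": workflow, "client_id": client_id})
    print(resp.json()["prompt_id"])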

Configuration

  • comfyui_api_url (str): ComfyUI HTTP API endpoint (default: http://localhost:8188)
  • prompt_node_id (str): Node ID in the workflow that receives the text prompt (default: "89")
  • workflow (json/dict): ComfyUI workflow JSON; if empty the bundled WAN 2.2 workflow is used
  • max_wait_time (int): Maximum seconds to wait for the ComfyUI run (default: 600)
  • unload_ollama_models (bool): Whether to unload Ollama models from VRAM before running (default: False)
  • ollama_api_url (str): Ollama API URL used when unloading models (default: http://localhost:11434)

Usage

  1. Import the workflow
    • In ComfyUI, import the workflow JSON extras/video_wan2_2_14B_t2v.json if you want to inspect or modify nodes.
  2. Install / configure the tool
    • Copy tools/text_to_video_comfyui_tool.py into your Open WebUI tools and set the comfyui_api_url and other valves as needed in the tool settings.
  3. Generate a video
    • Call the tool with a prompt (e.g. "A cyberpunk panda skating through neon streets, 3s shot") and wait for the job to complete. The tool emits progress events and will provide an embedded HTML player or a direct ComfyUI URL.

Example:

Generate a 3 second shot of "a cyberpunk panda skating through neon city streets" using the default WAN 2.2 workflow

Text-to-Video Example Example short video generated via ComfyUI WAN 2.2 workflow (thumbnail).

Features

  • Uses WAN 2.2 text-to-video model workflow by default (video_wan2_2_14B_t2v.json)
  • Submits workflow to ComfyUI and listens on WebSocket for completion
  • Extracts produced video files and optionally uploads them to Open WebUI storage for inline embedding
  • Optional Ollama VRAM unloading to free memory before runs
  • Configurable prompt node and wait timeout

🔄 Function Pipes

Flux Kontext ComfyUI Pipe

Description

A pipe that connects Open WebUI to the Flux Kontext image-to-image editing model through ComfyUI. This integration allows for advanced image editing, style transfers, and other creative transformations using the Flux Kontext workflow. Features an interactive /setup command system for easy configuration by administrators.

Configuration

The pipe includes an interactive setup system that allows administrators to configure all settings through chat commands. Most configuration can be done using the /setup command, which provides an interactive form for easy adjustment of parameters.

Key Configuration Options:

  • COMFYUI_ADDRESS: Address of the running ComfyUI server (default: http://127.0.0.1:8188)
  • COMFYUI_WORKFLOW_JSON: The entire ComfyUI workflow in JSON format
  • PROMPT_NODE_ID: Node ID for text prompt input (default: "6")
  • IMAGE_NODE_ID: Node ID for Base64 image input (default: "196")
  • KSAMPLER_NODE_ID: Node ID for the sampler node (default: "194")
  • ENHANCE_PROMPT: Enable vision model-based prompt enhancement (default: False)
  • VISION_MODEL_ID: Vision model to use for prompt enhancement
  • UNLOAD_OLLAMA_MODELS: Free RAM by unloading Ollama models before generation (default: False)
  • MAX_WAIT_TIME: Maximum wait time for generation in seconds (default: 1200)
  • AUTO_CHECK_MODEL_LOADER: Auto-detect model loader type for .safetensors or .gguf (default: False)

Usage

Initial Setup

  1. Import the workflow:

    • In ComfyUI, import extras/flux_context_owui_api_v1.json as a workflow
    • Adjust node IDs if you modify the workflow
  2. Configure using /setup command (Admin only):

    • Type /setup in the chat to launch the interactive configuration form
    • The form will display all current settings with input fields
    • Adjust any settings you need to change
    • Submit the form to apply and optionally save the configuration
    • Settings can be persisted to a backend config file for permanent storage
  3. Alternative: Manual configuration:

    • Access the pipe's Valves in Open WebUI's admin panel
    • Set COMFYUI_ADDRESS to your ComfyUI backend
    • Paste the workflow JSON into COMFYUI_WORKFLOW_JSON
    • Configure node IDs and other parameters as needed

Using the Pipe

  1. Basic image editing:

    • Upload an image to the chat
    • Provide a text prompt describing the desired changes
    • The pipe processes the image through ComfyUI and returns the edited result
  2. Enhanced prompts (optional):

    • Enable ENHANCE_PROMPT in settings
    • Set a VISION_MODEL_ID (e.g., a multimodal model like LLaVA or GPT-4V)
    • The vision model will analyze the input image and automatically refine your prompt for better results
  3. Memory management:

    • Enable UNLOAD_OLLAMA_MODELS to free RAM before generation
    • The default workflow includes a Clean VRAM node for VRAM management in ComfyUI

Example - Image editing:

Prompt: "Edit this image to look like a medieval fantasy king, preserving facial features."
[Upload image]

Flux Kontext Setup Example of Flux Kontext /setup command interface

Flux Kontext Example Example of Flux Kontext image editing output


Google Veo Text-to-Video & Image-to-Video Pipe

Description

Generate high-quality videos from text prompts or a single image using Google Veo via the Gemini API. This pipe enables advanced video generation capabilities directly from Open WebUI, supporting creative and professional use cases. It supports both text-to-video and image-to-video generation.

Note: Only one image is supported as input at this time. Multi-image input is not available.

Configuration

  • GOOGLE_API_KEY (str): Google API key for Gemini API access (required)
  • MODEL (str): The Veo model to use for video generation (default: "veo-3.1-generate-preview")
  • ENHANCE_PROMPT (bool): Use vision model to enhance prompt (default: False)
  • VISION_MODEL_ID (str): Vision model to be used as prompt enhancer
  • ENHANCER_SYSTEM_PROMPT (str): System prompt for prompt enhancement process
  • MAX_WAIT_TIME (int): Max wait time for video generation in seconds (default: 1200)

Prerequisites:

  • You must have access to the Google Gemini API and a valid API key.
  • Only one image is supported as input for image-to-video generation (Gemini API limitation).

Usage

  • Text-to-Video Example:

    Generate a video of "a futuristic city at sunset with flying cars"
  • Image-to-Video Example:

    Create a video from this image: [Attach image]

Features

  • Text-to-Video: Generate videos from descriptive text prompts
  • Image-to-Video: Animate a single image into a video sequence
  • High Quality: Leverages Google Veo's advanced video generation models
  • Direct Embedding: Returns markdown-formatted video links for display in chat
  • Status Updates: Progress and error reporting during generation

Limitations

  • Only one image is supported as input for image-to-video generation (Gemini API limitation)
  • Multi-image or video editing features are not available

Example Output

Google Veo Example Example of Google Veo video generation output in Open WebUI


Planner Agent v2

Advanced autonomous agent with specialized model support, interactive user guidance, and comprehensive execution management.

This powerful agent autonomously generates and executes multi-step plans to achieve complex goals. It's a generalist agent capable of handling any text-based task, making it ideal for complex requests that would typically require multiple prompts and manual intervention.

🚀 Key Features

  • 🧠 Intelligent Planning: Automatically breaks down goals into actionable steps with dependency mapping
  • 🎨 Specialized Models: Dedicated models for writing (WRITER_MODEL), coding (CODER_MODEL), and tool usage (ACTION_MODEL) with automatic routing
  • 🔍 Quality Control: Real-time output analysis with quality scoring (0.0-1.0) and iterative improvement
  • 🎭 Interactive Error Handling: When actions fail or produce low-quality outputs, the system pauses and prompts you with options: retry with custom guidance/instructions, retry as-is, approve current output despite warnings, or abort the entire plan execution
  • 📊 Live Progress: Real-time Mermaid diagrams with color-coded status indicators
  • 🧩 Template System: Final synthesis using {{action_id}} placeholders for seamless content assembly (see the example after this list)
  • 🔧 Native Tool Integration: Automatically discovers and uses all available Open WebUI tools
  • ⚡ Advanced Features: Lightweight context mode, concurrent execution, cross-action references (@action_id), and comprehensive validation
  • 🔮 MCP(OpenAPI servers) Support: Model Context Protocol integration coming soon for extended tool capabilities
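
To make the template and reference syntax concrete, a hypothetical final-synthesis template is sketched below. The action ids are invented for illustration: each {{action_id}} placeholder is replaced with that action's output at synthesis time, while @action_id references let later actions consume earlier results during execution.

    ## Research Summary
    {{research_ai_news}}

    ## Song Lyrics
    {{write_lyrics}}

    ## Album Cover
    {{generate_cover_image}}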

⚙️ Configuration

Core Models:

  • MODEL: Main planning LLM
  • ACTION_MODEL: Tool-based actions and general tasks
  • WRITER_MODEL: Creative writing and documentation
  • CODER_MODEL: Code generation and development

Temperature Controls:

  • PLANNING_TEMPERATURE (0.8): Planning creativity
  • ACTION_TEMPERATURE (0.7): Tool execution precision
  • WRITER_TEMPERATURE (0.9): Creative writing freedom
  • CODER_TEMPERATURE (0.3): Code generation accuracy
  • ANALYSIS_TEMPERATURE (0.4): Output analysis precision

Execution Settings:

  • MAX_RETRIES (3): Retry attempts per action
  • CONCURRENT_ACTIONS (1): Parallel processing limit
  • ACTION_TIMEOUT (300): Individual action timeout
  • SHOW_ACTION_SUMMARIES (true): Detailed execution summaries
  • AUTOMATIC_TAKS_REQUIREMENT_ENHANCEMENT (false): AI-enhanced requirements

💡 Usage Examples

Multi-Media Content:

Search the latest AI news and create a song based on it; then search for stock images to use as an "album cover" and create a Spotify-style mockup as a plain HTML file with a vanilla JS layout, embedding those assets for interactivity

Planner Agent Example Example of Planner Agent in action using Gemini 2.5 Flash and local music generation

Creative Writing:

Create an epic sci-fi adult novel based on current trends in academic news and social media about AI and other trending topics, with at least 10 chapters and a well-crafted world with rich characters; save each chapter, with an illustration, in an Obsidian folder named after the novel

Planner Agent Example Example of Planner Agent in action using Gemini 2.5 Flash and local image generation, local saving to Obsidian, and web search

Interactive Error Recovery: The Planner Agent features intelligent error handling that engages with users when actions fail or produce suboptimal results. When issues occur, the system pauses execution and presents you with interactive options:

  • Retry with Guidance: Provide custom instructions to help the agent understand what went wrong and how to improve
  • Retry As-Is: Attempt the action again without modifications
  • Approve Output: Accept warning-level outputs despite quality concerns
  • Abort Execution: Stop the entire plan if the issue is critical

Example scenario: If an action fails to generate proper code or retrieve expected data, you'll be prompted to either provide specific guidance ("try using a different approach") or decide whether to continue with the current output.

Planner Agent Example Interactive error recovery dialog showing user options when an action encounters issues during plan execution

Technical Development:

Create a fully-featured Conway's Game of Life SPA with responsive UI, game controls, and pattern presets using vanilla HTML/CSS/JS

Planner Agent Example Example of Planner Agent in action using local Hermes 8B (previous version of the script)


arXiv Research MCTS Pipe

Description

Search arXiv.org for relevant academic papers and iteratively refine a research summary using a Monte Carlo Tree Search (MCTS) approach.

Configuration

  • model: The model ID from your LLM provider
  • tavily_api_key: Required. Obtain your API key from tavily.com
  • max_web_search_results: Number of web search results to fetch per query
  • max_arxiv_results: Number of results to fetch from the arXiv API per query
  • tree_breadth: Number of child nodes explored per MCTS iteration
  • tree_depth: Number of MCTS iterations
  • exploration_weight: Controls balance between exploration and exploitation
  • temperature_decay: Exponentially decreases LLM temperature with tree depth
  • dynamic_temperature_adjustment: Adjusts temperature based on parent node scores
  • maximum_temperature: Initial LLM temperature (default 1.4)
  • minimum_temperature: Final LLM temperature at max tree depth (default 0.5)
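
For intuition, one plausible reading of this temperature schedule (the pipe's actual formula may differ) is an exponential interpolation from maximum_temperature down to minimum_temperature as the tree deepens:

    # Illustrative exponential decay from maximum_temperature to minimum_temperature.
    def temperature_at(depth: int, tree_depth: int, t_max: float = 1.4, t_min: float = 0.5) -> float:
        ratio = depth / max(tree_depth - 1, 1)
        return t_max * (t_min / t_max) ** ratio

    print([round(temperature_at(d, 4), 2) for d in range(4)])  # [1.4, 0.99, 0.7, 0.5]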

Usage

  • Example:

    Do a research summary on "DPO laser LLM training"

arXiv MCTS Example Example of arXiv Research MCTS Pipe output


Multi Model Conversations Pipe

Description

Simulate conversations between multiple language models, each acting as a distinct character. Configure up to 5 participants with unique personas, models, and behaviors. This pipe enables engaging multi-agent discussions, debates, and collaborative problem-solving scenarios.

Configuration

This pipe features User Valves - configurable settings that are unique per user and can be customized for each individual chat session. This means each user can have their own conversation setups, and different chats can have different participant configurations.

Core Settings:

  • number_of_participants: Set the number of participants (1-5)
  • rounds_per_user_message: How many rounds of replies before the user can send another message

Per-Participant Configuration (for each of the 5 participants):

  • participant_[1-5]_model: Model for each participant (e.g., qwen3:14b, gpt-4, claude-3)
  • participant_[1-5]_alias: Display name for each participant (e.g., "Optimist", "Skeptic")
  • participant_[1-5]_system_message: Persona and instructions for each participant
  • all_participants_appended_message: Global instruction appended to each prompt

Model Parameters:

  • temperature, top_k, top_p: Standard model parameters for generation control

Accessing User Valves

To configure the conversation participants and settings:

  1. In the new chat interface: Click the settings icon (gear icon) in the chat input area
  2. A "Valves" panel will open showing all configurable parameters
  3. Adjust the number of participants, models, aliases, and system messages
  4. Each chat can have its own unique configuration
  5. Settings are saved per-chat, allowing you to maintain different conversation setups

Multi Model Conversation Valves Example of Multi Model Conversations User Valves configuration panel

Usage

  • Example:

    Start a conversation between three AI agents about climate change.

Use Cases:

  • Debates: Set up opposing viewpoints (optimist vs. skeptic)
  • Brainstorming: Multiple creative perspectives on a problem
  • Role-playing: Interactive storytelling with multiple characters
  • Analysis: Different analytical approaches to the same topic
  • Expert Panels: Simulate domain experts discussing a complex issue

Resume Analyzer Pipe

Description

Analyze resumes and provide tags, first impressions, adversarial analysis, potential interview questions, and career advice.

Configuration

  • model: The model ID from your LLM provider
  • dataset_path: Local path to the resume dataset CSV file
  • rapidapi_key (optional): For job search functionality
  • web_search: Enable/disable web search for relevant job postings
  • prompt_templates: Customizable templates for all steps

Usage

  1. Requires the Full Document Filter (see below) to work with attached files.
  2. Example:
Analyze this resume:
[Attach resume file]

Resume Analyzer Example 1 Resume Analyzer Example 2 Resume Analyzer Example 3 Screenshots of Resume Analyzer Pipe output


Mopidy Music Controller

Description

Control your Mopidy music server to play songs from the local library or YouTube, manage playlists, and handle various music commands. This pipe provides an intuitive interface for music playback, search, and playlist management through natural language commands.

⚠️ Requirements: This pipe requires Mopidy-Iris to be installed for the player interface. Iris provides a beautiful, feature-rich web interface for controlling Mopidy.

Configuration

  • model: The model ID from your LLM provider
  • mopidy_url: URL for the Mopidy JSON-RPC API endpoint (default: http://localhost:6680/mopidy/rpc) - Iris UI must be installed
  • youtube_api_key: YouTube Data API key for search functionality
  • temperature: Model temperature (default: 0.7)
  • max_search_results: Maximum number of search results to return (default: 5)
  • system_prompt: System prompt for request analysis

Prerequisites

  1. Mopidy Server: Install and configure Mopidy
  2. Mopidy-Iris: Install the Iris web interface:
    pip install Mopidy-Iris
  3. Optional Extensions:
    • Mopidy-Local (for local library)
    • Mopidy-YouTube (for YouTube playback)
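
The pipe talks to Mopidy through its standard JSON-RPC endpoint, so a quick way to confirm your mopidy_url is reachable is to call Mopidy's core.get_version method directly (this uses Mopidy's public API and is independent of the pipe):

    import requests

    # Ask Mopidy for its version over JSON-RPC to confirm the endpoint is reachable.
    resp = requests.post(
        "http://localhost:6680/mopidy/rpc",
        json={"jsonrpc": "2.0", "id": 1, "method": "core.get_version"},
    )
    print(resp.json())  # e.g. {"jsonrpc": "2.0", "id": 1, "result": "3.4.2"}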

Usage

  • Example:

    Play the song "Imagine" by John Lennon
  • Quick text commands: stop, halt, play, start, resume, continue, next, skip, pause

Features

  • Natural Language Control: Use conversational commands to control playback
  • YouTube Integration: Search and play songs directly from YouTube
  • Local Library Support: Access and play songs from your local Mopidy library
  • Playlist Management: Create, modify, and manage playlists
  • Iris UI Integration: Beautiful, professional web interface with full playback controls
  • Seamless Embedding: Iris player embedded directly in Open WebUI chat interface

Mopidy Example Example of Mopidy Music Controller Pipe with Iris UI (v0.7.0)


Letta Agent Pipe

Description

Connect with Letta agents, enabling seamless integration of autonomous agents into Open WebUI conversations. Supports task-specific processing and maintains conversation context while communicating with the agent API.

Configuration

  • agent_id: The ID of the Letta agent to communicate with
  • api_url: Base URL for the Letta agent API (default: http://localhost:8283)
  • api_token: Bearer token for API authentication
  • task_model: Model to use for title/tags generation tasks
  • custom_name: Name of the agent to be displayed
  • timeout: Timeout to wait for Letta agent response in seconds (default: 400)

Usage

  • Example:

    Chat with the built-in long-term memory Letta (MemGPT) agent.

Letta Example Example of Letta Agent Pipe


OpenRouter Image Pipe

Description

An adapter pipe for the OpenRouter API that enables streaming, multi-modal chat completions with built-in websearch and image generation support. This pipe focuses on image generation capabilities and web search integration, with no support for external tools and streaming-only completions. Images are automatically saved to the Open WebUI backend and URLs are emitted for stable access.

Configuration (Valves)

  • API_KEY (str): OpenRouter API key (Bearer token)
  • ALLOWED_MODELS (List[str]): List of allowed model slugs (only these models can be invoked by the pipe)
  • USE_WEBSEARCH (bool): Enable the web search plugin globally or enable per-model by appending :online to the model id
  • USE_IMAGE_EMBEDDING (bool): When True the pipe will emit generated images as HTML <img> embeds; otherwise images are emitted as markdown links

Features

  • Streaming text deltas to the client in real-time (low-latency partial responses)
  • Emits structured reasoning details when available from the model
  • Saves base64 image responses to the Open WebUI files backend and returns stable URLs (with cache-busting timestamps)
  • Built-in websearch integration for enhanced responses
  • Model capability detection (queries OpenRouter models endpoint to find supported modalities and adapts payloads automatically)
  • No support for external tools - focused on core image generation and websearch functionality

Usage

Copy functions/openrouter_image_pipe.py into your Open WebUI Functions and enable it in your workspace. The pipe registers ids in the format openrouter-<model>-pipe (for example: openrouter-openai/gpt-4o-pipe). When invoked it will stream messages/events back to the Open WebUI frontend using the event emitter API.

Example:

   "Explain this image"
   "Web search recent news about Argentina and make an image about it"

Example screenshots

Below are example screenshots showing the pipe in action inside Open WebUI — streaming assistant text, vision-capable model input/output, and generated images.

OpenRouter Image Generation Example: image generation with websearch integration.


OpenRouter WebSearch Citations Filter

Description

Enables web search for OpenRouter models by adding plugins and options to the request payload. This filter provides a UI toggle to use OpenRouter's native websearch with proper citation handling. It processes web search results and emits structured citation events for proper source attribution in Open WebUI.

Configuration (Valves)

  • engine (str): Web search engine - "auto" (automatic selection), "native" (provider's built-in), or "exa" (Exa API)
  • max_results (int): Maximum number of web search results to retrieve (1-10)
  • search_prompt (str): Template for incorporating web search results. Use {date} placeholder for current date.
  • search_context_size (str): Search context size - "low" (minimal), "medium" (moderate), "high" (extensive)

Features

  • UI toggle for enabling web search on OpenRouter models
  • Automatic citation generation with markdown links using domain names
  • Structured citation events for Open WebUI integration
  • Flexible search engine selection (auto, native, or Exa)
  • Configurable search result limits and context size
  • Real-time status updates during search execution

Usage

Copy filters/openrouter_websearch_citations_filter.py into your Open WebUI Filters and enable it in your model configuration. The filter will add web search capabilities to OpenRouter models with proper citation handling.

Example search prompt template:

A web search was conducted on {date}. Incorporate the following web search results into your response.
IMPORTANT: Cite them using markdown links named using the domain of the source.
Example: [nytimes.com](https://nytimes.com/some-page).

The filter processes annotations in the response stream and emits citation events with source URLs, titles, and metadata for each web search result.

🔧 Filters

Prompt Enhancer Filter

Description

Uses an LLM to automatically improve the quality of your prompts before they are sent to the main language model.

Configuration

  • user_customizable_template: Tailor the instructions given to the prompt-enhancing LLM
  • show_status: Displays status updates during the enhancement process
  • show_enhanced_prompt: Outputs the enhanced prompt to the chat window
  • model_id: Select the specific model to use for prompt enhancement

Usage

  • Enable in your model configuration's filters section.
  • Toggle the filter on or off as needed in chat settings.
  • The filter will automatically process each user message before it's sent to the main LLM.

Prompt Enhancer Example

Prompt Enhancer Example

Semantic Router Filter

Description

Acts as an intelligent model router that analyzes the user's message and available models, then automatically selects the most appropriate model, pipe, or preset for the task. Features vision model filtering, dynamic vision re-routing, conversation persistence, knowledge base integration, and robust file handling with Open WebUI's RAG system.

The filter uses an innovative invisible text marker system to persist routing decisions across conversation turns. When a model is selected, the filter emits zero-width unicode characters in the first assistant message. These markers are invisible to the LLM (stripped before processing) but persist in the chat database, ensuring the same model, tools, and knowledge bases are used throughout the entire conversation without requiring metadata or system message manipulation.

The filter automatically detects when images are added to an existing conversation and intelligently re-routes to a vision-capable model if the current model lacks vision support. This enables seamless transitions from text-only conversations to image-based interactions without manual model switching.
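
A minimal sketch of the invisible-marker idea described above, assuming a simple bit-level encoding (the filter's actual marker format may differ):

    from typing import Optional

    # Illustrative only: hide a model id in zero-width characters appended to a message.
    ZW0, ZW1, SENTINEL = "\u200b", "\u200c", "\u2060"

    def embed_marker(text: str, model_id: str) -> str:
        bits = "".join(f"{byte:08b}" for byte in model_id.encode("utf-8"))
        return text + SENTINEL + "".join(ZW1 if b == "1" else ZW0 for b in bits) + SENTINEL

    def extract_marker(text: str) -> Optional[str]:
        start, end = text.find(SENTINEL), text.rfind(SENTINEL)
        if start == -1 or end <= start:
            return None
        bits = "".join("1" if c == ZW1 else "0" for c in text[start + 1:end] if c in (ZW0, ZW1))
        data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
        return data.decode("utf-8", errors="ignore") or None

On later turns the filter finds the marker in chat history, restores the persisted model id, and strips the marker before the message reaches the LLM, as described in "How It Works" below.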

Semantic Router Example

Configuration Valves

  • vision_fallback_model_id: Fallback model for image queries when no vision-capable models are available
  • banned_models: List of model IDs to exclude from routing selection
  • allowed_models: List of model IDs to whitelist (when set, only these models will be considered)
  • router_model_id: Specific model to use for routing decisions (leave empty to use current model)
  • system_prompt: System prompt for the router model (customizable)
  • disable_qwen_thinking: Append /no_think to router prompt for Qwen models
  • show_reasoning: Display routing reasoning in chat
  • status: Show status updates in chat
  • debug: Enable debug logging

Features

  • Conversation Persistence: Routes only on first user message, then automatically maintains the selected model throughout the conversation using invisible text markers
  • Dynamic Vision Re-Routing: Automatically detects when images are added mid-conversation and re-routes to a vision-capable model if the current model lacks vision support
  • Vision Model Filtering: Automatically filters model selection to only vision-capable models when images are detected in the conversation (checks meta.capabilities.vision flag)
  • Smart Fallback: Uses vision_fallback_model_id only when no vision models are available in the filtered list
  • Knowledge Base Integration: Properly handles files from knowledge collections with full RAG retrieval support
  • Tool Preservation: Maintains model-specific tools across conversation turns
  • File Structure Compliance: Passes files in correct INPUT format to Open WebUI's get_sources_from_items() for proper RAG processing
  • Whitelist Support: Use allowed_models to restrict selection to specific models only
  • Cross-Backend Compatibility: Automatically converts payloads between OpenAI and Ollama formats when routing between different backend types
  • Automatic Fallback: Gracefully handles errors by falling back to the original model

Usage

  1. Enable in your model configuration's filters section
  2. Configure vision_fallback_model_id to specify a fallback model for image queries
  3. Optionally set allowed_models to create a whitelist of preferred models, or use banned_models to exclude specific ones
  4. The filter will automatically:
    • Route on the first user message only (analyzes task requirements and available models)
    • Emit an invisible marker that persists the routing decision in chat history
    • Detect and restore routing on subsequent messages in the conversation
    • Re-route dynamically when images are added to a conversation if the current model lacks vision capability
    • Detect images in conversations and filter to vision-capable models when present
    • Preserve the selected model's tools and knowledge bases throughout the conversation
    • Attach relevant files from knowledge collections with proper RAG retrieval
    • Convert payloads between OpenAI and Ollama formats as needed

How It Works

First Message (Routing):

  1. Analyzes user message and available models
  2. Filters to vision-capable models if images are detected
  3. Routes to the best model for the task
  4. Emits invisible unicode marker (e.g., ​‌‍⁠model-id​‌‍⁠) in first assistant message
  5. Preserves model's tools, knowledge bases, and configuration

Subsequent Messages (Persistence):

  1. Detects invisible marker in conversation history
  2. Extracts persisted model ID
  3. Checks if images are present but current model lacks vision capability ⭐ NEW
  4. If vision mismatch detected, triggers fresh routing with vision filter
  5. Otherwise, reconstructs full routing (model + tools + knowledge + metadata)
  6. Strips marker from message content (invisible to LLM)
  7. Continues conversation with same model and configuration

Dynamic Vision Re-Routing Example:

User: "Explain quantum physics"
→ Router selects text model (e.g., llama3.2:latest)

User: "Thanks! Now what's in this picture?" [attaches image]
→ Filter detects: images present + current model lacks vision
→ Automatically triggers re-routing with vision filter
→ Router selects vision model (e.g., llama3.2-vision:latest)
→ Vision model processes image and responds

How Vision Filtering Works

When images are detected in the conversation:

  1. Filter checks all available models for meta.capabilities.vision flag
  2. Only vision-capable models are included in the routing selection
  3. If no vision models are found, uses vision_fallback_model_id as fallback
  4. Router model receives images for contextual routing decisions
  5. If router model doesn't support vision, automatically switches to vision fallback for routing

Semantic Router


Full Document Filter

Description

Allows Open WebUI to process entire attached files (such as resumes or documents) as part of the conversation. Cleans and prepends the file content to the first user message, ensuring the LLM receives the full context.
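
Conceptually this is a standard Open WebUI inlet filter: it rewrites the incoming request body before it reaches the model. The sketch below is a simplified illustration of the idea, with assumed field names rather than the filter's actual implementation:

    # Simplified illustration: prepend attached file text to the first user message.
    # The "files"/"content" field names are assumptions for illustration only.
    def inlet(body: dict) -> dict:
        file_text = "\n\n".join(f.get("content", "") for f in body.get("files", []))
        if file_text:
            for message in body.get("messages", []):
                if message.get("role") == "user":
                    message["content"] = f"{file_text}\n\n{message['content']}"
                    break
        return body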

Configuration

  • priority (int): Priority level for the filter operations (default: 0)
  • max_turns (int): Maximum allowable conversation turns for a user (default: 8)

User Valves

  • max_turns (int): Maximum allowable conversation turns for a user (default: 4)

Usage

  • Enable the filter in your model configuration.

  • When you attach a file in Open WebUI, the filter will automatically clean and inject the file content into your message.

  • No manual configuration is needed for most users.

  • Example:

    Analyze this resume:
    [Attach resume file]

Clean Thinking Tags Filter

Description

Checks if an assistant's message ends with an unclosed or incomplete "thinking" tag. If so, it extracts the unfinished thought and presents it as a user-visible message.

Configuration

  • No configuration required.

Usage

  • Works automatically when enabled.

🎨 Using the Provided ComfyUI Workflows

Importing a Workflow

  1. Open ComfyUI.
  2. Click the "Load Workflow" or "Import" button.
  3. Select the provided JSON file (e.g., ace_step_api.json or flux_context_owui_api_v1.json).
  4. Save or modify as needed.
  5. Use the node numbers in your Open WebUI tool configuration.

Best Practices

  • Always check node numbers after importing, as they may change if you modify the workflow.
  • You can create and share your own workflows by exporting them from ComfyUI.

Why this matters

This approach allows you to leverage state-of-the-art image and music generation/editing models with full control and customization, directly from Open WebUI.


📦 Installation

From Open WebUI Hub (Recommended)

  • Browse https://openwebui.com/u/haervwe, click "Get" on the desired tool, and follow the installation prompts in your Open WebUI instance.

Manual Installation

  • Copy .py files from tools/, functions/, or filters/ into Open WebUI via the Workspace > Tools/Functions/Filters section.
  • Provide a name and description, then save.

🤝 Contributing

Feel free to contribute to this project by:

  1. Forking the repository
  2. Creating your feature branch
  3. Committing your changes
  4. Opening a pull request

📄 License

MIT License


🙏 Credits


🎯 Usage Examples

Academic Research

# Search for recent papers on a topic
Search for recent papers about "large language model training"

# Conduct comprehensive research
Do a research summary on "DPO laser LLM training"

Creative Projects

# Generate images
Create an image of "beautiful horse running free"

# Create music
Generate a song in the style of "funk, pop, soul" with lyrics: "In the shadows where secrets hide..."

# Edit images
Edit this image to look like a medieval fantasy king, preserving facial features

Productivity

# Analyze documents
Analyze this resume: [Attach resume file]

# Plan complex tasks
Create a fully-featured Single Page Application (SPA) for Conway's Game of Life

Multi-Agent Conversations

# Start group discussions
Start a conversation between three AI agents about climate change

🌟 Community & Ecosystem

This collection is part of the broader Open WebUI ecosystem. Here's how you can get involved:

  • 🔗 Open WebUI Hub: Discover more tools at openwebui.com
  • 📚 Documentation: Learn more about Open WebUI at docs.openwebui.com
  • 💡 Ideas: Share your ideas and feature requests
  • 🐛 Bug Reports: Help improve the tools by reporting issues
  • 🌟 Star the Repository: Show your support by starring this repo

💬 Support

For issues, questions, or suggestions, please open an issue on the GitHub repository.
