diff --git a/.agents/skills/comfyui/SKILL.md b/.agents/skills/comfyui/SKILL.md new file mode 100644 index 00000000..64950870 --- /dev/null +++ b/.agents/skills/comfyui/SKILL.md @@ -0,0 +1,55 @@ +--- +name: comfyui +description: Use when working with ComfyUI workflows in OpenMontage, including comfyui_image/comfyui_video, custom workflow_json/workflow_path inputs, output_node selection, missing model setup, LoRAs, low-VRAM workflow choices, and community workflow imports. +--- + +# ComfyUI Workflows in OpenMontage + +Use this skill before calling `comfyui_image` or `comfyui_video`, and when converting a community ComfyUI workflow into an OpenMontage tool call. + +## Server Contract + +- ComfyUI must be running before the tool can generate. The default server is `http://localhost:8188`; override it with `COMFYUI_SERVER_URL`. +- Health and hardware status come from `GET /system_stats`. +- Jobs are submitted to `POST /prompt`, completed outputs are read from `GET /history/{prompt_id}`, and artifact bytes are downloaded with `GET /view`. +- Export workflows with ComfyUI's API-format JSON, not the UI layout format. If a downloaded workflow will not submit, re-export it from ComfyUI with API format enabled. + +## Choosing a Workflow + +- Use bundled workflows when the requested operation matches and the local machine has the required models and VRAM. +- Use a custom `workflow_json` or `workflow_path` when the user needs a community recipe, a lower-VRAM model, a different style family, or custom nodes. +- For 8GB-12GB GPUs, prefer lower-footprint workflows such as Wan 2.1 1.3B, LTXV FP8 or quantized workflows, or Wan 2.2 GGUF/quantized community workflows. The bundled Wan 2.2 14B FP8 video workflows are a 16GB-class path, not a provider-wide floor. +- Do not promise that arbitrary custom workflows will fit a machine. The workflow, quantization, resolution, frame count, and offload settings determine the real resource envelope. 
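The API-format requirement under Server Contract can be checked before a workflow is submitted. The following is a minimal sketch, assuming that API-format exports are a flat mapping of node-ID strings to objects carrying `class_type` and `inputs`, while UI layout exports keep the graph under top-level `nodes` and `links` arrays. The helper name `looks_like_api_format` is hypothetical and not part of OpenMontage.

```python
import json


def looks_like_api_format(workflow: dict) -> bool:
    """Heuristic check: every top-level value is a node with class_type and inputs."""
    if not isinstance(workflow, dict) or not workflow:
        return False
    # UI layout exports store the graph under a top-level "nodes" list.
    if isinstance(workflow.get("nodes"), list):
        return False
    return all(
        isinstance(node, dict) and "class_type" in node and "inputs" in node
        for node in workflow.values()
    )


# Tiny stand-ins for the two export styles (illustrative data only).
api_export = json.loads(
    '{"13": {"class_type": "SaveImage", "inputs": {"filename_prefix": "om"}}}'
)
ui_export = {"nodes": [], "links": [], "version": 0.4}
```

If the check fails on a downloaded community workflow, re-exporting it from ComfyUI in API format is the likely fix.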
+ +## Output Node Contract + +- Custom workflows must pass `output_node`. +- Pick the node that writes the artifact, usually `SaveImage`, `SaveVideo`, `VHS_VideoCombine`, or another terminal saver node. +- Pass the node ID as a string, for example `"108"`. Do not pass the class name. +- If a workflow has multiple savers, choose the final deliverable node, not previews or intermediates. + +## Templated vs Fixed Nodes + +- Identify templated nodes before execution: prompt text, seed, dimensions, frame count, source image, sampler settings, and output filename prefix. +- Fixed nodes are model loaders, VAEs, text encoders, LoRA loaders, schedulers, and graph wiring. Do not mutate those unless the workflow author intended that customization. +- For community workflows, inspect each loader node and note every required model or custom node before running. Missing models should be handled through the tool's structured `missing_models` payload when available. + +## Model and LoRA Setup + +- Use ComfyUI Manager or the workflow author's model links when available, and respect model licenses. +- Place models in the folders expected by the loader nodes: diffusion models under `ComfyUI/models/diffusion_models/`, text encoders under `ComfyUI/models/text_encoders/`, VAEs under `ComfyUI/models/vae/`, and LoRAs under `ComfyUI/models/loras/`. +- For LoRA stacks, use `LoraLoader` or `LoraLoaderModelOnly` chains in the workflow. Record each LoRA name plus `strength_model` and `strength_clip` when applicable. +- The current ComfyUI tools do not inject LoRAs into arbitrary graphs. To use LoRAs, provide a workflow that already contains the LoRA loader chain and pass model-stack provenance. + +## Provenance + +- For custom workflows, provide `workflow_name` and `workflow_model` when known. +- Provide `workflow_model_stack` for reproducibility when the workflow is not bundled. 
Include base checkpoint or diffusion model, quantization, text encoder, VAE, LoRAs and strengths, sampler or scheduler, steps, and guidance if the workflow exposes them. +- The tools record the final workflow hash. Treat that hash plus the model stack, seed, dimensions, and prompt as the reproducibility contract. + +## Failure Handling + +- If the server is unavailable, surface the structured setup offer. Starting ComfyUI or setting `COMFYUI_SERVER_URL` is the first fix. +- If models are missing, read `data.missing_models[]`; each item should include the file name, role, destination hint, and download URL when OpenMontage knows it. +- If custom nodes are missing, ask the user to install them through ComfyUI Manager or the workflow author's documented install path, then restart ComfyUI. +- If a long render times out locally, check ComfyUI history before retrying from scratch; the server may still have completed the prompt. diff --git a/.gitignore b/.gitignore index c22b62da..2e617755 100644 --- a/.gitignore +++ b/.gitignore @@ -79,3 +79,4 @@ remotion-composer/public/* remotion-composer/public/demo-props/test-* remotion-composer/public/demo-props/talking-head-* remotion-composer/public/demo-props/caption-burn-* +.venv/ diff --git a/docs/comfyui-adapter-plan.md b/docs/comfyui-adapter-plan.md new file mode 100644 index 00000000..d60fb89d --- /dev/null +++ b/docs/comfyui-adapter-plan.md @@ -0,0 +1,473 @@ +# ComfyUI Provider Adapter for OpenMontage + +**RFC: Native ComfyUI backend for image and video generation** + +--- + +## Motivation + +OpenMontage's local GPU tools (`wan_video`, `hunyuan_video`, `cogvideo_video`, +`local_diffusion`) use HuggingFace `diffusers` directly. This works on x86 + +consumer GPUs but breaks on newer hardware where the PyTorch ecosystem hasn't +caught up: + +| Issue | Detail | +|-------|--------| +| **NVIDIA Blackwell (sm_121)** | No stable PyTorch wheels for aarch64 + CUDA 13.0. Requires NGC containers or nightly builds. 
| +| **Flash Attention** | Does not support sm_121. Must be replaced with SageAttention v3 or native SDPA. | +| **Unified Memory (GB10/DGX Spark)** | `nvidia-smi` cannot report VRAM. Diffusers' memory estimation breaks. | +| **Model format mismatch** | Diffusers expects HF repos. Production deployments use `.safetensors` checkpoints with quantized variants (NVFP4, FP8) that diffusers doesn't natively load. | + +ComfyUI already solves all of these. NVIDIA ships official ComfyUI containers +for DGX Spark. The community has optimized workflows for Blackwell (SageAttention, +NVFP4 quantization, LightX2V 4-step LoRAs). Models like WAN 2.2, FLUX 2, +and ACE-Step run reliably through ComfyUI on hardware where diffusers cannot. + +A ComfyUI adapter gives OpenMontage access to any model ComfyUI supports, +on any hardware ComfyUI runs on, without shipping or maintaining PyTorch builds. + +--- + +## Design + +### Architecture + +``` +OpenMontage Agent + | + v +video_selector / image_selector + | + v +comfyui_video comfyui_image (new tools) + | | + v v +ComfyUI REST API (POST /prompt, GET /history, GET /view) + | + v +GPU (any hardware ComfyUI supports) +``` + +### Integration model + +Two new `BaseTool` subclasses plus one shared client library: + +``` +tools/ + _comfyui/ + __init__.py + client.py # Shared ComfyUI REST client + workflows/ # Bundled workflow templates + flux2-txt2img.json + wan22-t2v-4step.json + wan22-i2v-4step.json + graphics/ + comfyui_image.py # capability="image_generation", provider="comfyui" + video/ + comfyui_video.py # capability="video_generation", provider="comfyui" +``` + +### Registry and selector integration + +The tools declare `capability` and `provider` as class attributes. +`tool_registry.discover()` picks them up automatically via `pkgutil.walk_packages`. +`video_selector` and `image_selector` find them via `registry.get_by_capability()`. 
+The only selector change is operation-specific filtering in `video_selector` so +ComfyUI is not selected for `image_to_video` when only the text-to-video bundled +models are installed, or vice versa. + +--- + +## Shared Client: `tools/_comfyui/client.py` + +Encapsulates the ComfyUI REST API pattern proven in production (used by the +Bard project's Airflow DAGs for thousands of generations). + +The endpoint contract was checked against current ComfyUI server documentation +and the April 2026 third-party developer guide: + +- Official routes: `POST /prompt`, `GET /history/{prompt_id}`, `GET /view`, + `POST /upload/image`, `GET /object_info/{node_class}`, `GET /models/{folder}`, + `GET /system_stats`, and `WS /ws`. +- `/prompt` accepts the workflow in API format under the `prompt` key and + returns `prompt_id`, `number`, and `node_errors` on validation. +- `/history/{prompt_id}` returns completed node outputs; artifact records include + `filename`, `subfolder`, and `type`. The client passes all three through to + `/view` instead of assuming `type=output`. +- Workflows must be exported in ComfyUI API format, not the regular visual + canvas workflow format. + +References: + +- https://docs.comfy.org/development/comfyui-server/comms_routes +- https://www.runflow.io/blog/comfyui-api-developer-guide + +```python +import os +from pathlib import Path + + +class ComfyUIClient: + """Thin client for the ComfyUI REST API.""" + + def __init__(self, server_url: str | None = None): + self.server_url = server_url or os.environ.get( + "COMFYUI_SERVER_URL", "http://localhost:8188" + ) + + def is_available(self) -> bool: + """Health check -- can we reach the server?""" + + def submit(self, workflow: dict) -> str: + """POST /prompt. Returns prompt_id. Raises on node_errors.""" + + def poll(self, prompt_id: str, timeout: int = 600, interval: int = 5) -> dict: + """GET /history/{prompt_id} until complete. 
Returns outputs dict.""" + + def download(self, filename: str, subfolder: str, dest: Path, + folder_type: str = "output") -> Path: + """GET /view with filename, subfolder, and type from the history record. Writes bytes to dest.""" + + def upload_image(self, local_path: Path, name: str) -> str: + """POST /upload/image. Returns server-side filename for LoadImage nodes.""" + + def generate(self, workflow: dict, output_node: str, dest: Path, + timeout: int = 600) -> Path: + """Full cycle: submit -> poll -> download. Returns artifact path.""" +``` + +**Why a shared client?** The submit/poll/download cycle is identical across +image and video generation. The only differences are: which workflow template, +which nodes to customize, and which output node to read from. + +--- + +## Tool Specifications + +### `comfyui_image` -- Image Generation + +| Field | Value | +|-------|-------| +| capability | `image_generation` | +| provider | `comfyui` | +| runtime | `LOCAL_GPU` | +| tier | `GENERATE` | +| stability | `EXPERIMENTAL` | +| capabilities | `text_to_image`, `image_to_image` | +| dependencies | (runtime: ComfyUI server reachable) | +| fallback_tools | `flux_image`, `local_diffusion`, `openai_image` | +| cost | `$0.00` (local compute) | + +**Bundled workflow:** `flux2-txt2img.json` + +Loads FLUX 2 Dev (NVFP4) with Mistral text encoder. 
Templated nodes: + +| Node | Class | Templated field | +|------|-------|-----------------| +| 4 | CLIPTextEncode | `text` (prompt) | +| 6 | EmptyFlux2LatentImage | `width`, `height` | +| 7 | RandomNoise | `noise_seed` | +| 10 | Flux2Scheduler | `steps` | +| 13 | SaveImage | `filename_prefix` | + +**Input schema:** + +```yaml +prompt: string # required +width: integer # default 1024 +height: integer # default 1024 +steps: integer # default 20 +seed: integer # optional (random if omitted) +guidance: number # default 3.5 +output_path: string # where to save the image +workflow_json: string # optional custom workflow; requires output_node +workflow_path: string # optional path to workflow JSON; requires output_node +output_node: string # required for custom workflows +workflow_name: string # optional custom workflow provenance label +workflow_model: string # optional custom model/provenance label +workflow_model_stack: [] # optional custom dependency provenance +``` + +**get_status():** Pings ComfyUI server and checks bundled FLUX model names via +`/object_info`. Returns `AVAILABLE` when the server and bundled model set are +ready, `DEGRADED` when the server is reachable but bundled models are missing, +and `UNAVAILABLE` when the server cannot be reached. + +**execute() flow:** +1. Deep-copy workflow template +2. Inject prompt, seed, dimensions, steps into templated nodes +3. `client.generate(workflow, output_node="13", dest=output_path)` +4. Return `ToolResult` with artifact path, seed, model info + +For custom workflows, the caller must provide `workflow_json` or `workflow_path` +plus `output_node`. The tool does not assume bundled node IDs for custom +workflows, and provenance is reported as user-supplied unless the caller provides +`workflow_model`. Results also include the final workflow SHA-256 hash and, for +bundled workflows, the known model stack. 
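The bundled `execute()` flow described above (deep-copy the template, inject values into templated nodes, record a workflow hash) can be sketched as follows. This is an illustration only: `patch_workflow` and `workflow_hash` are standalone stand-ins written for this example, the patch shape `{node_id: {input_name: value}}` mirrors the contract tests, and hashing a sorted-key JSON serialization is an assumption about how a stable SHA-256 could be produced, not a description of the shipped code.

```python
import copy
import hashlib
import json


def patch_workflow(workflow: dict, patches: dict) -> dict:
    """Return a deep copy of the workflow with patches applied to node inputs."""
    patched = copy.deepcopy(workflow)  # never mutate the bundled template
    for node_id, inputs in patches.items():
        if node_id not in patched:
            raise KeyError(f"node {node_id!r} not found in workflow")
        patched[node_id].setdefault("inputs", {}).update(inputs)
    return patched


def workflow_hash(workflow: dict) -> str:
    """SHA-256 over a canonical (sorted-key, compact) JSON serialization."""
    canonical = json.dumps(workflow, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Node IDs follow the FLUX 2 table above; this template is a tiny stand-in.
template = {
    "4": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}},
    "7": {"class_type": "RandomNoise", "inputs": {"noise_seed": 0}},
}
final = patch_workflow(
    template, {"4": {"text": "a red fox"}, "7": {"noise_seed": 123}}
)
```

Hashing the final patched workflow, rather than the template, means two runs with different prompts or seeds produce different hashes, which is what a reproducibility record needs.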
+ +--- + +### `comfyui_video` -- Video Generation + +| Field | Value | +|-------|-------| +| capability | `video_generation` | +| provider | `comfyui` | +| runtime | `LOCAL_GPU` | +| tier | `GENERATE` | +| stability | `EXPERIMENTAL` | +| capabilities | `text_to_video`, `image_to_video` | +| dependencies | (runtime: ComfyUI server reachable) | +| fallback_tools | `wan_video`, `hunyuan_video`, `ltx_video_local` | +| cost | `$0.00` (local compute) | + +**Bundled workflows:** + +1. **`wan22-i2v-4step.json`** -- Image-to-video (WAN 2.2 14B, fp8, 4-step LightX2V LoRA) +2. **`wan22-t2v-4step.json`** -- Text-to-video (WAN 2.2 14B, fp8, 4-step LightX2V LoRA) + +These bundled WAN 2.2 14B FP8 workflows are the high-quality profile and +recommend roughly 16GB VRAM. That is not a ComfyUI-wide requirement. The +`comfyui_video` tool's top-level `resource_profile` is an 8GB provider floor so +preflight does not imply ComfyUI itself requires 16GB. Low-VRAM users should use +custom workflows such as Wan 2.1 1.3B, LTX-Video/LTXV FP8 or quantized graphs, +or Wan 2.2 GGUF/quantized community workflows, with shorter frame counts and +lower resolutions as needed. 
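For a low-VRAM custom workflow, the tool inputs might be assembled as in the sketch below. The helper `build_custom_video_inputs`, the workflow contents, and the labels are hypothetical; only the field names (`prompt`, `workflow_json`, `output_node`, `workflow_name`, `workflow_model`) come from the tool's input schema. Resolution and frame count are left to the workflow itself, since the custom graph determines its own resource envelope.

```python
import json


def build_custom_video_inputs(prompt, workflow, output_node, *,
                              workflow_name=None, workflow_model=None):
    """Assemble comfyui_video inputs for a caller-supplied API-format workflow."""
    if not isinstance(output_node, str):
        raise TypeError("output_node must be a node ID string such as '108'")
    if output_node not in workflow:
        raise ValueError(f"output_node {output_node!r} is not a node in the workflow")
    inputs = {
        "prompt": prompt,
        "workflow_json": json.dumps(workflow),  # schema expects a JSON string
        "output_node": output_node,
    }
    # Provenance labels are optional; include them only when known.
    if workflow_name:
        inputs["workflow_name"] = workflow_name
    if workflow_model:
        inputs["workflow_model"] = workflow_model
    return inputs


# Stand-in graph with a single saver node (illustrative only).
tiny_workflow = {"42": {"class_type": "SaveVideo", "inputs": {}}}
inputs = build_custom_video_inputs(
    "a paper boat drifting downstream",
    tiny_workflow,
    "42",
    workflow_model="example-low-vram-model",
)
```

Validating the node ID before the call surfaces the most common custom-workflow mistake (passing a class name instead of an ID) without a round trip to the server.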
+ +**I2V workflow -- templated nodes:** + +| Node | Class | Templated field | +|------|-------|-----------------| +| 93 | CLIPTextEncode | `text` (positive prompt) | +| 97 | LoadImage | `image` (server filename from upload) | +| 98 | WanImageToVideo | `width`, `height`, `length` | +| 86 | KSamplerAdvanced | `noise_seed` | +| 108 | SaveVideo | `filename_prefix` | + +**Input schema:** + +```yaml +prompt: string # required +operation: string # "text_to_video" | "image_to_video" (default: t2v) +reference_image_path: string # local path (for i2v) +reference_image_url: string # URL (for i2v, downloaded first) +width: integer # default 640 +height: integer # default 640 +num_frames: integer # default 81 (5s at 16fps) +seed: integer # optional +output_path: string # where to save the video +workflow_json: string # optional custom workflow; requires output_node +workflow_path: string # optional path to workflow JSON; requires output_node +output_node: string # required for custom workflows +workflow_name: string # optional custom workflow provenance label +workflow_model: string # optional custom model/provenance label +workflow_model_stack: [] # optional custom dependency provenance +``` + +**execute() flow (i2v):** +1. Upload reference image via `client.upload_image()` +2. Deep-copy i2v workflow template +3. Inject prompt, uploaded image name, seed, dimensions +4. `client.generate(workflow, output_node="108", dest=output_path, timeout=900)` +5. Return `ToolResult` + +**execute() flow (t2v):** +1. Deep-copy t2v workflow template +2. Inject prompt, seed, dimensions +3. `client.generate(workflow, output_node="16", dest=output_path, timeout=900)` +4. Return `ToolResult` + +`comfyui_video` publishes `operation_statuses` in `get_info()` and implements +`is_operation_available(operation)` for selector routing. This keeps partial +ComfyUI installs useful for the installed mode without advertising unavailable +operation modes as ready. 
`video_selector` also applies this readiness check +when `operation="rank"` by using `target_operation`, so preflight rankings do +not promote ComfyUI for an operation whose bundled models are missing. + +--- + +### `comfyui_music` -- Music Generation (not shipped) + +We explored adding a `comfyui_music` tool using the ACE-Step 3.5B model. +The model runs well in ComfyUI, but the ComfyUI node interface for +ACE-Step is not standardized -- there are multiple custom node packs with +different class names (`AceStepModelLoader` vs native `TextEncodeAceStepAudio`, +etc.). Shipping a workflow that only works with one specific custom node +pack would break for most users. + +**Future path:** ACE-Step support should be revisited once OpenMontage decides +on the music-generation routing shape and defines a portable ComfyUI audio +workflow contract. Current image/video workflow overrides are intentionally +scoped to image and video artifacts, not arbitrary audio workflows. + +--- + +## Workflow Override Mechanism + +The image and video tools accept either `workflow_json` or `workflow_path`. +When provided, the custom workflow replaces the bundled template entirely and +the caller must also provide `output_node`. This stricter contract is required +because community workflows use arbitrary node IDs. + +Overrides support: + +- Using newer model checkpoints without code changes +- Custom sampling strategies (different schedulers, step counts, LoRAs) +- Community workflows dropped in as-is +- A/B testing different generation approaches + +The agent can also read workflow files from `tools/_comfyui/workflows/` and +modify them programmatically before passing to `execute()`. + +Custom workflow result metadata reports `workflow_provenance.source` as +`user_supplied` and uses `workflow_model`, `model`, or `workflow_name` as the +model label when provided. If no custom label is supplied, the model is reported +as `custom-comfyui-workflow` instead of one of the bundled model names. 
The +provenance payload also records `workflow_hash_sha256`. For user-supplied +workflows, callers should provide `workflow_model_stack` with base model, text +encoder, VAE, LoRAs and strengths, scheduler, steps, and guidance when known. + +--- + +## Agent Skill and Setup Contract + +Both ComfyUI tools advertise the Layer 3 `comfyui` skill. Agents must read +`.agents/skills/comfyui/SKILL.md` before calling either tool so they know how to +load community workflows, identify output nodes, handle LoRA loader chains, and +record custom workflow provenance. + +Unavailable ComfyUI tools expose a structured `setup_offer` in `get_info()`, +`provider_menu()`, and `provider_menu_summary().setup_offers[]`: + +```yaml +kind: local_server +env_var: COMFYUI_SERVER_URL +default_url: http://localhost:8188 +health_check: GET /system_stats +``` + +When bundled models are missing, the tool returns a machine-readable +`data.missing_models[]` list with filename, role, destination hint, and download +URL when OpenMontage knows the canonical source. Agents should surface that +payload rather than parsing prose error text. 
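An agent that receives the `data.missing_models[]` payload can turn it into setup instructions without parsing error prose. The sketch below assumes the item keys are named `filename`, `role`, `destination`, and `download_url`; the source only states which pieces of information each item carries, so these key names, the helper `summarize_missing_models`, and the sample entries are all hypothetical.

```python
def summarize_missing_models(data: dict) -> list[str]:
    """Render one human-readable setup line per missing model entry."""
    lines = []
    for item in data.get("missing_models", []):
        # The download URL is only present when a canonical source is known.
        url = item.get("download_url") or "no known download URL"
        lines.append(
            f"{item['filename']} ({item['role']}) -> {item['destination']}: {url}"
        )
    return lines


# Illustrative payload; file names and URL are placeholders, not real models.
sample = {
    "missing_models": [
        {
            "filename": "example-model.safetensors",
            "role": "diffusion_model",
            "destination": "ComfyUI/models/diffusion_models/",
            "download_url": "https://example.invalid/example-model.safetensors",
        },
        {
            "filename": "example-vae.safetensors",
            "role": "vae",
            "destination": "ComfyUI/models/vae/",
        },
    ]
}
summary = summarize_missing_models(sample)
```

Entries without a URL still produce a line, so the user sees every file that needs to be placed even when OpenMontage cannot point to a source.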
+ +--- + +## Configuration + +**Environment variables:** + +```bash +# .env +COMFYUI_SERVER_URL=http://localhost:8188 # ComfyUI API endpoint +COMFYUI_POLL_INTERVAL=5 # seconds between status checks +COMFYUI_POLL_TIMEOUT=600 # max wait for image gen +COMFYUI_VIDEO_TIMEOUT=900 # max wait for video gen +``` + +**For Docker Compose setups** (ComfyUI in a container): + +```bash +COMFYUI_SERVER_URL=http://host.docker.internal:8188 +# or +COMFYUI_SERVER_URL=http://comfyui:8188 # if on same docker network +``` + +--- + +## Provider Selection Behavior + +When the adapter is available, selectors will rank it alongside other providers +using OpenMontage's 7-dimension scoring: + +| Dimension | ComfyUI score | Rationale | +|-----------|---------------|-----------| +| Task fit | High | Supports t2i, i2v, t2v | +| Quality | High | Latest models (FLUX 2, WAN 2.2 14B) | +| Control | Highest | Full workflow customization | +| Reliability | High | Proven in production | +| Cost | $0 | Local compute | +| Latency | Medium | GPU-bound, no network round-trip | +| Continuity | High | Deterministic with seeds | + +When ComfyUI is unavailable (server down), selectors fall through to other +available providers. When only one video operation is configured, `video_selector` +uses the tool's operation-specific readiness to avoid selecting ComfyUI for the +missing mode. + +--- + +## What This Unlocks + +### Immediate (with existing models) + +- **FLUX 2 Dev NVFP4** image generation -- Blackwell-optimized, ~60s per image +- **WAN 2.2 14B FP8 high-quality profile** i2v with 4-step acceleration -- ~3.5 min per 5s clip, about 16GB VRAM recommended +- **WAN 2.2 14B FP8 high-quality profile** t2v (models downloaded, workflow included), about 16GB VRAM recommended + +### Low-VRAM profile + +ComfyUI can still be useful on 8GB-12GB GPUs when the user supplies an +appropriate `workflow_json` or `workflow_path`. Good candidates include: + +- Wan 2.1 1.3B workflows for lower-memory text-to-video. 
+- LTX-Video/LTXV FP8 or quantized workflows for fast short clips. +- Wan 2.2 GGUF/quantized community workflows at lower resolution and frame count. + +OpenMontage should treat those as custom workflow profiles until a blessed +low-VRAM workflow is bundled. For custom workflows, resource requirements are +workflow-supplied rather than inferred from the bundled WAN 2.2 14B profile. + +### Future (add models to ComfyUI, no code changes to OpenMontage) + +- Newer checkpoints (WAN 3.x, FLUX 3, etc.) -- just update workflow JSON +- ControlNet, IP-Adapter, AnimateDiff -- supported via ComfyUI custom nodes +- Upscaling, inpainting, outpainting -- ComfyUI nodes exist +- Any model the ComfyUI ecosystem supports + +### Hardware portability + +The same adapter works on: +- NVIDIA DGX Spark (GB10, aarch64, CUDA 13.0) +- Consumer GPUs (RTX 3090/4090, x86) +- Cloud instances (A100, H100) +- Multi-GPU setups (ComfyUI handles device placement) + +No PyTorch version pinning, no architecture-specific wheels, no CUDA +compatibility matrices. ComfyUI is the abstraction layer. + +--- + +## Implementation Scope + +| Component | Files | Estimated size | +|-----------|-------|----------------| +| Shared client | `tools/_comfyui/client.py` | ~180 lines | +| Shared metadata | `tools/_comfyui/metadata.py` | setup, model stack, provenance helpers | +| Image tool | `tools/graphics/comfyui_image.py` | ~140 lines | +| Video tool | `tools/video/comfyui_video.py` | ~190 lines | +| Layer 3 skill | `.agents/skills/comfyui/SKILL.md` | usage contract | +| Registry summary | `tools/tool_registry.py` | setup offer surfacing | +| Selector readiness filter | `tools/video/video_selector.py` | small operation-readiness check | +| Workflow templates | `tools/_comfyui/workflows/*.json` | 3 files | +| Tests | `tests/contracts/test_comfyui_tools.py` | ~200 lines | +| Docs | `docs/comfyui-adapter-plan.md` | This file | + +**Total:** ~500 lines of Python + 3 workflow JSONs. 
+ +No changes to: `base_tool.py`, existing non-ComfyUI generation providers, any +pipeline definition, or any schema. + +--- + +## Open Questions + +1. **Workflow versioning:** Should workflow JSONs live in the repo or be + user-provided via a config directory? Bundling gives reproducibility; + external gives flexibility. + +2. **Async generation:** ComfyUI supports websocket connections for real-time + progress. Worth implementing for long video generations, or is polling + sufficient? + +3. **Multi-server:** Should the adapter support multiple ComfyUI instances + (e.g., one for images, one for video) via per-capability URLs? + +4. **Music generation:** ACE-Step works in ComfyUI but OpenMontage needs a + dedicated music-generation routing contract before adding `comfyui_music`. + The follow-up should decide selector integration, audio artifact schemas, and + a portable workflow/output-node contract rather than treating music as a + hidden image/video workflow override. diff --git a/tests/contracts/test_comfyui_tools.py b/tests/contracts/test_comfyui_tools.py new file mode 100644 index 00000000..05009beb --- /dev/null +++ b/tests/contracts/test_comfyui_tools.py @@ -0,0 +1,545 @@ +"""Contract tests for ComfyUI provider tools. + +These tests verify that the tools satisfy the BaseTool contract without +requiring a running ComfyUI server. They check class attributes, +schemas, status reporting, and cost estimates. 
+""" + +import json +from pathlib import Path + +import pytest + +from tools.base_tool import ( + BaseTool, + ToolRuntime, + ToolStability, + ToolStatus, + ToolTier, +) +from tools.graphics.comfyui_image import ComfyUIImage +from tools.tool_registry import ToolRegistry +from tools.video.video_selector import VideoSelector +from tools.video.comfyui_video import ComfyUIVideo + +TOOLS = [ComfyUIImage, ComfyUIVideo] +WORKFLOW_DIR = Path(__file__).resolve().parent.parent.parent / "tools" / "_comfyui" / "workflows" +PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent + + +# ------------------------------------------------------------------ +# Contract compliance +# ------------------------------------------------------------------ + +@pytest.mark.parametrize("cls", TOOLS, ids=lambda c: c.name) +class TestContract: + + def test_inherits_base_tool(self, cls): + assert issubclass(cls, BaseTool) + + def test_has_required_identity(self, cls): + tool = cls() + assert tool.name + assert tool.version + assert tool.capability + assert tool.provider == "comfyui" + assert tool.tier == ToolTier.GENERATE + assert tool.stability == ToolStability.EXPERIMENTAL + assert tool.runtime == ToolRuntime.LOCAL_GPU + + def test_has_input_schema(self, cls): + tool = cls() + schema = tool.input_schema + assert schema.get("type") == "object" + assert "prompt" in schema.get("properties", {}) + assert "prompt" in schema.get("required", []) + + def test_has_capabilities(self, cls): + tool = cls() + assert len(tool.capabilities) > 0 + + def test_has_agent_skills(self, cls): + tool = cls() + assert tool.agent_skills + assert "comfyui" in tool.agent_skills + + def test_comfyui_layer3_skill_exists(self, cls): + skill_path = PROJECT_ROOT / ".agents" / "skills" / "comfyui" / "SKILL.md" + assert skill_path.exists() + assert "output_node" in skill_path.read_text(encoding="utf-8") + + def test_has_fallbacks(self, cls): + tool = cls() + assert tool.fallback or tool.fallback_tools + + def 
test_cost_is_zero(self, cls): + tool = cls() + assert tool.estimate_cost({"prompt": "test"}) == 0.0 + + def test_runtime_estimate_positive(self, cls): + tool = cls() + assert tool.estimate_runtime({"prompt": "test"}) > 0 + + def test_get_info_returns_dict(self, cls): + tool = cls() + info = tool.get_info() + assert isinstance(info, dict) + assert info["name"] == tool.name + assert info["provider"] == "comfyui" + assert info["runtime"] == "local_gpu" + assert info["setup_offer"]["env_var"] == "COMFYUI_SERVER_URL" + + def test_video_resource_profile_does_not_mandate_16gb(self, cls): + if cls is not ComfyUIVideo: + return + tool = ComfyUIVideo() + info = tool.get_info() + assert info["resource_profile"]["vram_mb"] == 8000 + assert info["resource_profiles"]["provider_floor"]["vram_mb"] == 8000 + assert info["resource_profiles"]["bundled_wan22_14b_fp8"]["vram_mb"] == 16000 + assert "not a ComfyUI provider-wide requirement" in ( + info["resource_profiles"]["bundled_wan22_14b_fp8"]["applies_to"] + ) + + def test_status_unavailable_without_server(self, cls): + """Without a running server, status should be UNAVAILABLE.""" + tool = cls() + # Point to a port that's almost certainly not running ComfyUI + tool._client.server_url = "http://127.0.0.1:19999" + assert tool.get_status() == ToolStatus.UNAVAILABLE + + def test_idempotency_key_fields(self, cls): + tool = cls() + assert len(tool.idempotency_key_fields) > 0 + assert "prompt" in tool.idempotency_key_fields + + def test_custom_workflow_schema_requires_output_node_contract(self, cls): + tool = cls() + props = tool.input_schema.get("properties", {}) + assert "workflow_json" in props + assert "workflow_path" in props + assert "output_node" in props + + def test_custom_workflow_requires_output_node(self, cls): + tool = cls() + result = tool.execute({"prompt": "test", "workflow_json": "{}"}) + assert result.success is False + assert "output_node" in result.error + + +# 
------------------------------------------------------------------ +# Workflow files +# ------------------------------------------------------------------ + +EXPECTED_WORKFLOWS = [ + "flux2-txt2img.json", + "wan22-i2v-4step.json", + "wan22-t2v-4step.json", +] + + +@pytest.mark.parametrize("filename", EXPECTED_WORKFLOWS) +def test_workflow_exists_and_valid_json(filename): + path = WORKFLOW_DIR / filename + assert path.exists(), f"Missing workflow: {path}" + with open(path) as f: + data = json.load(f) + assert isinstance(data, dict) + assert len(data) > 0 + + +def test_flux2_workflow_has_templated_nodes(): + with open(WORKFLOW_DIR / "flux2-txt2img.json") as f: + w = json.load(f) + assert "4" in w # CLIPTextEncode (prompt) + assert "7" in w # RandomNoise (seed) + assert "13" in w # SaveImage (output) + + +def test_i2v_workflow_has_templated_nodes(): + with open(WORKFLOW_DIR / "wan22-i2v-4step.json") as f: + w = json.load(f) + assert "93" in w # CLIPTextEncode (prompt) + assert "97" in w # LoadImage (reference) + assert "86" in w # KSamplerAdvanced (seed) + assert "108" in w # SaveVideo (output) + + +def test_t2v_workflow_has_templated_nodes(): + with open(WORKFLOW_DIR / "wan22-t2v-4step.json") as f: + w = json.load(f) + assert "2" in w # CLIPTextEncode (prompt) + assert "12" in w # KSamplerAdvanced (seed) + assert "16" in w # SaveVideo (output) + + +# ------------------------------------------------------------------ +# Client unit tests +# ------------------------------------------------------------------ + +class TestClientHelpers: + + def test_load_workflow(self): + from tools._comfyui.client import ComfyUIClient + w = ComfyUIClient.load_workflow(WORKFLOW_DIR / "flux2-txt2img.json") + assert isinstance(w, dict) + assert "1" in w + + def test_patch_workflow(self): + from tools._comfyui.client import ComfyUIClient + w = ComfyUIClient.load_workflow(WORKFLOW_DIR / "flux2-txt2img.json") + patched = ComfyUIClient.patch_workflow(w, { + "4": {"text": "hello world"}, + "7": 
{"noise_seed": 123}, + }) + assert patched["4"]["inputs"]["text"] == "hello world" + assert patched["7"]["inputs"]["noise_seed"] == 123 + # Original unchanged + assert w["4"]["inputs"]["text"] == "" + + def test_patch_workflow_bad_node(self): + from tools._comfyui.client import ComfyUIClient, ComfyUIError + w = {"1": {"inputs": {"x": 1}}} + with pytest.raises(ComfyUIError, match="not found"): + ComfyUIClient.patch_workflow(w, {"99": {"x": 2}}) + + def test_submit_surfaces_node_errors_before_http_error(self, monkeypatch): + from tools._comfyui.client import ComfyUIClient, ComfyUIError + + class FakeResponse: + status_code = 400 + + def json(self): + return { + "error": {"message": "Prompt outputs failed validation"}, + "node_errors": {"4": {"class_type": "MissingNode"}}, + } + + def raise_for_status(self): + raise AssertionError("HTTPError should not hide node_errors") + + monkeypatch.setattr( + "tools._comfyui.client.requests.post", + lambda *args, **kwargs: FakeResponse(), + ) + + with pytest.raises(ComfyUIError, match="Node errors"): + ComfyUIClient("http://comfy.test").submit({}) + + def test_random_seed_range(self): + from tools._comfyui.client import ComfyUIClient + for _ in range(100): + s = ComfyUIClient.random_seed() + assert 0 <= s < 2**32 + + def test_generate_passes_history_item_type_to_view(self, monkeypatch, tmp_path): + from tools._comfyui.client import ComfyUIClient + + client = ComfyUIClient("http://comfy.test") + seen = {} + + monkeypatch.setattr(client, "submit", lambda workflow: "prompt-1") + monkeypatch.setattr(client, "poll", lambda prompt_id, **kwargs: { + "outputs": { + "9": { + "images": [{ + "filename": "preview.png", + "subfolder": "previews", + "type": "temp", + }] + } + } + }) + + def fake_download(filename, subfolder, dest, folder_type="output"): + seen["filename"] = filename + seen["subfolder"] = subfolder + seen["folder_type"] = folder_type + return Path(dest) + + monkeypatch.setattr(client, "download", fake_download) + + 
client.generate({"9": {"inputs": {}}}, "9", tmp_path / "preview.png") + + assert seen == { + "filename": "preview.png", + "subfolder": "previews", + "folder_type": "temp", + } + + def test_is_default_url_when_env_not_set(self, monkeypatch): + from tools._comfyui.client import ComfyUIClient + monkeypatch.delenv("COMFYUI_SERVER_URL", raising=False) + client = ComfyUIClient() + assert client.is_default_url is True + + def test_is_not_default_url_when_env_set(self, monkeypatch): + from tools._comfyui.client import ComfyUIClient + monkeypatch.setenv("COMFYUI_SERVER_URL", "http://myhost:9999") + client = ComfyUIClient() + assert client.is_default_url is False + + def test_unavailable_reason_default_url(self, monkeypatch): + from tools._comfyui.client import ComfyUIClient + monkeypatch.delenv("COMFYUI_SERVER_URL", raising=False) + client = ComfyUIClient() + msg = client.unavailable_reason() + assert "COMFYUI_SERVER_URL" in msg + assert ".env" in msg + + def test_unavailable_reason_custom_url(self, monkeypatch): + from tools._comfyui.client import ComfyUIClient + monkeypatch.setenv("COMFYUI_SERVER_URL", "http://myhost:9999") + client = ComfyUIClient() + msg = client.unavailable_reason() + assert "myhost:9999" in msg + assert "COMFYUI_SERVER_URL" not in msg + + +# ------------------------------------------------------------------ +# Model discovery (offline, no server needed) +# ------------------------------------------------------------------ + +class TestModelRequirements: + + def test_image_tool_has_required_models(self): + from tools.graphics.comfyui_image import _REQUIRED_MODELS + assert len(_REQUIRED_MODELS) > 0 + assert any("flux" in m.lower() for m in _REQUIRED_MODELS) + + def test_video_tool_has_required_models_i2v(self): + from tools.video.comfyui_video import _REQUIRED_MODELS_I2V + assert len(_REQUIRED_MODELS_I2V) > 0 + assert any("i2v" in m.lower() for m in _REQUIRED_MODELS_I2V) + + def test_video_tool_has_required_models_t2v(self): + from 
tools.video.comfyui_video import _REQUIRED_MODELS_T2V + assert len(_REQUIRED_MODELS_T2V) > 0 + assert any("t2v" in m.lower() for m in _REQUIRED_MODELS_T2V) + + +# ------------------------------------------------------------------ +# Custom workflow contract and provenance +# ------------------------------------------------------------------ + +class TestCustomWorkflowContract: + + def test_image_custom_workflow_uses_caller_output_node_and_provenance(self, tmp_path): + tool = ComfyUIImage() + tool._client.is_available = lambda: True + seen = {} + + def fake_generate(workflow, output_node, dest, **kwargs): + seen["workflow"] = workflow + seen["output_node"] = output_node + return [Path(dest)] + + tool._client.generate = fake_generate + + result = tool.execute({ + "prompt": "test", + "workflow_json": json.dumps({"99": {"inputs": {}}}), + "output_node": "99", + "workflow_model": "custom-flux", + "output_path": str(tmp_path / "image.png"), + }) + + assert result.success is True + assert seen["output_node"] == "99" + assert result.model == "custom-flux" + assert result.data["model"] == "custom-flux" + assert result.data["workflow_provenance"]["source"] == "user_supplied" + assert result.data["workflow_provenance"]["output_node"] == "99" + assert result.data["workflow_provenance"]["workflow_hash_sha256"] + assert result.data["workflow_provenance"]["model_stack_source"] == ( + "unknown_custom_workflow" + ) + + def test_video_custom_workflow_uses_caller_output_node_and_provenance(self, tmp_path): + tool = ComfyUIVideo() + tool._client.is_available = lambda: True + seen = {} + + def fake_generate(workflow, output_node, dest, **kwargs): + seen["workflow"] = workflow + seen["output_node"] = output_node + return [Path(dest)] + + tool._client.generate = fake_generate + + result = tool.execute({ + "prompt": "test", + "workflow_json": json.dumps({"42": {"inputs": {}}}), + "output_node": "42", + "workflow_model": "custom-wan", + "output_path": str(tmp_path / "video.mp4"), + }) + + 
assert result.success is True + assert seen["output_node"] == "42" + assert result.model == "custom-wan" + assert result.data["model"] == "custom-wan" + assert result.data["workflow_provenance"]["source"] == "user_supplied" + assert result.data["workflow_provenance"]["output_node"] == "42" + assert result.data["workflow_provenance"]["workflow_hash_sha256"] + assert result.data["workflow_provenance"]["model_stack_source"] == ( + "unknown_custom_workflow" + ) + + def test_custom_workflow_accepts_model_stack_provenance(self, tmp_path): + tool = ComfyUIVideo() + tool._client.is_available = lambda: True + tool._client.generate = lambda workflow, output_node, dest, **kwargs: [Path(dest)] + + result = tool.execute({ + "prompt": "test", + "workflow_json": json.dumps({"42": {"inputs": {}}}), + "output_node": "42", + "workflow_model_stack": [{"role": "lora", "name": "style.safetensors"}], + "output_path": str(tmp_path / "video.mp4"), + }) + + provenance = result.data["workflow_provenance"] + assert provenance["model_stack"] == [{"role": "lora", "name": "style.safetensors"}] + assert provenance["model_stack_source"] == "caller_supplied" + + def test_image_missing_models_are_structured(self): + tool = ComfyUIImage() + tool._client.is_available = lambda: True + tool._client.check_models = lambda required: ( + [], + ["flux2-vae.safetensors"], + ) + + result = tool.execute({"prompt": "test"}) + + assert result.success is False + assert result.data["provider"] == "comfyui" + assert result.data["missing_models"][0]["name"] == "flux2-vae.safetensors" + assert result.data["missing_models"][0]["destination_hint"] == "ComfyUI/models/vae/" + assert result.data["missing_models"][0]["download_url"] + + def test_video_missing_models_are_structured(self): + tool = ComfyUIVideo() + tool._client.is_available = lambda: True + tool._client.check_models = lambda required: ( + [], + ["wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors"], + ) + + result = tool.execute({"prompt": "test", "operation": 
"text_to_video"}) + + assert result.success is False + assert result.data["operation"] == "text_to_video" + assert result.data["missing_models"][0]["role"] == "diffusion_model_high_noise" + assert result.data["missing_models"][0]["download_url"] + + def test_bundled_workflow_provenance_records_hash_and_stack(self, tmp_path): + tool = ComfyUIImage() + tool._client.is_available = lambda: True + tool._client.check_models = lambda required: (list(required), []) + tool._client.generate = lambda workflow, output_node, dest, **kwargs: [Path(dest)] + + result = tool.execute({ + "prompt": "test", + "output_path": str(tmp_path / "image.png"), + }) + + provenance = result.data["workflow_provenance"] + assert provenance["source"] == "bundled" + assert provenance["workflow_hash_sha256"] + assert any(item["role"] == "vae" for item in provenance["model_stack"]) + + +class TestComfyUISetupOffer: + + def test_provider_menu_summary_includes_structured_setup_offer(self): + registry = ToolRegistry() + tool = ComfyUIImage() + tool._client.is_available = lambda: False + registry.register(tool) + registry._discovered_packages.add("tools") + + summary = registry.provider_menu_summary() + + offer = summary["setup_offers"][0] + assert offer["tool"] == "comfyui_image" + assert offer["env_var"] == "COMFYUI_SERVER_URL" + assert offer["default_url"] == "http://localhost:8188" + assert offer["health_check"] == "GET /system_stats" + + +# ------------------------------------------------------------------ +# Operation-specific video readiness +# ------------------------------------------------------------------ + +class TestVideoOperationReadiness: + + def test_video_tool_reports_partial_operation_readiness(self): + from tools.video.comfyui_video import _REQUIRED_MODELS_I2V, _REQUIRED_MODELS_T2V + + tool = ComfyUIVideo() + tool._client.is_available = lambda: True + + def fake_check_models(required): + if required == _REQUIRED_MODELS_T2V: + return list(required), [] + if required == 
_REQUIRED_MODELS_I2V: + return [], list(required) + return [], list(required) + + tool._client.check_models = fake_check_models + + assert tool.get_status() == ToolStatus.AVAILABLE + assert tool.is_operation_available("text_to_video") is True + assert tool.is_operation_available("image_to_video") is False + assert tool.operation_statuses() == { + "text_to_video": "available", + "image_to_video": "degraded", + } + + def test_video_selector_filters_operation_unready_tools(self): + class PartialVideoTool(BaseTool): + name = "partial_video" + capability = "video_generation" + provider = "partial" + supports = {"image_to_video": True} + input_schema = {"type": "object", "properties": {}} + + def is_operation_available(self, operation): + return operation == "text_to_video" + + def execute(self, inputs): + raise AssertionError("not used") + + selector = VideoSelector() + candidates = [PartialVideoTool()] + + assert selector._filter_candidates( + {"operation": "image_to_video"}, candidates + ) == [] + + def test_video_selector_rank_uses_target_operation_for_readiness(self): + class PartialVideoTool(BaseTool): + name = "partial_video" + capability = "video_generation" + provider = "partial" + supports = {"image_to_video": True} + input_schema = {"type": "object", "properties": {}} + + def is_operation_available(self, operation): + return operation == "text_to_video" + + def execute(self, inputs): + raise AssertionError("not used") + + selector = VideoSelector() + candidates = [PartialVideoTool()] + rank_inputs = selector._rank_inputs({ + "operation": "rank", + "target_operation": "image_to_video", + }) + + assert rank_inputs["operation"] == "image_to_video" + assert selector._filter_candidates(rank_inputs, candidates) == [] + diff --git a/tools/_comfyui/__init__.py b/tools/_comfyui/__init__.py new file mode 100644 index 00000000..fa7beab6 --- /dev/null +++ b/tools/_comfyui/__init__.py @@ -0,0 +1 @@ +"""ComfyUI integration — shared client and bundled workflow templates.""" 
diff --git a/tools/_comfyui/client.py b/tools/_comfyui/client.py new file mode 100644 index 00000000..fa80e843 --- /dev/null +++ b/tools/_comfyui/client.py @@ -0,0 +1,296 @@ +"""Thin REST client for a running ComfyUI server. + +Handles the full generation cycle: submit workflow, poll for completion, +download artifacts. Used by comfyui_image, comfyui_video, and comfyui_music. +""" + +from __future__ import annotations + +import copy +import json +import os +import random +import time +from pathlib import Path +from typing import Any + +import requests + + +class ComfyUIError(Exception): + """Raised when ComfyUI returns an error or times out.""" + + +class ComfyUIClient: + """Client for the ComfyUI REST API. + + The protocol is simple and battle-tested: + 1. POST /prompt → queue a workflow, get a prompt_id + 2. GET /history/{id} → poll until outputs appear + 3. GET /view?filename=… → download the generated artifact + 4. POST /upload/image → stage a local image for I2V workflows + """ + + def __init__(self, server_url: str | None = None) -> None: + self.server_url = ( + server_url + or os.environ.get("COMFYUI_SERVER_URL", "http://localhost:8188") + ).rstrip("/") + + # ------------------------------------------------------------------ + # Health + # ------------------------------------------------------------------ + + @property + def is_default_url(self) -> bool: + """True if using the fallback URL (user didn't set COMFYUI_SERVER_URL).""" + return not os.environ.get("COMFYUI_SERVER_URL") + + def is_available(self) -> bool: + """Return True if the ComfyUI server is reachable.""" + try: + resp = requests.get( + f"{self.server_url}/system_stats", timeout=5 + ) + return resp.status_code == 200 + except Exception: + return False + + def unavailable_reason(self) -> str: + """Human-readable explanation of why the server can't be reached.""" + if self.is_default_url: + return ( + f"No ComfyUI server found at {self.server_url} " + f"(default — no COMFYUI_SERVER_URL 
configured).\n" + f"Set COMFYUI_SERVER_URL in your .env file to the address of " + f"your ComfyUI server (e.g. http://localhost:8188)." + ) + return ( + f"ComfyUI server not reachable at {self.server_url}.\n" + f"Check that ComfyUI is running and the URL is correct." + ) + + # ------------------------------------------------------------------ + # Model discovery + # ------------------------------------------------------------------ + + def list_models(self) -> dict[str, list[str]]: + """Query ComfyUI for available models, grouped by type. + + Returns a dict like:: + + { + "checkpoints": ["sd_xl_base.safetensors", ...], + "diffusion_models": ["flux2-dev-nvfp4.safetensors", ...], + "vae": ["ae.safetensors", ...], + "clip": ["clip_l.safetensors", ...], + "loras": ["my_lora.safetensors", ...], + } + """ + node_to_key = { + "CheckpointLoaderSimple": ("ckpt_name", "checkpoints"), + "UNETLoader": ("unet_name", "diffusion_models"), + "VAELoader": ("vae_name", "vae"), + "CLIPLoader": ("clip_name", "clip"), + "LoraLoaderModelOnly": ("lora_name", "loras"), + } + result: dict[str, list[str]] = {} + for node_class, (field, group) in node_to_key.items(): + try: + resp = requests.get( + f"{self.server_url}/object_info/{node_class}", timeout=10 + ) + resp.raise_for_status() + data = resp.json() + options = ( + data.get(node_class, {}) + .get("input", {}) + .get("required", {}) + .get(field, [[]])[0] + ) + if isinstance(options, list): + result[group] = options + except Exception: + result[group] = [] + return result + + def check_models( + self, required: list[str] + ) -> tuple[list[str], list[str]]: + """Check which of *required* model filenames are available. + + Returns ``(found, missing)`` — two lists of filenames. 
+ """ + all_models: set[str] = set() + for names in self.list_models().values(): + all_models.update(names) + + found = [m for m in required if m in all_models] + missing = [m for m in required if m not in all_models] + return found, missing + + # ------------------------------------------------------------------ + # Core cycle + # ------------------------------------------------------------------ + + def submit(self, workflow: dict) -> str: + """Queue a workflow for execution. Returns the ``prompt_id``.""" + resp = requests.post( + f"{self.server_url}/prompt", + json={"prompt": workflow}, + timeout=30, + ) + try: + data = resp.json() + except ValueError: + data = {} + if data.get("node_errors"): + raise ComfyUIError(f"Node errors: {json.dumps(data['node_errors'])}") + if data.get("error"): + raise ComfyUIError(f"Prompt error: {json.dumps(data['error'])}") + resp.raise_for_status() + prompt_id = data.get("prompt_id") + if not prompt_id: + raise ComfyUIError(f"No prompt_id in response: {data}") + return prompt_id + + def poll( + self, + prompt_id: str, + *, + timeout: int = 600, + interval: int = 5, + ) -> dict: + """Block until *prompt_id* finishes. 
Returns the history entry.""" + deadline = time.time() + timeout + while time.time() < deadline: + resp = requests.get( + f"{self.server_url}/history/{prompt_id}", timeout=10 + ) + resp.raise_for_status() + history = resp.json() + if prompt_id in history: + entry = history[prompt_id] + status = entry.get("status", {}) + if status.get("status_str") == "error": + msgs = status.get("messages", []) + raise ComfyUIError(f"Execution error: {msgs}") + return entry + time.sleep(interval) + raise ComfyUIError( + f"Prompt {prompt_id} did not complete within {timeout}s" + ) + + def download( + self, + filename: str, + subfolder: str, + dest: Path, + folder_type: str = "output", + ) -> Path: + """Download an output artifact from the ComfyUI server.""" + resp = requests.get( + f"{self.server_url}/view", + params={ + "filename": filename, + "subfolder": subfolder, + "type": folder_type, + }, + timeout=120, + ) + resp.raise_for_status() + dest.parent.mkdir(parents=True, exist_ok=True) + dest.write_bytes(resp.content) + return dest + + def upload_image(self, local_path: Path, name: str) -> str: + """Upload a local image so it can be referenced by LoadImage nodes. + + Returns the server-side filename. + """ + with open(local_path, "rb") as f: + resp = requests.post( + f"{self.server_url}/upload/image", + files={"image": (name, f, "image/png")}, + timeout=30, + ) + resp.raise_for_status() + return resp.json()["name"] + + # ------------------------------------------------------------------ + # High-level helper + # ------------------------------------------------------------------ + + def generate( + self, + workflow: dict, + output_node: str, + dest: Path, + *, + timeout: int = 600, + interval: int = 5, + ) -> list[Path]: + """Submit → poll → download. 
Returns list of artifact paths.""" + prompt_id = self.submit(workflow) + entry = self.poll(prompt_id, timeout=timeout, interval=interval) + + outputs = entry.get("outputs", {}) + node_output = outputs.get(output_node, {}) + + # Artifacts are listed under "images"; fall back to "gifs" for nodes that report there + items = node_output.get("images", []) or node_output.get("gifs", []) + if not items: + raise ComfyUIError( + f"No output artifacts on node {output_node}. " + f"Available nodes: {list(outputs.keys())}" + ) + + paths: list[Path] = [] + for i, item in enumerate(items): + suffix = Path(item["filename"]).suffix + if len(items) == 1: + target = dest + else: + target = dest.with_stem(f"{dest.stem}_{i:03d}").with_suffix(suffix) + self.download( + item["filename"], + item.get("subfolder", ""), + target, + item.get("type", "output"), + ) + paths.append(target) + return paths + + # ------------------------------------------------------------------ + # Workflow helpers + # ------------------------------------------------------------------ + + @staticmethod + def load_workflow(path: Path) -> dict: + """Load a workflow JSON template from disk.""" + with open(path) as f: + return json.load(f) + + @staticmethod + def patch_workflow( + workflow: dict, patches: dict[str, dict[str, Any]] + ) -> dict: + """Deep-copy *workflow* and apply *patches*. + + *patches* maps ``node_id`` → ``{input_name: value, ...}``. + """ + w = copy.deepcopy(workflow) + for node_id, values in patches.items(): + if node_id not in w: + raise ComfyUIError( + f"Node {node_id!r} not found in workflow. 
" + f"Available: {list(w.keys())}" + ) + for key, val in values.items(): + w[node_id]["inputs"][key] = val + return w + + @staticmethod + def random_seed() -> int: + """Return a random seed suitable for ComfyUI noise nodes.""" + return random.randint(0, 2**32 - 1) diff --git a/tools/_comfyui/metadata.py b/tools/_comfyui/metadata.py new file mode 100644 index 00000000..483beed2 --- /dev/null +++ b/tools/_comfyui/metadata.py @@ -0,0 +1,222 @@ +"""Shared metadata helpers for ComfyUI provider tools.""" + +from __future__ import annotations + +import hashlib +import json +from typing import Any + + +COMFYUI_SETUP_OFFER: dict[str, Any] = { + "kind": "local_server", + "fix_complexity": "1-minute env-var if ComfyUI is already running; otherwise local install", + "env_var": "COMFYUI_SERVER_URL", + "default_url": "http://localhost:8188", + "health_check": "GET /system_stats", + "what_it_unlocks": [ + "free local image generation through ComfyUI workflows", + "free local video generation through ComfyUI workflows", + "community workflow_json/workflow_path execution", + ], +} + + +BUNDLED_MODEL_STACKS: dict[str, list[dict[str, Any]]] = { + "flux2-txt2img": [ + { + "role": "diffusion_model", + "name": "flux2-dev-nvfp4.safetensors", + "quantization": "NVFP4", + "destination_hint": "ComfyUI/models/diffusion_models/", + "download_url": ( + "https://huggingface.co/black-forest-labs/FLUX.2-dev-NVFP4" + ), + }, + { + "role": "text_encoder", + "name": "mistral_3_small_flux2_fp4_mixed.safetensors", + "quantization": "FP4 mixed", + "destination_hint": "ComfyUI/models/text_encoders/", + "download_url": ( + "https://huggingface.co/Comfy-Org/flux2-dev/tree/main/" + "split_files/text_encoders" + ), + }, + { + "role": "vae", + "name": "flux2-vae.safetensors", + "destination_hint": "ComfyUI/models/vae/", + "download_url": ( + "https://huggingface.co/Comfy-Org/flux2-dev/blob/main/" + "split_files/vae/flux2-vae.safetensors" + ), + }, + ], + "wan22-t2v-4step": [ + { + "role": "text_encoder", + 
"name": "umt5_xxl_fp8_e4m3fn_scaled.safetensors", + "quantization": "FP8", + "destination_hint": "ComfyUI/models/text_encoders/", + "download_url": ( + "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/" + "tree/main/split_files/text_encoders" + ), + }, + { + "role": "diffusion_model_high_noise", + "name": "wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors", + "quantization": "FP8", + "destination_hint": "ComfyUI/models/diffusion_models/", + "download_url": ( + "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/" + "blob/main/split_files/diffusion_models/" + "wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors" + ), + }, + { + "role": "diffusion_model_low_noise", + "name": "wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors", + "quantization": "FP8", + "destination_hint": "ComfyUI/models/diffusion_models/", + "download_url": ( + "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/" + "tree/main/split_files/diffusion_models" + ), + }, + { + "role": "vae", + "name": "wan2.2_vae.safetensors", + "destination_hint": "ComfyUI/models/vae/", + "download_url": ( + "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/" + "tree/main/split_files/vae" + ), + }, + { + "role": "lora", + "name": "wan2.2_t2v_lightx2v_4steps_lora_v1.1_high_noise.safetensors", + "strength_model": 1.0, + "destination_hint": "ComfyUI/models/loras/", + "download_url": ( + "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/" + "tree/main/split_files/loras" + ), + }, + { + "role": "lora", + "name": "wan2.2_t2v_lightx2v_4steps_lora_v1.1_low_noise.safetensors", + "strength_model": 1.0, + "destination_hint": "ComfyUI/models/loras/", + "download_url": ( + "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/" + "tree/main/split_files/loras" + ), + }, + ], + "wan22-i2v-4step": [ + { + "role": "text_encoder", + "name": "umt5_xxl_fp8_e4m3fn_scaled.safetensors", + "quantization": "FP8", + "destination_hint": "ComfyUI/models/text_encoders/", + "download_url": ( + 
"https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/" + "tree/main/split_files/text_encoders" + ), + }, + { + "role": "diffusion_model_high_noise", + "name": "wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors", + "quantization": "FP8", + "destination_hint": "ComfyUI/models/diffusion_models/", + "download_url": ( + "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/" + "blob/main/split_files/diffusion_models/" + "wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors" + ), + }, + { + "role": "diffusion_model_low_noise", + "name": "wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors", + "quantization": "FP8", + "destination_hint": "ComfyUI/models/diffusion_models/", + "download_url": ( + "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/" + "tree/main/split_files/diffusion_models" + ), + }, + { + "role": "vae", + "name": "wan_2.1_vae.safetensors", + "destination_hint": "ComfyUI/models/vae/", + "download_url": ( + "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/" + "tree/main/split_files/vae" + ), + }, + { + "role": "lora", + "name": "wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors", + "strength_model": 1.0, + "destination_hint": "ComfyUI/models/loras/", + "download_url": ( + "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/" + "tree/main/split_files/loras" + ), + }, + { + "role": "lora", + "name": "wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors", + "strength_model": 1.0, + "destination_hint": "ComfyUI/models/loras/", + "download_url": ( + "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/" + "tree/main/split_files/loras" + ), + }, + ], +} + + +def workflow_hash(workflow: dict[str, Any]) -> str: + """Return a stable hash of the final workflow JSON submitted to ComfyUI.""" + payload = json.dumps(workflow, sort_keys=True, separators=(",", ":")) + return hashlib.sha256(payload.encode("utf-8")).hexdigest() + + +def model_stack(workflow_key: str | None, inputs: dict[str, Any]) -> list[dict[str, 
Any]]: + """Return bundled or caller-supplied model stack metadata.""" + if workflow_key: + return [dict(item) for item in BUNDLED_MODEL_STACKS[workflow_key]] + stack = inputs.get("workflow_model_stack") + return stack if isinstance(stack, list) else [] + + +def missing_models_payload( + missing: list[str], + *, + workflow_key: str, + workflow_name: str, + operation: str | None = None, +) -> dict[str, Any]: + """Build a machine-readable missing-model error payload.""" + stack_by_name = { + item["name"]: item for item in BUNDLED_MODEL_STACKS.get(workflow_key, []) + } + items = [] + for name in missing: + meta = dict(stack_by_name.get(name, {})) + meta.setdefault("name", name) + meta.setdefault("role", "unknown") + meta.setdefault("destination_hint", "ComfyUI/models/ matching the workflow node") + meta.setdefault("download_url", None) + items.append(meta) + + return { + "provider": "comfyui", + "workflow": workflow_name, + "operation": operation, + "missing_models": items, + "setup_offer": COMFYUI_SETUP_OFFER, + } diff --git a/tools/_comfyui/workflows/flux2-txt2img.json b/tools/_comfyui/workflows/flux2-txt2img.json new file mode 100644 index 00000000..4dbb8981 --- /dev/null +++ b/tools/_comfyui/workflows/flux2-txt2img.json @@ -0,0 +1,96 @@ +{ + "1": { + "class_type": "UNETLoader", + "inputs": { + "unet_name": "flux2-dev-nvfp4.safetensors", + "weight_dtype": "default" + } + }, + "2": { + "class_type": "CLIPLoader", + "inputs": { + "clip_name": "mistral_3_small_flux2_fp4_mixed.safetensors", + "type": "flux2", + "device": "cpu" + } + }, + "3": { + "class_type": "VAELoader", + "inputs": { + "vae_name": "flux2-vae.safetensors" + } + }, + "4": { + "class_type": "CLIPTextEncode", + "inputs": { + "clip": ["2", 0], + "text": "" + } + }, + "5": { + "class_type": "FluxGuidance", + "inputs": { + "conditioning": ["4", 0], + "guidance": 3.5 + } + }, + "6": { + "class_type": "EmptyFlux2LatentImage", + "inputs": { + "width": 1024, + "height": 1024, + "batch_size": 1 + } + }, + "7": 
{ + "class_type": "RandomNoise", + "inputs": { + "noise_seed": 42 + } + }, + "8": { + "class_type": "BasicGuider", + "inputs": { + "model": ["1", 0], + "conditioning": ["5", 0] + } + }, + "9": { + "class_type": "KSamplerSelect", + "inputs": { + "sampler_name": "euler" + } + }, + "10": { + "class_type": "Flux2Scheduler", + "inputs": { + "steps": 20, + "width": 1024, + "height": 1024 + } + }, + "11": { + "class_type": "SamplerCustomAdvanced", + "inputs": { + "noise": ["7", 0], + "guider": ["8", 0], + "sampler": ["9", 0], + "sigmas": ["10", 0], + "latent_image": ["6", 0] + } + }, + "12": { + "class_type": "VAEDecode", + "inputs": { + "samples": ["11", 0], + "vae": ["3", 0] + } + }, + "13": { + "class_type": "SaveImage", + "inputs": { + "images": ["12", 0], + "filename_prefix": "openmontage" + } + } +} diff --git a/tools/_comfyui/workflows/wan22-i2v-4step.json b/tools/_comfyui/workflows/wan22-i2v-4step.json new file mode 100644 index 00000000..83313a67 --- /dev/null +++ b/tools/_comfyui/workflows/wan22-i2v-4step.json @@ -0,0 +1,154 @@ +{ + "84": { + "class_type": "CLIPLoader", + "inputs": { + "clip_name": "umt5_xxl_fp8_e4m3fn_scaled.safetensors", + "type": "wan", + "device": "default" + } + }, + "89": { + "class_type": "CLIPTextEncode", + "inputs": { + "clip": ["84", 0], + "text": "oversaturated, overexposed, static, blurry details, subtitles, style, artwork, painting, still frame, gray overall, worst quality, low quality, JPEG artifacts, ugly, deformed, extra fingers, poorly drawn hands, poorly drawn face, deformed limbs, fused fingers, static frame, cluttered background, three legs, many people in background, walking backwards" + } + }, + "90": { + "class_type": "VAELoader", + "inputs": { + "vae_name": "wan_2.1_vae.safetensors" + } + }, + "93": { + "class_type": "CLIPTextEncode", + "inputs": { + "clip": ["84", 0], + "text": "" + } + }, + "95": { + "class_type": "UNETLoader", + "inputs": { + "unet_name": "wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors", + 
"weight_dtype": "default" + } + }, + "96": { + "class_type": "UNETLoader", + "inputs": { + "unet_name": "wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors", + "weight_dtype": "default" + } + }, + "97": { + "class_type": "LoadImage", + "inputs": { + "image": "" + } + }, + "98": { + "class_type": "WanImageToVideo", + "inputs": { + "width": 640, + "height": 640, + "length": 81, + "batch_size": 1, + "positive": ["93", 0], + "negative": ["89", 0], + "vae": ["90", 0], + "start_image": ["97", 0] + } + }, + "101": { + "class_type": "LoraLoaderModelOnly", + "inputs": { + "model": ["95", 0], + "lora_name": "wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors", + "strength_model": 1.0 + } + }, + "102": { + "class_type": "LoraLoaderModelOnly", + "inputs": { + "model": ["96", 0], + "lora_name": "wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors", + "strength_model": 1.0 + } + }, + "103": { + "class_type": "ModelSamplingSD3", + "inputs": { + "model": ["102", 0], + "shift": 5.0 + } + }, + "104": { + "class_type": "ModelSamplingSD3", + "inputs": { + "model": ["101", 0], + "shift": 5.0 + } + }, + "86": { + "class_type": "KSamplerAdvanced", + "inputs": { + "model": ["104", 0], + "positive": ["98", 0], + "negative": ["98", 1], + "latent_image": ["98", 2], + "add_noise": "enable", + "noise_seed": 42, + "control_after_generate": "randomize", + "steps": 4, + "cfg": 1.0, + "sampler_name": "euler", + "scheduler": "simple", + "start_at_step": 0, + "end_at_step": 2, + "return_with_leftover_noise": "enable" + } + }, + "85": { + "class_type": "KSamplerAdvanced", + "inputs": { + "model": ["103", 0], + "positive": ["98", 0], + "negative": ["98", 1], + "latent_image": ["86", 0], + "add_noise": "disable", + "noise_seed": 0, + "control_after_generate": "fixed", + "steps": 4, + "cfg": 1.0, + "sampler_name": "euler", + "scheduler": "simple", + "start_at_step": 2, + "end_at_step": 4, + "return_with_leftover_noise": "disable" + } + }, + "87": { + "class_type": "VAEDecode", + "inputs": { + 
"samples": ["85", 0], + "vae": ["90", 0] + } + }, + "94": { + "class_type": "CreateVideo", + "inputs": { + "images": ["87", 0], + "fps": 16 + } + }, + "108": { + "class_type": "SaveVideo", + "inputs": { + "video": ["94", 0], + "filename_prefix": "openmontage_i2v", + "format": "auto", + "codec": "auto" + } + } +} diff --git a/tools/_comfyui/workflows/wan22-t2v-4step.json b/tools/_comfyui/workflows/wan22-t2v-4step.json new file mode 100644 index 00000000..f5772aca --- /dev/null +++ b/tools/_comfyui/workflows/wan22-t2v-4step.json @@ -0,0 +1,143 @@ +{ + "1": { + "class_type": "CLIPLoader", + "inputs": { + "clip_name": "umt5_xxl_fp8_e4m3fn_scaled.safetensors", + "type": "wan", + "device": "default" + } + }, + "2": { + "class_type": "CLIPTextEncode", + "inputs": { + "clip": ["1", 0], + "text": "" + } + }, + "3": { + "class_type": "CLIPTextEncode", + "inputs": { + "clip": ["1", 0], + "text": "oversaturated, overexposed, static, blurry details, subtitles, style, artwork, painting, still frame, gray overall, worst quality, low quality, JPEG artifacts, ugly, deformed, extra fingers, poorly drawn hands, poorly drawn face, deformed limbs, fused fingers, static frame, cluttered background, three legs, many people in background, walking backwards" + } + }, + "4": { + "class_type": "VAELoader", + "inputs": { + "vae_name": "wan2.2_vae.safetensors" + } + }, + "5": { + "class_type": "UNETLoader", + "inputs": { + "unet_name": "wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors", + "weight_dtype": "default" + } + }, + "6": { + "class_type": "UNETLoader", + "inputs": { + "unet_name": "wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors", + "weight_dtype": "default" + } + }, + "7": { + "class_type": "LoraLoaderModelOnly", + "inputs": { + "model": ["5", 0], + "lora_name": "wan2.2_t2v_lightx2v_4steps_lora_v1.1_high_noise.safetensors", + "strength_model": 1.0 + } + }, + "8": { + "class_type": "LoraLoaderModelOnly", + "inputs": { + "model": ["6", 0], + "lora_name": 
"wan2.2_t2v_lightx2v_4steps_lora_v1.1_low_noise.safetensors", + "strength_model": 1.0 + } + }, + "9": { + "class_type": "ModelSamplingSD3", + "inputs": { + "model": ["7", 0], + "shift": 5.0 + } + }, + "10": { + "class_type": "ModelSamplingSD3", + "inputs": { + "model": ["8", 0], + "shift": 5.0 + } + }, + "11": { + "class_type": "EmptyLatentImage", + "inputs": { + "width": 832, + "height": 480, + "batch_size": 81 + } + }, + "12": { + "class_type": "KSamplerAdvanced", + "inputs": { + "model": ["9", 0], + "positive": ["2", 0], + "negative": ["3", 0], + "latent_image": ["11", 0], + "add_noise": "enable", + "noise_seed": 42, + "control_after_generate": "randomize", + "steps": 4, + "cfg": 1.0, + "sampler_name": "euler", + "scheduler": "simple", + "start_at_step": 0, + "end_at_step": 2, + "return_with_leftover_noise": "enable" + } + }, + "13": { + "class_type": "KSamplerAdvanced", + "inputs": { + "model": ["10", 0], + "positive": ["2", 0], + "negative": ["3", 0], + "latent_image": ["12", 0], + "add_noise": "disable", + "noise_seed": 0, + "control_after_generate": "fixed", + "steps": 4, + "cfg": 1.0, + "sampler_name": "euler", + "scheduler": "simple", + "start_at_step": 2, + "end_at_step": 4, + "return_with_leftover_noise": "disable" + } + }, + "14": { + "class_type": "VAEDecode", + "inputs": { + "samples": ["13", 0], + "vae": ["4", 0] + } + }, + "15": { + "class_type": "CreateVideo", + "inputs": { + "images": ["14", 0], + "fps": 16 + } + }, + "16": { + "class_type": "SaveVideo", + "inputs": { + "video": ["15", 0], + "filename_prefix": "openmontage_t2v", + "format": "auto", + "codec": "auto" + } + } +} diff --git a/tools/graphics/comfyui_image.py b/tools/graphics/comfyui_image.py new file mode 100644 index 00000000..4c55d94b --- /dev/null +++ b/tools/graphics/comfyui_image.py @@ -0,0 +1,295 @@ +"""ComfyUI image generation via a local or remote ComfyUI server. + +Default workflow: FLUX 2 Dev (NVFP4) with Mistral text encoder. 
+Supports custom workflows via the ``workflow_json`` input. +""" + +from __future__ import annotations + +import json +import time +from pathlib import Path +from typing import Any + +from tools.base_tool import ( + BaseTool, + Determinism, + ExecutionMode, + ResourceProfile, + RetryPolicy, + ToolResult, + ToolRuntime, + ToolStability, + ToolStatus, + ToolTier, +) +from tools._comfyui.client import ComfyUIClient, ComfyUIError +from tools._comfyui.metadata import ( + BUNDLED_MODEL_STACKS, + COMFYUI_SETUP_OFFER, + missing_models_payload, + model_stack, + workflow_hash, +) + +_WORKFLOWS = Path(__file__).resolve().parent.parent / "_comfyui" / "workflows" + +# Models required by the bundled flux2-txt2img workflow +_REQUIRED_MODELS = [ + "flux2-dev-nvfp4.safetensors", + "mistral_3_small_flux2_fp4_mixed.safetensors", + "flux2-vae.safetensors", +] + + +class ComfyUIImage(BaseTool): + name = "comfyui_image" + version = "0.1.0" + tier = ToolTier.GENERATE + capability = "image_generation" + provider = "comfyui" + stability = ToolStability.EXPERIMENTAL + execution_mode = ExecutionMode.SYNC + determinism = Determinism.SEEDED + runtime = ToolRuntime.LOCAL_GPU + + dependencies = [] # checked at runtime via server health + setup_offer = COMFYUI_SETUP_OFFER + install_instructions = ( + "Start a ComfyUI server and set COMFYUI_SERVER_URL " + "(default http://localhost:8188).\n" + "See https://github.com/comfyanonymous/ComfyUI for setup." 
+ ) + agent_skills = ["comfyui", "flux-best-practices"] + + capabilities = ["text_to_image"] + supports = { + "seed": True, + "custom_size": True, + "custom_workflow": True, + "custom_output_node": True, + "offline": True, + } + best_for = [ + "local GPU generation without API costs", + "Blackwell / DGX Spark hardware where diffusers is unsupported", + "full control over sampling via custom ComfyUI workflows", + ] + not_good_for = [ + "setups without a running ComfyUI server", + "CPU-only machines", + ] + fallback = "flux_image" + fallback_tools = ["flux_image", "local_diffusion", "openai_image"] + + input_schema = { + "type": "object", + "required": ["prompt"], + "properties": { + "prompt": {"type": "string", "description": "Text prompt for image generation"}, + "width": {"type": "integer", "default": 1024}, + "height": {"type": "integer", "default": 1024}, + "steps": {"type": "integer", "default": 20}, + "guidance": {"type": "number", "default": 3.5}, + "seed": {"type": "integer", "description": "Random if omitted"}, + "output_path": {"type": "string", "description": "Where to save the image"}, + "workflow_json": { + "type": "string", + "description": "Optional full ComfyUI workflow JSON. Requires output_node.", + }, + "workflow_path": { + "type": "string", + "description": "Optional path to a ComfyUI workflow JSON file. Requires output_node.", + }, + "output_node": { + "type": "string", + "description": "ComfyUI output node ID for custom workflow_json/workflow_path.", + }, + "workflow_name": { + "type": "string", + "description": "Optional human-readable provenance label for a custom workflow.", + }, + "workflow_model": { + "type": "string", + "description": "Optional model/provenance label for a custom workflow.", + }, + "workflow_model_stack": { + "type": "array", + "description": ( + "Optional provenance metadata for custom workflow dependencies. " + "Items should include name, role, quantization, and LoRA strengths when known." 
+ ), + "items": {"type": "object"}, + }, + }, + } + + resource_profile = ResourceProfile( + cpu_cores=2, ram_mb=8000, vram_mb=8000, disk_mb=500, network_required=False, + ) + retry_policy = RetryPolicy(max_retries=1, retryable_errors=["timeout"]) + idempotency_key_fields = ["prompt", "width", "height", "steps", "seed"] + side_effects = ["writes image file to output_path"] + user_visible_verification = ["Inspect generated image for quality and prompt adherence"] + + def __init__(self) -> None: + self._client = ComfyUIClient() + + def get_status(self) -> ToolStatus: + if not self._client.is_available(): + return ToolStatus.UNAVAILABLE + _, missing = self._client.check_models(_REQUIRED_MODELS) + if missing: + return ToolStatus.DEGRADED + return ToolStatus.AVAILABLE + + def estimate_cost(self, inputs: dict[str, Any]) -> float: + return 0.0 + + def estimate_runtime(self, inputs: dict[str, Any]) -> float: + return float(inputs.get("steps", 20)) * 1.5 + + def get_info(self) -> dict[str, Any]: + info = super().get_info() + info["setup_offer"] = self.setup_offer + info["bundled_model_stack"] = BUNDLED_MODEL_STACKS["flux2-txt2img"] + return info + + def execute(self, inputs: dict[str, Any]) -> ToolResult: + custom_workflow = bool(inputs.get("workflow_json") or inputs.get("workflow_path")) + if custom_workflow and not inputs.get("output_node"): + return ToolResult( + success=False, + error=( + "Custom ComfyUI workflows require output_node so OpenMontage " + "knows which ComfyUI node to download artifacts from." 
+ ), + ) + + if not self._client.is_available(): + return ToolResult( + success=False, + error=self._client.unavailable_reason(), + ) + + if not custom_workflow: + _, missing = self._client.check_models(_REQUIRED_MODELS) + if missing: + return ToolResult( + success=False, + data=missing_models_payload( + missing, + workflow_key="flux2-txt2img", + workflow_name="flux2-txt2img.json", + ), + error=( + f"ComfyUI server is running but missing required models: " + f"{', '.join(missing)}.\n" + f"See data.missing_models for destination hints and download URLs." + ), + ) + + start = time.time() + seed = inputs["seed"] if inputs.get("seed") is not None else ComfyUIClient.random_seed()  # seed=0 is a valid seed + width = inputs.get("width", 1024) + height = inputs.get("height", 1024) + steps = inputs.get("steps", 20) + guidance = inputs.get("guidance", 3.5) + output_path = Path(inputs.get("output_path", f"comfyui_image_{seed}.png")) + + try: + if custom_workflow: + workflow = self._load_custom_workflow(inputs) + output_node = str(inputs["output_node"]) + else: + workflow = ComfyUIClient.load_workflow(_WORKFLOWS / "flux2-txt2img.json") + workflow = ComfyUIClient.patch_workflow(workflow, { + "4": {"text": inputs["prompt"]}, + "5": {"guidance": guidance}, + "6": {"width": width, "height": height, "batch_size": 1}, + "7": {"noise_seed": seed}, + "10": {"steps": steps, "width": width, "height": height}, + "13": {"filename_prefix": output_path.stem}, + }) + output_node = "13" + + provenance = self._workflow_provenance( + inputs, custom_workflow, output_node, workflow + ) + paths = self._client.generate( + workflow, output_node=output_node, dest=output_path, timeout=600, + ) + + except ComfyUIError as exc: + return ToolResult(success=False, error=str(exc)) + except Exception as exc: + return ToolResult(success=False, error=f"ComfyUI image generation failed: {exc}") + + model_name = self._model_name(inputs, custom_workflow) + return ToolResult( + success=True, + data={ + "provider": "comfyui", + "model": model_name, + "prompt": 
inputs["prompt"], + "width": width, + "height": height, + "steps": steps, + "guidance": guidance, + "output": str(paths[0]), + "format": "png", + "workflow_provenance": provenance, + }, + artifacts=[str(p) for p in paths], + cost_usd=0.0, + duration_seconds=round(time.time() - start, 2), + seed=seed, + model=model_name, + ) + + @staticmethod + def _load_custom_workflow(inputs: dict[str, Any]) -> dict: + if inputs.get("workflow_json"): + return json.loads(inputs["workflow_json"]) + return ComfyUIClient.load_workflow(Path(inputs["workflow_path"])) + + @staticmethod + def _model_name(inputs: dict[str, Any], custom_workflow: bool) -> str: + if not custom_workflow: + return "flux2-dev-nvfp4" + return ( + inputs.get("workflow_model") + or inputs.get("model") + or inputs.get("workflow_name") + or "custom-comfyui-workflow" + ) + + @staticmethod + def _workflow_provenance( + inputs: dict[str, Any], + custom_workflow: bool, + output_node: str, + workflow: dict[str, Any], + ) -> dict[str, Any]: + if not custom_workflow: + return { + "source": "bundled", + "workflow": "flux2-txt2img.json", + "workflow_hash_sha256": workflow_hash(workflow), + "model_stack": model_stack("flux2-txt2img", inputs), + "output_node": output_node, + } + return { + "source": "user_supplied", + "workflow_name": inputs.get("workflow_name"), + "workflow_path": inputs.get("workflow_path"), + "model": inputs.get("workflow_model") or inputs.get("model"), + "workflow_hash_sha256": workflow_hash(workflow), + "model_stack": model_stack(None, inputs), + "model_stack_source": ( + "caller_supplied" + if inputs.get("workflow_model_stack") + else "unknown_custom_workflow" + ), + "output_node": output_node, + } diff --git a/tools/tool_registry.py b/tools/tool_registry.py index 4c61d68a..3d14a332 100644 --- a/tools/tool_registry.py +++ b/tools/tool_registry.py @@ -269,6 +269,7 @@ def provider_menu(self) -> dict[str, dict[str, Any]]: "provider": tool.provider, "runtime": tool.runtime.value, "best_for": tool.best_for, + 
"dependencies": info.get("dependencies", []), "install_instructions": tool.install_instructions, "status": status.value, } @@ -278,6 +279,10 @@ def provider_menu(self) -> dict[str, dict[str, Any]]: "render_engines", "remotion_note", "provider_matrix", + "setup_offer", + "operation_statuses", + "resource_profiles", + "resource_profile_note", ): if extra_key in info: entry[extra_key] = info[extra_key] @@ -385,6 +390,40 @@ def provider_menu_summary(self) -> dict[str, Any]: setup_offers: list[dict[str, Any]] = [] for cap, bucket in menu.items(): for entry in bucket.get("unavailable", []): + offer = entry.get("setup_offer") + if offer: + setup_offers.append( + { + "capability": cap, + "tool": entry.get("name"), + "provider": entry.get("provider"), + "runtime": entry.get("runtime"), + "install_instructions": entry.get("install_instructions") or "", + **offer, + } + ) + continue + + env_vars = [ + dep[4:] + for dep in entry.get("dependencies", []) + if isinstance(dep, str) and dep.startswith("env:") + ] + if env_vars: + setup_offers.append( + { + "capability": cap, + "tool": entry.get("name"), + "provider": entry.get("provider"), + "runtime": entry.get("runtime"), + "kind": "env_var", + "fix_complexity": "1-minute env-var", + "env_vars": env_vars, + "install_instructions": entry.get("install_instructions") or "", + } + ) + continue + hint = entry.get("install_instructions") or "" # Heuristic: 1-minute fixes mention an env var or API key. 
if any(k in hint.lower() for k in ["api key", "env", "_key=", "_api"]): @@ -393,10 +432,17 @@ def provider_menu_summary(self) -> dict[str, Any]: "capability": cap, "tool": entry.get("name"), "provider": entry.get("provider"), + "runtime": entry.get("runtime"), "install_instructions": hint, } ) + for entry in bucket.get("available", []) + bucket.get("unavailable", []): + if entry.get("resource_profile_note"): + runtime_warnings.append( + f"{entry.get('name')}: {entry.get('resource_profile_note')}" + ) + result = { "composition_runtimes": comp_runtimes, "capabilities": capabilities, diff --git a/tools/video/comfyui_video.py b/tools/video/comfyui_video.py new file mode 100644 index 00000000..e9bb0333 --- /dev/null +++ b/tools/video/comfyui_video.py @@ -0,0 +1,473 @@ +"""ComfyUI video generation via a local or remote ComfyUI server. + +Supports text-to-video and image-to-video using WAN 2.2 14B with +4-step LightX2V LoRA acceleration. Custom workflows are accepted +via the ``workflow_json`` input. 
+""" + +from __future__ import annotations + +import json +import time +from pathlib import Path +from typing import Any + +import requests + +from tools.base_tool import ( + BaseTool, + Determinism, + ExecutionMode, + ResourceProfile, + RetryPolicy, + ToolResult, + ToolRuntime, + ToolStability, + ToolStatus, + ToolTier, +) +from tools._comfyui.client import ComfyUIClient, ComfyUIError +from tools._comfyui.metadata import ( + BUNDLED_MODEL_STACKS, + COMFYUI_SETUP_OFFER, + missing_models_payload, + model_stack, + workflow_hash, +) + +_WORKFLOWS = Path(__file__).resolve().parent.parent / "_comfyui" / "workflows" + +# Output node IDs in the bundled workflows +_T2V_OUTPUT_NODE = "16" +_I2V_OUTPUT_NODE = "108" + +# Models required by the bundled WAN 2.2 workflows +_REQUIRED_MODELS_COMMON = [ + "umt5_xxl_fp8_e4m3fn_scaled.safetensors", +] +_REQUIRED_MODELS_I2V = [ + *_REQUIRED_MODELS_COMMON, + "wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors", + "wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors", + "wan_2.1_vae.safetensors", + "wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors", + "wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors", +] +_REQUIRED_MODELS_T2V = [ + *_REQUIRED_MODELS_COMMON, + "wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors", + "wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors", + "wan2.2_vae.safetensors", + "wan2.2_t2v_lightx2v_4steps_lora_v1.1_high_noise.safetensors", + "wan2.2_t2v_lightx2v_4steps_lora_v1.1_low_noise.safetensors", +] + +_RESOURCE_PROFILES = { + "provider_floor": { + "vram_mb": 8000, + "ram_mb": 16000, + "applies_to": ( + "ComfyUI provider availability and low-VRAM custom workflows. " + "Actual requirements depend on workflow_json/workflow_path." + ), + }, + "bundled_wan22_14b_fp8": { + "vram_mb": 16000, + "ram_mb": 32000, + "applies_to": ( + "Bundled WAN 2.2 14B FP8 T2V/I2V workflows. This is not a " + "ComfyUI provider-wide requirement." 
+ ), + }, + "low_vram_custom_workflows": { + "vram_mb": "8000-12000", + "ram_mb": "16000-32000", + "examples": [ + "Wan 2.1 1.3B", + "LTX-Video / LTXV FP8 or quantized workflows", + "Wan 2.2 GGUF / quantized community workflows", + ], + }, +} + + +class ComfyUIVideo(BaseTool): + name = "comfyui_video" + version = "0.1.0" + tier = ToolTier.GENERATE + capability = "video_generation" + provider = "comfyui" + stability = ToolStability.EXPERIMENTAL + execution_mode = ExecutionMode.SYNC + determinism = Determinism.SEEDED + runtime = ToolRuntime.LOCAL_GPU + + dependencies = [] + setup_offer = COMFYUI_SETUP_OFFER + install_instructions = ( + "Start a ComfyUI server and set COMFYUI_SERVER_URL " + "(default http://localhost:8188).\n" + "Requires WAN 2.2 models and LightX2V LoRAs in ComfyUI's model directory." + ) + agent_skills = ["comfyui", "ai-video-gen", "ltx2"] + + capabilities = ["text_to_video", "image_to_video"] + supports = { + "seed": True, + "reference_image": True, + "custom_workflow": True, + "custom_output_node": True, + "offline": True, + } + best_for = [ + "local GPU video generation without API costs", + "Blackwell / DGX Spark hardware where diffusers is unsupported", + "image-to-video with WAN 2.2 14B (4-step accelerated)", + "text-to-video with WAN 2.2 14B (4-step accelerated)", + "custom low-VRAM ComfyUI workflows on 8GB-12GB GPUs", + ] + not_good_for = [ + "setups without a running ComfyUI server", + "CPU-only machines", + "running the bundled WAN 2.2 14B FP8 workflows on GPUs below 16GB VRAM", + ] + fallback = "wan_video" + fallback_tools = ["wan_video", "hunyuan_video", "ltx_video_local", "kling_video"] + + input_schema = { + "type": "object", + "required": ["prompt"], + "properties": { + "prompt": {"type": "string", "description": "Text prompt for video generation"}, + "operation": { + "type": "string", + "enum": ["text_to_video", "image_to_video"], + "default": "text_to_video", + }, + "reference_image_path": { + "type": "string", + "description": 
"Local path to reference image (for image_to_video)", + }, + "reference_image_url": { + "type": "string", + "description": "URL of reference image (for image_to_video, downloaded first)", + }, + "width": {"type": "integer", "default": 832, "description": "T2V default 832, I2V default 640"}, + "height": {"type": "integer", "default": 480, "description": "T2V default 480, I2V default 640"}, + "num_frames": {"type": "integer", "default": 81, "description": "81 frames = 5s at 16fps"}, + "seed": {"type": "integer", "description": "Random if omitted"}, + "output_path": {"type": "string", "description": "Where to save the video"}, + "workflow_json": { + "type": "string", + "description": "Optional full ComfyUI workflow JSON. Requires output_node.", + }, + "workflow_path": { + "type": "string", + "description": "Optional path to a ComfyUI workflow JSON file. Requires output_node.", + }, + "output_node": { + "type": "string", + "description": "ComfyUI output node ID for custom workflow_json/workflow_path.", + }, + "workflow_name": { + "type": "string", + "description": "Optional human-readable provenance label for a custom workflow.", + }, + "workflow_model": { + "type": "string", + "description": "Optional model/provenance label for a custom workflow.", + }, + "workflow_model_stack": { + "type": "array", + "description": ( + "Optional provenance metadata for custom workflow dependencies. " + "Items should include name, role, quantization, scheduler, " + "and LoRA strengths when known." 
+ ), + "items": {"type": "object"}, + }, + }, + } + + resource_profile = ResourceProfile( + cpu_cores=2, ram_mb=16000, vram_mb=8000, disk_mb=2000, network_required=False, + ) + retry_policy = RetryPolicy(max_retries=1, retryable_errors=["timeout"]) + idempotency_key_fields = ["prompt", "operation", "width", "height", "num_frames", "seed"] + side_effects = ["writes video file to output_path"] + user_visible_verification = ["Watch generated clip for motion coherence and artifacts"] + + def __init__(self) -> None: + self._client = ComfyUIClient() + + def get_status(self) -> ToolStatus: + if not self._client.is_available(): + return ToolStatus.UNAVAILABLE + statuses = self.operation_statuses() + if any(status == "available" for status in statuses.values()): + return ToolStatus.AVAILABLE + if statuses: + return ToolStatus.DEGRADED + return ToolStatus.UNAVAILABLE + + def operation_statuses(self) -> dict[str, str]: + """Return per-operation readiness for selector routing and preflight.""" + if not self._client.is_available(): + return { + "text_to_video": "unavailable", + "image_to_video": "unavailable", + } + + _, missing_t2v = self._client.check_models(_REQUIRED_MODELS_T2V) + _, missing_i2v = self._client.check_models(_REQUIRED_MODELS_I2V) + return { + "text_to_video": "available" if not missing_t2v else "degraded", + "image_to_video": "available" if not missing_i2v else "degraded", + } + + def is_operation_available(self, operation: str) -> bool: + if operation not in {"text_to_video", "image_to_video"}: + return False + return self.operation_statuses().get(operation) == "available" + + def get_info(self) -> dict[str, Any]: + info = super().get_info() + info["operation_statuses"] = self.operation_statuses() + info["resource_profiles"] = _RESOURCE_PROFILES + info["setup_offer"] = self.setup_offer + info["bundled_model_stacks"] = { + "text_to_video": BUNDLED_MODEL_STACKS["wan22-t2v-4step"], + "image_to_video": BUNDLED_MODEL_STACKS["wan22-i2v-4step"], + } + 
info["resource_profile_note"] = ( + "The top-level resource_profile is a ComfyUI provider floor, not a " + "promise that every workflow fits 8GB VRAM. Bundled WAN 2.2 14B FP8 " + "workflows recommend 16GB VRAM; custom low-VRAM workflows can target " + "8GB-12GB depending on model, quantization, resolution, and frame count." + ) + return info + + def estimate_cost(self, inputs: dict[str, Any]) -> float: + return 0.0 + + def estimate_runtime(self, inputs: dict[str, Any]) -> float: + operation = inputs.get("operation", "text_to_video") + if operation == "image_to_video": + return 210.0 # ~3.5 min + return 240.0 # ~4 min + + def execute(self, inputs: dict[str, Any]) -> ToolResult: + custom_workflow = bool(inputs.get("workflow_json") or inputs.get("workflow_path")) + if custom_workflow and not inputs.get("output_node"): + return ToolResult( + success=False, + error=( + "Custom ComfyUI workflows require output_node so OpenMontage " + "knows which ComfyUI node to download artifacts from." + ), + ) + + if not self._client.is_available(): + return ToolResult( + success=False, + error=self._client.unavailable_reason(), + ) + + operation = inputs.get("operation", "text_to_video") + + if not custom_workflow: + required = _REQUIRED_MODELS_I2V if operation == "image_to_video" else _REQUIRED_MODELS_T2V + _, missing = self._client.check_models(required) + if missing: + workflow_key = ( + "wan22-i2v-4step" + if operation == "image_to_video" + else "wan22-t2v-4step" + ) + return ToolResult( + success=False, + data=missing_models_payload( + missing, + workflow_key=workflow_key, + workflow_name=f"{workflow_key}.json", + operation=operation, + ), + error=( + f"ComfyUI server is running but missing models for {operation}: " + f"{', '.join(missing)}.\n" + f"See data.missing_models for destination hints and download URLs." 
+ ), + ) + start = time.time() + seed = inputs["seed"] if inputs.get("seed") is not None else ComfyUIClient.random_seed()  # seed=0 is a valid seed + output_path = Path( + inputs.get("output_path", f"comfyui_video_{operation}_{seed}.mp4") + ) + + try: + if custom_workflow: + workflow = self._load_custom_workflow(inputs) + output_node = str(inputs["output_node"]) + elif operation == "image_to_video": + workflow, output_node = self._build_i2v(inputs, seed, output_path) + else: + workflow, output_node = self._build_t2v(inputs, seed, output_path) + + provenance = self._workflow_provenance( + inputs, custom_workflow, output_node, operation, workflow + ) + paths = self._client.generate( + workflow, + output_node=output_node, + dest=output_path, + timeout=900, + interval=10, + ) + + except ComfyUIError as exc: + return ToolResult(success=False, error=str(exc)) + except Exception as exc: + return ToolResult(success=False, error=f"ComfyUI video generation failed: {exc}") + + width = inputs.get("width", 832 if operation == "text_to_video" else 640) + height = inputs.get("height", 480 if operation == "text_to_video" else 640) + num_frames = inputs.get("num_frames", 81) + + model_name = self._model_name(inputs, custom_workflow) + return ToolResult( + success=True, + data={ + "provider": "comfyui", + "model": model_name, + "prompt": inputs["prompt"], + "operation": operation, + "width": width, + "height": height, + "num_frames": num_frames, + "fps": 16, + "duration_seconds": round(num_frames / 16, 2), + "output": str(paths[0]), + "format": "mp4", + "workflow_provenance": provenance, + }, + artifacts=[str(p) for p in paths], + cost_usd=0.0, + duration_seconds=round(time.time() - start, 2), + seed=seed, + model=model_name, + ) + + # ------------------------------------------------------------------ + # Workflow builders + # ------------------------------------------------------------------ + + def _build_t2v( + self, inputs: dict[str, Any], seed: int, output_path: Path + ) -> tuple[dict, str]: + width = inputs.get("width", 832) + height 
= inputs.get("height", 480) + num_frames = inputs.get("num_frames", 81) + + workflow = ComfyUIClient.load_workflow(_WORKFLOWS / "wan22-t2v-4step.json") + workflow = ComfyUIClient.patch_workflow(workflow, { + "2": {"text": inputs["prompt"]}, + "11": {"width": width, "height": height, "batch_size": num_frames}, + "12": {"noise_seed": seed}, + "16": {"filename_prefix": output_path.stem}, + }) + return workflow, _T2V_OUTPUT_NODE + + def _build_i2v( + self, inputs: dict[str, Any], seed: int, output_path: Path + ) -> tuple[dict, str]: + width = inputs.get("width", 640) + height = inputs.get("height", 640) + num_frames = inputs.get("num_frames", 81) + + # Resolve reference image + ref_path = inputs.get("reference_image_path") + ref_url = inputs.get("reference_image_url") + + if ref_url and not ref_path: + # Download to a temp location + resp = requests.get(ref_url, timeout=60) + resp.raise_for_status() + ref_path = str(output_path.with_suffix(".ref.png")) + Path(ref_path).parent.mkdir(parents=True, exist_ok=True) + Path(ref_path).write_bytes(resp.content) + + if not ref_path: + raise ComfyUIError( + "image_to_video requires reference_image_path or reference_image_url" + ) + + # Upload to ComfyUI + upload_name = f"om_{output_path.stem}.png" + server_name = self._client.upload_image(Path(ref_path), upload_name) + + workflow = ComfyUIClient.load_workflow(_WORKFLOWS / "wan22-i2v-4step.json") + workflow = ComfyUIClient.patch_workflow(workflow, { + "93": {"text": inputs["prompt"]}, + "97": {"image": server_name}, + "98": {"width": width, "height": height, "length": num_frames}, + "86": {"noise_seed": seed}, + "108": {"filename_prefix": output_path.stem}, + }) + return workflow, _I2V_OUTPUT_NODE + + @staticmethod + def _load_custom_workflow(inputs: dict[str, Any]) -> dict: + if inputs.get("workflow_json"): + return json.loads(inputs["workflow_json"]) + return ComfyUIClient.load_workflow(Path(inputs["workflow_path"])) + + @staticmethod + def _model_name(inputs: dict[str, Any], 
custom_workflow: bool) -> str: + if not custom_workflow: + return "wan2.2-14b-fp8-4step" + return ( + inputs.get("workflow_model") + or inputs.get("model") + or inputs.get("workflow_name") + or "custom-comfyui-workflow" + ) + + @staticmethod + def _workflow_provenance( + inputs: dict[str, Any], + custom_workflow: bool, + output_node: str, + operation: str, + workflow: dict[str, Any], + ) -> dict[str, Any]: + if not custom_workflow: + workflow_key = ( + "wan22-i2v-4step" + if operation == "image_to_video" + else "wan22-t2v-4step" + ) + return { + "source": "bundled", + "workflow": ( + "wan22-i2v-4step.json" + if operation == "image_to_video" + else "wan22-t2v-4step.json" + ), + "workflow_hash_sha256": workflow_hash(workflow), + "model_stack": model_stack(workflow_key, inputs), + "output_node": output_node, + } + return { + "source": "user_supplied", + "workflow_name": inputs.get("workflow_name"), + "workflow_path": inputs.get("workflow_path"), + "model": inputs.get("workflow_model") or inputs.get("model"), + "workflow_hash_sha256": workflow_hash(workflow), + "model_stack": model_stack(None, inputs), + "model_stack_source": ( + "caller_supplied" + if inputs.get("workflow_model_stack") + else "unknown_custom_workflow" + ), + "output_node": output_node, + } diff --git a/tools/video/video_selector.py b/tools/video/video_selector.py index bc864b8d..0ae77c68 100644 --- a/tools/video/video_selector.py +++ b/tools/video/video_selector.py @@ -54,6 +54,12 @@ class VideoSelector(BaseTool): "enum": ["text_to_video", "image_to_video", "reference_to_video", "rank"], "default": "text_to_video", }, + "target_operation": { + "type": "string", + "enum": ["text_to_video", "image_to_video", "reference_to_video"], + "description": "Operation to score when operation='rank'.", + "default": "text_to_video", + }, "aspect_ratio": { "type": "string", "enum": ["16:9", "9:16", "1:1"], @@ -137,11 +143,13 @@ def estimate_runtime(self, inputs: dict[str, object]) -> float: def execute(self, inputs: 
dict[str, object]) -> ToolResult: from lib.scoring import rank_providers - task_context = self._prepare_task_context(inputs) candidates = self._providers() # Rank mode — return scored provider rankings without generating if inputs.get("operation") == "rank": + rank_inputs = self._rank_inputs(inputs) + task_context = self._prepare_task_context(rank_inputs) + candidates = self._filter_candidates(rank_inputs, candidates) rankings = rank_providers(candidates, task_context) return ToolResult( success=True, @@ -153,6 +161,7 @@ def execute(self, inputs: dict[str, object]) -> ToolResult: ) # Normal generation — use scored selection + task_context = self._prepare_task_context(inputs) tool, score = self._select_best_tool(inputs, candidates, task_context) if tool is None: return ToolResult(success=False, error="No video generation provider available.") @@ -252,6 +261,12 @@ def _prepare_task_context(self, inputs: dict[str, object]) -> dict[str, object]: operation=str(inputs.get("operation", "text_to_video")), ) + @staticmethod + def _rank_inputs(inputs: dict[str, object]) -> dict[str, object]: + rank_inputs = dict(inputs) + rank_inputs["operation"] = inputs.get("target_operation", "text_to_video") + return rank_inputs + @staticmethod def _tool_context_payload(tool: BaseTool) -> dict[str, object]: info = tool.get_info() @@ -285,23 +300,36 @@ def _filter_candidates( ) -> list[BaseTool]: operation = inputs.get("operation", "text_to_video") if operation == "rank": - return candidates + operation = inputs.get("target_operation", "text_to_video") filtered: list[BaseTool] = [] + matched_operation = False for tool in candidates: supports = getattr(tool, "supports", {}) props = getattr(tool, "input_schema", {}).get("properties", {}) if operation == "image_to_video": if supports.get("image_to_video") or "image_url" in props or "reference_image_url" in props: - filtered.append(tool) + matched_operation = True + if self._operation_ready(tool, "image_to_video"): + filtered.append(tool) 
continue if operation == "reference_to_video": if supports.get("reference_to_video") or "reference_image_urls" in props: + matched_operation = True filtered.append(tool) continue - filtered.append(tool) + matched_operation = True + if self._operation_ready(tool, str(operation)): + filtered.append(tool) - return filtered or candidates + return filtered if matched_operation else candidates + + @staticmethod + def _operation_ready(tool: BaseTool, operation: str) -> bool: + checker = getattr(tool, "is_operation_available", None) + if not callable(checker): + return True + return bool(checker(operation))