feat: wandler (onnx) engine with cuda acceleration #156

Merged
TimPietruskyRunPod merged 5 commits into main from feat/engine-filter-onnx on Apr 15, 2026

TimPietruskyRunPod (Collaborator) commented Apr 13, 2026

Summary

  • Adds Wandler as a third inference engine alongside llama.cpp and MLX
  • End-to-end support: CLI --engine flag with auto-detection, Docker image install, entrypoint with CUDA acceleration, a2go doctor install on Mac, and deploy output on the site
  • Separates LFM 2 and LFM 2.5 into distinct catalog entries, adds LFM 2.5 GGUF variant

Changes

CLI (a2go/cmd/):

  • --engine flag on a2go run with auto-detection from catalog
  • execRunWandler() run path with health checks and gateway setup
  • StartWandler() service function
  • Wandler engine recognized in model validation and listing
  • a2go doctor installs wandler via npm on Mac
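
The engine auto-detection listed above could be sketched roughly as follows. This is an illustration only: the `CatalogModel` type, the `resolveEngine` helper, and the model IDs are hypothetical names that do not come from the PR, and the rule that an explicit --engine value takes precedence over the catalog is an assumption about how the flag behaves.

```go
package main

import "fmt"

// CatalogModel is a hypothetical, trimmed-down catalog entry.
type CatalogModel struct {
	ID     string
	Engine string // "llamacpp", "mlx", or "wandler"
}

// knownEngines lists the three engines named in the PR.
var knownEngines = map[string]bool{"llamacpp": true, "mlx": true, "wandler": true}

// resolveEngine picks the engine for modelID.
// Assumption: a non-empty flag value wins; otherwise the catalog decides.
func resolveEngine(flag, modelID string, catalog []CatalogModel) (string, error) {
	if flag != "" {
		if !knownEngines[flag] {
			return "", fmt.Errorf("unknown engine %q", flag)
		}
		return flag, nil
	}
	for _, m := range catalog {
		if m.ID == modelID {
			return m.Engine, nil
		}
	}
	return "", fmt.Errorf("model %q not found in catalog", modelID)
}

func main() {
	// Placeholder catalog entries, not real registry contents.
	catalog := []CatalogModel{
		{ID: "example/onnx-model", Engine: "wandler"},
		{ID: "example/gguf-model", Engine: "llamacpp"},
	}
	engine, err := resolveEngine("", "example/onnx-model", catalog)
	fmt.Println(engine, err)
}
```

Keeping the lookup in one function means model validation and the run path can share the same answer, which matches the PR's note that Wandler is recognized in both validation and listing.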

Docker (Dockerfile.unified, scripts/entrypoint-unified.sh):

  • Wandler installed via npm install -g wandler@latest
  • Entrypoint detects NVIDIA GPU and passes --device cuda
  • LD_LIBRARY_PATH set for cuDNN so onnxruntime-node can use CUDA execution provider

Registry:

  • Wandler entry in engines.json
  • LFM 2.5 1.2B GGUF variant added (LiquidAI/LFM2.5-1.2B-Instruct-GGUF)
  • LFM 2 / LFM 2.5 separated into distinct families
  • Display names fixed: "LFM 2", "LFM 2.5" (proper spacing)

Site:

  • Deploy output generates --engine wandler when Wandler models are selected
  • Wandler variant preserved in deploy commands (not overridden by platform resolution)

Test plan

  • Go CLI builds and vets clean
  • TypeScript compiles clean
  • Shell entrypoint syntax valid
  • Site: engine filter shows wandler models, engine pills switch variants, deploy output shows correct commands
  • Runpod RTX 5090: wandler --device cuda serves LFM 2.5 ONNX with CUDA acceleration
  • Runpod RTX 5090: Hermes gateway proxies to Wandler successfully

vercel Bot commented Apr 13, 2026

Project: a2go; Deployment: Ready; Actions: Preview, Comment; Updated (UTC): Apr 15, 2026 9:29am

- Add engine filter UI (platform, engine, type filter groups in popover)
- Add 9 ONNX model registry configs (wandler engine)
- Add engine-resolver.ts for centralized platform+engine resolution
- Add platform-state.ts for per-platform draft persistence
- Redesign platform selector as deselectable pills (no web platform)
- Add EngineRow to SelectedModels for switching between llamacpp/mlx/wandler
- Add engineCategory field to CatalogModel via resolveEngine()
- Add wandler to model.schema.json engine enum
- Refactor ModelCatalog to use search+filter popovers
- Update url-state.ts to use platform param (backward compat with os param)
- add wandler to engines.json, Dockerfile.unified, and docker entrypoint
- add --engine flag to cli with auto-detection from catalog
- add StartWandler() service and execRunWandler() run path
- install wandler in a2go doctor on mac
- entrypoint detects nvidia gpu and passes --device cuda
- set LD_LIBRARY_PATH for cudnn so onnxruntime-node can use cuda
- deploy output generates --engine wandler when wandler models selected
- separate lfm 2 and lfm 2.5 into distinct catalog entries
- add lfm 2.5 1.2b gguf variant (LiquidAI/LFM2.5-1.2B-Instruct-GGUF)
- fix model display names: "LFM2" -> "LFM 2", "LFM2.5" -> "LFM 2.5"
TimPietruskyRunPod changed the title from "feat(site): engine filter system with onnx/wandler model support" to "feat: wandler (onnx) engine with cuda acceleration" on Apr 15, 2026
- wandler runs llm + stt in one process: entrypoint scans for wandler
  audio model and passes --stt alongside --llm
- audio service case skips when wandler llm already handles stt
- StartWandler() and execRunWandler() now accept stt model
- remove kokoro-82m-onnx (wandler doesn't support tts)
the agent tab prompt now says "with the wandler engine" when wandler
models are selected, so the a2go skill knows which engine to use.
- agent prompt now shows engine for all engines (llama.cpp, mlx, wandler)
  not just wandler — future-proofs for additional engines
- update a2go skill in repo with --engine flag docs and wandler section
TimPietruskyRunPod merged commit 17c05f2 into main on Apr 15, 2026
11 of 12 checks passed