feat: wandler (onnx) engine with cuda acceleration #156
Merged
- Add engine filter UI (platform, engine, type filter groups in popover)
- Add 9 ONNX model registry configs (wandler engine)
- Add engine-resolver.ts for centralized platform+engine resolution
- Add platform-state.ts for per-platform draft persistence
- Redesign platform selector as deselectable pills (no web platform)
- Add EngineRow to SelectedModels for switching between llamacpp/mlx/wandler
- Add engineCategory field to CatalogModel via resolveEngine()
- Add wandler to model.schema.json engine enum
- Refactor ModelCatalog to use search+filter popovers
- Update url-state.ts to use platform param (backward compat with os param)
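For reviewers, the backward-compatible `platform`/`os` handling in the last bullet can be pictured roughly as follows. This is a minimal sketch, not the code in url-state.ts: the function name `readPlatformParam` and the example values `linux` and `mac` are assumptions made for illustration.

```typescript
// Hypothetical helper (not from the PR): reads the platform from a query string,
// preferring the new `platform` param and falling back to the legacy `os` param.
function readPlatformParam(search: string): string | null {
  const params = new URLSearchParams(search);
  // `||` (rather than `??`) also skips an empty `platform=` value.
  return params.get("platform") || params.get("os") || null;
}

// Example values are illustrative; the real platform identifiers may differ.
console.log(readPlatformParam("?platform=linux"));
console.log(readPlatformParam("?os=mac"));
```

Reading the new param first means old shared links that only carry `os` keep working, while any link that carries both resolves to the newer value.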
- add wandler to engines.json, Dockerfile.unified, and docker entrypoint
- add --engine flag to cli with auto-detection from catalog
- add StartWandler() service and execRunWandler() run path
- install wandler in a2go doctor on mac
- entrypoint detects nvidia gpu and passes --device cuda
- set LD_LIBRARY_PATH for cudnn so onnxruntime-node can use cuda
- deploy output generates --engine wandler when wandler models selected
- separate lfm 2 and lfm 2.5 into distinct catalog entries
- add lfm 2.5 1.2b gguf variant (LiquidAI/LFM2.5-1.2B-Instruct-GGUF)
- fix model display names: "LFM2" -> "LFM 2", "LFM2.5" -> "LFM 2.5"
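The `--engine` auto-detection described above could work along these lines. This is a sketch under stated assumptions, written in TypeScript for illustration even though the CLI itself lives in `a2go/cmd/`: the `CatalogEntry` shape, the `detectEngine` name, the rule that an explicit flag wins over the catalog, and the `example/onnx-model` entry are all hypothetical.

```typescript
// Engine names taken from the llamacpp/mlx/wandler list in this PR.
type Engine = "llamacpp" | "mlx" | "wandler";

// Hypothetical catalog shape; the real catalog schema may differ.
interface CatalogEntry {
  id: string;
  engine: Engine;
}

// Assumed rule: an explicit --engine value wins; otherwise look the model up
// in the catalog. Returns undefined when the model is unknown.
function detectEngine(
  catalog: CatalogEntry[],
  modelId: string,
  explicit?: Engine,
): Engine | undefined {
  if (explicit) return explicit;
  return catalog.find((entry) => entry.id === modelId)?.engine;
}

const catalog: CatalogEntry[] = [
  // Assumption: the GGUF variant is served by llama.cpp.
  { id: "LiquidAI/LFM2.5-1.2B-Instruct-GGUF", engine: "llamacpp" },
  // Placeholder id for an ONNX model handled by wandler.
  { id: "example/onnx-model", engine: "wandler" },
];

console.log(detectEngine(catalog, "example/onnx-model"));
```

Returning `undefined` for unknown models leaves the caller free to decide whether to fail or fall back to a default engine.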
- wandler runs llm + stt in one process: entrypoint scans for wandler audio model and passes --stt alongside --llm
- audio service case skips when wandler llm already handles stt
- StartWandler() and execRunWandler() now accept stt model
- remove kokoro-82m-onnx (wandler doesn't support tts)
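As a rough picture of how the single-process launch comes together, the argument list might be assembled as below. This is only a sketch: the real logic lives in the shell entrypoint and in `StartWandler()`, the `WandlerOptions` type and `buildWandlerArgs` name are hypothetical, the model ids are placeholders, and passing each model as the value following `--llm` / `--stt` is an assumption about the wandler CLI.

```typescript
// Hypothetical options object; field names are assumptions for illustration.
interface WandlerOptions {
  llm: string;           // LLM model id
  stt?: string;          // optional audio (speech-to-text) model id
  hasNvidiaGpu: boolean; // result of the entrypoint's GPU detection
}

// Builds the argument list for one wandler process serving LLM and, if present, STT.
function buildWandlerArgs(opts: WandlerOptions): string[] {
  const args: string[] = ["--llm", opts.llm];
  if (opts.stt) {
    // STT rides along in the same process, so no separate audio service is needed.
    args.push("--stt", opts.stt);
  }
  if (opts.hasNvidiaGpu) {
    args.push("--device", "cuda");
  }
  return args;
}

console.log(
  buildWandlerArgs({ llm: "example/llm", stt: "example/stt", hasNvidiaGpu: true }).join(" "),
);
```

Because the STT model is folded into the same argument list, the separate audio service case can simply check whether `stt` was already passed and skip itself.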
the agent tab prompt now says "with the wandler engine" when wandler models are selected, so the a2go skill knows which engine to use.
- agent prompt now shows engine for all engines (llama.cpp, mlx, wandler), not just wandler; this future-proofs for additional engines
- update a2go skill in repo with --engine flag docs and wandler section
Summary
`--engine` flag with auto-detection, Docker image install, entrypoint with CUDA acceleration, `a2go doctor` on Mac, deploy output in site

Changes

CLI (`a2go/cmd/`):
- `--engine` flag on `a2go run` with auto-detection from catalog
- `execRunWandler()` run path with health checks and gateway setup
- `StartWandler()` service function
- `a2go doctor` installs wandler via npm on Mac

Docker (`Dockerfile.unified`, `scripts/entrypoint-unified.sh`):
- `npm install -g wandler@latest`
- entrypoint detects NVIDIA GPU and passes `--device cuda`
- `LD_LIBRARY_PATH` set for cuDNN so onnxruntime-node can use the CUDA execution provider

Registry:
- wandler added to `engines.json`
- LFM 2.5 1.2B GGUF variant (`LiquidAI/LFM2.5-1.2B-Instruct-GGUF`)

Site:
- deploy output generates `--engine wandler` when Wandler models are selected

Test plan
- `wandler --device cuda` serves LFM 2.5 ONNX with CUDA acceleration