feat(plugins): add llm-switch example plugin for local server management#3672

Open
crxssrazr93 wants to merge 1 commit into NousResearch:main from crxssrazr93:feat/llm-switch-plugin


@crxssrazr93

Summary

Example plugin demonstrating the lifecycle hooks activated in #3542. Auto-manages a local llama-server (or any OpenAI-compatible server) when the active model matches a locally configured model name.

This is a plugin-only PR — no core changes. All hook infrastructure was already merged in #3542.

Supersedes #2930 (which included core hook patches before #3542 was merged).

What it does

  • pre_llm_call hook: Detects when the active model name matches a key in models.yaml. If the correct server isn't running, starts it automatically before the LLM call proceeds.
  • on_session_end hook: Kills the server when the session ends.
  • switch_local_llm tool: Mid-session model switching — the agent calls this when asked "switch to the code model". Swaps the server behind the scenes while the endpoint stays the same.
  • Declarative YAML config: Define models with GGUF paths, context sizes, KV cache quantization, and sampling params. Replaces shell scripts.
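The hooks and the tool above can be sketched as a single plugin object. This is a hypothetical illustration only — the real registration API and handler signatures come from the hook infrastructure merged in #3542, and the class and method names here are assumptions:

```python
# Hypothetical sketch of the plugin's hook wiring; the actual
# handler signatures are defined by the #3542 hook infrastructure.
class LlmSwitchPlugin:
    def __init__(self, models):
        self.models = models   # parsed models.yaml entries, keyed by name
        self.running = None    # name of the currently served model, if any

    def pre_llm_call(self, active_model):
        """Start the matching server before the LLM call proceeds."""
        if active_model in self.models and self.running != active_model:
            self._start_server(active_model)
        return self.running

    def switch_local_llm(self, target):
        """Tool handler: swap the server mid-session."""
        if target not in self.models:
            raise ValueError(f"unknown model: {target}")
        self._start_server(target)
        return f"now serving {target}"

    def on_session_end(self):
        """Kill the server when the session ends."""
        self.running = None

    def _start_server(self, name):
        # The real implementation spawns llama-server (see server.py);
        # here we only track which model would be served.
        self.running = name
```

Note that `pre_llm_call` is a no-op when the active model does not match a configured key, so remote providers are unaffected.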

Example models.yaml

```yaml
server:
  binary: llama-server
  models_dir: ~/llama-models
  port: 8080
  gpu_layers: 99
  flash_attention: true

models:
  write:
    description: "SEO articles and content briefs"
    gguf: qwen3.5-9b/Qwen3.5-9B-UD-Q6_K_XL.gguf
    context: 49152
    kv_cache: { key: q8_0, value: q4_0 }
    sampling: { temp: 0.7, top_p: 0.8, top_k: 20 }

  code:
    description: "Agentic coding and tool calling"
    gguf: omnicoder-9b/omnicoder-9b-q4_k_m.gguf
    context: 65536
    sampling: { temp: 0.6, top_p: 0.95 }
```
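One way a loader might combine the top-level `server` defaults with each model entry is a plain dict merge. This is a sketch, not the plugin's actual loader; the dicts below mirror the example config above, and a real implementation would read the file with `yaml.safe_load` (PyYAML):

```python
# Sketch: merge server-level defaults into each model entry.
# These dicts mirror the models.yaml example; the real loader
# would parse the file with yaml.safe_load (PyYAML).
SERVER = {"binary": "llama-server", "port": 8080, "gpu_layers": 99}
MODELS = {
    "write": {"gguf": "qwen3.5-9b/Qwen3.5-9B-UD-Q6_K_XL.gguf",
              "context": 49152,
              "sampling": {"temp": 0.7, "top_p": 0.8, "top_k": 20}},
    "code":  {"gguf": "omnicoder-9b/omnicoder-9b-q4_k_m.gguf",
              "context": 65536,
              "sampling": {"temp": 0.6, "top_p": 0.95}},
}

def resolve(name):
    """Return a model entry with server defaults filled in."""
    entry = dict(SERVER)        # defaults first ...
    entry.update(MODELS[name])  # ... then model-specific overrides
    return entry
```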

User flow

  1. hermes model → select custom provider → pick model name matching models.yaml key
  2. Start chatting → pre_llm_call hook auto-starts the server on first message
  3. Mid-session: "switch to the code model" → agent calls switch_local_llm tool → server swaps
  4. Exit → on_session_end kills server
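When the hook does start a server, the config entry has to become a llama-server command line. A sketch of that mapping is below; the flag names (`-m`, `-c`, `--port`, `-ngl`, `--cache-type-k`/`--cache-type-v`, `--temp`, `--top-p`, `--top-k`) match current llama.cpp builds but vary between versions, so treat the mapping as illustrative rather than the plugin's exact behavior:

```python
# Sketch: turn a resolved model entry into a llama-server argv.
# Flag names follow current llama.cpp builds but differ by version.
def build_argv(binary, models_dir, entry):
    argv = [binary,
            "-m", f"{models_dir}/{entry['gguf']}",   # GGUF path
            "-c", str(entry["context"]),             # context size
            "--port", str(entry["port"]),
            "-ngl", str(entry["gpu_layers"])]        # GPU offload layers
    # Sampling params: temp -> --temp, top_p -> --top-p, etc.
    for key, val in entry.get("sampling", {}).items():
        argv += [f"--{key.replace('_', '-')}", str(val)]
    # Optional KV cache quantization.
    kv = entry.get("kv_cache", {})
    if "key" in kv:
        argv += ["--cache-type-k", kv["key"]]
    if "value" in kv:
        argv += ["--cache-type-v", kv["value"]]
    return argv
```

The resulting list can be handed straight to `subprocess.Popen`, which avoids any shell quoting issues.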

Relationship to other PRs

  • Built on #3542, which merged the hook infrastructure this plugin exercises.
  • Supersedes #2930, which bundled core hook patches before #3542 landed.
  • Complements #3360 and #3548, which restore /model as a slash command.

Changes

6 new files in docs/llm-switch-plugin-example/:

| File | Purpose |
| --- | --- |
| `plugin.yaml` | Manifest |
| `__init__.py` | Registration, hook handlers, tool handler |
| `schemas.py` | Tool schema for `switch_local_llm` |
| `server.py` | Pure Python server lifecycle (start, stop, health check) |
| `models.yaml.example` | Example config with full schema documentation |
| `README.md` | Setup instructions, usage, and config reference |
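The health-check part of the server lifecycle typically reduces to a poll-until-ready loop: probe an endpoint (e.g. GET /health on the llama-server port) until it answers or a deadline passes. A minimal sketch, with the probe injected as a callable so the loop itself is testable (the function name and defaults are assumptions, not the plugin's actual API):

```python
import time

# Sketch of a health-check loop like the one server.py would need:
# poll until `probe` reports success or the deadline passes.
def wait_until_healthy(probe, timeout=30.0, interval=0.5):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():              # e.g. an HTTP request to /health
            return True
        time.sleep(interval)
    return False
```

Injecting `probe` keeps network details (urllib vs. raw sockets, which status codes count as healthy) out of the retry logic.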

Testing

  • Plugin is self-contained — copy to ~/.hermes/plugins/llm-switch/, add models.yaml, verify with /plugins
  • No existing code is modified — zero regression risk

Platforms tested

  • Linux (Arch)

Commit message

Example plugin demonstrating the lifecycle hooks activated in NousResearch#3542.
Auto-manages a local llama-server (or any OpenAI-compatible server) when
the active model matches a locally configured model name.

Features:
- pre_llm_call hook: auto-starts the correct server on first message
  when hermes is configured with a local model name
- on_session_end hook: kills the server on exit
- switch_local_llm tool: mid-session model switching — the agent swaps
  the server when asked ("switch to the code model")
- Declarative YAML config for model definitions (GGUF paths, context
  sizes, KV cache quantization, sampling params) replacing shell scripts

The plugin is self-contained in docs/llm-switch-plugin-example/ with a
README, example config, and full implementation. Users copy it to
~/.hermes/plugins/llm-switch/ to install.

Complements NousResearch#3360 and NousResearch#3548 which restore /model as a slash command —
once merged, /model custom:write would trigger the pre_llm_call hook
to auto-start the right server seamlessly.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>