Problem Statement
Add native support for llama.cpp-based inference servers (GGUF models) as a backend option, alongside existing OpenAI-compatible providers.
This would enable efficient local deployment of Decepticon using quantized models on consumer GPUs (e.g., T4, 3060, etc.).
Proposed Solution
Add a llama.cpp backend option, either:
-
Native llama.cpp integration
Directly interface with llama.cpp (via subprocess or Python bindings like llama-cpp-python)
Allow configuration of:
n_gpu_layers
n_ctx
n_batch
sampling parameters (temperature, top_p, etc.)
-
Enhanced OpenAI-compatible support
Officially support llama.cpp’s OpenAI-compatible server mode
Add config presets for llama.cpp quirks:
streaming differences
stop token handling
system prompt formatting (important for models like Qwen)
Alternatives Considered
Ollama backend works but adds overhead and reduces control.
OpenAI-compatible proxy partially works, but lacks optimization and standardization.
Area
Docker / Infrastructure
Additional Context
No response
Problem Statement
Add native support for llama.cpp-based inference servers (GGUF models) as a backend option, alongside existing OpenAI-compatible providers.
This would enable efficient local deployment of Decepticon using quantized models on consumer GPUs (e.g., T4, 3060, etc.).
Proposed Solution
Add a llama.cpp backend option, either:
Native llama.cpp integration
Directly interface with llama.cpp (via subprocess or Python bindings like llama-cpp-python)
Allow configuration of:
n_gpu_layers
n_ctx
n_batch
sampling parameters (temperature, top_p, etc.)
Enhanced OpenAI-compatible support
Officially support llama.cpp’s OpenAI-compatible server mode
Add config presets for llama.cpp quirks:
streaming differences
stop token handling
system prompt formatting (important for models like Qwen)
Alternatives Considered
Ollama backend works but adds overhead and reduces control.
OpenAI-compatible proxy partially works, but lacks optimization and standardization.
Area
Docker / Infrastructure
Additional Context
No response