lumin-engine is a Go-based local assistant engine that exposes a Unix socket API, manages model lifecycle, and routes tool calls through a permissions layer. It runs as a systemd service, listens on a Unix socket, and bridges natural language requests to safe OS actions.
```text
User types message
  ↓
IPC socket receives JSON request
  ↓   cmd/lumin-engine/main.go
Tokenize text → integers
  ↓   internal/inference/tokenizer.go (loaded from GGUF)
Build context window with conversation history
  ↓   internal/context/manager.go (sliding window with truncation)
Run inference via llama.cpp
  ↓   internal/inference/llama_cgo.go (CGo bindings)
Detokenize output → text
  ↓   internal/inference/tokenizer.go
Stream tokens to client + parse tool calls
  ↓   internal/tools/parser.go (state machine, detects [[tool:...]])
Permission check for each tool call
  ↓   internal/permissions/policy.go (capability flags, path allowlist)
Execute OS action via sandboxed Go wrapper
  ↓   internal/tools/executor.go (fs_read, exec_safe, notify, etc.)
Audit log the result
  ↓   internal/permissions/audit.go (timestamped action record)
Return result to client
```
Files: internal/inference/llama_cgo.go, internal/inference/model.go, internal/inference/generate.go
Loads GGUF model weights, runs the forward pass (attention + feed-forward layers), outputs probability distributions over tokens. The actual neural network math. You bind to llama.cpp (already battle-tested) and get CUDA/ROCm/Metal/CPU backend selection for free.
Files: internal/inference/tokenizer.go
Converts user text → integer token IDs before inference, and output token IDs → text. Each model has its own vocabulary (Qwen3 uses BPE, Gemma uses SentencePiece). The real tokenizer is embedded in the GGUF file itself.
Files: internal/context/manager.go, internal/context/template.go
Builds the full prompt, tracks conversation history, injects system prompt, truncates old messages when the window fills up. This is what makes the AI "remember" what you said. Models can only "see" a fixed number of tokens at once (e.g. 32,768 for Qwen3 8B).
Files: internal/tools/parser.go
Watches the output stream, detects [[tool:...]] blocks mid-stream (without buffering the whole response), extracts JSON, routes to the right Go wrapper function. Modern LLMs emit structured JSON when they want to take an action. This is the bridge between "AI thinking" and "OS doing".
Files: internal/hardware/probe.go, internal/hardware/recommend.go, internal/hardware/watchdog.go
Probes the system at startup to find the best compute backend. If an Nvidia GPU with enough VRAM is found, it offloads layers to GPU. Falls back to CPU. Enforces 80% RAM ceiling and handles suspend/resume. You want inference to be fast, but not crash the system.
Files: internal/ipc/server.go, internal/ipc/handler.go, internal/ipc/protocol.go, cmd/lumin-engine/main.go
The Go daemon that runs as a systemd service. Listens on a Unix socket at /run/lumin/engine.sock. Any app (desktop widget, panel applet, terminal) connects and sends/receives JSON-RPC 2.0 messages. This is the interface between the rest of LuminOS and the AI engine. Nothing touches the system except through this.
Every tool has a capability flag. Users grant capabilities per-model via ~/.config/lumin/permissions.toml, and can scope them to specific paths or D-Bus methods. Revocation takes effect immediately, no restart needed.
| Capability | Risk | Behavior |
|---|---|---|
| `fs.read` | Low | Read files/dirs. Scoped to allowlisted paths only (e.g. `~/Documents`). Never `/` or `/etc` without explicit grant. |
| `fs.write` | Medium | Write/create files. Requires path allowlist. Writes to `/tmp` by default. Destructive ops (delete, overwrite) need a separate flag. |
| `plasma.theme` | Low | Change KDE theme, panel config, wallpaper via D-Bus. Fully reversible. |
| `plasma.windows` | Low | Read window list, focus, minimize, tile. Write requires explicit grant. |
| `net.fetch` | Medium | HTTP GET to external URLs. Domain allowlist enforced. |
| `exec.run` | High | Run shell commands. Off by default. Allowlisted commands only. |
| `notify.send` | Low | Send desktop notifications. Cannot impersonate system. |
| `clipboard` | Medium | Read/write clipboard. Separate flags. |
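For illustration only, a grant file could look like the sketch below. The actual schema of `~/.config/lumin/permissions.toml` is not documented here, so every key name is an assumption:

```toml
# Hypothetical layout — key names are illustrative, not the engine's actual schema.
[models."qwen3-8b"]
capabilities = ["fs.read", "plasma.theme", "notify.send"]

# Per-capability scoping, e.g. restricting fs.read to specific paths.
[models."qwen3-8b".scopes]
"fs.read" = ["~/Documents", "~/Notes"]
```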
- `cmd/lumin-engine/main.go` — application entrypoint; loads config, initializes model and permissions, starts socket server.
- `lumin-engine.service` — systemd user service file.
- `internal/config/config.go` — loads `/etc/lumin/engine.toml` (socket path, permissions, audit log, context size).
- `internal/inference/llama_cgo.go` — real CGo bindings to the llama.cpp C API. Loads models, runs inference, tokenization.
- `internal/inference/model.go` — model state, load/unload/suspend lifecycle.
- `internal/inference/generate.go` — generation wrapper with sampling options.
- `internal/inference/tokenizer.go` — thin wrappers around the loaded model's tokenizer and detokenizer.
- `internal/inference/llama_stub.go` — non-CGo fallback.
- `internal/context/manager.go` — tracks message history, auto-trims by token budget.
- `internal/context/template.go` — renders prompts (Qwen3, Gemma, Llama).
- `internal/tools/parser.go` — streaming state machine that detects `[[tool:...]]` blocks mid-stream.
- `internal/tools/executor.go` — dispatches tool calls, checks permissions, logs actions.
- `internal/tools/fs.go` — file read/write helpers.
- `internal/tools/exec.go` — restricted command execution (allowlist only).
- `internal/tools/notify.go` — notification wrapper.
- `internal/tools/plasma.go` — KDE Plasma D-Bus wrapper.
- `internal/tools/web.go` — HTTP GET helper (timeout + size limit).
- `internal/permissions/policy.go` — loads, saves, and evaluates permissions from TOML. Capability flags, path/command allowlists.
- `internal/permissions/audit.go` — writes timestamped audit log entries.
- `internal/hardware/probe.go` — system memory/GPU detection.
- `internal/hardware/recommend.go` — recommends model size based on RAM.
- `internal/hardware/watchdog.go` — background RAM monitor; triggers suspend at 80%.
- `internal/ipc/server.go` — Unix socket listener with clean shutdown.
- `internal/ipc/handler.go` — JSON-RPC 2.0 dispatcher: `health`, `generate`, `model.load`, `model.unload`, `tool.call`.
- `internal/ipc/protocol.go` — request/response types.
These native headers and libraries are NOT committed into the repository. Run the build target below to clone and compile llama.cpp locally — that produces lib/llama.h and lib/libllama.a under lib/.
IMPORTANT: Do not add prebuilt libllama.a or llama.h to the repo. Always build on the target machine (toolchain, GPU drivers, and ABI vary).
- `go.mod` — Go module definition (stdlib only, except CGo to llama.cpp).
- `go.sum` — dependency checksums.
- `Makefile` — full build pipeline: clone llama.cpp, compile with CUDA, link Go binary.
`make build`

What it does:

- Clones `llama.cpp` from GitHub (if not present)
- Builds with CUDA support (auto-detects; falls back to CPU-only)
- Copies `libllama.a` and `llama.h` to `lib/`
- Builds Go binary with CGo enabled
- Produces `bin/lumin-engine`

Requirements:

- `gcc` or `clang` (C compiler)
- `cmake` (for llama.cpp build system)
- Go 1.22+
- Optional: CUDA toolkit (for GPU support)
`make run`
# lumin-engine
One consolidated document: this `README.md` contains the full project overview, architecture, build instructions, and the implementation/verification summaries previously spread across multiple Markdown files.
If you prefer the older, separate documents, their material is included below — the repository now maintains a single authoritative `README.md` to keep docs centralized.
---
## Quick Overview
`lumin-engine` is a Go-based local assistant engine that exposes a Unix socket API, manages model lifecycle, and routes tool calls through a permissions sandbox. It supports systemd socket activation, model management, a Qt/QML control panel, and model-backed tokenization via `llama.cpp` (GGUF models).
Core flow:
User message → IPC socket (JSON-RPC) → Tokenize (GGUF) → Context window → Inference (llama.cpp via CGo) → Detokenize → Stream + parse tool calls → Permission check → Execute → Audit log → Return result
---
## Single-file Contents
This file consolidates the following topics:
- Project architecture & components
- Build & run instructions (Makefile + systemd)
- Tokenizer, inference & CGo binding notes
- Systemd socket activation details
- D-Bus (KDE Plasma) bridge and available methods
- Model download manager (CLI) usage
- Qt/QML control panel overview
- Verification checklist and next steps
---
## Project Layout (short)
- `cmd/lumin-engine/` — main daemon (systemd + socket activation)
- `cmd/model-wizard/` — model download manager (HTTP + SHA256)
- `cmd/lumin-control-panel/` — Qt/QML control panel (Kirigami)
- `internal/inference/` — CGo bindings (`llama_cgo.go`), model lifecycle, tokenizer
- `internal/tools/` — tool parser, executor, KDE Plasma D-Bus wrapper
- `internal/ipc/` — Unix socket server + JSON-RPC handler
- `internal/permissions/` — policy, audit logging
- `internal/context/` — prompt templates and sliding-window manager
- `internal/hardware/` — GPU/RAM detection and watchdog
- `contrib/` — systemd socket & service files
- `Makefile` — build pipeline (clones & builds `llama.cpp` locally)
---
## Important notes about native artifacts
Native headers and libraries for `llama.cpp` are intentionally NOT committed to the repository. Run `make build` from the project root to clone and compile `llama.cpp` locally; the build will place `lib/llama.h` and `lib/libllama.a` into `lib/`.
Do not add prebuilt `libllama.a` or `llama.h` to the repo — native toolchains, ABIs and GPU drivers vary across machines and shipping binaries in-repo leads to brittle builds and security concerns.
---
## Build & Run (condensed)
Prereqs: `gcc`/`clang`, `cmake`, Go 1.22+, optional CUDA toolkit for GPU backends.
Build everything:
```bash
cd /path/to/lumin-engine
make build
# Build model wizard
go build -o bin/model-wizard ./cmd/model-wizard
# Optional: GUI (Qt 6 + KF6 required)
mkdir -p build-gui && cd build-gui
cmake ../cmd/lumin-control-panel
make
sudo make install
```

Quick test (local socket):

```bash
# Run daemon for a quick test
./bin/lumin-engine -socket /tmp/lumin-engine.sock &

# Health check via JSON-RPC (example)
printf '%s' '{"jsonrpc":"2.0","id":1,"method":"health"}' | nc -U /tmp/lumin-engine.sock
```

Systemd install (optional):

```bash
sudo cp contrib/lumin.socket /etc/systemd/system/
sudo cp contrib/lumin-engine.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable lumin.socket
```

Implementation notes:

- Tokenizer: real GGUF tokenizer via `llama_tokenize` and `llama_token_to_piece` (not word-split) — `internal/inference/llama_cgo.go` and `internal/inference/tokenizer.go`.
- Inference: real llama.cpp C API usage via CGo (`#include "llama.h"`).
- Socket activation: `contrib/lumin.socket` + `cmd/lumin-engine/main.go` uses `activation.Listeners()` to inherit the systemd socket.
- D-Bus: `internal/tools/plasma.go` uses `github.com/godbus/dbus/v5` to call `org.kde.Plasma.*` methods (SetTheme, SetWallpaper, GetPanelConfig, etc.).
- Model manager: `cmd/model-wizard` performs streaming HTTP downloads with SHA256 verification and atomic rename into `~/.local/share/lumin/models`.
- GUI: `cmd/lumin-control-panel` (Qt/QML + Kirigami) with `IPCClient.qml` for JSON-RPC over the Unix socket.
- User enters prompt in GUI or sends JSON-RPC to Unix socket.
- Daemon tokenizes prompt using model's vocab (GGUF-backed tokenizer).
- Context manager composes conversation + system prompt within max token budget.
- Inference runs via llama.cpp (CGo layer) and streams tokens back.
- Streaming parser (`internal/tools/parser.go`) watches output for `[[tool:...]]` blocks.
- When a tool call is detected, the permission engine evaluates policy; if allowed, the executor runs the sandboxed tool and logs an audit entry.
- Results are returned to client.
Verification checklist:

- `llama_cgo.go` compiles with `#include "llama.h"` (0 errors reported in static checks performed during implementation).
- `internal/tools/plasma.go` compiles and uses godbus/dbus.
- `cmd/lumin-engine/main.go` compiles and supports systemd socket activation.
- `internal/ipc/server.go` exposes a `NewServer(listener, handler)` constructor.
- `cmd/model-wizard` performs streaming download + SHA256 verification and atomic rename.
- Qt/QML control panel scaffolding (CMakeLists + QML) present and ready to build.
Next steps:

- Run `make build` on your target machine to produce `lib/llama.h` and `lib/libllama.a`.
- Use `bin/model-wizard` to fetch a small GGUF test model for integration testing.
- Start the daemon (socket activation or direct) and run the end-to-end GUI → socket → model → tool → permission → execution → response smoke test.
## License

MIT