Apeinx is an AI-native operating system kernel for the AI computing era: it manages GPUs, tokens, KV cache, model runtimes, agent processes, and replayable execution evidence.

- Apeinx ≠ a scheduler
- Apeinx ≠ a vLLM plugin
- Apeinx ≠ a Ray/K8s wrapper
- Apeinx = the AI-native Linux

Every core Linux abstraction has a direct Apeinx counterpart:
| Linux | Apeinx |
|---|---|
| CPU | GPU |
| Process (task_struct) | AI Task / Agent Process |
| Memory (page frame) | Token / KV Cache (64MB pages) |
| File (VFS) | Model Runtime (VFS: /trace /replay /models /memory) |
| Device (device_driver) | Runtime Driver (vLLM / llama.cpp / TensorRT-LLM / CUDA) |
| Syscall (int 0x80) | AI Syscall (Unix socket text protocol) |
| Scheduler (CFS) | Token Fair Scheduler (vruntime = tokens/weight + pressure) |
| OOM Killer | KV OOM Killer (largest KV consumer) |
| kswapd (LRU reclaim) | KV LRU Eviction (access_tick aging) |
| cgroup / namespace | Tenant / Sandbox / Capability |
| dmesg / auditd | Trace / Replay / Audit |
| /proc / nvidia-smi | apeinxctl (status / top / billing) |
| cluster (none) | Cluster Manager (master/worker, heartbeat, failover) |
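The scheduler row above gives the formula `vruntime = tokens/weight + pressure`; a minimal sketch of it in C, assuming an illustrative prio→weight table and a 0-100 pressure term (the real table and constants live in `sched.h` / `kernel/sched/fair.c` and are not reproduced here):

```c
#include <stdint.h>

/* Illustrative prio->weight table; the real one is in include/apeinx/sched.h
 * and almost certainly differs. Index 0 = highest priority. */
static const uint32_t prio_to_weight[3] = { 4, 2, 1 };

/* Token Fair vruntime: tokens consumed, normalized by weight, plus a
 * pressure penalty (0-100). Heavier (higher-priority) tasks accrue
 * vruntime more slowly, so the min-heap picks them more often; tasks
 * behind a congested GPU/KV pool are pushed back by the pressure term. */
static uint64_t ax_vruntime(uint64_t tokens, int prio, uint32_t pressure)
{
    return tokens / prio_to_weight[prio] + pressure;
}
```

With this shape, a high-priority task that has burned 1000 tokens still sorts ahead of a low-priority task that burned the same amount, which is exactly the CFS-style fairness the table describes.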
```
┌────────────────────────┐
│       apeinxctl        │  CLI: status submit kill top replay billing
└───────────┬────────────┘
            │ Unix Socket IPC
┌───────────▼────────────┐
│        apeinxd         │  AI Kernel Daemon (21 subsystems)
│                        │
│ ┌────────────────────┐ │
│ │ Token Fair         │ │  min-heap, vruntime, GPU/KV pressure
│ │ Scheduler          │ │  preempt, greedy, cluster placement
│ ├────────────────────┤ │
│ │ KV Memory Mgr      │ │  64MB pages, LRU eviction, OOM killer
│ │                    │ │  context window, prefix cache
│ ├────────────────────┤ │
│ │ Runtime Drivers    │ │  mock / vLLM / llama.cpp / TRT-LLM / CUDA
│ ├────────────────────┤ │
│ │ Resource Control   │ │  GPU mem / token budget / quota / pressure / lease
│ ├────────────────────┤ │
│ │ Cluster Manager    │ │  master/worker, TCP heartbeat, failover
│ ├────────────────────┤ │
│ │ Security           │ │  tenant / policy (DENY/ALLOW/LIMIT) / RBAC / sandbox
│ ├────────────────────┤ │
│ │ Trace / Audit      │ │  ring buffer, CSV, per-task replay, metrics
│ ├────────────────────┤ │
│ │ Filesystem         │ │  VFS: /trace /replay /models /memory
│ ├────────────────────┤ │
│ │ Agent / Stream     │ │  Agent process, token stream, state machine, wait
│ └────────────────────┘ │
└───────────┬────────────┘
            │ Driver Interface
┌───────────▼────────────┐
│   vLLM / llama.cpp     │  External Inference Runtimes
│  SGLang / TensorRT-LLM │
└───────────┬────────────┘
            │ CUDA
┌───────────▼────────────┐
│         GPU(s)         │
└────────────────────────┘
```
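The Driver Interface in the diagram is an ops table per runtime (`driver.h` says `ax_driver_ops_t` has 5 callbacks, but does not name them here). A plausible sketch, with hypothetical callback names and a mock implementation in the spirit of `mock_driver.c`; the real signatures are in `include/apeinx/driver.h`:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical shape of the 5-callback ops table; names are assumptions. */
typedef struct ax_driver_ops {
    const char *name;
    int  (*open)(void *ctx);                         /* bring runtime up   */
    int  (*submit)(void *ctx, const char *prompt);   /* start inference    */
    int  (*poll)(void *ctx, char *buf, size_t len);  /* pull next tokens   */
    int  (*cancel)(void *ctx);                       /* abort a task       */
    void (*close)(void *ctx);                        /* tear runtime down  */
} ax_driver_ops_t;

/* Mock driver: no GPU, no network; just pretends one token arrived. */
static int mock_open(void *ctx)   { (void)ctx; return 0; }
static int mock_submit(void *ctx, const char *p) { (void)ctx; return p ? 0 : -1; }
static int mock_poll(void *ctx, char *buf, size_t len)
{
    (void)ctx;
    if (len >= 4) strcpy(buf, "tok");  /* one fake token */
    return 1;                          /* token count */
}
static int  mock_cancel(void *ctx) { (void)ctx; return 0; }
static void mock_close(void *ctx)  { (void)ctx; }

static const ax_driver_ops_t mock_driver = {
    "mock", mock_open, mock_submit, mock_poll, mock_cancel, mock_close,
};

/* One full open -> submit -> poll -> close cycle; returns tokens polled. */
static int ax_driver_smoke(const ax_driver_ops_t *d)
{
    char buf[8];
    if (d->open(NULL) != 0) return -1;
    if (d->submit(NULL, "hello") != 0) return -1;
    int n = d->poll(NULL, buf, sizeof buf);
    d->close(NULL);
    return n;
}
```

Swapping `mock_driver` for a vLLM- or llama.cpp-backed table is the whole point of the layer: the kernel above never changes.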
```sh
git clone https://github.com/your/apeinx
cd apeinx
make

# Single-node daemon
./build/apeinxd --demo

# Another terminal
./build/apeinxctl status
./build/apeinxctl submit my-task 1000 5
./build/apeinxctl top
./build/apeinxctl replay

# 100-task stress test
./build/apeinxd --csv examples/tasks.csv --limit 100 &

# Cluster
./build/apeinxd --master 9800 --csv examples/tasks.csv &
./build/apeinxd --worker 127.0.0.1 9800 &

# Python venv (required for vLLM / SGLang drivers)
scripts/setup_venv.bat        # Windows
bash scripts/setup_venv.sh    # Linux/macOS
```

```
apeinx/
├── README.md
├── LICENSE                  # MIT
├── CONTRIBUTING.md
├── Makefile                 # Single make builds everything
├── requirements.txt         # Python dependencies
│
├── docs/                    # 8 design documents
│   ├── architecture.md      # Architecture overview
│   ├── apeinx_vs_linux.md   # Linux comparison
│   ├── syscall.md           # AI syscall spec
│   ├── scheduler.md         # Token Fair scheduler
│   ├── memory.md            # KV memory management
│   ├── roadmap.md           # Phase 0-6 roadmap
│   └── ...
│
├── include/apeinx/          # 10 header files
│   ├── types.h              # ax_pid_t, ax_state_t, ax_result_t ...
│   ├── errno.h              # AX_OK / AX_EINVAL / AX_ENOMEM ...
│   ├── config.h             # ax_config_t (boot config)
│   ├── kernel.h             # ax_kernel_t (global kernel state, all subsystems)
│   ├── trace.h              # ax_event_t + ax_event_type_t
│   ├── sched.h              # ax_fair_rq_t (min-heap) + prio→weight
│   ├── mm.h                 # ax_kv_page_t + ax_kvmm_t
│   ├── driver.h             # ax_driver_ops_t (5 callbacks)
│   ├── net.h                # ax_node_t + ax_cluster_t
│   └── security.h           # ax_tenant_t + ax_policy_t + ax_audit_entry_t
│
├── kernel/                  # 50 kernel source files
│   │
│   ├── init/                # Boot
│   │   ├── boot.c           # ax_boot() init call chain
│   │   └── init_task.c      # main() + select() event loop + IPC + CSV
│   │
│   ├── core/                # Kernel core
│   │   ├── kernel.c         # g_kernel global instance + ax_kernel_init()
│   │   ├── config.c         # key=value config parser
│   │   ├── panic.c          # Unrecoverable error → dump → exit
│   │   ├── errno.c          # Error code → strerror()
│   │   ├── clock.c          # Monotonic clock (μs precision)
│   │   └── id.c             # Global ID generator
│   │
│   ├── syscall/             # Syscall dispatch table
│   │   └── syscall.c        # register / dispatch / call count
│   │
│   ├── process/             # AI process management
│   │   ├── task.c           # ax_task_create / find / free
│   │   ├── agent.c          # Agent Process container (sub-task orchestration)
│   │   ├── stream.c         # Token inference stream (ring buffer)
│   │   ├── state.c          # State machine validation + batch transition
│   │   └── wait.c           # waitpid()-style blocking wait
│   │
│   ├── sched/               # Scheduler (8 files)
│   │   ├── sched.h          # ax_fair_entity_t + min-heap + prio→weight table
│   │   ├── fair.c           # Token Fair: vruntime formula + heap_enqueue/pick
│   │   ├── greedy.c         # Greedy baseline: always pick highest priority
│   │   ├── cost_model.c     # Token cost estimation (matched by task name)
│   │   ├── placement.c      # Local GPU selection (least running_tasks)
│   │   ├── placement_cluster.c  # Cross-node GPU selection
│   │   ├── preempt.c        # Preempt when vruntime gap >3x → PREEMPTED
│   │   └── failover.c       # Node failure → task migration + epoch++
│   │
│   ├── mm/                  # KV memory management (6 files)
│   │   ├── kv_cache.c       # Page alloc/free (64MB/page)
│   │   ├── eviction.c       # LRU eviction (access_tick - last_access)
│   │   ├── oom.c            # KV OOM killer (kill task with largest KV usage)
│   │   ├── ai_mm.c          # Unified memory interface (KV + future GPU unified)
│   │   ├── context_page.c   # Context window pages (per-task token mapping)
│   │   └── prefix_cache.c   # Shared prompt cache (hit rate stats)
│   │
│   ├── resource/            # Resource management (7 files)
│   │   ├── lease.c          # GPU + KV lease acquire / release
│   │   ├── gpu.c            # GPU memory alloc / utilization / pressure
│   │   ├── budget.c         # Token budget (consumed/remaining/overshoot count)
│   │   ├── quota.c          # Tenant quota (token + kv + gpu shares)
│   │   ├── pressure.c       # Pressure metrics GPU/KV/Token 0-100
│   │   └── token.c          # Token pool + short-term borrowing (10% overdraw)
│   │
│   ├── drivers/             # Runtime drivers (6 files)
│   │   ├── driver.c         # Driver registry (register/find/resolve/dispatch)
│   │   ├── mock_driver.c    # Mock driver (sleep + rand token, for CI)
│   │   ├── vllm_driver.c    # vLLM HTTP (POST /v1/completions)
│   │   ├── llama_driver.c   # llama.cpp subprocess (popen llama-cli)
│   │   ├── trtllm_driver.c  # TensorRT-LLM (Triton POST /v2/models)
│   │   └── cuda_driver.c    # CUDA bare-metal driver (custom kernel)
│   │
│   ├── net/                 # Network + cluster (6 files)
│   │   ├── rpc.c            # Unix socket IPC (daemon ↔ apeinxctl)
│   │   ├── channel.c        # TCP channel (connect/send/recv, non-blocking + timeout)
│   │   ├── heartbeat.c      # PING/PONG heartbeat (3s interval, 15s timeout)
│   │   ├── cluster.c        # Cluster Manager (master: listen/accept/pick_best)
│   │   ├── node.c           # Node Agent (worker: heartbeat/task receive)
│   │   └── state_sync.c     # Cluster state sync (epoch/gpu/kv/budget)
│   │
│   ├── security/            # Security (5 files)
│   │   ├── tenant.c         # Tenant CRUD + quota accounting
│   │   ├── policy.c         # Policy engine (DENY/ALLOW/LIMIT, rule evaluation)
│   │   ├── capability.c     # RBAC (admin/user/viewer → capability bits)
│   │   ├── namespace.c      # Namespace isolation (PID visibility)
│   │   └── sandbox.c        # Sandbox (file/network/CPU/memory limits)
│   │
│   ├── trace/               # Observability (5 files)
│   │   ├── trace.c          # Ring buffer (push/dump/clear)
│   │   ├── replay.c         # CSV replay (full / per-pid)
│   │   ├── audit.c          # Audit log (admin operations, ring buffer)
│   │   ├── event.c          # Event filtering/stats (by type/pid/gpu)
│   │   └── metrics.c        # Prometheus-style metrics (scheduler/task/kv/budget)
│   │
│   └── fs/                  # Filesystem (5 files)
│       ├── vfs.c            # VFS (mount / read)
│       ├── modelfs.c        # /models (model registry)
│       ├── memoryfs.c       # /memory (KV status, cf. /proc/meminfo)
│       ├── tracefs.c        # /trace (CSV event dump)
│       └── replayfs.c       # /replay (per-task timeline)
│
├── user/                    # Userspace tools
│   ├── apeinxctl/           # CLI (6 commands)
│   │   ├── main.c           # Subcommand dispatch
│   │   ├── submit.c         # submit <name> <tokens> <prio>
│   │   ├── top.c            # Real-time GPU/task/KV dashboard
│   │   ├── kill.c           # kill <pid>
│   │   ├── replay.c         # Replay trace events
│   │   └── billing.c        # Tenant usage + cost estimate
│   │
│   └── libapeinx/           # Client library (3 files)
│       ├── client.c         # Socket communication wrapper
│       ├── syscall_user.c   # Type-safe syscall wrappers
│       └── api.c            # High-level API (ax_run_sync submit+wait)
│
├── tests/                   # Unit tests (6 files)
│   ├── test_sched.c
│   ├── test_mm.c
│   ├── test_lease.c
│   ├── test_syscall.c
│   ├── test_tracefs.c
│   └── test_replay.c
│
├── examples/                # Config + data
│   ├── apeinx.conf          # Example config
│   ├── tasks.csv            # 100-task stress test data (10 task types)
│   ├── local4gpu.conf       # 4GPU config
│   └── mock_models.conf     # Mock model registry
│
└── scripts/                 # Python environment
    ├── setup_venv.bat       # Windows venv one-click setup
    ├── setup_venv.sh        # Linux/macOS venv
    └── requirements.txt     # pytest, requests
```
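The `mm/` files above implement two reclaim paths: LRU eviction by `access_tick` age (eviction.c) and the KV OOM killer's largest-consumer rule (oom.c). A condensed sketch of the two victim-selection policies; the field and function names are invented here, since the real types are `ax_kv_page_t` / `ax_kvmm_t` in `mm.h`:

```c
#include <stdint.h>

#define AX_KV_PAGE_MB 64  /* page granularity, per kernel/mm/kv_cache.c */

/* Simplified page record (stand-in for ax_kv_page_t). */
typedef struct {
    int      owner_pid;
    uint64_t last_access;  /* access_tick at last touch */
    int      in_use;
} kv_page_t;

/* LRU victim: the in-use page with the oldest access tick. -1 if none. */
static int kv_pick_lru_victim(const kv_page_t *pages, int n)
{
    int victim = -1;
    for (int i = 0; i < n; i++) {
        if (!pages[i].in_use) continue;
        if (victim < 0 || pages[i].last_access < pages[victim].last_access)
            victim = i;
    }
    return victim;
}

/* OOM victim: the pid holding the most KV pages. -1 if none. */
static int kv_pick_oom_victim(const kv_page_t *pages, int n)
{
    int best_pid = -1, best_count = 0;
    for (int i = 0; i < n; i++) {
        if (!pages[i].in_use) continue;
        int count = 0;
        for (int j = 0; j < n; j++)
            if (pages[j].in_use && pages[j].owner_pid == pages[i].owner_pid)
                count++;
        if (count > best_count) { best_count = count; best_pid = pages[i].owner_pid; }
    }
    return best_pid;
}

/* Tiny demos: pid 1 owns 2 pages, pid 2 owns 1; index 1 is coldest. */
static int kv_demo_lru(void)
{ kv_page_t p[3] = {{1,10,1},{2,5,1},{1,20,1}}; return kv_pick_lru_victim(p, 3); }
static int kv_demo_oom(void)
{ kv_page_t p[3] = {{1,10,1},{2,5,1},{1,20,1}}; return kv_pick_oom_victim(p, 3); }
```

Note the split mirrors Linux: eviction is the kswapd analogue (reclaim pages quietly), the OOM killer is the last resort (kill a whole task).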
| # | Abstraction | Description |
|---|---|---|
| 1 | AI Process | Agent / Inference / Tool Run → unified AI Process (task → agent → stream → wait) |
| 2 | Token Fair Scheduler | Linux CFS → token runtime fairness + GPU/KV pressure + deadline boost |
| 3 | KV Memory Manager | KV Cache → 64MB pages + LRU eviction + OOM killer + prefix cache |
| 4 | Runtime Driver Layer | Manage vLLM / llama.cpp / TRT-LLM / CUDA like Linux manages devices (5 drivers) |
| 5 | TraceFS / ReplayFS | AI execution → auditable, reproducible system logs (event/metrics/audit) |
Syscalls travel as a plain text protocol over the Unix socket (or TCP): one frame per line, `\n`-delimited.

```
→ SUBMIT <name> <tokens> <priority> [deadline]   ← OK / ERR
→ KILL <pid>                                     ← OK / ERR
→ STATUS / TOP                                   ← multi-line + END
→ REPLAY                                         ← CSV events + END
→ BILLING                                        ← tenant usage + END
```
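Because each frame is one text line, a client only has to format lines and check the first reply word. A minimal sketch; the helper names `ax_fmt_submit` / `ax_reply_ok` are ours, not part of `user/libapeinx`:

```c
#include <stdio.h>
#include <string.h>

/* Format a SUBMIT frame per the protocol above. A negative deadline omits
 * the optional field. Returns frame length, or -1 if buf is too small. */
static int ax_fmt_submit(char *buf, size_t len, const char *name,
                         long tokens, int prio, long deadline)
{
    int n;
    if (deadline >= 0)
        n = snprintf(buf, len, "SUBMIT %s %ld %d %ld\n", name, tokens, prio, deadline);
    else
        n = snprintf(buf, len, "SUBMIT %s %ld %d\n", name, tokens, prio);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* The daemon answers on one line: OK or ERR (multi-line replies end in END). */
static int ax_reply_ok(const char *line)
{
    return strncmp(line, "OK", 2) == 0;
}
```

Writing the formatted frame to the daemon's Unix socket and reading one line back is all `apeinxctl submit` fundamentally does.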
| Phase | Version | Description | Files | Status |
|---|---|---|---|---|
| 0 | Spec | AIOS core abstractions defined | 7 docs | ✅ |
| 1 | v0.01 | Local AI Kernel prototype (boot/task/sched/trace/rpc) | 11 | ✅ |
| 2 | v0.1 | Local multi-GPU AIOS (fair/KV/budget/quota/pressure) | 16 | ✅ |
| 3 | v0.2 | Connect real runtimes (vLLM/llama/TRT-LLM/CUDA driver) | 5 | ✅ |
| 4 | v0.3 | Distributed cluster (master/worker/heartbeat/failover) | 9 | ✅ |
| 5 | v0.5 | Enterprise control plane (tenant/policy/RBAC/sandbox/audit) | 9 | ✅ |
| 6 | v1.0 | AI-native OS ecosystem (docs/tests/libraries) | 18 | ✅ |
MIT — see LICENSE