rootkiller6788/Apeinx

Apeinx — AI-native OS Kernel

Apeinx is the AI-native operating system kernel for GPU, token, KV-cache, model runtime, and agent process management.

Apeinx ≠ scheduler
Apeinx ≠ vLLM plugin
Apeinx ≠ Ray/K8s wrapper

Apeinx = the AI-native Linux

TL;DR

Apeinx is an AI-native OS kernel for the AI-compute era: it manages GPUs, tokens, KV cache, model runtimes, agent processes, and replayable execution evidence.


Linux Comparison

| Linux | Apeinx |
|---|---|
| CPU | GPU |
| Process (task_struct) | AI Task / Agent Process |
| Memory (page frame) | Token / KV Cache (64MB pages) |
| File (VFS) | Model Runtime (VFS: /trace /replay /models /memory) |
| Device (device_driver) | Runtime Driver (vLLM / llama.cpp / TensorRT-LLM / CUDA) |
| Syscall (int 0x80) | AI Syscall (Unix socket text protocol) |
| Scheduler (CFS) | Token Fair Scheduler (vruntime = tokens/weight + pressure) |
| OOM Killer | KV OOM Killer (kills the largest KV consumer) |
| kswapd (LRU reclaim) | KV LRU Eviction (access_tick aging) |
| cgroup / namespace | Tenant / Sandbox / Capability |
| dmesg / auditd | Trace / Replay / Audit |
| /proc / nvidia-smi | apeinxctl (status / top / billing) |
| cluster (none) | Cluster Manager (master/worker, heartbeat, failover) |
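The scheduler row above compresses the whole fairness rule into one line. A rough sketch of that rule in C (the function name, weight table, and exact arithmetic here are illustrative guesses, not Apeinx's actual API; see docs/scheduler.md for the real formula):

```c
/* Sketch of the Token Fair vruntime rule: a task's virtual runtime
 * grows by tokens consumed divided by its priority weight, plus a
 * pressure penalty; the scheduler picks the smallest vruntime.
 * tf_vruntime and prio_weight are hypothetical names. */
#include <assert.h>

/* CFS-style priority -> weight table (values are illustrative) */
static const unsigned long prio_weight[5] = { 1, 2, 4, 8, 16 };

/* tokens consumed this slice, priority 0..4, pressure 0..100 */
static unsigned long tf_vruntime(unsigned long vruntime,
                                 unsigned long tokens,
                                 int prio,
                                 unsigned long pressure)
{
    return vruntime + tokens / prio_weight[prio] + pressure;
}
```

As in Linux CFS, the run queue would be a min-heap keyed on vruntime, so higher-weight tasks accumulate vruntime more slowly and are picked more often.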

Architecture

         ┌─────────────────────────┐
         │        apeinxctl        │  CLI: status submit kill top replay billing
         └────────────┬────────────┘
                      │ Unix Socket IPC
         ┌────────────▼────────────┐
         │         apeinxd         │  AI Kernel Daemon (21 subsystems)
         │                         │
         │  ┌───────────────────┐  │
         │  │  Token Fair       │  │  min-heap, vruntime, GPU/KV pressure
         │  │  Scheduler        │  │  preempt, greedy, cluster placement
         │  ├───────────────────┤  │
         │  │  KV Memory Mgr    │  │  64MB pages, LRU eviction, OOM killer
         │  │                   │  │  context window, prefix cache
         │  ├───────────────────┤  │
         │  │  Runtime Drivers  │  │  mock / vLLM / llama.cpp / TRT-LLM / CUDA
         │  ├───────────────────┤  │
         │  │  Resource Control │  │  GPU mem / token budget / quota / pressure / lease
         │  ├───────────────────┤  │
         │  │  Cluster Manager  │  │  master/worker, TCP heartbeat, failover
         │  ├───────────────────┤  │
         │  │  Security         │  │  tenant / policy (DENY/ALLOW/LIMIT) / RBAC / sandbox
         │  ├───────────────────┤  │
         │  │  Trace / Audit    │  │  ring buffer, CSV, per-task replay, metrics
         │  ├───────────────────┤  │
         │  │  Filesystem       │  │  VFS: /trace /replay /models /memory
         │  ├───────────────────┤  │
         │  │  Agent / Stream   │  │  Agent process, token stream, state machine, wait
         │  └───────────────────┘  │
         └────────────┬────────────┘
                      │ Driver Interface
         ┌────────────▼────────────┐
         │  vLLM / llama.cpp       │  External Inference Runtimes
         │  SGLang / TensorRT-LLM  │
         └────────────┬────────────┘
                      │ CUDA
         ┌────────────▼────────────┐
         │         GPU(s)          │
         └─────────────────────────┘

Quick Start

git clone https://github.com/your/apeinx
cd apeinx
make

# Single-node daemon
./build/apeinxd --demo

# Another terminal
./build/apeinxctl status
./build/apeinxctl submit my-task 1000 5
./build/apeinxctl top
./build/apeinxctl replay

# 100-task stress test
./build/apeinxd --csv examples/tasks.csv --limit 100 &

# Cluster
./build/apeinxd --master 9800 --csv examples/tasks.csv &
./build/apeinxd --worker 127.0.0.1 9800 &

# Python venv (required for vLLM / SGLang drivers)
scripts/setup_venv.bat    # Windows
bash scripts/setup_venv.sh  # Linux/macOS

Full Directory Tree

apeinx/
├── README.md
├── LICENSE              # MIT
├── CONTRIBUTING.md
├── Makefile             # Single make builds everything
├── requirements.txt     # Python dependencies
│
├── docs/                # 8 design documents
│   ├── architecture.md  #   Architecture overview
│   ├── apeinx_vs_linux.md # Linux comparison
│   ├── syscall.md       #   AI syscall spec
│   ├── scheduler.md     #   Token Fair scheduler
│   ├── memory.md        #   KV memory management
│   ├── roadmap.md       #   Phase 0-6 roadmap
│   └── ...
│
├── include/apeinx/      # 10 header files
│   ├── types.h          #   ax_pid_t, ax_state_t, ax_result_t ...
│   ├── errno.h          #   AX_OK / AX_EINVAL / AX_ENOMEM ...
│   ├── config.h         #   ax_config_t (boot config)
│   ├── kernel.h         #   ax_kernel_t (global kernel state, all subsystems)
│   ├── trace.h          #   ax_event_t + ax_event_type_t
│   ├── sched.h          #   ax_fair_rq_t (min-heap) + prio→weight
│   ├── mm.h             #   ax_kv_page_t + ax_kvmm_t
│   ├── driver.h         #   ax_driver_ops_t (5 callbacks)
│   ├── net.h            #   ax_node_t + ax_cluster_t
│   └── security.h       #   ax_tenant_t + ax_policy_t + ax_audit_entry_t
│
├── kernel/              # 50 kernel source files
│   │
│   ├── init/                  # Boot
│   │   ├── boot.c             #   ax_boot() init call chain
│   │   └── init_task.c        #   main() + select() event loop + IPC + CSV
│   │
│   ├── core/                  # Kernel core
│   │   ├── kernel.c           #   g_kernel global instance + ax_kernel_init()
│   │   ├── config.c           #   key=value config parser
│   │   ├── panic.c            #   Unrecoverable error → dump → exit
│   │   ├── errno.c            #   Error code → strerror()
│   │   ├── clock.c            #   Monotonic clock (μs precision)
│   │   └── id.c               #   Global ID generator
│   │
│   ├── syscall/               # Syscall dispatch table
│   │   └── syscall.c          #   register / dispatch / call count
│   │
│   ├── process/               # AI process management
│   │   ├── task.c             #   ax_task_create / find / free
│   │   ├── agent.c            #   Agent Process container (sub-task orchestration)
│   │   ├── stream.c           #   Token inference stream (ring buffer)
│   │   ├── state.c            #   State machine validation + batch transition
│   │   └── wait.c             #   waitpid()-style blocking wait
│   │
│   ├── sched/                 # Scheduler (8 files)
│   │   ├── sched.h            #   ax_fair_entity_t + min-heap + prio→weight table
│   │   ├── fair.c             #   Token Fair: vruntime formula + heap_enqueue/pick
│   │   ├── greedy.c           #   Greedy baseline: always pick highest priority
│   │   ├── cost_model.c       #   Token cost estimation (matched by task name)
│   │   ├── placement.c        #   Local GPU selection (least running_tasks)
│   │   ├── placement_cluster.c #  Cross-node GPU selection
│   │   ├── preempt.c          #   Preempt when vruntime gap >3x → PREEMPTED
│   │   └── failover.c         #   Node failure → task migration + epoch++
│   │
│   ├── mm/                    # KV memory management (6 files)
│   │   ├── kv_cache.c         #   Page alloc/free (64MB/page)
│   │   ├── eviction.c         #   LRU eviction (access_tick - last_access)
│   │   ├── oom.c              #   KV OOM killer (kill task with largest KV usage)
│   │   ├── ai_mm.c            #   Unified memory interface (KV + future GPU unified)
│   │   ├── context_page.c     #   Context window pages (per-task token mapping)
│   │   └── prefix_cache.c     #   Shared prompt cache (hit rate stats)
│   │
│   ├── resource/              # Resource management (7 files)
│   │   ├── lease.c            #   GPU + KV lease acquire / release
│   │   ├── gpu.c              #   GPU memory alloc / utilization / pressure
│   │   ├── budget.c           #   Token budget (consumed/remaining/overshoot count)
│   │   ├── quota.c            #   Tenant quota (token + kv + gpu shares)
│   │   ├── pressure.c         #   Pressure metrics GPU/KV/Token 0-100
│   │   └── token.c            #   Token pool + short-term borrowing (10% overdraw)
│   │
│   ├── drivers/               # Runtime drivers (6 files)
│   │   ├── driver.c           #   Driver registry (register/find/resolve/dispatch)
│   │   ├── mock_driver.c      #   Mock driver (sleep + rand token, for CI)
│   │   ├── vllm_driver.c      #   vLLM HTTP (POST /v1/completions)
│   │   ├── llama_driver.c     #   llama.cpp subprocess (popen llama-cli)
│   │   ├── trtllm_driver.c    #   TensorRT-LLM (Triton POST /v2/models)
│   │   └── cuda_driver.c      #   CUDA bare-metal driver (custom kernel)
│   │
│   ├── net/                   # Network + cluster (6 files)
│   │   ├── rpc.c              #   Unix socket IPC (daemon ↔ apeinxctl)
│   │   ├── channel.c          #   TCP channel (connect/send/recv, non-blocking + timeout)
│   │   ├── heartbeat.c        #   PING/PONG heartbeat (3s interval, 15s timeout)
│   │   ├── cluster.c          #   Cluster Manager (master: listen/accept/pick_best)
│   │   ├── node.c             #   Node Agent (worker: heartbeat/task receive)
│   │   └── state_sync.c       #   Cluster state sync (epoch/gpu/kv/budget)
│   │
│   ├── security/              # Security (5 files)
│   │   ├── tenant.c           #   Tenant CRUD + quota accounting
│   │   ├── policy.c           #   Policy engine (DENY/ALLOW/LIMIT, rule evaluation)
│   │   ├── capability.c       #   RBAC (admin/user/viewer → capability bits)
│   │   ├── namespace.c        #   Namespace isolation (PID visibility)
│   │   └── sandbox.c          #   Sandbox (file/network/CPU/memory limits)
│   │
│   ├── trace/                 # Observability (5 files)
│   │   ├── trace.c            #   Ring buffer (push/dump/clear)
│   │   ├── replay.c           #   CSV replay (full / per-pid)
│   │   ├── audit.c            #   Audit log (admin operations, ring buffer)
│   │   ├── event.c            #   Event filtering/stats (by type/pid/gpu)
│   │   └── metrics.c          #   Prometheus-style metrics (scheduler/task/kv/budget)
│   │
│   └── fs/                    # Filesystem (5 files)
│       ├── vfs.c              #   VFS (mount / read)
│       ├── modelfs.c          #   /models (model registry)
│       ├── memoryfs.c         #   /memory (KV status, cf. /proc/meminfo)
│       ├── tracefs.c          #   /trace (CSV event dump)
│       └── replayfs.c         #   /replay (per-task timeline)
│
├── user/                 # Userspace tools
│   ├── apeinxctl/             # CLI (6 commands)
│   │   ├── main.c             #   Subcommand dispatch
│   │   ├── submit.c           #   submit <name> <tokens> <prio>
│   │   ├── top.c              #   Real-time GPU/task/KV dashboard
│   │   ├── kill.c             #   kill <pid>
│   │   ├── replay.c           #   Replay trace events
│   │   └── billing.c          #   Tenant usage + cost estimate
│   │
│   └── libapeinx/             # Client library (3 files)
│       ├── client.c           #   Socket communication wrapper
│       ├── syscall_user.c     #   Type-safe syscall wrappers
│       └── api.c              #   High-level API (ax_run_sync submit+wait)
│
├── tests/                # Unit tests (6 files)
│   ├── test_sched.c
│   ├── test_mm.c
│   ├── test_lease.c
│   ├── test_syscall.c
│   ├── test_tracefs.c
│   └── test_replay.c
│
├── examples/             # Config + data
│   ├── apeinx.conf       #   Example config
│   ├── tasks.csv         #   100-task stress test data (10 task types)
│   ├── local4gpu.conf    #   4GPU config
│   └── mock_models.conf  #   Mock model registry
│
└── scripts/              # Python environment
    ├── setup_venv.bat    #   Windows venv one-click setup
    ├── setup_venv.sh     #   Linux/macOS venv
    └── requirements.txt  #   pytest, requests
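Since kernel/core/config.c is described above as a key=value parser, a boot config like examples/apeinx.conf would be a flat list of assignments. A sketch (every key name below is an illustrative guess, not the actual ax_config_t schema; the values echo numbers stated elsewhere in this README):

```
# Hypothetical apeinx.conf sketch -- key names are guesses
gpu_count=4              # cf. examples/local4gpu.conf
kv_page_size_mb=64       # 64MB KV pages
scheduler=fair           # fair | greedy
heartbeat_interval_s=3   # cluster heartbeat interval
heartbeat_timeout_s=15   # declare node dead after this
```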

Five Technical Moats

| # | Abstraction | Description |
|---|---|---|
| 1 | AI Process | Agent / Inference / Tool Run → unified AI Process (task → agent → stream → wait) |
| 2 | Token Fair Scheduler | Linux CFS → token-runtime fairness + GPU/KV pressure + deadline boost |
| 3 | KV Memory Manager | KV Cache → 64MB pages + LRU eviction + OOM killer + prefix cache |
| 4 | Runtime Driver Layer | Manages vLLM / llama.cpp / TRT-LLM / CUDA the way Linux manages devices (5 drivers) |
| 5 | TraceFS / ReplayFS | AI execution → auditable, reproducible system logs (event/metrics/audit) |
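Moat #3's eviction policy can be sketched in a few lines: pages are 64MB each, and the victim is the in-use page with the oldest access_tick (LRU). The struct and field names below echo the headers listed earlier (ax_kv_page_t, access_tick) but the actual definitions are guesses:

```c
/* Sketch of KV LRU victim selection -- field names are assumptions,
 * not the real mm.h definitions. */
#define AX_KV_PAGE_SIZE (64UL * 1024 * 1024)  /* 64MB per page */

typedef struct {
    int           pid;          /* owning task */
    unsigned long access_tick;  /* last access time */
    int           in_use;       /* 0 = free, 1 = allocated */
} ax_kv_page_t;

/* Return the index of the LRU victim page, or -1 if none is evictable. */
static int kv_pick_victim(const ax_kv_page_t *pages, int n)
{
    int victim = -1;
    unsigned long oldest = (unsigned long)-1;
    for (int i = 0; i < n; i++) {
        if (pages[i].in_use && pages[i].access_tick < oldest) {
            oldest = pages[i].access_tick;
            victim = i;
        }
    }
    return victim;
}
```

When eviction alone cannot free enough pages, the KV OOM killer described above would instead pick the task with the largest total KV usage and kill it outright.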

IPC Protocol

Unix socket (local) or TCP (cluster); line-oriented text protocol, one frame per line, \n-delimited.

→ SUBMIT <name> <tokens> <priority> [deadline]   ← OK / ERR
→ KILL <pid>                                     ← OK / ERR
→ STATUS / TOP                                   ← multi-line + END
→ REPLAY                                         ← CSV events + END
→ BILLING                                        ← tenant usage + END
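Because frames are plain \n-terminated lines, a client only needs to format and parse strings. A minimal sketch (the helper names are hypothetical, not the libapeinx API):

```c
/* Sketch of framing for the text protocol above: one request per line,
 * '\n'-terminated. ax_frame_submit / ax_reply_ok are illustrative names. */
#include <stdio.h>
#include <string.h>

/* Format a SUBMIT frame into buf; returns bytes written (excl. NUL). */
static int ax_frame_submit(char *buf, size_t len,
                           const char *name, long tokens, int prio)
{
    return snprintf(buf, len, "SUBMIT %s %ld %d\n", name, tokens, prio);
}

/* A reply line starting with "OK" means success; anything else is an error. */
static int ax_reply_ok(const char *line)
{
    return strncmp(line, "OK", 2) == 0;
}
```

The same frame could then be written to the daemon's Unix socket and the reply read back line by line until END.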

Roadmap

| Phase | Version | Description | Files | Status |
|---|---|---|---|---|
| 0 | Spec | AIOS core abstractions defined | 7 docs | |
| 1 | v0.01 | Local AI Kernel prototype (boot/task/sched/trace/rpc) | 11 | |
| 2 | v0.1 | Local multi-GPU AIOS (fair/KV/budget/quota/pressure) | 16 | |
| 3 | v0.2 | Connect real runtimes (vLLM/llama/TRT-LLM/CUDA driver) | 5 | |
| 4 | v0.3 | Distributed cluster (master/worker/heartbeat/failover) | 9 | |
| 5 | v0.5 | Enterprise control plane (tenant/policy/RBAC/sandbox/audit) | 9 | |
| 6 | v1.0 | AI-native OS ecosystem (docs/tests/libraries) | 18 | |

License

MIT — see LICENSE
