Apeinx is an AI-native operating system kernel for the AI computing era: it manages GPUs, tokens, KV cache, model runtimes, agent processes, and replayable execution evidence.

- Apeinx ≠ a scheduler
- Apeinx ≠ a vLLM plugin
- Apeinx ≠ a Ray/K8s wrapper
- Apeinx = the AI-native Linux

Every core Linux abstraction has a direct Apeinx counterpart:
| Linux | Apeinx |
|---|---|
| CPU | GPU |
| Process (task_struct) | AI Task / Agent Process |
| Memory (page frame) | Token / KV Cache (64MB pages) |
| File (VFS) | Model Runtime (VFS: /trace /replay /models /memory) |
| Device (device_driver) | Runtime Driver (vLLM / llama.cpp / TensorRT-LLM / CUDA) |
| Syscall (int 0x80) | AI Syscall (Unix socket text protocol) |
| Scheduler (CFS) | Token Fair Scheduler (vruntime = tokens/weight + pressure) |
| OOM Killer | KV OOM Killer (largest KV consumer) |
| kswapd (LRU reclaim) | KV LRU Eviction (access_tick aging) |
| cgroup / namespace | Tenant / Sandbox / Capability |
| dmesg / auditd | Trace / Replay / Audit |
| /proc / nvidia-smi | apeinxctl (status / top / billing) |
| cluster (none) | Cluster Manager (master/worker, heartbeat, failover) |
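The scheduler row above gives the formula `vruntime = tokens/weight + pressure`; a minimal sketch of it in C, assuming an illustrative prio→weight table and a 0-100 pressure term (the real table and constants live in `sched.h` / `kernel/sched/fair.c` and are not reproduced here):

```c
#include <stdint.h>

/* Illustrative prio->weight table; the real one is in include/apeinx/sched.h
 * and almost certainly differs. Index 0 = highest priority. */
static const uint32_t prio_to_weight[3] = { 4, 2, 1 };

/* Token Fair vruntime: tokens consumed, normalized by weight, plus a
 * pressure penalty (0-100). Heavier (higher-priority) tasks accrue
 * vruntime more slowly, so the min-heap picks them more often; tasks
 * behind a congested GPU/KV pool are pushed back by the pressure term. */
static uint64_t ax_vruntime(uint64_t tokens, int prio, uint32_t pressure)
{
    return tokens / prio_to_weight[prio] + pressure;
}
```

With this shape, a high-priority task that has burned 1000 tokens still sorts ahead of a low-priority task that burned the same amount, which is exactly the CFS-style fairness the table describes.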
```
┌────────────────────────┐
│       apeinxctl        │  CLI: status submit kill top replay billing
└───────────┬────────────┘
            │ Unix Socket IPC
┌───────────▼────────────┐
│        apeinxd         │  AI Kernel Daemon (21 subsystems)
│                        │
│ ┌────────────────────┐ │
│ │ Token Fair         │ │  min-heap, vruntime, GPU/KV pressure
│ │ Scheduler          │ │  preempt, greedy, cluster placement
│ ├────────────────────┤ │
│ │ KV Memory Mgr      │ │  64MB pages, LRU eviction, OOM killer
│ │                    │ │  context window, prefix cache
│ ├────────────────────┤ │
│ │ Runtime Drivers    │ │  mock / vLLM / llama.cpp / TRT-LLM / CUDA
│ ├────────────────────┤ │
│ │ Resource Control   │ │  GPU mem / token budget / quota / pressure / lease
│ ├────────────────────┤ │
│ │ Cluster Manager    │ │  master/worker, TCP heartbeat, failover
│ ├────────────────────┤ │
│ │ Security           │ │  tenant / policy (DENY/ALLOW/LIMIT) / RBAC / sandbox
│ ├────────────────────┤ │
│ │ Trace / Audit      │ │  ring buffer, CSV, per-task replay, metrics
│ ├────────────────────┤ │
│ │ Filesystem         │ │  VFS: /trace /replay /models /memory
│ ├────────────────────┤ │
│ │ Agent / Stream     │ │  Agent process, token stream, state machine, wait
│ └────────────────────┘ │
└───────────┬────────────┘
            │ Driver Interface
┌───────────▼────────────┐
│   vLLM / llama.cpp     │  External Inference Runtimes
│  SGLang / TensorRT-LLM │
└───────────┬────────────┘
            │ CUDA
┌───────────▼────────────┐
│         GPU(s)         │
└────────────────────────┘
```
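The Driver Interface in the diagram is an ops table per runtime (`driver.h` says `ax_driver_ops_t` has 5 callbacks, but does not name them here). A plausible sketch, with hypothetical callback names and a mock implementation in the spirit of `mock_driver.c`; the real signatures are in `include/apeinx/driver.h`:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical shape of the 5-callback ops table; names are assumptions. */
typedef struct ax_driver_ops {
    const char *name;
    int  (*open)(void *ctx);                         /* bring runtime up   */
    int  (*submit)(void *ctx, const char *prompt);   /* start inference    */
    int  (*poll)(void *ctx, char *buf, size_t len);  /* pull next tokens   */
    int  (*cancel)(void *ctx);                       /* abort a task       */
    void (*close)(void *ctx);                        /* tear runtime down  */
} ax_driver_ops_t;

/* Mock driver: no GPU, no network; just pretends one token arrived. */
static int mock_open(void *ctx)   { (void)ctx; return 0; }
static int mock_submit(void *ctx, const char *p) { (void)ctx; return p ? 0 : -1; }
static int mock_poll(void *ctx, char *buf, size_t len)
{
    (void)ctx;
    if (len >= 4) strcpy(buf, "tok");  /* one fake token */
    return 1;                          /* token count */
}
static int  mock_cancel(void *ctx) { (void)ctx; return 0; }
static void mock_close(void *ctx)  { (void)ctx; }

static const ax_driver_ops_t mock_driver = {
    "mock", mock_open, mock_submit, mock_poll, mock_cancel, mock_close,
};

/* One full open -> submit -> poll -> close cycle; returns tokens polled. */
static int ax_driver_smoke(const ax_driver_ops_t *d)
{
    char buf[8];
    if (d->open(NULL) != 0) return -1;
    if (d->submit(NULL, "hello") != 0) return -1;
    int n = d->poll(NULL, buf, sizeof buf);
    d->close(NULL);
    return n;
}
```

Swapping `mock_driver` for a vLLM- or llama.cpp-backed table is the whole point of the layer: the kernel above never changes.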
```sh
git clone https://github.com/your/apeinx
cd apeinx
make

# Single-node daemon
./build/apeinxd --demo

# Another terminal
./build/apeinxctl status
./build/apeinxctl submit my-task 1000 5
./build/apeinxctl top
./build/apeinxctl replay

# 100-task stress test
./build/apeinxd --csv examples/tasks.csv --limit 100 &

# Cluster
./build/apeinxd --master 9800 --csv examples/tasks.csv &
./build/apeinxd --worker 127.0.0.1 9800 &

# Python venv (required for vLLM / SGLang drivers)
scripts/setup_venv.bat        # Windows
bash scripts/setup_venv.sh    # Linux/macOS
```

```
apeinx/
├── README.md
├── LICENSE                  # MIT
├── CONTRIBUTING.md
├── Makefile                 # Single make builds everything
├── requirements.txt         # Python dependencies
│
├── docs/                    # 8 design documents
│   ├── architecture.md      # Architecture overview
│   ├── apeinx_vs_linux.md   # Linux comparison
│   ├── syscall.md           # AI syscall spec
│   ├── scheduler.md         # Token Fair scheduler
│   ├── memory.md            # KV memory management
│   ├── roadmap.md           # Phase 0-6 roadmap
│   └── ...
│
├── include/apeinx/          # 10 header files
│   ├── types.h              # ax_pid_t, ax_state_t, ax_result_t ...
│   ├── errno.h              # AX_OK / AX_EINVAL / AX_ENOMEM ...
│   ├── config.h             # ax_config_t (boot config)
│   ├── kernel.h             # ax_kernel_t (global kernel state, all subsystems)
│   ├── trace.h              # ax_event_t + ax_event_type_t
│   ├── sched.h              # ax_fair_rq_t (min-heap) + prio→weight
│   ├── mm.h                 # ax_kv_page_t + ax_kvmm_t
│   ├── driver.h             # ax_driver_ops_t (5 callbacks)
│   ├── net.h                # ax_node_t + ax_cluster_t
│   └── security.h           # ax_tenant_t + ax_policy_t + ax_audit_entry_t
│
├── kernel/                  # 50 kernel source files
│   │
│   ├── init/                # Boot
│   │   ├── boot.c           # ax_boot() init call chain
│   │   └── init_task.c      # main() + select() event loop + IPC + CSV
│   │
│   ├── core/                # Kernel core
│   │   ├── kernel.c         # g_kernel global instance + ax_kernel_init()
│   │   ├── config.c         # key=value config parser
│   │   ├── panic.c          # Unrecoverable error → dump → exit
│   │   ├── errno.c          # Error code → strerror()
│   │   ├── clock.c          # Monotonic clock (μs precision)
│   │   └── id.c             # Global ID generator
│   │
│   ├── syscall/             # Syscall dispatch table
│   │   └── syscall.c        # register / dispatch / call count
│   │
│   ├── process/             # AI process management
│   │   ├── task.c           # ax_task_create / find / free
│   │   ├── agent.c          # Agent Process container (sub-task orchestration)
│   │   ├── stream.c         # Token inference stream (ring buffer)
│   │   ├── state.c          # State machine validation + batch transition
│   │   └── wait.c           # waitpid()-style blocking wait
│   │
│   ├── sched/               # Scheduler (8 files)
│   │   ├── sched.h          # ax_fair_entity_t + min-heap + prio→weight table
│   │   ├── fair.c           # Token Fair: vruntime formula + heap_enqueue/pick
│   │   ├── greedy.c         # Greedy baseline: always pick highest priority
│   │   ├── cost_model.c     # Token cost estimation (matched by task name)
│   │   ├── placement.c      # Local GPU selection (least running_tasks)
│   │   ├── placement_cluster.c  # Cross-node GPU selection
│   │   ├── preempt.c        # Preempt when vruntime gap >3x → PREEMPTED
│   │   └── failover.c       # Node failure → task migration + epoch++
│   │
│   ├── mm/                  # KV memory management (6 files)
│   │   ├── kv_cache.c       # Page alloc/free (64MB/page)
│   │   ├── eviction.c       # LRU eviction (access_tick - last_access)
│   │   ├── oom.c            # KV OOM killer (kill task with largest KV usage)
│   │   ├── ai_mm.c          # Unified memory interface (KV + future GPU unified)
│   │   ├── context_page.c   # Context window pages (per-task token mapping)
│   │   └── prefix_cache.c   # Shared prompt cache (hit rate stats)
│   │
│   ├── resource/            # Resource management (7 files)
│   │   ├── lease.c          # GPU + KV lease acquire / release
│   │   ├── gpu.c            # GPU memory alloc / utilization / pressure
│   │   ├── budget.c         # Token budget (consumed/remaining/overshoot count)
│   │   ├── quota.c          # Tenant quota (token + kv + gpu shares)
│   │   ├── pressure.c       # Pressure metrics GPU/KV/Token 0-100
│   │   └── token.c          # Token pool + short-term borrowing (10% overdraw)
│   │
│   ├── drivers/             # Runtime drivers (6 files)
│   │   ├── driver.c         # Driver registry (register/find/resolve/dispatch)
│   │   ├── mock_driver.c    # Mock driver (sleep + rand token, for CI)
│   │   ├── vllm_driver.c    # vLLM HTTP (POST /v1/completions)
│   │   ├── llama_driver.c   # llama.cpp subprocess (popen llama-cli)
│   │   ├── trtllm_driver.c  # TensorRT-LLM (Triton POST /v2/models)
│   │   └── cuda_driver.c    # CUDA bare-metal driver (custom kernel)
│   │
│   ├── net/                 # Network + cluster (6 files)
│   │   ├── rpc.c            # Unix socket IPC (daemon ↔ apeinxctl)
│   │   ├── channel.c        # TCP channel (connect/send/recv, non-blocking + timeout)
│   │   ├── heartbeat.c      # PING/PONG heartbeat (3s interval, 15s timeout)
│   │   ├── cluster.c        # Cluster Manager (master: listen/accept/pick_best)
│   │   ├── node.c           # Node Agent (worker: heartbeat/task receive)
│   │   └── state_sync.c     # Cluster state sync (epoch/gpu/kv/budget)
│   │
│   ├── security/            # Security (5 files)
│   │   ├── tenant.c         # Tenant CRUD + quota accounting
│   │   ├── policy.c         # Policy engine (DENY/ALLOW/LIMIT, rule evaluation)
│   │   ├── capability.c     # RBAC (admin/user/viewer → capability bits)
│   │   ├── namespace.c      # Namespace isolation (PID visibility)
│   │   └── sandbox.c        # Sandbox (file/network/CPU/memory limits)
│   │
│   ├── trace/               # Observability (5 files)
│   │   ├── trace.c          # Ring buffer (push/dump/clear)
│   │   ├── replay.c         # CSV replay (full / per-pid)
│   │   ├── audit.c          # Audit log (admin operations, ring buffer)
│   │   ├── event.c          # Event filtering/stats (by type/pid/gpu)
│   │   └── metrics.c        # Prometheus-style metrics (scheduler/task/kv/budget)
│   │
│   └── fs/                  # Filesystem (5 files)
│       ├── vfs.c            # VFS (mount / read)
│       ├── modelfs.c        # /models (model registry)
│       ├── memoryfs.c       # /memory (KV status, cf. /proc/meminfo)
│       ├── tracefs.c        # /trace (CSV event dump)
│       └── replayfs.c       # /replay (per-task timeline)
│
├── user/                    # Userspace tools
│   ├── apeinxctl/           # CLI (6 commands)
│   │   ├── main.c           # Subcommand dispatch
│   │   ├── submit.c         # submit <name> <tokens> <prio>
│   │   ├── top.c            # Real-time GPU/task/KV dashboard
│   │   ├── kill.c           # kill <pid>
│   │   ├── replay.c         # Replay trace events
│   │   └── billing.c        # Tenant usage + cost estimate
│   │
│   └── libapeinx/           # Client library (3 files)
│       ├── client.c         # Socket communication wrapper
│       ├── syscall_user.c   # Type-safe syscall wrappers
│       └── api.c            # High-level API (ax_run_sync submit+wait)
│
├── tests/                   # Unit tests (6 files)
│   ├── test_sched.c
│   ├── test_mm.c
│   ├── test_lease.c
│   ├── test_syscall.c
│   ├── test_tracefs.c
│   └── test_replay.c
│
├── examples/                # Config + data
│   ├── apeinx.conf          # Example config
│   ├── tasks.csv            # 100-task stress test data (10 task types)
│   ├── local4gpu.conf       # 4GPU config
│   └── mock_models.conf     # Mock model registry
│
└── scripts/                 # Python environment
    ├── setup_venv.bat       # Windows venv one-click setup
    ├── setup_venv.sh        # Linux/macOS venv
    └── requirements.txt     # pytest, requests
```
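The `mm/` files above implement two reclaim paths: LRU eviction by `access_tick` age (eviction.c) and the KV OOM killer's largest-consumer rule (oom.c). A condensed sketch of the two victim-selection policies; the field and function names are invented here, since the real types are `ax_kv_page_t` / `ax_kvmm_t` in `mm.h`:

```c
#include <stdint.h>

#define AX_KV_PAGE_MB 64  /* page granularity, per kernel/mm/kv_cache.c */

/* Simplified page record (stand-in for ax_kv_page_t). */
typedef struct {
    int      owner_pid;
    uint64_t last_access;  /* access_tick at last touch */
    int      in_use;
} kv_page_t;

/* LRU victim: the in-use page with the oldest access tick. -1 if none. */
static int kv_pick_lru_victim(const kv_page_t *pages, int n)
{
    int victim = -1;
    for (int i = 0; i < n; i++) {
        if (!pages[i].in_use) continue;
        if (victim < 0 || pages[i].last_access < pages[victim].last_access)
            victim = i;
    }
    return victim;
}

/* OOM victim: the pid holding the most KV pages. -1 if none. */
static int kv_pick_oom_victim(const kv_page_t *pages, int n)
{
    int best_pid = -1, best_count = 0;
    for (int i = 0; i < n; i++) {
        if (!pages[i].in_use) continue;
        int count = 0;
        for (int j = 0; j < n; j++)
            if (pages[j].in_use && pages[j].owner_pid == pages[i].owner_pid)
                count++;
        if (count > best_count) { best_count = count; best_pid = pages[i].owner_pid; }
    }
    return best_pid;
}

/* Tiny demos: pid 1 owns 2 pages, pid 2 owns 1; index 1 is coldest. */
static int kv_demo_lru(void)
{ kv_page_t p[3] = {{1,10,1},{2,5,1},{1,20,1}}; return kv_pick_lru_victim(p, 3); }
static int kv_demo_oom(void)
{ kv_page_t p[3] = {{1,10,1},{2,5,1},{1,20,1}}; return kv_pick_oom_victim(p, 3); }
```

Note the split mirrors Linux: eviction is the kswapd analogue (reclaim pages quietly), the OOM killer is the last resort (kill a whole task).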
| # | Abstraction | Description |
|---|---|---|
| 1 | AI Process | Agent / Inference / Tool Run → unified AI Process (task → agent → stream → wait) |
| 2 | Token Fair Scheduler | Linux CFS → token runtime fairness + GPU/KV pressure + deadline boost |
| 3 | KV Memory Manager | KV Cache → 64MB pages + LRU eviction + OOM killer + prefix cache |
| 4 | Runtime Driver Layer | Manage vLLM / llama.cpp / TRT-LLM / CUDA like Linux manages devices (5 drivers) |
| 5 | TraceFS / ReplayFS | AI execution → auditable, reproducible system logs (event/metrics/audit) |
Syscalls travel as a plain text protocol over the Unix socket (or TCP): one frame per line, `\n`-delimited.

```
→ SUBMIT <name> <tokens> <priority> [deadline]   ← OK / ERR
→ KILL <pid>                                     ← OK / ERR
→ STATUS / TOP                                   ← multi-line + END
→ REPLAY                                         ← CSV events + END
→ BILLING                                        ← tenant usage + END
```
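Because each frame is one text line, a client only has to format lines and check the first reply word. A minimal sketch; the helper names `ax_fmt_submit` / `ax_reply_ok` are ours, not part of `user/libapeinx`:

```c
#include <stdio.h>
#include <string.h>

/* Format a SUBMIT frame per the protocol above. A negative deadline omits
 * the optional field. Returns frame length, or -1 if buf is too small. */
static int ax_fmt_submit(char *buf, size_t len, const char *name,
                         long tokens, int prio, long deadline)
{
    int n;
    if (deadline >= 0)
        n = snprintf(buf, len, "SUBMIT %s %ld %d %ld\n", name, tokens, prio, deadline);
    else
        n = snprintf(buf, len, "SUBMIT %s %ld %d\n", name, tokens, prio);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* The daemon answers on one line: OK or ERR (multi-line replies end in END). */
static int ax_reply_ok(const char *line)
{
    return strncmp(line, "OK", 2) == 0;
}
```

Writing the formatted frame to the daemon's Unix socket and reading one line back is all `apeinxctl submit` fundamentally does.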
| Phase | Version | Description | Files | Status |
|---|---|---|---|---|
| 0 | Spec | AIOS core abstractions defined | 7 docs | ✅ |
| 1 | v0.01 | Local AI Kernel prototype (boot/task/sched/trace/rpc) | 11 | ✅ |
| 2 | v0.1 | Local multi-GPU AIOS (fair/KV/budget/quota/pressure) | 16 | ✅ |
| 3 | v0.2 | Connect real runtimes (vLLM/llama/TRT-LLM/CUDA driver) | 5 | ✅ |
| 4 | v0.3 | Distributed cluster (master/worker/heartbeat/failover) | 9 | ✅ |
| 5 | v0.5 | Enterprise control plane (tenant/policy/RBAC/sandbox/audit) | 9 | ✅ |
| 6 | v1.0 | AI-native OS ecosystem (docs/tests/libraries) | 18 | ✅ |
MIT — see LICENSE