sws — Single Worker Server

A single-threaded async HTTP + WebSocket server built on Linux io_uring, written in Zig 0.16.0.

IO thread (io_uring Ring A + fiber):
  ├── accept/read/write CQE → fiber → handler → respond
  ├── drain user SubmitQueues
  ├── drain Next.go() ringbuffer tasks
  ├── drain DeferredResponse / InvokeQueue → respond
  ├── drainTick (DNS tick + invoke.drain + tick_hooks)
  ├── FiberShared.tick() — harvest outbound rings (Ring B/C/D...)
  └── TTL incremental scan (StackPool live list)

Worker pool (optional, offload CPU/GPU/blocking I/O):
  └── Next.submit() → worker thread → compute → InvokeQueue → IO thread drains

Handlers run as fibers on the IO thread by default.

Next.go() — fiber on IO thread, zero thread switch. Use for DB io_uring, async I/O.
Next.submit() — worker pool. Use only for CPU-intensive computation that would block.

Requirements

Linux 5.1+ (io_uring)
Zig 0.16.0

Quick Start

git clone https://github.com/fndome/sws
cd sws
zig build run

Use as a Library

const sws = @import("sws");

pub fn main() !void {
    var server = try sws.AsyncServer.init(alloc, io, "0.0.0.0:9090", null, 64);
    defer server.deinit();

    server.GET("/hello", myHandler);
    try server.run();
}

Architecture

Single IO thread + fiber

The entire event loop runs on one IO thread. Handlers execute as fibers (user-space coroutines) on the same thread.

IO thread (single):
  io_uring.submit_and_wait(1)
    → CQE dispatch (via StackPool sticker)
    → fiber → handler → ctx.text/json/html
    → drainPendingResumes (fiber resume queue)
    → drainNextTasks (Next.go ringbuffer tasks)
    → drainTick (DNS tick + invoke.drain + tick_hooks)
    → FiberShared.tick() (outbound Ring B/C/D harvest)
    → TTL scan (StackPool live list, incremental)
    → loop

No background threads unless you call server.initPool4NextSubmit(n).
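
The TTL scan step in the loop above is incremental: each iteration looks at only part of the live list. The following is a minimal sketch of that idea, assuming a saved cursor and a per-tick budget. `TtlScanner`, `budget`, and the plain slice of timestamps are hypothetical stand-ins and not the sws internals.

```zig
const std = @import("std");

/// Hypothetical incremental scanner; the real sws TTL scan may differ.
const TtlScanner = struct {
    cursor: usize = 0,

    /// Visits at most `budget` entries starting at the saved cursor and
    /// writes the indices of idle entries into `expired_out`.
    /// Returns how many indices were written.
    fn scan(
        self: *TtlScanner,
        last_active_ms: []const i64,
        now_ms: i64,
        idle_timeout_ms: i64,
        budget: usize,
        expired_out: []usize,
    ) usize {
        const len = last_active_ms.len;
        if (len == 0) return 0;
        if (self.cursor >= len) self.cursor = 0; // list shrank since the last tick

        var found: usize = 0;
        var visited: usize = 0;
        while (visited < budget and visited < len and found < expired_out.len) : (visited += 1) {
            const i = self.cursor;
            self.cursor = (self.cursor + 1) % len;
            if (now_ms - last_active_ms[i] >= idle_timeout_ms) {
                expired_out[found] = i;
                found += 1;
            }
        }
        return found;
    }
};
```

Because the cursor survives between calls, a small budget bounds the work per loop iteration while still covering the whole list over several iterations.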

StackPool — O(1) connection pool

Connections are stored in a pre-allocated array (not a hash map). O(1) acquire/release via freelist.

StackPool<StackSlot, 1_048_576>
  ├── slots: [1M]StackSlot — contiguous, cache-line-aligned
  ├── freelist: [1M]u32 — O(1) pop/push
  ├── live: []u32 — active slot indices (TTL scan source)
  └── warmup() — touch all pages to eliminate cold-start faults
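
The O(1) acquire/release described above can be sketched with a pre-filled index stack. This is only an illustration under assumptions: `FixedPool`, `reset`, and the field names are hypothetical, and the real StackPool also tracks the live list and warmup.

```zig
const std = @import("std");

/// Hypothetical fixed-capacity pool; not the sws StackPool itself.
fn FixedPool(comptime T: type, comptime capacity: u32) type {
    return struct {
        const Self = @This();

        slots: [capacity]T = undefined,
        free: [capacity]u32 = undefined, // stack of free slot indices
        free_len: u32 = 0,

        /// Fills the freelist so that index 0 is handed out first.
        fn reset(self: *Self) void {
            var i: u32 = 0;
            while (i < capacity) : (i += 1) {
                self.free[i] = capacity - 1 - i;
            }
            self.free_len = capacity;
        }

        /// Pops a free index, or returns null when the pool is exhausted.
        fn acquire(self: *Self) ?u32 {
            if (self.free_len == 0) return null;
            self.free_len -= 1;
            return self.free[self.free_len];
        }

        /// Pushes an index back; the caller must not release twice.
        fn release(self: *Self, idx: u32) void {
            std.debug.assert(self.free_len < capacity);
            self.free[self.free_len] = idx;
            self.free_len += 1;
        }
    };
}
```

Both operations touch one array element and one counter, so their cost does not depend on how many connections are open.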

StackSlot (384 bytes, 5 cache lines)

Each connection slot is split across independent cache lines for contention-free hot-path access:

line1 ( 64B): fd, gen_id, state, write_offset, req_count — CQE dispatch (hottest)
line2 ( 64B): conn_id, last_active_ms, active_list_pos — TTL scanning
line3 ( 64B): fiber_context, large_buf_ptr — async anchors, Worker Pool, oversized body
line4 (128B): response_buf, write_iovs, ws_write_queue — write path (low frequency)
line5 ( 64B): sentinel (0x53574153) + workspace union — HTTP/WS/Compute view

Ghost event defense: user_data = (gen_id << 32) | idx. After close, gen_id is zeroed. Any in-flight CQE arriving after close fails the gen_id match and is silently discarded.
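
The packing formula above can be sketched as a few helpers. The function names are hypothetical, and the rule that a zero gen_id never matches is an assumption drawn from "after close, gen_id is zeroed".

```zig
const std = @import("std");

/// user_data = (gen_id << 32) | idx, as stated in the text.
fn packUserData(gen_id: u32, idx: u32) u64 {
    return (@as(u64, gen_id) << 32) | idx;
}

fn unpackIdx(user_data: u64) u32 {
    return @truncate(user_data);
}

fn unpackGen(user_data: u64) u32 {
    return @truncate(user_data >> 32);
}

/// A CQE is accepted only if its generation matches the slot's current one.
/// Assumption: live slots carry a non-zero gen_id, closed slots carry 0.
fn isLive(slot_gen_id: u32, user_data: u64) bool {
    return slot_gen_id != 0 and slot_gen_id == unpackGen(user_data);
}
```

A completion that was queued before close still carries the old generation, so the comparison fails and the event can be dropped without touching the reused slot.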

Workspace switching: The line5.ws union switches between HttpWork, WsWork, and ComputeWork views depending on connection state — no heap allocation for protocol parsing state.

Ring A + FiberShared — multi-ring architecture

Ring A (built-in): the main server's io_uring ring — accept, connection read/write, DNS, invoke.

Outbound rings (Ring B, Ring C...): independent io_uring rings for outbound protocols. Registered with FiberShared, which the IO thread harvests every event loop iteration — zero extra threads, zero locks.

Ring A (main server, IO thread):
  ├── accept / read / write / close
  ├── io_registry (client callbacks)
  ├── dns_resolver (async UDP DNS)
  ├── rs.invoke (cross-thread push → IO thread callback)
  └── FiberShared.tick()
        ├── Ring B (HTTP client) : ATTACH_WQ → shared io-wq
        │     ├── DnsResolver
        │     ├── IORegistry
        │     ├── InvokeQueue
        │     └── TinyCache
        ├── Ring C (NATS client futures)
        └── Ring D (MySQL client futures)

FiberShared

Pure scheduling glue. FiberShared holds a tick handle for every outbound ring, and the IO thread calls tick() on each loop iteration to harvest CQEs from all registered rings without blocking.

const FiberShared = @import("sws").FiberShared;
const RingTrait = @import("sws").RingTrait;

var fiber_shared = try FiberShared.init(allocator);
defer fiber_shared.deinit();

// Ring B (HTTP client) registers itself
try ring_b.registerWith(&fiber_shared);

// Any new ring just implements RingTrait:
//   ptr: *anyopaque  — pointer to ring instance
//   tickFn: fn(*anyopaque) void  — dns.tick + invoke.drain + submit + copy_cqes + dispatch

RingTrait

Contract that every outbound ring must implement:

dns.tick() — drive DNS query state machine
invoke.drain() — process cross-thread task dispatch
ring.submit() — submit pending SQEs
ring.copy_cqes() — non-blocking harvest CQEs
registry.dispatch(ud, res) — dispatch CQEs to registered callbacks
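
The ptr + tickFn shape mentioned earlier is a type-erased handle. Below is a minimal sketch of that pattern, not the sws definition: `TickHandle`, `CountingRing`, and `tickAll` are hypothetical names, and a real ring's tick function would run the five steps listed above instead of bumping a counter.

```zig
const std = @import("std");

/// Hypothetical type-erased tick handle (pointer + function pointer).
const TickHandle = struct {
    ptr: *anyopaque,
    tickFn: *const fn (*anyopaque) void,

    fn tick(self: TickHandle) void {
        self.tickFn(self.ptr);
    }
};

/// Stand-in for an outbound ring; it only counts how often it was ticked.
const CountingRing = struct {
    ticks: u32 = 0,

    fn tickErased(p: *anyopaque) void {
        const self: *CountingRing = @ptrCast(@alignCast(p));
        self.ticks += 1;
    }

    fn asHandle(self: *CountingRing) TickHandle {
        return .{ .ptr = self, .tickFn = &tickErased };
    }
};

/// What a scheduler loop would do once per iteration.
fn tickAll(handles: []const TickHandle) void {
    for (handles) |h| h.tick();
}
```

The scheduler only sees handles, so adding a new ring type requires no change to the loop that drives them.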

Init

var server = try AsyncServer.init(alloc, io, "0.0.0.0:9090", app_ctx, fiber_stack_size_kb);
//                                                                    ↑ 0 = 64KB

First handler/middleware registration calls ensureNext() → creates Next (ringbuffer) + setDefault().

Internally, AsyncServer.init() creates:

pool: StackPool — O(1) contiguous connection array
large_pool: LargeBufferPool(64) — 64 × 1MB blocks for oversized requests (>32KB)
rs: RingShared — single ring shared resource (ring + registry + invoke)
io_registry: IORegistry — outbound client connection registry
dns_resolver: DnsResolver — async UDP DNS with TTL cache

To add outbound rings (HTTP/NATS/MySQL), inject FiberShared:

const FiberShared = @import("sws").FiberShared;
const RingB = @import("sws").HttpRing;   // HttpRing is the Ring B type
const HttpClient = @import("sws").HttpClient;

var fiber_shared = try FiberShared.init(alloc);
defer fiber_shared.deinit();

// Ring B with 1s built-in TinyCache TTL:
var ring_b = try RingB.init(alloc, io, server.ring.fd, 1000);
defer ring_b.deinit();
try ring_b.registerWith(&fiber_shared);

// HttpClient auto-uses RingB's built-in cache:
var http_client = try HttpClient.init(alloc, &ring_b);
defer http_client.deinit();

// Set fiber_shared on server so IO thread harvests outbound CQEs
server.fiber_shared = &fiber_shared;

Handler — Synchronous (on IO thread)

fn hello(allocator: Allocator, ctx: *Context) anyerror!void {
    _ = allocator; // unused here; Zig rejects unused parameters
    ctx.text(200, "hello");
}

Handler — `Next.go` (fiber, IO thread, no thread switch)

For async I/O (DB io_uring, HTTP client):

const Ctx = struct { allocator: Allocator, resp: *DeferredResponse };

fn exec(c: *Ctx, complete: *const fn (?*anyopaque, []const u8) void) void {
    defer c.allocator.destroy(c);
    defer c.allocator.destroy(c.resp);
    c.resp.json(200, "[{\"id\":1}]");
    complete(c, "");
}

fn myHandler(allocator: Allocator, ctx: *Context) anyerror!void {
    const s: *AsyncServer = @ptrCast(@alignCast(ctx.server.?));
    const resp = try allocator.create(DeferredResponse);
    resp.* = .{ .server = s, .conn_id = ctx.conn_id, .allocator = allocator };
    ctx.deferred = true;
    Next.go(Ctx, .{ .allocator = allocator, .resp = resp }, exec);
}

Handler — `Next.submit` (worker pool, thread switch)

For offload work (crypto, compression, LLM/GPU inference, blocking I/O):

const Ctx = struct { allocator: Allocator, resp: *DeferredResponse };

fn exec(c: *Ctx, complete: *const fn (?*anyopaque, []const u8) void) void {
    defer c.allocator.destroy(c);
    defer c.allocator.destroy(c.resp);
    // Offload work here (CPU/GPU/blocking I/O)...
    c.resp.json(200, "{\"done\": true}");
    complete(c, "");
}

fn myHandler(allocator: Allocator, ctx: *Context) anyerror!void {
    const s: *AsyncServer = @ptrCast(@alignCast(ctx.server.?));
    const resp = try allocator.create(DeferredResponse);
    resp.* = .{ .server = s, .conn_id = ctx.conn_id, .allocator = allocator };
    ctx.deferred = true;
    Next.submit(Ctx, .{ .allocator = allocator, .resp = resp }, exec);
}

Worker pool (for Next.submit)

try server.initPool4NextSubmit(1); // 1 worker thread (recommended)

Recommendations:

1 — default, sufficient for crypto, compression
N/2 (e.g. 4 on 8-core) — sustained LLM/GPU inference or blocking I/O

DeferredResponse

Sends HTTP response from any thread (CAS-based lock-free):

resp.json(200, "{\"ok\":true}");
resp.text(200, "plain");

Deferred Hooks, Tick Hooks

Execute custom logic before each deferred response is sent, on the IO thread. Essential for MMORPG / real-time use cases (update game state, leaderboard, broadcast):

fn updateGameState(server: *AsyncServer, node: *DeferredNode) void {
    const world: *GameWorld = @ptrCast(@alignCast(server.app_ctx.?));
    world.update(node.body);
}

try server.addHookDeferred(updateGameState);

Rules:

Hooks run in registration order on the IO thread — safe for IO-thread-exclusive data
node.body is valid during hook execution; do NOT free it
Do NOT store node pointer — the node is destroyed after the hook returns
Must not panic (log errors instead)

Room Auto-Battle Example

Rooms with countdown → auto-battle for hundreds of players. Two hooks cooperate: addHookTick checks deadlines every loop iteration (no deferred node needed); addHookDeferred processes incoming player commands. Battle CPU work offloaded via Next.submit. Zero locks — all state on IO thread.

const Room = struct {
    id: u64,
    state: enum { waiting, fighting, settle },
    deadline: i64,                  // monotonic timestamp
    teams: [2]std.ArrayList(*Player),
};

const Player = struct { id: u64, hp: u32, atk: u32 };

const BattleCtx = struct {
    server: *AsyncServer,           // needed by doBattle to send the result
    room_id: u64,
    blue_team: []PlayerSnapshot,
    red_team:  []PlayerSnapshot,
};

const PlayerSnapshot = struct { hp: u32, atk: u32 };

fn roomTick(server: *AsyncServer) void {
    const app: *GameApp = @ptrCast(@alignCast(server.app_ctx.?));
    for (app.rooms.items) |*room| {
        if (room.state == .waiting and server.monotonic_ms() >= room.deadline) {
            room.state = .fighting;
            startBattle(server, room);
        }
    }
}

fn roomCommand(server: *AsyncServer, node: *DeferredNode) void {
    const app: *GameApp = @ptrCast(@alignCast(server.app_ctx.?));
    app.processCommand(node.body);  // join / ready / action
}

fn startBattle(server: *AsyncServer, room: *Room) void {
    const alloc = server.allocator;
    const ctx = alloc.create(BattleCtx) catch return;
    // Assumes snapshotTeam allocates its slice with the allocator it is given.
    const blue = snapshotTeam(&room.teams[0], alloc) catch {
        alloc.destroy(ctx);
        return;
    };
    const red = snapshotTeam(&room.teams[1], alloc) catch {
        alloc.free(blue);
        alloc.destroy(ctx);
        return;
    };
    ctx.* = .{ .server = server, .room_id = room.id, .blue_team = blue, .red_team = red };
    Next.submit(BattleCtx, ctx, doBattle);
}

fn doBattle(ctx: *BattleCtx, complete: *const fn (?*anyopaque, []const u8) void) void {
    const result = simulateCombat(ctx.blue_team, ctx.red_team);
    var buf: [4096]u8 = undefined;
    const json = result.toJson(&buf);
    ctx.server.sendDeferredResponse(ctx.room_id, 200, .json, json);
    _ = complete;
}

try server.addHookTick(roomTick);        // tick: fires every IO loop
try server.addHookDeferred(roomCommand); // deferred: fires per-player command

Next.go / Next.submit

Next.go(Ctx, ctx, exec);       // fiber on IO thread (io_uring I/O)
Next.submit(Ctx, ctx, exec);   // worker pool (offload work)

Both are static. Next.go works out of the box (auto setDefault on first route). Next.submit requires server.initPool4NextSubmit(n).

GPU / Heavy Compute

GPU compute uses Next.submit — worker thread calls CUDA / CANN / Vulkan runtime. io_uring direct dispatch for GPU is blocked on Linux kernel drivers (missing IORING_OP_URING_CMD for compute queues, NVIDIA / Huawei not yet shipped).

Once drivers add it, IORegistry handles GPU with zero code changes — same register(id, ptr, on_cqe) → submit SQE → dispatch CQE pattern.

Current: fiber + worker pool

Worker pool always supports fiber. GPU task calls Fiber.workerYield(poll, ctx) after submitting a kernel, freeing the worker thread to process other tasks while the GPU runs. The worker tick polls parked fibers and resumes when the kernel completes.

// CPU task — no yield, runs to completion
Next.submit(CpuCtx, ctx, struct {
    fn exec(c: *CpuCtx, complete: *const fn (?*anyopaque, []const u8) void) void {
        const result = heavyCompute(c.input);
        complete(c, result);
    }
}.exec);

// GPU task — MUST call workerYield after submitting kernel
//                                 ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓
Next.submit(GpuCtx, ctx, struct {
    fn exec(c: *GpuCtx, complete: *const fn (?*anyopaque, []const u8) void) void {
        cudaLaunchKernel(kernel, stream, args);
        Fiber.workerYield(            // ← THIS LINE makes it a GPU task
            struct { fn poll(s: *anyopaque) bool {
                return cuStreamQuery(@ptrCast(@alignCast(s))) == CUDA_SUCCESS;
            }}.poll,
            @ptrCast(stream),
        );
        // resume point — GPU done
        complete(c, output);
    }
}.exec);

The only difference between CPU and GPU: GPU tasks call Fiber.workerYield. Without it, the worker thread blocks synchronously until the kernel completes, defeating fiber multiplexing.

⚠️ GPU tasks MUST use Next.submit, never Next.go.

Next.go runs on the IO thread. Two failure modes:

Without workerYield: cuStreamSynchronize blocks the IO thread — io_uring CQE processing stops, entire server freezes.

With workerYield: fiber yields correctly, IO thread stays alive — but the fiber never wakes up. The IO thread has no poll tick; it only responds to io_uring CQEs. GPU kernels don't produce CQEs, so the IO thread never learns the kernel finished.

Worker threads have a built-in poll tick (while poll_fn() try resume) which is why GPU works there: workerYield → park → tick → poll → resume.

IMPORTANT: GPU uses initPool4NextSubmit(1). GPU drivers are async internally — one worker + fiber can submit N streams and poll for completion. No extra thread pool needed. io_uring not yet supported for GPU compute (kernel driver gap).

RingShared

RingShared packages one io_uring ring together with the single thread that owns it. The same value is injected into the server and into every outbound client, so all of them use the ring on equal terms.

const rs = server.rs;  // { ring, registry, invoke, io_tid }
// Any client is injected equally:
var client = try RingSharedClient.init(alloc, rs, ...);
var http   = try HttpClient.init(alloc, &ring_b);  // TinyCache is owned by RingB

rs.ringPtr() / rs.registryPtr() — IO-thread assertion guard (non-IO thread access → @panic)
rs.invoke.push() — any-thread-safe CAS callback (worker → IO thread)

RingSharedClient

io_uring-driven outbound TCP client. Glue layer for integrating NATS / Redis / HTTP client libraries into sws's IO thread — no separate runtime, no locks.

const RingSharedClient = @import("sws").RingSharedClient;

fn onData(ctx: ?*anyopaque, data: []u8) void {
    const nats: *NatsClient = @ptrCast(@alignCast(ctx));
    nats.feed(data);
}

fn onClose(ctx: ?*anyopaque) void {
    const nats: *NatsClient = @ptrCast(@alignCast(ctx));
    nats.discard();
}

// In main(), before server.run():
var cs = try RingSharedClient.init(allocator, server.rs, onData, onClose, nats_ctx);
defer cs.deinit();
try cs.connect("127.0.0.1", 4222);

// Send data (queued, submitted via io_uring)
try cs.write("PUB subject 5\r\nhello\r\n");
cs.close();  // graceful

All I/O on sws IO thread — onData / onClose run in the same context as hooks
write() queues data; pending writes auto-flushed as io_uring CQEs arrive
Protocol layer (NATS / Redis / HTTP) only needs feed([]u8) and write([]const u8)
Multiple clients per server; user_data uses a dedicated high bit to avoid collisions

TinyCache (built into RingB)

Single-entry TTL connection cache for outbound protocols. Owned by RingB — all lifecycle (init, tick, evict, deinit) is managed automatically. Users get connection reuse for free with HttpClient.

Same host:port connections auto-reused within TTL window
Expired entries auto-evicted by RingB.tick() each event loop iteration
Connect phase allows retries; read/write phase forbids retries (kernel TCP stack guarantees SQE-level writes)

Pipe

Adapts RingSharedClient's push model to a pull model (reader.read / writer.write). Enables synchronous-protocol libraries (pgz, myzql) to run directly on the IO thread via fiber yield/resume — no worker threads, no locks.

// In main(), after AsyncServer.init() and before server.run():
const Pipe = @import("sws").Pipe;
const RingSharedClient = @import("sws").RingSharedClient;

fn onData(ctx: ?*anyopaque, data: []u8) void {
    const p: *Pipe = @ptrCast(@alignCast(ctx));
    p.feed(data) catch {};
}

fn onClose(ctx: ?*anyopaque) void {
    const p: *Pipe = @ptrCast(@alignCast(ctx));
    p.reset();
}

var pipe: Pipe = undefined; // declared first so &pipe can be passed as the callback context
var cs = try RingSharedClient.init(allocator, server.rs, onData, onClose, &pipe);
pipe = try Pipe.init(allocator, cs);
defer pipe.deinit();

try cs.connect("localhost", 5432);
// ... wait for connect (yield) ...

// Any protocol lib with anytype reader/writer works:
// var conn = try pgz.Connection.init(allocator, pipe.reader(), pipe.writer());
// var result = try conn.query("SELECT 1", struct { u8 });

feed(data) pushes bytes from RingSharedClient → read buffer, resumes waiting fiber
reader.read() blocks the fiber (via yield) until data arrives — looks synchronous to caller
writer.write() queues into buffer; flushWrite() sends via RingSharedClient
reset() clears buffers on disconnect/reconnect
Requires protocol library to accept anytype reader/writer (pgz needs 1-line patch on WriteBuffer.send)

LargeBufferPool

For oversized requests (Content-Length > 32KB) that can't fit in the 64KB shared fiber stack. Pre-allocated 1MB blocks with O(1) freelist acquire/release.

const LargeBufferPool = @import("sws").LargeBufferPool;

// 64 blocks × 1MB = 64MB — built into AsyncServer by default
// Usage in oversized body path:
const buf = self.large_pool.acquire() orelse return error.OutOfLargeBuffers;
// io_uring READ CQE writes directly to buf.ptr
// ... process body ...
self.large_pool.release(buf);

HttpRing + HttpClient (Ring B)

Independent io_uring Ring B for outbound HTTP client. Shares the kernel io-wq thread pool via IORING_SETUP_ATTACH_WQ. TinyCache is built into RingB — same host:port connections are automatically reused within the TTL window and evicted by RingB.tick().

const sws = @import("sws");

// Ring B init (attached to server's Ring A io-wq, 1s cache TTL):
var ring_b = try sws.HttpRing.init(allocator, io, server.ring.fd, 1000);
defer ring_b.deinit();
try ring_b.registerWith(&fiber_shared);

// HttpClient — cache is automatically managed by RingB:
var http_client = try sws.HttpClient.init(allocator, &ring_b);
defer http_client.deinit();

// Use from handler:
const resp = try http_client.get("http://api.example.com/data");
defer resp.deinit();

// POST with body:
const resp2 = try http_client.post("http://api.example.com/submit", "{\"key\":\"val\"}");

c-ares async DNS (optional)

Built-in DnsResolver covers basic needs (A record + TTL cache). For truncated UDP (TC bit → TCP retry) or SRV records, switch to c-ares:

sudo apt install libc-ares-dev

Add to build.zig:

exe.linkSystemLibrary("cares");

Switch DNS backend:

const HttpCaresDns = sws.HttpCaresDns;
// ring.dns = HttpCaresDns.init(alloc, ring.rs);

Fiber

Built-in fiber (x86_64 and ARM64 Linux). All handler fibers share a single pre-allocated stack buffer (stored in AsyncServer.shared_fiber_stack) — sequential execution, no per-request stack allocation, zero contention.

⚠️ Do NOT use std.Io.async() / future.await() in handlers.

Zig's Future is a thread-based design, not fiber-based:

async() → std.Thread.spawn + queued to OS thread pool (Threaded.zig:2112)

await() → Thread.futexWait — blocks the OS thread (Threaded.zig:2436)

On the IO thread, blocking means:

io_uring CQE processing stops — no new connections, no reads, no writes

The entire server stalls for the duration of the work

Why not patch the vtable to support Future on fibers?

future.await() requires the caller's stack frame to persist across suspension:

var future = io.async(work, .{data});
const result = future.await(io);   // fiber yields here; stack must survive
ctx.json(200, result);              // resumes here; expects data still intact

SWS uses a shared stack (one 64KB buffer, all fibers reuse it). When a fiber yields in await(), the next fiber's execution overwrites that same memory. The resumed fiber's stack frame is corrupted.

Switching to per-fiber stacks would fix this, but at a steep memory cost:

Concurrent requests	Per-fiber stack	Shared stack
1K	16 MB	64 KB
20K	320 MB	64 KB
200K	3.2 GB	64 KB
1M	16 GB	64 KB

(Per-fiber stack sized at 16KB, the practical minimum for HTTP handlers.)

At a typical production load of 200K concurrent requests, shared stack saves ~3GB. This directly translates to lower memory pressure and better operational stability.
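
The per-fiber figures are just concurrency multiplied by stack size. A small sketch of that arithmetic, assuming binary units (1K = 1024 requests, 1 KB = 1024 bytes); the function names are hypothetical.

```zig
const std = @import("std");

const KiB: u64 = 1024;

/// Stack memory if every concurrent request owns its own stack.
fn perFiberStackBytes(concurrent: u64, stack_kib: u64) u64 {
    return concurrent * stack_kib * KiB;
}

/// Stack memory with one shared stack: independent of the request count.
fn sharedStackBytes(stack_kib: u64) u64 {
    return stack_kib * KiB;
}
```

For example, `perFiberStackBytes(1024 * 1024, 16)` gives 16 GiB for 1M requests, while `sharedStackBytes(64)` stays at 64 KiB regardless of load.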

This is the fundamental tradeoff: Future API semantics vs. 1M-connection memory model. SWS chooses the latter. All async is done via Next.go/Next.submit with callbacks instead of await-style suspension.

Fibers are cooperative and OS threads are preemptive, so a thread-blocking await() inside a fiber breaks the fiber model.

Zig pattern	SWS replacement
io.async(cpuWork) + future.await(io)	Next.submit(Ctx, ctx, exec) + DeferredResponse
io.async(ioWork) + future.await(io)	Next.go(Ctx, ctx, exec) (fiber on IO thread)

Pattern:

// ❌ Don't do this in a handler; it blocks the IO thread:
// var future = io.async(heavyWork, .{data});
// const result = future.await(io);

// ✅ Do this instead; the IO thread never blocks:
fn myHandler(allocator: Allocator, ctx: *Context) anyerror!void {
    const s: *AsyncServer = @ptrCast(@alignCast(ctx.server.?));
    const resp = try allocator.create(DeferredResponse);
    resp.* = .{ .server = s, .conn_id = ctx.conn_id, .allocator = allocator };
    ctx.deferred = true;
    Next.submit(Ctx, .{ .allocator = allocator, .resp = resp }, exec);
}

See the Next.submit section above for the full exec/complete callback API.

Routing / Middleware / WebSocket / Context

See example/ and src/example.zig.

Memory Model (1M connections target)

Component	Size	Notes
StackSlot (per connection)	384 bytes	5 cache-line-aligned sub-structures
StackPool (1M slots)	~400 MB	384B per StackSlot, contiguous, warmup-touched
Freelist (1M u32)	4 MB	O(1) acquire/release
Read buffer (idle)	0 bytes	io_uring provided buffers, returned on idle
Slab for io_uring reads	64 MB	16384 × 4KB blocks, kernel-recycled
Tiered write pool	dynamic	8 size classes (512B–64KB), freelist-recycled
Shared fiber stack	64 KB	All fibers share one pre-allocated stack
LargeBufferPool	64 MB	64 × 1MB blocks for oversized requests
1M idle connections	~540 MB	No per-thread stack overhead

Like greatws, idle connections consume zero buffer memory.

Cache-line layout rationale

The 384-byte StackSlot is split across independent cache lines:

line1 (64B): fd, gen_id, state, write_offset — only this is touched during CQE dispatch
line2 (64B): conn_id, last_active_ms, active_list_pos — only touched during TTL scanning
line3 (64B): fiber_context, large_buf_ptr — async anchors (Worker Pool / oversized bodies)
line4 (128B): response_buf, write_iovs, WS queue — write path, not in the hot path
line5 (64B): sentinel + workspace union — protocol parser scratch, zero extra allocation

The IO loop's hottest path (CQE dispatch → slot lookup) only touches line1. TTL scanning only touches line2. No cache-line ping-pong between unrelated operations.
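
The 64/64/64/128/64 split can be checked at compile time. The sketch below is an assumption-heavy illustration: only the field names and section sizes come from the text, while the field types and the byte arrays standing in for line4 and the workspace union are guesses.

```zig
const std = @import("std");

// Field types are hypothetical; align(64) on the first field forces each
// section to start on its own cache line and pads it to a multiple of 64.
const Line1 = extern struct {
    fd: i32 align(64),
    gen_id: u32,
    state: u8,
    write_offset: u32,
    req_count: u32,
};

const Line2 = extern struct {
    conn_id: u64 align(64),
    last_active_ms: i64,
    active_list_pos: u32,
};

const Line3 = extern struct {
    fiber_context: usize align(64),
    large_buf_ptr: usize,
};

const Line4 = extern struct {
    bytes: [128]u8 align(64), // stands in for response_buf, write_iovs, ws_write_queue
};

const Line5 = extern struct {
    sentinel: u32 align(64),
    workspace: [60]u8, // stands in for the HttpWork / WsWork / ComputeWork union
};

const Slot = extern struct {
    line1: Line1,
    line2: Line2,
    line3: Line3,
    line4: Line4,
    line5: Line5,
};

comptime {
    std.debug.assert(@sizeOf(Slot) == 384);
    std.debug.assert(@offsetOf(Slot, "line2") == 64);
    std.debug.assert(@offsetOf(Slot, "line4") == 192);
    std.debug.assert(@offsetOf(Slot, "line5") == 320);
}
```

Keeping the assertions in a comptime block means a field added to the wrong section fails the build instead of silently shifting the layout.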

WebSocket payload copying

WS handlers may offload frame data asynchronously, so frame payloads must remain valid after handler returns. WS frame payloads are always duped — never zero-copy.

Performance impact (100B text frame):

Operation	Cost	Notes
memcpy(100B)	~10ns	Copy frame payload
GeneralPurposeAllocator alloc/free	~100ns	One alloc+free per frame

~110ns overhead per frame. 1M connections, 1% active, 10 msg/s each = 100K msg/s:

CPU: 100K × 110ns = 11ms/s = 1.1% of one core

Config

key	default	description
`fiber_stack_size_kb`	64	fiber stack size (KB). 0 = 64
`io_cpu`	null	pin IO thread to CPU core
`idle_timeout_ms`	30000	close idle connections
`write_timeout_ms`	5000	close stuck-write connections
`buffer_size`	4096	io_uring buffer block size
`buffer_pool_size`	16384	number of buffer blocks

invokeOnIoThread

Cross-thread safe callback to IO thread. Underneath is rs.invoke (CAS lock-free linked list), drained automatically in drainTick.

server.invokeOnIoThread(MyCtx, ctx, struct {
    fn run(allocator: Allocator, c: *MyCtx) void {
        // Runs on IO thread — safe to access ring/registry
        c.client.write("PUB ...");
        allocator.free(c.data);
    }
}.run);
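
A CAS-based lock-free list of the kind mentioned above can be sketched as a push-only stack that the consumer drains in one atomic swap. This is an illustration under assumptions, not the sws InvokeQueue: `Node`, `InvokeList`, and `drainAll` are hypothetical names, and the drained nodes come out in LIFO order.

```zig
const std = @import("std");

const Node = struct {
    next: ?*Node = null,
    value: u32,
};

/// Hypothetical multi-producer list: any thread pushes, one thread drains.
const InvokeList = struct {
    head: std.atomic.Value(?*Node) = std.atomic.Value(?*Node).init(null),

    /// Retries the compare-and-swap until the node becomes the new head.
    fn push(self: *InvokeList, node: *Node) void {
        var old = self.head.load(.monotonic);
        while (true) {
            node.next = old;
            // cmpxchgWeak returns null on success, else the value it observed.
            old = self.head.cmpxchgWeak(old, node, .release, .monotonic) orelse return;
        }
    }

    /// Detaches the whole list at once; the caller walks it via `next`.
    fn drainAll(self: *InvokeList) ?*Node {
        return self.head.swap(null, .acquire);
    }
};
```

Producers never wait on each other beyond a CAS retry, and the draining thread takes every pending node with a single swap per tick.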

Advanced: io_uring-native DB Pool

Wire your DB driver's TCP fd into io_uring directly:

handler (fiber on IO thread):
  └── db.query(sql)
        └── io_uring write(fd, query) → CQE → io_uring read(fd) → CQE → parse
              → ctx.json(200, result)

For connection pooling: maintain a pool of connected TCP fds in a ringbuffer. Handler pops fd, issues write(sql) + read() via io_uring, parses result, pushes fd back.
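
The fd pool described above can be sketched as a fixed-capacity FIFO. `FdRing` and its methods are hypothetical names, and the sketch covers only the pop/push bookkeeping, not the io_uring write and read.

```zig
const std = @import("std");

/// Hypothetical fixed-capacity FIFO of connected socket fds.
fn FdRing(comptime capacity: usize) type {
    return struct {
        const Self = @This();

        buf: [capacity]i32 = undefined,
        head: usize = 0, // index of the oldest fd
        len: usize = 0,

        /// Returns false when the ring is full.
        fn push(self: *Self, fd: i32) bool {
            if (self.len == capacity) return false;
            self.buf[(self.head + self.len) % capacity] = fd;
            self.len += 1;
            return true;
        }

        /// Returns null when no idle connection is available.
        fn pop(self: *Self) ?i32 {
            if (self.len == 0) return null;
            const fd = self.buf[self.head];
            self.head = (self.head + 1) % capacity;
            self.len -= 1;
            return fd;
        }
    };
}
```

Since every access happens on the IO thread, the ring needs no atomics; a handler that gets null from `pop` can either open a new connection or defer the query.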

License

MIT
