Built a local MLX/OpenClaw service inspired by the TurboQuant direction #83
PetoVeritas started this conversation in Show and tell
Hi, sharing a related project in case it is useful to others here.
I built a local macOS service around Gemma 4 on Apple’s MLX stack that exposes a stable provider boundary for OpenClaw-style agent use:
https://github.com/PetoVeritas/MLX-TurboQuant-Service
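The exact API surface isn't spelled out in this post (the README has the real details), so purely as an illustration, here is a minimal sketch of what calling a local "provider boundary" over an OpenAI-compatible chat route could look like. The port, endpoint path, and request shape below are assumptions for the sake of the example, not the project's documented interface:

```python
# Minimal sketch of calling a local OpenAI-compatible provider boundary.
# The base URL, port, and /v1/chat/completions route are assumptions;
# only the model name comes from this post.
import json
import urllib.request

def chat(prompt: str, base_url: str = "http://127.0.0.1:8080") -> str:
    payload = {
        "model": "mlx-community/gemma-4-26b-a4b-it-4bit",
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",  # assumed OpenAI-style route
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Summarize the last run's log in two sentences."))
```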
What it is:
Why it may be relevant here:
A couple of concrete observations from my side:

- With a `73728` context on `mlx-community/gemma-4-26b-a4b-it-4bit`: about 14.6 GB RSS for the MLX worker, versus about 20.1–20.3 GB RSS for the Ollama Gemma lane.
- The MLX lane handled a 35,279-token prompt cleanly, while my Ollama baseline effectively behaved more like a ~32,768 prompt-token cap in that test.

Definitely not posting this as “problem solved,” just as a related integration that might be interesting to people experimenting with local serving, memory pressure, and real app usage.
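If anyone wants to sanity-check memory numbers like these on their own machine, RSS can be sampled with plain `ps` on macOS. This is just a sketch of how one might do it; the process-name patterns are assumptions and should be replaced with whatever your MLX worker and Ollama processes are actually called:

```python
# Sketch: sum resident set size (RSS) of matching processes on macOS.
# The "mlx" and "ollama" patterns below are assumptions, not the real
# process names used by either project.
import subprocess

def rss_gb(pattern: str) -> float:
    """Sum RSS (in GB) over all processes whose command line contains pattern."""
    out = subprocess.run(
        ["ps", "-axo", "rss=,command="], capture_output=True, text=True, check=True
    ).stdout
    total_kb = 0
    for line in out.splitlines():
        line = line.strip()
        if not line:
            continue
        rss, _, cmd = line.partition(" ")
        if pattern in cmd:
            total_kb += int(rss)
    return total_kb / (1024 ** 2)  # ps reports RSS in KB

if __name__ == "__main__":
    print(f"MLX worker RSS: {rss_gb('mlx'):.1f} GB")
    print(f"Ollama RSS:     {rss_gb('ollama'):.1f} GB")
```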
Still a work in progress, but in decent enough shape that I'm setting it up for my cron jobs.
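For the cron side, all I really need is a small wrapper that checks the local service before kicking off a job. The probe route, port, and schedule below are illustrative assumptions, not anything from the project:

```python
# Hypothetical health-check wrapper to run from cron before the real job, e.g.
#   */30 * * * * /usr/bin/env python3 check_service.py >> /tmp/mlx_service.log 2>&1
# The /v1/models probe path and port are assumptions.
import datetime
import urllib.error
import urllib.request

def service_up(base_url: str = "http://127.0.0.1:8080", timeout: float = 5.0) -> bool:
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    print(f"{stamp} service_up={service_up()}")
```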