Built a local MLX/OpenClaw service inspired by the TurboQuant direction #83
PetoVeritas started this conversation in Show and tell
Hi, sharing a related project in case it is useful to others here.
I built a local macOS service around Gemma 4 on Apple’s MLX stack that exposes a stable provider boundary for OpenClaw-style agent use:
https://github.com/PetoVeritas/MLX-TurboQuant-Service
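The exact API surface isn't spelled out in this post (the README has the real details), so purely as an illustration, here is a minimal sketch of what calling a local "provider boundary" over an OpenAI-compatible chat route could look like. The port, endpoint path, and request shape below are assumptions for the sake of the example, not the project's documented interface:

```python
# Minimal sketch of calling a local OpenAI-compatible provider boundary.
# The base URL, port, and /v1/chat/completions route are assumptions;
# only the model name comes from this post.
import json
import urllib.request

def chat(prompt: str, base_url: str = "http://127.0.0.1:8080") -> str:
    payload = {
        "model": "mlx-community/gemma-4-26b-a4b-it-4bit",
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",  # assumed OpenAI-style route
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Summarize the last run's log in two sentences."))
```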
What it is:
Why it may be relevant here:
A couple of concrete observations from my side:

- With a `73728` context on `mlx-community/gemma-4-26b-a4b-it-4bit`: about 14.6 GB RSS for the MLX worker, versus about 20.1–20.3 GB RSS for the Ollama Gemma lane.
- The MLX lane handled a 35,279-token prompt cleanly, while my Ollama baseline effectively behaved more like a ~32,768 prompt-token cap in that test.

Definitely not posting this as “problem solved,” just as a related integration that might be interesting to people experimenting with local serving, memory pressure, and real app usage.
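If anyone wants to sanity-check memory numbers like these on their own machine, RSS can be sampled with plain `ps` on macOS. This is just a sketch of how one might do it; the process-name patterns are assumptions and should be replaced with whatever your MLX worker and Ollama processes are actually called:

```python
# Sketch: sum resident set size (RSS) of matching processes on macOS.
# The "mlx" and "ollama" patterns below are assumptions, not the real
# process names used by either project.
import subprocess

def rss_gb(pattern: str) -> float:
    """Sum RSS (in GB) over all processes whose command line contains pattern."""
    out = subprocess.run(
        ["ps", "-axo", "rss=,command="], capture_output=True, text=True, check=True
    ).stdout
    total_kb = 0
    for line in out.splitlines():
        line = line.strip()
        if not line:
            continue
        rss, _, cmd = line.partition(" ")
        if pattern in cmd:
            total_kb += int(rss)
    return total_kb / (1024 ** 2)  # ps reports RSS in KB

if __name__ == "__main__":
    print(f"MLX worker RSS: {rss_gb('mlx'):.1f} GB")
    print(f"Ollama RSS:     {rss_gb('ollama'):.1f} GB")
```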
Still a work in progress, but in decent enough shape that I'm setting it up for my cron jobs.
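For the cron side, all I really need is a small wrapper that checks the local service before kicking off a job. The probe route, port, and schedule below are illustrative assumptions, not anything from the project:

```python
# Hypothetical health-check wrapper to run from cron before the real job, e.g.
#   */30 * * * * /usr/bin/env python3 check_service.py >> /tmp/mlx_service.log 2>&1
# The /v1/models probe path and port are assumptions.
import datetime
import urllib.error
import urllib.request

def service_up(base_url: str = "http://127.0.0.1:8080", timeout: float = 5.0) -> bool:
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    print(f"{stamp} service_up={service_up()}")
```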