GitHub - IthaloPS/vllm-turboquant: vllm + turboquant project

vLLM + TurboQuant (systemd)

This directory (vllm/) is set up so you can use it as its own repository and run a service with systemctl, using start_vllm.sh as the entry point.

Self-contained: the TurboQuant code needed at runtime lives under third_party/turboquant/ (vendored copy). You do not need a sibling checkout at ../turboquant unless you want to override the path with TURBOQUANT_ROOT.

Layout

start_vllm.sh: entry point (CUDA + TurboQuant env + starts the server)
run_api_server_turboquant.py: wrapper that enables TurboQuant before vLLM initializes
config.yaml: local vLLM config (model/tokenizer/etc.); create from config.example.yaml (often gitignored when publishing)
third_party/turboquant/: vendored TurboQuant used by bootstrap.sh and start_vllm.sh (PYTHONPATH + editable install)
requirements.txt: editable TurboQuant install with the vllm extra from third_party/
systemd/: systemd unit + installer

Prerequisites

Linux with systemd
CUDA installed (for GPU). Default: CUDA_HOME=/usr/local/cuda
Python 3
Python dependencies (vLLM + turboquant, etc.). If you use the local venv: vllm/venv/

Quick install (recommended)

Install Python dependencies in the interpreter that will run the service (e.g. local venv or conda):

cd vllm
python3 -m venv venv
./venv/bin/python -m pip install -U pip setuptools wheel
./venv/bin/pip install -r requirements.txt

(requirements.txt installs TurboQuant in editable mode from third_party/turboquant[vllm].)

Alternative to step 1 — only ensure TurboQuant is available for your chosen Python:

VLLM_PYTHON=/root/anaconda3/bin/python3 ./bootstrap.sh

This script:

upgrades pip
exits successfully if import turboquant already works
otherwise installs turboquant in editable mode from third_party/turboquant (preferred), or from ../turboquant if present, or from TURBOQUANT_ROOT

Model configuration

Create your config.yaml from the template and set paths:

cp config.example.yaml config.yaml

Then edit config.yaml and set at least:

model: /data/models/...
tokenizer: /data/models/...

Run manually (validate before systemd)

From the vllm/ directory:

./start_vllm.sh

To pick GPU / port:

CUDA_VISIBLE_DEVICES=0 VLLM_PORT=8000 ./start_vllm.sh

Install as a systemd service

Install and enable the service:

sudo ./systemd/install_vllm_turboquant_service.sh

(Optional) Tune service environment variables:

sudo nano /etc/default/vllm-turboquant

Main variables:

VLLM_ROOT: absolute path to this repo (the installer can set this)
VLLM_PYTHON: Python binary (e.g. /root/anaconda3/bin/python3 or .../venv/bin/python)
VLLM_HOST, VLLM_PORT
CUDA_VISIBLE_DEVICES, CUDA_HOME
TQ_KEY_BITS, TQ_VALUE_BITS, TQ_BUFFER_SIZE, TQ_INITIAL_LAYERS_COUNT

Restart after changes:

sudo systemctl restart vllm-turboquant

Useful commands

Status:

systemctl status vllm-turboquant

Logs:

journalctl -u vllm-turboquant -f

Stop / start:

sudo systemctl stop vllm-turboquant
sudo systemctl start vllm-turboquant

Troubleshooting

Python / packages not found: set VLLM_PYTHON=/path/to/venv/bin/python in /etc/default/vllm-turboquant
import turboquant fails: run ./bootstrap.sh or pip install -r requirements.txt using the same interpreter as VLLM_PYTHON
CUDA errors: fix CUDA_HOME and verify nvcc / libraries on the host
Port in use: change VLLM_PORT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

vLLM + TurboQuant (systemd)

Layout

Prerequisites

Quick install (recommended)

Model configuration

Run manually (validate before systemd)

Install as a systemd service

Useful commands

Troubleshooting

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
systemd		systemd
third_party/turboquant		third_party/turboquant
.gitignore		.gitignore
README.md		README.md
bootstrap.sh		bootstrap.sh
config.example.yaml		config.example.yaml
requirements.txt		requirements.txt
run_api_server_turboquant.py		run_api_server_turboquant.py
start_vllm.sh		start_vllm.sh

Folders and files

Latest commit

History

Repository files navigation

vLLM + TurboQuant (systemd)

Layout

Prerequisites

Quick install (recommended)

Model configuration

Run manually (validate before systemd)

Install as a systemd service

Useful commands

Troubleshooting

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages