noonghunna

🎯 Focusing

11 followers · 13 following

Achievements

Popular repositories

club-3090 Public

Community recipes for serving LLMs on RTX 3090. Multi-engine (vLLM, llama.cpp, SGLang) and model-agnostic. Currently shipping Qwen3.6-27B configs for 1× and 2× cards.

Shell 141 9

qwen36-27b-single-3090 Public

Shell 84 12

qwen36-dual-3090 Public

Qwen3.6-27B on dual RTX 3090 — TP=2 recipe, vLLM nightly, MTP + fp8 KV, validated for concurrent serving

Shell 41 3

vllm Public

Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 1

genesis-vllm-patches Public

Forked from Sandermage/genesis-vllm-patches

Production-grade runtime patches for vLLM (45+ patches) — Qwen3.6-35B-A3B-FP8 hybrid GDN+MoE on NVIDIA Ampere (SM 80-86). 127 tok/s MTP free-form, 99 tok/s suffix tool-call (max 175). TurboQuant k8…

Python