zhang-minchao

Zhang Minchao (zhang-minchao)

Pinned

jd-opensource/xllm Public

A high-performance inference engine for LLMs, optimized for diverse AI accelerators.

C++ · 701 stars · 77 forks

xllm Public

Forked from jd-opensource/xllm

A high-performance inference engine for LLMs, optimized for diverse AI accelerators.

C++

vllm Public

Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python

cutlass Public

Forked from NVIDIA/cutlass

CUDA Templates for Linear Algebra Subroutines

C++

flashinfer Public

Forked from flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda