xLLM-service is a service-layer framework built on top of the xLLM inference engine, providing efficient, fault-tolerant, and flexible LLM inference services for clustered deployments.

xLLM-service aims to address key challenges in enterprise-level serving scenarios:
- Ensuring the SLA of online services while improving resource utilization of offline tasks in hybrid online-offline deployments.
- Reacting to changing request loads in real-world workloads, such as fluctuations in input/output lengths.
- Resolving performance bottlenecks for multimodal model requests.
- Ensuring high reliability of computing instances.
With management of computing resource pools, intelligent scheduling and preemption of hybrid requests, and real-time monitoring of computing instances, xLLM-service achieves the following key features:
- Unified scheduling of online and offline requests, with preemptive execution for online requests and best-effort execution for offline requests.
- Adaptive dynamic allocation of the PD ratio, supporting efficient switching of instance PD roles.
- EPD three-stage disaggregation for multimodal requests, with intelligent resource allocation across stages.
- Fault-tolerant architecture: fast detection of instance errors and automatic rescheduling of interrupted requests.
```
├── xllm-service/        # main source folder
│   ├── chat_template/
│   ├── common/
│   ├── examples/
│   ├── http_service/
│   ├── rpc_service/
│   ├── tokenizers/
│   └── master.cpp
```
Clone the repository and its submodules:

```shell
git clone git@github.com:xllm-ai/xllm_service.git
cd xllm_service
git submodule init
git submodule update
```
Compile vcpkg, then set the environment variable:

```shell
export VCPKG_ROOT=/export/home/xxx/vcpkg-src
```

Compile xllm-service:

```shell
mkdir -p build && cd build
cmake .. && make -j 8
```
There are several ways you can contribute to xLLM:
- Reporting Issues (Bugs & Errors)
- Suggesting Enhancements
- Improving Documentation
  - Fork the repository
  - Add your changes to the documentation
  - Submit your pull request
- Writing Code
  - Fork the repository
  - Create a new branch
  - Add your feature or improvement
  - Submit your pull request
We appreciate all kinds of contributions! 🎉🎉🎉 If you have questions about development, please check our documentation.

If you encounter any issues along the way, you are welcome to submit reproducible steps and log snippets in the project's Issues area, or to contact the xLLM Core team directly via your internal Slack.

Feel free to contact us:
Thanks to all the following developers who have contributed to xLLM.