We are excited to announce the first official release of Unified Cache Manager.
Highlights
- Offload Prefix Cache to storage.
- Homogeneous and heterogeneous PD (prefill/decode) disaggregation.
- Training-free sparsity for accelerating inference (vllm==0.9.2, vllm-ascend==0.9.2rc1) in #199, #335, #190 and #451.
Core:
- Garbage collection for the store in #315 and #312
- Adapt to vllm and vllm-ascend in #13, #292, #415 and #362
- UCM supports online metrics display via Grafana and Prometheus in #414, with documentation in #416
Known Issues
If you are using the Ascend platform, please be aware that:
- Broadcast is not supported; set `load_only_first_rank: false` in the config.
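As a minimal sketch of guarding against this known issue at startup, assuming the connector settings have been loaded into a flat Python dict (the dict layout and the `check_ascend_config` helper are hypothetical and not part of UCM), a check could look like:

```python
def check_ascend_config(config: dict, platform: str) -> list:
    """Return warnings for settings known to be unsupported on Ascend.

    Hypothetical helper for illustration only; UCM does not ship this function,
    and the flat config layout is an assumption.
    """
    warnings = []
    # Require an explicit False: the default value of load_only_first_rank
    # is not stated in these notes, so a missing key is also flagged.
    if platform == "ascend" and config.get("load_only_first_rank") is not False:
        warnings.append(
            "Broadcast is not supported on Ascend; "
            "set load_only_first_rank: false in the config."
        )
    return warnings


if __name__ == "__main__":
    for msg in check_ascend_config({"load_only_first_rank": True}, "ascend"):
        print(msg)
```

Flagging a missing key, not only an explicit `true`, keeps the check conservative when the real default is unknown.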
Others
- Updated documentation
- Tools for performance tuning and hyperparameter optimization in #418
What's Changed
- [opt] Share Infra implementation and unify status codes by @mag1c-h in #399
- [bugfix] Fix ESA to be compatible with the latest NFSStore. by @wangwenxin0312 in #401
- release v0.1.0rc4 by @Lijiachen1018 in #402
- [opt] Remove unused cc impl of dramstore by @mag1c-h in #406
- [Fix]remove dram docs and modify quick-start doc by @hero0307 in #411
- [Feature] Added performance testing tool based on the PyTest testing framework by @Menglths in #295
- [Misc] Add cpp-linter.yml by @mag1c-h in #422
- [docs]add metrics doc by @hero0307 in #416
- [perf] Modify CUDA SIMD and add Triton hash encoder by @Clarence-1103 in #408
- [bugfix] batch trans on cuda with SM return 700 error by @mag1c-h in #434
- [Misc] set default logger backend to spdlog by @mag1c-h in #440
- [rebase]Dev-ucm-v1 rebase to develop by @Lijiachen1018 in #453
- [cleancode] remove dramstore by @Lijiachen1018 in #455
- Fix metrics by @Lijiachen1018 in #456
New Contributors
Full Changelog: v0.1.0rc4...v0.1.0