v0.5.0
New features
- Processing time is affected by server load
 - Change TTFT parameter to be based on number of request tokens
 - KV cache affects prefill time
 - Support failure injection
 - Implement kv-cache usage and waiting loras Prometheus metrics
 - Randomize response length based on a histogram when max-tokens is defined in the request
 - Support DP (data parallel)
 - Support /tokenize endpoint
 
What's Changed
- Fix server interrupt by @npolshakova in #161
 - Show final config in simulator default logger at Info level by @pancak3 in #154
 - Cast bounds type in tests to func def: latency, interToken, and timeToFirst (to int) by @pancak3 in #163
 - Remove unnecessary deferral of server close by @pancak3 in #162
 - Fix: Rand generator was not set in a test suite, which resulted in a nil pointer access at runtime when running only that test suite by @pancak3 in #166
 - Use channels for metrics updates, added metrics tests by @irar2 in #171
 - Remove rerun on comment action by @irar2 in #174
 - Add failure injection mode to simulator by @smarunich in #131
 - Add waiting loras list to loraInfo metrics by @mayabar in #175
 - feat: generate response length based on a histogram when max_tokens is defined in the request by @mayabar in #169
 - Extend response length bucket calculation to support buckets that are not necessarily equally sized by @mayabar in #176
 - Use dynamic ports in zmq tests by @pancak3 in #170
 - Change time-to-first-token parameter to be based on number of request tokens #137 by @pancak3 in #165
 - Bugfix: was accessing number of tokens from nil var; getting it from req instead by @pancak3 in #177
 - feat: add helm charts for Kubernetes deployment by @Blackoutta in #182
 - chore: Make the image smaller by @shmuelk in #183
 - Take cached prompt tokens into account in prefill time calculation by @irar2 in #184
 - Add ignore eos in request by @pancak3 in #187
 - Support DP by @irar2 in #188
 - Change RandomNorm from float types to int by @pancak3 in #190
 - KV cache usage metric by @irar2 in #192
 - Adjust request "processing time" to current load by @pancak3 in #189
 - Updates for the new release of kv-cache-manager by @irar2 in #194
 - DP bug fix: wait after starting rank 0 sim by @irar2 in #193
 - Support /tokenize endpoint by @irar2 in #198
 - add Service to expose vLLM deployment and update doc by @googs1025 in #201
 - Split simulator.go into several files by @irar2 in #199
 
New Contributors
- @smarunich made their first contribution in #131
 - @Blackoutta made their first contribution in #182
 - @googs1025 made their first contribution in #201
 
Full Changelog: v0.4.0...v0.5.0