Add vLLM Engine #83

Open · wants to merge 22 commits into main
Conversation

IlyaMescheryakov1402 (Contributor)

Adding vLLM Engine (v0.7.3) with OpenAI API support.

Added routes:

  • /v1/completions
  • /v1/chat/completions
  • /v1/models

The entrypoint for vLLM-OpenAI is http://127.0.0.1:8080/serve/openai/v1 (with the basic options). Unlike the other engines, the endpoint name does not change the entrypoint, in keeping with the OpenAI API; instead, the endpoint name equals the served_model_name, which can be sent via an OpenAI API request.
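As a quick illustration (not part of the PR itself), the new routes can be exercised with the official openai Python client; the model name below is hypothetical and has to match the endpoint's served_model_name:

```python
# Minimal sketch, assuming a local deployment with the default entrypoint
# from the PR description; "my-served-model" is a hypothetical
# served_model_name and must match the deployed endpoint's name.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8080/serve/openai/v1",
    api_key="-",  # assumption: no real key is needed for a local setup
)

# /v1/chat/completions
response = client.chat.completions.create(
    model="my-served-model",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

# /v1/models -- should list the served model name(s)
for model in client.models.list():
    print(model.id)
```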

The vLLM engine can be configured via docker/docker-compose-gpu.yml (line 108, variable VLLM_ENGINE_ARGS). The model can be configured via the preprocess.py file. For a vLLM deployment example, see examples/vllm/readme.md.
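For orientation, clearml-serving preprocessing scripts usually follow the skeleton below. This is a sketch of the standard convention, not the PR's actual file; the hooks the vLLM engine expects may differ, so see examples/vllm/readme.md for the real example:

```python
# Hypothetical preprocess.py skeleton following the usual clearml-serving
# convention; the vLLM engine in this PR may use different or additional hooks.
from typing import Any, Callable, Optional


class Preprocess(object):
    def __init__(self):
        # instantiated once when the endpoint is loaded
        pass

    def preprocess(
        self,
        body: dict,
        state: dict,
        collect_custom_statistics_fn: Optional[Callable[[dict], None]] = None,
    ) -> Any:
        # shape the incoming OpenAI-style request before it reaches the engine
        return body

    def postprocess(
        self,
        data: Any,
        state: dict,
        collect_custom_statistics_fn: Optional[Callable[[dict], None]] = None,
    ) -> dict:
        # shape the engine output into the response payload
        return data
```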

Other small fixes:

  1. Add CLEARML_DEFAULT_SERVE_SUFFIX to allow changing the entrypoint suffix and link
  2. Fix the search for the control-plane task in triton_helper (for the case where there is more than one serving instance)
  3. Add a docker-compose-gpu.yml file for instances that serve models on GPU (for the custom, custom_async, and vllm engines)

@InsertNamePls

@IlyaMescheryakov1402

Won't using image: clearml-serving-inference:latest instead of allegroai/clearml-serving-inference:latest create login issues?

@IlyaMescheryakov1402 (Contributor, Author)

Hi @InsertNamePls

Thank you for your attention

Yes, that was a mistake. During testing I used custom images built from the Dockerfiles, because the allegroai/clearml-serving-inference:latest images are not suitable for testing this PR (they install the latest released clearml-serving package, which does not have vllm as a registered engine).

For testing this PR, you have to:

  1. comment out this line and this line, then build your own images (both for serving and for statistics)
  2. change the images in the docker/docker-compose-gpu.yml file (lines 78 and 122, respectively) to the local ones built in the previous step
  3. run the compose file (e.g. docker compose -f docker/docker-compose-gpu.yml up)
