Add vLLM Engine #83

Open · wants to merge 22 commits into main
Conversation

IlyaMescheryakov1402 (Contributor)

Adding vLLM Engine (v0.7.3) with OpenAI API support.

Added routes:

  • /v1/completions
  • /v1/chat/completions
  • /v1/models

The entrypoint for vLLM-OpenAI is http://127.0.0.1:8080/serve/openai/v1 (with the basic options). Unlike the other engines, the endpoint name does not change the entrypoint, in keeping with the OpenAI API; instead, the endpoint name equals the served_model_name, which can be sent via an OpenAI API request.
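As a quick illustration (not part of the PR itself), the new routes can be exercised with the official openai Python client; the model name below is hypothetical and has to match the endpoint's served_model_name:

```python
# Minimal sketch, assuming a local deployment with the default entrypoint
# from the PR description; "my-served-model" is a hypothetical
# served_model_name and must match the deployed endpoint's name.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8080/serve/openai/v1",
    api_key="-",  # assumption: no real key is needed for a local setup
)

# /v1/chat/completions
response = client.chat.completions.create(
    model="my-served-model",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

# /v1/models -- should list the served model name(s)
for model in client.models.list():
    print(model.id)
```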

The vLLM engine can be configured via docker/docker-compose-gpu.yml (line 108, variable VLLM_ENGINE_ARGS). The model can be configured via the preprocess.py file. For a vLLM deployment example, see examples/vllm/readme.md.
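For orientation, clearml-serving preprocessing scripts usually follow the skeleton below. This is a sketch of the standard convention, not the PR's actual file; the hooks the vLLM engine expects may differ, so see examples/vllm/readme.md for the real example:

```python
# Hypothetical preprocess.py skeleton following the usual clearml-serving
# convention; the vLLM engine in this PR may use different or additional hooks.
from typing import Any, Callable, Optional


class Preprocess(object):
    def __init__(self):
        # instantiated once when the endpoint is loaded
        pass

    def preprocess(
        self,
        body: dict,
        state: dict,
        collect_custom_statistics_fn: Optional[Callable[[dict], None]] = None,
    ) -> Any:
        # shape the incoming OpenAI-style request before it reaches the engine
        return body

    def postprocess(
        self,
        data: Any,
        state: dict,
        collect_custom_statistics_fn: Optional[Callable[[dict], None]] = None,
    ) -> dict:
        # shape the engine output into the response payload
        return data
```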

Other small fixes:

  1. Add CLEARML_DEFAULT_SERVE_SUFFIX to allow changing the entrypoint suffix and link
  2. Fix the search for the control-plane task in triton_helper (for the case where there is more than one serving instance)
  3. Add a docker-compose-gpu.yml file for instances that serve models on GPU (for the custom, custom_async, and vllm engines)

@InsertNamePls

@IlyaMescheryakov1402

Won't using image: clearml-serving-inference:latest instead of allegroai/clearml-serving-inference:latest create login issues?

@IlyaMescheryakov1402 (Contributor, Author)

Hi @InsertNamePls

Thank you for your attention

Yes, that was a mistake. During testing I used custom images built from the Dockerfiles, because the allegroai/clearml-serving-inference:latest images are not suitable for testing this PR (they install the latest released clearml-serving package, which does not have vllm as a registered engine).

For testing this PR, you have to:

  1. comment out this line and this line, then build your own images (both for serving and for statistics)
  2. change the images in the docker/docker-compose-gpu.yml file (lines 78 and 122, respectively) to the local ones built in the previous step
  3. run the compose file (e.g. docker compose -f docker/docker-compose-gpu.yml up)
