Add vLLM Engine #83
Thank you for your attention. Yes, it was a mistake. I used custom images built with Dockerfiles during testing, because the docker images … For testing this PR, you have to:
Adding vLLM Engine (v0.7.3) with an OpenAI-compatible API.

Added routes:
The entrypoint for vLLM-OpenAI is http://127.0.0.1:8080/serve/openai/v1 (in its basic configuration). In accordance with the OpenAI API, the endpoint name does not change the entrypoint (as it does for the other engines); instead, the endpoint name equals the served_model_name that can be sent via an OpenAI API request.
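As an example of how the entrypoint can be queried, here is a minimal sketch using the official openai Python client; the model name "test_vllm" and the api_key value are placeholders, and the model name must match the served_model_name configured for the deployed endpoint:

```python
# Minimal sketch: sending a chat completion request to the vLLM-OpenAI entrypoint.
# "test_vllm" is a placeholder; it must match the served_model_name of the endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8080/serve/openai/v1",
    api_key="none",  # placeholder; adjust if the deployment requires authentication
)

response = client.chat.completions.create(
    model="test_vllm",
    messages=[{"role": "user", "content": "Hello! What can you do?"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```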
The vLLM engine can be configured via docker/docker-compose-gpu.yml (line 108, variable VLLM_ENGINE_ARGS); a configuration sketch follows after the list below. The model can be configured via the preprocess.py file. For a vLLM deployment example, see examples/vllm/readme.md.

Other small fixes:
- CLEARML_DEFAULT_SERVE_SUFFIX - to change the entrypoint suffix
- link to the docker-compose-gpu.yml file for instances that serve models on GPU (for the custom, custom_async and vllm engines)
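Regarding the VLLM_ENGINE_ARGS configuration mentioned above: the exact format is defined in docker/docker-compose-gpu.yml and the engine code. The sketch below is an illustration only; it assumes the variable carries a JSON-encoded dict of vLLM engine arguments and shows how such a value could be mapped onto vLLM's AsyncEngineArgs. The model path and argument names are placeholders, not necessarily what this PR implements.

```python
# Illustrative sketch only: one way a VLLM_ENGINE_ARGS environment variable could be
# translated into vLLM engine configuration. The JSON format and the model path are
# assumptions for this example, not necessarily the exact contract of this PR.
import json
import os

from vllm.engine.arg_utils import AsyncEngineArgs  # vLLM 0.7.x

# e.g. VLLM_ENGINE_ARGS='{"gpu_memory_utilization": 0.9, "max_model_len": 8192}'
engine_kwargs = json.loads(os.environ.get("VLLM_ENGINE_ARGS", "{}"))

engine_args = AsyncEngineArgs(
    model="/models/my-model",  # placeholder; normally comes from the deployed endpoint
    **engine_kwargs,
)
```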