[serve][doc] update serve vllm openai example for latest vllm version #50192
Conversation
Redoing #50047, but without pinging like 30 people.
@@ -53,6 +55,7 @@ def __init__(
self.prompt_adapters = prompt_adapters
self.request_logger = request_logger
self.chat_template = chat_template
print(f"{ray.util.get_current_placement_group()=}")
Let's remove this debug statement as well as import ray :)
self.engine,
model_config,
served_model_names,
self.response_role,
[BaseModelPath(name=self.engine_args.model, model_path="./")],
I ran into the same thing yesterday and fixed it by setting `name=self.engine_args.model, model_path=self.engine_args.model`. Curious if you know what the difference is and what `model_path` is supposed to be used for :)
Good point, yeah. I wasn't sure before, so I was just setting it to "./". Updated to do `model_path=self.engine_args.model`, which seems to be what they're using in the internal vLLM tests (i.e. here). `model_path` doesn't seem to be used in this class for anything except here, maybe just as a backup for when `served_model_name` is specified separately from `model`, so both are accessible.
I think `model_path` is actually where the model is loaded from, and `name` is what `model=` in the query API gets mapped to. Does that sound right? If so, I think you're doing the right thing by making them the same in most cases (and the user could make them different if they e.g. want to expose a model stored locally or on S3 under a user-recognizable name) :)
I see, yeah, that makes sense!
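For readers following along, here is a minimal sketch of the `name` vs. `model_path` distinction discussed above. This is not code from the PR: the model id and paths are hypothetical, and the `BaseModelPath` import location has moved between vLLM versions, so adjust it for the version you have installed.

```python
# Sketch only: illustrates name vs. model_path as discussed in this thread.
# Import path may differ across vLLM versions.
from vllm.entrypoints.openai.serving_engine import BaseModelPath

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical model id

# Common case (what the PR does): expose the model under the same id it is
# loaded from, so clients pass model=model_id in their OpenAI API requests.
base_model_paths = [BaseModelPath(name=model_id, model_path=model_id)]

# Alternative: load weights from a local (or S3-synced) directory but expose a
# friendlier, user-recognizable name to clients.
aliased_model_paths = [
    BaseModelPath(name="llama-3.1-8b", model_path="/mnt/models/llama-3.1-8b")
]
```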
@@ -190,4 +192,4 @@ def build_app(cli_args: Dict[str, str]) -> serve.Application:
for chat in chat_completion:
    if chat.choices[0].delta.content is not None:
        print(chat.choices[0].delta.content, end="")
# __query_example_end__
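For context, a self-contained version of the streaming query loop in the snippet above could look like the following. The base URL, API key, and model name are placeholders for whatever the Serve deployment actually exposes, not values taken from the PR.

```python
# Sketch of the streaming query pattern shown in the diff above, assuming a
# Ray Serve deployment exposing an OpenAI-compatible endpoint at
# http://localhost:8000/v1 (URL, key, and model name are placeholders).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake-key")

chat_completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # must match the served model name
    messages=[{"role": "user", "content": "Tell me a one-sentence joke."}],
    stream=True,
)

# Print tokens to stdout as they arrive from the stream.
for chat in chat_completion:
    if chat.choices[0].delta.content is not None:
        print(chat.choices[0].delta.content, end="")
```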
Add a newline at the end.
Very nice, thanks for fixing this ❤️
You need to fix some linting errors before this can be merged: https://buildkite.com/ray-project/microcheck/builds/10708#0194cd42-7138-4c3c-9795-04424c32c7f9
I tested it out and this is working now. Can you also add a test going forward (in a separate PR)? I believe @aslonnie and @comaniac have been working on getting the required dependencies in, so you should be able to set up a test now (cc @akshay-anyscale).
Why are these changes needed?
https://docs.ray.io/en/latest/serve/tutorials/vllm-example.html currently doesn't work out of the box with the latest vLLM versions.
Related issue number
N/A
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.