Support batch processing for OpenAI API compatible requests #659
Conversation
Hi @mickqian @ravi03071991, thank you for your contribution. Do you think it would be more appropriate to put the changes in
https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/lmms_eval/models/batch_gpt4.py
instead of `open_compatible.py`? When we use that file, we may be testing a self-hosted server such as vLLM or SGLang, or any other OpenAI-compatible server, which may not necessarily implement the batch API.
Thanks @kcz358. Are you suggesting that we create a new model file for this? Alternatively, we could update the existing one.
Also, @kcz358, the OpenAI client supports batch requests, so any self-hosted, OpenAI-compatible serving solution such as vLLM or SGLang should support batch requests by default.
I think vLLM lacks the endpoint for

```python
batch_input_file = client.files.create(
    file=open("batchinput.jsonl", "rb"),
    purpose="batch"
)
```

I investigated this a while ago and just tried again with vLLM, and it still does not work. I remember this works with SGLang because they implement that endpoint. So I believe the best way is still to change the code in `batch_gpt4.py`.
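(For context, uploading the input file is only the first half of the batch flow. Below is a minimal sketch of the full submission, assuming the standard OpenAI Python client and a chat-completions batch; an OpenAI-compatible server would need to serve both the `/v1/files` and `/v1/batches` endpoints for this to work.)

```python
from openai import OpenAI

# Point base_url at a self-hosted, OpenAI-compatible server if needed.
client = OpenAI()

# Step 1: upload the JSONL file containing one request per line.
batch_input_file = client.files.create(
    file=open("batchinput.jsonl", "rb"),
    purpose="batch",
)

# Step 2: create the batch job against the chat completions endpoint.
batch = client.batches.create(
    input_file_id=batch_input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)
```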
Yeah, that makes sense. I just tested it and realized it's not supported on their end. I'll go ahead and update the code in `batch_gpt4.py`.
Hi @kcz358, the default output format of the OpenAI Batch API seems quite different from the batch output returned through the SGLang OpenAI client (see the OpenAI Batch docs for the expected format), so extracting the result currently requires format-specific handling on our side. Just wondering, would it make sense to update the SGLang batch output to match the OpenAI format instead?
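(For comparison, here is a minimal sketch of how results typically come back from the official OpenAI Batch API, assuming a chat-completions batch; the batch id is a placeholder, and the field names follow OpenAI's documented JSONL output format.)

```python
import json
from openai import OpenAI

client = OpenAI()

batch = client.batches.retrieve("batch_abc123")        # placeholder batch id
output = client.files.content(batch.output_file_id)    # JSONL, one result per line

for line in output.text.splitlines():
    record = json.loads(line)
    # Each line carries the original custom_id plus the full chat completion body.
    custom_id = record["custom_id"]
    answer = record["response"]["body"]["choices"][0]["message"]["content"]
    print(custom_id, answer)
```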
Hi @ravi03071991, do you think it is okay to do an if/else check there to handle the two output formats for now?
Yeah, I think so. I raised a PR on SGLang to fix it. Once it's merged, I will update this PR accordingly.
@ravi03071991 Hi Ravi, can you link your PR here?
PR: Fixes batch with single request error
PR to support batch processing with OpenAI API compatible requests.
Currently, requests are processed with batch_size=1; this PR makes it possible to compute metrics with larger batch sizes.
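(For readers unfamiliar with the batch format, each line of the input JSONL is one self-contained request. A minimal sketch is below; the file name, model name, and prompts are placeholders rather than values taken from this PR.)

```python
import json

# Placeholder prompts; in practice these would come from the evaluation task.
prompts = ["Describe the first image.", "Describe the second image."]

with open("batchinput.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",   # used to match results back to inputs
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",    # placeholder model name
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        f.write(json.dumps(request) + "\n")
```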