Support batch processing for openai api compatible requests #659


Open
wants to merge 2 commits into main

Conversation

ravi03071991 (Contributor) commented Apr 29, 2025

PR to support batch processing with OpenAI API compatible requests.

Currently, requests are processed with batch_size=1; this PR makes it possible to compute metrics with different batch sizes.
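
A minimal illustrative sketch of the idea (the helper name, batch_size default, and model name below are assumptions for illustration, not necessarily what this PR implements): requests are grouped into chunks and each chunk is dispatched concurrently through the regular chat-completions endpoint, so any OpenAI-compatible server can serve it.

from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI()  # or OpenAI(base_url=..., api_key=...) for a self-hosted server

def dispatch_in_batches(all_messages, batch_size=8, model="gpt-4o-mini"):
    # Hypothetical helper: send chat-completion requests in groups of `batch_size`
    # instead of one at a time (batch_size=1), which is the current behavior.
    results = []
    for start in range(0, len(all_messages), batch_size):
        chunk = all_messages[start:start + batch_size]
        with ThreadPoolExecutor(max_workers=len(chunk)) as pool:
            futures = [
                pool.submit(client.chat.completions.create, model=model, messages=m)
                for m in chunk
            ]
            results.extend(f.result().choices[0].message.content for f in futures)
    return results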

mickqian commented May 6, 2025

@Luodian @kcz358 Could you please take a look? Thanks!

kcz358 (Collaborator) left a comment

Hi @mickqian @ravi03071991, thank you for your contribution. Do you think it would be more appropriate to put the changes in this file

https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/lmms_eval/models/batch_gpt4.py

instead of the open_compatible.py one? When we use that file, we may be testing a self-hosted server such as vLLM, SGLang, or any OpenAI-compatible server that does not necessarily implement the Batch API.

ravi03071991 (Contributor, Author)

Hi @mickqian @ravi03071991, thank you for your contribution. Do you think it would be more appropriate to put the changes in this file

https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/lmms_eval/models/batch_gpt4.py

instead of the open_compatible.py one? When we use that file, we may be testing a self-hosted server such as vLLM, SGLang, or any OpenAI-compatible server that does not necessarily implement the Batch API.

Thanks @kcz358. Are you suggesting that we create a new model file called batch_openai.py?

Alternatively, we could update the existing srt_api.py model to support batch API requests, since we're specifically testing it with SGLang, and SGLang supports batch requests through the OpenAI API client.

ravi03071991 (Contributor, Author)

Also, @kcz358, the OpenAI client supports batch requests, so any self-hosted serving solution (like vLLM or SGLang) that is OpenAI-compatible should support batch requests by default.

kcz358 (Collaborator) commented May 7, 2025

I think vLLM lacks the v1/files endpoint needed to run:

batch_input_file = client.files.create(
    file=open("batchinput.jsonl", "rb"),
    purpose="batch"
)

I investigated this a while ago, and I just tried again with vLLM and found that it still does not work. I remember that this works with SGLang because they implement that endpoint.

So I believe the best way is still to change the code in lmms_eval/models/batch_gpt4.py to keep the two cases separate, since it actually points to a different endpoint.
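
For context, here is a minimal sketch of the Batch API workflow being discussed, using the OpenAI Python client (the file name, base_url, and polling interval are illustrative; a self-hosted server must expose /v1/files and /v1/batches for this to work):

import time

from openai import OpenAI

# For a self-hosted server, point base_url at it,
# e.g. OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
client = OpenAI()

# 1. Upload the JSONL file of requests to /v1/files -- the endpoint vLLM lacks.
batch_input_file = client.files.create(
    file=open("batchinput.jsonl", "rb"),
    purpose="batch",
)

# 2. Create the batch job targeting /v1/chat/completions.
batch = client.batches.create(
    input_file_id=batch_input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3. Poll until the job finishes, then download the output file (one JSON line per request).
while batch.status not in ("completed", "failed", "cancelled", "expired"):
    time.sleep(10)
    batch = client.batches.retrieve(batch.id)

output_text = client.files.content(batch.output_file_id).text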

ravi03071991 (Contributor, Author)

I think vLLM lacks the v1/files endpoint needed to run:

batch_input_file = client.files.create(
    file=open("batchinput.jsonl", "rb"),
    purpose="batch"
)

I investigated this a while ago, and I just tried again with vLLM and found that it still does not work. I remember that this works with SGLang because they implement that endpoint.

So I believe the best way is still to change the code in lmms_eval/models/batch_gpt4.py to keep the two cases separate, since it actually points to a different endpoint.

Yeah, that makes sense. I just tested it and realized it’s not supported on their end. I’ll go ahead and update the code in lmms_eval/models/batch_gpt4.py accordingly. Thanks @kcz358.

ravi03071991 (Contributor, Author)

Hi @kcz358 ,

The default output format from the OpenAI Batch API seems quite different from the SGLang OpenAI client batch output.

You can check the OpenAI Batch output here.

SGLang OpenAI client batch output:

Response: {'status_code': 200, 'request_id': 'batch_f6ea6fef-e9fe-4de5-944d-92d25efef3d9-req_0', 'body': {'id': 'batch_f6ea6fef-e9fe-4de5-944d-92d25efef3d9-req_0', 'object': 'chat.completion', 'created': 1746752343, 'model': 'qwen/qwen2.5-0.5b-instruct', 'choices': {'index': 0, 'message': {'role': 'assistant', 'content': "Sure, here is a programming joke for you:\nWhy couldn't the code always stay happy when it ran?\nBecause it always had to wait for the programmer to give it a smiley face!", 'tool_calls': None, 'reasoning_content': None}, 'logprobs': None, 'finish_reason': 'stop', 'matched_stop': 151645}, 'usage': {'prompt_tokens': 35, 'completion_tokens': 40, 'total_tokens': 75}, 'system_fingerprint': None}}

To extract the result, we currently need to do something like:
item['response']['body']['choices']['message']['content']

Just wondering — would it make sense to update batch_gpt4.py to align with this format?

kcz358 (Collaborator) commented May 11, 2025

Hi @ravi03071991, do you think it is okay to do an if/else check here in batch_gpt4.py? I am not sure if this is the best option, though.
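
For illustration, such an if/else check could look roughly like this (parse_batch_line is a hypothetical helper name, not code from batch_gpt4.py):

def parse_batch_line(item: dict) -> str:
    # Extract the assistant message from one line of a batch output file.
    choices = item["response"]["body"]["choices"]
    if isinstance(choices, list):
        # OpenAI Batch API format: `choices` is a list of choice objects.
        return choices[0]["message"]["content"]
    # SGLang batch output as shown above: `choices` is a single dict.
    return choices["message"]["content"]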

ravi03071991 (Contributor, Author)

Hi @ravi03071991, do you think it is okay to do an if/else check here in batch_gpt4.py? I am not sure if this is the best option, though.

Yeah, I think so. I raised a PR on SGLang to fix it. Once it's merged, I will update this PR accordingly.

mickqian commented May 18, 2025

@ravi03071991 Hi Ravi, can you link your PR here?

ravi03071991 (Contributor, Author)

@ravi03071991 Hi Ravi, can you link your PR here?

PR - Fixes batch with single request error
PR - Fixes the batch response output format.
