Conversation

@RitzChow

I've added the Seephys benchmark.

#payload["reasoning_effort"] = "medium"
payload["response_format"] = {"type": "text"}
payload["max_completion_tokens"] = gen_kwargs["max_new_tokens"]
payload["max_completion_tokens"] = 5000
Collaborator

This is a bit hardcoded: the 5000 immediately overrides the max_new_tokens value taken from gen_kwargs on the line above.
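A minimal sketch of the alternative, assuming a small helper (the function name and the 4096 fallback are hypothetical, not the repo's code):

def set_completion_limit(payload: dict, gen_kwargs: dict, default: int = 4096) -> dict:
    # Sketch only: respect the caller's max_new_tokens and fall back to a
    # default, instead of overwriting it with a hardcoded 5000.
    payload["max_completion_tokens"] = gen_kwargs.get("max_new_tokens", default)
    return payload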

del payload["temperature"]
payload.pop("max_tokens")
payload["reasoning_effort"] = "medium"
#payload["reasoning_effort"] = "medium"
Collaborator

I think we should probably add a control arg for reasoning effort in openai compatible instead of commenting it out directly.
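Something like the sketch below could work (the class name, the reasoning_effort arg name, and the defaults are assumptions for illustration, not the actual openai compatible implementation):

from typing import Optional


class OpenAICompatibleSketch:
    # Sketch only: expose reasoning effort as an optional model arg instead of
    # hardcoding or commenting it out in the payload-building code.
    def __init__(self, reasoning_effort: Optional[str] = None, **kwargs):
        # e.g. "low" / "medium" / "high"; None means the field is never sent
        self.reasoning_effort = reasoning_effort

    def build_payload(self, messages: list, gen_kwargs: dict) -> dict:
        payload = {
            "messages": messages,
            "max_completion_tokens": gen_kwargs.get("max_new_tokens", 4096),
        }
        if self.reasoning_effort is not None:
            payload["reasoning_effort"] = self.reasoning_effort
        return payload

That way the value could be passed through model args (e.g. reasoning_effort=medium) rather than edited in the source.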

Author

I apologize; this part is a temporary modification I made for testing purposes. You can ignore it and focus only on the newly added task.

Collaborator

@kcz358 left a comment

Hi, most of it LGTM. Just a few comments on the revision in openai compatible. Thanks for the contribution!
