[tests] Update lm_eval VL tests to qwen 3 #1953

brian-dellabetta · 2025-10-20T18:26:10Z

SUMMARY:
Upgrade the lm_eval vision language tests from Qwen 2.5 to Qwen 3. After updating the tests to include apply_chat_template, the scores closely align with those achieved with Qwen 2.5.

Prerequisites

* [Deps] pin cmarkgfm until build issue resolved #1952

TEST PLAN:
The 3 lm_eval VL tests were run, and the accuracies were updated. The dense model achieves 0.833; all compressed models achieve 0.833 or 0.800.

github-actions · 2025-10-20T18:26:19Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

dsikka

Why not just use mmmu_val instead of the literature task? This gives us around 0.53 for the dense model?

brian-dellabetta · 2025-10-20T19:24:56Z

Why not just use mmmu_val instead of the literature task? This gives us around 0.53 for the dense model?

mmmu_val is 900 evals total instead of 30. that would add probably ~40 minutes to each lm-eval run, and we run two for each config, so total test time would increase over 3 hours with that change
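The runtime estimate above can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it takes the three figures quoted in this thread at face value (3 VL test configs, 2 lm-eval runs per config, roughly 40 extra minutes per run), and the function name `added_test_hours` is hypothetical, not part of the test suite.

```python
# Figures quoted in this thread; the 40-minute value is itself a rough estimate.
NUM_CONFIGS = 3              # "The 3 lm_eval VL tests"
RUNS_PER_CONFIG = 2          # "we run two for each config"
EXTRA_MINUTES_PER_RUN = 40   # "~40 minutes to each lm-eval run"


def added_test_hours(configs: int, runs_per_config: int, extra_minutes_per_run: float) -> float:
    """Return the total extra wall-clock time in hours, assuming runs execute serially."""
    return configs * runs_per_config * extra_minutes_per_run / 60


extra_hours = added_test_hours(NUM_CONFIGS, RUNS_PER_CONFIG, EXTRA_MINUTES_PER_RUN)
print(f"Estimated extra test time: {extra_hours:.1f} hours")
```

Under these assumptions the increase is about 4 hours, which is consistent with the "over 3 hours" figure.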

dsikka · 2025-10-20T20:19:38Z

Why not use mmmu_val

Why not just use mmmu_val instead of the literature task? This gives us around 0.53 for the dense model?

mmmu_val is 900 evals total instead of 30. that would add probably ~40 minutes to each lm-eval run, and we run two for each config, so total test time would increase over 3 hours with that change

The 30 datapoints have proven to be very noisy historically. A happy medium might be better, but we should also just validate the runtime for batch size of 100.
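To put a rough number on the noise concern, the sketch below computes the binomial standard error of an accuracy score. It assumes each eval item is an independent pass/fail trial and that the reported 0.833 and 0.800 correspond to 25/30 and 24/30 correct; the 100-item case is a hypothetical in-between subset size, not something measured in this PR.

```python
import math


def binomial_standard_error(accuracy: float, n: int) -> float:
    """Standard error of an accuracy estimated from n independent pass/fail items."""
    return math.sqrt(accuracy * (1 - accuracy) / n)


dense_accuracy = 25 / 30          # assumed to be the reported 0.833
one_item_step = 1 / 30            # accuracy change from a single item flipping

se_30 = binomial_standard_error(dense_accuracy, 30)
se_100 = binomial_standard_error(dense_accuracy, 100)  # hypothetical larger subset

print(f"one-item step at n=30: {one_item_step:.3f}")
print(f"standard error at n=30:  {se_30:.3f}")
print(f"standard error at n=100: {se_100:.3f}")
```

With 30 items, the gap between 0.833 and 0.800 is a single item, and it is smaller than one standard error, so differences of that size cannot separate the dense model from the compressed ones.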


brian-dellabetta requested review from dsikka, kylesayrs and rahul-tuli October 20, 2025 18:26

dsikka requested changes Oct 20, 2025


qwen 3 vl with apply_chat_template true

57e50b1

Signed-off-by: Brian Dellabetta <[email protected]>

brian-dellabetta force-pushed the bdellabe/qwen3-vl-lmeval branch from ea00c16 to 57e50b1 Compare October 21, 2025 21:57

chartqa p1

1e01353

Signed-off-by: Brian Dellabetta <[email protected]>
