support multi-modal MFU estimation

Thanks for open-sourcing this!

I found that the FLOPs estimation function used for every model is a text-only LLM estimate:
https://github.com/EvolvingLMMs-Lab/lmms-engine/blob/main/src/lmms_engine/models/utils.py#L61-L81

The official documentation (https://lmms-engine.readthedocs.io/en/latest/reference/mfu.html#qwen3-vl-8b-with-sequence-parallel) reports an MFU of around 0.2~0.25. I believe that figure was not produced by the text-only calculation above, but also includes the ViT forward FLOPs.
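To make the gap concrete, here is a rough sketch of how a text-only estimate and an estimate that also counts the vision tower could differ. This is not the code in `utils.py`: `transformer_flops`, `mfu`, and every config number below are hypothetical placeholders, and the formula is the usual dense-transformer approximation (it ignores embeddings, the LM head, GQA, causal masking, and windowed attention).

```python
def transformer_flops(num_layers, hidden, intermediate, seq_len,
                      gated_mlp=True, backward=True):
    """Approximate FLOPs for one sequence through a dense transformer stack.

    Assumes plain multi-head attention (Q, K, V, O projections of size
    hidden x hidden) and counts 2 FLOPs per multiply-accumulate.
    """
    attn_params = 4 * hidden * hidden
    # Gated MLPs (SwiGLU-style) have three weight matrices, plain MLPs two.
    mlp_params = (3 if gated_mlp else 2) * hidden * intermediate
    linear_fwd = 2 * (attn_params + mlp_params) * seq_len
    # QK^T and attention-weighted V: two matmuls of seq_len^2 * hidden MACs.
    attn_fwd = 4 * seq_len * seq_len * hidden
    forward = num_layers * (linear_fwd + attn_fwd)
    # Backward is commonly approximated as twice the forward cost.
    return 3 * forward if backward else forward


def mfu(flops_per_step, step_seconds, peak_flops_per_device, num_devices=1):
    """Model FLOPs utilization: achieved FLOP/s divided by peak FLOP/s."""
    return flops_per_step / (step_seconds * peak_flops_per_device * num_devices)


# Hypothetical configs, NOT the real Qwen3-VL values.
llm_flops = transformer_flops(num_layers=36, hidden=4096,
                              intermediate=12288, seq_len=8192)
# Vision tower counted forward-only here, matching "ViT forward FLOPs".
vit_flops = transformer_flops(num_layers=24, hidden=1024,
                              intermediate=4096, seq_len=16384,
                              gated_mlp=False, backward=False)

text_only = mfu(llm_flops, step_seconds=2.0, peak_flops_per_device=1e15)
with_vit = mfu(llm_flops + vit_flops, step_seconds=2.0,
               peak_flops_per_device=1e15)
print(f"text-only MFU: {text_only:.4f}, with ViT: {with_vit:.4f}")
```

With the same measured step time, the two estimates differ by exactly the ViT share of the FLOPs, so it matters which one the documented numbers use.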

I would also like to know which dataset was used for the MFU estimation. The Qwen multimodal series (e.g., Qwen3-VL) supports native resolution, so the image sizes in the dataset, as well as the FPS setting of a video dataset, would change the MFU substantially.
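As an illustration of why the dataset matters, here is a small sketch of how the visual token count scales with resolution and FPS under a patch-based vision encoder. The helper names and the `patch_size`, `merge_size`, and `temporal_patch` values are assumptions chosen for the example, not values read from the Qwen3-VL config or from lmms-engine.

```python
import math


def image_tokens(height, width, patch_size, merge_size=1):
    """Return (vit_tokens, llm_tokens) for one image at native resolution.

    vit_tokens: patches seen by the vision encoder.
    llm_tokens: tokens passed to the LLM after merging merge_size x merge_size
    neighbouring patches into one.
    """
    patches_h = math.ceil(height / patch_size)
    patches_w = math.ceil(width / patch_size)
    vit = patches_h * patches_w
    llm = math.ceil(patches_h / merge_size) * math.ceil(patches_w / merge_size)
    return vit, llm


def video_tokens(duration_s, fps, height, width, patch_size,
                 merge_size=1, temporal_patch=1):
    """Return (vit_tokens, llm_tokens) for a video sampled at `fps`."""
    frames = max(1, int(duration_s * fps))
    # Consecutive frames are grouped temporal_patch at a time.
    groups = math.ceil(frames / temporal_patch)
    vit, llm = image_tokens(height, width, patch_size, merge_size)
    return groups * vit, groups * llm


# Hypothetical settings: the same 10 s clip sampled at two frame rates.
for fps in (2, 4):
    vit, llm = video_tokens(duration_s=10, fps=fps, height=224, width=224,
                            patch_size=16, merge_size=2, temporal_patch=2)
    print(f"fps={fps}: vit_tokens={vit}, llm_tokens={llm}")
```

In this sketch, doubling the FPS doubles both the ViT and LLM token counts, and attention cost grows faster than linearly in sequence length, so the reported MFU is hard to interpret without knowing the resolution and FPS distribution of the benchmark data.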



support multi-modal MFU estimation #100
