
support multi-modal MFU estimation #100

@long8v

Thanks for open-sourcing this!

I found that the FLOPs estimation function used for all models is a text-only LLM estimate:
https://github.com/EvolvingLMMs-Lab/lmms-engine/blob/main/src/lmms_engine/models/utils.py#L61-L81

The official documentation (https://lmms-engine.readthedocs.io/en/latest/reference/mfu.html#qwen3-vl-8b-with-sequence-parallel) reports an MFU of around 0.2 to 0.25. I believe that figure does not come from the calculation above, but also includes the ViT forward FLOPs.
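To make the concern concrete, here is a rough sketch (not lmms-engine's implementation) of how leaving out the vision encoder changes the MFU number. It assumes the text-only estimate counts every token fed to the LLM (text plus merged visual tokens) but nothing for the ViT, and it uses the common ~6 FLOPs per parameter per token training approximation while ignoring attention-score FLOPs. All parameter counts, token counts, step time and the peak FLOP/s value are hypothetical.

```python
def train_flops(num_params: float, num_tokens: int, trainable: bool = True) -> float:
    """Rough FLOPs for processing `num_tokens` tokens through a dense transformer.

    Uses the common ~2*N FLOPs-per-token forward estimate. A trainable module
    also pays ~4*N for backward (~6*N total). Attention-score FLOPs are ignored.
    """
    flops_per_token = (6.0 if trainable else 2.0) * num_params
    return flops_per_token * num_tokens


def mfu(step_flops: float, step_time_s: float, num_gpus: int,
        peak_flops_per_gpu: float) -> float:
    """Model FLOPs utilization: achieved FLOP/s divided by peak FLOP/s."""
    return step_flops / (step_time_s * num_gpus * peak_flops_per_gpu)


# Hypothetical training step: every number below is an illustrative assumption.
LLM_PARAMS = 8e9        # language model parameters
VIT_PARAMS = 0.6e9      # vision encoder parameters
LLM_TOKENS = 16_384     # tokens fed to the LLM (text + merged visual tokens)
VIT_PATCHES = 48_000    # patches seen by the ViT before spatial merging
PEAK = 989e12           # assumed per-GPU dense BF16 peak FLOP/s
STEP_TIME = 2.0         # seconds per step on one GPU

llm_flops = train_flops(LLM_PARAMS, LLM_TOKENS)
vit_flops = train_flops(VIT_PARAMS, VIT_PATCHES)

mfu_text_only = mfu(llm_flops, STEP_TIME, 1, PEAK)
mfu_with_vit = mfu(llm_flops + vit_flops, STEP_TIME, 1, PEAK)
print(f"text-only MFU: {mfu_text_only:.3f}, with ViT: {mfu_with_vit:.3f}")
```

If the ViT is frozen, no gradient has to flow back through it, so `trainable=False` for the ViT term would be the closer estimate.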

I would also like to know which dataset was used for the MFU estimation. Qwen-series multimodal models (e.g., Qwen3-VL) in particular support native resolution, and the FPS setting for video data can change MFU substantially.
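As a rough illustration of why the data matters, the sketch below counts ViT patches and LLM visual tokens for one input. It assumes Qwen2-VL-style settings (14-pixel patches, 2x2 spatial merge, frames grouped two at a time); other Qwen versions may use different values, so the model's processor config is the authority. The function names and example inputs are hypothetical.

```python
import math


def vit_patches(height: int, width: int, num_frames: int = 1,
                patch_size: int = 14, temporal_patch: int = 2) -> int:
    """Patches produced by a Qwen2-VL-style ViT for one image or video clip.

    Assumes height and width are already resized to multiples of
    patch_size, and that frames are grouped `temporal_patch` at a time
    (a single image is padded up to one temporal group).
    """
    temporal_groups = math.ceil(num_frames / temporal_patch)
    return temporal_groups * (height // patch_size) * (width // patch_size)


def llm_visual_tokens(num_patches: int, spatial_merge: int = 2) -> int:
    """Tokens passed to the LLM after merging spatial_merge x spatial_merge patches."""
    return num_patches // (spatial_merge * spatial_merge)


# Same 448x448 frames, different sampling choices (all values hypothetical).
image = vit_patches(448, 448)
video_2fps = vit_patches(448, 448, num_frames=10 * 2)  # 10 s clip sampled at 2 FPS
video_4fps = vit_patches(448, 448, num_frames=10 * 4)  # 10 s clip sampled at 4 FPS
for name, patches in [("image", image), ("video@2fps", video_2fps), ("video@4fps", video_4fps)]:
    print(name, patches, llm_visual_tokens(patches))
```

Under these assumptions, doubling the FPS doubles the ViT patch count, and doubling both image sides quadruples it, so the ViT share of total FLOPs (and therefore the reported MFU) depends heavily on the resolution and FPS used in the benchmark data.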
