Conversation

@quic-xiyushi
No description provided.

@quic-xiyushi quic-xiyushi force-pushed the on-device-sampling-vlm branch 2 times, most recently from af8e673 to df3501a Compare October 30, 2025 07:13
Contributor

@quic-hemagnih quic-hemagnih left a comment

Can you please add CI test cases?

@quic-xiyushi quic-xiyushi force-pushed the on-device-sampling-vlm branch from df3501a to d722a5a Compare November 10, 2025 17:22
Signed-off-by: quic-xiyushi <[email protected]>
@quic-xiyushi quic-xiyushi force-pushed the on-device-sampling-vlm branch from d722a5a to e06e175 Compare November 10, 2025 17:25
Signed-off-by: quic-sanising <[email protected]>
Signed-off-by: sanising <[email protected]>
Comment on lines +1606 to +1607
if kwargs.pop("qaic_config", None):
raise NotImplementedError("On-device sampling is not supported for single QPC multimodal models yet.")
Contributor

Isn't qaic_config used for spec decoding as well?
Are we not supporting spec decode for a single QPC?
Should we just error out, or check what's actually passed in the config?
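To make the reviewer's suggestion concrete, here is a minimal sketch of the "inspect the config instead of erroring out" approach. The key names (`include_sampler`, `speculative_config`) are assumptions for illustration, not the repository's actual qaic_config schema:

```python
def validate_qaic_config(qaic_config):
    """Reject only the unsupported features in qaic_config, not the whole dict.

    Hypothetical sketch: key names are assumed, not taken from the codebase.
    """
    if not qaic_config:
        return {}
    # On-device sampling is not yet supported for single-QPC multimodal models.
    unsupported = {"include_sampler"}
    requested = set(qaic_config) & unsupported
    if requested:
        raise NotImplementedError(
            f"Not supported for single QPC multimodal models yet: {sorted(requested)}"
        )
    # Anything else, e.g. spec-decode options, passes through untouched.
    return qaic_config
```

With this shape, a config that only carries spec-decode options would still be accepted, while one requesting on-device sampling raises the same NotImplementedError as the current blanket check.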

self.config = model.config
self.vision_model = QEffVisionEncoderForTextImageToTextModel(model, **kwargs)
self.lang_model = QEffCausalLMForTextImageToTextModel(model, **kwargs)
self.lang_model = QEffCausalLMForTextImageToTextModel(model, continuous_batching=continuous_batching, **kwargs)
Contributor

Replace lines 984-985 with the following:

        if kwargs.pop("full_batch_size", None):
            continuous_batching = True
            warnings.warn(
                "full_batch_size argument is deprecated. Use continuous_batching=True instead.", DeprecationWarning, 2
            )
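As a standalone illustration of the suggested shim, the snippet below reproduces the kwargs handling in a self-contained function. The function name is invented for the example; in the actual change this logic would live inline in the class method:

```python
import warnings


def resolve_continuous_batching(continuous_batching=False, **kwargs):
    """Map the deprecated full_batch_size kwarg onto continuous_batching.

    Hypothetical helper name; mirrors the suggested inline replacement above.
    """
    if kwargs.pop("full_batch_size", None):
        continuous_batching = True
        warnings.warn(
            "full_batch_size argument is deprecated. Use continuous_batching=True instead.",
            DeprecationWarning,
            2,
        )
    return continuous_batching, kwargs
```

Passing `full_batch_size=4` would then flip `continuous_batching` to True, emit a DeprecationWarning, and leave the remaining kwargs untouched.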

Comment on lines +785 to +790
def get_sampling_inputs_and_outputs(
self,
example_inputs: Dict[str, torch.Tensor],
output_names: List[str],
dynamic_axes: Dict[str, Dict[int, str]],
):
Contributor

How is this method different from QEFFAutoModelForCausalLM.get_sampling_inputs_and_outputs?
Can we combine them into a shared method in utils or spd_utils to remove the code duplication?
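One possible shape for that de-duplication is a free function in a shared module (spd_utils, say) that both classes call. Everything below is illustrative: the extra tensor names and the helper's exact signature are assumptions, and the real list of sampling inputs would come from the sampler specification:

```python
from typing import Dict, List, Tuple


def get_sampling_inputs_and_outputs(
    example_inputs: Dict[str, object],
    output_names: List[str],
    dynamic_axes: Dict[str, Dict[int, str]],
) -> Tuple[Dict[str, object], List[str], Dict[str, Dict[int, str]]]:
    """Extend export artifacts with the extra tensors on-device sampling needs.

    Hypothetical shared helper; tensor names here are placeholders.
    """
    example_inputs = dict(example_inputs)
    output_names = list(output_names)
    dynamic_axes = dict(dynamic_axes)
    # Example sampling parameter, batched along axis 0.
    example_inputs["temperatures"] = [1.0]
    dynamic_axes["temperatures"] = {0: "batch_size"}
    output_names.append("next_tokens")
    return example_inputs, output_names, dynamic_axes
```

Both QEFFAutoModelForCausalLM and the multimodal class could then delegate to this single helper, keeping any model-specific differences (if there are any) as small wrappers around it.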

Contributor

Can you add a test for the Intern model, i.e. a VLM in dual-QPC mode?

QEffGPTJForCausalLM,
QEffGraniteForCausalLM,
QEffGraniteMoeForCausalLM,
QEffInternDecoderWrapper,
Contributor

Does this mean we are enabling sampling only for the Intern model?
Will other VLMs also be supported?

@ochougul ochougul added the enhancement New feature or request label Nov 12, 2025