Extend on-device sampling support for dual QPC VLMs #597
base: main
Conversation
Force-pushed from af8e673 to df3501a
quic-hemagnih left a comment
Can you please add the CI test cases?
Signed-off-by: quic-xiyushi <[email protected]>
Force-pushed from df3501a to d722a5a
Signed-off-by: quic-xiyushi <[email protected]>
Force-pushed from d722a5a to e06e175
Signed-off-by: quic-sanising <[email protected]>
Signed-off-by: sanising <[email protected]>
Force-pushed from 900aee5 to 3e242ce
Signed-off-by: quic-xiyushi <[email protected]>
Signed-off-by: sanising <[email protected]>
if kwargs.pop("qaic_config", None):
    raise NotImplementedError("On-device sampling is not supported for single QPC multimodal models yet.")
Isn't qaic_config used for spec decoding as well?
Are we not supporting spec decode for single QPC?
Should we just error out, or check what's actually passed in the config?
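A minimal sketch of the "check what's passed" option, assuming qaic_config is a dict and that a key such as include_sampler is what gates on-device sampling (the key name and handling below are illustrative assumptions, not the repository's verified schema):

```python
# Sketch only: "include_sampler" is assumed to be the qaic_config key that
# enables on-device sampling; other options (e.g. speculative decoding) are
# forwarded instead of being rejected wholesale.
qaic_config = kwargs.pop("qaic_config", None)
if qaic_config is not None:
    if qaic_config.get("include_sampler"):
        raise NotImplementedError(
            "On-device sampling is not supported for single QPC multimodal models yet."
        )
    # Keep the remaining config so features other than sampling still work.
    kwargs["qaic_config"] = qaic_config
```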
  self.config = model.config
  self.vision_model = QEffVisionEncoderForTextImageToTextModel(model, **kwargs)
- self.lang_model = QEffCausalLMForTextImageToTextModel(model, **kwargs)
+ self.lang_model = QEffCausalLMForTextImageToTextModel(model, continuous_batching=continuous_batching, **kwargs)
Replace lines 984-985 with the following:

if kwargs.pop("full_batch_size", None):
    continuous_batching = True
    warnings.warn(
        "full_batch_size argument is deprecated. Use continuous_batching=True instead.", DeprecationWarning, 2
    )
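For context, this is the caller-side migration the deprecation implies. A hedged example, assuming from_pretrained forwards these keyword arguments as the diff above suggests, and using a placeholder checkpoint ID:

```python
from QEfficient import QEFFAutoModelForImageTextToText

model_id = "OpenGVLab/InternVL2_5-1B"  # placeholder checkpoint for illustration

# Deprecated style: would now trigger the DeprecationWarning suggested above.
model = QEFFAutoModelForImageTextToText.from_pretrained(model_id, kv_offload=True, full_batch_size=4)

# Preferred style.
model = QEFFAutoModelForImageTextToText.from_pretrained(model_id, kv_offload=True, continuous_batching=True)
```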
def get_sampling_inputs_and_outputs(
    self,
    example_inputs: Dict[str, torch.Tensor],
    output_names: List[str],
    dynamic_axes: Dict[str, Dict[int, str]],
):
How is this method different from QEFFAutoModelForCausalLM.get_sampling_inputs_and_outputs?
Can we combine these and create a shared method in utils or spd_utils to remove the code duplication?
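One possible shape for such a shared helper. This is a sketch only: the module path, function name, parameter list, and tensor/output names are illustrative assumptions, not the existing API, and the real helper would add the full set of sampling parameters rather than the single tensor shown here.

```python
# Hypothetical shared utility, e.g. QEfficient/utils/sampler_utils.py.
# Both QEFFAutoModelForCausalLM and the dual-QPC VLM language model could call
# this instead of duplicating the logic in each class.
from typing import Dict, List, Tuple

import torch


def extend_with_sampling_inputs_and_outputs(
    example_inputs: Dict[str, torch.Tensor],
    output_names: List[str],
    dynamic_axes: Dict[str, Dict[int, str]],
    batch_size: int = 1,
) -> Tuple[Dict[str, torch.Tensor], List[str], Dict[str, Dict[int, str]]]:
    """Extend the ONNX export spec with tensors required by on-device sampling."""
    # Illustrative example of one added input; the real helper would also add
    # temperatures, top_ks, top_ps, etc.
    example_inputs["last_accepted_output_tokens"] = torch.zeros((batch_size, 1), dtype=torch.int64)
    dynamic_axes["last_accepted_output_tokens"] = {0: "batch_size", 1: "num_logits_to_keep"}
    output_names.append("next_tokens")
    return example_inputs, output_names, dynamic_axes
```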
Can you add a test for the Intern model, i.e., a VLM in dual QPC mode?
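A rough sketch of what such a CI case might look like; the model ID, pytest marker, qaic_config keys, and compile arguments are assumptions and would need to follow the repo's existing VLM test fixtures:

```python
# Hypothetical test; identifiers below are placeholders, not verified fixtures.
import pytest

from QEfficient import QEFFAutoModelForImageTextToText


@pytest.mark.on_qaic
def test_vlm_dual_qpc_on_device_sampling():
    model = QEFFAutoModelForImageTextToText.from_pretrained(
        "OpenGVLab/InternVL2_5-1B",  # assumed Intern checkpoint
        kv_offload=True,             # dual QPC path
        qaic_config={"include_sampler": True, "return_pdfs": False},
    )
    model.compile(num_cores=16, num_devices=1)
    # A short generate() call with sampling parameters would go here; the exact
    # processor and inputs depend on the existing Intern test utilities.
```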
QEffGPTJForCausalLM,
QEffGraniteForCausalLM,
QEffGraniteMoeForCausalLM,
QEffInternDecoderWrapper,
Does this mean we are enabling sampling only for the Intern model?
Will other VLMs also be supported?
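If the list above is the sampler transform's module registry (an assumption on my part), extending coverage beyond Intern would presumably just mean registering the other VLM language-decoder wrappers there, along these lines:

```python
# Illustrative only: the container name and the commented-out class names are
# assumptions; this PR itself only adds QEffInternDecoderWrapper.
_module_mapping = {
    QEffGPTJForCausalLM,
    QEffGraniteForCausalLM,
    QEffGraniteMoeForCausalLM,
    QEffInternDecoderWrapper,
    # QEffLlavaDecoderWrapper,   # hypothetical future VLM entry
    # QEffMllamaDecoderWrapper,  # hypothetical future VLM entry
}
```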
No description provided.