
Commit 04508a6

KepingYan and kepingyan authored
Fix COMPILATION issue in serving phase (#935)
In the serving phase, even if a request hits a warmup shape, a COMPILATION warning is still emitted. ![image](https://github.com/user-attachments/assets/64e3686f-4f62-41ec-a50d-c196a7180b57) After debugging, I found that the serving phase broadcasts data in prepare_input(), while warmup does not, so this op is added to warmup to avoid graph compilation in the serving phase.

Co-authored-by: kepingyan <[email protected]>
1 parent c8d04be commit 04508a6

1 file changed: +6, -1 lines

vllm/worker/hpu_model_runner.py

@@ -1793,8 +1793,13 @@ def warmup_scenario(self,
         if is_pt_profiler_run and self.is_driver_worker:
             profiler = setup_profiler()
             profiler.start()
-        for _ in range(times):
+        for time_index in range(times):
             inputs = self.prepare_model_input(seqs)
+            if time_index == 0:
+                if self.is_driver_worker:
+                    broadcast_tensor_dict({"input_tokens": inputs.input_tokens}, src=0)
+                else:
+                    broadcast_tensor_dict(src=0)
             is_single_step = \
                 self.vllm_config.scheduler_config.num_scheduler_steps == 1
             if is_prompt or is_single_step:
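For context, the change mirrors the driver/worker hand-off that prepare_input() performs at serving time: the driver (rank 0) broadcasts a tensor dict once, and every other worker joins the same collective with no payload. The sketch below is not vLLM code; it illustrates that pattern with plain torch.distributed on the gloo backend. The world size, port, and dummy "input_tokens" tensor are illustrative assumptions, standing in for what vLLM's broadcast_tensor_dict exchanges.

# Minimal sketch (not vLLM code) of the driver/worker broadcast pattern the patch
# replays during warmup. Rank 0 plays the driver and sends a small tensor dict;
# the other rank participates in the same collective and receives it.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def warmup_like_step(rank: int, world_size: int) -> None:
    # Process-group setup is an assumption for this standalone example.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    if rank == 0:
        # Driver: broadcast the same kind of payload the serving path sends,
        # so warmup exercises the op and serving does not trigger a new graph.
        payload = [{"input_tokens": torch.arange(8)}]
        dist.broadcast_object_list(payload, src=0)
    else:
        # Non-driver worker: join the collective and receive the dict.
        payload = [None]
        dist.broadcast_object_list(payload, src=0)
        print(f"rank {rank} received:", payload[0]["input_tokens"].tolist())

    dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(warmup_like_step, args=(2,), nprocs=2)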
