
Commit 04508a6

KepingYan and kepingyan authored
Fix COMPILATION issue in serving phase (#935)
In the serving phase, even if a request hits a warmup shape, a COMPILATION warning is still emitted. ![image](https://github.com/user-attachments/assets/64e3686f-4f62-41ec-a50d-c196a7180b57) After debugging, I found that the serving phase broadcasts data in prepare_input(), while warmup does not, so this op is added to warmup to avoid graph compilation in the serving phase.

Co-authored-by: kepingyan <[email protected]>
1 parent c8d04be commit 04508a6

1 file changed: +6, -1 lines

vllm/worker/hpu_model_runner.py

@@ -1793,8 +1793,13 @@ def warmup_scenario(self,
         if is_pt_profiler_run and self.is_driver_worker:
             profiler = setup_profiler()
             profiler.start()
-        for _ in range(times):
+        for time_index in range(times):
             inputs = self.prepare_model_input(seqs)
+            if time_index == 0:
+                if self.is_driver_worker:
+                    broadcast_tensor_dict({"input_tokens": inputs.input_tokens}, src=0)
+                else:
+                    broadcast_tensor_dict(src=0)
             is_single_step = \
                 self.vllm_config.scheduler_config.num_scheduler_steps == 1
             if is_prompt or is_single_step:
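For context, the change mirrors the driver/worker hand-off that prepare_input() performs at serving time: the driver (rank 0) broadcasts a tensor dict once, and every other worker joins the same collective with no payload. The sketch below is not vLLM code; it illustrates that pattern with plain torch.distributed on the gloo backend. The world size, port, and dummy "input_tokens" tensor are illustrative assumptions, standing in for what vLLM's broadcast_tensor_dict exchanges.

# Minimal sketch (not vLLM code) of the driver/worker broadcast pattern the patch
# replays during warmup. Rank 0 plays the driver and sends a small tensor dict;
# the other rank participates in the same collective and receives it.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def warmup_like_step(rank: int, world_size: int) -> None:
    # Process-group setup is an assumption for this standalone example.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    if rank == 0:
        # Driver: broadcast the same kind of payload the serving path sends,
        # so warmup exercises the op and serving does not trigger a new graph.
        payload = [{"input_tokens": torch.arange(8)}]
        dist.broadcast_object_list(payload, src=0)
    else:
        # Non-driver worker: join the collective and receive the dict.
        payload = [None]
        dist.broadcast_object_list(payload, src=0)
        print(f"rank {rank} received:", payload[0]["input_tokens"].tolist())

    dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(warmup_like_step, args=(2,), nprocs=2)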
