fix: double bos token #27517

0xrushi · 2025-10-26T00:37:14Z

Purpose

When using tokenizer.apply_chat_template() to manually format chat messages and then passing the resulting string to llm.generate(), vLLM incorrectly inserts an extra BOS token during tokenization.

Resolves #27486

Test Plan

def min_repro():
    from transformers import AutoTokenizer
    from vllm import LLM, SamplingParams

    # Define the model name once and reuse it for the tokenizer and the engine.
    MODEL_NAME = "meta-llama/Llama-3.2-1B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    messages = [{"role": "user", "content": "hello, how are you?"}]

    # Render the chat template to a plain string and let vLLM tokenize it.
    prompt_with_template = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True, add_special_tokens=False
    )
    llm = LLM(MODEL_NAME, max_model_len=1024, gpu_memory_utilization=0.60)
    outputs = llm.generate(
        [prompt_with_template],
        sampling_params=SamplingParams(temperature=0.0, top_p=1.0, logprobs=1),
    )
    print(outputs[0].prompt_token_ids)

    # Reference: apply the template and tokenize in a single step.
    prompt_with_template_tokens = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True
    )
    print(prompt_with_template_tokens)

    # An extra BOS token would make vLLM's prompt one token longer.
    assert len(outputs[0].prompt_token_ids) == len(prompt_with_template_tokens)


if __name__ == "__main__":
    min_repro()

Test Result

The assertion passes.

github-actions · 2025-10-26T00:37:27Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

gemini-code-assist

Code Review

This pull request introduces a fix to prevent duplicate beginning-of-sequence (BOS) tokens during input preprocessing. The change checks if a prompt already starts with the BOS token text. If so, it disables the tokenizer's automatic addition of special tokens, provided the user hasn't explicitly configured this behavior. The implementation is safe and only applies to tokenizers that expose a bos_token string attribute. My review finds the logic to be sound and without any high or critical severity issues.
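
For readers without the diff at hand, the check described in the review can be sketched roughly as follows. This is only an illustration of the described logic, not the PR's actual code: the helper name `resolve_add_special_tokens`, the `FakeTokenizer` stand-in, and the convention that `None` means "use the tokenizer default" are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeTokenizer:
    """Stand-in tokenizer for illustration; only the bos_token attribute matters here."""
    bos_token: Optional[str] = "<|begin_of_text|>"


def resolve_add_special_tokens(
    tokenizer: object,
    prompt: str,
    user_setting: Optional[bool] = None,
) -> Optional[bool]:
    """Hypothetical helper: pick the add_special_tokens value for a text prompt.

    Returns the user's explicit setting when one is given, False when the
    prompt already starts with the BOS text, and None (meaning "keep the
    tokenizer default") in every other case.
    """
    # An explicit user choice always wins.
    if user_setting is not None:
        return user_setting
    # Only tokenizers exposing a non-empty string bos_token are considered.
    bos = getattr(tokenizer, "bos_token", None)
    if isinstance(bos, str) and bos and prompt.startswith(bos):
        return False
    return None


tok = FakeTokenizer()
templated = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nhello"
print(resolve_add_special_tokens(tok, templated))                     # False
print(resolve_add_special_tokens(tok, "hello"))                       # None
print(resolve_add_special_tokens(tok, templated, user_setting=True))  # True
```

Returning `None` rather than `True` in the fallback case leaves the tokenizer's own default untouched, which matches the review's note that the change only applies when the user has not configured this behavior.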

DarkLight1337 · 2025-10-26T05:14:33Z

Someone else has already opened #27515, can you work with them on that PR instead?

mergify · 2025-10-26T11:44:09Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @0xrushi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

0xrushi · 2025-10-26T11:45:22Z

> Someone else has already opened #27515, can you work with them on that PR instead?

Oh I didn't notice, closing this

gemini-code-assist bot reviewed Oct 26, 2025

0xrushi changed the title ~~fix: Add support for optional BOS token handling in input prepro…~~ fix: double bos token Oct 26, 2025

0xrushi requested review from DarkLight1337 and ywang96 as code owners October 26, 2025 01:14

0xrushi force-pushed the fix/tokenid branch from 7f5868d to d847227 Compare October 26, 2025 01:15

inc-jeong and others added 24 commits October 25, 2025 22:59

[Bugfix] reasoning_parser parameter handling in run_batch.py (vllm-pr…

81b608d

…oject#26225) Signed-off-by: inc-jeong <[email protected]> Signed-off-by: InChang Jeong <[email protected]> Co-authored-by: USER <[email protected]> Signed-off-by: 0xrushi <[email protected]>

[CI] Enable Blackwell Llama4 MoE tests (vllm-project#26731)

b1ecb43

Signed-off-by: mgoin <[email protected]> Signed-off-by: 0xrushi <[email protected]>

[BUG] Allow runai_streamer_sharded in config check (vllm-project#26958)

445ee12

Signed-off-by: ahao-anyscale <[email protected]> Signed-off-by: 0xrushi <[email protected]>

[bugfix] Fix SP + PP without specifying compile size (vllm-project#26955

668d940

) Signed-off-by: angelayi <[email protected]> Signed-off-by: 0xrushi <[email protected]>

[BugFix] Work around graph partition x torch.compile cache issue (vll…

29ddae4

…m-project#26956) Signed-off-by: Richard Zou <[email protected]> Signed-off-by: 0xrushi <[email protected]>

[DOC][XPU]update feature parity with Intel GPU (vllm-project#26954)

c3e8091

Signed-off-by: Chendi Xue <[email protected]> Signed-off-by: Chendi Xue <[email protected]> Signed-off-by: 0xrushi <[email protected]>

[Chore] Rename utils submodules (vllm-project#26920)

e76ac40

Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: 0xrushi <[email protected]>

[PERF] Qwen3-next MTP speedup (change bool mask indexing to index_sel…

5cec582

…ect / index_copy to reduce d2h) (vllm-project#26437) Signed-off-by: Vadim Gimpelson <[email protected]> Signed-off-by: 0xrushi <[email protected]>

Deepseek-v3 Batch Invariant on 8xH100 (vllm-project#26609)

2f7b895

Signed-off-by: Bram Wasti <[email protected]> Co-authored-by: Wentao Ye <[email protected]> Signed-off-by: 0xrushi <[email protected]>

[CI/Build] Update expected beam search output for Phi3V (vllm-project…

c70ac7e

…#26978) Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: 0xrushi <[email protected]>

[Hardware][CPU][PowerPC]Disable torch.compile() in toptopk sampling (v…

aa4ddbe

…llm-project#26987) Signed-off-by: Akash Kaothalkar <[email protected]> Co-authored-by: Akash Kaothalkar <[email protected]> Signed-off-by: 0xrushi <[email protected]>

[CI/Build] Fix AMD import failures in CI (vllm-project#26841)

4bc9280

Signed-off-by: zhewenli <[email protected]> Signed-off-by: 0xrushi <[email protected]>

[Benchmark] Use truncation by default for pooling benchmarks (vllm-pr…

b020e9c

…oject#26992) Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: 0xrushi <[email protected]>

[Chore] Separate out vllm.utils.collections (vllm-project#26990)

9adb917

Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: 0xrushi <[email protected]>

[Model][Bugfix] fix ernie45 vl run failed from shared experts optimiz…

1b0643c

…ation (vllm-project#26885) Signed-off-by: wangyafeng <[email protected]> Signed-off-by: 0xrushi <[email protected]>

Cleanup code after Python 3.10 upgrade (vllm-project#26520)

4b9033e

Signed-off-by: Lukas Geiger <[email protected]> Signed-off-by: 0xrushi <[email protected]>

[MISC] fix import violations for re and triton modules (vllm-project#…

8752f7a

…26654) Signed-off-by: Sungjae Lee <[email protected]> Co-authored-by: Mengqing Cao <[email protected]> Signed-off-by: 0xrushi <[email protected]>

[Bugfix] Correct LayerNorm epsilon parameter in modernbert.py (vllm-p…

71db0c1

…roject#27008) Signed-off-by: bogdanm <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: 0xrushi <[email protected]>

[Benchmark] Show E2EL by default for pooling models (vllm-project#27014)

9e123d4

Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: 0xrushi <[email protected]>

[Attention] Tune CUTLASS MLA num_splits (vllm-project#26846)

5390be1

Signed-off-by: Matthew Bonanni <[email protected]> Signed-off-by: 0xrushi <[email protected]>

[NIXL] Improve request_finished() debug logs (vllm-project#25665)

3a821b5

Signed-off-by: Mark McLoughlin <[email protected]> Signed-off-by: 0xrushi <[email protected]>

[docs] standardize Hugging Face env var to HF_TOKEN (deprecates `HU…

dc5de46

…GGING_FACE_HUB_TOKEN`) (vllm-project#27020) Signed-off-by: Kay Yan <[email protected]> Signed-off-by: 0xrushi <[email protected]>

github-project-automation bot added this to gpt-oss Issues & Enhancements Oct 26, 2025

mergify bot added structured-output speculative-decoding labels Oct 26, 2025

github-project-automation bot moved this to To Triage in gpt-oss Issues & Enhancements Oct 26, 2025

github-project-automation bot added this to Structured Output Oct 26, 2025

mergify bot added v1 tpu Related to Google TPUs tool-calling labels Oct 26, 2025

mergify bot assigned sangstar Oct 26, 2025

github-project-automation bot added this to Tool Calling Oct 26, 2025

mergify bot added the kv-connector label Oct 26, 2025

merge main

57d72e5

Signed-off-by: 0xrushi <[email protected]>

mergify bot added the needs-rebase label Oct 26, 2025

0xrushi closed this Oct 26, 2025

github-project-automation bot moved this from To Triage to Done in gpt-oss Issues & Enhancements Oct 26, 2025

github-project-automation bot moved this to Done in Structured Output Oct 26, 2025

github-project-automation bot moved this to Done in Tool Calling Oct 26, 2025

132 participants
