How do I remove the junk from the output?
Even with skip_special_tokens=True, the decoded output still contains the prompt itself:
: Describe this image.
:This image shows a person standing in front of a large, colorful mural. The mural depicts a vibrant cityscape with tall buildings, streets, and people going about their daily lives. The colors are bright and bold, and the overall atmosphere is lively and energetic. The person in the foreground is dressed in casual clothing and is smiling at the camera. The image is taken during the daytime, and the lighting is natural.
Why?
Here is my code:
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM
from deepseek_vl2.models import DeepseekVLV2Processor, DeepseekVLV2ForCausalLM
# specify the path to the model
model_path = "OPEA/deepseek-vl2-int4-sym-gptq-inc"
vl_chat_processor: DeepseekVLV2Processor = DeepseekVLV2Processor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer
vl_gpt: DeepseekVLV2ForCausalLM = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.float16,
)
vl_gpt = vl_gpt.eval()
image_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
content = "Describe this image."
## single image conversation example
conversation = [
    {
        "role": "<|User|>",
        "content": content,
    },
    {"role": "<|Assistant|>", "content": ""},
]
# load images and prepare for inputs
pil_image = Image.open(requests.get(image_url, stream=True).raw)
prepare_inputs = vl_chat_processor(
    conversations=conversation,
    images=[pil_image],
    force_batchify=True,
    system_prompt="",
).to(vl_gpt.device)
# run image encoder to get the image embeddings
inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)
# run the model to get the response
outputs = vl_gpt.language.generate(
    input_ids=prepare_inputs["input_ids"],
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True,
)
answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
print(answer)
You can delete input_ids=prepare_inputs["input_ids"] from the generate call to stop the prompt from being echoed. When both input_ids and inputs_embeds are passed, generate() returns the full sequence with the prompt tokens prepended, and skip_special_tokens only strips special tokens, not ordinary prompt text. Note that removing input_ids may cause an inference error; see https://github.com/deepseek-ai/DeepSeek-VL2/issues/4 if that happens after you delete it.
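A minimal sketch of the call without input_ids (the surrounding setup is unchanged):

# Same call as above, but without input_ids, so generate() should
# return only the newly generated tokens
outputs = vl_gpt.language.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True,
)

If that variant triggers the inference error from the linked issue, an alternative that keeps input_ids is to slice the prompt tokens off before decoding. This assumes outputs[0] starts with the prompt tokens, which is the usual transformers behavior when input_ids is passed to generate():

# Length of the prompt in tokens; everything after it is newly generated
prompt_len = prepare_inputs["input_ids"].shape[1]

# Drop the echoed prompt and decode only the new tokens
new_tokens = outputs[0][prompt_len:]
answer = tokenizer.decode(new_tokens.cpu().tolist(), skip_special_tokens=True)
print(answer)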