
Qwen2-VL-inference

This repository provides an MLLM inference server for the Qwen2-VL series of models using Hugging Face.

If you find this project helpful, please give it a ⭐; for any questions or problems, feel free to open an issue.

Features

  • Compatible with the OpenAI client
  • Streaming and non-streaming responses

Update

  • [2024/10/26] Source code released.

Installation

Install the required packages from requirements.txt:

conda create --name qwen2_vl python=3.10

conda activate qwen2_vl

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

How to start the server

  • Run the server (a minimal sketch of a matching entry point follows the hardware table below):
MODEL=Qwen/Qwen2-VL-7B-Instruct API_PORT=10102 API_HOST=0.0.0.0 python app.py 
  • Approximate hardware requirements:

    Model         Qwen2-VL 7B   Qwen2-VL 72B
    GPU memory    16 GB         150 GB
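
A minimal sketch of what an entry point driven by these environment variables could look like is shown below. Only the variable names MODEL, API_PORT, and API_HOST come from the launch command above; the uvicorn launch and everything else are assumptions, not the repository's actual app.py.

# Hypothetical sketch -- not the repository's actual app.py.
import os

import uvicorn
from fastapi import FastAPI

app = FastAPI(title="Qwen2-VL inference server")

if __name__ == "__main__":
    # Same environment variables as in the launch command above.
    model_id = os.getenv("MODEL", "Qwen/Qwen2-VL-7B-Instruct")
    host = os.getenv("API_HOST", "0.0.0.0")
    port = int(os.getenv("API_PORT", "10102"))
    print(f"Serving {model_id} on {host}:{port}")
    uvicorn.run(app, host=host, port=port)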

Use Case

How to start a client using the OpenAI library

from openai import OpenAI

client = OpenAI(
    base_url='http://127.0.0.1:10102/v1',
    # required but ignored
    api_key='your_key',
)

stream = client.chat.completions.create(
    messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
                    },
                    {"type": "text", "text": "请详细描述这张图片."},
                ],
            }
        ],
    model='qwen',
    temperature=0, 
    max_tokens=256,
    stream=True
)

full_response = ""
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        content = chunk.choices[0].delta.content
        full_response += content
        print(content, end="", flush=True)  # print chunks as a continuous stream

How to start a client using RESTful requests

Message format

## Passing the image by URL
response = client.chat.completions.create(
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
                },
                {"type": "text", "text": "请详细描述这张图片."},
            ],
        }
    ],
    model='qwen',
    max_tokens=512,
    stream=False
)
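
The next example embeds the image as base64. The base64_data variable it uses is not defined in the snippet; a minimal way to produce it from a local file is sketched below (the file name demo.jpeg is just a placeholder), matching the data:image;base64, prefix used in the payload.

import base64

# Encode a local image for the "data:image;base64,..." field below.
with open("demo.jpeg", "rb") as f:  # placeholder path
    base64_data = base64.b64encode(f.read()).decode("utf-8")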

## Passing the image as base64
response = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image": f"data:image;base64,{base64_data}"
                },
                {"type": "text", "text": "请描述这张图片"},
            ],
        }
    ],
    model='qwen',
    max_tokens=512,
    stream=False
)
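
Both examples above still go through the OpenAI client. To call the service over plain HTTP instead, a minimal sketch with the requests library is shown below; the /v1/chat/completions route is an assumption inferred from the base_url used earlier, so check the FastAPI docs page (next section) for the exact paths.

import requests

payload = {
    "model": "qwen",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
                },
                {"type": "text", "text": "Please describe this image in detail."},
            ],
        }
    ],
    "max_tokens": 256,
    "stream": False,
}

# The route below is assumed from the OpenAI-compatible base_url shown earlier.
resp = requests.post("http://127.0.0.1:10102/v1/chat/completions", json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])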

Docs

Because the service is built with FastAPI, you can visit http://127.0.0.1:10102/docs for the interactive API documentation.

License

This project is licensed under the Apache-2.0 License.

Citation

If you find this repository useful in your project, please consider giving it a star 🌟 and citing:

@misc{Qwen2-VL-inference,
  author = {Cheng ZHANG},
  title = {Qwen2-VL-inference},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/ZachcZhang/Qwen2-VL-inference}
}

Acknowledgement

This project is based on
