🚀 Efficient-R1-VLLM: Efficient RL-Tuned MoE Vision-Language Model for Reasoning

License | Python 3.10+ | PyTorch 2.2+ | SGLang Optimized

Efficient-R1-VLLM is the first project to apply reward-based reinforcement learning (GRPO) to fine-tune a Mixture-of-Experts (MoE) vision-language model (DeepSeek2-VL) for multimodal reasoning tasks. We focus on optimizing training efficiency via SGLang-accelerated rollouts while preserving the model's reasoning capabilities.

Architecture
Pipeline: Vision-Language Input → DeepSeek2-VL MoE → GRPO Reward Optimization → Reasoning Output

🔥 Key Findings

  • Enforcing Image Captions Significantly Improves Vision-Language Model Performance
    • Motivation: Compared to LLMs, VLLMs incorporate visual components, enabling them to leverage image information during reasoning.

    • Implementation: Modify the prompts and the reinforcement-learning reward so that the model must output an image caption before its reasoning output (a reward-function sketch follows the figure below). For example:

      <caption>A right-angled triangle is shown where one leg forms an obtuse angle at vertex A while another side runs parallel to its hypotenuse. Within this diagram, three points labeled P, E, and Q have been marked such that they correspondingly intersect lines AD and EC along segments DB and BE.</caption>
      <think>xxxxxxxxxxxxxxxxxxxxxxx...</think>
      <answer>xxxxxxxxxxxxxxxxxxxxxxxx...</answer>
      
    • Results: With the Qwen-7B-Instruct model trained via GRPO, enforcing image captions led to significant performance improvements.

    • Visualization: In the figure below, the blue curve shows accuracy when only <think> and <answer> are enforced; the purple curve shows accuracy when <caption> is enforced as well.

[Figure: training accuracy curves; blue = <think>/<answer> only, purple = enforced <caption>]
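
The reward shaping behind this finding can be pictured as a simple format check. Below is a minimal, illustrative Python sketch in the shape trl's GRPO reward functions expect (a list of completions in, a list of scalar rewards out); the function name, regex, and reward values are our assumptions, not the released training code.

import re

# Minimal sketch (assumed, not the released code): reward 1.0 only when the
# completion follows <caption>...</caption><think>...</think><answer>...</answer>.
CAPTION_PATTERN = re.compile(
    r"^<caption>.+?</caption>\s*<think>.+?</think>\s*<answer>.+?</answer>$",
    re.DOTALL,
)

def caption_format_reward(completions, **kwargs):
    """Format reward compatible with trl's GRPOTrainer reward_funcs interface."""
    texts = [c if isinstance(c, str) else c[0]["content"] for c in completions]
    return [1.0 if CAPTION_PATTERN.match(t.strip()) else 0.0 for t in texts]

In practice such a format reward would be combined with a task-accuracy reward on the <answer> contents, so the model is rewarded both for describing the image and for answering correctly.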

🔥 Key Innovations

  1. First RL-Tuned MoE Vision-Language Model
    • Pioneering reinforcement-learning adaptation of DeepSeek2-VL-MoE (8× experts) on complex vision-language datasets (e.g., ScienceQA, VCR).
  2. SGLang-Optimized Rollouts
    • Achieve 1.7× faster trajectory sampling by integrating SGLang with DeepSeek2-VL's official codebase (see the rollout sketch after this list).
  3. Embedded evaluation loop in the trl framework (a training sketch appears under Installation).
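
As a rough illustration of the SGLang-accelerated rollout stage, the sketch below uses SGLang's offline Engine API to sample several completions per prompt, as GRPO requires. The checkpoint id, prompts, and sampling parameters are placeholders, not the project's actual configuration.

import sglang as sgl

# Illustrative rollout sampling with SGLang's offline engine.
# The model path is a placeholder; GRPO needs several samples per prompt ("n": 8).
engine = sgl.Engine(model_path="deepseek-ai/deepseek-vl2")
prompts = ["<caption-enforced prompt 1>", "<caption-enforced prompt 2>"]
sampling_params = {"temperature": 0.9, "max_new_tokens": 1024, "n": 8}
outputs = engine.generate(prompts, sampling_params)
for out in outputs:
    print(out["text"])
engine.shutdown()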

🚀 Quick Start

Code will be released soon.

Installation

cd MoERL-Vision  
pip install -r requirements.txt  # Requires CUDA 12.x and NVIDIA GPUs  

# Install SGLang for accelerated rollouts  
pip install "sglang[all]"

Our codebase builds upon R1-Multimodal-Journey and integrates open-source contributions from vLLM, Open-R1, and trl. We also extend our gratitude to DeepSeek-R1 and Qwen2.5-VL for their open-source techniques and base models, which have enabled us to further our exploration.

Core Contributors (equal contribution, alphabetical order)

Bai Bizhe

Thanks to Professor Wenqi Shao and Qiaosheng Zhang for their help.
