Releases: Tencent/YOLO-Master

YOLO-Master-v26.02

13 Feb 07:12

🎯 YOLO-Master v2026.02 Release Notes


🌟 Overview

We are thrilled to announce YOLO-Master v2026.02, a milestone release that achieves major breakthroughs in model efficiency and architectural flexibility, redefining the paradigm for large-scale model training and inference.

🎯 Key Highlights

  • 🧠 Mixture of Experts (MoE): Implements dynamic expert activation, significantly enhancing model capacity without proportional increase in computational cost
  • ⚡ Low-Rank Adaptation (LoRA): Parameter-efficient fine-tuning that dramatically reduces training resource requirements while achieving 95%+ of full fine-tuning performance
  • 🔍 Sparse SAHI: Intelligent adaptive slicing inference, achieving 3-5x speedup for large image detection
  • 🎯 Cluster-Weighted NMS: Cluster-based weighted fusion with significantly improved localization accuracy
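The idea behind cluster-weighted fusion is that overlapping detections are averaged rather than discarded. A minimal sketch of score-weighted cluster fusion follows; the exact clustering and weighting scheme used in this release is not shown here, so function names and the greedy clustering strategy are illustrative assumptions:

```python
import torch

def iou_matrix(boxes):
    """Pairwise IoU for boxes in (x1, y1, x2, y2) format."""
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    lt = torch.max(boxes[:, None, :2], boxes[None, :, :2])
    rb = torch.min(boxes[:, None, 2:], boxes[None, :, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area[:, None] + area[None, :] - inter)

def cluster_weighted_nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS variant: each kept detection is the score-weighted
    average of its overlap cluster instead of the single top box."""
    order = scores.argsort(descending=True)
    ious = iou_matrix(boxes)
    fused, used = [], torch.zeros(len(boxes), dtype=torch.bool)
    for i in order.tolist():
        if used[i]:
            continue
        cluster = (ious[i] >= iou_thr) & ~used   # boxes this one suppresses
        used |= cluster
        w = scores[cluster].unsqueeze(1)
        fused.append((boxes[cluster] * w).sum(0) / w.sum())
    return torch.stack(fused)
```

Averaging within a cluster uses evidence from every overlapping prediction, which is where the localization-accuracy gain comes from.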

🚀 New Features

1️⃣ Mixture of Experts (MoE) Support

The MoE architecture enables efficient model scaling through conditional computation, dramatically increasing model capacity while maintaining inference speed. Our implementation includes complete training, inference, and optimization pipelines.

🔧 Core Components

📊 MoE Loss Function (MoELoss)

  • Load Balancing Loss 🎯: Ensures balanced expert load distribution, preventing expert collapse
  • Z-Loss 📉: Suppresses large logit values, ensuring numerical stability
  • Adaptive weight adjustment mechanism that dynamically balances main task loss with auxiliary losses

Implementation: ultralytics/nn/modules/moe/loss.py
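The two auxiliary terms can be sketched as follows. This is a minimal illustration of load-balancing and z-loss, not the exact `MoELoss` implementation; the function name, tensor shapes, and coefficient defaults are assumptions:

```python
import torch

def moe_aux_losses(router_logits, expert_mask, lb_coef=0.01, z_coef=0.001):
    """Auxiliary losses for a Top-K MoE layer.

    router_logits: (tokens, num_experts) raw router outputs
    expert_mask:   (tokens, num_experts) one-hot token-to-expert assignment
    """
    probs = torch.softmax(router_logits, dim=-1)
    num_experts = router_logits.shape[-1]

    # Load-balancing loss: penalizes agreement between the fraction of
    # tokens routed to each expert and the router's mean probability for
    # it, which is minimized under a uniform assignment.
    tokens_per_expert = expert_mask.float().mean(dim=0)
    mean_probs = probs.mean(dim=0)
    load_balance = num_experts * (tokens_per_expert * mean_probs).sum()

    # Z-loss: squared log-partition term that suppresses large logits
    # for numerical stability.
    z_loss = torch.logsumexp(router_logits, dim=-1).pow(2).mean()

    return lb_coef * load_balance, z_coef * z_loss
```

Both terms are added to the main detection loss; adaptive weighting would scale `lb_coef` and `z_coef` over training.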

✂️ Intelligent Pruning (MoEPruner)

  • Validation set-based expert utilization analysis
  • Automatic pruning of low-utilization experts (default threshold: 15%)
  • Significantly reduces model parameters and inference latency
  • Achieves 20-30% inference speedup while maintaining performance

Implementation: ultralytics/nn/modules/moe/pruning.py
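The pruning criterion can be sketched as follows. Whether the 15% threshold is an absolute token share or a fraction of the uniform per-expert share is not stated, so this sketch assumes the latter; the function name is hypothetical:

```python
import torch

def prune_low_utilization_experts(expert_counts, threshold=0.15):
    """Return indices of experts to keep, given per-expert token counts
    collected on a validation set.

    An expert is pruned if its share of routed tokens falls below
    `threshold` times the uniform share (1 / num_experts)."""
    counts = torch.as_tensor(expert_counts, dtype=torch.float32)
    utilization = counts / counts.sum()       # fraction of tokens per expert
    uniform_share = 1.0 / len(counts)
    keep = utilization >= threshold * uniform_share
    return keep.nonzero(as_tuple=True)[0].tolist()
```

Dropping rarely-activated experts removes their parameters entirely, which is what yields the reported latency reduction.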

🏗️ Modular Architecture

  • Decoupled router, expert networks, and gating mechanisms
  • Supports multiple routing strategies: Top-K, Soft Routing, Expert Choice
  • Highly extensible modular design, easy integration of custom experts
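Of the supported strategies, Top-K routing is the most common; a minimal decoupled-router sketch (class name and shapes are assumptions, not the release's own module) looks like this:

```python
import torch
import torch.nn as nn

class TopKRouter(nn.Module):
    """Minimal Top-K routing: each token picks its k highest-scoring
    experts, whose outputs are combined with renormalized gate weights."""

    def __init__(self, dim, num_experts, k=2):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts, bias=False)
        # Small-std router init is a common stability trick.
        nn.init.normal_(self.gate.weight, std=0.01)
        self.k = k

    def forward(self, x):                       # x: (tokens, dim)
        logits = self.gate(x)                   # (tokens, num_experts)
        weights, indices = logits.topk(self.k, dim=-1)
        weights = torch.softmax(weights, dim=-1)
        return weights, indices                 # per-token gates + expert ids
```

Soft Routing would instead softmax over all experts, and Expert Choice would transpose the selection (experts pick tokens); both reuse the same gate.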

2️⃣ LoRA Support - Parameter-Efficient Fine-Tuning Revolution

LoRA achieves parameter-efficient fine-tuning through low-rank matrix decomposition, reaching 95%+ of full fine-tuning performance while training only 1-5% of parameters.
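Concretely, the standard LoRA formulation that this feature builds on freezes a pretrained weight matrix and learns a low-rank additive update:

```latex
W' = W + \frac{\alpha}{r} B A,
\qquad B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Only $A$ and $B$ are trained, so the trainable parameter count scales with $r(d+k)$ rather than $dk$, which is where the 1-5% figure comes from.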

🎯 Core Innovation: Architecture-Agnostic LoRA Adaptation

Zero-Overhead Integration Principle

We demonstrate that LoRA training can be achieved without adding any new modules to the original YOLO model architecture. This is accomplished through:

  1. Dynamic Weight Interception: LoRA adapters are applied at the parameter level rather than the module level
  2. Configuration-Driven Activation: LoRA behavior is controlled entirely through hyperparameter settings
  3. Backward Compatibility: Models retain their original architecture and can switch between LoRA and standard training modes without code modification
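One way to realize parameter-level adaptation without new modules is PyTorch's weight parametrization mechanism, sketched below. This is a hypothetical illustration of "dynamic weight interception" using `torch.nn.utils.parametrize`, not the actual `ultralytics/utils/lora.py` implementation:

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class LoRAParam(nn.Module):
    """Low-rank additive update applied to an existing weight tensor.
    Registered as a parametrization, so the host module keeps its
    original class and no new submodule replaces it."""

    def __init__(self, weight, r=16, alpha=32):
        super().__init__()
        out_f = weight.shape[0]
        fan = weight.numel() // out_f
        self.lora_A = nn.Parameter(torch.randn(r, fan) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, W):
        return W + self.scaling * (self.lora_B @ self.lora_A).view_as(W)

conv = nn.Conv2d(16, 32, 3)
conv.weight.requires_grad_(False)  # freeze the base weight
parametrize.register_parametrization(conv, "weight", LoRAParam(conv.weight))
```

Because `lora_B` starts at zero, the parametrized weight initially equals the pretrained one, and removing the parametrization restores the standard training mode.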
Traditional Approach vs. Our Approach

❌ Traditional Approach (Requires Model Modification)

```python
# Traditional approach: inject LoRA modules into the model,
# changing its architecture and state_dict layout.
import torch
import torch.nn as nn

class ConvWithLoRA(nn.Module):
    def __init__(self, conv_layer, r, alpha):
        super().__init__()
        self.conv = conv_layer
        self.scaling = alpha / r
        self.lora_A = nn.Parameter(torch.randn(r, conv_layer.in_channels) * 0.01)  # NEW MODULE
        self.lora_B = nn.Parameter(torch.zeros(conv_layer.out_channels, r))        # NEW MODULE

    def forward(self, x):
        # Low-rank update applied as a 1x1 convolution: (B, C_in, H, W) -> (B, C_out, H, W)
        delta = torch.einsum('or,rc,bchw->bohw', self.lora_B, self.lora_A, x)
        return self.conv(x) + self.scaling * delta
```

✅ Our Approach (Zero Architectural Overhead)

```python
# Our approach: configuration-only adaptation.
# The original model architecture remains UNCHANGED.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # standard model

# LoRA enabled purely through training configuration
results = model.train(
    data="coco8.yaml",
    epochs=50,
    lora_r=16,               # LoRA activated via config
    lora_alpha=32,
    lora_gradient_checkpointing=True,
)

# No model surgery required!
```

📋 Supported Model Matrix with Zero-Overhead Integration

| Model Family | Architecture Type | LoRA Integration Method | Architectural Changes Required | Configuration Parameters |
|---|---|---|---|---|
| YOLOv3 | Convolutional Neural Network | Configuration-only | None | `lora_r`, `lora_alpha`, `gradient_checkpointing` |
| YOLOv5 | Convolutional Neural Network | Configuration-only | None | `lora_r`, `lora_alpha`, `gradient_checkpointing` |
| YOLOv6 | Convolutional Neural Network | Configuration-only | None | `lora_r`, `lora_alpha`, `gradient_checkpointing` |
| YOLOv8 | Convolutional Neural Network | Configuration-only | None | `lora_r`, `lora_alpha`, `gradient_checkpointing` |
| YOLOv9 | Convolutional Neural Network | Configuration-only | None | `lora_r`, `lora_alpha`, `gradient_checkpointing` |
| YOLOv10 | Convolutional Neural Network | Configuration-only | None | `lora_r`, `lora_alpha`, `gradient_checkpointing` |
| YOLO11 | Convolutional Neural Network | Configuration-only | None | `lora_r`, `lora_alpha`, `gradient_checkpointing` |
| YOLO12 | Hybrid (CNN+Attention) | Configuration-only | None | `lora_r`, `lora_alpha`, `include_attention=True` |
| RT-DETR | Transformer-based | Configuration-only | None | `lora_r`, `lora_alpha`, `include_attention=True` |
| YOLO-World | Multi-modal | Configuration-only | None | `lora_r`, `lora_alpha`, `include_attention=True` |
| YOLO-Master | Mixture of Experts (MoE) | Configuration-only | None | `lora_r`, `lora_alpha`, `target_modules=["expert"]` |

⚙️ Key LoRA Configuration Parameters

| Parameter | Description | Default Value | YOLO (Conv) | RT-DETR (Transformer) | YOLO-Master (MoE) |
|---|---|---|---|---|---|
| `lora_r` | Rank of low-rank decomposition | 16 | 16-32 | 8-16 | 32-64 |
| `lora_alpha` | Scaling factor for LoRA updates | 32 | 32-64 | 16-32 | 64-128 |
| `lora_dropout` | Dropout probability for LoRA layers | 0.1 | 0.1 | 0.1 | 0.05 |
| `lora_gradient_checkpointing` | Enable gradient checkpointing | False | True (mandatory) | True (mandatory) | True (mandatory) |
| `lora_include_attention` | Apply LoRA to attention layers | False | False | True | False |
| `lora_target_modules` | Regex pattern for target modules | `["conv"]` | `["conv"]` | `["linear", "conv"]` | `["conv", "expert", "router"]` |

Implementation: ultralytics/utils/lora.py


📊 Experimental Validation: PEFT Methods Comparison on YOLOv11

To comprehensively validate the effectiveness of LoRA and its variants, we conducted systematic ablation studies based on the YOLOv11 architecture. We compared the following four training strategies:

| Training Strategy | Description | Trainable Parameters Ratio | Typical Use Cases |
|---|---|---|---|
| Full SFT | Full Supervised Fine-Tuning (Baseline) | 100% | Resource-rich environments, pursuing ultimate performance |
| LoRA (r=16) | Low-Rank Adaptation, rank=16 | ~10% | Resource-constrained, rapid adaptation |
| DoRA (r=16) | Weight-Decomposed LoRA, rank=16 | ~12% | Requires stronger expressiveness |
| LoHa (r=16) | Hadamard Product LoRA, rank=16 | ~11% | Balance of performance and efficiency |

YOLO-Master-v0.0

31 Dec 13:00

Pre-release

YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection.

We are excited to announce the first official release of YOLO-Master, a novel YOLO architecture integrated with Mixture-of-Experts (MoE). This release brings significant improvements in accuracy-latency trade-offs, specifically targeting real-time object detection and segmentation tasks.

"Adaptive Intelligence for Every Scene" — YOLO-Master introduces instance-conditional computation, dynamically allocating resources where they are needed most.

🚀 Key Highlights

  • MoE Architecture Integration: Native support for Mixture-of-Experts with dynamic routing.
  • SOTA Performance: Achieves 42.4% mAP on COCO with just 1.62ms latency (N-scale), outperforming YOLOv10/v11/v12.
  • Segmentation Breakthrough: +2.8% mask mAP gain over YOLOv12-seg-N.
  • Hardware-Aware Optimization: Optimized for GPU (Batched Compute) and Mobile (Ghost Experts).

🛠 New Features in v0.0

1. Advanced MoE Modules

  • ModularRouterExpertMoE (Recommended): A highly stable, plug-and-play MoE block featuring:
    • Shared Experts: Ensures baseline performance and prevents training collapse.
    • Z-Loss Integration: Stabilizes router logits for smoother convergence.
  • UltraOptimizedMoE: Designed for extreme speed, featuring Batched Expert Computation which eliminates Python loops, delivering 3-5x inference speedup on GPUs.
  • GhostExpert: Parameter-efficient experts based on GhostNet, reducing memory bandwidth pressure for mobile/edge deployment.
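The batched expert computation behind `UltraOptimizedMoE` can be sketched as a pair of einsums that replace the usual per-expert Python loop. This is a hedged sketch with dense dispatch (every expert sees every token, masked by the gates); the actual module's dispatch strategy is not shown here:

```python
import torch

def batched_experts(x, expert_weights, gates):
    """Run all experts in one batched matmul instead of a Python loop.

    x:              (tokens, dim) input features
    expert_weights: (num_experts, dim, dim_out) stacked expert matrices
    gates:          (tokens, num_experts) routing weights (0 for unused experts)
    """
    # (tokens, num_experts, dim_out): every expert applied to every token...
    all_outputs = torch.einsum('td,edo->teo', x, expert_weights)
    # ...then combined per token by the gate weights.
    return torch.einsum('teo,te->to', all_outputs, gates)
```

Keeping the whole computation inside a single kernel launch is what removes the Python-loop overhead on GPU, at the cost of computing masked-out expert outputs.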

2. Intelligent Routing

  • EfficientSpatialRouter: Reduces routing FLOPs by >90% via spatial pre-pooling.
  • DynamicRouting: Adaptive computational resource allocation based on scene complexity.
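Spatial pre-pooling amortizes the routing decision over blocks of pixels; a minimal sketch (class name and pooling factor are assumptions) is:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialPooledRouter(nn.Module):
    """Route on a downsampled feature map, then broadcast the decisions.
    Pooling by s in each spatial dimension cuts routing FLOPs by ~s^2
    (e.g. s=4 gives ~94% fewer routing FLOPs)."""

    def __init__(self, dim, num_experts, pool=4):
        super().__init__()
        self.pool = pool
        self.gate = nn.Conv2d(dim, num_experts, kernel_size=1)

    def forward(self, x):                          # x: (B, C, H, W)
        pooled = F.avg_pool2d(x, self.pool)        # (B, C, H/s, W/s)
        logits = self.gate(pooled)                 # routing on the small map
        logits = F.interpolate(logits, size=x.shape[-2:], mode='nearest')
        return logits.softmax(dim=1)               # per-pixel expert weights
```

Nearest-neighbor upsampling keeps one routing decision per s×s block, which is usually adequate because neighboring pixels tend to need the same expert.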

3. Stability & Ease of Use

  • Training Stability: Solved common MoE training instability with Shared Expert paths and specialized router initialization (std=0.01).
  • Deployment Ready: Full support for ONNX and TensorRT export.

📊 Benchmarks (COCO val2017)

| Model | Size | mAP (box) | Latency | Comparison |
|---|---|---|---|---|
| YOLOv10-N | 640 | 38.5 | 1.84ms | - |
| YOLOv11-N | 640 | 39.4 | 1.50ms | - |
| YOLOv12-N | 640 | 40.6 | 1.64ms | - |
| YOLO-Master-N | 640 | 42.4 | 1.62ms | SOTA 🏆 |

📥 Quick Start

Installation

```shell
git clone https://github.com/isLinXu/YOLO-Master.git
cd YOLO-Master
pip install -r requirements.txt
pip install -e .
```

Training (Single GPU)

```python
from ultralytics import YOLO

model = YOLO('cfg/models/master/v0/det/yolo-master-n.yaml')
results = model.train(data='coco.yaml', epochs=100, imgsz=640)
```

🤝 Contributors

Special thanks to the research team at Tencent Youtu Lab and Singapore Management University.