Releases: Tencent/YOLO-Master
YOLO-Master-v26.02
🌟 Overview
We are thrilled to announce YOLO-Master v26.02, a milestone release that delivers major gains in model efficiency and architectural flexibility for large-scale model training and inference.
🎯 Key Highlights
- 🧠 Mixture of Experts (MoE): Implements dynamic expert activation, significantly enhancing model capacity without a proportional increase in computational cost
- ⚡ Low-Rank Adaptation (LoRA): Parameter-efficient fine-tuning that dramatically reduces training resource requirements while achieving 95%+ of full fine-tuning performance
- 🔍 Sparse SAHI: Intelligent adaptive slicing inference, achieving 3-5x speedup for large image detection
- 🎯 Cluster-Weighted NMS: Cluster-based weighted fusion with significantly improved localization accuracy
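Cluster-Weighted NMS groups overlapping detections and fuses each cluster with score-weighted averaging instead of discarding all but the top box. The following is a minimal NumPy sketch of that idea; the function names and the greedy clustering scheme are illustrative, not the release's implementation:

```python
import numpy as np

def iou(a, b):
    """IoU between one box a = (x1, y1, x2, y2) and an (N, 4) array b."""
    x1 = np.maximum(a[0], b[:, 0]); y1 = np.maximum(a[1], b[:, 1])
    x2 = np.minimum(a[2], b[:, 2]); y2 = np.minimum(a[3], b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def cluster_weighted_nms(boxes, scores, iou_thr=0.5):
    """Greedily cluster boxes that overlap the current best detection,
    then fuse each cluster into one score-weighted average box."""
    order = np.argsort(-scores)
    boxes, scores = boxes[order], scores[order]
    alive = np.ones(len(boxes), dtype=bool)
    fused = []
    for i in range(len(boxes)):
        if not alive[i]:
            continue
        members = alive & (iou(boxes[i], boxes) >= iou_thr)
        w = scores[members]
        fused_box = (boxes[members] * w[:, None]).sum(axis=0) / w.sum()
        fused.append((fused_box, scores[members].max()))
        alive &= ~members
    return fused
```

Averaging over a cluster pools the localization evidence of several near-duplicate predictions, which is where the accuracy gain over hard suppression comes from.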
🚀 New Features
1️⃣ Mixture of Experts (MoE) Support
The MoE architecture enables efficient model scaling through conditional computation, dramatically increasing model capacity while maintaining inference speed. Our implementation includes complete training, inference, and optimization pipelines.
🔧 Core Components
📊 MoE Loss Function (MoELoss)
- Load Balancing Loss 🎯: Ensures balanced expert load distribution, preventing expert collapse
- Z-Loss 📉: Suppresses large logit values, ensuring numerical stability
- Adaptive weight adjustment mechanism that dynamically balances main task loss with auxiliary losses
Implementation: ultralytics/nn/modules/moe/loss.py
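The two auxiliary terms can be sketched numerically. The following is an illustrative NumPy rendition of the standard formulations (fraction-weighted load balancing and squared log-sum-exp z-loss), not the code in ultralytics/nn/modules/moe/loss.py:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def load_balancing_loss(router_logits, expert_indices, num_experts):
    """Encourage a uniform expert load: penalize the product of the
    dispatch fraction f_i and the mean router probability P_i."""
    f = np.bincount(expert_indices, minlength=num_experts) / len(expert_indices)
    p = softmax(router_logits).mean(axis=0)
    return num_experts * np.sum(f * p)

def z_loss(router_logits):
    """Penalize large router logits via the squared log-sum-exp,
    keeping the softmax numerically well-conditioned."""
    lse = np.log(np.exp(router_logits).sum(axis=-1))
    return np.mean(lse ** 2)
```

With perfectly uniform routing the load-balancing term evaluates to 1.0 (its minimum), so any skew toward a subset of experts raises the loss.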
✂️ Intelligent Pruning (MoEPruner)
- Validation set-based expert utilization analysis
- Automatic pruning of low-utilization experts (default threshold: 15%)
- Significantly reduces model parameters and inference latency
- Achieves 20-30% inference speedup while maintaining performance
Implementation: ultralytics/nn/modules/moe/pruning.py
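A minimal sketch of the 15% utilization threshold, assuming per-expert routing counts collected on a validation set (the helper below is illustrative; the shipped pruner is in the file above):

```python
import numpy as np

def select_experts_to_keep(expert_counts, threshold=0.15):
    """Keep experts whose share of routed tokens meets `threshold`
    (default 15%); the rest are candidates for pruning."""
    counts = np.asarray(expert_counts, dtype=float)
    utilization = counts / counts.sum()
    keep = np.flatnonzero(utilization >= threshold)
    return keep.tolist(), utilization
```

Dropping rarely-selected experts shrinks the parameter count without changing routing for the tokens that matter, which is where the quoted 20-30% latency reduction comes from.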
🏗️ Modular Architecture
- Decoupled router, expert networks, and gating mechanisms
- Supports multiple routing strategies: Top-K, Soft Routing, Expert Choice
- Highly extensible modular design, easy integration of custom experts
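Of the routing strategies above, Top-K is the most common. A minimal NumPy sketch of Top-K gating (illustrative, not the repository's router):

```python
import numpy as np

def top_k_route(logits, k=2):
    """Select the top-k experts per token and renormalize their
    softmax probabilities into gate weights that sum to 1."""
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    topk_idx = np.argsort(-probs, axis=-1)[:, :k]
    topk_p = np.take_along_axis(probs, topk_idx, axis=-1)
    gates = topk_p / topk_p.sum(axis=-1, keepdims=True)
    return topk_idx, gates
```

Each token's output is then the gate-weighted sum of its k selected experts; Soft Routing instead keeps all experts with their full softmax weights, and Expert Choice inverts the selection so experts pick tokens.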
2️⃣ LoRA Support - Parameter-Efficient Fine-Tuning Revolution
LoRA achieves parameter-efficient fine-tuning through low-rank matrix decomposition, reaching 95%+ of full fine-tuning performance while training only 1-5% of parameters.
🎯 Core Innovation: Architecture-Agnostic LoRA Adaptation
Zero-Overhead Integration Principle
We demonstrate that LoRA training can be achieved without adding any new modules to the original YOLO model architecture. This is accomplished through:
- Dynamic Weight Interception: LoRA adapters are applied at the parameter level rather than the module level
- Configuration-Driven Activation: LoRA behavior is controlled entirely through hyperparameter settings
- Backward Compatibility: Models retain their original architecture and can switch between LoRA and standard training modes without code modification
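The parameter-level idea can be illustrated with the standard LoRA update rule; this is a hypothetical sketch of the math, while the actual interception logic lives in ultralytics/utils/lora.py:

```python
import numpy as np

def lora_effective_weight(W, A, B, alpha, r):
    """Parameter-level LoRA: the frozen weight W is used as
    W' = W + (alpha / r) * B @ A at forward time, so no new
    modules are added to the model graph."""
    return W + (alpha / r) * (B @ A)
```

Because B is conventionally zero-initialized, W' equals W at the start of training, and after training the low-rank update can be merged back into W, leaving the deployed architecture unchanged.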
Traditional Approach vs. Our Approach
❌ Traditional Approach (Requires Model Modification)
```python
import torch.nn as nn

# Traditional approach: inject LoRA modules into the model
class ConvWithLoRA(nn.Module):
    def __init__(self, conv_layer, r, alpha):
        super().__init__()
        self.conv = conv_layer
        self.lora_A = nn.Parameter(...)  # NEW MODULE
        self.lora_B = nn.Parameter(...)  # NEW MODULE

    def forward(self, x):
        return self.conv(x) + self.lora_B @ self.lora_A @ x
```

✅ Our Approach (Zero Architectural Overhead)
```python
from ultralytics import YOLO

# Our approach: configuration-only adaptation
# Original model architecture remains UNCHANGED
model = YOLO("yolov8n.pt")  # Standard model

# LoRA enabled through configuration
results = model.train(
    data="coco8.yaml",
    epochs=50,
    lora_r=16,  # LoRA activated via config
    lora_alpha=32,
    lora_gradient_checkpointing=True,
)
# No model surgery required!
```

📋 Supported Model Matrix with Zero-Overhead Integration
| Model Family | Architecture Type | LoRA Integration Method | Architectural Changes Required | Configuration Parameters |
|---|---|---|---|---|
| YOLOv3 | Convolutional Neural Network | Configuration-only | None ✅ | lora_r, lora_alpha, gradient_checkpointing |
| YOLOv5 | Convolutional Neural Network | Configuration-only | None ✅ | lora_r, lora_alpha, gradient_checkpointing |
| YOLOv6 | Convolutional Neural Network | Configuration-only | None ✅ | lora_r, lora_alpha, gradient_checkpointing |
| YOLOv8 | Convolutional Neural Network | Configuration-only | None ✅ | lora_r, lora_alpha, gradient_checkpointing |
| YOLOv9 | Convolutional Neural Network | Configuration-only | None ✅ | lora_r, lora_alpha, gradient_checkpointing |
| YOLOv10 | Convolutional Neural Network | Configuration-only | None ✅ | lora_r, lora_alpha, gradient_checkpointing |
| YOLO11 | Convolutional Neural Network | Configuration-only | None ✅ | lora_r, lora_alpha, gradient_checkpointing |
| YOLO12 | Hybrid (CNN+Attention) | Configuration-only | None ✅ | lora_r, lora_alpha, include_attention=True |
| RT-DETR | Transformer-based | Configuration-only | None ✅ | lora_r, lora_alpha, include_attention=True |
| YOLO-World | Multi-modal | Configuration-only | None ✅ | lora_r, lora_alpha, include_attention=True |
| YOLO-Master | Mixture of Experts (MoE) | Configuration-only | None ✅ | lora_r, lora_alpha, target_modules=["expert"] |
⚙️ Key LoRA Configuration Parameters
| Parameter | Description | Default Value | YOLO (Conv) | RT-DETR (Transformer) | YOLO-Master (MoE) |
|---|---|---|---|---|---|
| lora_r | Rank of low-rank decomposition | 16 | 16-32 | 8-16 | 32-64 |
| lora_alpha | Scaling factor for LoRA updates | 32 | 32-64 | 16-32 | 64-128 |
| lora_dropout | Dropout probability for LoRA layers | 0.1 | 0.1 | 0.1 | 0.05 |
| lora_gradient_checkpointing | Enable gradient checkpointing | False | True (mandatory) | True (mandatory) | True (mandatory) |
| lora_include_attention | Apply LoRA to attention layers | False | False | True | False |
| lora_target_modules | Regex pattern for target modules | ["conv"] | ["conv"] | ["linear", "conv"] | ["conv", "expert", "router"] |
Implementation: ultralytics/utils/lora.py
📊 Experimental Validation: PEFT Methods Comparison on YOLOv11
To comprehensively validate the effectiveness of LoRA and its variants, we conducted systematic ablation studies based on the YOLOv11 architecture. We compared the following four training strategies:
| Training Strategy | Description | Trainable Parameters Ratio | Typical Use Cases |
|---|---|---|---|
| Full SFT | Full Supervised Fine-Tuning (Baseline) | 100% | Resource-rich environments, pursuing ultimate performance |
| LoRA (r=16) | Low-Rank Adaptation, rank=16 | ~10% | Resource-constrained, rapid adaptation |
| DoRA (r=16) | Weight-Decomposed LoRA, rank=16 | ~12% | Requires stronger expressiveness |
| LoHa (r=16) | Hadamard Product LoRA, rank=16 | ~11% | Balance performance and efficiency |
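The ~10% trainable-parameter figure for LoRA can be sanity-checked with per-layer arithmetic. The helper below is purely illustrative; the whole-model ratio depends on which modules are targeted:

```python
def lora_param_ratio(c_out, c_in, k, r):
    """Trainable fraction when a conv layer's full update (c_out * c_in * k * k
    weights) is replaced by a rank-r pair A: (r, c_in*k*k), B: (c_out, r)."""
    full = c_out * c_in * k * k
    lora = r * (c_in * k * k) + c_out * r
    return lora / full
```

For a typical 256-to-256-channel 3x3 conv at r=16 this gives roughly 7%, consistent with the order of magnitude in the table.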
YOLO-Master-v0.0
YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection.
We are excited to announce the first official release of YOLO-Master, a novel YOLO architecture integrated with Mixture-of-Experts (MoE). This release brings significant improvements in accuracy-latency trade-offs, specifically targeting real-time object detection and segmentation tasks.
"Adaptive Intelligence for Every Scene" — YOLO-Master introduces instance-conditional computation, dynamically allocating resources where they are needed most.
🚀 Key Highlights
- MoE Architecture Integration: Native support for Mixture-of-Experts with dynamic routing.
- SOTA Performance: Achieves 42.4% mAP on COCO with just 1.62ms latency (N-scale), outperforming YOLOv10/v11/v12.
- Segmentation Breakthrough: +2.8% mask mAP gain over YOLOv12-seg-N.
- Hardware-Aware Optimization: Optimized for GPU (Batched Compute) and Mobile (Ghost Experts).
🛠 New Features in this Release
1. Advanced MoE Modules
- ModularRouterExpertMoE (Recommended): A highly stable, plug-and-play MoE block featuring:
  - Shared Experts: Ensures baseline performance and prevents training collapse.
  - Z-Loss Integration: Stabilizes router logits for smoother convergence.
- UltraOptimizedMoE: Designed for extreme speed, featuring Batched Expert Computation that eliminates Python loops and delivers a 3-5x inference speedup on GPUs.
- GhostExpert: Parameter-efficient experts based on GhostNet, reducing memory bandwidth pressure for mobile/edge deployment.
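Batched expert computation replaces a per-expert Python loop with a single einsum over all experts. A minimal NumPy sketch of the idea (shapes and names are illustrative):

```python
import numpy as np

def batched_expert_forward(x, expert_weights, assignments):
    """Run every expert on every token in one batched contraction,
    then pick each token's assigned expert output.
    x: (tokens, d_in); expert_weights: (experts, d_in, d_out);
    assignments: (tokens,) expert index per token."""
    all_out = np.einsum('td,edo->teo', x, expert_weights)
    return all_out[np.arange(len(x)), assignments]
```

This sketch trades extra FLOPs (every expert processes every token) for loop-free, GPU-friendly kernels; the real module is more selective about which expert-token pairs it computes.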
2. Intelligent Routing
- EfficientSpatialRouter: Reduces routing FLOPs by >90% via spatial pre-pooling.
- DynamicRouting: Adaptive computational resource allocation based on scene complexity.
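Spatial pre-pooling shrinks the grid the router sees before the routing projection. A minimal sketch, assuming H and W divisible by pool (names are illustrative, not the repository's API):

```python
import numpy as np

def prepooled_router_logits(feat, w_router, pool=4):
    """Average-pool the (C, H, W) feature map before the routing
    projection, so the router runs on (H/pool) * (W/pool) positions
    instead of H * W, cutting routing FLOPs by roughly pool**2."""
    c, h, w = feat.shape
    pooled = feat.reshape(c, h // pool, pool, w // pool, pool).mean(axis=(2, 4))
    # w_router: (num_experts, C) -> logits: (num_experts, H/pool, W/pool)
    return np.einsum('ec,chw->ehw', w_router, pooled)
```

With pool=4 the router touches 1/16 of the positions, which is where a >90% FLOP reduction for the routing step comes from.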
3. Stability & Ease of Use
- Training Stability: Solved common MoE training instability with Shared Expert paths and specialized router initialization (std=0.01).
- Deployment Ready: Full support for ONNX and TensorRT export.
- New Wiki Guide: Hardware Deployment & Inference Optimization
📊 Benchmarks (COCO val2017)
| Model | Size | mAP (box) | Latency | Comparison |
|---|---|---|---|---|
| YOLOv10-N | 640 | 38.5 | 1.84ms | - |
| YOLOv11-N | 640 | 39.4 | 1.50ms | - |
| YOLOv12-N | 640 | 40.6 | 1.64ms | - |
| YOLO-Master-N | 640 | 42.4 | 1.62ms | SOTA 🏆 |
🔗 Resources
- 📜 Detailed Documentation: Wiki: MoE Modules Explained
- 📄 Paper: arXiv:2512.23273
📥 Quick Start
Installation

```bash
git clone https://github.com/isLinXu/YOLO-Master.git
cd YOLO-Master
pip install -r requirements.txt
pip install -e .
```

Training (Single GPU)
```python
from ultralytics import YOLO

model = YOLO('cfg/models/master/v0/det/yolo-master-n.yaml')
results = model.train(data='coco.yaml', epochs=100, imgsz=640)
```

🤝 Contributors
Special thanks to the research team at Tencent Youtu Lab and Singapore Management University.
