Try it now: https://real-time-web-rtc-vlm-multi-object-pi.vercel.app/
An intelligent real-time video streaming and object detection system that combines WebRTC technology with Vision Language Models (VLM) for advanced multi-object detection and analysis.
- Laptop (Viewer) creates a session and displays a QR code
- Phone scans the QR code and grants camera access
- WebRTC establishes peer-to-peer connection for real-time streaming
- AI Detection Service analyzes video frames for object detection
- Real-time overlay displays detected objects with bounding boxes and labels
- Live analytics provides detection statistics and performance metrics
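A rough sketch of what the viewer side of this flow can look like, assuming a Socket.IO signaling channel. The event names (create-session, offer, answer, ice-candidate) are illustrative placeholders, not necessarily the ones used in backend/server.js:

```typescript
// Viewer-side session flow sketch (event names are illustrative, not the actual protocol).
import { io } from "socket.io-client";

const signaling = io(process.env.NEXT_PUBLIC_SIGNALING_SERVER_URL ?? "http://localhost:3001");
const pc = new RTCPeerConnection({ iceServers: [{ urls: "stun:stun.l.google.com:19302" }] });

// 1. Ask the signaling server for a session, then render the session ID as a QR code.
signaling.emit("create-session");
signaling.on("session-created", ({ sessionId }: { sessionId: string }) => {
  console.log("Render QR code for session", sessionId);
});

// 2. Exchange SDP and ICE candidates with the phone through the signaling server.
pc.onicecandidate = (e) => e.candidate && signaling.emit("ice-candidate", e.candidate);
signaling.on("offer", async (offer: RTCSessionDescriptionInit) => {
  await pc.setRemoteDescription(offer);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  signaling.emit("answer", answer);
});

// 3. Attach the phone's camera stream to the viewer's <video> element once it arrives.
pc.ontrack = (e) => {
  (document.querySelector("video") as HTMLVideoElement).srcObject = e.streams[0];
};
```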
- Frontend: Next.js (TypeScript) - Real-time video streaming interface
- Backend: Node.js with Socket.IO - WebRTC signaling server
- Detection Service: Python FastAPI - YOLOv5 object detection engine
- WebRTC: Peer-to-peer video streaming with frame extraction
- AI Models: YOLOv5 for real-time multi-object detection
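As a sketch of the frame-extraction path: the viewer can draw the remote video element onto a canvas, encode a JPEG, and post it to the detection service. The /detect endpoint name and the response shape below are assumptions for illustration; the real client code lives in frontend/src/utils/.

```typescript
// Sketch: capture one frame from the remote video and send it to the detection service.
// The /detect endpoint and the response shape are assumptions, not the documented API.
interface Detection {
  label: string;
  confidence: number;
  box: [number, number, number, number]; // x, y, width, height in pixels (assumed)
}

export async function detectFrame(video: HTMLVideoElement): Promise<Detection[]> {
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);

  // Encode the frame as JPEG to keep the request small.
  const blob: Blob = await new Promise((resolve) =>
    canvas.toBlob((b) => resolve(b as Blob), "image/jpeg", 0.7)
  );

  const form = new FormData();
  form.append("frame", blob, "frame.jpg");

  const base = process.env.NEXT_PUBLIC_DETECTION_SERVER_URL ?? "http://localhost:5000";
  const res = await fetch(`${base}/detect`, { method: "POST", body: form });
  return res.json();
}
```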
Backend:
cd backend
npm install
npm run dev

Detection service:
cd detection-service
pip install -r requirements.txt
python detection_server.py

Frontend:
cd frontend
npm install
npm run dev
Visit http://localhost:3000 to start intelligent video streaming with object detection!
Real-time-WebRTC-VLM-Multi-Object-Detection/
├── backend/                 # Node.js WebRTC signaling server
│   ├── server.js            # Main signaling server
│   ├── package.json         # Backend dependencies
│   └── test-backend.js      # Server testing utilities
├── frontend/                # Next.js TypeScript application
│   ├── src/
│   │   ├── app/             # Next.js App Router pages
│   │   ├── components/      # React components for detection overlay
│   │   └── utils/           # WebRTC and detection client utilities
│   ├── package.json         # Frontend dependencies
│   └── next.config.js       # Next.js configuration
├── detection-service/       # Python AI detection service
│   ├── detection_server.py  # FastAPI detection server
│   ├── yolo_detector.py     # YOLOv5 detection engine
│   ├── requirements.txt     # Python dependencies
│   ├── yolov5n.pt           # Pre-trained YOLO model
│   └── test_detector.py     # Detection testing utilities
└── README.md                # Project documentation
Backend (signaling server):
- Connect your GitHub repo to your preferred platform (Render/Heroku)
- Create a new Web Service
- Set the build command: npm install
- Set the start command: npm start
- Add the environment variable: FRONTEND_URL=https://your-frontend-url.com (the deployed frontend's URL)
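FRONTEND_URL is presumably what the signaling server uses to allow cross-origin Socket.IO connections from the deployed frontend. A minimal sketch of that pattern (not the actual server.js):

```typescript
// Minimal signaling-server sketch: FRONTEND_URL whitelists the deployed frontend for CORS.
import { createServer } from "http";
import { Server } from "socket.io";

const httpServer = createServer();
const io = new Server(httpServer, {
  cors: { origin: process.env.FRONTEND_URL ?? "http://localhost:3000" },
});

io.on("connection", (socket) => {
  // Relay signaling messages between peers in the same session (event names are illustrative).
  socket.on("join-session", (sessionId: string) => socket.join(sessionId));
  socket.on("offer", (sessionId: string, offer: unknown) => socket.to(sessionId).emit("offer", offer));
});

httpServer.listen(Number(process.env.PORT ?? 3001));
```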
Frontend:
- Connect your GitHub repo to Vercel/Netlify
- Set the root directory to frontend/
- Add environment variables:
NEXT_PUBLIC_SIGNALING_SERVER_URL=https://your-backend-url.com
NEXT_PUBLIC_DETECTION_SERVER_URL=https://your-detection-service-url.com
- Deploy!
Detection service:
- Deploy to a cloud platform that supports Python (Google Cloud Run/AWS Lambda/Heroku)
- Install dependencies: pip install -r requirements.txt
- Start the service: python detection_server.py
- Ensure the service is accessible via HTTP/HTTPS
Frontend:
NEXT_PUBLIC_SIGNALING_SERVER_URL=http://localhost:3001
NEXT_PUBLIC_DETECTION_SERVER_URL=http://localhost:5000

Backend:
PORT=3001
FRONTEND_URL=http://localhost:3000
NODE_ENV=development

Detection service:
PORT=5000
MODEL_PATH=./yolov5n.pt
DETECTION_THRESHOLD=0.5
MAX_DETECTIONS=100
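DETECTION_THRESHOLD and MAX_DETECTIONS presumably control the minimum confidence score and the cap on returned objects. As a sketch of their effect, the same filtering can be expressed client-side (field names are assumed):

```typescript
// Sketch of what DETECTION_THRESHOLD and MAX_DETECTIONS mean for a result set.
interface Detection {
  label: string;
  confidence: number; // 0.0 - 1.0
}

function filterDetections(all: Detection[], threshold = 0.5, maxDetections = 100): Detection[] {
  return all
    .filter((d) => d.confidence >= threshold)    // drop low-confidence boxes
    .sort((a, b) => b.confidence - a.confidence) // keep the most confident first
    .slice(0, maxDetections);                    // cap the number of results
}
```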
- Create Session: Visit the web app and click "Start New Detection Session"
- Scan QR Code: Use your phone to scan the displayed QR code
- Grant Permissions: Allow camera and microphone access on your phone
- Enable AI Detection: Toggle object detection to start AI analysis
- Start Streaming: Watch live video with real-time object detection overlays
- Analyze Results: View detection statistics, confidence scores, and FPS metrics
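The overlay step is essentially drawing boxes and labels onto a transparent canvas positioned over the video. A sketch of that drawing routine, assuming detections arrive as pixel-space boxes with labels and confidence scores:

```typescript
// Sketch: draw bounding boxes and labels over the video on a transparent canvas.
interface Detection {
  label: string;
  confidence: number;
  box: [number, number, number, number]; // x, y, width, height in pixels (assumed)
}

function drawOverlay(canvas: HTMLCanvasElement, detections: Detection[]): void {
  const ctx = canvas.getContext("2d")!;
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  ctx.lineWidth = 2;
  ctx.strokeStyle = "lime";
  ctx.fillStyle = "lime";
  ctx.font = "14px sans-serif";

  for (const { label, confidence, box: [x, y, w, h] } of detections) {
    ctx.strokeRect(x, y, w, h);
    ctx.fillText(`${label} ${(confidence * 100).toFixed(0)}%`, x + 4, y - 4);
  }
}
```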
- ✅ Real-time Object Detection - YOLOv5-powered multi-object recognition
- ✅ Live Video Streaming - WebRTC peer-to-peer video transmission
- ✅ Detection Overlays - Bounding boxes with confidence scores
- ✅ QR Code Session Joining - Easy mobile device connection
- ✅ Performance Metrics - Real-time FPS and detection statistics
- ✅ Mobile-Optimized Interface - Responsive design for all devices
- ✅ Camera Switching - Front/back camera toggle support (see the sketch below)
- ✅ Automatic Reconnection - Robust connection handling
- ✅ Session Management - Secure temporary session handling
- ✅ Multi-Object Support - Detect multiple objects simultaneously
- ✅ Configurable Thresholds - Adjustable detection confidence levels
- ✅ Export Detection Results - Save detection data and statistics
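Camera switching on the phone is typically done by requesting a new track with the opposite facingMode and swapping it into the existing connection with RTCRtpSender.replaceTrack, which avoids renegotiation. A sketch of that approach (not necessarily the exact implementation used here):

```typescript
// Sketch: switch between front and back cameras without renegotiating the connection.
async function switchCamera(pc: RTCPeerConnection, useBackCamera: boolean): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: useBackCamera ? "environment" : "user" },
  });
  const newTrack = stream.getVideoTracks()[0];

  // Replace the outgoing video track in place; the viewer keeps receiving frames.
  const sender = pc.getSenders().find((s) => s.track?.kind === "video");
  await sender?.replaceTrack(newTrack);
}
```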
- HTTPS Required: Camera access requires secure connection (except localhost)
- Peer-to-Peer: Video streams directly between devices (not through server)
- AI Processing: Detection runs on dedicated service, no data retention
- Temporary Sessions: Sessions are automatically cleaned up
- No Recording: No video data is stored on servers
- Secure Detection: Object detection data is processed in real-time only
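One plausible shape for the temporary-session cleanup on the signaling server, assuming sessions are kept in memory with a TTL (the real logic lives in backend/server.js and may differ):

```typescript
// Sketch: in-memory sessions purged after a TTL so nothing outlives its usefulness.
const SESSION_TTL_MS = 10 * 60 * 1000; // assumed 10-minute lifetime
const sessions = new Map<string, { createdAt: number }>();

function createSession(id: string): void {
  sessions.set(id, { createdAt: Date.now() });
}

// Periodically drop sessions older than the TTL.
setInterval(() => {
  const now = Date.now();
  for (const [id, s] of sessions) {
    if (now - s.createdAt > SESSION_TTL_MS) sessions.delete(id);
  }
}, 60 * 1000);
```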
- Node.js 18+
- Python 3.8+
- Modern browser with WebRTC support
- HTTPS for production (camera access requirement)
- Start the detection service: cd detection-service && python detection_server.py
- Start the backend: cd backend && npm run dev
- Start the frontend: cd frontend && npm run dev
- Visit http://localhost:3000
- ✅ Chrome (Desktop & Mobile) - Full WebRTC + Detection support
- ✅ Firefox (Desktop & Mobile) - Full WebRTC + Detection support
- ✅ Safari (Desktop & Mobile) - WebRTC support with detection
- ✅ Edge (Desktop) - Full feature support
- ❌ Internet Explorer (not supported)
Object Detection not working?
- Ensure detection service is running on port 5000
- Check detection service health endpoint
- Verify model file (yolov5n.pt) is present
- Check detection service logs for errors
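A quick way to probe the detection service from the browser console or a small script, assuming it exposes a /health endpoint (the actual route may differ; see detection_server.py):

```typescript
// Sketch: ping the detection service's health endpoint (route name assumed).
async function checkDetectionService(base = "http://localhost:5000"): Promise<void> {
  try {
    const res = await fetch(`${base}/health`);
    console.log("Detection service:", res.ok ? "healthy" : `HTTP ${res.status}`);
  } catch (err) {
    console.error("Detection service unreachable:", err);
  }
}

checkDetectionService();
```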
Camera not working?
- Ensure HTTPS connection in production
- Check browser permissions
- Try refreshing the page
Connection issues?
- Check network connectivity
- Verify environment variables are set correctly
- Check browser console for WebRTC errors
QR code not scanning?
- Ensure good lighting conditions
- Try manual URL entry
- Check if QR scanner app is working properly
Poor detection performance?
- Adjust detection threshold settings
- Check lighting conditions
- Ensure stable network connection
- Monitor detection service CPU/memory usage
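To tell whether the bottleneck is the detection service or the network, it can help to time each detection round-trip on the client. A small sketch, assuming a detectFrame helper like the one sketched earlier:

```typescript
// Sketch: rolling average of detection round-trip time, reported as effective FPS.
const samples: number[] = [];

async function timedDetect(
  video: HTMLVideoElement,
  detectFrame: (v: HTMLVideoElement) => Promise<unknown>
) {
  const start = performance.now();
  const result = await detectFrame(video);
  samples.push(performance.now() - start);
  if (samples.length > 30) samples.shift(); // keep the last 30 samples

  const avgMs = samples.reduce((a, b) => a + b, 0) / samples.length;
  console.log(`~${(1000 / avgMs).toFixed(1)} detection FPS (${avgMs.toFixed(0)} ms/frame)`);
  return result;
}
```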
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is open source and available under the MIT License.