# Spatial Detector

A Python package for 3D object detection and spatial mapping using a webcam, combining YOLO object detection with monocular depth estimation.
## Contents

- Overview
- Features
- Installation
- Quick Start
- Usage
- Architecture
- Calibration
- API Reference
- Development
- Troubleshooting
- License
## Overview

Spatial Detector combines state-of-the-art object detection with monocular depth estimation to create a 3D representation of the objects in your environment, using only a standard webcam or an iPhone camera. It provides a user-friendly web interface, a desktop command-line application, and a Python API for integration into your own applications.
The project uses YOLOv8 for object detection and MiDaS for depth estimation, with optimizations for both Apple Silicon (M1/M2) through Metal Performance Shaders and NVIDIA GPUs through CUDA.
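For reference, the device auto-detection described above typically follows the standard PyTorch pattern. A minimal sketch using stock torch availability checks, not the package's actual code:

```python
import torch

def pick_device() -> str:
    """Prefer Apple MPS, then CUDA, then fall back to CPU."""
    if torch.backends.mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"

print(f"Using device: {pick_device()}")
```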
## Features

- Real-time Object Detection: Identify and classify objects using YOLOv8
- Monocular Depth Estimation: Calculate distances using neural network-based depth sensing
- 3D Spatial Mapping: Project detected objects into 3D space
- Real-time Visualization: See depth maps and object positions interactively
- Top-down View: Get a bird's eye perspective of your environment
- User-Friendly Web Interface: Modern, responsive web app for easy use
- iPhone Camera Support: Stream directly from iPhone cameras
- Interactive 3D Visualization: Three.js-based 3D view with controls
- Setup Wizard: Guided setup process for new users
- Desktop Application: Traditional OpenCV-based UI
- Hardware Acceleration: Optimized for:
  - Apple Silicon (M1/M2) chips using Metal Performance Shaders (MPS)
  - NVIDIA GPUs using CUDA
  - Fallback to CPU for compatibility
- Depth Calibration: Tools for accurate real-world measurements
- Interactive Controls: Toggle visualizations and adjust settings in real-time
- Persistence and Tracking: Follow objects as they move through your space
- Spatial Mapping: Create top-down views of your environment with detected objects
- WebSocket/Socket.IO: Real-time communication between client and server
- QR Code Pairing: Simple device connection for mobile devices
### Recent Improvements

- Improved Performance: Refactored web application for better performance and stability
- Enhanced UI: Modularized JavaScript code for improved maintainability
- Fixed Socket.IO Issues: Stabilized WebSocket connections and error handling
- Better Bounding Box Rendering: Fixed rendering issues and improved visual quality
- 3D Map Display Improvements: Enhanced 3D visualization with better object tracking
- Error Handling: Added comprehensive error handling for depth estimation
## Installation

### Requirements

- Python 3.7 or later
- PyTorch 1.13 or later
- OpenCV 4.5 or later
### From PyPI

```bash
# Install the base package with standard pip
pip install spatial-detector

# Or use uv for faster installation
uv pip install spatial-detector

# Install with web UI support
pip install "spatial-detector[web]"
uv pip install "spatial-detector[web]"
```
### From Source

```bash
# Clone the repository
git clone https://github.com/oldhero5/spatial-detector.git
cd spatial-detector

# Install the base package
pip install -e .
uv pip install -e .

# Install with web UI support
pip install -e ".[web]"
uv pip install -e ".[web]"

# Install with all development dependencies
pip install -e ".[dev,web]"
uv pip install -e ".[dev,web]"
```
## Quick Start

After installation, run one of the applications with default settings:

```bash
# Start the desktop webcam application
spatial-detector

# Start the web interface (including iPhone support)
spatial-detector-web
```
The desktop application will:
- Access your default webcam
- Load the lightweight YOLOv8n model
- Initialize the MiDaS depth estimator
- Show both the main detection view and a top-down spatial map
The web interface will:
- Start a web server on http://localhost:5011
- Provide a user-friendly interface accessible from any browser
- Support iPhone camera streaming via QR code pairing
- Offer enhanced 3D visualization with interactive controls
## Usage

### Command-Line Interface

The CLI provides a flexible interface for using Spatial Detector with various options:

```bash
# Basic usage with the default webcam
spatial-detector

# Specify a different webcam (by index)
spatial-detector --camera 1

# Use a different YOLO model
spatial-detector --yolo-model yolov8s.pt

# Specify resolution
spatial-detector --width 1280 --height 720

# Force a specific compute device
spatial-detector --device cuda  # For NVIDIA GPU
spatial-detector --device mps   # For Apple Silicon
spatial-detector --device cpu   # Force CPU processing

# Record video output
spatial-detector --record output.mp4

# Load calibration files
spatial-detector --calibration camera_calibration.json --depth-calibration depth_calibration.json

# Customize room dimensions for mapping
spatial-detector --room-width 10.0 --room-depth 8.0
```
### Web Interface Options

The web interface can be customized with these options:

```bash
# Start the web server with default settings (localhost:5011)
spatial-detector-web

# Specify host and port
spatial-detector-web --host 0.0.0.0 --port 8080

# Enable debug mode for development
spatial-detector-web --debug

# Specify custom template and static file locations
spatial-detector-web --templates /path/to/templates --static /path/to/static
```

The web interface is accessible at http://localhost:5011 by default (or at the host and port you specify).
### Web User Interface

The web-based user interface offers a more approachable way to use Spatial Detector:
- Setup Wizard: Guided onboarding for new users
- Device Selection: Support for webcams and iPhone cameras
- Real-time Visualization: Live detection results with depth information
- Interactive 3D Map: Three.js-based spatial visualization
- Project Management: Save and load project states
- Calibration Tools: User-friendly calibration workflow
- Mobile Responsiveness: Works on desktop and mobile browsers
### Using an iPhone Camera

To use your iPhone as a camera source:

1. Start the web server with `spatial-detector-web`
2. Open the web interface in your browser
3. Click "Connect to iPhone" or select iPhone in the setup wizard
4. Scan the generated QR code with your iPhone camera
5. Follow the prompts on your iPhone to allow camera access
6. Start using your iPhone as a high-quality camera source
The iPhone integration uses web technologies for low-latency, high-quality video streaming.
### 3D Visualization

The web interface includes an advanced 3D visualization system:
- Interactive Controls: Rotate, pan, and zoom the 3D view
- Object Tracking: Track detected objects in 3D space
- Custom Rendering: Configurable colors and visualization styles
- Grid System: Reference grid for scale and orientation
- 3D Export: Export the spatial map for use in other applications
### CLI Options

Full list of CLI options:

| Option | Description | Default |
|---|---|---|
| `--camera` | Camera index | 0 |
| `--yolo-model` | YOLO model path or name | yolov8n.pt |
| `--confidence` | Detection confidence threshold | 0.25 |
| `--device` | Computation device (mps, cuda, cpu, auto) | auto |
| `--calibration` | Camera calibration file | None |
| `--depth-calibration` | Depth calibration file | None |
| `--record` | Path to save video recording | None |
| `--width` | Camera width | 640 |
| `--height` | Camera height | 480 |
| `--room-width` | Room width in meters | 5.0 |
| `--room-depth` | Room depth in meters | 5.0 |
### Python API

For programmatic use, Spatial Detector can be integrated into your own Python applications:

```python
import cv2
import numpy as np

from spatial_detector.detection import YOLODetector
from spatial_detector.depth import MiDaSDepthEstimator
from spatial_detector.projection import PinholeCamera
from spatial_detector.visualization import Visualizer
from spatial_detector.mapping import SpatialMap, DepthCalibrator

# Initialize components
detector = YOLODetector(model_path="yolov8n.pt", confidence=0.25)
depth_estimator = MiDaSDepthEstimator()
camera = PinholeCamera()
visualizer = Visualizer(show_depth=True, show_labels=True)
spatial_map = SpatialMap(room_dimensions=(5.0, 5.0))
depth_calibrator = DepthCalibrator()

# Open webcam
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Detect objects
    detections = detector.detect(frame)

    # Estimate depth
    depth_map, depth_norm = depth_estimator.estimate_depth(frame)

    # Convert the full map to metric depth (optional; the loop below
    # converts per-point)
    metric_depth_map = depth_calibrator.depth_to_meters(depth_norm)

    # Project detections into 3D space
    positions_3d = []
    for detection in detections:
        center_x, center_y = detection['center']
        normalized_depth = depth_estimator.get_depth_at_point(depth_norm, center_x, center_y)
        if normalized_depth is not None:
            metric_depth = depth_calibrator.depth_to_meters(normalized_depth)
            position_3d = camera.pixel_to_3d(center_x, center_y, metric_depth, normalized_depth=False)
            positions_3d.append(position_3d)
        else:
            positions_3d.append((0, 0, 0))

    # Update the spatial map
    spatial_map.update(detections, positions_3d)

    # Visualize results
    annotated_frame = visualizer.draw_detections(frame, detections, positions_3d)
    map_viz = spatial_map.get_topdown_view(width=400, height=400)

    # Display
    cv2.imshow('Detection', annotated_frame)
    cv2.imshow('Map', map_viz)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```
### Keyboard Controls

When using the CLI application, the following keyboard controls are available:

| Key | Function |
|---|---|
| `q` | Quit application |
| `d` | Toggle depth visualization |
| `l` | Toggle object labels |
| `m` | Toggle map mode (top-down/3D) |
| `c` | Enter/exit calibration mode |
| `+` / `-` | Adjust calibration distance (in calibration mode) |
| `space` | Set calibration point (in calibration mode) |
| `s` | Save depth calibration |
### Web Interface Controls

The web interface provides the following interactive controls:
Camera Controls:
- Start/Stop Camera: Toggle webcam capture
- Select Camera: Choose from available webcams
- Connect to iPhone: Generate QR code for iPhone connection
Visualization Controls:
- Show/Hide Labels: Toggle object labels
- Show/Hide Depth Map: Toggle depth visualization
- Show/Hide Grid: Toggle 3D grid
Map Controls:
- Show/Hide Map: Toggle 3D map view
- Reset View: Return to default camera position
- Export 3D Model: Export the spatial map (when available)
Calibration Controls:
- Start Calibration: Enter calibration mode
- Save Calibration: Save current calibration
Project Controls:
- New Project: Start a new spatial mapping project
- Save Project: Save current project state
- Load Project: Load a previously saved project
## Architecture

Spatial Detector follows a modular design with these core components:

- YOLODetector: Handles 2D object detection using YOLOv8
  - Identifies objects in the scene
  - Provides bounding boxes, class labels, and confidence scores
- MiDaSDepthEstimator: Provides monocular depth estimation
  - Generates a depth map from a single RGB image
  - Uses the MiDaS deep learning model
- PinholeCamera: Implements the camera projection model
  - Converts 2D pixel coordinates to 3D space
  - Handles camera calibration
- DepthCalibrator: Calibrates depth values to real-world measurements
  - Converts normalized depth to metric distances
  - Provides tools for distance calibration
- SpatialMap: Creates and maintains a spatial representation
  - Tracks objects in 3D space
  - Provides top-down visualization
- Visualizer: Handles visualization and UI elements
  - Draws bounding boxes and labels
  - Renders depth maps and other visualizations
- Web Interface: Provides a user-friendly web-based UI
  - Supports iPhone camera streaming
  - Interactive 3D visualization with Three.js
  - Project management and calibration wizard
- CLI Application: Ties everything together in an interactive interface
### CLI Processing Pipeline

1. Image acquisition from the webcam
2. Parallel processing:
   - Object detection (YOLOv8)
   - Depth estimation (MiDaS)
3. Depth calibration and conversion to metric units
4. 3D projection of detected objects
5. Spatial mapping and object tracking
6. Visualization and user interaction
### Web Processing Pipeline

1. Image acquisition from the webcam or iPhone camera
2. Image streaming via WebSocket/WebRTC
3. Server-side processing:
   - Object detection (YOLOv8)
   - Depth estimation (MiDaS)
   - 3D projection and mapping
4. Real-time results pushed to the client
5. Client-side 3D visualization with Three.js
6. Interactive controls and project management
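As a rough illustration of this exchange, the sketch below uses the python-socketio client. The event names ("frame", "detections") and payload shapes are assumptions for illustration, not Spatial Detector's documented protocol:

```python
import base64

import socketio  # pip install "python-socketio[client]"

sio = socketio.Client()

@sio.on("detections")  # hypothetical event name
def on_detections(data):
    # The server pushes detection results with 3D positions in real time
    for det in data.get("objects", []):
        print(det.get("class_name"), det.get("position_3d"))

sio.connect("http://localhost:5011")

# Stream one JPEG-encoded frame to the server (hypothetical event/payload)
with open("frame.jpg", "rb") as f:
    sio.emit("frame", {"image": base64.b64encode(f.read()).decode()})

sio.wait()
```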
## Calibration

### Depth Calibration

For accurate real-world measurements, Spatial Detector includes a calibration workflow:

1. Enter calibration mode by pressing `c` or clicking the "Start Calibration" button in the web interface
2. Place an object at a known distance from the camera (e.g., 1 meter)
3. Use the `+`/`-` keys to adjust the displayed distance until it matches the actual distance
4. Align the object with the center crosshair
5. Press `space` to set the calibration point
6. Press `s` or click "Save Calibration" to save the calibration to a file

Calibration is saved to `depth_calibration.json` by default, or to the file specified with `--depth-calibration`.
### Camera Calibration

For even more accurate 3D positioning, you can provide a camera calibration file with intrinsic parameters:

1. Generate a calibration file using OpenCV's camera calibration tools
2. Save it as a JSON file with the following format:

   ```json
   { "fx": 1000.0, "fy": 1000.0, "cx": 640.0, "cy": 360.0 }
   ```

3. Provide the file path using the `--calibration` option
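For reference, here is a condensed sketch of producing that JSON with OpenCV's chessboard calibration. The board size and image paths are placeholders for your own capture setup, not part of Spatial Detector:

```python
import glob
import json

import cv2
import numpy as np

pattern = (9, 6)  # inner corners of your chessboard (placeholder)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points, size = [], [], None
for path in glob.glob("calib_images/*.jpg"):  # placeholder path
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)
        size = gray.shape[::-1]

# Fit the intrinsics and write them in the format shown above
_, K, _, _, _ = cv2.calibrateCamera(obj_points, img_points, size, None, None)
with open("camera_calibration.json", "w") as f:
    json.dump({"fx": float(K[0, 0]), "fy": float(K[1, 1]),
               "cx": float(K[0, 2]), "cy": float(K[1, 2])}, f)
```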
## API Reference

### YOLODetector

```python
from spatial_detector.detection import YOLODetector

detector = YOLODetector(
    model_path='yolov8n.pt',  # Model name or path
    confidence=0.25,          # Detection threshold
    device=None               # Computation device (None for auto-detect)
)

# Detect objects in a frame
detections = detector.detect(frame)
```

Return format for `detect()`:

```python
[
    {
        'bbox': (x1, y1, x2, y2),        # Bounding box coordinates
        'center': (center_x, center_y),  # Center point
        'class_id': 0,                   # Numeric class ID
        'class_name': 'person',          # Class name
        'confidence': 0.92               # Detection confidence
    },
    # Additional detections...
]
```
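The returned list is plain Python data, so you can filter it like any other sequence of dicts. For example, keeping only confident person detections:

```python
# Keep confident "person" detections and report where they are
people = [d for d in detections
          if d['class_name'] == 'person' and d['confidence'] > 0.5]
for person in people:
    print(f"person at center {person['center']} "
          f"(bbox {person['bbox']}, confidence {person['confidence']:.0%})")
```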
### MiDaSDepthEstimator

```python
from spatial_detector.depth import MiDaSDepthEstimator

depth_estimator = MiDaSDepthEstimator(
    model_type="MiDaS_small",  # Model type
    device=None                # Computation device
)

# Estimate depth from an RGB image
depth_map, depth_norm = depth_estimator.estimate_depth(frame)

# Get depth at a specific pixel
depth_value = depth_estimator.get_depth_at_point(depth_map, x, y)
```
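If you want to display the depth map yourself rather than through the Visualizer, a common approach is to colorize it with OpenCV. This sketch assumes `depth_norm` is a float array scaled to [0, 1], which the name suggests but the API does not guarantee:

```python
import cv2
import numpy as np

# Colorize a normalized depth map for display (assumes values in [0, 1])
depth_u8 = (np.clip(depth_norm, 0.0, 1.0) * 255).astype(np.uint8)
depth_color = cv2.applyColorMap(depth_u8, cv2.COLORMAP_MAGMA)

cv2.imshow("Depth", depth_color)
cv2.waitKey(0)
```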
### PinholeCamera

```python
from spatial_detector.projection import PinholeCamera

camera = PinholeCamera(
    focal_length=1000,      # Focal length in pixels
    principal_point=None,   # Principal point (defaults to image center)
    image_size=(1280, 720)  # Image dimensions
)

# Project a pixel into 3D space
x, y, z = camera.pixel_to_3d(pixel_x, pixel_y, depth)

# Load camera calibration
camera.load_calibration("camera_calibration.json")
```
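Under the standard pinhole model, `pixel_to_3d` presumably back-projects a pixel at metric depth Z using the textbook equations X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy. A minimal sketch, not the package's exact implementation:

```python
def pixel_to_3d(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at metric depth into camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return x, y, depth  # camera-frame (X, Y, Z)
```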
### DepthCalibrator

```python
from spatial_detector.mapping import DepthCalibrator

calibrator = DepthCalibrator(calibration_file=None)

# Calibrate with a known distance
calibrator.calibrate_with_known_distance(depth_value, real_distance)

# Convert normalized depth to meters
metric_depth = calibrator.depth_to_meters(normalized_depth)

# Save calibration
calibrator.save_calibration("depth_calibration.json")
```
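When calibrating against a single reference point, the conversion is usually just a fitted scale factor. A minimal sketch under that assumption (the package's actual model may be more sophisticated):

```python
# Sketch of single-point linear depth calibration. Assumes depth_to_meters
# applies one constant scale fitted from a known distance; this is an
# assumption, not Spatial Detector's documented behavior.
class LinearDepthCalibrator:
    def __init__(self):
        self.scale = 1.0  # meters per unit of normalized depth

    def calibrate_with_known_distance(self, depth_value, real_distance):
        # A single reference measurement fixes the scale
        self.scale = real_distance / depth_value

    def depth_to_meters(self, normalized_depth):
        return self.scale * normalized_depth
```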
### SpatialMap

```python
from spatial_detector.mapping import SpatialMap

spatial_map = SpatialMap(room_dimensions=(5.0, 5.0))

# Update the map with new detections
spatial_map.update(detections, positions_3d)

# Get a top-down visualization
map_visualization = spatial_map.get_topdown_view(width=400, height=400)
```
### Visualizer

```python
from spatial_detector.visualization import Visualizer

visualizer = Visualizer(
    show_depth=True,     # Show depth visualization
    show_labels=True,    # Show labels and coordinates
    depth_map_size=0.25  # Size of the depth map visualization
)

# Draw detections on a frame
annotated_frame = visualizer.draw_detections(frame, detections, positions_3d)

# Add depth visualization
frame_with_depth = visualizer.add_depth_visualization(frame, depth_normalized)
```
## Development

1. Clone the repository:

   ```bash
   git clone https://github.com/oldhero5/spatial-detector.git
   cd spatial-detector
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install development dependencies:

   ```bash
   uv pip install -e ".[dev]"  # or: pip install -e ".[dev]"
   ```
### Testing

Run tests with pytest:

```bash
pytest
```

Run tests with coverage:

```bash
pytest --cov=spatial_detector tests/
```
### Versioning and Releases

Spatial Detector uses semantic versioning and provides a version bumping script:

```bash
# Bump patch version (e.g., 0.1.0 -> 0.1.1)
python version_bump.py patch

# Bump minor version (e.g., 0.1.0 -> 0.2.0)
python version_bump.py minor

# Bump major version (e.g., 0.1.0 -> 1.0.0)
python version_bump.py major

# Edit release notes during the version bump
python version_bump.py patch --notes
```
The script will:
- Update the version in pyproject.toml and `__init__.py`
- Add a new section to CHANGELOG.md
- Create RELEASE_NOTES.md with a template
- Create a git tag for the new version
To complete the release, push the commit and the new tag (e.g., v0.1.1):

```bash
git push && git push origin v0.1.1
```
This will trigger GitHub Actions to:
- Build the package
- Deploy to PyPI
- Create a GitHub Release
For detailed release management instructions, see RELEASE_GUIDE.md.
## Troubleshooting

### Installation Issues

- PyTorch installation fails: Make sure you're installing the correct version for your system. Visit PyTorch's installation page for system-specific instructions.
- CUDA errors: If using an NVIDIA GPU, ensure your CUDA drivers are up to date.
- Web package installation fails: If you see errors related to "pywebrtc not found", use `uv pip install -e ".[web]"` with the latest version, which removes this dependency.
- Missing modules: If you encounter "ImportError: cannot import name 'SpatialMap'" or similar, ensure you're using the latest version, which includes all required modules.
### Runtime Issues

- Low FPS: Try a smaller YOLO model (e.g., yolov8n.pt instead of yolov8m.pt) or reduce the camera resolution.
- "CUDA out of memory": Reduce the batch size or switch to a smaller model.
- Inaccurate depth: Perform depth calibration for your specific camera and environment.
- Camera not found: Check your camera index (`--camera` option) and permissions.
- "NoneType object has no attribute pixel_to_3d": The camera model was not initialized properly. Restart the application or switch to a different camera source.
- No objects appear in the 3D map: Ensure depth values are valid and that objects are within detection range. Calibrating depth can improve accuracy.
### Web Interface Issues

- WebSocket errors: If you see WebSocket connection issues in the browser console, check your network firewall settings or try a different browser. The latest version includes improved error handling for Socket.IO connections.
- Bounding boxes not showing: This may be a CSS rendering issue. Try refreshing the page or adjusting the browser zoom level. Recent updates have improved bounding box rendering stability.
- Detection processor errors: "Cannot read properties of undefined" errors were fixed in v0.1.1 with better null checking and error handling.
- 3D map not updating: Ensure position_3d values are valid (not None, NaN, or (0, 0, 0)). The latest version includes better validation of 3D positions.
- Model caching: The latest version caches models to improve startup time and reduce memory usage across repeated initializations.
### Performance Tips

- For Apple Silicon, use the MPS backend (`--device mps`).
- For NVIDIA GPUs, ensure CUDA is properly installed (`--device cuda`).
- Use a YOLO model size appropriate for your hardware:
  - Low-end: yolov8n.pt
  - Mid-range: yolov8s.pt
  - High-end: yolov8m.pt or yolov8l.pt
- Lower the camera resolution for faster processing: `--width 640 --height 480`
- WebSocket optimization: The latest version includes optimized WebSocket communication to reduce latency and improve real-time performance.
- JavaScript performance: v0.1.1 includes modularized JavaScript code that improves browser performance and reduces memory usage in the web interface.
- Connection stability: The improved connection manager handles reconnections more gracefully, reducing dropped frames and improving overall stability.
## License

Spatial Detector is released under the MIT License. See LICENSE for details.
Copyright © 2025 Your Name. All rights reserved.