A fully autonomous voice-controlled robot powered by AI. Say a command, and the robot uses computer vision to locate and chase the target in real time.
- 🎙️ Wake Word Detection — Say "Hey Robo, track the cat" to activate (no button press needed).
- 🛑 Global Stop Command — Say "Stop" at any time to immediately halt the robot.
- 👁️ Real-time Object Tracking — Powered by YOLOv8n running on a Mac with Apple Silicon (MPS).
- 🏎️ Proportional Pursuit Controller — The robot steers and drives simultaneously, following curved paths to chase moving targets.
- 🔊 Two-Way Voice — The robot speaks back using text-to-speech via the on-board speaker.
- 🌐 ROS 2 Backbone — All inter-device communication runs over ROS 2 via `rosbridge`.
```
┌──────────────────── RASPBERRY PI 5 ─────────────────────┐
│                                                         │
│  pi_camera.py     → publishes /image_raw/compressed     │
│  pi_audio_node.py → publishes /audio_raw                │
│                   → subscribes /robot_voice (speaker)   │
│  arduino_bridge   → subscribes /robot_commands          │
│        │                                                │
│  Arduino Nano ─── Servo Motors (L/R wheels)             │
└─────────────────────────────────────────────────────────┘
                            │  WiFi / ROS 2 rosbridge (port 9090)
┌──────────────────────── MAC ────────────────────────────┐
│                                                         │
│  robot_agent.py                                         │
│   ├── VAD + Whisper (STT) — listens for wake word       │
│   ├── Ollama / gemma3:1b  — extracts target object      │
│   ├── YOLOv8n + OpenCV    — tracks object in frame      │
│   ├── P-Controller        — sends motor commands        │
│   └── macOS TTS (say)     — speaks responses to Pi      │
└─────────────────────────────────────────────────────────┘
```
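All Mac → Pi traffic in the diagram crosses the rosbridge websocket on port 9090, so the Mac side can reach the Pi with any rosbridge client. A minimal sketch using the `roslibpy` client; note the `std_msgs/String` payload format for `/robot_commands` is an assumption here, and the real format is whatever `arduino_bridge.py` expects:

```python
# Sketch of the Mac -> Pi command path over rosbridge (port 9090).
# The topic name /robot_commands comes from the diagram above; using
# std_msgs/String with raw motor-command payloads is an assumption.

def drive_message(command: str) -> dict:
    """Wrap a raw motor command as a std_msgs/String payload."""
    return {"data": command}

def connect_and_publish(pi_ip: str, command: str) -> None:
    """Connect to rosbridge on the Pi and publish one motor command."""
    import roslibpy  # pip install roslibpy

    ros = roslibpy.Ros(host=pi_ip, port=9090)
    ros.run()  # blocks until the websocket connection is up
    topic = roslibpy.Topic(ros, "/robot_commands", "std_msgs/String")
    topic.publish(roslibpy.Message(drive_message(command)))
    ros.terminate()
```

For example, `connect_and_publish("192.168.x.x", "S")` would issue an emergency stop from any machine on the same network.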
```
Autonomous-robot/
│
├── ai_agent/
│   ├── robot_agent.py         # Main AI agent (run on Mac)
│   ├── vision_test.py         # Standalone vision & motor debug script
│   └── requirements.txt
│
├── raspberry_pi/
│   ├── pi_camera.py           # ROS 2 camera node (GStreamer → /image_raw)
│   ├── pi_audio_node.py       # ROS 2 audio node (Mic → /audio_raw, /robot_voice → Speaker)
│   ├── requirements.txt
│   └── ros2_nodes/
│       └── motor_control/     # ROS 2 Python package
│           ├── motor_control/
│           │   └── arduino_bridge.py   # /robot_commands → Serial → Arduino
│           ├── package.xml
│           ├── setup.py
│           └── setup.cfg
│
├── arduino/
│   └── motor_firmware/
│       └── motor_firmware.ino # Arduino Nano servo controller firmware
│
└── assets/
    └── cat_chaser_2.mp4       # Demo video: robot chasing a cat
```
Hardware:
- Raspberry Pi 5
- Raspberry Pi Camera Module (libcamera compatible)
- I2S Microphone Array (e.g. Google Voice HAT, `googlevoicehat-soundcard` overlay)
- I2S Amplifier + Speaker (e.g. MAX98357A)
- Arduino Nano + 2× Continuous Rotation Servos
Software:
- ROS 2 Jazzy (on Pi) + `rosbridge_suite`
- Python 3.11+
- Ollama with `gemma3:1b` model (on Mac)
Open `arduino/motor_firmware/motor_firmware.ino` in the Arduino IDE and upload it to your Nano.
Motor control protocol over serial (115200 baud):
| Command  | Meaning                 |
|---|---|
| `L<val>` | Set left servo (0–180)  |
| `R<val>` | Set right servo (0–180) |
| `S`      | Stop both motors        |
```bash
# Install system dependencies
sudo apt install ros-jazzy-rosbridge-suite python3-pyaudio

# Install Python dependencies
pip install -r raspberry_pi/requirements.txt

# Enable the I2S sound card: add to /boot/firmware/config.txt
# dtoverlay=googlevoicehat-soundcard
```
```bash
# Start rosbridge
ros2 launch rosbridge_server rosbridge_websocket_launch.xml

# Start camera node
python3 raspberry_pi/pi_camera.py

# Start audio node
python3 raspberry_pi/pi_audio_node.py

# Start motor bridge (in your ROS 2 workspace)
ros2 run motor_control arduino_bridge
```

```bash
# Create a virtual environment
python3 -m venv venv && source venv/bin/activate

# Install dependencies
pip install -r ai_agent/requirements.txt

# Pull the LLM model
ollama pull gemma3:1b

# Set the Pi's IP address in robot_agent.py
# PI_IP = '192.168.x.x'

# Run!
python ai_agent/robot_agent.py
```

| Say... | Effect |
|---|---|
| "Hey Robo, track the cat" | Robot starts chasing the cat |
| "Hey Robo, track the person" | Robot starts following a person |
| "Stop" | Robot immediately halts (always active) |
Trackable objects: person, cat, dog, bottle, cup, backpack, laptop, phone, ball, and more.
| Parameter | Default | Description |
|---|---|---|
| `PI_IP` | `'192.168.x.x'` | Raspberry Pi's IP address |
| `OLLAMA_MODEL` | `'gemma3:1b'` | Ollama model for command parsing |
| `ENERGY_THRESH` | `0.02` | Microphone sensitivity for VAD |
| `FAR_PX` | `800` | Bounding box width (px) to start moving forward |
| `CLOSE_PX` | `1500` | Bounding box width (px) to stop/back up |
| `CONFIDENCE` | `0.4` | YOLO detection confidence threshold |
The robot tracking and chasing a cat around the room: