nipunarora8/Autonomous-robot

🤖 Autonomous Robot — Voice + Vision + ROS 2

A fully autonomous voice-controlled robot powered by AI. Say a command, and the robot uses computer vision to locate and chase the target in real time.

✨ Features

  • 🎙️ Wake Word Detection — Say "Hey Robo, track the cat" to activate (no button press needed).
  • 🛑 Global Stop Command — Say "Stop" at any time to immediately halt the robot.
  • 👁️ Real-time Object Tracking — Powered by YOLOv8n running on a Mac with Apple Silicon (MPS).
  • 🏎️ Proportional Pursuit Controller — The robot steers and drives simultaneously, following curved paths to chase moving targets.
  • 🔊 Two-Way Voice — The robot speaks back using text-to-speech via the on-board speaker.
  • 🌐 ROS 2 Backbone — All inter-device communication runs over ROS 2 via rosbridge.

🏗️ Architecture

┌──────────────────── RASPBERRY PI 5 ─────────────────────┐
│                                                           │
│  pi_camera.py      → publishes /image_raw/compressed     │
│  pi_audio_node.py  → publishes /audio_raw                │
│                    → subscribes /robot_voice (speaker)   │
│  arduino_bridge    → subscribes /robot_commands          │
│       │                                                   │
│  Arduino Nano  ─── Servo Motors (L/R wheels)             │
└───────────────────────────────────────────────────────────┘
           │   WiFi / ROS 2 rosbridge (port 9090)
┌──────────────────────── MAC ───────────────────────────────┐
│                                                             │
│  robot_agent.py                                            │
│    ├── VAD + Whisper (STT)   — listens for wake word       │
│    ├── Ollama / gemma3:1b    — extracts target object      │
│    ├── YOLOv8n + OpenCV      — tracks object in frame      │
│    ├── P-Controller          — sends motor commands        │
│    └── macOS TTS (say)       — speaks responses to Pi      │
└─────────────────────────────────────────────────────────────┘
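All Mac-to-Pi traffic goes over rosbridge's JSON websocket protocol on port 9090. As a minimal sketch of what a publish on `/robot_commands` looks like on the wire (the `std_msgs/String` payload type is an assumption; the repo's nodes may use a different message type):

```python
import json

def rosbridge_publish(topic: str, data: str) -> str:
    """Build a rosbridge-protocol publish message as a JSON string.

    Assumes the topic carries a std_msgs/String payload.
    """
    return json.dumps({
        "op": "publish",        # rosbridge operation name
        "topic": topic,         # e.g. "/robot_commands"
        "msg": {"data": data},  # std_msgs/String body
    })

# The Mac-side agent would send this over the websocket at
# ws://<pi-ip>:9090 (hypothetical address for illustration).
msg = rosbridge_publish("/robot_commands", "L120")
```

In practice a client library such as roslibpy wraps this protocol, but the JSON above is what crosses the WiFi link.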

🗂️ Repository Structure

Autonomous-robot/
│
├── ai_agent/
│   ├── robot_agent.py          # Main AI agent (run on Mac)
│   ├── vision_test.py          # Standalone vision & motor debug script
│   └── requirements.txt
│
├── raspberry_pi/
│   ├── pi_camera.py            # ROS 2 camera node (GStreamer → /image_raw)
│   ├── pi_audio_node.py        # ROS 2 audio node (Mic → /audio_raw, /robot_voice → Speaker)
│   ├── requirements.txt
│   └── ros2_nodes/
│       └── motor_control/      # ROS 2 Python package
│           ├── motor_control/
│           │   └── arduino_bridge.py   # /robot_commands → Serial → Arduino
│           ├── package.xml
│           ├── setup.py
│           └── setup.cfg
│
├── arduino/
│   └── motor_firmware/
│       └── motor_firmware.ino  # Arduino Nano servo controller firmware
│
└── assets/
    └── cat_chaser_2.mp4        # Demo video: robot chasing a cat

🚀 Getting Started

Prerequisites

Hardware:

  • Raspberry Pi 5
  • Raspberry Pi Camera Module (libcamera compatible)
  • I2S Microphone Array (e.g. Google Voice HAT: googlevoicehat-soundcard overlay)
  • I2S Amplifier + Speaker (e.g. MAX98357A)
  • Arduino Nano + 2× Continuous Rotation Servos

Software:

  • ROS 2 Jazzy (on Pi) + rosbridge_suite
  • Python 3.11+
  • Ollama with gemma3:1b model (on Mac)

1. Flash the Arduino

Open arduino/motor_firmware/motor_firmware.ino in the Arduino IDE and upload it to your Nano.

Motor control protocol over Serial (115200 baud):

| Command  | Meaning                 |
|----------|-------------------------|
| `L<val>` | Set left servo (0-180)  |
| `R<val>` | Set right servo (0-180) |
| `S`      | Stop both motors        |
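A small helper for encoding these commands on the Mac/Pi side might look like the sketch below. The trailing newline delimiter and the use of pyserial are assumptions; check `arduino_bridge.py` and the firmware's parser for the exact framing.

```python
def servo_command(side: str, value: int) -> bytes:
    """Encode one servo command per the firmware's serial protocol.

    side:  "L" or "R"
    value: servo position, clamped to the protocol's 0-180 range.
    A trailing newline is assumed as the command delimiter.
    """
    if side not in ("L", "R"):
        raise ValueError("side must be 'L' or 'R'")
    value = max(0, min(180, value))
    return f"{side}{value}\n".encode("ascii")

STOP = b"S\n"  # halt both motors

# With pyserial (an assumption about the bridge's implementation):
#   import serial
#   port = serial.Serial("/dev/ttyUSB0", 115200)
#   port.write(servo_command("L", 120))
#   port.write(STOP)
```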

2. Set Up the Raspberry Pi

# Install system dependencies
sudo apt install ros-jazzy-rosbridge-suite python3-pyaudio

# Install Python dependencies
pip install -r raspberry_pi/requirements.txt

# Add to /boot/firmware/config.txt:
#   dtoverlay=googlevoicehat-soundcard

# Start rosbridge
ros2 launch rosbridge_server rosbridge_websocket_launch.xml

# Start camera node
python3 raspberry_pi/pi_camera.py

# Start audio node
python3 raspberry_pi/pi_audio_node.py

# Start motor bridge (in your ROS 2 workspace)
ros2 run motor_control arduino_bridge

3. Set Up the Mac

# Create a virtual environment
python3 -m venv venv && source venv/bin/activate

# Install dependencies
pip install -r ai_agent/requirements.txt

# Pull the LLM model
ollama pull gemma3:1b

# Set the Pi's IP address in robot_agent.py
# PI_IP = '192.168.x.x'

# Run!
python ai_agent/robot_agent.py

🗣️ Voice Commands

| Say...                       | Effect                                  |
|------------------------------|-----------------------------------------|
| "Hey Robo, track the cat"    | Robot starts chasing the cat            |
| "Hey Robo, track the person" | Robot starts following a person         |
| "Stop"                       | Robot immediately halts (always active) |

Trackable objects: person, cat, dog, bottle, cup, backpack, laptop, phone, ball, and more.


⚙️ Key Configuration (ai_agent/robot_agent.py)

| Parameter       | Default         | Description                                     |
|-----------------|-----------------|-------------------------------------------------|
| `PI_IP`         | `'192.168.x.x'` | Raspberry Pi's IP address                       |
| `OLLAMA_MODEL`  | `'gemma3:1b'`   | Ollama model for command parsing                |
| `ENERGY_THRESH` | `0.02`          | Microphone sensitivity for VAD                  |
| `FAR_PX`        | `800`           | Bounding box width (px) to start moving forward |
| `CLOSE_PX`      | `1500`          | Bounding box width (px) to stop/back up         |
| `CONFIDENCE`    | `0.4`           | YOLO detection confidence threshold             |
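The `FAR_PX`/`CLOSE_PX` thresholds drive the proportional pursuit behavior: box width stands in for distance, and the horizontal offset of the box sets the steering. A minimal sketch of one controller step, assuming a 90 = stopped neutral point and an illustrative gain (not the repo's exact code):

```python
def p_controller(cx: float, box_w: float, frame_w: float = 1920,
                 far_px: float = 800, close_px: float = 1500,
                 kp: float = 60.0) -> tuple[int, int]:
    """One illustrative proportional pursuit step.

    cx:     horizontal center of the YOLO bounding box (pixels)
    box_w:  bounding-box width, used as a proxy for distance
    Returns (left, right) servo values in 0-180; 90 = stopped
    (the neutral point and gain kp are assumptions).
    """
    # Steering: signed error of box center from frame center, in [-1, 1]
    err = (cx - frame_w / 2) / (frame_w / 2)

    # Throttle: drive while the box looks "far", back up if too close
    if box_w < far_px:
        drive = 30       # forward offset from neutral
    elif box_w > close_px:
        drive = -20      # too close: back up
    else:
        drive = 0        # in the sweet spot: hold position

    # Opposite steering offsets on each wheel produce a curved path,
    # so the robot turns and drives at the same time.
    left = int(90 + drive + kp * err)
    right = int(90 + drive - kp * err)
    return max(0, min(180, left)), max(0, min(180, right))
```

For example, a far target dead ahead yields equal forward values on both wheels, while a target at the right edge of the frame spins the wheels in opposite directions to turn toward it.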

📹 Demo

The robot tracking and chasing a cat around the room:

assets/cat_chaser_2.mp4
