The goal is to build something to sort a bucket of Lego into individual kits and make rebuilding them straightforward.
- I don't want to teach it how to do it - the chatbot must be smart
- It can run for a week - I don't care as long as it sorts them
- It can't require continuous input from me
Written in Python, this project demonstrates how a chatbot can see through a camera and control a robot arm. The project runs completely offline. The llama3.2-vision model is not yet smart enough to understand that there is nothing between the pincers, but it can move the arm around on command. Model evaluation takes between 10 and 300 seconds on an old Nvidia Tesla M40 24GB.
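As a sketch of how an image can be handed to the vision model, the helper below builds a chat request in the shape the `ollama` Python package expects, with the image base64-encoded in the message's `images` list. The helper name and prompt are illustrative, and the request is only constructed here, not sent (sending it requires a running Ollama server):

```python
import base64
from pathlib import Path

def build_vision_request(image_path: str, prompt: str) -> dict:
    """Build a chat request for a vision model; the image is passed
    base64-encoded in the user message's 'images' list."""
    image_b64 = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return {
        "model": "llama3.2-vision",
        "messages": [
            {
                "role": "user",
                "content": prompt,
                "images": [image_b64],
            }
        ],
    }

# With a running server, the payload could be sent with:
#   import ollama
#   response = ollama.chat(**build_vision_request("frame.jpg", "What do you see?"))
```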
Turn up the volume to hear the conversation.
ChatbotRobotArm.mov
- 6DOF Robotic Arm Kit from Amazon
- ESP32-CAM Webcam flashed with Webcam Software
- USB to serial UART to control arm from PC (No dependency on this hardware)
- Ollama as the "offline" chatbot
- GPU - I picked up an Nvidia Tesla M40 24GB for $70 on eBay
- llama3.2-vision model as it accepts image input
- Text-To-Speech to speak the chatbot's responses - runs offline
- Speech Recognition to listen for commands
- Python3.11 (No dependency on this version)
- Windows 10 (No dependency on this Platform)
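Driving the arm over the USB-to-serial adapter comes down to writing framed commands to a serial port. The sketch below is hypothetical: the `#<channel>P<angle>` framing is a common hobby-servo convention, not this arm's documented protocol, so adapt it to whatever your controller board actually expects (pyserial would be used for the real port):

```python
# import serial  # pyserial, only needed to actually open the port

def servo_command(channel: int, angle: int) -> bytes:
    """Encode a 'move servo <channel> to <angle>' command.
    The '#<channel>P<angle>\\r' framing is hypothetical -- swap in
    the protocol your arm's controller board expects."""
    if not 0 <= channel <= 5:
        raise ValueError("6DOF arm: channel must be 0-5")
    if not 0 <= angle <= 180:
        raise ValueError("angle must be 0-180 degrees")
    return f"#{channel}P{angle}\r".encode("ascii")

# To drive the arm for real (port name is an example):
# with serial.Serial("COM3", 115200, timeout=1) as port:
#     port.write(servo_command(0, 90))
```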
- Evaluate online chatbots to speed up evaluation
- Evaluate whether a second camera, and/or other angles helps chatbot understanding
- Add interfaces to allow different hardware and APIs to be configured
- Find a vision-aware model that uses Ollama's newer "tools" interface
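For the "tools" idea above, a tool is described to Ollama as a JSON-schema-style function definition that a tool-capable model can call instead of replying in free-form text. The `move_joint` name and its parameters below are hypothetical, purely to show the shape such a definition could take for this arm:

```python
def arm_tool_schema() -> dict:
    """A hypothetical 'move_joint' tool definition in the JSON-schema
    style used by Ollama's tools interface."""
    return {
        "type": "function",
        "function": {
            "name": "move_joint",
            "description": "Move one joint of the 6DOF arm to an absolute angle",
            "parameters": {
                "type": "object",
                "properties": {
                    "channel": {
                        "type": "integer",
                        "description": "Servo channel, 0-5",
                    },
                    "angle": {
                        "type": "integer",
                        "description": "Target angle in degrees, 0-180",
                    },
                },
                "required": ["channel", "angle"],
            },
        },
    }

# A tool-capable model would receive this as `tools=[arm_tool_schema()]`
# in the `ollama.chat(...)` call.
```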
```shell
git clone https://github.com/ribbles/ChatbotRobotArm/
cd ChatbotRobotArm
pip install -r requirements.txt
pytest
python src/server.py
```