ChatterArm: Large Language Model Augmented Vision-based Grasping

RBE 594 Capstone Project

Authors: Sri Lakshmi Hasitha Bachimanchi, Dheeraj Bhogisetty, Soham Shantanu Aserkar

Abstract

With an aging global population, the need for scalable and effective elderly care solutions is becoming increasingly urgent. This project addresses the challenge of supporting the elderly in everyday tasks such as fetching objects. The approach combines a multi-modal large language model (LLM) with a vision-based grasping technique and a robot manipulator to create an interactive robot. Our system allows interaction through natural-language text input, enabling the robot to recognize and manipulate objects that vary in shape and color. Results from simulation tests show that the manipulator can successfully execute tasks based on user commands, demonstrating its potential to operate effectively in real-world scenarios. The impact of this technology extends beyond individual assistance, with potential applications in inventory management, order fulfillment, and waste sorting.

Approach

We aim to integrate the Language Segment Anything Model (LangSAM) and the Generative Grasp Convolutional Neural Network (GGCNN) into a ROS2 service-client framework.
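
The glue between these components is a set of ROS2 services and clients. As a rough illustration of that pattern (using the generic std_srvs/Trigger type rather than the project's custom interfaces), a minimal rclpy service could look like the sketch below.

# Minimal sketch of the ROS2 service-client pattern used to wrap the
# perception models. std_srvs/Trigger stands in for the project's custom
# interfaces such as ggcnn_interface/srv/GraspPrediction.
import rclpy
from rclpy.node import Node
from std_srvs.srv import Trigger

class PerceptionService(Node):
    def __init__(self):
        super().__init__('perception_service')
        # In the real nodes, the callback would run LangSAM or GGCNN
        # inference on the most recent camera image.
        self.srv = self.create_service(Trigger, 'run_inference', self.handle_request)

    def handle_request(self, request, response):
        response.success = True
        response.message = 'inference complete'
        return response

def main():
    rclpy.init()
    rclpy.spin(PerceptionService())
    rclpy.shutdown()

if __name__ == '__main__':
    main()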

LangSAM

LangSAM combines the Segment Anything Model (SAM) with GroundingDINO. It takes an image together with a free-form text prompt, giving the user a generalized way to convey intent and identify a region of interest within a scene.
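
Outside of ROS2, LangSAM can be exercised directly to see what it returns. The sketch below assumes the lang-sam Python package and its older tuple-returning predict API; newer releases change the call signature, so treat this as illustrative only. The file path and prompt are placeholders.

# Hedged example: query LangSAM with an image and a text prompt.
from PIL import Image
from lang_sam import LangSAM

model = LangSAM()
image = Image.open('scene.png').convert('RGB')
# Returns segmentation masks, bounding boxes, matched phrases, and scores
# for regions matching the prompt (exact API may differ across versions).
masks, boxes, phrases, logits = model.predict(image, 'green ball')
print(boxes)  # one (x1, y1, x2, y2) box per detected region of interest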

GGCNN

Given a depth image and a region of interest from LangSAM, GGCNN takes the cropped depth map as input and predicts a grasp pose.
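
To make the data flow concrete, the sketch below shows one plausible way to crop the depth map around a LangSAM bounding box and read a grasp out of GGCNN's output heads (quality, angle, width). The model loading, preprocessing constants, and tensor shapes are assumptions; the authoritative pipeline is in the ros2_ggcnn package.

# Hedged sketch: LangSAM box -> cropped depth -> GGCNN -> pixel grasp.
import numpy as np
import torch

def predict_grasp(depth, box, ggcnn_model, crop_size=300):
    # depth: HxW depth image in metres; box: (x1, y1, x2, y2) from LangSAM.
    x1, y1, x2, y2 = [int(v) for v in box]
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
    top, left = max(cy - crop_size // 2, 0), max(cx - crop_size // 2, 0)
    crop = depth[top:top + crop_size, left:left + crop_size]
    crop = np.clip(crop - np.nanmean(crop), -1.0, 1.0)   # rough GGCNN-style normalisation
    inp = torch.from_numpy(crop).float()[None, None]      # shape (1, 1, H, W)
    with torch.no_grad():
        quality, cos, sin, width = ggcnn_model(inp)       # assumed four output heads
    q = quality.squeeze().numpy()
    py, px = np.unravel_index(np.argmax(q), q.shape)      # best grasp pixel in the crop
    angle = 0.5 * np.arctan2(sin.squeeze().numpy()[py, px],
                             cos.squeeze().numpy()[py, px])
    return (left + px, top + py), angle, float(width.squeeze().numpy()[py, px])

The pixel location, angle, and width would then be converted into a 3D grasp pose using the camera intrinsics and the depth value at that pixel.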

MoveIt!2 and ROS2/Gazebo

After receiving the grasp pose, the robot moves towards the object of interest and attempts to pick it up and place it at a target location.

Setup

  1. Create ROS2 workspace
mkdir -p ~/llm-grasping-panda/src
  2. Clone the repositories
cd ~/llm-grasping-panda/src
git clone https://github.com/SriHasitha/llm-grasp-capstone-docs.git
git clone -b humble https://github.com/nilseuropa/realsense_ros_gazebo.git

or

cd ~/llm-grasping-panda/src
git clone [email protected]:SriHasitha/llm-grasp-capstone-docs.git
git clone -b humble https://github.com/nilseuropa/realsense_ros_gazebo.git
  3. Build the packages
cd ~/llm-grasping-panda
colcon build
  4. Source the workspace
source install/setup.bash

Launch Panda Gazebo Simulation Environment

  1. Launch only Gazebo sim:
ros2 launch franka_env panda_simulation.launch.py
  2. Launch Gazebo + MoveIt!2 Environment
ros2 launch frankaproject_env panda.launch.py
  3. Launch Gazebo + MoveIt!2 Environment + ROS2 Robot Triggers/Actions
ros2 launch frankaproject_env panda_interface.launch.py

Move Panda Arm

The action client sends the desired end-effector pose as a goal to the /MoveXYZW action:

ros2 run move_panda move_panda_client
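
A stripped-down version of such a client is sketched below. The interface package name and the MoveXYZW goal fields are assumptions made for illustration; the real definitions live in this repository's interface packages.

# Hypothetical action client sending an end-effector pose goal to /MoveXYZW.
import rclpy
from rclpy.action import ActionClient
from rclpy.node import Node
from move_panda_interface.action import MoveXYZW  # assumed interface package

class MovePandaClient(Node):
    def __init__(self):
        super().__init__('move_panda_client')
        self._client = ActionClient(self, MoveXYZW, '/MoveXYZW')

    def send_goal(self, x, y, z, yaw):
        goal = MoveXYZW.Goal()
        # Field names are placeholders; check the .action file for the real ones.
        goal.positionx, goal.positiony, goal.positionz, goal.yaw = x, y, z, yaw
        self._client.wait_for_server()
        return self._client.send_goal_async(goal)

def main():
    rclpy.init()
    node = MovePandaClient()
    node.send_goal(0.4, 0.0, 0.3, 90.0)
    rclpy.spin_once(node, timeout_sec=2.0)
    rclpy.shutdown()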

Run GGCNN Service

Initialize the GGCNN Service

ros2 run ros2_ggcnn ggcnn_service

Call the GGCNN service to predict a grasp pose:

ros2 service call /grasp_prediction ggcnn_interface/srv/GraspPrediction
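
The same call can be made programmatically. In the sketch below the request is assumed to be empty and the response to carry the predicted grasp pose; the actual fields are defined in ggcnn_interface/srv/GraspPrediction.srv.

# Minimal programmatic equivalent of the `ros2 service call` above.
import rclpy
from rclpy.node import Node
from ggcnn_interface.srv import GraspPrediction

def main():
    rclpy.init()
    node = Node('grasp_prediction_client')
    client = node.create_client(GraspPrediction, '/grasp_prediction')
    client.wait_for_service()
    future = client.call_async(GraspPrediction.Request())
    rclpy.spin_until_future_complete(node, future)
    node.get_logger().info(f'grasp prediction response: {future.result()}')
    rclpy.shutdown()

if __name__ == '__main__':
    main()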

Running the Integrated Pipeline (ROS2/Gazebo + GGCNN + LangSAM) for Inference

Step 1: Start the LangSAM vision service

ros2 run langsam_vision vision_node

Step 2: Launch the Gazebo environment

ros2 launch frankaproject_env panda_interface.launch.py

Step 3: Initialize the GGCNN Service

ros2 run ros2_ggcnn ggcnn_service

Step 4: Run the unified client node to load all remaining services and send the prompt to LangSAM to trigger prediction

ros2 run move_panda move_panda_client --ros-args -p prompt:='Pick up the green ball'
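
The client node presumably declares prompt as a ROS2 string parameter, which is how the --ros-args -p value above reaches it. A minimal sketch of declaring and reading such a parameter (the default value and logging are illustrative):

# Hedged sketch: read the 'prompt' parameter passed via --ros-args -p prompt:=...
import rclpy
from rclpy.node import Node

class PromptedClient(Node):
    def __init__(self):
        super().__init__('move_panda_client')
        self.declare_parameter('prompt', 'Pick up the green ball')
        prompt = self.get_parameter('prompt').get_parameter_value().string_value
        self.get_logger().info(f'received prompt: {prompt}')
        # The prompt would then be forwarded to the LangSAM vision service.

def main():
    rclpy.init()
    rclpy.spin_once(PromptedClient(), timeout_sec=1.0)
    rclpy.shutdown()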
