ChatterArm: Large Language Model Augmented Vision-based Grasping

RBE 594 Capstone Project

Authors: Sri Lakshmi Hasitha Bachimanchi, Dheeraj Bhogisetty, Soham Shantanu Aserkar

Abstract

With an aging global population, the need for scalable and effective elderly care solutions is becoming increasingly urgent. This project addresses the challenge of supporting the elderly in everyday tasks such as fetching objects. The approach combines a multi-modal large language model (LLM) with a vision-based grasping technique and a robot manipulator to create an interactive robot. Our system allows interaction through natural-language text input, enabling the robot to recognize and manipulate objects that vary in shape and color. Results from simulation tests show that the manipulator can successfully execute tasks based on user commands, demonstrating its potential to operate effectively in real-world scenarios. The impact of this technology extends beyond individual assistance, with potential applications in inventory management, order fulfillment, and waste sorting.

Approach

We aim to integrate the Language Segment Anything Model (LangSAM) and the Generative Grasp Convolutional Neural Network (GGCNN) into a ROS2 service-client framework.
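
The glue between these components is a set of ROS2 services and clients. As a rough illustration of that pattern (using the generic std_srvs/Trigger type rather than the project's custom interfaces), a minimal rclpy service could look like the sketch below.

# Minimal sketch of the ROS2 service-client pattern used to wrap the
# perception models. std_srvs/Trigger stands in for the project's custom
# interfaces such as ggcnn_interface/srv/GraspPrediction.
import rclpy
from rclpy.node import Node
from std_srvs.srv import Trigger

class PerceptionService(Node):
    def __init__(self):
        super().__init__('perception_service')
        # In the real nodes, the callback would run LangSAM or GGCNN
        # inference on the most recent camera image.
        self.srv = self.create_service(Trigger, 'run_inference', self.handle_request)

    def handle_request(self, request, response):
        response.success = True
        response.message = 'inference complete'
        return response

def main():
    rclpy.init()
    rclpy.spin(PerceptionService())
    rclpy.shutdown()

if __name__ == '__main__':
    main()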

LangSAM

LangSAM combines the Segment Anything Model (SAM) with GroundingDINO. It takes an image together with a free-form text prompt, giving the user a generalized way to convey intent and identify a region of interest within a scene.
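
Outside of ROS2, LangSAM can be exercised directly to see what it returns. The sketch below assumes the lang-sam Python package and its older tuple-returning predict API; newer releases change the call signature, so treat this as illustrative only. The file path and prompt are placeholders.

# Hedged example: query LangSAM with an image and a text prompt.
from PIL import Image
from lang_sam import LangSAM

model = LangSAM()
image = Image.open('scene.png').convert('RGB')
# Returns segmentation masks, bounding boxes, matched phrases, and scores
# for regions matching the prompt (exact API may differ across versions).
masks, boxes, phrases, logits = model.predict(image, 'green ball')
print(boxes)  # one (x1, y1, x2, y2) box per detected region of interest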

GGCNN

Given a depth image and a region of interest from LangSAM, GGCNN takes the cropped depth map as input and predicts a grasp pose.
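
To make the data flow concrete, the sketch below shows one plausible way to crop the depth map around a LangSAM bounding box and read a grasp out of GGCNN's output heads (quality, angle, width). The model loading, preprocessing constants, and tensor shapes are assumptions; the authoritative pipeline is in the ros2_ggcnn package.

# Hedged sketch: LangSAM box -> cropped depth -> GGCNN -> pixel grasp.
import numpy as np
import torch

def predict_grasp(depth, box, ggcnn_model, crop_size=300):
    # depth: HxW depth image in metres; box: (x1, y1, x2, y2) from LangSAM.
    x1, y1, x2, y2 = [int(v) for v in box]
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
    top, left = max(cy - crop_size // 2, 0), max(cx - crop_size // 2, 0)
    crop = depth[top:top + crop_size, left:left + crop_size]
    crop = np.clip(crop - np.nanmean(crop), -1.0, 1.0)   # rough GGCNN-style normalisation
    inp = torch.from_numpy(crop).float()[None, None]      # shape (1, 1, H, W)
    with torch.no_grad():
        quality, cos, sin, width = ggcnn_model(inp)       # assumed four output heads
    q = quality.squeeze().numpy()
    py, px = np.unravel_index(np.argmax(q), q.shape)      # best grasp pixel in the crop
    angle = 0.5 * np.arctan2(sin.squeeze().numpy()[py, px],
                             cos.squeeze().numpy()[py, px])
    return (left + px, top + py), angle, float(width.squeeze().numpy()[py, px])

The pixel location, angle, and width would then be converted into a 3D grasp pose using the camera intrinsics and the depth value at that pixel.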

MoveIt!2 and ROS2/Gazebo

After receiving the grasp pose, the robot moves towards the object of interest and attempts to pick it up and place it at a target location.

Setup

  1. Create ROS2 workspace
mkdir -p ~/llm-grasping-panda/src
  2. Clone the repositories
cd ~/llm-grasping-panda/src
git clone https://github.com/SriHasitha/llm-grasp-capstone-docs.git
git clone -b humble https://github.com/nilseuropa/realsense_ros_gazebo.git

or

cd ~/llm-grasping-panda/src
git clone [email protected]:SriHasitha/llm-grasp-capstone-docs.git
git clone -b humble https://github.com/nilseuropa/realsense_ros_gazebo.git
  3. Build the packages
cd ~/llm-grasping-panda
colcon build
  4. Source the workspace
source install/setup.bash

Launch Panda Gazebo Simulation Environment

  1. Launch only Gazebo sim:
ros2 launch franka_env panda_simulation.launch.py
  2. Launch Gazebo + MoveIt!2 Environment
ros2 launch frankaproject_env panda.launch.py
  3. Launch Gazebo + MoveIt!2 Environment + ROS2 Robot Triggers/Actions
ros2 launch frankaproject_env panda_interface.launch.py

Move Panda Arm

The action client sends the desired end-effector pose as a goal to the /MoveXYZW action:

ros2 run move_panda move_panda_client
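
A stripped-down version of such a client is sketched below. The interface package name and the MoveXYZW goal fields are assumptions made for illustration; the real definitions live in this repository's interface packages.

# Hypothetical action client sending an end-effector pose goal to /MoveXYZW.
import rclpy
from rclpy.action import ActionClient
from rclpy.node import Node
from move_panda_interface.action import MoveXYZW  # assumed interface package

class MovePandaClient(Node):
    def __init__(self):
        super().__init__('move_panda_client')
        self._client = ActionClient(self, MoveXYZW, '/MoveXYZW')

    def send_goal(self, x, y, z, yaw):
        goal = MoveXYZW.Goal()
        # Field names are placeholders; check the .action file for the real ones.
        goal.positionx, goal.positiony, goal.positionz, goal.yaw = x, y, z, yaw
        self._client.wait_for_server()
        return self._client.send_goal_async(goal)

def main():
    rclpy.init()
    node = MovePandaClient()
    node.send_goal(0.4, 0.0, 0.3, 90.0)
    rclpy.spin_once(node, timeout_sec=2.0)
    rclpy.shutdown()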

Run GGCNN Service

Initialize the GGCNN Service

ros2 run ros2_ggcnn ggcnn_service

Call the GGCNN service to predict a grasp pose:

ros2 service call /grasp_prediction ggcnn_interface/srv/GraspPrediction
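
The same call can be made programmatically. In the sketch below the request is assumed to be empty and the response to carry the predicted grasp pose; the actual fields are defined in ggcnn_interface/srv/GraspPrediction.srv.

# Minimal programmatic equivalent of the `ros2 service call` above.
import rclpy
from rclpy.node import Node
from ggcnn_interface.srv import GraspPrediction

def main():
    rclpy.init()
    node = Node('grasp_prediction_client')
    client = node.create_client(GraspPrediction, '/grasp_prediction')
    client.wait_for_service()
    future = client.call_async(GraspPrediction.Request())
    rclpy.spin_until_future_complete(node, future)
    node.get_logger().info(f'grasp prediction response: {future.result()}')
    rclpy.shutdown()

if __name__ == '__main__':
    main()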

Running the Integrated Pipeline (ROS2/Gazebo + GGCNN + LangSAM) for Inference

Step 1: Start the LangSAM vision service

ros2 run langsam_vision vision_node

Step 2: Launch the Gazebo environment

ros2 launch frankaproject_env panda_interface.launch.py

Step 3: Initialize the GGCNN Service

ros2 run ros2_ggcnn ggcnn_service

Step 4: Run the unified client node to load all remaining services and send the prompt to LangSAM to trigger prediction

ros2 run move_panda move_panda_client --ros-args -p prompt:='Pick up the green ball'
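
The client node presumably declares prompt as a ROS2 string parameter, which is how the --ros-args -p value above reaches it. A minimal sketch of declaring and reading such a parameter (the default value and logging are illustrative):

# Hedged sketch: read the 'prompt' parameter passed via --ros-args -p prompt:=...
import rclpy
from rclpy.node import Node

class PromptedClient(Node):
    def __init__(self):
        super().__init__('move_panda_client')
        self.declare_parameter('prompt', 'Pick up the green ball')
        prompt = self.get_parameter('prompt').get_parameter_value().string_value
        self.get_logger().info(f'received prompt: {prompt}')
        # The prompt would then be forwarded to the LangSAM vision service.

def main():
    rclpy.init()
    rclpy.spin_once(PromptedClient(), timeout_sec=1.0)
    rclpy.shutdown()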
