With an aging global population, the need for scalable and effective elderly care solutions is becoming increasingly urgent. This project addresses the challenge of supporting the elderly in everyday tasks such as fetching objects. The approach combines a multi-modal large language model (LLM) with a vision-based grasping technique and a robot manipulator to create an interactive robot. Our system allows interaction through natural language text input, enabling the robot to recognize and manipulate objects with variations in shape and color. Results from simulation tests show that the manipulator can successfully execute tasks based on user commands, demonstrating its potential to operate effectively in real-world scenarios. The impact of this technology extends beyond individual assistance, with potential applications in inventory management, order fulfilment, and waste sorting.
We aim to integrate the Language Segment Anything Model (LangSAM) and the Generative Grasping Convolutional Neural Network (GGCNN) into a ROS2 service-client framework.
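Each perception component is wrapped behind a ROS2 service. The snippet below is a minimal, illustrative sketch of that service pattern in rclpy; it uses the generic `std_srvs/Trigger` interface as a stand-in for the project-specific interfaces described later, so the names are only placeholders.

```python
# Minimal sketch of the ROS2 service pattern used in this pipeline.
# std_srvs/Trigger stands in for the project's own service interfaces.
import rclpy
from rclpy.node import Node
from std_srvs.srv import Trigger


class PerceptionService(Node):
    """Toy server: in the real pipeline this would wrap LangSAM or GGCNN."""

    def __init__(self):
        super().__init__('perception_service')
        self.srv = self.create_service(Trigger, 'predict', self.callback)

    def callback(self, request, response):
        # A real node would run inference here and fill in its result.
        response.success = True
        response.message = 'prediction ready'
        return response


def main():
    rclpy.init()
    rclpy.spin(PerceptionService())
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```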
The model is based on the combination of the Segment Anything Model (SAM) and GroundingDINO. The goal of using this model is to accept a text prompt together with an image, so that a user can convey intent in natural language and the model can identify the corresponding region of interest within the scene.
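A hedged sketch of querying LangSAM from Python is shown below. The exact API of the lang-segment-anything package varies between releases, so the call signature here is an assumption following the early masks/boxes/phrases/logits return convention, and the image file name is only an example.

```python
# Hedged sketch: ask LangSAM for the region of interest matching a text prompt.
from PIL import Image
from lang_sam import LangSAM

model = LangSAM()
image = Image.open('scene_rgb.png').convert('RGB')  # example RGB capture
prompt = 'green ball'

# One binary mask per detected instance matching the prompt (API may differ by version).
masks, boxes, phrases, logits = model.predict(image, prompt)
print(f'found {len(masks)} region(s) of interest for "{prompt}"')
```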
Given a depth image and the region of interest returned by LangSAM, GGCNN takes the cropped depth map as input and predicts a grasp pose.
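As a rough illustration, the snippet below shows how a single grasp can be read off GGCNN's per-pixel output maps. It assumes the quality, angle, and width maps for the cropped depth image are already available as arrays; the function and variable names are illustrative, not the project's exact code.

```python
# Hedged sketch: reduce GGCNN's per-pixel output maps to one grasp candidate.
import numpy as np


def select_grasp(quality_map, angle_map, width_map, depth_crop):
    """Pick the pixel with the highest grasp quality and read off its pose."""
    v, u = np.unravel_index(np.argmax(quality_map), quality_map.shape)
    return {
        'pixel': (u, v),                   # location in the cropped depth map
        'depth': float(depth_crop[v, u]),  # distance to the grasp point (m)
        'angle': float(angle_map[v, u]),   # gripper rotation about the camera axis (rad)
        'width': float(width_map[v, u]),   # gripper opening width (map units)
    }
```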
After receiving the grasp pose, the robot moves towards the object of interest and attempts to pick it up and place it at a target location.
- Create ROS2 workspace
mkdir -p ~/llm-grasping-panda/src
- Clone repo
cd ~/llm-grasping-panda/src
git clone https://github.com/SriHasitha/llm-grasp-capstone-docs.git
git clone -b humble https://github.com/nilseuropa/realsense_ros_gazebo.git
or
cd ~/llm-grasping-panda/src
git clone [email protected]:SriHasitha/llm-grasp-capstone-docs.git
git clone -b humble https://github.com/nilseuropa/realsense_ros_gazebo.git
- Build packages
cd ~/llm-grasping-panda
colcon build
- Source packages
source install/setup.bash
- Launch only Gazebo sim:
ros2 launch franka_env panda_simulation.launch.py
- Launch Gazebo + MoveIt!2 Environment:
ros2 launch frankaproject_env panda.launch.py
- Launch Gazebo + MoveIt!2 Environment + ROS2 Robot Triggers/Actions:
ros2 launch frankaproject_env panda_interface.launch.py
- The action client sends the desired end-effector pose as a goal to the /MoveXYZW action:
ros2 run move_panda move_panda_client
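For reference, a hedged rclpy sketch of such an action client is shown below. The interface package (`ros2_data`) and the goal field names are assumptions, not taken from this repository; check the action definition in the workspace for the exact names.

```python
# Hedged sketch: send an end-effector pose goal to the /MoveXYZW action.
import rclpy
from rclpy.action import ActionClient
from rclpy.node import Node
from ros2_data.action import MoveXYZW  # assumed interface package


class MovePandaClient(Node):
    def __init__(self):
        super().__init__('move_panda_client')
        self._client = ActionClient(self, MoveXYZW, '/MoveXYZW')

    def send_goal(self, x, y, z, yaw):
        goal = MoveXYZW.Goal()
        goal.positionx = x  # assumed field names
        goal.positiony = y
        goal.positionz = z
        goal.yaw = yaw
        self._client.wait_for_server()
        return self._client.send_goal_async(goal)


def main():
    rclpy.init()
    node = MovePandaClient()
    future = node.send_goal(0.4, 0.0, 0.3, 45.0)  # example pose
    rclpy.spin_until_future_complete(node, future)
    rclpy.shutdown()
```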
- Initialize the GGCNN service:
ros2 run ros2_ggcnn ggcnn_service
- Call the GGCNN service to predict a grasp pose:
ros2 service call /grasp_prediction ggcnn_interface/srv/GraspPrediction
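The same service can also be called programmatically. The sketch below sends an empty request and prints whatever the service returns, since the exact fields of `ggcnn_interface/srv/GraspPrediction` are not reproduced here.

```python
# Hedged sketch: call the /grasp_prediction service from a Python node.
import rclpy
from rclpy.node import Node
from ggcnn_interface.srv import GraspPrediction


def main():
    rclpy.init()
    node = Node('grasp_prediction_client')
    client = node.create_client(GraspPrediction, '/grasp_prediction')
    client.wait_for_service()

    # The request is assumed to be empty; fill in fields if the srv defines any.
    future = client.call_async(GraspPrediction.Request())
    rclpy.spin_until_future_complete(node, future)
    node.get_logger().info(f'grasp prediction response: {future.result()}')
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```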
Step 1: Launch the LangSAM vision node
ros2 run langsam_vision vision_node
Step 2: Launch the Gazebo + MoveIt!2 environment with the ROS2 robot triggers/actions
ros2 launch frankaproject_env panda_interface.launch.py
Step 3: Initialize the GGCNN service
ros2 run ros2_ggcnn ggcnn_service
Step 4: Launch the unified launch file to load all remaining services and send the prompt to LangSAM to trigger prediction
ros2 run move_panda move_panda_client --ros-args -p prompt:="Pick up the green ball"
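For context, the `prompt` parameter passed above can be declared and read inside the client node roughly as follows; the node's real structure may differ.

```python
# Hedged sketch: declare and read the `prompt` parameter set on the command line.
import rclpy
from rclpy.node import Node


class PromptedClient(Node):
    def __init__(self):
        super().__init__('move_panda_client')
        # Matches: --ros-args -p prompt:="Pick up the green ball"
        self.declare_parameter('prompt', 'Pick up the object')
        prompt = self.get_parameter('prompt').get_parameter_value().string_value
        self.get_logger().info(f'received prompt: "{prompt}"')


def main():
    rclpy.init()
    rclpy.spin(PromptedClient())
    rclpy.shutdown()
```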