- ROS2(galactic)
- robomimic-1.3
- robosuite-offline
- stable-baselines3
We provide a Docker image: `xizobu/galactic:3.0`.
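The image can be pulled with `docker pull xizobu/galactic:3.0`; the container run options (GPU access, volume mounts) depend on your setup.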
- Effect of Noise Angle
IQ-Learn: baseline algorithm;
IQ-Learn (filter): only filters noise, without using confidence; it reduces to plain IQ-Learn when θn is set to 180°;
CIQL-E: filters noise and uses confidence;
CIQL-A: penalizes noise and uses confidence.
Ranking of algorithm performance: CIQL-A (40.3%) > CIQL-E (30.1%) > IQ-Learn (filter) (26.8%) > IQ-Learn.
Compared with simply filtering noise, applying fine-grained confidence assessment to the demonstration data effectively improves the algorithm's performance, and penalizing noise also outperforms straightforward noise filtering.
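As a rough illustration of these variants, the sketch below computes a per-transition noise angle and turns it into either a hard filter or a soft confidence weight. The function names, the linear decay, and the goal-direction convention are assumptions for illustration; the exact definitions live in the Confidence-based-IQ-Learn code.

```python
import numpy as np

def noise_angle(action, goal_dir):
    """Angle in degrees between a demonstrated action and the direction to the target."""
    cos = np.dot(action, goal_dir) / (np.linalg.norm(action) * np.linalg.norm(goal_dir) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def keep_transition(theta, boundary_angle=30.0):
    """IQ-Learn (filter): hard keep/drop by the boundary angle.
    boundary_angle=180 keeps every transition, recovering plain IQ-Learn."""
    return theta <= boundary_angle

def soft_confidence(theta):
    """CIQL-style fine-grained weight (assumed linear decay): 1 when aligned, 0 at 180 degrees."""
    return max(0.0, 1.0 - theta / 180.0)
```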
- Recovering environment rewards
The reward function recovered by CIQL-A aligns more closely with human intent.
Evaluating and penalizing noise in the data produces behavior more aligned with human intent than policies trained with simple noise filtering.
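IQ-Learn-style methods recover the implicit reward from the learned Q and V functions via the inverse soft Bellman operator; a minimal sketch, assuming `q_net` and `v_net` callables (the names are hypothetical):

```python
import torch

def recovered_reward(q_net, v_net, state, action, next_state, done, gamma=0.99):
    """Implicit reward r(s, a) = Q(s, a) - gamma * V(s') used to score demonstrations."""
    with torch.no_grad():
        q = q_net(state, action)
        v_next = torch.zeros_like(q) if done else v_net(next_state)
    return q - gamma * v_next
```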
- Performance of CIQL and IQ-Learn variants
IQ-Learn: the task success rate is very low;
IQ-Learn (filter): in several cases the robotic arm flails erratically;
CIQL-E (Expert): decision time is long and grasping is hesitant;
CIQL-A (Agent): decision time is short and grasping is decisive.

(a) Performance of IQ-Learn (b) Performance of IQ-Learn (filter)
(c) Performance of CIQL-E (d) Performance of CIQL-A
- Noise filtering visualization of two human datasets (better and worse)
After filtering out the cluttered trajectories, an organized trend emerges.
Fine-grained confidence scores can be assigned to human demonstration data without active supervision signals from humans, a ground-truth reward function from the environment, or strict assumptions about the noise.
Though applied only to the linear grasping task, our method greatly enhances the success rate of the grasping subtask in multi-task settings, thereby improving the success rate of the entire task.
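A rough sketch of how such a before/after filtering view can be produced with matplotlib; the trajectory format and the keep-masks reuse the illustrative assumptions from the sketch above:

```python
import matplotlib.pyplot as plt

def plot_filtering(trajs, masks):
    """Scatter end-effector positions: kept transitions in blue, filtered-out noise in red.
    `trajs` is a list of (N, 3) position arrays, `masks` the matching boolean keep-masks."""
    ax = plt.figure().add_subplot(projection="3d")
    for xyz, keep in zip(trajs, masks):
        ax.scatter(*xyz[keep].T, c="tab:blue", s=2)
        ax.scatter(*xyz[~keep].T, c="tab:red", s=2)
    plt.show()
```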
- Activate Omega.x device
- Compile with
colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release --symlink-install
Note: use the local Python environment rather than a conda environment.
- Initialize Omega.x by running
./HapticDesk
in a terminal under Demonstrations/ws_forcedimension/src/forcedimension_ros2/fd_hardware/external/fd_sdk/bin
- Open two terminals in the ws_forcedimension workspace and source it in each:
source install/setup.bash
- Run the driver in one terminal:
ros2 launch fd_bringup fd.launch.py
- Publish end-effector position data in the other terminal (a sketch of a possible consumer follows):
ros2 run tcp_socket ee_topic_sub
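For reference, a minimal rclpy node that could consume such an end-effector pose stream; the topic name `/ee_pose` and the `PoseStamped` message type are assumptions, not taken from the tcp_socket package:

```python
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import PoseStamped

class EEPoseListener(Node):
    """Logs incoming end-effector poses (topic name and message type are assumed)."""
    def __init__(self):
        super().__init__("ee_pose_listener")
        self.create_subscription(PoseStamped, "/ee_pose", self.on_pose, 10)

    def on_pose(self, msg):
        p = msg.pose.position
        self.get_logger().info(f"ee position: ({p.x:.3f}, {p.y:.3f}, {p.z:.3f})")

def main():
    rclpy.init()
    rclpy.spin(EEPoseListener())

if __name__ == "__main__":
    main()
```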
- Start a demonstration task (refer to /root/RoboLearn/Demonstrations/launch/run.sh):
python collect_human_demonstrations.py --robots IIWA --environment Lift --device omega
- Merge demonstrations (demo = demo1 + demo2 + ...):
python demonstration_merge.py --merge_directory collect_demonstration/Lift/IIWA_OSC_POSE
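Conceptually, the merge concatenates the demo groups of several robosuite HDF5 files into one. A minimal sketch assuming the standard robosuite demo layout (a "data" group holding demo_* subgroups); the file paths are hypothetical and demonstration_merge.py is authoritative:

```python
import glob
import h5py

def merge_demos(paths, out_path):
    """Copy every demo_* group from each source file into a single output file."""
    with h5py.File(out_path, "w") as out:
        data = out.create_group("data")
        count = 0
        for path in paths:
            with h5py.File(path, "r") as src:
                for name in src["data"]:
                    src.copy(src["data"][name], data, name=f"demo_{count}")
                    count += 1

merge_demos(sorted(glob.glob("collect_demonstration/Lift/IIWA_OSC_POSE/*/demo.hdf5")), "demo.hdf5")
```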
- Convert the data (demo -> pkl):
python demonstration_transition.py --dataset_type robosuite_demo.hdf5 --output_dir iqlearn_demonstrations --dataset_path Lift/IIWA_OSC_POSE
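The conversion flattens each HDF5 demo into per-trajectory arrays and pickles them for the trainer. A sketch under assumed dataset keys (`states`, `actions`); see demonstration_transition.py for the actual fields:

```python
import pickle
import h5py

trajectories = []
with h5py.File("robosuite_demo.hdf5", "r") as f:
    for name in f["data"]:
        demo = f["data"][name]
        # Assumed keys; the real converter may store observations differently.
        trajectories.append({
            "states": demo["states"][()],
            "actions": demo["actions"][()],
        })

with open("iqlearn_demonstrations/demo.pkl", "wb") as out:
    pickle.dump(trajectories, out)
```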
- Train a CIQL agent (refer to /root/RoboLearn/Confidence-based-IQ-Learn/run_confidence.sh). Variants: IQ-Learn (IQ), CIQL-A (max_lamb), and CIQL-E (conf_expert).
python train_iq_dyrank.py env=robosuite_Lift_IIWA env.demo=robosuite_Lift_IIWA_better_worse_failed_90.pkl agent=sac agent.actor_lr=5e-06 agent.critic_lr=5e-06 agent.init_temp=0.001 expert.demos=90 seed=1 train.boundary_angle=30 C_aware.conf_learn=max_lamb
- Evaluate the CIQL agent:
python test_iq_dyrank.py env=robosuite_Lift_IIWA agent=sac env.has_renderer=False eval.policy=xxx
- Evaluate demonstrations using the reward function recovered by CIQL:
python test_iq_reward.py env=robosuite_Lift_IIWA env.demo=robosuite_Lift_IIWA_50.pkl expert.demos=50 agent=sac eval.policy=xxx
Thanks to the authors of IQ-Learn for their work and for sharing it!
The code structure is based on the IQ-Learn repo.
Details of the Omega device driver and its ROS2 workspace can be found in ICube-Robotics/forcedimension_ros2.