
XizoB/CIQL


Dependencies

  • ROS2 (Galactic)
  • robomimic-1.3
  • robosuite-offline
  • stable-baselines3

We provide a Docker image: xizobu/galactic:3.0
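
A minimal sketch for pulling the image and starting a container (the container name and the mount of a local clone into /root/RoboLearn are illustrative assumptions, not a documented workflow):

    docker pull xizobu/galactic:3.0
    # Interactive container; adjust the host path to wherever the repository is checked out
    docker run -it --name ciql -v "$(pwd)":/root/RoboLearn xizobu/galactic:3.0 /bin/bash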

Results

Different Noise Angles and Datasets

  • Effect of Noise Angle
      IQ-Learn: baseline algorithm;
      IQ-Learn (filter): filters noise only, without using confidence; it reduces to IQ-Learn when θn is set to 180°;
      CIQL-E: filters noise and uses confidence;
      CIQL-A: penalizes noise and uses confidence.
      Ranking of algorithm performance: CIQL-A (40.3%) > CIQL-E (30.1%) > IQ-Learn (filter, 26.8%) > IQ-Learn.
      Compared with simply filtering noise, applying a fine-grained confidence assessment to the demonstration data effectively improves the performance of the algorithm. In addition, penalizing noise is superior to straightforward noise filtering.

CIQL Evaluation

  • Recovering environment rewards
      The reward function recovered by CIQL-A aligns more closely with human intent.
      Evaluating and penalizing noise in the data yields behavior that is more aligned with human intentions than strategies trained with simple noise filtering.

(a) CIQL-A’s recovered reward; (b) CIQL-E’s recovered reward

  • Performance of CIQL and IQ-Learn variants
      IQ-Learn: the task success rate is very low;
      IQ-Learn (filter): in multiple cases the robotic arm flies off erratically;
      CIQL-Expert: decision time is long and grasping is not decisive enough;
      CIQL-Agent: decision time is short and grasping is decisive.

(a) Performance of IQ-Learn; (b) Performance of IQ-Learn (filter)

(c) Performance of CIQL-E; (d) Performance of CIQL-A

Demonstrations Evaluation

  • Noise-filtering visualization of two human datasets, better and worse
      After filtering out the cluttered trajectories, an organized trend emerges.
      Fine-grained confidence scores can be assigned to human demonstration data without active supervision signals from humans, a true reward function from the environment, or strict assumptions about the noise.

Multi-task Testing

Though applied only to the linear grasping task, our method greatly enhances the success rate of grasping within multi-task settings, thereby improving the success rate of the entire task.

Run

Collect demonstrations

  1. Activate the Omega.x device
  • Compile with
    colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release --symlink-install
    Note: use the local Python environment rather than a conda environment (a consolidated terminal sketch follows this list).

  • Initialize Omega.x by running ./HapticDesk in a terminal under Demonstrations/ws_forcedimension/src/forcedimension_ros2/fd_hardware/external/fd_sdk/bin

  • Open two terminals in the ws_forcedimension workspace and source it in each
    source install/setup.bash

  • Run the driver in one terminal
    ros2 launch fd_bringup fd.launch.py

  • Publish end-effector position data in the other terminal
    ros2 run tcp_socket ee_topic_sub
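
A consolidated sketch of the terminal layout for the steps above (the relative paths are taken from this list, and deactivating conda reflects the build note; adjust to your checkout):

    # Terminal 1: build with the local Python environment (not conda), then start the Omega.x driver binary
    conda deactivate   # only if a conda environment is currently active
    cd Demonstrations/ws_forcedimension
    colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release --symlink-install
    cd src/forcedimension_ros2/fd_hardware/external/fd_sdk/bin && ./HapticDesk

    # Terminal 2: source the workspace and launch the fd driver
    cd Demonstrations/ws_forcedimension && source install/setup.bash
    ros2 launch fd_bringup fd.launch.py

    # Terminal 3: source the workspace and publish end-effector position data
    cd Demonstrations/ws_forcedimension && source install/setup.bash
    ros2 run tcp_socket ee_topic_sub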

  2. Start a demonstration task
    Refer to /root/RoboLearn/Demonstrations/launch/run.sh
    python collect_human_demonstrations.py --robots IIWA --environment Lift --device omega

  3. Merge demonstrations (demo = demo1 + demo2 ...)
    python demonstration_merge.py --merge_directory collect_demonstration/Lift/IIWA_OSC_POSE

  4. Convert data (demo -> pkl)
    python demonstration_transition.py --dataset_type robosuite_demo.hdf5 --output_dir iqlearn_demonstrations --dataset_path Lift/IIWA_OSC_POSE
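
Before training, the converted file can be sanity-checked with a short inline script. The file name below is an assumption based on the --output_dir used above (the pickle layout is not documented here, so only generic structure information is printed):

    # The .pkl name is an assumption; substitute the file actually written to iqlearn_demonstrations/
    PKL=iqlearn_demonstrations/robosuite_Lift_IIWA_90.pkl
    python -c "import pickle, sys; d = pickle.load(open(sys.argv[1], 'rb')); print(type(d)); print(list(d.keys()) if isinstance(d, dict) else len(d))" "$PKL"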

Train and Evaluate Agent

  1. Train a CIQL agent. Refer to /root/RoboLearn/Confidence-based-IQ-Learn/run_confidence.sh
    Variants: IQ-Learn (IQ), CIQL-A (max_lamb) and CIQL-E (conf_expert); a sweep sketch over the three variants follows this list.
    python train_iq_dyrank.py env=robosuite_Lift_IIWA env.demo=robosuite_Lift_IIWA_better_worse_failed_90.pkl agent=sac agent.actor_lr=5e-06 agent.critic_lr=5e-06 agent.init_temp=0.001 expert.demos=90 seed=1 train.boundary_angle=30 C_aware.conf_learn=max_lamb

  2. Evaluate the CIQL agent
    python test_iq_dyrank.py env=robosuite_Lift_IIWA agent=sac env.has_renderer=False eval.policy=xxx

  3. Evaluate demonstrations using the reward function recovered by CIQL
    python test_iq_reward.py env=robosuite_Lift_IIWA env.demo=robosuite_Lift_IIWA_50.pkl expert.demos=50 agent=sac eval.policy=xxx
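
The three training variants in step 1 differ in the C_aware.conf_learn flag; the values IQ, max_lamb and conf_expert are taken from the variant names above, so treat this as a sketch rather than the repository's documented sweep (run_confidence.sh remains the reference):

    # Train IQ-Learn (IQ), CIQL-A (max_lamb) and CIQL-E (conf_expert) with the step-1 hyperparameters
    for conf in IQ max_lamb conf_expert; do
        python train_iq_dyrank.py env=robosuite_Lift_IIWA \
            env.demo=robosuite_Lift_IIWA_better_worse_failed_90.pkl \
            agent=sac agent.actor_lr=5e-06 agent.critic_lr=5e-06 agent.init_temp=0.001 \
            expert.demos=90 seed=1 train.boundary_angle=30 \
            C_aware.conf_learn="$conf"
    done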

Acknowledgement

Thanks to the authors of IQ-Learn for their work and sharing!
The code structure is based on the repo IQ-Learn.
Details of the Omega device driver and its ROS2 workspace can be found in ICube-Robotics/forcedimension_ros2.
