
CognitiveAISystems/SuperIgor


Self-Guided Plan Extraction for Instruction-Following Tasks with Goal-Conditional Reinforcement Learning

SuperIgor (Self-Guided Plan Extraction for Instruction-Following Tasks with Goal-Conditional Reinforcement Learning)

SuperIgor is a framework for instruction-following tasks that combines large language models (LLMs) with goal-conditional reinforcement learning (RL). Unlike prior approaches that depend on predefined subtasks, SuperIgor enables an LLM to generate and refine high-level plans through self-learning. These plans guide an RL agent, which in turn provides feedback to improve future plan generation.

This forms an iterative feedback loop:

  • 📋 Plan Generation – The LLM proposes multiple structured action plans for each instruction.
  • 🎮 Policy Learning – The RL agent (trained with PPO) learns to execute these plans.
  • Plan Validation – Candidate plans are evaluated based on execution success.
  • 🔁 LLM Fine-Tuning – The LLM is refined with Direct Preference Optimization (DPO), aligning plan scores with the agent’s actual performance.
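The four stages above can be sketched as one round of a toy loop. This is a minimal illustration under stated assumptions, not the repository's code: `propose_plans`, `run_episode`, `validate`, and `feedback_round` are hypothetical names, the subgoal strings are invented, and the "agent" is a random stand-in whose success rate falls as plans get longer.

```python
import random
from typing import List, Tuple

Plan = List[str]  # a plan is an ordered list of subgoal strings


def propose_plans(instruction: str, n: int) -> List[Plan]:
    """Stand-in for the LLM (hypothetical): return n candidate plans."""
    subgoals = ["collect wood", "make table", "make pickaxe", "collect stone"]
    return [subgoals[: k + 1] for k in range(n)]


def run_episode(plan: Plan, rng: random.Random) -> bool:
    """Stand-in for the RL agent (toy): longer plans succeed less often."""
    return rng.random() < 1.0 / len(plan)


def validate(plans: List[Plan], episodes: int, rng: random.Random) -> List[float]:
    """Estimate each plan's success rate over a fixed number of episodes."""
    return [
        sum(run_episode(plan, rng) for _ in range(episodes)) / episodes
        for plan in plans
    ]


def feedback_round(
    instruction: str, n_plans: int = 3, episodes: int = 50, seed: int = 0
) -> List[Tuple[Plan, float]]:
    rng = random.Random(seed)
    plans = propose_plans(instruction, n_plans)  # 1. plan generation
    # 2. policy learning would update the agent here (omitted in this toy)
    rates = validate(plans, episodes, rng)       # 3. plan validation
    # 4. the ranking is what a fine-tuning step would consume
    return sorted(zip(plans, rates), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for plan, rate in feedback_round("build a stone pickaxe"):
        print(f"{rate:.2f}  {plan}")
```

In the real system the ranking would come from the trained agent's rollouts rather than a random draw; the sketch only shows how validation results flow back toward the plan generator.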

✨ Key Features

  • 🔹 Self-supervised plan generation — no manual annotation required
  • 🔹 Curriculum training to overcome sparse rewards
  • 🔹 Strong generalization to unseen or paraphrased instructions
  • 🔹 Implemented on the CrafText benchmark (a Minecraft-like environment)

Agent Trajectory Examples

[Figure: three example agent trajectories (Trajectory #1, Trajectory #2, Trajectory #3)]

🧱 Environment Setup

# 1) Create conda environment
conda env create -f environment.yml
conda activate super-igor

# 2) Install CrafText from source
git clone https://github.com/AIRI-Institute/CrafText.git
pip install -e ./CrafText

# 3) Install SuperIgor package
pip install -e .

🚀 Running SuperIgor Experiments

Follow these steps to run experiments with SuperIgor:

1. ▶️ Run dataset generation

bash ./super_scripts/sh_scripts/train/generate_skills.sh    

2. ▶️ Run policy training

bash ./super_scripts/sh_scripts/train/run_curriculum.sh 

3. ▶️ Run validation

  • Evaluate generated plans (one instruction → one plan):
bash ./super_scripts/sh_scripts/train/run_validation.sh

4. ▶️ Run LLM training

  • Train the LLM. For DPO training, add --optimizer_type dpo_external; for SFT training, add --optimizer_type sft:
bash ./super_scripts/sh_scripts/train/run_llm_training.sh
  • Generate the DPO dataset:
bash ./super_scripts/sh_scripts/train/make_dpo_dataset.sh
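To make the idea of a DPO dataset concrete, the sketch below turns per-plan success rates into preference pairs. It is an assumption-laden illustration, not what `make_dpo_dataset.sh` does: the function name `build_preference_pairs`, the `min_gap` threshold, and the `prompt`/`chosen`/`rejected` record layout (a common convention for preference data) are all choices made here for the example.

```python
from itertools import combinations
from typing import Dict, List


def build_preference_pairs(
    instruction: str, plan_success: Dict[str, float], min_gap: float = 0.1
) -> List[Dict[str, str]]:
    """Pair up plans for one instruction; the more successful plan is 'chosen'.

    plan_success maps a plan (as text) to its measured success rate.
    Pairs whose rates differ by less than min_gap are dropped as near-ties.
    """
    records = []
    for (plan_a, rate_a), (plan_b, rate_b) in combinations(plan_success.items(), 2):
        if abs(rate_a - rate_b) < min_gap:
            continue  # too close to call a preference
        if rate_a > rate_b:
            chosen, rejected = plan_a, plan_b
        else:
            chosen, rejected = plan_b, plan_a
        records.append(
            {"prompt": instruction, "chosen": chosen, "rejected": rejected}
        )
    return records


if __name__ == "__main__":
    rates = {"plan A": 0.9, "plan B": 0.5, "plan C": 0.45}
    for record in build_preference_pairs("build a stone pickaxe", rates):
        print(record)
```

Dropping near-ties is one possible design choice: it keeps noisy success estimates from producing contradictory preferences, at the cost of fewer training pairs.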

5. ▶️ Run NLL scoring

bash ./super_scripts/sh_scripts/train/run_nll_eval.sh
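For convenience, the steps above could be chained from a small Python driver. The script paths are taken from this README; the helper names (`build_commands`, `run_pipeline`), the dry-run default, and running the stages in the order they are listed are assumptions of this sketch, so reorder or trim `STAGES` to match your workflow.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence

SCRIPT_DIR = Path("super_scripts/sh_scripts/train")

# Script names as listed in the steps above; the ordering is an assumption.
STAGES = (
    "generate_skills.sh",
    "run_curriculum.sh",
    "run_validation.sh",
    "run_llm_training.sh",
    "make_dpo_dataset.sh",
    "run_nll_eval.sh",
)


def build_commands(stages: Sequence[str] = STAGES) -> List[List[str]]:
    """Return one `bash <script>` argument list per stage."""
    return [["bash", (SCRIPT_DIR / name).as_posix()] for name in stages]


def run_pipeline(dry_run: bool = True) -> None:
    """Print each command; execute it too when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing stage
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Run it from the repository root so the relative script paths resolve; call `run_pipeline(dry_run=False)` to actually execute the stages.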
