
Agentic Learning Powered by AWorld

arXiv(V2P) | arXiv(RAG-R1) | arXiv(FunReason) | arXiv(EnvTuning) | arXiv(FunReason-MT)

🤗 Paper(V2P) | 🤗 Paper(RAG-R1) | 🤗 Paper(FunReason) | 🤗 Paper(EnvTuning) | 🤗 Paper(FunReason-MT)


📣 News

[2025/10/29] 🔥🔥🔥 FunReason-MT: We propose FunReason-MT, a novel data synthesis framework that addresses critical bottlenecks in multi-turn Function Calling (FC) data generation and achieves excellent performance on complex agentic tasks.

[2025/10/22] 🔥🔥🔥 EnvTuning: We propose Environment Tuning, a novel training paradigm that enables agents to learn complex multi-turn tool-use behaviors through environmental interaction rather than trajectory imitation, achieving significant improvements with only 400 training samples.

[2025/08/19] 🔥🔥🔥 V2P: We propose V2P, a novel training method for multi-modal models that enables coordinate-free, human-like visual GUI grounding.

[2025/07/01] 🔥🔥🔥 RAG-R1: We propose RAG-R1, a deepsearch training framework that incentivizes the search and reasoning capabilities of LLMs through multi-query parallelism.

[2025/05/16] 🔥🔥🔥 FunReason: We propose FunReason, a novel framework that enhances LLMs' function calling capabilities through an automated data refinement strategy and a Self-Refinement Multiscale Loss.

📖 Introduction

AWorld-RL is a comprehensive collection of cutting-edge agentic reinforcement learning algorithms developed by the AWorld Team. Built upon the AWorld Framework, this repository provides complete codebases, datasets, and checkpoints for training and evaluating autonomous agents that learn through multi-turn interactions with dynamic environments.

Our work focuses on enabling agents to effectively leverage environmental feedback for complex problem-solving across diverse domains, including multi-modal understanding, deep search, and function calling.

Figure: the AgenticLearning framework.

🚀 Projects

FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling
Authors: Zengzhuang Xu, Bingguang Hao, Zechuan Wang et al.
arXiv | Model | Dataset

Don't Just Fine-tune the Agent, Tune the Environment
Authors: Siyuan Lu, Zechuan Wang, Hongxuan Zhang, Qintong Wu, Leilei Gan, Chenyi Zhuang, Jinjie Gu, Tao Lin
arXiv | Model

V2P: From Background Suppression to Center Peaking for Robust GUI Grounding
Authors: Jikai Chen, Long Chen, Dong Wang, Leilei Gan, Chenyi Zhuang, Jinjie Gu
arXiv | Paper | Model

RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
Authors: Zhiwen Tan, Jiaming Huang, Qintong Wu, Hongxuan Zhang, Chenyi Zhuang, Jinjie Gu
arXiv | Model

FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement
Authors: Bingguang Hao, Maolin Wang, Zengzhuang Xu, Cunyin Peng, Yicheng Chen, Xiangyu Zhao, Jinjie Gu, Chenyi Zhuang
arXiv | Model

📚 Overview

Table of Contents

  • Multi-Modal
  • Deepsearch
  • Tool Use

Multi-Modal

  • Tools: PyAutoGUI Tools
  • LLM: Qwen2.5-7B-Instruct
Figure: Overall framework of V2P.

Figure: Performance on both ScreenSpot-v2 (left) and ScreenSpot-Pro (right).
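
To make the "background suppression to center peaking" idea concrete, the sketch below builds a center-peaked Gaussian supervision map over a GUI element, so the model learns to light up the element's center rather than regress raw coordinates. This is a minimal re-implementation of the general idea, not the paper's exact loss; the map size, element box, and `sigma_scale` below are illustrative assumptions.

```python
import numpy as np

def center_peaked_target(h, w, box, sigma_scale=0.25):
    """Heatmap that is ~0 on the background and peaks at the center
    of the target GUI element (a Gaussian bump inside `box`).

    box: (x0, y0, x1, y1) element bounds in pixels (hypothetical input).
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    # Spread the peak in proportion to the element size (an assumption).
    sx = max((x1 - x0) * sigma_scale, 1.0)
    sy = max((y1 - y0) * sigma_scale, 1.0)
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 / (2 * sx ** 2)
                    + (ys - cy) ** 2 / (2 * sy ** 2)))

# Grounding then acts at the argmax of the predicted map, e.g. via the
# PyAutoGUI tools listed above: pyautogui.click(x, y).
pred = center_peaked_target(1080, 1920, box=(100, 200, 300, 260))
y, x = np.unravel_index(pred.argmax(), pred.shape)
```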

Deepsearch

  • Tools: Search Engines (offline or online)
  • LLM: Qwen2.5-7B-Instruct
Figure: Overall framework of RAG-R1.

Figure: Performance comparisons on QA benchmarks under the EM metric; the best and second-best results are bold and underlined, respectively.
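
The core rollout trick, multi-query parallelism, fits in a few lines: in each reasoning step the model emits several search queries at once, and all of them are retrieved concurrently before reasoning resumes. In the sketch below, the `search` stub and the `top_k` merging are assumptions standing in for whatever offline or online engine is wired in.

```python
from concurrent.futures import ThreadPoolExecutor

def search(query: str) -> list[str]:
    """Stub for an offline or online search engine call (assumption)."""
    raise NotImplementedError

def parallel_retrieve(queries: list[str], top_k: int = 3) -> str:
    """Answer all queries from one reasoning step concurrently and merge
    the evidence into a single context block for the next LLM turn."""
    with ThreadPoolExecutor(max_workers=max(len(queries), 1)) as pool:
        results = list(pool.map(search, queries))
    docs = [doc for hits in results for doc in hits[:top_k]]
    return "\n\n".join(docs)

# e.g. one turn may emit ["who founded X", "when was X founded"]; both
# are searched in parallel instead of costing one turn each.
```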

Tool Use

FunReason-MT

  • Tools: Multi-turn Tool Use (BFCLv3 Benchmark)
  • LLM: Qwen3-4B-Instruct-2507

Key Highlights:
  • State-of-the-Art Performance: A 4B model trained on FunReason-MT data achieves state-of-the-art results among similarly sized models on the Berkeley Function-Calling Leaderboard (BFCLv3) Multi-Turn benchmark.
  • Outperforms Closed-Source Models: The RL-trained FunReason-MT 4B model surpasses most leading closed-source models (e.g., GPT-5, Gemini-2.5-Pro, Claude-Sonnet-4) and open-source models (e.g., DeepSeek-R1) in Multi-Turn evaluation.
  • Robust Framework: The pipeline addresses three structural challenges in data generation: Targeted Model Training, Isolation of Tool Architecture, and Multi-Turn Logical Dependency.
  • Agentic Generalization: The model shows promising out-of-distribution generalization and improved agentic capability on the BFCLv4 benchmark (Web Search and Memory tasks).

🔬 Methodology: The FunReason-MT Framework

The framework tackles complexity and reliability challenges by breaking the data generation process into three core phases:

| Phase | Core Component | Challenge Addressed | Description |
| --- | --- | --- | --- |
| Phase I | Environment-API Graph Interactions | Targeted Model Training | Samples tool calls using a Directed Sampler to efficiently collect multi-turn trajectories centered around a target complex tool ($T_a$). |
| Phase II | Advanced Tool-Query Synthesis | Isolation of Tool Architecture | A Tooling Agent abstracts the multi-step execution trace into a single Advanced Tool ($T_{adv}$). A Querying Agent then reverse-engineers a challenging Hard Query ($Q_{hard}$) requiring this abstraction. |
| Phase III | Guided Iterative Chain | Multi-Turn Logical Dependency | A Reasoning Agent attempts to solve $Q_{hard}$. A Critiquing Agent analyzes failures and provides targeted, corrective feedback, creating an iterative self-correction loop to enforce CoT accuracy. |
Figure: the FunReason-MT data synthesis pipeline.
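
The control flow of the three phases condenses into a short sketch. All of the agent objects, their method names, and the `max_rounds` bound below are hypothetical stand-ins for components the report describes; this shows the loop structure, not the released implementation.

```python
def synthesize_sample(env_api_graph, target_tool, tooling_agent,
                      querying_agent, reasoning_agent, critiquing_agent,
                      max_rounds=4):
    # Phase I: directed sampling of multi-turn trajectories around T_a.
    trajectory = env_api_graph.directed_sample(target=target_tool)

    # Phase II: abstract the trace into one Advanced Tool T_adv, then
    # reverse-engineer a Hard Query Q_hard that requires it.
    t_adv = tooling_agent.abstract(trajectory)
    q_hard = querying_agent.reverse_engineer(t_adv)

    # Phase III: guided iterative chain; solve, critique, retry until
    # the chain of thought passes review.
    answer = reasoning_agent.solve(q_hard)
    for _ in range(max_rounds):
        feedback = critiquing_agent.review(q_hard, answer, trajectory)
        if feedback.passed:
            break
        answer = reasoning_agent.solve(q_hard, feedback=feedback.hint)
    return q_hard, answer  # one verified multi-turn FC training sample
```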

📈 Experimental Results (BFCL Leaderboard)

The model achieves state-of-the-art performance, particularly after applying Reinforcement Learning (RL) on the synthesized data.

BFCLv3 Multi-Turn and Single-Turn Performance

| Model (4B–235B) | Multi-Turn (Overall) | Single-Turn (Overall) |
| --- | --- | --- |
| Qwen3-4B-Instruct (Base) | 15.75 | 78.19 |
| Qwen3-4B + FunReason-MT (RL) | 56.50 | 85.02 |
| Claude-Sonnet-4-20250514 | 54.75 | 84.72 |
| DeepSeek-R1-0528 | 44.50 | 78.22 |
| GPT-4o-2024-11-20 | 42.50 | 77.21 |

BFCL Agentic Evaluation (BFCLv4 OOD)

The FunReason-MT trained model leads in out-of-distribution agentic tasks (Web Search and Memory).

| Model | BFCLv4 Overall Score |
| --- | --- |
| FunReason-MT-4B (RL) | 15.10 |
| ToolACE-2-8B | 14.83 |
| BitAgent-8B | 8.24 |
| XLAM-2-3b-fc-r | 7.42 |
| watt-tool-8B | 6.30 |

EnvTuning

  • Tools: Multi-turn Tool Use (BFCL Benchmark)
  • LLM: Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, watt-tool-8B

Training agents for complex multi-turn tool-use tasks faces critical challenges: extreme scarcity of high-quality training data, overfitting when supervised fine-tuning (SFT) is run on synthetic data, and cold-start instability in standard reinforcement learning. Environment Tuning addresses these challenges with a novel training paradigm in which agents learn complex behaviors through environmental interaction rather than trajectory imitation, even with minimal data.

Figure: Limitations of existing paradigms (SFT overfitting and standard RL cold-start) versus the Environment Tuning approach.

Figure: The four-stage curriculum learning pipeline with actionable environment augmentation and fine-grained progress rewards.

Figure: With only 400 training samples, Environment Tuning achieves significant improvements on BFCL V3.
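
A fine-grained progress reward replaces the sparse pass/fail signal with credit for each milestone the agent reaches, which is what makes RL tractable from a cold start. The milestone bookkeeping and the 50/50 weighting below are illustrative assumptions, a minimal sketch rather than the paper's reward.

```python
def progress_reward(completed_steps: list[str],
                    required_steps: list[str],
                    task_solved: bool) -> float:
    """Dense reward in [0, 1]: fraction of required tool-call milestones
    reached so far, plus a bonus for actually finishing the task."""
    hit = sum(step in completed_steps for step in required_steps)
    progress = hit / max(len(required_steps), 1)
    return 0.5 * progress + 0.5 * float(task_solved)

# A partially correct rollout still earns signal:
# progress_reward(["login", "search"], ["login", "search", "book"], False)
# -> 0.333..., where a sparse outcome reward would give 0.0.
```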

FunReason

  • Tools: Real Human Function Calling (BFCLv2 live & non-live)
  • LLM: Qwen2.5-Coder-7B-Instruct

FunReason is a framework designed to enhance LLMs' function calling capabilities. It achieves GPT-4o-comparable performance on BFCL, surpasses RL-based methods, mitigates catastrophic forgetting on HumanEval and MBPP, and uses a data refinement strategy in which natural CoT data outperforms artificially constructed data.

Figure: Data refinement pipeline of FunReason.

Overview of FunReason's data refinement pipeline. The pipeline consists of five stages: Function Call Classification, Query and Tool Identification, CoT Identification, Function and Parameter Identification, and Format Identification. Each stage checks a specific aspect of data quality, and failing examples are either discarded or regenerated.
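
Viewed as code, the pipeline is a cascade of per-stage checks with a regenerate-or-discard policy. The stage names and retry budget below are hypothetical; this is a sketch of the flow described above, not the released implementation.

```python
# Hypothetical per-stage checks mirroring the five stages above.
STAGES = [
    "function_call_classification",
    "query_and_tool_identification",
    "cot_identification",
    "function_and_parameter_identification",
    "format_identification",
]

def refine(sample, checks, regenerate, max_retries=2):
    """Pass a sample through every stage check in order; on any failure,
    regenerate it up to `max_retries` times, then discard it."""
    for _ in range(max_retries + 1):
        if all(checks[name](sample) for name in STAGES):
            return sample  # passed every stage: keep for training
        sample = regenerate(sample)
    return None  # discarded
```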

Figure: Performance of FunReason.

Citation

Please cite our work if it is helpful for your research.

@article{xu2025funreason,
  title={FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling},
  author={Xu, Zengzhuang and Hao, Bingguang and Wang, Zechuan and Wen, Yuntao and Wang, Maolin and Liu, Yang and Chen, Long and Wang, Dong and Chen, Yicheng and Peng, Cunyin and Zhuang, Chenyi and Gu, Jinjie and Zhao, Xiangyu and Gu, Shi},
  journal={arXiv preprint arXiv:2510.24645},
  year={2025}
}

@article{lu2025don,
  title={Don't Just Fine-tune the Agent, Tune the Environment},
  author={Lu, Siyuan and Wang, Zechuan and Zhang, Hongxuan and Wu, Qintong and Gan, Leilei and Zhuang, Chenyi and Gu, Jinjie and Lin, Tao},
  journal={arXiv preprint arXiv:2510.10197},
  year={2025}
}

@article{chen2025v2p,
  title={V2P: From Background Suppression to Center Peaking for Robust GUI Grounding Task},
  author={Chen, Jikai and Chen, Long and Wang, Dong and Gan, Leilei and Zhuang, Chenyi and Gu, Jinjie},
  journal={arXiv preprint arXiv:2508.13634},
  year={2025}
}

@article{tan2025rag,
  title={RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism},
  author={Tan, Zhiwen and Huang, Jiaming and Wu, Qintong and Zhang, Hongxuan and Zhuang, Chenyi and Gu, Jinjie},
  journal={arXiv preprint arXiv:2507.02962},
  year={2025}
}

@article{hao2025funreason,
  title={FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement},
  author={Hao, Bingguang and Wang, Maolin and Xu, Zengzhuang and Peng, Cunyin and Chen, Yicheng and Zhao, Xiangyu and Gu, Jinjie and Zhuang, Chenyi},
  journal={arXiv preprint arXiv:2505.20192},
  year={2025}
}

📞 Contact

For any questions or feedback, please reach out to us at [email protected] or [email protected].

License

This project is licensed under the MIT License.
