
Agentic Learning Powered by AWorld

arXiv(V2P) | arXiv(RAG-R1) | arXiv(FunReason) | arXiv(EnvTuning) | arXiv(FunReason-MT)

🤗 Paper(V2P) | 🤗 Paper(RAG-R1) | 🤗 Paper(FunReason) | 🤗 Paper(EnvTuning) | 🤗 Paper(FunReason-MT)


📣 News

[2025/10/29] 🔥🔥🔥 FunReason-MT: We propose FunReason-MT, a novel data synthesis framework that addresses critical bottlenecks in multi-turn Function Calling (FC) data generation and achieves excellent performance on complex agentic tasks.

[2025/10/22] 🔥🔥🔥 EnvTuning: We propose Environment Tuning, a novel training paradigm that enables agents to learn complex multi-turn tool-use behaviors through environmental interaction rather than trajectory imitation, achieving significant improvements with only 400 training samples.

[2025/08/19] 🔥🔥🔥 V2P: We propose V2P, a novel training method for multi-modal models that enables coordinate-free, human-like visual GUI grounding.

[2025/07/01] 🔥🔥🔥 RAG-R1: We propose RAG-R1, a deepsearch training framework that incentivizes the search and reasoning capabilities of LLMs through multi-query parallelism.

[2025/05/16] 🔥🔥🔥 FunReason: We propose FunReason, a novel framework that enhances LLMs' function calling capabilities through an automated data refinement strategy and a Self-Refinement Multiscale Loss.

📖 Introduction

AWorld-RL is a comprehensive collection of cutting-edge agentic reinforcement learning algorithms developed by the AWorld Team. Built upon the AWorld Framework, this repository provides complete codebases, datasets, and checkpoints for training and evaluating autonomous agents that learn through multi-turn interactions with dynamic environments.

Our work focuses on enabling agents to effectively leverage environmental feedback for complex problem-solving across diverse domains, including multi-modal understanding, deep search, and function calling.

Figure: the AgenticLearning framework.

🚀 Projects

FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling
Authors: Zengzhuang Xu, Bingguang Hao, Zechuan Wang et al.
arXiv | Model | Dataset

Don't Just Fine-tune the Agent, Tune the Environment
Authors: Siyuan Lu, Zechuan Wang, Hongxuan Zhang, Qintong Wu, Leilei Gan, Chenyi Zhuang, Jinjie Gu, Tao Lin
arXiv | Model

V2P: From Background Suppression to Center Peaking for Robust GUI Grounding
Authors: Jikai Chen, Long Chen, Dong Wang, Leilei Gan, Chenyi Zhuang, Jinjie Gu
arXiv | Paper | Model

RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
Authors: Zhiwen Tan, Jiaming Huang, Qintong Wu, Hongxuan Zhang, Chenyi Zhuang, Jinjie Gu
arXiv | Model

FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement
Authors: Bingguang Hao, Maolin Wang, Zengzhuang Xu, Cunyin Peng, Yicheng Chen, Xiangyu Zhao, Jinjie Gu, Chenyi Zhuang
arXiv | Model

📚 Overview

Table of Contents

  • Multi-Modal
  • Deepsearch
  • Tool Use

Multi-Modal

  • Tools: PyAutoGUI Tools
  • LLM: Qwen2.5-7B-Instruct
Figure: Overall framework of V2P.

Figure: Performance on both ScreenSpot-v2 (left) and ScreenSpot-Pro (right).
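
To make the "background suppression to center peaking" idea concrete, the sketch below builds a center-peaked Gaussian supervision map over a GUI element, so the model learns to light up the element's center rather than regress raw coordinates. This is a minimal re-implementation of the general idea, not the paper's exact loss; the map size, element box, and `sigma_scale` below are illustrative assumptions.

```python
import numpy as np

def center_peaked_target(h, w, box, sigma_scale=0.25):
    """Heatmap that is ~0 on the background and peaks at the center
    of the target GUI element (a Gaussian bump inside `box`).

    box: (x0, y0, x1, y1) element bounds in pixels (hypothetical input).
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    # Spread the peak in proportion to the element size (an assumption).
    sx = max((x1 - x0) * sigma_scale, 1.0)
    sy = max((y1 - y0) * sigma_scale, 1.0)
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 / (2 * sx ** 2)
                    + (ys - cy) ** 2 / (2 * sy ** 2)))

# Grounding then acts at the argmax of the predicted map, e.g. via the
# PyAutoGUI tools listed above: pyautogui.click(x, y).
pred = center_peaked_target(1080, 1920, box=(100, 200, 300, 260))
y, x = np.unravel_index(pred.argmax(), pred.shape)
```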

Deepsearch

  • Tools: Search Engines (offline or online)
  • LLM: Qwen2.5-7B-Instruct
Figure: Overall framework of RAG-R1.

Figure: Performance comparisons on QA benchmarks under the EM metric; the best and second-best results are bold and underlined, respectively.
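
The core rollout trick, multi-query parallelism, fits in a few lines: in each reasoning step the model emits several search queries at once, and all of them are retrieved concurrently before reasoning resumes. In the sketch below, the `search` stub and the `top_k` merging are assumptions standing in for whatever offline or online engine is wired in.

```python
from concurrent.futures import ThreadPoolExecutor

def search(query: str) -> list[str]:
    """Stub for an offline or online search engine call (assumption)."""
    raise NotImplementedError

def parallel_retrieve(queries: list[str], top_k: int = 3) -> str:
    """Answer all queries from one reasoning step concurrently and merge
    the evidence into a single context block for the next LLM turn."""
    with ThreadPoolExecutor(max_workers=max(len(queries), 1)) as pool:
        results = list(pool.map(search, queries))
    docs = [doc for hits in results for doc in hits[:top_k]]
    return "\n\n".join(docs)

# e.g. one turn may emit ["who founded X", "when was X founded"]; both
# are searched in parallel instead of costing one turn each.
```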

Tool Use

FunReason-MT

  • Tools: Multi-turn Tool Use (BFCLv3 Benchmark)
  • LLM: Qwen3-4B-Instruct-2507

Key Highlights:
  • State-of-the-Art Performance: A 4B model trained on FunReason-MT data achieves state-of-the-art results among similarly sized models on the Berkeley Function-Calling Leaderboard (BFCLv3) Multi-Turn benchmark.
  • Outperforms Closed-Source Models: The RL-trained FunReason-MT 4B model surpasses most leading closed-source models (e.g., GPT-5, Gemini-2.5-Pro, Claude-Sonnet-4) and open-source models (e.g., DeepSeek-R1) in Multi-Turn evaluation.
  • Robust Framework: The pipeline addresses three structural challenges in data generation: Targeted Model Training, Isolation of Tool Architecture, and Multi-Turn Logical Dependency.
  • Agentic Generalization: The model shows promising out-of-distribution generalization and improved agentic capability on the BFCLv4 benchmark (Web Search and Memory tasks).

🔬 Methodology: The FunReason-MT Framework

The framework tackles complexity and reliability challenges by breaking the data generation process into three core phases:

| Phase | Core Component | Challenge Addressed | Description |
| --- | --- | --- | --- |
| Phase I | Environment-API Graph Interactions | Targeted Model Training | Samples tool calls using a Directed Sampler to efficiently collect multi-turn trajectories centered around a target complex tool ($T_a$). |
| Phase II | Advanced Tool-Query Synthesis | Isolation of Tool Architecture | A Tooling Agent abstracts the multi-step execution trace into a single Advanced Tool ($T_{adv}$). A Querying Agent then reverse-engineers a challenging Hard Query ($Q_{hard}$) requiring this abstraction. |
| Phase III | Guided Iterative Chain | Multi-Turn Logical Dependency | A Reasoning Agent attempts to solve $Q_{hard}$. A Critiquing Agent analyzes failures and provides targeted, corrective feedback, creating an iterative self-correction loop to enforce CoT accuracy. |
Figure: the FunReason-MT data synthesis pipeline.
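
The control flow of the three phases condenses into a short sketch. All of the agent objects, their method names, and the `max_rounds` bound below are hypothetical stand-ins for components the report describes; this shows the loop structure, not the released implementation.

```python
def synthesize_sample(env_api_graph, target_tool, tooling_agent,
                      querying_agent, reasoning_agent, critiquing_agent,
                      max_rounds=4):
    # Phase I: directed sampling of multi-turn trajectories around T_a.
    trajectory = env_api_graph.directed_sample(target=target_tool)

    # Phase II: abstract the trace into one Advanced Tool T_adv, then
    # reverse-engineer a Hard Query Q_hard that requires it.
    t_adv = tooling_agent.abstract(trajectory)
    q_hard = querying_agent.reverse_engineer(t_adv)

    # Phase III: guided iterative chain; solve, critique, retry until
    # the chain of thought passes review.
    answer = reasoning_agent.solve(q_hard)
    for _ in range(max_rounds):
        feedback = critiquing_agent.review(q_hard, answer, trajectory)
        if feedback.passed:
            break
        answer = reasoning_agent.solve(q_hard, feedback=feedback.hint)
    return q_hard, answer  # one verified multi-turn FC training sample
```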

📈 Experimental Results (BFCL Leaderboard)

The model achieves state-of-the-art performance, particularly after applying Reinforcement Learning (RL) on the synthesized data.

BFCLv3 Multi-Turn and Single-Turn Performance

| Model (4B–235B) | Multi-Turn (Overall) | Single-Turn (Overall) |
| --- | --- | --- |
| Qwen3-4B-Instruct (Base) | 15.75 | 78.19 |
| Qwen3-4B + FunReason-MT (RL) | 56.50 | 85.02 |
| Claude-Sonnet-4-20250514 | 54.75 | 84.72 |
| DeepSeek-R1-0528 | 44.50 | 78.22 |
| GPT-4o-2024-11-20 | 42.50 | 77.21 |

BFCL Agentic Evaluation (BFCLv4 OOD)

The FunReason-MT trained model leads in out-of-distribution agentic tasks (Web Search and Memory).

| Model | BFCLv4 Overall Score |
| --- | --- |
| FunReason-MT-4B (RL) | 15.10 |
| ToolACE-2-8B | 14.83 |
| BitAgent-8B | 8.24 |
| XLAM-2-3b-fc-r | 7.42 |
| watt-tool-8B | 6.30 |

EnvTuning

  • Tools: Multi-turn Tool Use (BFCL Benchmark)
  • LLM: Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, watt-tool-8B

Training agents for complex multi-turn tool-use tasks faces critical challenges: extreme scarcity of high-quality training data, overfitting when supervised fine-tuning (SFT) is run on synthetic data, and cold-start instability in standard reinforcement learning. Environment Tuning addresses these challenges with a novel training paradigm in which agents learn complex behaviors through environmental interaction rather than trajectory imitation, even with minimal data.

Figure: Limitations of existing paradigms (SFT overfitting and standard RL cold-start) versus the Environment Tuning approach.

Figure: The four-stage curriculum learning pipeline with actionable environment augmentation and fine-grained progress rewards.

Figure: With only 400 training samples, Environment Tuning achieves significant improvements on BFCL V3.
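
A fine-grained progress reward replaces the sparse pass/fail signal with credit for each milestone the agent reaches, which is what makes RL tractable from a cold start. The milestone bookkeeping and the 50/50 weighting below are illustrative assumptions, a minimal sketch rather than the paper's reward.

```python
def progress_reward(completed_steps: list[str],
                    required_steps: list[str],
                    task_solved: bool) -> float:
    """Dense reward in [0, 1]: fraction of required tool-call milestones
    reached so far, plus a bonus for actually finishing the task."""
    hit = sum(step in completed_steps for step in required_steps)
    progress = hit / max(len(required_steps), 1)
    return 0.5 * progress + 0.5 * float(task_solved)

# A partially correct rollout still earns signal:
# progress_reward(["login", "search"], ["login", "search", "book"], False)
# -> 0.333..., where a sparse outcome reward would give 0.0.
```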

FunReason

  • Tools: Real Human Function Calling (BFCLv2 live & non-live)
  • LLM: Qwen2.5-Coder-7B-Instruct

FunReason is a framework designed to enhance LLMs' function calling capabilities. It achieves GPT-4o-comparable performance on BFCL, surpasses RL-based methods, mitigates catastrophic forgetting on HumanEval and MBPP, and uses a data refinement strategy in which natural CoT data outperforms artificially constructed data.

Figure: Data refinement pipeline of FunReason.

Overview of FunReason's data refinement pipeline. The pipeline consists of five stages: Function Call Classification, Query and Tool Identification, CoT Identification, Function and Parameter Identification, and Format Identification. Each stage checks a specific aspect of data quality, and failing examples are either discarded or regenerated.
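
Viewed as code, the pipeline is a cascade of per-stage checks with a regenerate-or-discard policy. The stage names and retry budget below are hypothetical; this is a sketch of the flow described above, not the released implementation.

```python
# Hypothetical per-stage checks mirroring the five stages above.
STAGES = [
    "function_call_classification",
    "query_and_tool_identification",
    "cot_identification",
    "function_and_parameter_identification",
    "format_identification",
]

def refine(sample, checks, regenerate, max_retries=2):
    """Pass a sample through every stage check in order; on any failure,
    regenerate it up to `max_retries` times, then discard it."""
    for _ in range(max_retries + 1):
        if all(checks[name](sample) for name in STAGES):
            return sample  # passed every stage: keep for training
        sample = regenerate(sample)
    return None  # discarded
```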

Figure: Performance of FunReason.

Citation

Please cite our work if it is helpful for your research.

@article{xu2025funreason,
  title={FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling},
  author={Xu, Zengzhuang and Hao, Bingguang and Wang, Zechuan and Wen, Yuntao and Wang, Maolin and Liu, Yang and Chen, Long and Wang, Dong and Chen, Yicheng and Peng, Cunyin and Zhuang, Chenyi and Gu, Jinjie and Zhao, Xiangyu and Gu, Shi},
  journal={arXiv preprint arXiv:2510.24645},
  year={2025}
}

@article{lu2025don,
  title={Don't Just Fine-tune the Agent, Tune the Environment},
  author={Lu, Siyuan and Wang, Zechuan and Zhang, Hongxuan and Wu, Qintong and Gan, Leilei and Zhuang, Chenyi and Gu, Jinjie and Lin, Tao},
  journal={arXiv preprint arXiv:2510.10197},
  year={2025}
}

@article{chen2025v2p,
  title={V2P: From Background Suppression to Center Peaking for Robust GUI Grounding Task},
  author={Chen, Jikai and Chen, Long and Wang, Dong and Gan, Leilei and Zhuang, Chenyi and Gu, Jinjie},
  journal={arXiv preprint arXiv:2508.13634},
  year={2025}
}

@article{tan2025rag,
  title={RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism},
  author={Tan, Zhiwen and Huang, Jiaming and Wu, Qintong and Zhang, Hongxuan and Zhuang, Chenyi and Gu, Jinjie},
  journal={arXiv preprint arXiv:2507.02962},
  year={2025}
}

@article{hao2025funreason,
  title={FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement},
  author={Hao, Bingguang and Wang, Maolin and Xu, Zengzhuang and Peng, Cunyin and Chen, Yicheng and Zhao, Xiangyu and Gu, Jinjie and Zhuang, Chenyi},
  journal={arXiv preprint arXiv:2505.20192},
  year={2025}
}

📞 Contact

For any questions or feedback, please reach out to us at [email protected] or [email protected].

License

This project is licensed under the MIT License.
