Agentic Learning Powered by AWorld
arXiv(V2P) | arXiv(RAG-R1) | arXiv(FunReason) | arXiv(EnvTuning) | arXiv(FunReason-MT)
🤗 Paper(V2P) | 🤗 Paper(RAG-R1) | 🤗 Paper(FunReason) | 🤗 Paper(EnvTuning) | 🤗 Paper(FunReason-MT)
[2025/10/29] 🔥🔥🔥 FunReason-MT: We propose FunReason-MT, a novel data synthesis framework that addresses critical bottlenecks in multi-turn Function Calling (FC) data generation and achieves strong performance on complex agentic tasks.
[2025/10/22] 🔥🔥🔥 EnvTuning: We propose Environment Tuning, a novel training paradigm that enables agents to learn complex multi-turn tool-use behaviors through environmental interaction rather than trajectory imitation, achieving significant improvements with only 400 training samples.
[2025/08/19] 🔥🔥🔥 V2P: We propose V2P, a novel training method for multi-modal models that enables coordinate-free, human-like visual GUI grounding.
[2025/07/01] 🔥🔥🔥 RAG-R1: We propose RAG-R1, a deep-search training framework that incentivizes the search and reasoning capabilities of LLMs through multi-query parallelism.
[2025/05/16] 🔥🔥🔥 FunReason: We propose FunReason, a novel framework that enhances LLMs' function calling capabilities through an automated data refinement strategy and a Self-Refinement Multiscale Loss approach.
AWorld-RL is a comprehensive collection of cutting-edge agentic reinforcement learning algorithms developed by the AWorld Team. Built upon the AWorld Framework, this repository provides complete codebases, datasets, and checkpoints for training and evaluating autonomous agents that learn through multi-turn interactions with dynamic environments.
Our work focuses on enabling agents to effectively leverage environmental feedback for complex problem-solving across diverse domains, including multi-modal understanding, deep search, and function calling.
FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling
Authors: Zengzhuang Xu, Bingguang Hao, Zechuan Wang et al.
Don't Just Fine-tune the Agent, Tune the Environment
Authors: Siyuan Lu, Zechuan Wang, Hongxuan Zhang, Qintong Wu, Leilei Gan, Chenyi Zhuang, Jinjie Gu, Tao Lin
V2P: From Background Suppression to Center Peaking for Robust GUI Grounding
Authors: Jikai Chen, Long Chen, Dong Wang, Leilei Gan, Chenyi Zhuang, Jinjie Gu
RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
Authors: Zhiwen Tan, Jiaming Huang, Qintong Wu, Hongxuan Zhang, Chenyi Zhuang, Jinjie Gu
FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement
Authors: Bingguang Hao, Maolin Wang, Zengzhuang Xu, Cunyin Peng, Yicheng Chen, Xiangyu Zhao, Jinjie Gu, Chenyi Zhuang
- Tools: PyAutoGUI Tools
- LLM: Qwen2.5-7B-Instruct
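At a high level, V2P replaces direct coordinate regression with a dense supervision target that suppresses the background and peaks at the center of the target element. Below is a minimal NumPy sketch of such a center-peaking target; the grid size, sigma, and bbox-to-center reduction are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch (not V2P's exact formulation): build a Gaussian heatmap
# that peaks at the target element's center and decays to ~0 over the
# background. Grid size and sigma are assumptions for demonstration.
import numpy as np

def center_peaking_target(bbox, grid_h=32, grid_w=32, sigma=1.5):
    """bbox = (x0, y0, x1, y1) in [0, 1] normalized image coordinates."""
    x0, y0, x1, y1 = bbox
    cx, cy = (x0 + x1) / 2 * grid_w, (y0 + y1) / 2 * grid_h
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    # Gaussian peak at the element center; distant (background) cells stay ~0.
    heat = np.exp(-((xs + 0.5 - cx) ** 2 + (ys + 0.5 - cy) ** 2) / (2 * sigma**2))
    return heat / heat.max()

target = center_peaking_target((0.25, 0.60, 0.35, 0.70))
row, col = np.unravel_index(target.argmax(), target.shape)  # grounding = peak cell
```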
- Tools: Search Engines (offline or online)
- LLM: Qwen2.5-7B-Instruct
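A minimal sketch of what multi-query parallelism can look like at rollout time: instead of issuing one search query per reasoning step, the model emits several queries that are executed concurrently and merged back into the context. The `search_engine` stub is a hypothetical stand-in for the offline/online search tools; the paper's actual interface may differ.

```python
# Illustrative sketch of multi-query parallelism at rollout time. The
# `search_engine` stub is a hypothetical placeholder; swap in a real
# retriever or search API.
from concurrent.futures import ThreadPoolExecutor

def search_engine(query: str, top_k: int = 3) -> list[str]:
    # Placeholder retriever: replace with an offline index or online search API.
    return [f"passage {i} for '{query}'" for i in range(top_k)]

def parallel_search(queries: list[str], top_k: int = 3) -> list[str]:
    # Execute all queries generated in one reasoning step concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        results = pool.map(lambda q: search_engine(q, top_k), queries)
    # Flatten and de-duplicate passages before appending them to the context.
    seen, merged = set(), []
    for passages in results:
        for p in passages:
            if p not in seen:
                seen.add(p)
                merged.append(p)
    return merged

context = parallel_search(["who founded DeepMind", "DeepMind founding year"])
```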
Performance comparison on QA benchmarks under the Exact Match (EM) metric. The best and second-best results are shown in bold and underlined, respectively.
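For reference, a common definition of the EM metric used in open-domain QA evaluation; normalization details may differ slightly from the paper's evaluation scripts.

```python
# Standard Exact Match (EM) for open-domain QA: a prediction counts as correct
# if its normalized form equals a normalized gold answer.
import re
import string

def normalize(text: str) -> str:
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)  # drop English articles
    return " ".join(text.split())                # collapse whitespace

def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    return any(normalize(prediction) == normalize(g) for g in gold_answers)

assert exact_match("The Eiffel Tower.", ["eiffel tower"])
```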
- Tools: Multi-turn Tool Use (BFCLv3 Benchmark)
- LLM: Qwen3-4B-Instruct-2507
- State-of-the-Art Performance: A 4B model trained on FunReason-MT data achieves state-of-the-art results among similarly sized models on the Berkeley Function-Calling Leaderboard (BFCLv3) Multi-Turn benchmark.
- Outperforms Closed-Source Models: The RL-trained FunReason-MT 4B model surpasses most leading closed-source models (e.g., GPT-5, Gemini-2.5-Pro, Claude-Sonnet-4) and open-source models (e.g., DeepSeek-R1) in Multi-Turn evaluation.
- Robust Framework: The solution addresses three structural deficiencies in data generation: Targeted Model Training, Isolation of Tool Architecture, and Multi-Turn Logical Dependency.
- Agentic Generalization: The model demonstrates promising out-of-distribution generalization and improved agentic capability on the BFCLv4 benchmark (Web Search and Memory tasks).
The framework tackles complexity and reliability challenges by breaking the data generation process into three core phases (a code sketch follows the table):
| Phase | Core Component | Challenge Addressed | Description |
|---|---|---|---|
| Phase I | Environment-API Graph Interactions | Targeted Model Training | Samples tool calls with a Directed Sampler to efficiently collect multi-turn trajectories centered on a target complex tool. |
| Phase II | Advanced Tool-Query Synthesis | Isolation of Tool Architecture | A Tooling Agent abstracts the multi-step execution trace into a single Advanced Tool and a matching high-level query. |
| Phase III | Guided Iterative Chain | Multi-Turn Logical Dependency | A Reasoning Agent attempts to solve the synthesized query, with guidance injected on failure so that logical dependencies across turns are preserved. |
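A high-level pseudocode sketch of the three phases above; the callable names and signatures are our own illustrative assumptions, not the paper's actual interfaces.

```python
# High-level pseudocode of the three-phase pipeline. The three agents are
# passed in as callables; names and signatures are illustrative assumptions.
def synthesize_sample(env, target_tool, directed_sampler, tooling_agent,
                      reasoning_agent, max_guided_iters=3):
    # Phase I: sample tool calls over the environment-API graph to collect a
    # multi-turn trajectory centered on the target complex tool.
    trajectory = directed_sampler(env, target_tool)

    # Phase II: abstract the multi-step execution trace into a single advanced
    # tool plus a matching high-level user query.
    advanced_tool, query = tooling_agent(trajectory)

    # Phase III: let the reasoning agent attempt the query, feeding guidance
    # back on failure so logical dependencies across turns are preserved.
    guidance = None
    for _ in range(max_guided_iters):
        answer, solved = reasoning_agent(query, advanced_tool, env, guidance)
        if solved:
            return {"query": query, "tool": advanced_tool, "answer": answer}
        guidance = answer  # reuse the failed attempt to guide the next try
    return None  # drop samples that stay unsolved even with guidance
```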
The model achieves state-of-the-art performance, particularly after applying Reinforcement Learning (RL) on the synthesized data.
| Model (4B–235B params) | Multi-Turn (Overall) | Single-Turn (Overall) |
|---|---|---|
| Qwen3-4B-Instruct (Base) | 15.75 | 78.19 |
| Qwen3-4B + FunReason-MT (RL) | 56.50 | 85.02 |
| Claude-Sonnet-4-20250514 | 54.75 | 84.72 |
| DeepSeek-R1-0528 | 44.50 | 78.22 |
| GPT-4o-2024-11-20 | 42.50 | 77.21 |
The FunReason-MT trained model leads in out-of-distribution agentic tasks (Web Search and Memory).
| Model | BFCLv4 Overall Score |
|---|---|
| FunReason-MT-4B (RL) | 15.10 |
| ToolACE-2-8B | 14.83 |
| BitAgent-8B | 8.24 |
| XLAM-2-3b-fc-r | 7.42 |
| watt-tool-8B | 6.30 |
- Tools: Multi-turn Tool Use (BFCL Benchmark)
- LLM: Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, watt-tool-8B
Training agents for complex multi-turn tool use faces critical challenges: extreme scarcity of high-quality training data, overfitting when supervised fine-tuning (SFT) is applied to synthetic data, and cold-start instability in standard reinforcement learning. Environment Tuning addresses these challenges with a novel training paradigm in which agents learn complex behaviors through environmental interaction rather than trajectory imitation, even with minimal data.
Limitations of existing paradigms (SFT overfitting and standard RL cold-start) and the advantages of the Environment Tuning approach.
Four-stage curriculum learning pipeline with actionable environment augmentation and fine-grained progress rewards.
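To make the reward idea concrete, here is an illustrative sketch of a fine-grained progress reward: rather than a sparse 0/1 task-success signal, the agent is scored by how many verifiable milestones of a multi-turn task it has completed. The milestone representation is our own assumption, not the paper's exact reward.

```python
# Illustrative fine-grained progress reward (our own assumption, not the
# paper's exact reward): score the agent by how many ordered, verifiable
# milestones of a multi-turn task it has completed.
def progress_reward(completed_calls: list[str], milestones: list[str]) -> float:
    """Fraction of ordered milestones reached so far."""
    reached = 0
    for milestone in milestones:
        if milestone in completed_calls:
            reached += 1
        else:
            break  # milestones are ordered; stop at the first unmet one
    return reached / len(milestones)

# e.g. a booking task with three ordered milestones:
r = progress_reward(
    completed_calls=["search_flights", "select_flight"],
    milestones=["search_flights", "select_flight", "book_ticket"],
)
assert abs(r - 2 / 3) < 1e-9
```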
- Tools: Real human function calling (BFCLv2 Live & Non-Live)
- LLM: Qwen2.5-Coder-7B-Instruct
FunReason is a framework designed to enhance LLMs' function calling capabilities. It achieves GPT-4o-comparable performance on BFCL, surpasses RL-based methods, mitigates catastrophic forgetting on HumanEval and MBPP, and relies on a data refinement strategy in which natural CoT data outperforms artificially constructed data.
Overview of FunReason's data refinement pipeline. The pipeline consists of five stages: Function Call Classification, Query and Tool Identification, CoT Identification, Function and Parameter Identification, and Format Identification. Each stage ensures specific aspects of data quality, with failing examples either being discarded or regenerated.
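A minimal sketch of the five-stage refinement loop described in the caption above; the checker names mirror the five stages, while their implementations are placeholders rather than FunReason's actual logic.

```python
# Minimal sketch of the five-stage refinement loop. Checker names mirror the
# stages in the caption; their bodies are placeholders, not FunReason's logic.
STAGES = [
    "function_call_classification",      # is a function call actually needed?
    "query_tool_identification",         # do the query and tool set match?
    "cot_identification",                # is the chain of thought coherent?
    "function_parameter_identification", # are the function and parameters right?
    "format_identification",             # is the final call well-formatted?
]

def refine(dataset, checkers, regenerate):
    """checkers: stage name -> predicate; regenerate: (example, stage) -> example."""
    kept = []
    for example in dataset:
        for stage in STAGES:
            if not checkers[stage](example):
                # Failing examples are regenerated (or simply discarded).
                example = regenerate(example, stage)
                break
        else:
            kept.append(example)  # passed all five stages
    return kept
```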
Please cite our work if you find it helpful for your research.
@article{xu2025funreason,
title={FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling},
author={Zengzhuang Xu and Bingguang Hao and Zechuan Wang and Yuntao Wen and Maolin Wang and Yang Liu and Long Chen and Dong Wang and Yicheng Chen and Cunyin Peng and Chenyi Zhuang and Jinjie Gu and Xiangyu Zhao and Shi Gu},
journal={arXiv preprint arXiv:2510.24645},
year={2025}
}
@article{lu2025don,
title={Don't Just Fine-tune the Agent, Tune the Environment},
author={Lu, Siyuan and Wang, Zechuan and Zhang, Hongxuan and Wu, Qintong and Gan, Leilei and Zhuang, Chenyi and Gu, Jinjie and Lin, Tao},
journal={arXiv preprint arXiv:2510.10197},
year={2025}
}
@article{chen2025v2p,
title={V2P: From Background Suppression to Center Peaking for Robust GUI Grounding Task},
author={Chen, Jikai and Chen, Long and Wang, Dong and Gan, Leilei and Zhuang, Chenyi and Gu, Jinjie},
journal={arXiv preprint arXiv:2508.13634},
year={2025}
}
@article{tan2025rag,
title={RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism},
author={Tan, Zhiwen and Huang, Jiaming and Wu, Qintong and Zhang, Hongxuan and Zhuang, Chenyi and Gu, Jinjie},
journal={arXiv preprint arXiv:2507.02962},
year={2025}
}
@article{hao2025funreason,
title={FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement},
author={Hao, Bingguang and Wang, Maolin and Xu, Zengzhuang and Peng, Cunyin and Chen, Yicheng and Zhao, Xiangyu and Gu, Jinjie and Zhuang, Chenyi},
journal={arXiv preprint arXiv:2505.20192},
year={2025}
}
For any questions or feedback, please reach out to us at [email protected] or [email protected].
This project is licensed under the MIT License.