Research papers and publications related to the development and evaluation of AI systems for software engineering
Criteria for inclusion:
- The work is related to AI systems for software engineering (AI4SE). For this list, "AI" tends to mean language models (LMs).
- The work is released as a research paper, preprint, or academic publication.
- The work includes open source code and artifacts.
Note
This list focuses on research and academic contributions. Products and open source tools evaluated on SWE-bench are out of scope, but are welcome as submissions to the SWE-bench leaderboard.
Benchmarks that evaluate AI systems on software engineering tasks. A sketch of loading one such benchmark follows the list below.
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? [Code]
- SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? [Code]
- SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents [Code]
- Commit0: Library Generation from Scratch [Code]
- Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving [Code]
- SWE-bench Goes Live! [Code]
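
Most of these benchmarks are distributed as Hugging Face datasets. As one concrete example, the sketch below loads SWE-bench and inspects a task instance; the dataset path and field names match the princeton-nlp/SWE-bench release, while the other benchmarks use their own schemas.

```python
# Minimal sketch: load SWE-bench and inspect one task instance.
# Requires `pip install datasets`; field names follow the
# princeton-nlp/SWE-bench release on Hugging Face.
from datasets import load_dataset

swebench = load_dataset("princeton-nlp/SWE-bench", split="test")

task = swebench[0]
print(task["instance_id"])        # unique id, e.g. "<org>__<repo>-<number>"
print(task["repo"])               # source repository the issue comes from
print(task["base_commit"])        # commit to check out before patching
print(task["problem_statement"])  # the GitHub issue text given to the system
```

A system under evaluation receives the repository at base_commit plus the problem statement, and is scored on whether its generated patch makes the instance's held-out tests pass.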
Systems and inference scaffolds that enable AI systems to perform software engineering tasks such as those in SWE-bench. A sketch contrasting the two design styles follows the lists below.
Agent Based
- SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering [Code]
- OpenHands: An Open Platform for AI Software Developers as Generalist Agents [Code]
- SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement [Code]
- Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents [Code]
- OrcaLoca: An LLM Agent Framework for Software Issue Localization [Code]
Workflow Based
- Agentless: Demystifying LLM-based Software Engineering Agents [Code]
- AutoCodeRover: Autonomous Program Improvement [Code]
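
The two subsections above reflect two common scaffold designs: agent-based systems let the LM choose its next action at every turn, while workflow-based systems run a fixed pipeline such as localize-then-repair. The sketch below illustrates the contrast; all callables in it are hypothetical placeholders, not APIs of any listed system.

```python
# Illustrative contrast between agent-based and workflow-based scaffolds.
# The injected callables are hypothetical stand-ins for an LM and a
# sandboxed repository, not APIs of the systems listed above.
from typing import Callable

def agent_based(
    issue: str,
    propose_action: Callable[[list[str]], str],  # LM: history -> next command
    execute: Callable[[str], str],               # sandbox: command -> observation
    max_turns: int = 30,
) -> str:
    """Agent style: the LM picks each step until it submits a patch."""
    history = [f"Issue: {issue}"]
    for _ in range(max_turns):
        action = propose_action(history)
        if action.startswith("submit"):          # the agent decides it is done
            return action.removeprefix("submit").strip()
        history.append(action)
        history.append(execute(action))          # feed the observation back
    return ""                                    # no patch within the turn budget

def workflow_based(
    issue: str,
    localize: Callable[[str], list[str]],        # stage 1: issue -> suspect files
    repair: Callable[[str, list[str]], str],     # stage 2: issue + files -> patch
) -> str:
    """Workflow style: fixed localize-then-repair pipeline, no open-ended loop."""
    return repair(issue, localize(issue))
```

Hybrids of the two are common; SWE-Search, for example, layers Monte Carlo Tree Search over an agent loop.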
Datasets, techniques, and infrastructure to train better LMs and AI systems for software engineering.
Datasets
- SWE-smith: Scaling Data for Software Engineering Agents [Code]
- Training Software Engineering Agents and Verifiers with SWE-Gym [Code]
- R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents [Code]
Training Techniques
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution [Code]
- SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution [Code]
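
As an example of what a training signal can look like here, SWE-RL describes rewarding the policy with a similarity score between the model's patch and the ground-truth patch. The sketch below approximates that idea with difflib; the exact format penalty and similarity metric are assumptions, not the paper's implementation.

```python
# Sketch of a patch-similarity reward in the spirit of SWE-RL.
# Assumptions: -1.0 penalty for a missing/unparseable patch and
# difflib sequence similarity as the metric.
import difflib

def patch_reward(predicted: str | None, oracle: str) -> float:
    """Scalar reward in [-1, 1] comparing a predicted patch to the oracle."""
    if not predicted:                 # model failed to emit a parseable patch
        return -1.0
    # Text-similarity ratio between the two patches, in [0, 1].
    return difflib.SequenceMatcher(None, predicted, oracle).ratio()

oracle = "-    return a + b\n+    return a - b\n"
pred   = "-    return a + b\n+    return a - b  # fix\n"
print(round(patch_reward(pred, oracle), 3))  # near-miss -> high but < 1.0
```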
Infrastructure