Recent 2025 research papers on LLMs applied to software testing

Curated by Lamhot Siagian

Category	Title & Link	Date	Authors (first author et al.)	One-line gist
Survey/Position	LLMs for Unit Testing: A Systematic Literature Review	2025-06-18	Quanjun Zhang et al.	First systematic literature review (SLR) of LLMs for unit testing, covering work up to March 2025; tasks, usage, challenges, future work.
Survey/Position	Rethinking Testing for LLM Applications	2025-08-28	Atf et al.	Argues for layered testing (integration/orchestration/inference); proposes protocol.
Survey/Position	Challenges in Testing LLM-Based Applications	2025-03-01	Salem et al.	Taxonomy of test case design for LLM apps; open challenges and tooling gaps.
Survey/Position	Survey on Web Testing: On the Rise of AI	2025-03-08	I. Kertusha et al.	Surveys web testing with AI-driven test generation; adoption gaps.
Survey/Position	Survey of LLM-based Automated Program Repair	2025-06-30	B. Yang et al.	Categorizes 63 APR systems (2022–Jun 2025); augmentation patterns.
Survey/Position	Harden and Catch for Just-in-Time Assured LLM Testing	2025-04-23	Mark Harman et al.	Defines “harden” vs “catch” in LLM test design; open agenda.
Benchmark/Dataset	TCGBench: Reliable Test Case Generators	2025-07-22	Yuhan Cao et al.	Benchmark for LLM-generated test case generators; the targeted-generation task remains hard.
Benchmark/Dataset	ULT/PLT: Unit Test Benchmark	2025-08-01	Dong Huang et al.	Leak-resistant UTG benchmark; LLM performance is much lower than on older benchmarks.
Benchmark/Dataset	GBCV: Assessing LLM Test Generation	2025-02-05	Hung-Fu Chang et al.	Programs to systematically assess LLM test generation; GPT-4o and GPT-3.5 studied.
Benchmark/Dataset	TestCase-Eval	2025-06-18	Zheyuan Yang et al.	500 Codeforces problems + 100k solutions to evaluate fault coverage/exposure.
Test Generation	CANDOR: Multi-Agent JUnit Tests	2025-08-27	Qinghua Xu et al.	Multi-agent LLM pipeline with consensus; reduces oracle hallucinations.
Test Generation	PALM: Rust Unit Tests	2025-06-11	Bei Chu et al.	Path-constraint analysis + LLM prompting; coverage boost.
Test Generation	EP/BVA Unit Tests	2025-05-14	—	LLM prompts for Equivalence Partitioning & Boundary Values; needs human supervision.
Test Generation	Automatic High-Level Test Case Generation	2025-03-26	—	Use-cases → high-level tests; aligns with business requirements.
Test Generation	Static Analysis-Guided UTG	2025-03-08	—	Java UTG without sample usages; uses program analysis.
Test Generation	Acceptance Tests with LLMs (Cypress + Gherkin)	2025-04-09	Maider Azanza et al.	User stories → Gherkin → Cypress tests; supports ATDD.
Test Generation	Testora: PR-Intent Testing	2025-03-27	Martin Pradel et al.	Generates/refines tests from PR intent; detects unintended changes.
Oracles/Assertions	DeCon: Detecting Incorrect Assertions	2025-01-06	H. Yu et al.	Generates postconditions to catch wrong assertions.
Oracles/Assertions	AssertCoder	2025-07-14	E. Tian et al.	Synthesizes SVAs from multimodal specs (text, diagrams, tables).
Oracles/Assertions	Spec2Assertion	2025-05-15	H. Hu et al.	Pre-RTL assertion generation with progressive regularization.
Oracles/Assertions	AssertionBench	2025-02-28	V. Pulavarthi et al.	Finds many syntactic/semantic errors in LLM assertions.
Oracles/Assertions	exLong: Exceptional Behavior Tests	2025-05-31	L. Zhong et al.	Generates exception-focused tests via fine-tuned CodeLlama.
GUI/Exploratory	ScenGen: Scenario-based GUI Testing	2025-06-05	Shengcheng Yu et al.	Five-agent pipeline for GUI testing.
GUI/Exploratory	GERALLT: Exploratory GUI Testing	2025-05-23	T. Rosenbach et al.	Surfaces unintuitive GUI behaviors in exploration.
Fuzzing/Security	Reliable LLM-Driven Fuzz Testing	2025-03-02	Y. Cheng et al.	Reliability bottlenecks & agenda for fuzzing with LLMs.
Fuzzing/Security	Model-Based Fuzzing of Protocols	2025-08-03	P. Zhang et al.	LLM synthesizes protocol state models; sequence fuzzing.
Fuzzing/Security	ORFuzz: Testing Risky Outputs	2025-08-16	H. Zhang et al.	Fuzzing framework probing unsafe outputs from LLMs.
Industry/Framework	Continuous Evaluation in Industry	2025-04-26	Maider Azanza et al.	SonarQube-based evaluation at LKS Next.
Industry/Framework	Expectations vs Reality: AI in Testing	2025-04-07	Katja Karhu et al.	Secondary study; interest high, adoption low.
Repair	Test Repair in LLM UTG	2025-07-24	M. Konstantinou et al.	Repairs incorrect LLM tests with rules + re-prompting.
Repair	ReduceFix: LLM Program Repair	2025-07-19	B. Yang et al.	Uses input reducers to improve repair quality.
Ecosystem Study	Bugs & Testing in LLM Libraries	2025-06-14	—	Empirical study of bugs/testing practices in LLM libraries.

Test-Architect/llm-research-catalog
