Test-Architect/llm-research-catalog

Recent 2025 research papers on LLMs applied to software testing

Curated by Lamhot Siagian

| Category | Title & Link | Date | Authors (first author et al.) | One-line gist |
| --- | --- | --- | --- | --- |
| Survey/Position | LLMs for Unit Testing: A Systematic Literature Review | 2025-06-18 | Quanjun Zhang et al. | First SLR of LLMs for unit testing up to Mar’25; tasks, usage, challenges, future work. |
| Survey/Position | Rethinking Testing for LLM Applications | 2025-08-28 | Atf et al. | Argues for layered testing (integration/orchestration/inference); proposes protocol. |
| Survey/Position | Challenges in Testing LLM-Based Applications | 2025-03-01 | Salem et al. | Taxonomy of test case design for LLM apps; open challenges and tooling gaps. |
| Survey/Position | Survey on Web Testing: On the Rise of AI | 2025-03-08 | I. Kertusha et al. | Surveys web testing with AI-driven test generation; adoption gaps. |
| Survey/Position | Survey of LLM-based Automated Program Repair | 2025-06-30 | B. Yang et al. | Categorizes 63 APR systems (2022–Jun 2025); augmentation patterns. |
| Survey/Position | Harden and Catch for Just-in-Time Assured LLM Testing | 2025-04-23 | Mark Harman et al. | Defines “harden” vs “catch” in LLM test design; open agenda. |
| Benchmark/Dataset | TCGBench: Reliable Test Case Generators | 2025-07-22 | Yuhan Cao et al. | Benchmark for LLM-generated test case generators; targeted task remains hard. |
| Benchmark/Dataset | ULT/PLT: Unit Test Benchmark | 2025-08-01 | Dong Huang et al. | Leak-resistant UTG benchmark; much lower LLM performance vs older sets. |
| Benchmark/Dataset | GBCV: Assessing LLM Test Generation | 2025-02-05 | Hung-Fu Chang et al. | Programs to systematically assess LLM test generation; GPT-4o & 3.5 studied. |
| Benchmark/Dataset | TestCase-Eval | 2025-06-18 | Zheyuan Yang et al. | 500 Codeforces problems + 100k solutions to evaluate fault coverage/exposure. |
| Test Generation | CANDOR: Multi-Agent JUnit Tests | 2025-08-27 | Qinghua Xu et al. | Multi-agent LLM pipeline with consensus; reduces oracle hallucinations. |
| Test Generation | PALM: Rust Unit Tests | 2025-06-11 | Bei Chu et al. | Path-constraint analysis + LLM prompting; coverage boost. |
| Test Generation | EP/BVA Unit Tests | 2025-05-14 | | LLM prompts for Equivalence Partitioning & Boundary Values; needs human supervision. |
| Test Generation | Automatic High-Level Test Case Generation | 2025-03-26 | | Use-cases → high-level tests; aligns with business requirements. |
| Test Generation | Static Analysis-Guided UTG | 2025-03-08 | | Java UTG without sample usages; uses program analysis. |
| Test Generation | Acceptance Tests with LLMs (Cypress + Gherkin) | 2025-04-09 | Maider Azanza et al. | User stories → Gherkin → Cypress tests; supports ATDD. |
| Test Generation | Testora: PR-Intent Testing | 2025-03-27 | Martin Pradel et al. | Generates/refines tests from PR intent; detects unintended changes. |
| Oracles/Assertions | DeCon: Detecting Incorrect Assertions | 2025-01-06 | H. Yu et al. | Generates postconditions to catch wrong assertions. |
| Oracles/Assertions | AssertCoder | 2025-07-14 | E. Tian et al. | Synthesizes SVAs from multimodal specs (text, diagrams, tables). |
| Oracles/Assertions | Spec2Assertion | 2025-05-15 | H. Hu et al. | Pre-RTL assertion generation with progressive reg. |
| Oracles/Assertions | AssertionBench | 2025-02-28 | V. Pulavarthi et al. | Finds many syntactic/semantic errors in LLM assertions. |
| Oracles/Assertions | exLong: Exceptional Behavior Tests | 2025-05-31 | L. Zhong et al. | Generates exception-focused tests via fine-tuned CodeLlama. |
| GUI/Exploratory | ScenGen: Scenario-based GUI Testing | 2025-06-05 | Shengcheng Yu et al. | Five-agent pipeline for GUI testing. |
| GUI/Exploratory | GERALLT: Exploratory GUI Testing | 2025-05-23 | T. Rosenbach et al. | Surfaces unintuitive GUI behaviors in exploration. |
| Fuzzing/Security | Reliable LLM-Driven Fuzz Testing | 2025-03-02 | Y. Cheng et al. | Reliability bottlenecks & agenda for fuzzing with LLMs. |
| Fuzzing/Security | Model-Based Fuzzing of Protocols | 2025-08-03 | P. Zhang et al. | LLM synthesizes protocol state models; sequence fuzzing. |
| Fuzzing/Security | ORFuzz: Testing Risky Outputs | 2025-08-16 | H. Zhang et al. | Fuzzing framework probing unsafe outputs from LLMs. |
| Industry/Framework | Continuous Evaluation in Industry | 2025-04-26 | Maider Azanza et al. | SonarQube-based evaluation at LKS Next. |
| Industry/Framework | Expectations vs Reality: AI in Testing | 2025-04-07 | Katja Karhu et al. | Secondary study; interest high, adoption low. |
| Repair | Test Repair in LLM UTG | 2025-07-24 | M. Konstantinou et al. | Repairs incorrect LLM tests with rules + re-prompting. |
| Repair | ReduceFix: LLM Program Repair | 2025-07-19 | B. Yang et al. | Uses input reducers to improve repair quality. |
| Ecosystem Study | Bugs & Testing in LLM Libraries | 2025-06-14 | | Empirical study of bugs/testing practices in LLM libraries. |
