Curated by Lamhot Siagian

| Category | Title | Date | Authors (first author et al.) | One-line gist |
|---|---|---|---|---|
| Survey/Position | LLMs for Unit Testing: A Systematic Literature Review | 2025-06-18 | Quanjun Zhang et al. | First systematic literature review (SLR) of LLMs for unit testing, covering work up to March 2025; tasks, usage, challenges, and future work. |
| Survey/Position | Rethinking Testing for LLM Applications | 2025-08-28 | Atf et al. | Argues for layered testing (integration/orchestration/inference); proposes protocol. |
| Survey/Position | Challenges in Testing LLM-Based Applications | 2025-03-01 | Salem et al. | Taxonomy of test case design for LLM apps; open challenges and tooling gaps. |
| Survey/Position | Survey on Web Testing: On the Rise of AI | 2025-03-08 | I. Kertusha et al. | Surveys web testing with AI-driven test generation; adoption gaps. |
| Survey/Position | Survey of LLM-based Automated Program Repair | 2025-06-30 | B. Yang et al. | Categorizes 63 automated program repair (APR) systems (2022 to June 2025); augmentation patterns. |
| Survey/Position | Harden and Catch for Just-in-Time Assured LLM Testing | 2025-04-23 | Mark Harman et al. | Defines “harden” vs “catch” in LLM test design; open agenda. |
| Benchmark/Dataset | TCGBench: Reliable Test Case Generators | 2025-07-22 | Yuhan Cao et al. | Benchmark for LLM-written test case generators; generating targeted generators that expose specific bugs remains hard. |
| Benchmark/Dataset | ULT/PLT: Unit Test Benchmark | 2025-08-01 | Dong Huang et al. | Leak-resistant benchmark for unit test generation (UTG); LLMs score much lower than on older benchmarks. |
| Benchmark/Dataset | GBCV: Assessing LLM Test Generation | 2025-02-05 | Hung-Fu Chang et al. | Set of programs for systematically assessing LLM test generation; GPT-4o and GPT-3.5 studied. |
| Benchmark/Dataset | TestCase-Eval | 2025-06-18 | Zheyuan Yang et al. | 500 Codeforces problems + 100k solutions to evaluate fault coverage/exposure. |
| Test Generation | CANDOR: Multi-Agent JUnit Tests | 2025-08-27 | Qinghua Xu et al. | Multi-agent LLM pipeline with consensus; reduces oracle hallucinations. |
| Test Generation | PALM: Rust Unit Tests | 2025-06-11 | Bei Chu et al. | Path-constraint analysis + LLM prompting; coverage boost. |
| Test Generation | EP/BVA Unit Tests | 2025-05-14 | — | LLM prompts for Equivalence Partitioning & Boundary Values; needs human supervision. |
| Test Generation | Automatic High-Level Test Case Generation | 2025-03-26 | — | Use-cases → high-level tests; aligns with business requirements. |
| Test Generation | Static Analysis-Guided UTG | 2025-03-08 | — | Java UTG without sample usages; uses program analysis. |
| Test Generation | Acceptance Tests with LLMs (Cypress + Gherkin) | 2025-04-09 | Maider Azanza et al. | User stories → Gherkin → Cypress tests; supports ATDD. |
| Test Generation | Testora: PR-Intent Testing | 2025-03-27 | Martin Pradel et al. | Generates/refines tests from PR intent; detects unintended changes. |
| Oracles/Assertions | DeCon: Detecting Incorrect Assertions | 2025-01-06 | H. Yu et al. | Generates postconditions to catch wrong assertions. |
| Oracles/Assertions | AssertCoder | 2025-07-14 | E. Tian et al. | Synthesizes SystemVerilog Assertions (SVAs) from multimodal specs (text, diagrams, tables). |
| Oracles/Assertions | Spec2Assertion | 2025-05-15 | H. Hu et al. | Pre-RTL assertion generation with progressive regularization. |
| Oracles/Assertions | AssertionBench | 2025-02-28 | V. Pulavarthi et al. | Finds many syntactic/semantic errors in LLM assertions. |
| Oracles/Assertions | exLong: Exceptional Behavior Tests | 2025-05-31 | L. Zhong et al. | Generates exception-focused tests via fine-tuned CodeLlama. |
| GUI/Exploratory | ScenGen: Scenario-based GUI Testing | 2025-06-05 | Shengcheng Yu et al. | Five-agent pipeline for GUI testing. |
| GUI/Exploratory | GERALLT: Exploratory GUI Testing | 2025-05-23 | T. Rosenbach et al. | Surfaces unintuitive GUI behaviors in exploration. |
| Fuzzing/Security | Reliable LLM-Driven Fuzz Testing | 2025-03-02 | Y. Cheng et al. | Reliability bottlenecks & agenda for fuzzing with LLMs. |
| Fuzzing/Security | Model-Based Fuzzing of Protocols | 2025-08-03 | P. Zhang et al. | LLM synthesizes protocol state models; sequence fuzzing. |
| Fuzzing/Security | ORFuzz: Testing Risky Outputs | 2025-08-16 | H. Zhang et al. | Fuzzing framework probing unsafe outputs from LLMs. |
| Industry/Framework | Continuous Evaluation in Industry | 2025-04-26 | Maider Azanza et al. | SonarQube-based evaluation at LKS Next. |
| Industry/Framework | Expectations vs Reality: AI in Testing | 2025-04-07 | Katja Karhu et al. | Secondary study; interest high, adoption low. |
| Repair | Test Repair in LLM UTG | 2025-07-24 | M. Konstantinou et al. | Repairs incorrect LLM tests with rules + re-prompting. |
| Repair | ReduceFix: LLM Program Repair | 2025-07-19 | B. Yang et al. | Uses input reducers to improve repair quality. |
| Ecosystem Study | Bugs & Testing in LLM Libraries | 2025-06-14 | — | Empirical study of bugs/testing practices in LLM libraries. |