This repository contains the source code of LISP and tables of data recorded in the experiments.
ICSE '25: LLM Based Input Space Partitioning Testing for Library APIs
Jiageng Li, Zhen Dong*, Chong Wang, Haozhen You, Cen Zhang, Yang Liu, Xin Peng
LISP is an approach that leverages LLMs to understand the code of a library API under test and perform input space partitioning based on its understanding and rich common knowledge.
Operating System Requirements
The artifact has been tested and is supported on Linux (Ubuntu 20.04/22.04) on an x86 architecture.
Please ensure that your machine has at least 4 GB of RAM for smooth execution.
Environment Requirements
If you prefer to run the artifact natively on your machine, ensure that the following software is installed:
- Java 11+ (for Java-based components)
- Python 3.10+ (for Python-based components)
- Apache Maven 3.6+ (for Java package management and building)
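As a quick sanity check, assuming the tools are already on your PATH, you can verify the installed versions:

```sh
# Check toolchain versions (exact output format varies by distribution)
java -version      # expect 11 or newer
python3 --version  # expect 3.10 or newer
mvn -version       # expect 3.6 or newer
```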
Setup
LISP currently consists of two components: llm-JQF and llm-seed-generator.
- In the ./llm-JQF directory, run the following command to install all Java dependencies: `mvn install -DskipTests`
- In the ./llm-JQF directory, run the following command to set your API key: `sh set-key.sh -k <Your API Key> -b <API Base URL>`
  Note: Reproducing all experiments (LISP, LISP-CG, and the ablation study) can cost over $40 in OpenAI tokens. The official OpenAI base URL is https://api.openai.com/v1.
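For example, using the official OpenAI endpoint (the key below is a placeholder for illustration only, not a real credential):

```sh
# Substitute your own OpenAI API key; the base URL shown is the official endpoint
sh set-key.sh -k sk-xxxxxxxxxxxxxxxx -b https://api.openai.com/v1
```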
- In the ./llm-seed-generator directory, run the following command to install all Python dependencies: `pip3 install -r requirements.txt`
We have uploaded the signatures of the 2,205 API methods mentioned in the paper to the ./llm-JQF/signs folder, organized into ten files named after the libraries. The file structure is as follows:
LISP
├── llm-JQF
│   ├── signs
│   │   ├── commons-lang3
│   │   ├── guava
│   │   └── ...
│   └── ...
└── llm-seed-generator
Options
We provide a quick-start executable in ./llm-JQF, located at bin/jqf-llm. It supports several options that are required during the evaluation process. The options are explained below:
-i: Enable instrumentation for coverage statistics collection.
-o: Output the coverage statistics results.
-l <signature-file>: Points to a signature file containing the methods you want to test.
-s <classification>: Runs LISP-CG or an ablation experiment. There are four classifications:
1. cg: Used for testing with LISP-CG.
2. skipUnder: Used for ablation study variant 1 (ISP+OI).
3. skipEP: Used for ablation study variant 2 (TDA+OI).
4. basic: Used for testing with LLM-Baseline.
These options can be combined to customize the execution of the jqf-llm tool for the specific needs of an evaluation or experiment.
To test a single method, you can use the following command:
bin/jqf-llm -i -o [-s <classification>] "<Library name>" "<API name>"
<Library name>: The library to which the API method belongs, in the format group_id:artifact_id:version. For example:
org.apache.commons:commons-lang3:3.13.0
com.google.guava:guava:32.1.2-jre
<API name>: The fully qualified method to test, in the format class_name.method_name(param_type_names). For example:
org.apache.commons.lang3.ArrayUtils.addAll(boolean[],boolean[])
com.google.common.primitives.Longs.min(long[])
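Putting these together, a complete invocation for the first example above (run from the ./llm-JQF directory, where bin/jqf-llm lives) would look like this:

```sh
# Test a single Commons Lang 3 method with instrumentation and coverage output enabled
bin/jqf-llm -i -o "org.apache.commons:commons-lang3:3.13.0" \
  "org.apache.commons.lang3.ArrayUtils.addAll(boolean[],boolean[])"
```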
When the command runs successfully, you should see output similar to the following:
Semantic Fuzzing with LLM
--------------------------
Test signature: org.apache.commons.lang3.ArrayUtils.addAll(boolean[],boolean[])
Elapsed time: 3s
Number of executions: 4 (total 4)
Valid inputs: 4 (100.00)
Unique failures: 0
API Coverage: 11 branches (100.00% of 11 branches)
Total Coverage: 16 branches (100.00% of 16 branches)
The -l option should point to a file that contains multiple method signatures, one per line, each in the format class_name.method_name(param_type_names). All the methods in the file must belong to the same library.
bin/jqf-llm -i -o -l <signature-file> [-s <classification>] "<Library name>"
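For illustration, a signature file for Guava might contain entries like the following; the first line is taken from the example above, and the remaining lines are hypothetical entries that follow the same pattern:

```
com.google.common.primitives.Longs.min(long[])
com.google.common.primitives.Longs.max(long[])
com.google.common.primitives.Ints.min(int[])
```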
Example for Multiple Methods
Now, suppose you want to test all the API methods listed in signs/guava using LISP-CG. In this case, the command would look like:
bin/jqf-llm -i -o -l "signs/guava" -s "cg" "com.google.guava:guava:32.1.2-jre"
After each execution of the command, a folder named result will be generated in the project root directory. This folder will contain various output files.
LISP
├── llm-JQF
├── llm-seed-generator
└── result
    ├── commons-lang3_cg_1737620727667.json
    ├── ...
    └── details
        ├── commons-lang3
        │   ├── cg
        │   │   ├── org.apache.commons.lang3.ArrayUtils.addAll(boolean[],boolean[])0
        │   │   │   ├── coverage_hash
        │   │   │   ├── detail.json
        │   │   │   ├── graph.json
        │   │   │   ├── input_generator
        │   │   │   ├── llm_output.log
        │   │   │   └── RunCode.java
        │   │   └── ...
        │   ├── lisp
        │   └── ...
        ├── guava
        └── ...
The result directory is where all the output generated from running the tests is stored. It includes both high-level results (e.g., coverage reports) and detailed logs and data (e.g., method-level coverage and experiment logs).
- Run summary (e.g., commons-lang3_cg_1737620727667.json): This file is generated after each test run and contains the key metrics of the run, such as overall coverage, the number of generated inputs, runtime, token counts, and the number of unexpected behaviors. The filename is composed of three parts: the library name, the run mode, and the timestamp at which the run finished. For example:
  { "coverage":1.0, "coveredEdge":11, "generatedInputsNum":6, "inputToken":9449, "outputToken":969, "runTime":27950, "successAPINum":1, "totalAPINum":1, "totalEdge":11, "unexpectedBehaviorNum":0, "validInputsNum":5 }
- details/ subdirectory: The details/ directory contains more specific data about the individual libraries tested, with one subfolder per library (e.g., commons-lang3, guava, etc.).
  - The cg/ subfolder within details/commons-lang3/ stores the coverage results for Commons Lang 3's API methods for experiments run in LISP-CG mode. Within this folder, each method signature is used as a folder name, with a unique index appended to distinguish multiple runs of the same method.
  - Files inside each method-specific folder:
    - coverage_hash: Stores a unique identifier (hash) of the coverage results for this particular experiment; it records all the edges that were covered during the run.
    - detail.json: Contains the detailed results of the run.
    - graph.json: Contains the method information analyzed using Soot, a static analysis framework.
    - input_generator: Records the input data generated by the LLM for testing the method.
    - llm_output.log: Logs the interaction between the testing framework and the LLM.
    - RunCode.java: The test-case code used to execute the method within the JQF framework.
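For instance, to inspect the addAll run shown in the tree above (the trailing 0 is a run index and may differ on your machine), you can view the per-method files directly:

```sh
# View the detailed per-method result and the logged LLM interaction for one run
cat "result/details/commons-lang3/cg/org.apache.commons.lang3.ArrayUtils.addAll(boolean[],boolean[])0/detail.json"
less "result/details/commons-lang3/cg/org.apache.commons.lang3.ArrayUtils.addAll(boolean[],boolean[])0/llm_output.log"
```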
The evaluation of our artifact is organized around four main research questions (RQs), each addressing a specific aspect of the evaluation. Below, we describe how the data for each research question is obtained and where it is stored within the project directory.
RQ1: Code Coverage
- Overall Coverage: Data is located in the run summary files (e.g., commons-lang3_cg_1737620727667.json), which provide the overall coverage statistics across all tested methods, giving a high-level summary of the coverage achieved during the entire experiment (see the sketch after this list).
- Method-Specific Coverage: Data is located in the detail.json file of each method, which gives a detailed breakdown of coverage at the method level, including covered paths and specific branches.
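As a minimal sketch, assuming the jq utility is installed and a run summary already exists under result/, the overall coverage fields can be extracted like this:

```sh
# Print overall branch coverage and edge counts from one run summary
jq '{coverage, coveredEdge, totalEdge}' result/commons-lang3_cg_1737620727667.json
```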
RQ2: Usefulness
- Overall Usefulness:
- The Run Summary file provides an aggregate overview of all tests, highlighting unexpected behaviors that may have emerged during the testing process.
- Method-Specific Usefulness:
- Data from
detail.json
files for each individual method can reveal any unexpected behaviors or errors during execution.
- Data from
RQ3: Cost
- Overall Cost:
- The Run Summary file aggregates the total token usage and execution time for the entire set of tests, providing an overview of the resource consumption for the entire testing session.
- Method-Specific Cost:
detail.json
files record the token usage and execution time for each individual method tested, allowing us to analyze the cost at a granular level.
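For example, again assuming jq is installed, the total token usage across all run summaries in result/ can be tallied as follows:

```sh
# Sum input and output tokens over every run summary file in result/
jq -s 'map(.inputToken + .outputToken) | add' result/*.json
```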
RQ4: Ablation Study
- Overall Results: The run summary files provide the aggregate results for each configuration, showing overall coverage, token usage, and time overhead for each run mode.
- Method-Specific Results: By running the same method under different configurations, you can examine its detailed behavior in each run mode; the detail.json files show how the different modes affect code coverage, execution time, and errors. A sketch of such a comparison run follows below.
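As one possible way to collect the data for this comparison (a sketch only; as noted in the setup section, running all configurations incurs substantial token costs), the same signature file can be run under each mode:

```sh
# Run the same set of Guava methods under each classification for the ablation comparison
for mode in cg skipUnder skipEP basic; do
  bin/jqf-llm -i -o -l "signs/guava" -s "$mode" "com.google.guava:guava:32.1.2-jre"
done
```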