LISP

This repository contains the source code of LISP and tables of data recorded in the experiments.

ICSE '25: LLM Based Input Space Partitioning Testing for Library APIs

Jiageng Li, Zhen Dong*, Chong Wang, Haozhen You, Cen Zhang, Yang Liu, Xin Peng

LISP is an approach that leverages LLMs to understand the code of a library API under test and to partition its input space based on that understanding and the LLMs' rich common knowledge.

Native Setup

Operating System Requirements

The artifact has been tested and is supported on Linux (Ubuntu 20.04/22.04) on x86 architectures.

Please ensure that your machine has at least 4 GB of RAM for smooth execution.

Environment Requirements

If you prefer to run the artifact natively on your machine, ensure that the following software is installed:

  • Java 11+ (for Java-based components)
  • Python 3.10+ (for Python-based components)
  • Apache Maven 3.6+ (for Java package management and building)

Setup

LISP currently consists of two components: llm-JQF and llm-seed-generator.

  1. In the ./llm-JQF directory, please execute the following command to install all Java dependencies:

    mvn install -DskipTests
  2. In the ./llm-JQF directory, please execute the following command to set an API Key:

    sh set-key.sh -k <Your API Key> -b <API Base URL>

    Note:

    Running the full pipeline (LISP, LISP-CG, and the ablation studies) can cost over $40 in OpenAI tokens.

    The official OpenAI base URL is https://api.openai.com/v1

  3. In the ./llm-seed-generator directory, please execute the following command to install all Python dependencies:

    pip3 install -r requirements.txt
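
To verify that the prerequisites are in place, you can check the installed versions first (standard commands, independent of this artifact):

    java -version
    mvn -v
    python3 --version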

Usage

Selected API methods

The signatures of the 2,205 API methods evaluated in the paper are provided in the ./llm-JQF/signs folder, organized into ten files by library name. The file structure is as follows:

LISP
├── llm-JQF
│   ├── signs
│   │   ├── commons-lang3
│   │   ├── guava
│   │   └── ...
│   └── ...
└── llm-seed-generator

Script

Options

A quick-start executable is provided in ./llm-JQF at bin/jqf-llm. It supports several options that are used during the evaluation process; they are explained below:

-i: Enable instrumentation for coverage statistics collection.
-o: Output the coverage statistics results.
-l <signature-file>: Points to a signature file that contains the methods you want to test.
-s <classification>: Start LISP-CG or the ablation experiments. There are four classifications:
    1. cg: Used for testing with LISP-CG.
    2. skipUnder: Used for ablation study type 1 (ISP+OI).
    3. skipEP: Used for ablation study type 2 (TDA+OI).
    4. basic: Used for testing with LLM-Baseline.

These options can be combined to customize the execution of the jqf-llm tool to the specific needs of an evaluation or experiment.

Basic Run (Testing a Single Method)

To test a single method, you can use the following command:

bin/jqf-llm -i -o [-s <classification>] "<Library name>" "<API name>"
  • <Library name>: The library to which the API method belongs, in the format group_id:artifact_id:version. For example:
    • org.apache.commons:commons-lang3:3.13.0
    • com.google.guava:guava:32.1.2-jre
  • <API name>: The method under test, in the format class_name.method_name(param_type_names). For example:
    • org.apache.commons.lang3.ArrayUtils.addAll(boolean[],boolean[])
    • com.google.common.primitives.Longs.min(long[])
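
Putting these together, a concrete single-method run, and the same run in the TDA+OI ablation mode, would look like this (reusing the example values above):

bin/jqf-llm -i -o "org.apache.commons:commons-lang3:3.13.0" "org.apache.commons.lang3.ArrayUtils.addAll(boolean[],boolean[])"
bin/jqf-llm -i -o -s "skipEP" "org.apache.commons:commons-lang3:3.13.0" "org.apache.commons.lang3.ArrayUtils.addAll(boolean[],boolean[])"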

When the command runs successfully, you should see output similar to the following:

Semantic Fuzzing with LLM
--------------------------

Test signature:       org.apache.commons.lang3.ArrayUtils.addAll(boolean[],boolean[])
Elapsed time:         3s
Number of executions: 4 (total 4)
Valid inputs:         4 (100.00%)
Unique failures:      0
API   Coverage:       11 branches (100.00% of 11 branches)
Total Coverage:       16 branches (100.00% of 16 branches)

Testing Multiple Methods

The -l option should point to a file that contains multiple method signatures, one per line, with each signature in the format class_name.method_name(param_type_names). However, all the methods in the file must belong to the same library.

bin/jqf-llm -i -o -l <signature-file> [-s <classification>] "<Library name>"
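
For illustration, a signature file for Commons Lang 3 might contain lines such as the following (real Commons Lang 3 methods, listed here purely as an example; the files shipped in signs/ are the authoritative lists):

    org.apache.commons.lang3.ArrayUtils.addAll(boolean[],boolean[])
    org.apache.commons.lang3.ArrayUtils.addAll(byte[],byte[])
    org.apache.commons.lang3.StringUtils.capitalize(String)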

Example for Multiple Methods

For example, to test all the API methods listed in signs/guava using LISP-CG, the command would be:

bin/jqf-llm -i -o -l "signs/guava" -s "cg" "com.google.guava:guava:32.1.2-jre"

Results

After each execution of the command, a folder named result will be generated in the project root directory. This folder will contain various output files.

Structure of the result Folder

LISP
├── llm-JQF
├── llm-seed-generator
└── result
    │
    ├── commons-lang3_cg_1737620727667.json
    ├── ...
    └── details
        ├── commons-lang3
        │   ├── cg
        │   │   ├── org.apache.commons.lang3.ArrayUtils.addAll(boolean[],boolean[])0
        │   │   │   ├── coverage_hash
        │   │   │   ├── detail.json
        │   │   │   ├── graph.json
        │   │   │   ├── input_generator
        │   │   │   ├── llm_output.log
        │   │   │   └── RunCode.java
        │   │   └── ...
        │   ├── lisp
        │   └── ...
        ├── guava
        └── ...

The result directory is where all the output generated from running the tests is stored. It includes both high-level results (e.g., coverage reports) and detailed logs/data (e.g., method-level coverage and experiment logs).

  1. Run Summary (e.g., commons-lang3_cg_1737620727667.json)

    This file is generated after each test run, and it contains key metrics related to the run, such as overall coverage, input generation count, runtime, token count, and exception count.

    The filename consists of three components: the library name, the run mode, and the timestamp at which the run finished, i.e., <library>_<mode>_<timestamp>.json.

    {
    	"coverage":1.0,
    	"coveredEdge":11,
    	"generatedInputsNum":6,
    	"inputToken":9449,
    	"outputToken":969,
    	"runTime":27950,
    	"successAPINum":1,
    	"totalAPINum":1,
    	"totalEdge":11,
    	"unexpectedBehaviorNum":0,
    	"validInputsNum":5
    }
    
  2. details/ Subdirectory

    The details/ directory contains more specific data about individual libraries tested, including subfolders for each library (e.g., commons-lang3, guava, etc.).

    • The cg/ subfolder within details/commons-lang3/ stores the coverage results of Commons Lang 3's API methods for experiments run in LISP-CG mode. Within this folder, each method signature is used as a folder name, with a unique index appended to distinguish multiple runs of the same method.
    • Files Inside Each Method-Specific Folder:
      • coverage_hash: This file stores a unique identifier (hash) for the coverage results from this particular experiment. It specifically records all the edges that were covered during the experiment.
      • detail.json: This file contains the detailed results of the run.
      • graph.json: This file contains the method information analyzed using Soot, a static analysis framework.
      • input_generator: This file records the input data generated by the LLM for testing the method.
      • llm_output.log: This file logs the interaction between the testing framework and the LLM.
      • RunCode.java: This Java file contains the test case code used to execute the method within the JQF framework.
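
For convenience, the following minimal Python sketch (not shipped with the artifact) loads a Run Summary file and prints its headline metrics. The field names follow the JSON example shown above; the runTime unit is assumed to be milliseconds:

    import json
    import sys

    # Usage: python3 summarize_run.py result/commons-lang3_cg_1737620727667.json
    path = sys.argv[1]
    with open(path) as f:
        summary = json.load(f)

    # Field names follow the Run Summary example above.
    print(f"Branch coverage:  {summary['coveredEdge']}/{summary['totalEdge']} ({summary['coverage']:.2%})")
    print(f"APIs tested:      {summary['successAPINum']}/{summary['totalAPINum']}")
    print(f"Inputs:           {summary['validInputsNum']} valid of {summary['generatedInputsNum']} generated")
    print(f"Tokens:           {summary['inputToken']} in / {summary['outputToken']} out")
    # runTime is assumed to be in milliseconds.
    print(f"Runtime:          {summary['runTime']} ms")
    print(f"Unexpected behaviors: {summary['unexpectedBehaviorNum']}")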

Evaluation

The evaluation of our artifact is organized around four main research questions (RQs), each addressing a specific aspect of the evaluation.

Below, we describe how the data for each research question is obtained and where it is stored within the project directory.

RQ1: Code Coverage

  • Overall Coverage:
    • Located in the Run Summary files (e.g., commons-lang3_cg_1737620727667.json). These files provide the overall coverage statistics across all tested methods, giving a high-level summary of the coverage achieved during the entire experiment.
  • Method-Specific Coverage:
    • Located in the detail.json file of each method. This file gives a detailed breakdown of coverage at the method level, including covered paths and specific branches.

RQ2: Usefulness

  • Overall Usefulness:
    • The Run Summary file provides an aggregate overview of all tests, highlighting unexpected behaviors that may have emerged during the testing process.
  • Method-Specific Usefulness:
    • Data from detail.json files for each individual method can reveal any unexpected behaviors or errors during execution.

RQ3: Cost

  • Overall Cost:
    • The Run Summary file aggregates the total token usage and execution time for the entire set of tests, providing an overview of the resource consumption for the entire testing session.
  • Method-Specific Cost:
    • detail.json files record the token usage and execution time for each individual method tested, allowing us to analyze the cost at a granular level.

RQ4: Ablation Study

  • Overall Results:
    • The Run Summary files provide the aggregate results for each configuration, showing overall coverage, token usage, and time overhead for each run mode.
  • Method-Specific Results:
    • By running the same method under different configurations, we can examine its detailed behavior in each run mode. For example, the detail.json files provide insight into how the different modes affect code coverage, execution time, and errors.
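
To compare run modes, for example for the ablation study in RQ4, or to aggregate coverage (RQ1) and cost (RQ3) across libraries, one might collect all Run Summary files as in the following sketch. It is not part of the artifact; it assumes the <library>_<mode>_<timestamp>.json naming described in the Results section and that library names contain no underscores:

    import json
    from collections import defaultdict
    from pathlib import Path

    # Group Run Summary files by run mode, based on the
    # <library>_<mode>_<timestamp>.json naming convention.
    by_mode = defaultdict(list)
    for path in Path("result").glob("*.json"):
        library, mode, _timestamp = path.stem.split("_", 2)
        with path.open() as f:
            by_mode[mode].append(json.load(f))

    for mode, runs in sorted(by_mode.items()):
        covered = sum(r["coveredEdge"] for r in runs)
        total = sum(r["totalEdge"] for r in runs)
        tokens = sum(r["inputToken"] + r["outputToken"] for r in runs)
        time_ms = sum(r["runTime"] for r in runs)  # assumed milliseconds
        print(f"{mode}: {covered}/{total} branches covered, "
              f"{tokens} tokens, {time_ms} ms across {len(runs)} run(s)")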