This repository contains the artifacts for the paper “Fact-Aligned and Template-Constrained Static Analyzer Rule Enhancement with LLMs”.
RuleRefiner is a multi-stage LLM framework that eliminates false alarms in static analyzers by refining their detection rules. It combines dynamic profiling, differential fault localization, and constrained LLM modifications, and succeeds on 80.28% of 218 Semgrep issues (1.34x-2.45x over baselines) while producing expert-level rules.
- Python 3.10+
- pip 22.0+
# Install dependencies
pip install -r requirements.txt
We use the LLM API service from the Bailian platform:
export DASHSCOPE_API_KEY=XXXXXXXXXXXXXXXX
For more details, or to support LLMs other than those used in our paper, please refer to deepseek.py.
cat deepseek.py
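Since the pipeline depends on the `DASHSCOPE_API_KEY` environment variable, it can help to fail early when the key is missing. The following is a minimal sketch, not code from this repository: the helper name `get_api_key` is hypothetical, and it only checks that the variable is present without contacting any service.

```python
import os


def get_api_key(env=os.environ):
    """Return the DashScope API key from the environment, or raise early.

    `get_api_key` is a hypothetical helper for illustration only; the
    repository's own model modules may read the key differently.
    """
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DASHSCOPE_API_KEY is not set; export it before running the pipeline."
        )
    return key


# Demonstration with an explicit mapping so the sketch runs anywhere.
demo_key = get_api_key({"DASHSCOPE_API_KEY": "XXXXXXXXXXXXXXXX"})
print(len(demo_key))
```

Calling `get_api_key()` with no argument reads the real process environment.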
Run the pipeline test; it should complete without raising an exception.
python3 semgrep_pipeline_test.py
To refine a single defective rule, use rr.py:
python rr.py --input_file example.json --k 5
The input file format is as follows:
{
"id": "avoid-pyyaml-load",
"rule": "rules:\n- id: avoid-pyyaml-load\n languages:\n - python\n message: |\n Avoid using `load()`. `PyYAML.load` can create arbitrary Python\n objects. A malicious actor could exploit this to run arbitrary\n code. Use `safe_load()` instead.\n fix-regex:\n regex: load\n replacement: safe_load\n count: 1\n severity: ERROR\n patterns:\n - pattern-inside: |\n import yaml\n ...\n - pattern-not-inside: |\n $YAML = ruamel.yaml.YAML(...)\n ...\n $YAML.load(...)\n - pattern: yaml.load(...)\n",
"rule_path": "avoid-pyyaml-load.yaml",
"test_path": "avoid-pyyaml-load.py",
"splited_testsuite_b": [
"import yaml\n\n#ruleid:avoid-pyyaml-load\nyaml.load(\"!!python/object/new:os.system [echo EXPLOIT!]\")",
"import yaml\n\ndef thing(**kwargs):\n #ruleid:avoid-pyyaml-load\n yaml.load(\"!!python/object/new:os.system [echo EXPLOIT!]\", **kwargs)",
"import yaml\n\ndef check_ruamel_yaml():\n from ruamel.yaml import YAML\n yaml = YAML(typ=\"rt\")\n # ok:avoid-pyyaml-load\n yaml.load(\"thing.yaml\")",
"import yaml\n\ndef this_is_ok(stream):\n #ok:avoid-pyyaml-load\n return yaml.load(stream, Loader=yaml.CSafeLoader)"
],
"actual": [
true,
true,
false,
true
],
"expected": [
true,
true,
false,
false
],
"index": 4
}
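To make the `actual` and `expected` fields concrete, the sketch below compares them position by position and reports which test cases the rule gets wrong. It assumes, based only on the example above, that `true` means the rule reports a finding on that test case. The helper names `find_mismatches` and `classify` are hypothetical and not part of rr.py.

```python
import json


def find_mismatches(entry):
    """Return zero-based indices of test cases where actual != expected."""
    actual, expected = entry["actual"], entry["expected"]
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


def classify(actual, expected):
    """Label one test case, assuming True means 'the rule reports a finding'."""
    if actual and not expected:
        return "false positive"
    if expected and not actual:
        return "false negative"
    return "ok"


# Trimmed copy of the example input: only the fields used by this sketch.
entry = json.loads(
    '{"id": "avoid-pyyaml-load",'
    ' "actual": [true, true, false, true],'
    ' "expected": [true, true, false, false]}'
)

for i in find_mismatches(entry):
    print(i, classify(entry["actual"][i], entry["expected"][i]))
# prints: 3 false positive
```

Under that assumption, the last test case (`yaml.load(stream, Loader=yaml.CSafeLoader)`, marked `#ok`) is the false alarm that the refinement should eliminate.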
To refine multiple rules, for example to run an experiment, use semgrep_pipeline.py:
python semgrep_pipeline.py [OPTIONS]
This runs the Semgrep rule refinement pipeline with configurable modes and models. It executes the pipeline k times (pass@k evaluation), aggregates the results, and calculates the success rate.
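One common reading of pass@k aggregation is that a rule counts as successfully refined if at least one of the k runs produced a verified fix. The sketch below illustrates that reading on in-memory data. The function name `pass_at_k_rate`, the rule ids, and the per-run dictionaries are all hypothetical, and the aggregation in semgrep_pipeline.py may differ in detail.

```python
def pass_at_k_rate(runs):
    """Fraction of rules solved in at least one run.

    `runs` is a list of k dicts, each mapping a rule id to True/False
    (whether that run's refined rule passed verification).
    """
    rule_ids = set()
    for run in runs:
        rule_ids.update(run)
    if not rule_ids:
        return 0.0
    solved = sum(
        1 for rid in rule_ids if any(run.get(rid, False) for run in runs)
    )
    return solved / len(rule_ids)


# Two runs (k = 2) over three made-up rules.
runs = [
    {"rule-a": True, "rule-b": False, "rule-c": False},
    {"rule-a": True, "rule-b": True, "rule-c": False},
]
print(f"{pass_at_k_rate(runs):.2%}")  # prints: 66.67%
```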
Argument | Type | Default | Choices | Description |
---|---|---|---|---|
--mode | str | full | naive, cot, fewshot, localization, template, full | Pipeline execution mode. naive: basic LLM refinement; cot: Chain-of-Thought prompting; fewshot: few-shot learning; localization: fault localization only; template: template-guided refinement; full: full RuleRefiner pipeline |
--prompt_file | str | results/semgrep_prompts.jsonl | - | Output file for generated LLM prompts |
--result_file | str | results/semgrep_result.jsonl | - | Output file for raw LLM responses |
--verify_file | str | results/semgrep_verify.jsonl | - | Output file for verification results |
--temperature | float | 0.0 | 0.0-1.0 | LLM sampling temperature (0 = deterministic) |
--model | str | deepseek-v3 | deepseek-v3, qwen-plus | LLM backend to use |
--k | int | 1 | 1-10 | Number of iterations for pass@k evaluation |
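For readers who want to script around these options, the table can be mirrored with `argparse` roughly as follows. This is a reconstruction from the table only, not the actual source of semgrep_pipeline.py. The helper `bounded_float` and the range checks are assumptions about how the documented ranges might be enforced.

```python
import argparse

MODES = ["naive", "cot", "fewshot", "localization", "template", "full"]
MODELS = ["deepseek-v3", "qwen-plus"]


def bounded_float(lo, hi):
    """Build an argparse type that accepts floats in [lo, hi] (hypothetical helper)."""
    def parse(text):
        value = float(text)
        if not lo <= value <= hi:
            raise argparse.ArgumentTypeError(f"must be in [{lo}, {hi}]")
        return value
    return parse


def build_parser():
    """Mirror the documented options; defaults are taken from the table."""
    parser = argparse.ArgumentParser(description="Semgrep rule refinement (sketch)")
    parser.add_argument("--mode", type=str, default="full", choices=MODES)
    parser.add_argument("--prompt_file", type=str, default="results/semgrep_prompts.jsonl")
    parser.add_argument("--result_file", type=str, default="results/semgrep_result.jsonl")
    parser.add_argument("--verify_file", type=str, default="results/semgrep_verify.jsonl")
    parser.add_argument("--temperature", type=bounded_float(0.0, 1.0), default=0.0)
    parser.add_argument("--model", type=str, default="deepseek-v3", choices=MODELS)
    parser.add_argument("--k", type=int, default=1, choices=range(1, 11))
    return parser


args = build_parser().parse_args(["--mode", "fewshot", "--k", "3"])
print(args.mode, args.k, args.model)  # prints: fewshot 3 deepseek-v3
```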
Basic execution with default parameters:
python semgrep_pipeline.py
Full pipeline with Qwen model:
python semgrep_pipeline.py \
--mode full \
--model qwen-plus \
--temperature 0.3 \
--k 5
Run localization-only mode:
python semgrep_pipeline.py \
--mode localization \
--prompt_file results/localization_prompts.jsonl
Evaluate few-shot learning with pass@3:
python semgrep_pipeline.py \
--mode fewshot \
--k 3 \
--result_file results/fewshot_results.jsonl
File Pattern | Contents |
---|---|
results/semgrep_prompts.jsonl.{i} | Generated prompts for each rule |
results/semgrep_result.jsonl.{i} | Raw LLM responses |
results/semgrep_verify.jsonl.{i} | Verification results with success status |
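Because each run writes to a file with a `.{i}` suffix, a small loader can collect all runs for post-processing. The sketch below makes no assumption about the record schema or about whether `{i}` starts at 0 or 1: it globs every suffix and parses each non-empty line as JSON. The function name `load_run_files` and the demo records are hypothetical.

```python
import glob
import json
import os
import tempfile


def load_run_files(base_path):
    """Return {suffix: [records]} for every file named '<base_path>.<suffix>'."""
    runs = {}
    for path in sorted(glob.glob(glob.escape(base_path) + ".*")):
        suffix = path[len(base_path) + 1:]
        with open(path, encoding="utf-8") as fh:
            # JSONL: one JSON document per non-empty line.
            runs[suffix] = [json.loads(line) for line in fh if line.strip()]
    return runs


# Demonstration on temporary files with placeholder records.
with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "semgrep_verify.jsonl")
    for i in range(2):
        with open(f"{base}.{i}", "w", encoding="utf-8") as fh:
            fh.write(json.dumps({"run": i}) + "\n")
    runs = load_run_files(base)

print(sorted(runs))  # prints: ['0', '1']
```

In a real checkout, `load_run_files("results/semgrep_verify.jsonl")` would gather the verification output of every run.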
├── dataset/ # Semgrep datasets
├── examples/ # Example rule files and test cases
├── experimental/ # Experimental predicate graph translation for CodeQL and PMD
├── results/ # Output files and evaluation results
├── scripts/ # useful scripts
├── venv/ # Virtual environment (excluded from version control)
│
├── semgrep2nx.py # Semgrep to Predicate Graph conversion & dynamic profiling
├── graph.py # Graph analysis
├── semgrep_locate.py # Fault localization implementation
├── semgrep_template.py # template generator
├── semgrep_prompt.py # prompt generation
├── semgrep_verify.py # Rule validation
├── semgrep_pipeline.py # pipeline
├── rr.py # entry of RuleRefiner
├── semgrep_syntax.py # Rule syntax fixer
├── testcase.py # Test case
│
├── models/ # LLM integration modules
│ ├── deepseek.py # DeepSeek model interface
│ ├── doubao.py # Doubao model interface
│ └── qwen.py # Qwen model interface
│
├── semgrep_output_parser.py # Output parsing utilities
├── utils.py # Common utilities
└── para.py # Parallel processing config
...
This repository was originally created for our paper "Fact-Aligned and Template-Constrained Static Analyzer Rule Enhancement with LLMs" (ASE 2025, to appear).