This is the codebase for the paper "Iterative Generation of Adversarial Example for Deep Code Models".
Create Environment
pip install -r requirements.txt
Build tree-sitter
We use tree-sitter to parse code snippets and extract variable names. Go to the ./python_parser/parser_folder directory and build tree-sitter with the following command:
bash build.sh
Model Fine-tuning
Use train.py to fine-tune the models. For example:
cd CodeBERT_adv/Clone-detection/code
python train.py
Running Attacks
Download the dataset and model from this url and place the files in the appropriate paths.
For example:
cd CodeBERT_adv/Clone-detection/attack
python run_xxx.py
Here, run_xxx.py can be run_itgen.py, run_alert.py, or run_beam.py. Taking run_itgen.py as an example:
import os
os.system("CUDA_VISIBLE_DEVICES=2 python attack_itgen.py \
--output_dir=../saved_models \
--model_type=roberta \
--tokenizer_name=microsoft/codebert-base \
--model_name_or_path=microsoft/codebert-base \
--csv_store_path result/attack_itgen_all.jsonl \
--base_model=microsoft/codebert-base-mlm \
--eval_data_file=../../../dataset/Clone-detection/test_sampled.txt \
--block_size 512 \
--eval_batch_size 2 \
--seed 123456")
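As a possible alternative to the os.system call above, the same command can be assembled as an argument list and launched with subprocess, which avoids shell quoting and surfaces a non-zero exit code as an exception. This is only a sketch: the helper names build_attack_command and run_attack are hypothetical and not part of the repository, sys.executable is used in place of the bare python command, and the flags are copied unchanged from run_itgen.py.

```python
import os
import subprocess
import sys


def build_attack_command(seed=123456, eval_batch_size=2):
    """Return the argument list for attack_itgen.py (hypothetical helper).

    The flags and values mirror the ones shown in run_itgen.py.
    """
    return [
        sys.executable, "attack_itgen.py",
        "--output_dir=../saved_models",
        "--model_type=roberta",
        "--tokenizer_name=microsoft/codebert-base",
        "--model_name_or_path=microsoft/codebert-base",
        "--csv_store_path", "result/attack_itgen_all.jsonl",
        "--base_model=microsoft/codebert-base-mlm",
        "--eval_data_file=../../../dataset/Clone-detection/test_sampled.txt",
        "--block_size", "512",
        "--eval_batch_size", str(eval_batch_size),
        "--seed", str(seed),
    ]


def run_attack(gpu_id="2"):
    """Run the attack on the given GPU (hypothetical helper).

    CUDA_VISIBLE_DEVICES is passed through the environment instead of a
    shell prefix; check=True raises CalledProcessError on failure.
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_id)
    subprocess.run(build_attack_command(), env=env, check=True)
```

Calling run_attack() from the attack directory would then play the same role as running python run_itgen.py.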
Experiments on other tasks and other models can be run in the same way.
./CodeBERT_adv/ contains code for the CodeBERT experiment.
We are very grateful to the authors of CodeBERT, GraphCodeBERT, CodeT5, CodeGPT, PLBART, ALERT, and BeamAttack for making their code publicly available, which allowed us to build this repository on top of it.