This repository was archived by the owner on Jul 30, 2024. It is now read-only.

Commit efe5561

first commit
1 parent df567f5 commit efe5561

90 files changed (+12770, -1 lines changed)

.idea/.gitignore

+3
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

.idea/PLBART.iml

+12

.idea/inspectionProfiles/profiles_settings.xml

+6

.idea/misc.xml

+7

.idea/modules.xml

+8

.idea/vcs.xml

+6

FILEs.md

+107
@@ -0,0 +1,107 @@
+#### Files in this Repository
+
+```
+.
+├── LICENSE
+├── README.md
+├── codeXglue
+│   ├── code_to_code
+│   │   ├── CodeBLEU
+│   │   │   ├── bleu.py
+│   │   │   ├── calc_code_bleu.py
+│   │   │   ├── dataflow_match.py
+│   │   │   ├── keywords
+│   │   │   │   ├── c_sharp.txt
+│   │   │   │   ├── java.txt
+│   │   │   │   └── python.txt
+│   │   │   ├── parser
+│   │   │   │   ├── DFG.py
+│   │   │   │   ├── __init__.py
+│   │   │   │   ├── build.py
+│   │   │   │   ├── build.sh
+│   │   │   │   ├── my-languages.so
+│   │   │   │   └── utils.py
+│   │   │   ├── readme.txt
+│   │   │   ├── syntax_match.py
+│   │   │   ├── utils.py
+│   │   │   └── weighted_ngram_match.py
+│   │   ├── bleu.py
+│   │   ├── clone_detection
+│   │   │   ├── encode.py
+│   │   │   ├── eval.py
+│   │   │   ├── evaluator.py
+│   │   │   ├── prepare.sh
+│   │   │   └── run.sh
+│   │   ├── defect_prediction
+│   │   │   ├── encode.py
+│   │   │   ├── eval.py
+│   │   │   ├── evaluator.py
+│   │   │   ├── prepare.sh
+│   │   │   └── run.sh
+│   │   ├── encode.py
+│   │   ├── evaluator.py
+│   │   ├── refin_prep.sh
+│   │   ├── refin_run.sh
+│   │   ├── trans_prep.sh
+│   │   └── trans_run.sh
+│   ├── code_to_text
+│   │   ├── download.sh
+│   │   ├── encode.py
+│   │   ├── evaluator.py
+│   │   ├── generate.sh
+│   │   ├── multilingual.sh
+│   │   ├── prep.sh
+│   │   ├── python_tokenizer.py
+│   │   └── run.sh
+│   └── text_to_code
+│       ├── CodeBLEU
+│       │   ├── bleu.py
+│       │   ├── calc_code_bleu.py
+│       │   ├── dataflow_match.py
+│       │   ├── keywords
+│       │   │   ├── c_sharp.txt
+│       │   │   ├── java.txt
+│       │   │   └── python.txt
+│       │   ├── parser
+│       │   │   ├── DFG.py
+│       │   │   ├── __init__.py
+│       │   │   ├── build.py
+│       │   │   ├── build.sh
+│       │   │   ├── my-languages.so
+│       │   │   └── utils.py
+│       │   ├── readme.txt
+│       │   ├── syntax_match.py
+│       │   ├── utils.py
+│       │   └── weighted_ngram_match.py
+│       ├── bleu.py
+│       ├── encode.py
+│       ├── evaluator.py
+│       ├── generate.sh
+│       ├── prep.sh
+│       └── run.sh
+├── preprocessing
+│   ├── __init__.py
+│   ├── detokenize.py
+│   ├── preprocess.py
+│   ├── src
+│   │   ├── __init__.py
+│   │   ├── code_tokenizer.py
+│   │   ├── dataset.py
+│   │   ├── javalang_tokenizer.py
+│   │   ├── test_tokenize_cpp.py
+│   │   ├── test_tokenize_java.py
+│   │   ├── test_tokenize_python.py
+│   │   ├── timeout.py
+│   │   └── utils.py
+│   └── test_preprocess.py
+├── pretrain
+│   ├── absolute.sh
+│   └── binarize.sh
+├── requirements.txt
+├── sentencepiece
+│   ├── encode.py
+│   └── train.py
+├── setup.py
+└── stackoverflow
+    └── preprocess.py
+```

README.md

+41 -1
@@ -1,2 +1,42 @@
 # PLBART
-Code pre-release of our work, **Unified Pre-training for Program Understanding and Generation** accepted at NAACL 2021.
+Code pre-release of our work, [Unified Pre-training for Program Understanding and Generation]() accepted at NAACL 2021.
+
+#### Note. A detailed documentation is coming soon.
+
+### Pre-training data
+
+PLBART is pre-trained on Java and Python functions and natural language descriptions collected from Github and StackOverflow.
+
+
+### Evaluation tasks
+
+We evaluated PLBART on five tasks.
+
+- Code summarization [[info](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Text/code-to-text#dataset)]
+- Code generation [[info](https://github.com/microsoft/CodeXGLUE/tree/main/Text-Code/text-to-code#task-definition)]
+- Code translation [[info](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans#task-definition)]
+- Clone detection [[info](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Clone-detection-BigCloneBench#task-definition)]
+- Vulnerability detection [[info](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Defect-detection#codexglue----defect-detection)]
+
+
+### Notes
+
+- We will publish the pretrained PLBART checkpoint soon.
+- We list all the files in this repository [here](https://github.com/plbart-2020/PLBART/blob/main/FILEs.md).
+
+### Acknowledgement
+
+PLBART uses [Fairseq](https://github.com/pytorch/fairseq), [codeXglue](https://github.com/microsoft/CodeXGLUE), and [TransCoder](https://github.com/facebookresearch/TransCoder) and thanks the authors of these works for their contribution.
+
+
+### Citation
+
+```
+@inproceedings{ahmad2020summarization,
+author = {Ahmad, Wasi Uddin and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
+booktitle = {Proceedings of the 2021 Conference of the North {A}merican Chapter of the Association for Computational Linguistics},
+title = {Unified Pre-training for Program Understanding and Generation},
+year = {2021}
+}
+```
+
