This project adapts supervised contrastive learning to the punctuation restoration task. It is the implementation of the paper "Token-Level Supervised Contrastive Learning for Punctuation Restoration", accepted at Interspeech 2021.
The data has been converted with the corresponding BERT tokenizer, paired with punctuation labels, and saved as pickle files under the dataset/ directory. The original text files come from the International Workshop on Spoken Language Translation (IWSLT) 2012.
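As a quick sanity check, the pickle files can be inspected along these lines (the file name and record layout below are assumptions, not a documented schema; check the actual contents of dataset/):

```python
import pickle

# Hypothetical example: open one of the preprocessed pickle files.
# The file name and the structure of its records are assumptions.
with open("dataset/train.pkl", "rb") as f:
    data = pickle.load(f)

# The records are assumed to pair token ids with aligned punctuation labels.
print(type(data), len(data))
```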
We fine-tuned a Transformer-based language model with supervised contrastive learning for the punctuation restoration task.
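For intuition, below is a minimal PyTorch sketch of a token-level supervised contrastive loss, following the general SupCon formulation: tokens that share a punctuation label act as positives for each other. This is an illustration under those assumptions, not a line-for-line copy of this repository's implementation:

```python
import torch
import torch.nn.functional as F

def token_scl_loss(features, labels, temperature=0.6):
    """Token-level supervised contrastive loss (illustrative sketch).

    features: (N, d) token embeddings for the non-padding tokens in a batch.
    labels:   (N,) integer punctuation label per token.
    Tokens sharing a label are treated as positives for each other.
    """
    features = F.normalize(features, dim=-1)       # compare in cosine space
    sim = features @ features.T / temperature      # (N, N) similarity logits

    n = features.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask

    # Log-softmax over all *other* tokens (self-similarity excluded).
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Average log-probability of the positives, for anchors that have any.
    pos_counts = pos_mask.sum(dim=1)
    has_pos = pos_counts > 0
    sum_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    return -(sum_pos[has_pos] / pos_counts[has_pos]).mean()
```

Pulling same-label token embeddings together and pushing different-label ones apart is what the contrastive term adds on top of the usual cross-entropy objective.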
To install the environment for this project, we recommend Anaconda. Once Anaconda is installed, the environment can be created with:
conda env create -f environment.yml
There are some example scripts under the example_scripts/ directory.
First, activate the conda environment:
conda activate punc_interspeech
Then, run train.py, for example:
python train.py --config=config/roberta-large-scl.yml -l 0.1 -t 0.6
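Here, -l and -t presumably set the weight of the contrastive term and the softmax temperature. A rough sketch of how such a combined objective might be formed, reusing token_scl_loss from the sketch above (all names and dummy values are illustrative):

```python
import torch

# Illustrative only: how the -l and -t flags presumably enter the objective
# (lam from -l, temperature from -t); all names and values are placeholders.
lam, temperature = 0.1, 0.6
token_features = torch.randn(32, 768)       # dummy (N, d) token embeddings
token_labels = torch.randint(0, 4, (32,))   # dummy punctuation labels
ce_loss = torch.tensor(0.7)                 # placeholder cross-entropy term

scl = token_scl_loss(token_features, token_labels, temperature=temperature)
loss = ce_loss + lam * scl                  # combined training objective
```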
Here, we provide several config files and example scripts in
- config/
- example_scripts/
During training, TensorBoard logs are written to the runs/ directory, which is created automatically when the program starts. Meanwhile, the model for each epoch is saved under the saved_model/ directory.
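The training curves can then be inspected with, for example:
tensorboard --logdir runs/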
Evaluation can be done by running:
python evaluate.py --config=[config path] --checkpoint=[saved model file path]
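For example (the checkpoint file name below is hypothetical; substitute a file actually written to saved_model/):
python evaluate.py --config=config/roberta-large-scl.yml --checkpoint=saved_model/epoch_3.pt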
If you find this work useful, please cite our paper:

@inproceedings{huang21g_interspeech,
author={Qiushi Huang and Tom Ko and H. Lilian Tang and Xubo Liu and Bo Wu},
title={{Token-Level Supervised Contrastive Learning for Punctuation Restoration}},
year=2021,
booktitle={Proc. Interspeech 2021},
pages={2012--2016},
doi={10.21437/Interspeech.2021-661}
}