yyihaoc/punctuate-bert-mini

🧠 Punctuation Restoration Model

This repository contains a fine-tuned transformer-based model for restoring punctuation in automatic speech recognition (ASR) output and other spoken-language transcripts. The model inserts six missing punctuation marks (. , ? ! : ;) to improve readability and downstream NLP performance.
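To make "restoring punctuation" concrete, here is a minimal sketch of the final step of such a pipeline: attaching a predicted mark to each word. It assumes the model's output has already been reduced to one label per word, where an empty string means "no mark follows this word". That label format and the function name `apply_punctuation` are hypothetical and are not taken from this repository's code.

```python
from typing import Sequence


def apply_punctuation(words: Sequence[str], labels: Sequence[str]) -> str:
    """Join words into a sentence, appending each word's predicted mark.

    `labels` is an assumed format: one entry per word, either "" (no mark)
    or one of the six supported marks: . , ? ! : ;
    """
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    # Append the mark directly to the word, then separate words with spaces.
    return " ".join(word + label for word, label in zip(words, labels))


if __name__ == "__main__":
    words = ["hello", "how", "are", "you"]
    labels = [",", "", "", "?"]
    print(apply_punctuation(words, labels))
```

Keeping this step separate from inference makes it straightforward to test without loading the model.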


✨ Features

  • Fine-tuned on diverse text data (Wikipedia corpus, Hugging Face datasets, podcast transcripts, manual YouTube captions from TED Talks and interviews)
  • Supports six marks: ; : ! ? , . (including the less common ; and :)
  • Built on top of google/bert_uncased_L-4_H-256_A-4
  • Easy to plug into any transcript-cleaning pipeline
  • Does not restore capitalisation, and works only on clean transcripts that contain none of the ; : ! ? , . marks
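Because the model expects input free of the six supported marks, a transcript that already has some punctuation needs cleaning first. The following is a minimal sketch of one way to do that; the helper name `prepare_transcript` is hypothetical and not part of this repository. It replaces every supported mark with a space and collapses whitespace, which also means decimals such as "3.5" and abbreviations such as "e.g." get split, so adjust it if your transcripts contain them.

```python
import re

# The six marks listed in the features above.
PUNCTUATION = ";:!?,."

# Character class matching any one of the six marks.
_PUNCT_RE = re.compile(f"[{re.escape(PUNCTUATION)}]")


def prepare_transcript(text: str) -> str:
    """Remove the six supported marks and normalise whitespace."""
    # Replace each mark with a space so adjacent words do not fuse together.
    without_marks = _PUNCT_RE.sub(" ", text)
    # Collapse runs of whitespace and trim both ends.
    return " ".join(without_marks.split())


if __name__ == "__main__":
    print(prepare_transcript("Hello, world! How are you?"))
```

The cleaned string can then be pasted into test_result_bert_mini.py as the example input.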

📦 Installation

Follow these steps to install and run the punctuation restoration model locally.

1. Clone the repository

git clone https://github.com/yyihaoc/punctuate-bert-mini.git
cd punctuate-bert-mini

2. Set up the virtual environment

python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

3. Install dependencies

pip install -r requirements.txt

4. Run the model

Open test_result_bert_mini.py and replace the example input with your own text. Then run:

python test_result_bert_mini.py
