Dataless Text Classification on AG News with BERT

An exploration of dataless text classification with a small BERT model on the AG News topic dataset. News articles and category labels are both embedded with BERT. The similarity between article embeddings and label embeddings serves as the baseline approach, and several experiments are conducted to improve its accuracy. The same BERT model is also fine-tuned on the full dataset as a supervised comparison. Dataless classification reaches an accuracy of 77.6%, whereas the fine-tuned BERT model reaches 91%.
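
The baseline idea can be illustrated with a minimal sketch: given one embedding per article and one per category label, predict the label whose embedding is most similar to the article's. Cosine similarity is assumed here, since the README does not name the similarity measure. The function names and the toy three-dimensional vectors are made up for illustration and stand in for real BERT embeddings; this is not the code from the notebooks.

```python
import math
from typing import Sequence

# The four AG News topic categories.
LABELS = ["World", "Sports", "Business", "Sci/Tech"]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(article_vec: Sequence[float],
             label_vecs: Sequence[Sequence[float]]) -> int:
    """Return the index of the label embedding most similar to the article."""
    scores = [cosine_similarity(article_vec, lv) for lv in label_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy vectors standing in for BERT embeddings (hypothetical values).
label_vecs = [
    [1.0, 0.0, 0.0],  # World
    [0.0, 1.0, 0.0],  # Sports
    [0.0, 0.0, 1.0],  # Business
    [1.0, 1.0, 1.0],  # Sci/Tech
]
article_vec = [0.1, 0.9, 0.2]

predicted = LABELS[classify(article_vec, label_vecs)]
print(predicted)
```

No labeled training examples are needed for this step, only the label names themselves, which is what makes the approach "dataless".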

Running the experiments

Install the required packages with pip install -r requirements.txt, then run the notebooks in src/fine_tuning (supervised fine-tuning) and src/dataless (dataless classification).


viktor-enzell/fine-tune-bert