Benchmarking NorBert, NbBert and mBert 🤖⚔️

A collaborative project by Bendik Solevåg and Erik Hystad

Requirements

This project requires Python v3.8 and pip v21.3.1.
To install the necessary dependencies, run setup.py.
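
Before installing, it can help to confirm that the interpreter meets the stated requirement. The snippet below is a minimal sketch, not part of this repository: the helper name `meets_requirement` is hypothetical, and treating 3.8 as a minimum (rather than an exact pin) is an assumption.

```python
import sys

# Minimum interpreter version, taken from the requirement stated above.
# Treating it as a lower bound is an assumption.
REQUIRED_PYTHON = (3, 8)


def meets_requirement(version=None, required=REQUIRED_PYTHON):
    """Return True if `version` (major, minor, ...) is at least `required`."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(required)


if __name__ == "__main__":
    status = "OK" if meets_requirement() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```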

Usage

To run the project, run python3 main.py in the project root.
Evaluation results can be found in the ./results/ directory after running the benchmark in question.
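
To see what a run produced, the contents of ./results/ can be listed with a few lines of Python. This is a sketch only: the function name `list_result_files` is hypothetical, and nothing is assumed about the names or format of the result files themselves.

```python
from pathlib import Path


def list_result_files(results_dir="./results"):
    """Return all files under `results_dir`, sorted by path."""
    root = Path(results_dir)
    if not root.is_dir():
        # The directory may be missing before any benchmark has run.
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    for path in list_result_files():
        print(path)
```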

Description

This repository aims to benchmark state-of-the-art models for Norwegian language modelling on various tasks. The table below lists the benchmarks, their datasets, and the files responsible for running them.

Benchmark	Executable	Dataset
Sentence-level sentiment polarity	./SentenceLevelSentimentPolarity.py	./Data/sentence_level_sentiment_polarity/train.json ./Data/sentence_level_sentiment_polarity/test.json
Dialect classification	./DialectClassification.py	./Data/dialect_classification/dialect_tweet_train.json ./Data/dialect_classification/dialect_tweet_test.json
Dependency parsing	./TokenClassification.py	./Data/pos_tagging/no_bokmaal-ud-train.conllu ./Data/pos_tagging/no_bokmaal-ud-test.conllu ./Data/pos_tagging/no_nynorsk-ud-train.conllu ./Data/pos_tagging/no_nynorsk-ud-test.conllu
Part-of-speech tagging	./TokenClassification.py	./Data/pos_tagging/no_bokmaal-ud-train.conllu ./Data/pos_tagging/no_bokmaal-ud-test.conllu ./Data/pos_tagging/no_nynorsk-ud-train.conllu ./Data/pos_tagging/no_nynorsk-ud-test.conllu
Named entity recognition	./TokenClassification.py	./Data/pos_tagging/no_bokmaal-ud-train.conllu ./Data/pos_tagging/no_bokmaal-ud-test.conllu ./Data/pos_tagging/no_nynorsk-ud-train.conllu ./Data/pos_tagging/no_nynorsk-ud-test.conllu
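
The token-level datasets above are .conllu files, which store one token per line in ten tab-separated columns, with blank lines between sentences. The sketch below shows one way to read word forms and universal POS tags from such a file. It is an illustration, not the code in TokenClassification.py: `read_conllu` is a hypothetical helper and `SAMPLE` is a made-up sentence.

```python
def read_conllu(lines):
    """Yield one sentence at a time as a list of (form, upos) pairs."""
    sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment, e.g. "# text = ..."
        cols = line.split("\t")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # skip multiword-token ranges and empty nodes
        # Column 2 is FORM and column 4 is UPOS (1-based).
        sentence.append((cols[1], cols[3]))
    if sentence:
        yield sentence


# Made-up example sentence, for illustration only.
SAMPLE = (
    "# text = Dette er en test.\n"
    "1\tDette\tdette\tPRON\t_\t_\t4\tnsubj\t_\t_\n"
    "2\ter\tvære\tAUX\t_\t_\t4\tcop\t_\t_\n"
    "3\ten\ten\tDET\t_\t_\t4\tdet\t_\t_\n"
    "4\ttest\ttest\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "5\t.\t.\tPUNCT\t_\t_\t4\tpunct\t_\t_\n"
    "\n"
).splitlines()

if __name__ == "__main__":
    for sent in read_conllu(SAMPLE):
        print(sent)
```

To read a real file, pass an open file handle instead of `SAMPLE`, for example `open(path, encoding="utf-8")`.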

Resources

The models we are benchmarking are each described in their own paper.

A description of NbBERT can be found here
A description of NorBERT can be found here

We found that Hugging Face had a well-developed knowledge base, and found this article on fine-tuning a pretrained model particularly helpful. We also relied heavily on this article on training for named entity recognition.
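
One step that fine-tuning for token-level tasks such as named entity recognition usually needs is aligning word-level labels with subword tokens, since a BERT tokenizer may split one word into several pieces. The sketch below shows a common way to do this, assuming a per-token list of word indices such as the one Hugging Face fast tokenizers return from `word_ids()`. The function `align_labels` is hypothetical and is not claimed to match the code in this repository.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def align_labels(word_ids, word_labels, label_all_subwords=False):
    """Expand word-level labels to subword tokens.

    word_ids: one entry per subword token; None marks special tokens.
    word_labels: one integer label per original word.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens such as [CLS] and [SEP] are excluded from the loss.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous or label_all_subwords:
            # First subword of a word (or every subword, if requested).
            aligned.append(word_labels[word_id])
        else:
            # Later subwords of the same word are ignored by default.
            aligned.append(IGNORE_INDEX)
        previous = word_id
    return aligned
```

By default only the first subword of each word carries the label, so each word is scored once. Setting `label_all_subwords=True` copies the label to every piece instead.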
