Contextualized hate speech classification

Code for "Assessing the impact of contextual information in hate speech detection", Pérez, J. M., Luque, F., Zayat, D., Kondratzky, M., Moro, A., Serrati, P., Zajac, J., Miguel, P., Gravano, A. & Cotik, V. (2022).

Link to paper

Dataset

The full dataset is available on the Hugging Face Hub.

You can also try our demo of contextualized hate speech detection.

Instructions

Get the dataset and put it under data/
Generate and preprocess the dataset

python bin/create_dataset.py data/articles.json data/comments.json --train_path data/train.json --test_path data/test.json
python bin/preprocess_dataset.py
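
Before running the two commands above, it can help to confirm that the raw files are where `bin/create_dataset.py` expects them. The following is a minimal sketch and not part of the repository: the helper name `missing_inputs` is hypothetical, and it assumes only that the inputs are the two files named in the command above.

```python
from pathlib import Path

# File names taken from the create_dataset.py command above.
REQUIRED_INPUTS = ("articles.json", "comments.json")


def missing_inputs(data_dir="data"):
    """Return the required input files that are absent from data_dir."""
    base = Path(data_dir)
    return [name for name in REQUIRED_INPUTS if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing under data/:", ", ".join(missing))
    else:
        print("All input files found; ready to run bin/create_dataset.py")
```

An empty list means both files are in place. The check looks only at file presence and does not validate their contents.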

Train models

First, plain hate classifiers

# Train non-contextualized model
python bin/train_hate_classifier.py --context 'none' --output_path models/bert-non-contextualized-hate-speech-es/ --epochs 10
python bin/train_category_classifier.py --context 'none' --output_path models/bert-non-contextualized-hate-category-es/ --epochs 5

# Train contextualized model
python bin/train_hate_classifier.py --context 'title' --output_path models/bert-contextualized-hate-speech-es/ --epochs 10
python bin/train_category_classifier.py --context 'title' --output_path models/bert-contextualized-hate-category-es/ --epochs 5


# Train fully contextualized model
# Check out notebooks/Hatespeech_Colab_TPU.ipynb
python bin/train_category_classifier.py --context 'title+body' --output_path models/bert-title+body-hate-category-es/ --epochs 5 --batch_size 8 --eval_batch_size 8
python bin/train_hate_classifier.py --context 'title+body' --output_path models/bert-title+body-hate-speech-es/ --epochs 10
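
The training commands above follow one pattern: a script per task, a `--context` value, an output directory, and an epoch count. As a rough sketch (not part of the repository), the pattern can be expressed in Python so that runs are generated rather than typed by hand. The function name `build_command` and the output directory scheme in the loop are hypothetical. The script paths, flags, context values, and epoch counts are copied from the commands above. The extra `--batch_size` and `--eval_batch_size` flags used for the `title+body` category run are left out for brevity.

```python
import shlex

# Script paths and epoch counts are copied from the commands above.
SCRIPTS = {
    "hate": ("bin/train_hate_classifier.py", 10),
    "category": ("bin/train_category_classifier.py", 5),
}


def build_command(task, context, output_path):
    """Build the argument list for one training run."""
    script, epochs = SCRIPTS[task]
    return [
        "python", script,
        "--context", context,
        "--output_path", output_path,
        "--epochs", str(epochs),
    ]


if __name__ == "__main__":
    # Output directories here are illustrative, not the ones used above.
    for context in ("none", "title", "title+body"):
        for task in SCRIPTS:
            cmd = build_command(task, context, f"models/{task}-{context}/")
            print(shlex.join(cmd))
```

The script only prints the commands. To execute one, the argument list can be passed to `subprocess.run(cmd, check=True)` from the repository root.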

For more instructions, check TRAIN_EVALUATE.md

Finetuning

First, preprocess the data

python bin/preprocess_finetune_data.py "/content/drive/Shareddrives/HateSpeech/data/hatespeech-data/" "/content/drive/MyDrive/data/finetune-news/finetune_data/" --num_workers 10

Then run finetuning

python bin/xla_spawn.py --num_cores 8 bin/finetune_lm.py config/no_context_ft.json
