Author: Anvar Iskhakov
Email: [email protected]
Group: BS21-AI
Install all required packages with:
pip install -r requirements.txt
Download the models folder and paste it into the /models folder in the project root.
Download the raw dataset folder and paste it into the /data/raw folder in the project root.
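Before running the scripts, it can help to confirm that both downloads landed where the commands expect them. The sketch below is a hypothetical helper, not part of the repository. It assumes the model files live in models/ and that the raw dataset folder contains filtered.tsv, the default --input_file of make_dataset.py; adjust the list if your layout differs.

```python
from pathlib import Path

# Hypothetical checklist, assumed from this README's defaults (not from the repo itself).
REQUIRED_PATHS = [
    Path("models"),                 # downloaded model checkpoints
    Path("data/raw/filtered.tsv"),  # default --input_file of make_dataset.py
]


def missing_paths(root="."):
    """Return the required paths (as POSIX strings) that do not exist under root."""
    root = Path(root)
    return [p.as_posix() for p in REQUIRED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Project layout looks complete.")
```

Run it from the repository root (for example, saved as a hypothetical check_layout.py) before preprocessing.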
To preprocess the dataset for training, run the following command from the repository root:
python src/data/make_dataset.py
Optional arguments can also be passed; the equivalent command with all default values is:
python src/data/make_dataset.py --input_file data/raw/filtered.tsv --output_file data/internal/preprocessed.csv --tokenizer_model SkolkovoInstitute/roberta_toxicity_classifier
To train the final model on the preprocessed dataset, run the following command from the repository root:
python src/models/train_model.py
Optional arguments can also be passed; the equivalent command with all default values is:
python src/models/train_model.py --max_length 85 --batch_size 64 --checkpoint_name test --model_name SkolkovoInstitute/bart-base-detox --train_ratio 0.8 --val_test_ratio 0.5 --learning_rate 0.00002 --weight_decay 0.01 --save_total_limit 1 --num_train_epochs 1 --save_steps 500 --eval_steps 500 --logging_steps 100
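The --train_ratio and --val_test_ratio flags suggest a two-stage split. Assuming train_ratio is the fraction of rows used for training and val_test_ratio divides the remaining rows between validation and test (src/models/train_model.py is the authority here), the defaults give roughly an 80/10/10 split. The hypothetical helper below, which is not part of the repository, reproduces that arithmetic:

```python
def split_sizes(n_rows, train_ratio=0.8, val_test_ratio=0.5):
    """Hypothetical helper: (train, val, test) row counts under the assumed two-stage split.

    Stage 1: train_ratio of all rows go to training.
    Stage 2: val_test_ratio of the remaining rows go to validation, the rest to test.
    """
    n_train = int(n_rows * train_ratio)   # truncate toward zero
    rest = n_rows - n_train
    n_val = int(rest * val_test_ratio)
    n_test = rest - n_val                 # leftover rows, so the three counts sum to n_rows
    return n_train, n_val, n_test


print(split_sizes(1000))
```

For 1000 rows this prints (800, 100, 100). The exact rounding of the real script may differ by a row or so.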
To use the final trained model on your own sentences, run the following command from the repository root:
python src/models/predict_model.py
Optional arguments can also be passed; the equivalent command with all default values is:
python src/models/predict_model.py --checkpoint_name best --max_length 85 --model_name SkolkovoInstitute/bart-base-detox
Other hypotheses were tested in notebooks, which can be found in /notebooks/extra_hypotheses.