A Python package and app for training and running claim verification models.
Claim verification is a task in natural language processing (NLP) with applications ranging from fact-checking to verifying the accuracy of scientific citations. The models used in this package are based on the transformer deep-learning architecture.
- Web app
- Data Modules
  - Support for local files and HuggingFace datasets.
  - Consistent label encoding for different natural language inference (NLI) datasets (see below).
  - Supports shuffling training data from multiple datasets for improved model generalization.
- Trainer
  - Training and data modules implemented with PyTorch Lightning.
  - Use any pretrained sequence classification model from HuggingFace.
  - Logger is configured to plot training and validation loss on the same graph in TensorBoard.
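The logger setup is handled inside pyvers. For reference, one common pattern for overlaying training and validation loss on a single TensorBoard chart is to write both values under a shared main tag with add_scalars; the sketch below only illustrates that idea with dummy values and is not the actual pyvers implementation.

from torch.utils.tensorboard import SummaryWriter

# Sketch only: writing several scalars under one main tag ("loss") makes
# TensorBoard overlay the curves on the same chart.
writer = SummaryWriter("lightning_logs/example")
for step, (train_loss, val_loss) in enumerate([(0.9, 1.0), (0.6, 0.8), (0.4, 0.7)]):
    writer.add_scalars("loss", {"train": train_loss, "val": val_loss}, step)
writer.close()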
There is no need to install pyvers to run the app. The `pip install` command below takes care of the requirements for the app. Then, run the two `python` commands in different terminals.
pip install torch transformers litserve gradio
python app/server.py
python app/app.py
App usage:
- Browse to the URL generated by the last command.
- Input a claim and evidence (example).
- Hit "Enter" or press the Submit button to run the inference.
- The probabilities predicted by the model are printed in the Classification text box and visualized in the bar chart.
- Change the model using the dropdown at the top. This automatically re-runs the inference using the selected model.
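The Gradio UI sends requests to the LitServe server, and you can also query the server directly over HTTP. The sketch below assumes LitServe's default address and route (http://localhost:8000/predict); the claim and evidence field names are only an assumption, so check app/server.py for the request schema the server actually expects.

import requests

# Hypothetical request; the JSON field names are assumptions, not the
# confirmed schema of app/server.py.
response = requests.post(
    "http://localhost:8000/predict",
    json={
        "claim": "Citrus fruits are a source of vitamin C.",
        "evidence": "Oranges and lemons contain significant amounts of vitamin C.",
    },
)
print(response.json())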
Screenshot:
Install pyvers if you want to fine-tune models or use the data modules.
Run these commands in the root directory of the repository.
pip install -r requirements.txt
pip install -e .

- The first command installs the requirements.
- The second command installs the pyvers package in development mode. Remove the `-e` for a standard installation.
- This class loads data from local data files in JSON lines format (jsonl).
- Supported datasets include SciFact and Citation-Integrity.
- The schema for the data files is described here.
- Get data files for SciFact and Citation-Integrity with labels used in pyvers here.
- The data module can be used to shuffle training data from both datasets.
from pyvers.data import FileDataModule
# Set the model used for the tokenizer
model_name = "bert-base-uncased"
# Load data from one dataset
dm = FileDataModule("data/scifact", model_name)
# Shuffle training data from two datasets
dm = FileDataModule(["data/scifact", "data/citint"], model_name)
# Get some tokenized data
dm.setup("fit")
next(iter(dm.train_dataloader()))
- This class loads data from selected HuggingFace datasets.
- Supported datasets are copenlu/fever_gold_evidence, facebook/anli, and nyu-mll/multi_nli.
from pyvers.data import NLIDataModule
model_name = "bert-base-uncased"
# Load data from HuggingFace datasets
dm = NLIDataModule("facebook/anli", model_name)
# Get some tokenized data
dm.prepare_data()
dm.setup("fit")
next(iter(dm.train_dataloader()))
- This is a small handmade toy dataset.
- There are no data files; the dataset is hard-coded in the class definition.
Training on the toy dataset takes about a minute on a CPU.
# Import required modules
import pytorch_lightning as pl
from pyvers.data import ToyDataModule
from pyvers.model import PyversClassifier
# Initialize data and model
dm = ToyDataModule("bert-base-uncased")
model = PyversClassifier(dm.model_name)
# Train model
trainer = pl.Trainer(enable_checkpointing=False, max_epochs=20)
trainer.fit(model, datamodule=dm)
# Test model
trainer.test(model, datamodule=dm)
# Show predictions
predictions = trainer.predict(model, datamodule=dm)
print(predictions)
This is what we get (results vary between runs):
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Test metric ┃ DataLoader 0 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ AUROC Macro │ 0.963 │
│ AUROC Weighted │ 0.963 │
│ Accuracy │ 88.9 │
│ F1 Macro │ 88.6 │
│ F1 Micro │ 88.9 │
│ F1_NEI │ 100.0 │
│ F1_REFUTE │ 80.0 │
│ F1_SUPPORT │ 85.7 │
└───────────────────────────┴───────────────────────────┘
[['SUPPORT', 'SUPPORT', 'SUPPORT', 'NEI', 'NEI', 'NEI', 'REFUTE', 'REFUTE', 'SUPPORT']]
# Ground-truth labels are:
# [['SUPPORT', 'SUPPORT', 'SUPPORT', 'NEI', 'NEI', 'NEI', 'REFUTE', 'REFUTE', 'REFUTE']]
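The trainer also writes TensorBoard logs during fine-tuning. Assuming Lightning's default lightning_logs directory, the loss curves can be viewed with:

tensorboard --logdir lightning_logs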
This uses a DeBERTa model trained on MultiNLI, Fever-NLI and Adversarial-NLI (ANLI) for zero-shot classification of claim-evidence pairs.
import pytorch_lightning as pl
from pyvers.model import PyversClassifier
from pyvers.data import ToyDataModule
dm = ToyDataModule("MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli")
model = PyversClassifier(dm.model_name)
trainer = pl.Trainer()
dm.setup(stage="test")
predictions = trainer.predict(model, datamodule=dm)
print(predictions)
# [['SUPPORT', 'SUPPORT', 'SUPPORT', 'REFUTE', 'REFUTE', 'REFUTE', 'REFUTE', 'REFUTE', 'REFUTE']]
The pretrained model successfully distinguishes between SUPPORT and REFUTE on the toy dataset but misclassifies NEI as REFUTE. This can be improved with fine-tuning.
When using a pre-trained model for zero-shot classification, check the mapping between labels and IDs.
from transformers import AutoConfig
model_name = "answerdotai/ModernBERT-base"
config = AutoConfig.from_pretrained(model_name, num_labels=3)
print(config.to_dict()["id2label"])
# {0: 'LABEL_0', 1: 'LABEL_1', 2: 'LABEL_2'}
model_name = "MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli"
config = AutoConfig.from_pretrained(model_name, num_labels=3)
print(config.to_dict()["id2label"])
# {0: 'entailment', 1: 'neutral', 2: 'contradiction'}
Because its labels are consistent with the NLI categories listed below, for zero-shot classification we would choose the pretrained DeBERTa model rather than ModernBERT. However, fine-tuning either model for text classification should work (see this page for information on fine-tuning ModernBERT).
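As a concrete example, here is a minimal fine-tuning sketch that combines the pyvers classes shown earlier; the choice of dataset and max_epochs is illustrative, not a recommendation.

import pytorch_lightning as pl
from pyvers.data import NLIDataModule
from pyvers.model import PyversClassifier

# Fine-tune ModernBERT (or the DeBERTa model) on an NLI dataset
model_name = "answerdotai/ModernBERT-base"
dm = NLIDataModule("facebook/anli", model_name)
model = PyversClassifier(model_name)
trainer = pl.Trainer(max_epochs=3)
trainer.fit(model, datamodule=dm)
trainer.test(model, datamodule=dm)

The label mapping used by pyvers and the corresponding dataset labels are listed in the table below.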
ID | pyvers | Fever* | MultiNLI, ANLI |
---|---|---|---|
0 | SUPPORT | SUPPORTS | entailment |
1 | NEI | NOT ENOUGH INFO | neutral |
2 | REFUTE | REFUTES | contradiction |
* Text labels only
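If you fine-tune a generic model such as ModernBERT, you can attach the pyvers label names from the table above to its configuration so that predictions are reported with meaningful labels. This is a sketch of the standard transformers pattern; pyvers may already handle the mapping internally.

from transformers import AutoConfig

# Attach the pyvers label mapping (see table above) to a model config
id2label = {0: "SUPPORT", 1: "NEI", 2: "REFUTE"}
label2id = {v: k for k, v in id2label.items()}
config = AutoConfig.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=3,
    id2label=id2label,
    label2id=label2id,
)
print(config.to_dict()["id2label"])
# {0: 'SUPPORT', 1: 'NEI', 2: 'REFUTE'}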