A Python package and app for training and running claim verification models.
Claim verification is a task in natural language processing (NLP) with applications ranging from fact-checking to verifying the accuracy of scientific citations. The models used in this package are based on the transformer deep-learning architecture.
- Web app
- Data Modules
  - Support for local files and HuggingFace datasets.
  - Consistent label encoding for different natural language inference (NLI) datasets (see below).
  - Supports shuffling training data from multiple datasets for improved model generalization.
- Trainer
  - Training and data modules implemented with PyTorch Lightning.
  - Use any pretrained sequence classification model from HuggingFace.
  - Logger is configured to plot training and validation loss on the same graph in TensorBoard.
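The logger setup is handled inside pyvers. For reference, one common pattern for overlaying training and validation loss on a single TensorBoard chart is to write both values under a shared main tag with add_scalars; the sketch below only illustrates that idea with dummy values and is not the actual pyvers implementation.

from torch.utils.tensorboard import SummaryWriter

# Sketch only: writing several scalars under one main tag ("loss") makes
# TensorBoard overlay the curves on the same chart.
writer = SummaryWriter("lightning_logs/example")
for step, (train_loss, val_loss) in enumerate([(0.9, 1.0), (0.6, 0.8), (0.4, 0.7)]):
    writer.add_scalars("loss", {"train": train_loss, "val": val_loss}, step)
writer.close()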
There is no need to install pyvers to run the app. The `pip install` command below takes care of the requirements for the app. Then, run the two `python` commands in different terminals.
pip install torch transformers litserve gradio
python app/server.py
python app/app.py
App usage:
- Browse to the URL generated by the last command.
- Input a claim and evidence (example).
- Hit "Enter" or press the Submit button to run the inference.
- The probabilities predicted by the model are printed in the Classification text box and visualized in the bar chart.
- Change the model using the dropdown at the top. This automatically re-runs the inference using the selected model.
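The Gradio UI sends requests to the LitServe server, and you can also query the server directly over HTTP. The sketch below assumes LitServe's default address and route (http://localhost:8000/predict); the claim and evidence field names are only an assumption, so check app/server.py for the request schema the server actually expects.

import requests

# Hypothetical request; the JSON field names are assumptions, not the
# confirmed schema of app/server.py.
response = requests.post(
    "http://localhost:8000/predict",
    json={
        "claim": "Citrus fruits are a source of vitamin C.",
        "evidence": "Oranges and lemons contain significant amounts of vitamin C.",
    },
)
print(response.json())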
Screenshot:
Install pyvers if you want to fine-tune models or use the data modules.
Run these commands in the root directory of the repository.
pip install -r requirements.txt
pip install -e .

- The first command installs the requirements.
- The second command installs the pyvers package in development mode. Remove the `-e` for a standard installation.
- This class loads data from local data files in JSON lines format (jsonl).
- Supported datasets include SciFact and Citation-Integrity.
- The schema for the data files is described here.
- Get data files for SciFact and Citation-Integrity with labels used in pyvers here.
- The data module can be used to shuffle training data from both datasets.
from pyvers.data import FileDataModule
# Set the model used for the tokenizer
model_name = "bert-base-uncased"
# Load data from one dataset
dm = FileDataModule("data/scifact", model_name)
# Shuffle training data from two datasets
dm = FileDataModule(["data/scifact", "data/citint"], model_name)
# Get some tokenized data
dm.setup("fit")
next(iter(dm.train_dataloader()))
- This class loads data from selected HuggingFace datasets.
- Supported datasets are copenlu/fever_gold_evidence, facebook/anli, and nyu-mll/multi_nli.
from pyvers.data import NLIDataModule
model_name = "bert-base-uncased"
# Load data from HuggingFace datasets
dm = NLIDataModule("facebook/anli", model_name)
# Get some tokenized data
dm.prepare_data()
dm.setup("fit")
next(iter(dm.train_dataloader()))
- This is a small handmade toy dataset.
- There are no data files; the dataset is hard-coded in the class definition.
Training on the toy dataset takes about a minute on a CPU.
# Import required modules
import pytorch_lightning as pl
from pyvers.data import ToyDataModule
from pyvers.model import PyversClassifier
# Initialize data and model
dm = ToyDataModule("bert-base-uncased")
model = PyversClassifier(dm.model_name)
# Train model
trainer = pl.Trainer(enable_checkpointing=False, max_epochs=20)
trainer.fit(model, datamodule=dm)
# Test model
trainer.test(model, datamodule=dm)
# Show predictions
predictions = trainer.predict(model, datamodule=dm)
print(predictions)
This is what we get (results vary between runs):
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Test metric ┃ DataLoader 0 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ AUROC Macro │ 0.963 │
│ AUROC Weighted │ 0.963 │
│ Accuracy │ 88.9 │
│ F1 Macro │ 88.6 │
│ F1 Micro │ 88.9 │
│ F1_NEI │ 100.0 │
│ F1_REFUTE │ 80.0 │
│ F1_SUPPORT │ 85.7 │
└───────────────────────────┴───────────────────────────┘
[['SUPPORT', 'SUPPORT', 'SUPPORT', 'NEI', 'NEI', 'NEI', 'REFUTE', 'REFUTE', 'SUPPORT']]
# Ground-truth labels are:
# [['SUPPORT', 'SUPPORT', 'SUPPORT', 'NEI', 'NEI', 'NEI', 'REFUTE', 'REFUTE', 'REFUTE']]
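The trainer also writes TensorBoard logs during fine-tuning. Assuming Lightning's default lightning_logs directory, the loss curves can be viewed with:

tensorboard --logdir lightning_logs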
This uses a DeBERTa model trained on MultiNLI, Fever-NLI and Adversarial-NLI (ANLI) for zero-shot classification of claim-evidence pairs.
import pytorch_lightning as pl
from pyvers.model import PyversClassifier
from pyvers.data import ToyDataModule
dm = ToyDataModule("MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli")
model = PyversClassifier(dm.model_name)
trainer = pl.Trainer()
dm.setup(stage="test")
predictions = trainer.predict(model, datamodule=dm)
print(predictions)
# [['SUPPORT', 'SUPPORT', 'SUPPORT', 'REFUTE', 'REFUTE', 'REFUTE', 'REFUTE', 'REFUTE', 'REFUTE']]
The pretrained model successfully distinguishes between SUPPORT and REFUTE on the toy dataset but misclassifies NEI as REFUTE. This can be improved with fine-tuning.
When using a pre-trained model for zero-shot classification, check the mapping between labels and IDs.
from transformers import AutoConfig
model_name = "answerdotai/ModernBERT-base"
config = AutoConfig.from_pretrained(model_name, num_labels=3)
print(config.to_dict()["id2label"])
# {0: 'LABEL_0', 1: 'LABEL_1', 2: 'LABEL_2'}
model_name = "MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli"
config = AutoConfig.from_pretrained(model_name, num_labels=3)
print(config.to_dict()["id2label"])
# {0: 'entailment', 1: 'neutral', 2: 'contradiction'}
Because its labels are consistent with the NLI categories listed below, for zero-shot classification we would choose the pretrained DeBERTa model rather than ModernBERT. However, fine-tuning either model for text classification should work (see this page for information on fine-tuning ModernBERT).
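As a concrete example, here is a minimal fine-tuning sketch that combines the pyvers classes shown earlier; the choice of dataset and max_epochs is illustrative, not a recommendation.

import pytorch_lightning as pl
from pyvers.data import NLIDataModule
from pyvers.model import PyversClassifier

# Fine-tune ModernBERT (or the DeBERTa model) on an NLI dataset
model_name = "answerdotai/ModernBERT-base"
dm = NLIDataModule("facebook/anli", model_name)
model = PyversClassifier(model_name)
trainer = pl.Trainer(max_epochs=3)
trainer.fit(model, datamodule=dm)
trainer.test(model, datamodule=dm)

The label mapping used by pyvers and the corresponding dataset labels are listed in the table below.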
ID | pyvers | Fever* | MultiNLI, ANLI |
---|---|---|---|
0 | SUPPORT | SUPPORTS | entailment |
1 | NEI | NOT ENOUGH INFO | neutral |
2 | REFUTE | REFUTES | contradiction |
* Text labels only
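If you fine-tune a generic model such as ModernBERT, you can attach the pyvers label names from the table above to its configuration so that predictions are reported with meaningful labels. This is a sketch of the standard transformers pattern; pyvers may already handle the mapping internally.

from transformers import AutoConfig

# Attach the pyvers label mapping (see table above) to a model config
id2label = {0: "SUPPORT", 1: "NEI", 2: "REFUTE"}
label2id = {v: k for k, v in id2label.items()}
config = AutoConfig.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=3,
    id2label=id2label,
    label2id=label2id,
)
print(config.to_dict()["id2label"])
# {0: 'SUPPORT', 1: 'NEI', 2: 'REFUTE'}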