The data and the PyTorch implementation for the models and experiments in the paper "Combining Constrained and Unconstrained Decoding via Boosting: BoostCD and Its Application to Information Extraction"

BoostCD

1. Setup

Start by cloning the repository:

git clone https://github.com/epfl-dlab/BoostCD.git

We recommend creating a new conda virtual environment from inside the cloned repository:

conda env create -f environment.yml

This command also installs all the necessary packages.

2. Downloading data and models

The data is available on the Hugging Face Hub and can be loaded with:

from datasets import load_dataset
dataset = load_dataset("msakota/boostie")
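
Before training, it can help to check which splits the dataset provides and how large they are. The sketch below is illustrative only: `split_sizes` and `load_boostie` are hypothetical helper names, not part of this repository, and no split names are assumed. It relies only on the fact that `load_dataset` returns a dict-like object mapping split names to datasets when no split is requested.

```python
def split_sizes(dataset):
    """Return {split_name: number_of_examples} for a dict-like dataset.

    Works with the object returned by `load_dataset` (a mapping from
    split names to datasets) as well as with a plain dict of lists.
    """
    return {name: len(split) for name, split in dataset.items()}


def load_boostie():
    """Download the data from the Hugging Face Hub (requires network access)."""
    # Imported lazily so that split_sizes can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("msakota/boostie")
```

Calling `print(split_sizes(load_boostie()))` then lists every available split with its number of examples.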

3. Usage

Training

To train a model from scratch on the desired data, run:

# specify a name for this training run
RUN_NAME="train_boostie_base_fe"
python src/genie/run_train.py run_name=$RUN_NAME +experiment/train=boostie_base_fe

Inference

To run inference on a trained model:

CHECKPOINT_PATH="./models/boostie_base_fe.ckpt" # specify path to the trained model
RUN_NAME="inference_boostie_base_fe"
python src/genie/run_inference.py run_name=$RUN_NAME checkpoint_path=$CHECKPOINT_PATH +experiment/inference=boostie_base_fe
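
When launching several runs from a script, the training and inference commands above can be assembled programmatically. This is a minimal sketch, not part of the repository: `build_command` is a hypothetical helper that only reproduces the argument pattern shown above (`run_name=...`, optional `key=value` overrides, then `+experiment/<group>=<name>`).

```python
import shlex


def build_command(script, run_name, experiment_group, experiment, **overrides):
    """Assemble the argument list for one of the run scripts.

    `overrides` are passed through as `key=value` arguments,
    e.g. checkpoint_path for inference.
    """
    cmd = ["python", script, f"run_name={run_name}"]
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    cmd.append(f"+experiment/{experiment_group}={experiment}")
    return cmd


# Same inference call as in the shell example above.
inference_cmd = build_command(
    "src/genie/run_inference.py",
    "inference_boostie_base_fe",
    "inference",
    "boostie_base_fe",
    checkpoint_path="./models/boostie_base_fe.ckpt",
)
print(shlex.join(inference_cmd))
```

The resulting list can be passed to `subprocess.run(inference_cmd, check=True)` from the repository root. Building a list rather than a single string avoids shell quoting issues when paths contain spaces.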

Evaluation

To compute micro and macro performance, as well as performance bucketed by relation frequency and by number of target triplets, pass the run's WandB path to the following command:

WANDB_PATH="<entity>/<project>/<run_id>" # specify the WandB path of the run to evaluate
python src/genie/run_process_predictions.py +experiment/process_predictions=complete_boostie wandb_run_path=$WANDB_PATH
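
To make the distinction between micro and macro performance concrete, here is a small self-contained illustration. It is not the repository's evaluation code. It assumes that triplets are `(subject, relation, object)` tuples compared by exact match, and that the macro score averages per-relation F1 over the relations occurring in the targets; the function names and the toy data are hypothetical.

```python
from collections import defaultdict


def f1(tp, n_pred, n_gold):
    """F1 from true positives, number of predictions, and number of targets."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def micro_macro_f1(predictions, targets):
    """predictions, targets: lists of sets of (subject, relation, object)."""
    tp, n_pred, n_gold = defaultdict(int), defaultdict(int), defaultdict(int)
    for pred, gold in zip(predictions, targets):
        for _, relation, _ in pred:
            n_pred[relation] += 1
        for _, relation, _ in gold:
            n_gold[relation] += 1
        for _, relation, _ in pred & gold:  # exact-match triplets
            tp[relation] += 1
    # Micro: pool counts over all relations, then compute one score.
    micro = f1(sum(tp.values()), sum(n_pred.values()), sum(n_gold.values()))
    # Macro: one score per relation in the targets, then an unweighted mean.
    relations = list(n_gold)
    macro = sum(f1(tp[r], n_pred[r], n_gold[r]) for r in relations) / len(relations) if relations else 0.0
    return micro, macro


targets = [{("A", "born_in", "B"), ("A", "works_at", "C")}, {("D", "born_in", "E")}]
predictions = [{("A", "born_in", "B")}, {("D", "born_in", "E"), ("D", "works_at", "F")}]
micro, macro = micro_macro_f1(predictions, targets)
```

In this toy example the frequent relation is predicted perfectly and the rare one is never correct, so the micro score (about 0.67) is higher than the macro score (0.5). The same effect is why bucketing by relation frequency is informative.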

Citation

If you find our resources useful, please consider citing our work:

@misc{šakota2025combiningconstrainedunconstraineddecoding,
      title={Combining Constrained and Unconstrained Decoding via Boosting: BoostCD and Its Application to Information Extraction}, 
      author={Marija Šakota and Robert West},
      year={2025},
      eprint={2506.14901},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.14901}, 
}
