This is the official code for the ACL 2023 Findings paper "Unsupervised Dense Retrieval with Relevance-Aware Contrastive Pre-Training". Our code is largely built upon the official GitHub repository of Facebookresearch/Contriever.
```bash
conda create -n recontriever python=3.9
conda activate recontriever
pip install -r requirements.txt
```
| Models | Link |
|---|---|
| Contriever_Reproduced | [Yibin-Lei/Contriever-Reproduced](https://huggingface.co/Yibin-Lei/Contriever-Reproduced) |
| ReContriever | [Yibin-Lei/ReContriever](https://huggingface.co/Yibin-Lei/ReContriever) |
You can use them with:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Yibin-Lei/Contriever-Reproduced")
model = AutoModel.from_pretrained("Yibin-Lei/Contriever-Reproduced")
```
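These checkpoints output one hidden state per token; following the upstream Contriever convention, a sentence embedding is obtained by mean pooling over the non-padded tokens. Below is a minimal sketch building on the snippet above (the `mean_pooling` helper is illustrative, not part of this repository):

```python
import torch

def mean_pooling(token_embeddings, attention_mask):
    # Average token embeddings, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).float()
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

sentences = ["Where was Marie Curie born?", "Maria Sklodowska was born in Warsaw."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
embeddings = mean_pooling(outputs.last_hidden_state, inputs["attention_mask"])
scores = embeddings @ embeddings.T  # dot-product similarity between sentences
```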
Please refer to the README of Facebookresearch/Contriever, which provides a detailed guide to data preprocessing.
We provide scripts for pre-training both ReContriever and Contriever in `./pretrain_scripts`; our runs use 16 A100 GPUs.
Our one-document-multiple-pair strategy is implemented in `./src/data.py`.
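As a rough illustration of the idea (not the exact code in `./src/data.py`; the helper name and crop ratios below are illustrative, loosely following Contriever-style random cropping), each document yields several independently cropped spans, each forming a (pseudo-query, document) training pair:

```python
import random

def sample_pseudo_queries(doc_tokens, num_queries=2, ratio_min=0.1, ratio_max=0.5):
    """Crop several random contiguous spans from one document, so a single
    document produces multiple (pseudo-query, document) pairs."""
    queries = []
    for _ in range(num_queries):
        ratio = random.uniform(ratio_min, ratio_max)
        length = max(1, int(len(doc_tokens) * ratio))
        start = random.randint(0, len(doc_tokens) - length)
        queries.append(doc_tokens[start : start + length])
    return queries
```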
Our relevance-aware contrastive loss is implemented in `./src/relevance_aware.py`.
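Conceptually, the loss weights the per-pair InfoNCE terms of the same document by the model's own relevance estimates. The sketch below is a simplified PyTorch rendition under assumed shapes (`K` pseudo-queries per document, in-batch negatives, stop-gradient on the weights); it is not the repository's exact implementation:

```python
import torch
import torch.nn.functional as F

def relevance_aware_loss(q_emb, d_emb, tau=0.05):
    """q_emb: (B, K, H) embeddings of K pseudo-queries per document;
    d_emb: (B, H) document embeddings. In-batch documents act as negatives."""
    B, K, _ = q_emb.shape
    sim = torch.einsum("bkh,ch->bkc", q_emb, d_emb) / tau  # (B, K, B)
    labels = torch.arange(B, device=sim.device)
    # Standard InfoNCE loss for every (pseudo-query, document) pair.
    per_pair = F.cross_entropy(
        sim.reshape(B * K, B), labels.repeat_interleave(K), reduction="none"
    ).reshape(B, K)
    # Self-estimated relevance of each pair: softmax over the K positive
    # similarities of the same document, detached so the weights get no gradient.
    pos_sim = sim[labels, :, labels]  # (B, K)
    weights = F.softmax(pos_sim.detach(), dim=1)
    return (weights * per_pair).sum(dim=1).mean()
```

Because the weights sum to one within each document, crops the model judges more relevant dominate the update, while noisy crops are down-weighted.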
For BEIR evaluation, simply run:

```bash
python eval_beir.py --model_name_or_path $your_model_path --dataset $data_name
```
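For example, evaluating ReContriever on SciFact (assuming the dataset name follows BEIR's convention) would look like:

```bash
python eval_beir.py --model_name_or_path Yibin-Lei/ReContriever --dataset scifact
```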
For open-domain QA retrieval tasks, we use the evaluation scripts provided by oriram/spider.