Code for our EMNLP 2023 paper "Bridging Continuous and Discrete Spaces: Interpretable Sentence Representation Learning via Compositional Operations".
Requirements:
- transformers == 4.18.0
- pytorch-lightning == 1.6.1

Datasets:
- ParaNMT
- DiscoFuse (balanced Wikipedia portion)
- WikiSplit
- Google Sentence Compression

The data folder should have a structure similar to the following:
└── data
    ├── paranmt
    │   └── para-nmt-5m-processed.txt
    ├── discofuse
    │   ├── discofuse-train-balanced.txt
    │   ├── discofuse-valid-balanced.txt
    │   └── discofuse-test-balanced.txt
    ├── wikisplit
    │   ├── wikisplit-train.txt
    │   ├── wikisplit-valid.txt
    │   └── wikisplit-test.txt
    └── google
        ├── sent-comp-train.txt
        └── sent-comp-test.txt
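Before training, it can help to create this folder skeleton and check which dataset files are still missing. The shell sketch below does that. The file paths are copied from the layout above; whether the training script reads exactly these paths is an assumption based on that layout, and converting the raw dataset downloads into these `.txt` files is not covered here.

```shell
#!/usr/bin/env bash
# Sketch: create the data/ skeleton shown in the layout above and
# report which expected dataset files are not present yet.
# The paths mirror the documented layout (assumed to match what the scripts read).
mkdir -p data/paranmt data/discofuse data/wikisplit data/google

missing=0
for f in \
  data/paranmt/para-nmt-5m-processed.txt \
  data/discofuse/discofuse-train-balanced.txt \
  data/discofuse/discofuse-valid-balanced.txt \
  data/discofuse/discofuse-test-balanced.txt \
  data/wikisplit/wikisplit-train.txt \
  data/wikisplit/wikisplit-valid.txt \
  data/wikisplit/wikisplit-test.txt \
  data/google/sent-comp-train.txt \
  data/google/sent-comp-test.txt
do
  # Count and list every expected file that does not exist yet.
  if [ ! -f "$f" ]; then
    echo "missing: $f"
    missing=$((missing + 1))
  fi
done
echo "$missing file(s) missing"
```

The script only reports missing files and always exits successfully, so it can be run repeatedly while the datasets are being downloaded.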
To train InterSent from scratch, run the following:
bash train.sh
To evaluate InterSent on interpretability, run the following with your checkpoint path:
bash test.sh
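The evaluation scripts need a checkpoint path. Assuming the training run saves Lightning-style `.ckpt` files (PyTorch Lightning's `ModelCheckpoint` callback uses that extension, but where `train.sh` writes them is not stated here), the hypothetical helper `latest_ckpt` below searches a directory tree and prints the most recently modified checkpoint.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): print the most recently
# modified *.ckpt file under the given directory (default: current dir).
# Prints nothing if no checkpoint is found.
latest_ckpt() {
  # ls -t sorts the matched files newest first; head keeps only the first one.
  find "${1:-.}" -type f -name '*.ckpt' -exec ls -t {} + 2>/dev/null | head -n 1
}
```

For example, `CKPT="$(latest_ckpt)"; echo "$CKPT"` shows the candidate path, which you can then supply to the evaluation scripts in whatever way they expect (check the scripts to see whether the path is read from an argument or set inside the file). Because `find ... -exec ... +` may split a very large file list across several `ls` calls, the ordering is only reliable for a modest number of checkpoints.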
To evaluate InterSent on semantic textual similarity (STS), run the following with your checkpoint path:
bash stseval.sh