- Datasets
  - Implemented (see the loading sketch after this list):
    - stanfordnlp/imdb
      - 25k Train / 25k Test
      - labels (0 - neg; 1 - pos)
    - yelp_review_full
      - 650k Train / 50k Test
      - labels (1 star ... 5 stars)
    - arXiv-Abstract-Label-20k
      - 10k Train / 10k Test
      - labels (8 primary categories: Math, CS, ...)
  - TODO:
    - Challenging datasets
      - DBLP
      - Amazon reviews
    - Explore datasets from Kaggle
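The IMDB and Yelp datasets are listed under their Hugging Face Hub identifiers, so loading them is a one-liner each. A minimal sketch is below; how `arXiv-Abstract-Label-20k` is obtained is an assumption (it may ship with the repo instead of the Hub).

```python
# Minimal loading sketch with the Hugging Face `datasets` library.
# `stanfordnlp/imdb` and `yelp_review_full` are the Hub identifiers from the list above.
from datasets import load_dataset

imdb = load_dataset("stanfordnlp/imdb")    # 25k train / 25k test, labels: 0 = neg, 1 = pos
yelp = load_dataset("yelp_review_full")    # 650k train / 50k test, labels: 0..4 (1-5 stars)

example = imdb["train"][0]
print(example["label"], example["text"][:80])
```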
- Embedding Models
  - Implemented (see the embedding sketch after this list):
    - BERT
    - BERT-Large
    - Instructor
    - T5
    - GPT-2 (Medium)
  - TODO:
    - Implement larger models from the MTEB leaderboard
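A common way to turn these encoder checkpoints into fixed-size text embeddings is mean pooling over the final hidden states. The sketch below uses `bert-base-uncased` as a stand-in; whether the repo pools this way or uses the CLS token is an assumption.

```python
# Minimal sketch (assumed approach): mean-pooled hidden states as sentence embeddings.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state       # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)        # zero out padding positions
    return (hidden * mask).sum(1) / mask.sum(1)         # mean pooling -> (batch, dim)

vectors = embed(["This movie was great.", "Terrible plot and acting."])
print(vectors.shape)  # torch.Size([2, 768])
```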
- Instructions
  - Implemented:
    - Experiment Log
  - TODO:
    - Effects of instructions: study how sensitive models are to instructions (see the sketch after this list).
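The simplest probe of instruction sensitivity is to embed the same text with and without a task instruction prepended and compare the resulting vectors. The sketch below assumes a plain BERT encoder and a hypothetical instruction string; INSTRUCTOR-style models instead take the instruction as a separate field.

```python
# Minimal sketch (assumed setup): measure how much a prepended instruction shifts an embedding.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder encoder
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(text):
    batch = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        return model(**batch).last_hidden_state.mean(dim=1).squeeze(0)

instruction = "Represent the movie review for sentiment classification: "  # hypothetical prompt
review = "A slow start, but the ending makes it worth watching."

plain = embed(review)
instructed = embed(instruction + review)
print(torch.cosine_similarity(plain, instructed, dim=0))  # similarity of the two embeddings
```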
- Evaluation
  - Classifiers (see the probing sketch after this list):
    - SVM (Linear)
    - MLP
  - Research questions:
    - Does the inclusion of instructions improve the embeddings?
    - How can prompt engineering be used to further enhance embedding models?
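The evaluation then reduces to training small probes on frozen embeddings. A minimal scikit-learn sketch is below; the random 768-dimensional features stand in for real embeddings, and the hyperparameters are assumptions rather than the repo's settings.

```python
# Minimal probing sketch (assumed setup): linear SVM and MLP on frozen embedding features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC

X = np.random.randn(1000, 768)               # placeholder for precomputed embeddings
y = np.random.randint(0, 2, size=1000)       # placeholder binary labels (e.g. IMDB)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

svm = LinearSVC(C=1.0).fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(256,), max_iter=200).fit(X_tr, y_tr)

print("Linear SVM accuracy:", svm.score(X_te, y_te))
print("MLP accuracy:", mlp.score(X_te, y_te))
```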
```bash
git clone https://github.com/ZikunFu/Embedding-Model-with-Instructions.git
cd Embedding-Model-with-Instructions
conda env create --name embed --file environment.yml
conda activate embed
pip install -r requirements.txt
```
- Clone the repository.
- Create and activate the environment.
- Run the Jupyter Notebook.
- Thanks to Hugging Face for providing the pre-trained models.