- Datasets
  - Implemented (see the loading sketch after this list):
    - stanfordnlp/imdb
      - 25k Train / 25k Test
      - labels (0 - neg; 1 - pos)
    - yelp_review_full
      - 650k Train / 50k Test
      - labels (1 star ... 5 stars)
    - arXiv-Abstract-Label-20k
      - 10k Train / 10k Test
      - labels (8 primary categories: Math, CS, ...)
  - TODO:
    - Challenging datasets
      - DBLP
      - Amazon reviews
    - Explore datasets from Kaggle
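The IMDB and Yelp datasets are listed under their Hugging Face Hub identifiers, so loading them is a one-liner each. A minimal sketch is below; how `arXiv-Abstract-Label-20k` is obtained is an assumption (it may ship with the repo instead of the Hub).

```python
# Minimal loading sketch with the Hugging Face `datasets` library.
# `stanfordnlp/imdb` and `yelp_review_full` are the Hub identifiers from the list above.
from datasets import load_dataset

imdb = load_dataset("stanfordnlp/imdb")    # 25k train / 25k test, labels: 0 = neg, 1 = pos
yelp = load_dataset("yelp_review_full")    # 650k train / 50k test, labels: 0..4 (1-5 stars)

example = imdb["train"][0]
print(example["label"], example["text"][:80])
```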
- Embedding Models
  - Implemented (see the embedding sketch after this list):
    - BERT
    - BERT-Large
    - Instructor
    - T5
    - GPT-2 (Medium)
  - TODO:
    - Implement larger models from the MTEB leaderboard
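A common way to turn these encoder checkpoints into fixed-size text embeddings is mean pooling over the final hidden states. The sketch below uses `bert-base-uncased` as a stand-in; whether the repo pools this way or uses the CLS token is an assumption.

```python
# Minimal sketch (assumed approach): mean-pooled hidden states as sentence embeddings.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state       # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)        # zero out padding positions
    return (hidden * mask).sum(1) / mask.sum(1)         # mean pooling -> (batch, dim)

vectors = embed(["This movie was great.", "Terrible plot and acting."])
print(vectors.shape)  # torch.Size([2, 768])
```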
- Instructions
  - Implemented:
    - Experiment Log
  - TODO:
    - Effects of instructions: study how sensitive models are to instructions (see the sketch after this list).
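The simplest probe of instruction sensitivity is to embed the same text with and without a task instruction prepended and compare the resulting vectors. The sketch below assumes a plain BERT encoder and a hypothetical instruction string; INSTRUCTOR-style models instead take the instruction as a separate field.

```python
# Minimal sketch (assumed setup): measure how much a prepended instruction shifts an embedding.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder encoder
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(text):
    batch = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        return model(**batch).last_hidden_state.mean(dim=1).squeeze(0)

instruction = "Represent the movie review for sentiment classification: "  # hypothetical prompt
review = "A slow start, but the ending makes it worth watching."

plain = embed(review)
instructed = embed(instruction + review)
print(torch.cosine_similarity(plain, instructed, dim=0))  # similarity of the two embeddings
```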
- Evaluation
  - Classifiers (see the probing sketch after this list):
    - SVM (Linear)
    - MLP
  - Research questions:
    - Does the inclusion of instructions improve the embeddings?
    - How can prompt engineering be used to further enhance embedding models?
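The evaluation then reduces to training small probes on frozen embeddings. A minimal scikit-learn sketch is below; the random 768-dimensional features stand in for real embeddings, and the hyperparameters are assumptions rather than the repo's settings.

```python
# Minimal probing sketch (assumed setup): linear SVM and MLP on frozen embedding features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC

X = np.random.randn(1000, 768)               # placeholder for precomputed embeddings
y = np.random.randint(0, 2, size=1000)       # placeholder binary labels (e.g. IMDB)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

svm = LinearSVC(C=1.0).fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(256,), max_iter=200).fit(X_tr, y_tr)

print("Linear SVM accuracy:", svm.score(X_te, y_te))
print("MLP accuracy:", mlp.score(X_te, y_te))
```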
```bash
git clone https://github.com/ZikunFu/Embedding-Model-with-Instructions.git
cd Embedding-Model-with-Instructions
conda env create --name embed --file environment.yml
conda activate embed
pip install -r requirements.txt
```
- Clone the repository.
- Create and activate the environment.
- Run the Jupyter Notebook.
- Thanks to Hugging Face for providing the pre-trained models.