Commit e57aa7b

docs: updated readme and embeddings notebook and added citation
1 parent 52f9b3f commit e57aa7b

4 files changed: +57 -284 lines changed

CITATION.cff

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+cff-version: 1.2.0
+message: "If you use this software, please cite it as below."
+authors:
+  - family-names: "Lim"
+    given-names: "Chee Kin"
+title: "open-text-embeddings"
+date-released: 2023-10-10
+url: "https://github.com/limcheekin/open-text-embeddings"

README.md

Lines changed: 47 additions & 10 deletions
@@ -1,13 +1,14 @@
-# Open Source Text Embedding Models with OpenAI API-Compatible Endpoint
+# open-text-embeddings

-[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com)
+[![PyPI](https://img.shields.io/pypi/v/open-text-embeddings)](https://pypi.org/project/open-text-embeddings/)
+[![Open in Colab](https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/drive/1wfgfkt6xk3meSF5jWHDMqo6mL0ZvPw2f?usp=sharing)
 [![Publish Python Package](https://github.com/limcheekin/open-text-embeddings/actions/workflows/publish.yml/badge.svg)](https://github.com/limcheekin/open-text-embeddings/actions/workflows/publish.yml)

 Many open source projects support the compatibility of the `completions` and the `chat/completions` endpoints of the OpenAI API, but do not support the `embeddings` endpoint.

 The goal of this project is to create an OpenAI API-compatible version of the `embeddings` endpoint, which serves open source sentence-transformers models and other models supported by the LangChain's [HuggingFaceEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.huggingface.HuggingFaceEmbeddings.html), HuggingFaceInstructEmbeddings and HuggingFaceBgeEmbeddings class.

-## Supported Text Embeddings Models
+## ℹ️ Supported Text Embeddings Models

 Below is a compilation of open-source models that are tested via the `embeddings` endpoint:

@@ -17,9 +18,15 @@ Below is a compilation of open-source models that are tested via the `embeddings
 - [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
 - [universal-sentence-encoder-large/5](https://tfhub.dev/google/universal-sentence-encoder-large/5) (Please refer to the `universal_sentence_encoder` branch for more details)

-The models mentioned above have undergone personal testing and verification. It is worth noting that all sentence-transformers models are expected to perform seamlessly with the endpoint.
+The models mentioned above have undergone testing and verification. It is worth noting that all sentence-transformers models are expected to perform seamlessly with the endpoint.

-## Standalone FastAPI Server
+## 🔍 Demo
+
+Try out open-text-embeddings in your browser:
+
+[![Open in Colab](https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/drive/1wfgfkt6xk3meSF5jWHDMqo6mL0ZvPw2f?usp=sharing)
+
+## 🖥️ Standalone FastAPI Server

 To run the embeddings endpoint locally as a standalone FastAPI server, follow these steps:

@@ -52,7 +59,7 @@ To run the embeddings endpoint locally as a standalone FastAPI server, follow th
 INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
 ```

-## AWS Lambda Function
+## ☁️ AWS Lambda Function

 To deploy the embeddings endpoint as an AWS Lambda Function using GitHub Actions, follow these steps:

@@ -62,22 +69,52 @@ To deploy the embeddings endpoint as an AWS Lambda Function using GitHub Actions

 3. Manually trigger the `Deploy Dev` or `Remove Dev` GitHub Actions to deploy or remove the AWS Lambda Function.

-## Testing the Embeddings Endpoint
+## 🧪 Testing the Embeddings Endpoint

-To test the embeddings endpoint, the repository includes an [embeddings.ipynb](https://github.com/limcheekin/open-text-embeddings/blob/main/embeddings.ipynb) notebook with a LangChain-compatible `OpenAIEmbeddings` class.
+To test the `embeddings` endpoint, the repository includes an [embeddings.ipynb](https://github.com/limcheekin/open-text-embeddings/blob/main/embeddings.ipynb) notebook with a LangChain-compatible `OpenAIEmbeddings` class.

 To get started:

 1. Install the dependencies by executing the following command:

 ```bash
-pip install --no-cache-dir open-text-embeddings openai tiktoken
+pip install --no-cache-dir open-text-embeddings openai
 ```

 2. Execute the cells in the notebook to test the embeddings endpoint.

-## Contributions
+## ❓ Known Issues
+
+1. Gzip compression for web request doesn't seems working in AWS Lambda Function.
+
+## 🧑‍💼 Contributing
+
+Contributions are welcome! Please check out the issues on the repository, and feel free to open a pull request.
+For more information, please see the [contributing guidelines](CONTRIBUTING.md).
+
+<a href="https://github.com/limcheekin/open-text-embeddings/graphs/contributors">
+<img src="https://contrib.rocks/image?repo=limcheekin/open-text-embeddings" />
+</a>

 Thank you very much for the following contributions:

 - [Vokturz](https://github.com/Vokturz) contributed [#2](https://github.com/limcheekin/open-text-embeddings/pull/2): support for CPU/GPU choice and initialization before starting the app.
+
+## 📔 License
+
+This project is licensed under the terms of the MIT license.
+
+## 🗒️ Citation
+
+If you utilize this repository, please consider citing it with:
+
+```
+@misc{open-text-embeddings,
+author = {Lim Chee Kin},
+title = {open-text-embeddings: Open Source Text Embedding Models with OpenAI API-Compatible Endpoint},
+year = {2023},
+publisher = {GitHub},
+journal = {GitHub repository},
+howpublished = {\url{https://github.com/limcheekin/open-text-embeddings}},
+}
+```
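
Reviewer note on the README's "Testing the Embeddings Endpoint" section: the endpoint can probably also be exercised without the notebook. The following is a minimal sketch, not the project's own client. It assumes the standalone server is listening on `http://localhost:8000` (the address in the README's Uvicorn log line) and that the route is `/v1/embeddings`, which follows the OpenAI API convention but is not confirmed by this diff. The helper names `build_request`, `parse_embeddings`, and `embed` are hypothetical.

```python
import json
import urllib.request

# Assumption: address taken from the Uvicorn log line in the README.
BASE_URL = "http://localhost:8000"
# Assumption: OpenAI-style route; the actual path is not shown in this diff.
EMBEDDINGS_PATH = "/v1/embeddings"


def build_request(texts, model, base_url=BASE_URL):
    """Build a POST request carrying an OpenAI-style embeddings payload."""
    payload = {"input": list(texts), "model": model}
    return urllib.request.Request(
        base_url + EMBEDDINGS_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(body):
    """Return the vectors from an OpenAI-style response, ordered by `index`."""
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, model):
    """Send the request and return one vector per input text (needs a running server)."""
    with urllib.request.urlopen(build_request(texts, model)) as resp:
        return parse_embeddings(json.load(resp))
```

With a server running, `embed(["hello world"], "sentence-transformers/all-mpnet-base-v2")` should return a list holding one vector. Whether the server honors the `model` field of the request is not stated in the README, so treat that argument as a placeholder.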

embeddings.ipynb

Lines changed: 2 additions & 13 deletions
@@ -6,7 +6,7 @@
 "id": "278b6c63",
 "metadata": {},
 "source": [
-"# OpenAI Embeddings"
+"# open-text-embeddings"
 ]
 },
 {
@@ -16,7 +16,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"%pip install -U langchain openai tiktoken"
+"%pip install -U langchain openai"
 ]
 },
 {
@@ -161,17 +161,6 @@
 "doc_result[0] == query_result"
 ]
 },
-{
-"cell_type": "code",
-"execution_count": null,
-"id": "aaad49f8",
-"metadata": {},
-"outputs": [],
-"source": [
-"# if you are behind an explicit proxy, you can use the OPENAI_PROXY environment variable to pass through\n",
-"os.environ[\"OPENAI_PROXY\"] = \"http://proxy.yourcompany.com:8080\""
-]
-},
 {
 "cell_type": "code",
 "execution_count": null,
