Commit

Added persistence for SentenceTransformer models
Cyb3rWard0g committed Jan 23, 2025
1 parent ac0e50a commit c21d4c4
Showing 2 changed files with 182 additions and 38 deletions.
193 changes: 158 additions & 35 deletions cookbook/vectorstores/chroma_sentencetransformers_all-MiniLM-L6-v2.ipynb
@@ -34,6 +34,24 @@
"!pip install floki-ai chromadb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Enable Logging"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"logging.basicConfig(level=logging.INFO)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -45,7 +63,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"metadata": {},
"outputs": [
{
@@ -54,7 +72,7 @@
"True"
]
},
"execution_count": 1,
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
@@ -75,14 +93,26 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:datasets:PyTorch version 2.5.1 available.\n",
"INFO:floki.document.embedder.sentence:Loading SentenceTransformer model from local path: model\n",
"INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: model\n",
"INFO:floki.document.embedder.sentence:Model loaded successfully.\n"
]
}
],
"source": [
"from floki.document.embedder import SentenceTransformerEmbedder\n",
"\n",
"embedding_function = SentenceTransformerEmbedder(\n",
" model=\"all-MiniLM-L6-v2\"\n",
" model=\"all-MiniLM-L6-v2\",\n",
" cache_dir=\"model\"\n",
")"
]
},
@@ -97,9 +127,17 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 4,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:floki.storage.vectorstores.chroma:ChromaVectorStore initialized with collection: example_collection\n"
]
}
],
"source": [
"from floki.storage import ChromaVectorStore\n",
"\n",
@@ -130,7 +168,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
@@ -190,9 +228,30 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:floki.document.embedder.sentence:Generating embeddings for 10 input(s).\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "f2de0ae5fbe84c838b47b2cf393ca7ef",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Batches: 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
@@ -217,24 +276,24 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Retrieved documents:\n",
"ID: b6020c96-2c81-452f-b01f-a7143d6aacff, Text: Gandalf: A wizard is never late, Frodo Baggins. Nor is he early; he arrives precisely when he means to., Metadata: {'location': 'The Shire', 'topic': 'wisdom'}\n",
"ID: f864aba5-1c70-451c-8c9b-e681cb5dc1c2, Text: Frodo: I wish the Ring had never come to me. I wish none of this had happened., Metadata: {'location': 'Moria', 'topic': 'destiny'}\n",
"ID: b3bce064-a6f3-4b8c-9c5b-66f9664b5c4a, Text: Aragorn: You cannot wield it! None of us can. The One Ring answers to Sauron alone. It has no other master., Metadata: {'location': 'Rivendell', 'topic': 'power'}\n",
"ID: 3bd8be9e-8573-4a10-b83b-ab2e46f11045, Text: Sam: I can't carry it for you, but I can carry you!, Metadata: {'location': 'Mount Doom', 'topic': 'friendship'}\n",
"ID: b51c9a0d-4698-46fa-af42-8878dd0466f8, Text: Legolas: A red sun rises. Blood has been spilled this night., Metadata: {'location': 'Rohan', 'topic': 'war'}\n",
"ID: fe633494-08ee-4c8e-86d4-5d54331a9896, Text: Gimli: Certainty of death. Small chance of success. What are we waiting for?, Metadata: {'location': \"Helm's Deep\", 'topic': 'bravery'}\n",
"ID: 6e2676a6-79b7-4837-9c2d-c93aebeb046e, Text: Boromir: One does not simply walk into Mordor., Metadata: {'location': 'Rivendell', 'topic': 'impossible tasks'}\n",
"ID: 2b2aeda6-2629-46d3-8f5c-6bafac9b893f, Text: Galadriel: Even the smallest person can change the course of the future., Metadata: {'location': 'Lothlórien', 'topic': 'hope'}\n",
"ID: 8aee61b6-9e7a-4187-bf1e-b269c27776a6, Text: Théoden: So it begins., Metadata: {'location': \"Helm's Deep\", 'topic': 'battle'}\n",
"ID: 091fa07a-2672-4d6f-adf8-6812990f440f, Text: Elrond: The strength of the Ring-bearer is failing. In his heart, Frodo begins to understand. The quest will claim his life., Metadata: {'location': 'Rivendell', 'topic': 'sacrifice'}\n"
"ID: b70624bd-e2cd-45c1-91ea-5793e7ca379b, Text: Gandalf: A wizard is never late, Frodo Baggins. Nor is he early; he arrives precisely when he means to., Metadata: {'location': 'The Shire', 'topic': 'wisdom'}\n",
"ID: 9138873e-19a8-4261-bb2a-4dc7cf88160a, Text: Frodo: I wish the Ring had never come to me. I wish none of this had happened., Metadata: {'location': 'Moria', 'topic': 'destiny'}\n",
"ID: 97f0faca-c592-4464-8caf-35a8bf334250, Text: Aragorn: You cannot wield it! None of us can. The One Ring answers to Sauron alone. It has no other master., Metadata: {'location': 'Rivendell', 'topic': 'power'}\n",
"ID: e953a8ad-e73d-41bb-9fc6-275e0abd3c71, Text: Sam: I can't carry it for you, but I can carry you!, Metadata: {'location': 'Mount Doom', 'topic': 'friendship'}\n",
"ID: 7698118d-33b7-4d63-8fc8-c81ef7514d29, Text: Legolas: A red sun rises. Blood has been spilled this night., Metadata: {'location': 'Rohan', 'topic': 'war'}\n",
"ID: 28a85d66-d0ce-4cc0-a60d-2d7978b6b337, Text: Gimli: Certainty of death. Small chance of success. What are we waiting for?, Metadata: {'location': \"Helm's Deep\", 'topic': 'bravery'}\n",
"ID: d5608037-ec12-4fc6-bb4d-cdcc5ef0cad6, Text: Boromir: One does not simply walk into Mordor., Metadata: {'location': 'Rivendell', 'topic': 'impossible tasks'}\n",
"ID: c7f2c3a0-abf3-4077-8a94-791f8ed35c6c, Text: Galadriel: Even the smallest person can change the course of the future., Metadata: {'location': 'Lothlórien', 'topic': 'hope'}\n",
"ID: 91362ebe-c4bd-4c97-a4ae-5a93c637053a, Text: Théoden: So it begins., Metadata: {'location': \"Helm's Deep\", 'topic': 'battle'}\n",
"ID: 34a3b15c-52a4-46ee-8c84-da177e2639f0, Text: Elrond: The strength of the Ring-bearer is failing. In his heart, Frodo begins to understand. The quest will claim his life., Metadata: {'location': 'Rivendell', 'topic': 'sacrifice'}\n"
]
}
],
@@ -257,14 +316,35 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:floki.document.embedder.sentence:Generating embeddings for 1 input(s).\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "46a7ca139eaa4bc8b95616d3405e0b1e",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Batches: 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Updated document: [{'id': 'b6020c96-2c81-452f-b01f-a7143d6aacff', 'metadata': {'location': 'Fangorn Forest', 'topic': 'hope and wisdom'}, 'document': 'Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true.'}]\n"
"Updated document: [{'id': 'b70624bd-e2cd-45c1-91ea-5793e7ca379b', 'metadata': {'location': 'Fangorn Forest', 'topic': 'hope and wisdom'}, 'document': 'Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true.'}]\n"
]
}
],
@@ -296,7 +376,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 9,
"metadata": {},
"outputs": [
{
@@ -327,9 +407,30 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:floki.document.embedder.sentence:Generating embeddings for 1 input(s).\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "2ff73c84b07b44669fea90d946f61696",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Batches: 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
@@ -363,9 +464,31 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 11,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:floki.document.embedder.sentence:Generating embeddings for 1 input(s).\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "065019811989450ca349856d3ee75d36",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Batches: 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Search for documents with specific metadata filters\n",
"filter_conditions = {\n",
@@ -380,13 +503,13 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'ids': [['b6020c96-2c81-452f-b01f-a7143d6aacff']],\n",
"{'ids': [['b70624bd-e2cd-45c1-91ea-5793e7ca379b']],\n",
" 'embeddings': None,\n",
" 'documents': [['Gandalf: Even the wisest cannot foresee all ends, but hope remains while the Company is true.']],\n",
" 'uris': None,\n",
@@ -398,7 +521,7 @@
" <IncludeEnum.metadatas: 'metadatas'>]}"
]
},
"execution_count": 13,
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
@@ -418,16 +541,16 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['example_collection']"
"[Collection(name=example_collection)]"
]
},
"execution_count": 14,
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
@@ -438,7 +561,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
@@ -448,7 +571,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 15,
"metadata": {},
"outputs": [
{
@@ -457,7 +580,7 @@
"[]"
]
},
"execution_count": 16,
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}