|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "b5dcbf95-9a30-416d-afed-d5b2bf0e8651", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "# GraphRAG with MongoDB and LangChain\n", |
| 9 | + "\n", |
| 10 | + "This notebook is a companion to the [GraphRAG with MongoDB and LangChain](https://www.mongodb.com/docs/atlas/atlas-vector-search/ai-integrations/langchain/graph-rag/) tutorial. Refer to the page for set-up instructions and detailed explanations.\n", |
| 11 | + "\n", |
| 12 | + "This notebook demonstrates a GraphRAG implementation using MongoDB Atlas and LangChain. Compared to vector-based RAG, which structures your data as vector embeddings, GraphRAG structures data as a knowledge graph with entities and their relationships. This enables relationship-aware retrieval and multi-hop reasoning.\n", |
| 13 | + "\n", |
| 14 | + "<a target=\"_blank\" href=\"https://colab.research.google.com/github/mongodb/docs-notebooks/blob/main/ai-integrations/langchain-graphrag.ipynb\">\n", |
| 15 | + " <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n", |
| 16 | + "</a>" |
| 17 | + ] |
| 18 | + }, |
| 19 | + { |
| 20 | + "cell_type": "code", |
| 21 | + "execution_count": null, |
| 22 | + "id": "23f70093-83ea-4ecc-87db-2f2f89e546d7", |
| 23 | + "metadata": { |
| 24 | + "scrolled": true |
| 25 | + }, |
| 26 | + "outputs": [], |
| 27 | + "source": [ |
| 28 | + "%pip install --quiet --upgrade pymongo langchain_community wikipedia langchain_openai langchain_mongodb pyvis" |
| 29 | + ] |
| 30 | + }, |
| 31 | + { |
| 32 | + "cell_type": "markdown", |
| 33 | + "id": "d96955f9-a370-4f45-970d-ef187ee6195c", |
| 34 | + "metadata": {}, |
| 35 | + "source": [ |
| 36 | + "## Set up your environment\n", |
| 37 | + "\n", |
| 38 | + "Before you begin, make sure you have the following:\n", |
| 39 | + "\n", |
| 40 | + "- An Atlas cluster up and running (you'll need the [connection string](https://www.mongodb.com/docs/guides/atlas/connection-string/))\n", |
| 41 | + "- An API key to access an LLM (this tutorial uses a model from OpenAI, but you can use any model [supported by LangChain](https://python.langchain.com/docs/integrations/chat/))" |
| 42 | + ] |
| 43 | + }, |
| 44 | + { |
| 45 | + "cell_type": "code", |
| 46 | + "execution_count": null, |
| 47 | + "id": "0119b58d-f14e-4f36-a284-345d94478537", |
| 48 | + "metadata": {}, |
| 49 | + "outputs": [], |
| 50 | + "source": [ |
| 51 | + "import os\n", |
| 52 | + "\n", |
| 53 | + "os.environ[\"OPENAI_API_KEY\"] = \"<api-key>\"\n", |
| 54 | + "ATLAS_CONNECTION_STRING = \"<connection-string>\"\n", |
| 55 | + "ATLAS_DB_NAME = \"langchain_db\" # MongoDB database to store the knowledge graph\n", |
| 56 | + "ATLAS_COLLECTION = \"wikipedia\" # MongoDB collection to store the knowledge graph" |
| 57 | + ] |
| 58 | + }, |
| 59 | + { |
| 60 | + "cell_type": "markdown", |
| 61 | + "id": "0adf66a8", |
| 62 | + "metadata": {}, |
| 63 | + "source": [ |
| 64 | + "## Use MongoDB Atlas as a knowledge graph\n", |
| 65 | + "\n", |
| 66 | + "Use the `MongoDBGraphStore` component to store your data as a knowledge graph. This component allows you to implement GraphRAG by storing entities (nodes) and their relationships (edges) in a MongoDB collection. It stores each entity as a document with relationship fields that reference other documents in your collection." |
| 67 | + ] |
| 68 | + }, |
| 69 | + { |
| 70 | + "cell_type": "code", |
| 71 | + "execution_count": null, |
| 72 | + "id": "f4e8db2f-d918-41aa-92f8-41f80a6d747a", |
| 73 | + "metadata": {}, |
| 74 | + "outputs": [], |
| 75 | + "source": [ |
| 77 | + "from langchain.chat_models import init_chat_model\n", |
| 78 | + "\n", |
| 79 | + "# For best results, use a recent model such as gpt-4o or Claude 3.5 Sonnet or later\n", |
| 80 | + "chat_model = init_chat_model(\"gpt-4o\", model_provider=\"openai\", temperature=0)" |
| 81 | + ] |
| 82 | + }, |
| 83 | + { |
| 84 | + "cell_type": "code", |
| 85 | + "execution_count": null, |
| 86 | + "id": "72cd5c08-e17b-4f47-bca7-ded0fb25fb85", |
| 87 | + "metadata": {}, |
| 88 | + "outputs": [], |
| 89 | + "source": [ |
| 90 | + "from langchain_community.document_loaders import WikipediaLoader\n", |
| 91 | + "from langchain_text_splitters import TokenTextSplitter\n", |
| 92 | + "\n", |
| 93 | + "# Load Wikipedia pages corresponding to the query \"Sherlock Holmes\"\n", |
| 94 | + "wikipedia_pages = WikipediaLoader(query=\"Sherlock Holmes\", load_max_docs=3).load()\n", |
| 95 | + "\n", |
| 96 | + "# Split the documents into chunks for efficient downstream processing (graph creation)\n", |
| 97 | + "text_splitter = TokenTextSplitter(chunk_size=1024, chunk_overlap=0)\n", |
| 98 | + "wikipedia_docs = text_splitter.split_documents(wikipedia_pages)\n", |
| 99 | + "\n", |
| 100 | + "# Display the first chunk\n", |
| 101 | + "wikipedia_docs[0]" |
| 102 | + ] |
| 103 | + }, |
| 104 | + { |
| 105 | + "cell_type": "code", |
| 106 | + "execution_count": null, |
| 107 | + "id": "2dc8f05b-0f9a-4293-b9ea-761030c98dca", |
| 108 | + "metadata": {}, |
| 109 | + "outputs": [], |
| 110 | + "source": [ |
| 111 | + "from langchain_mongodb.graphrag.graph import MongoDBGraphStore\n", |
| 112 | + "\n", |
| 113 | + "graph_store = MongoDBGraphStore(\n", |
| 114 | + " connection_string = ATLAS_CONNECTION_STRING,\n", |
| 115 | + " database_name = ATLAS_DB_NAME,\n", |
| 116 | + " collection_name = ATLAS_COLLECTION,\n", |
| 117 | + " entity_extraction_model = chat_model\n", |
| 118 | + ")" |
| 119 | + ] |
| 120 | + }, |
| 121 | + { |
| 122 | + "cell_type": "code", |
| 123 | + "execution_count": null, |
| 124 | + "id": "3664189e", |
| 125 | + "metadata": {}, |
| 126 | + "outputs": [], |
| 127 | + "source": [ |
| 128 | + "# Extract entities and create knowledge graph in Atlas\n", |
| 129 | + "# This might take a few minutes; you can ignore any warnings\n", |
| 130 | + "graph_store.add_documents(wikipedia_docs)" |
| 131 | + ] |
| 132 | + }, |
| 133 | + { |
| 134 | + "cell_type": "markdown", |
| 135 | + "id": "b167c2eb-b2c5-45ef-bdc9-8230f7da4c52", |
| 136 | + "metadata": {}, |
| 137 | + "source": [ |
| 138 | + "## Visualize the knowledge graph\n", |
| 139 | + "\n", |
| 140 | + "To visualize the knowledge graph, you can export the structured data to a visualization library like `pyvis`.\n", |
| 141 | + "This helps you to explore and understand the relationships and hierarchies within your data." |
| 142 | + ] |
| 143 | + }, |
| 144 | + { |
| 145 | + "cell_type": "code", |
| 146 | + "execution_count": null, |
| 147 | + "id": "8b515723-a8a4-435b-b386-5cb3244c2745", |
| 148 | + "metadata": {}, |
| 149 | + "outputs": [], |
| 150 | + "source": [ |
| 151 | + "import networkx as nx\n", |
| 152 | + "from pyvis.network import Network\n", |
| 153 | + "\n", |
| 154 | + "def visualize_graph(collection):\n", |
| 155 | + " docs = list(collection.find())\n", |
| 156 | + " \n", |
| 157 | + " def format_attributes(attrs):\n", |
| 158 | + " return \"<br>\".join(f\"{k}: {', '.join(v)}\" for k, v in attrs.items()) if attrs else \"\"\n", |
| 159 | + " \n", |
| 160 | + " G = nx.DiGraph()\n", |
| 161 | + "\n", |
| 162 | + " # Create nodes\n", |
| 163 | + " for doc in docs:\n", |
| 164 | + " node_id = str(doc[\"_id\"])\n", |
| 165 | + " info = f\"Type: {doc.get('type', '')}\"\n", |
| 166 | + " if \"attributes\" in doc:\n", |
| 167 | + " attr_info = format_attributes(doc[\"attributes\"])\n", |
| 168 | + " if attr_info:\n", |
| 169 | + " info += \"<br>\" + attr_info\n", |
| 170 | + " G.add_node(node_id, label=node_id, title=info.replace(\"<br>\", \"\\n\"))\n", |
| 171 | + "\n", |
| 172 | + " # Create edges\n", |
| 173 | + " for doc in docs:\n", |
| 174 | + " source = str(doc[\"_id\"])\n", |
| 175 | + " rels = doc.get(\"relationships\", {})\n", |
| 176 | + " targets = rels.get(\"target_ids\", [])\n", |
| 177 | + " types = rels.get(\"types\", [])\n", |
| 178 | + " attrs = rels.get(\"attributes\", [])\n", |
| 179 | + " \n", |
| 180 | + " for i, target in enumerate(targets):\n", |
| 181 | + " edge_type = types[i] if i < len(types) else \"\"\n", |
| 182 | + " extra = attrs[i] if i < len(attrs) else {}\n", |
| 183 | + " edge_info = f\"Relationship: {edge_type}\"\n", |
| 184 | + " if extra:\n", |
| 185 | + " edge_info += \"<br>\" + format_attributes(extra)\n", |
| 186 | + " G.add_edge(source, str(target), label=edge_type, title=edge_info.replace(\"<br>\", \"\\n\"))\n", |
| 187 | + "\n", |
| 188 | + " # Build and configure network\n", |
| 189 | + " nt = Network(notebook=True, cdn_resources='in_line', width=\"800px\", height=\"600px\", directed=True)\n", |
| 190 | + " nt.from_nx(G)\n", |
| 191 | + " nt.set_options('''\n", |
| 192 | + " var options = {\n", |
| 193 | + " \"interaction\": {\n", |
| 194 | + " \"hover\": true,\n", |
| 195 | + " \"tooltipDelay\": 200\n", |
| 196 | + " },\n", |
| 197 | + " \"nodes\": {\n", |
| 198 | + " \"font\": {\"multi\": \"html\"}\n", |
| 199 | + " },\n", |
| 200 | + " \"physics\": {\n", |
| 201 | + " \"repulsion\": {\n", |
| 202 | + " \"nodeDistance\": 300,\n", |
| 203 | + " \"centralGravity\": 0.2,\n", |
| 204 | + " \"springLength\": 200,\n", |
| 205 | + " \"springStrength\": 0.05,\n", |
| 206 | + " \"damping\": 0.09\n", |
| 207 | + " }\n", |
| 208 | + " }\n", |
| 209 | + " }\n", |
| 210 | + " ''')\n", |
| 211 | + "\n", |
| 212 | + " return nt.generate_html()" |
| 213 | + ] |
| 214 | + }, |
| 215 | + { |
| 216 | + "cell_type": "code", |
| 217 | + "execution_count": null, |
| 218 | + "id": "62f9040e", |
| 219 | + "metadata": {}, |
| 220 | + "outputs": [], |
| 221 | + "source": [ |
| 222 | + "from IPython.display import HTML, display\n", |
| 223 | + "from pymongo import MongoClient\n", |
| 224 | + "\n", |
| 225 | + "client = MongoClient(ATLAS_CONNECTION_STRING)\n", |
| 226 | + "\n", |
| 227 | + "collection = client[ATLAS_DB_NAME][ATLAS_COLLECTION]\n", |
| 228 | + "html = visualize_graph(collection)\n", |
| 229 | + "\n", |
| 230 | + "display(HTML(html))" |
| 231 | + ] |
| 232 | + }, |
| 233 | + { |
| 234 | + "cell_type": "markdown", |
| 235 | + "id": "fbea568d-c656-4271-9e40-6ee01292255e", |
| 236 | + "metadata": {}, |
| 237 | + "source": [ |
| 238 | + "## Answer questions on your data\n", |
| 239 | + "\n", |
| 240 | + "The `MongoDBGraphStore` class provides a `chat_response` method that you can use to answer questions on your data. It executes queries by using the `$graphLookup` aggregation stage." |
| 241 | + ] |
| 242 | + }, |
| 243 | + { |
| 244 | + "cell_type": "code", |
| 245 | + "execution_count": null, |
| 246 | + "id": "506c7366-972c-4e50-88c4-3d5b0151e363", |
| 247 | + "metadata": {}, |
| 248 | + "outputs": [], |
| 249 | + "source": [ |
| 250 | + "query = \"Who inspired Sherlock Holmes?\"\n", |
| 251 | + "\n", |
| 252 | + "answer = graph_store.chat_response(query)\n", |
| 253 | + "answer.content" |
| 254 | + ] |
| 255 | + } |
| 256 | + ], |
| 257 | + "metadata": { |
| 258 | + "kernelspec": { |
| 259 | + "display_name": "Python 3 (ipykernel)", |
| 260 | + "language": "python", |
| 261 | + "name": "python3" |
| 262 | + }, |
| 263 | + "language_info": { |
| 264 | + "codemirror_mode": { |
| 265 | + "name": "ipython", |
| 266 | + "version": 3 |
| 267 | + }, |
| 268 | + "file_extension": ".py", |
| 269 | + "mimetype": "text/x-python", |
| 270 | + "name": "python", |
| 271 | + "nbconvert_exporter": "python", |
| 272 | + "pygments_lexer": "ipython3", |
| 273 | + "version": "3.9.6" |
| 274 | + } |
| 275 | + }, |
| 276 | + "nbformat": 4, |
| 277 | + "nbformat_minor": 5 |
| 278 | +} |
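
The notebook stores each entity as a document whose `relationships.target_ids` field points at other documents, and it answers questions by following those references with `$graphLookup`. The sketch below illustrates that idea in plain Python, without a database. The sample entities are hypothetical and only mirror the fields that `visualize_graph` reads (`_id`, `type`, `attributes`, `relationships`). The `traverse` and `graph_lookup_stage` helpers are illustrative names, and the stage shown is an assumed shape, not necessarily the exact pipeline that `MongoDBGraphStore` builds.

```python
from collections import deque

# Hypothetical entity documents, shaped like the fields that the
# notebook's visualize_graph function reads from the collection.
ENTITIES = [
    {
        "_id": "Sherlock Holmes",
        "type": "Character",
        "attributes": {"occupation": ["consulting detective"]},
        "relationships": {
            "target_ids": ["Arthur Conan Doyle", "Dr. Watson"],
            "types": ["created by", "friend of"],
            "attributes": [{}, {}],
        },
    },
    {
        "_id": "Arthur Conan Doyle",
        "type": "Person",
        "attributes": {},
        "relationships": {
            "target_ids": ["Joseph Bell"],
            "types": ["inspired by"],
            "attributes": [{}],
        },
    },
    {"_id": "Dr. Watson", "type": "Character", "attributes": {},
     "relationships": {"target_ids": [], "types": [], "attributes": []}},
    {"_id": "Joseph Bell", "type": "Person", "attributes": {},
     "relationships": {"target_ids": [], "types": [], "attributes": []}},
]


def traverse(entities, start_id, max_depth):
    """Breadth-first walk over relationships.target_ids.

    Returns a dict that maps each reachable entity ID to the number of
    hops from start_id, visiting at most max_depth hops.
    """
    by_id = {doc["_id"]: doc for doc in entities}
    depths = {start_id: 0}
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        depth = depths[current]
        if depth >= max_depth:
            continue  # do not expand beyond the hop limit
        rels = by_id.get(current, {}).get("relationships", {})
        for target in rels.get("target_ids", []):
            if target not in depths:  # skip entities already visited
                depths[target] = depth + 1
                queue.append(target)
    return depths


def graph_lookup_stage(collection_name, max_depth):
    """Build a $graphLookup stage that follows the same references.

    Assumed shape for illustration only; the field names are standard
    $graphLookup options, but the library's own pipeline may differ.
    """
    return {
        "$graphLookup": {
            "from": collection_name,
            "startWith": "$relationships.target_ids",
            "connectFromField": "relationships.target_ids",
            "connectToField": "_id",
            "as": "connections",
            "maxDepth": max_depth,
            "depthField": "depth",
        }
    }


print(traverse(ENTITIES, "Sherlock Holmes", max_depth=2))
```

With a limit of two hops, the walk reaches "Joseph Bell" through "Arthur Conan Doyle", which is the kind of multi-hop connection that a single vector similarity search over chunks does not follow explicitly. Lowering `max_depth` to 1 stops at the direct neighbors.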