
Commit f47874f

Merge pull request #16 from davidhou17/DOCSP-48649
(DOCSP-48649): Create notebook for GraphRAG with Langchain and MongoDB
2 parents bb68147 + 063e14e commit f47874f

File tree

2 files changed: +286 -8 lines changed

ai-integrations/langchain-graphrag.ipynb

Lines changed: 278 additions & 0 deletions
@@ -0,0 +1,278 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "b5dcbf95-9a30-416d-afed-d5b2bf0e8651",
   "metadata": {},
   "source": [
    "# GraphRAG with MongoDB and LangChain\n",
    "\n",
    "This notebook is a companion to the [GraphRAG with MongoDB and LangChain](https://www.mongodb.com/docs/atlas/atlas-vector-search/ai-integrations/langchain/graph-rag/) tutorial. Refer to that page for setup instructions and detailed explanations.\n",
    "\n",
    "This notebook demonstrates a GraphRAG implementation using MongoDB Atlas and LangChain. Compared to vector-based RAG, which structures your data as vector embeddings, GraphRAG structures data as a knowledge graph of entities and their relationships. This enables relationship-aware retrieval and multi-hop reasoning.\n",
    "\n",
    "<a target=\"_blank\" href=\"https://colab.research.google.com/github/mongodb/docs-notebooks/blob/main/ai-integrations/langchain-graphrag.ipynb\">\n",
    "  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
    "</a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23f70093-83ea-4ecc-87db-2f2f89e546d7",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "pip install --quiet --upgrade pymongo langchain_community wikipedia langchain_openai langchain_mongodb pyvis"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d96955f9-a370-4f45-970d-ef187ee6195c",
   "metadata": {},
   "source": [
    "## Set up your environment\n",
    "\n",
    "Before you begin, make sure you have the following:\n",
    "\n",
    "- An Atlas cluster up and running (you'll need the [connection string](https://www.mongodb.com/docs/guides/atlas/connection-string/))\n",
    "- An API key to access an LLM (this tutorial uses a model from OpenAI, but you can use any model [supported by LangChain](https://python.langchain.com/docs/integrations/chat/))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0119b58d-f14e-4f36-a284-345d94478537",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"<api-key>\"\n",
    "ATLAS_CONNECTION_STRING = \"<connection-string>\"\n",
    "ATLAS_DB_NAME = \"langchain_db\" # MongoDB database to store the knowledge graph\n",
    "ATLAS_COLLECTION = \"wikipedia\" # MongoDB collection to store the knowledge graph"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0adf66a8",
   "metadata": {},
   "source": [
    "## Use MongoDB Atlas as a knowledge graph\n",
    "\n",
    "Use the `MongoDBGraphStore` component to store your data as a knowledge graph. This component allows you to implement GraphRAG by storing entities (nodes) and their relationships (edges) in a MongoDB collection. It stores each entity as a document with relationship fields that reference other documents in your collection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f4e8db2f-d918-41aa-92f8-41f80a6d747a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chat_models import init_chat_model\n",
    "\n",
    "# For best results, use the latest models, such as gpt-4o or Claude 3.5 Sonnet\n",
    "chat_model = init_chat_model(\"gpt-4o\", model_provider=\"openai\", temperature=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "72cd5c08-e17b-4f47-bca7-ded0fb25fb85",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.document_loaders import WikipediaLoader\n",
    "from langchain.text_splitter import TokenTextSplitter\n",
    "\n",
    "# Load Wikipedia pages corresponding to the query \"Sherlock Holmes\"\n",
    "wikipedia_pages = WikipediaLoader(query=\"Sherlock Holmes\", load_max_docs=3).load()\n",
    "\n",
    "# Split the documents into chunks for efficient downstream processing (graph creation)\n",
    "text_splitter = TokenTextSplitter(chunk_size=1024, chunk_overlap=0)\n",
    "wikipedia_docs = text_splitter.split_documents(wikipedia_pages)\n",
    "\n",
    "# Print the first document\n",
    "wikipedia_docs[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2dc8f05b-0f9a-4293-b9ea-761030c98dca",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_mongodb.graphrag.graph import MongoDBGraphStore\n",
    "\n",
    "graph_store = MongoDBGraphStore(\n",
    "    connection_string=ATLAS_CONNECTION_STRING,\n",
    "    database_name=ATLAS_DB_NAME,\n",
    "    collection_name=ATLAS_COLLECTION,\n",
    "    entity_extraction_model=chat_model\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3664189e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Extract entities and create the knowledge graph in Atlas\n",
    "# This might take a few minutes; you can ignore any warnings\n",
    "graph_store.add_documents(wikipedia_docs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b167c2eb-b2c5-45ef-bdc9-8230f7da4c52",
   "metadata": {},
   "source": [
    "## Visualize the knowledge graph\n",
    "\n",
    "To visualize the knowledge graph, you can export the structured data to a visualization library like `pyvis`.\n",
    "This helps you explore and understand the relationships and hierarchies within your data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8b515723-a8a4-435b-b386-5cb3244c2745",
   "metadata": {},
   "outputs": [],
   "source": [
    "import networkx as nx\n",
    "from pyvis.network import Network\n",
    "\n",
    "def visualize_graph(collection):\n",
    "    docs = list(collection.find())\n",
    "\n",
    "    def format_attributes(attrs):\n",
    "        return \"<br>\".join(f\"{k}: {', '.join(v)}\" for k, v in attrs.items()) if attrs else \"\"\n",
    "\n",
    "    G = nx.DiGraph()\n",
    "\n",
    "    # Create nodes\n",
    "    for doc in docs:\n",
    "        node_id = str(doc[\"_id\"])\n",
    "        info = f\"Type: {doc.get('type', '')}\"\n",
    "        if \"attributes\" in doc:\n",
    "            attr_info = format_attributes(doc[\"attributes\"])\n",
    "            if attr_info:\n",
    "                info += \"<br>\" + attr_info\n",
    "        G.add_node(node_id, label=node_id, title=info.replace(\"<br>\", \"\\n\"))\n",
    "\n",
    "    # Create edges\n",
    "    for doc in docs:\n",
    "        source = str(doc[\"_id\"])\n",
    "        rels = doc.get(\"relationships\", {})\n",
    "        targets = rels.get(\"target_ids\", [])\n",
    "        types = rels.get(\"types\", [])\n",
    "        attrs = rels.get(\"attributes\", [])\n",
    "\n",
    "        for i, target in enumerate(targets):\n",
    "            edge_type = types[i] if i < len(types) else \"\"\n",
    "            extra = attrs[i] if i < len(attrs) else {}\n",
    "            edge_info = f\"Relationship: {edge_type}\"\n",
    "            if extra:\n",
    "                edge_info += \"<br>\" + format_attributes(extra)\n",
    "            G.add_edge(source, str(target), label=edge_type, title=edge_info.replace(\"<br>\", \"\\n\"))\n",
    "\n",
    "    # Build and configure the network\n",
    "    nt = Network(notebook=True, cdn_resources='in_line', width=\"800px\", height=\"600px\", directed=True)\n",
    "    nt.from_nx(G)\n",
    "    nt.set_options('''\n",
    "    var options = {\n",
    "      \"interaction\": {\n",
    "        \"hover\": true,\n",
    "        \"tooltipDelay\": 200\n",
    "      },\n",
    "      \"nodes\": {\n",
    "        \"font\": {\"multi\": \"html\"}\n",
    "      },\n",
    "      \"physics\": {\n",
    "        \"repulsion\": {\n",
    "          \"nodeDistance\": 300,\n",
    "          \"centralGravity\": 0.2,\n",
    "          \"springLength\": 200,\n",
    "          \"springStrength\": 0.05,\n",
    "          \"damping\": 0.09\n",
    "        }\n",
    "      }\n",
    "    }\n",
    "    ''')\n",
    "\n",
    "    return nt.generate_html()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "62f9040e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import HTML, display\n",
    "from pymongo import MongoClient\n",
    "\n",
    "client = MongoClient(ATLAS_CONNECTION_STRING)\n",
    "\n",
    "collection = client[ATLAS_DB_NAME][ATLAS_COLLECTION]\n",
    "html = visualize_graph(collection)\n",
    "\n",
    "display(HTML(html))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fbea568d-c656-4271-9e40-6ee01292255e",
   "metadata": {},
   "source": [
    "## Answer questions on your data\n",
    "\n",
    "The `MongoDBGraphStore` class provides a `chat_response` method that you can use to answer questions on your data. It executes queries by using the `$graphLookup` aggregation stage."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "506c7366-972c-4e50-88c4-3d5b0151e363",
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"Who inspired Sherlock Holmes?\"\n",
    "\n",
    "answer = graph_store.chat_response(query)\n",
    "answer.content"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
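The notebook's `chat_response` method retrieves related entities with MongoDB's `$graphLookup` aggregation stage before prompting the LLM. As a rough, library-free illustration of what that multi-hop traversal does over the entity schema described above (documents with an `_id` and a `relationships.target_ids` array), here is a breadth-first walk over a few in-memory entity documents; the `multi_hop` helper and the sample entities are hypothetical, not part of `langchain_mongodb`:

```python
from collections import deque

# Hypothetical entity documents shaped like the knowledge-graph schema
# described above: each entity references others via relationships.target_ids.
entities = {
    "Sherlock Holmes": {"_id": "Sherlock Holmes", "type": "Character",
                        "relationships": {"target_ids": ["Arthur Conan Doyle", "Dr. Watson"]}},
    "Arthur Conan Doyle": {"_id": "Arthur Conan Doyle", "type": "Person",
                           "relationships": {"target_ids": ["Joseph Bell"]}},
    "Dr. Watson": {"_id": "Dr. Watson", "type": "Character",
                   "relationships": {"target_ids": []}},
    "Joseph Bell": {"_id": "Joseph Bell", "type": "Person",
                    "relationships": {"target_ids": []}},
}

def multi_hop(start_id, max_depth=2):
    """Breadth-first traversal, analogous to $graphLookup with a maxDepth bound."""
    seen, order = {start_id}, []
    queue = deque([(start_id, 0)])
    while queue:
        node_id, depth = queue.popleft()
        order.append(node_id)
        if depth == max_depth:
            continue  # depth limit reached; don't follow this node's edges
        for target in entities[node_id]["relationships"]["target_ids"]:
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order

print(multi_hop("Sherlock Holmes"))
# → ['Sherlock Holmes', 'Arthur Conan Doyle', 'Dr. Watson', 'Joseph Bell']
```

A real `$graphLookup` performs the same bounded traversal server-side; its `maxDepth` option plays the role of `max_depth` here, which is what lets a question like "Who inspired Sherlock Holmes?" reach Joseph Bell two hops away.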

ai-integrations/llamaindex.ipynb

Lines changed: 8 additions & 8 deletions
@@ -84,7 +84,7 @@
 "source": [
 "# Load the sample data\n",
 "mkdir -p 'data/'\n",
-"wget 'https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4HkJP' -O 'data/atlas_best_practices.pdf'"
+"wget 'https://investors.mongodb.com/node/13176/pdf' -O 'data/mongodb-earnings-report.pdf'"
 ]
 },
 {
@@ -93,7 +93,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"sample_data = SimpleDirectoryReader(input_files=[\"./data/atlas_best_practices.pdf\"]).load_data()\n",
+"sample_data = SimpleDirectoryReader(input_files=[\"./data/mongodb-earnings-report.pdf\"]).load_data()\n",
 "\n",
 "# Print the first document\n",
 "sample_data[0]"
@@ -176,7 +176,7 @@
 "outputs": [],
 "source": [
 "retriever = vector_store_index.as_retriever(similarity_top_k=3)\n",
-"nodes = retriever.retrieve(\"MongoDB Atlas security\")\n",
+"nodes = retriever.retrieve(\"MongoDB acquisition\")\n",
 "\n",
 "for node in nodes:\n",
 "    print(node)"
@@ -197,10 +197,10 @@
 "source": [
 "# Specify metadata filters\n",
 "metadata_filters = MetadataFilters(\n",
-"    filters=[ExactMatchFilter(key=\"metadata.page_label\", value=\"17\")]\n",
+"    filters=[ExactMatchFilter(key=\"metadata.page_label\", value=\"2\")]\n",
 ")\n",
 "retriever = vector_store_index.as_retriever(similarity_top_k=3, filters=metadata_filters)\n",
-"nodes = retriever.retrieve(\"MongoDB Atlas security\")\n",
+"nodes = retriever.retrieve(\"MongoDB acquisition\")\n",
 "\n",
 "for node in nodes:\n",
 "    print(node)"
@@ -226,7 +226,7 @@
 "query_engine = RetrieverQueryEngine(retriever=vector_store_retriever)\n",
 "\n",
 "# Prompt the LLM\n",
-"response = query_engine.query('How can I secure my MongoDB Atlas cluster?')\n",
+"response = query_engine.query(\"What was MongoDB's latest acquisition?\")\n",
 "\n",
 "print(response)\n",
 "print(\"\\nSource documents: \")\n",
@@ -248,7 +248,7 @@
 "source": [
 "# Specify metadata filters\n",
 "metadata_filters = MetadataFilters(\n",
-"    filters=[ExactMatchFilter(key=\"metadata.page_label\", value=\"17\")]\n",
+"    filters=[ExactMatchFilter(key=\"metadata.page_label\", value=\"2\")]\n",
 ")\n",
 "\n",
 "# Instantiate Atlas Vector Search as a retriever\n",
@@ -258,7 +258,7 @@
 "query_engine = RetrieverQueryEngine(retriever=vector_store_retriever)\n",
 "\n",
 "# Prompt the LLM\n",
-"response = query_engine.query('How can I secure my MongoDB Atlas cluster?')\n",
+"response = query_engine.query(\"What was MongoDB's latest acquisition?\")\n",
 "\n",
 "print(response)\n",
 "print(\"\\nSource documents: \")\n",
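Across these hunks the retrieval pattern is unchanged: an optional exact-match metadata filter (here on `metadata.page_label`) narrows the candidate set before the retriever returns the top-k most similar nodes. A minimal sketch of that filter-then-rank flow, assuming toy node dicts and a deliberately naive term-overlap score standing in for LlamaIndex's vector similarity (`retrieve` and the sample nodes are hypothetical, not LlamaIndex APIs):

```python
def retrieve(nodes, query_terms, top_k=3, filters=None):
    """Exact-match metadata pre-filter followed by top-k ranking,
    mirroring as_retriever(similarity_top_k=..., filters=...)."""
    if filters:
        # Keep only nodes whose metadata matches every filter exactly
        nodes = [n for n in nodes
                 if all(n["metadata"].get(k) == v for k, v in filters.items())]
    # Toy similarity: count of query terms appearing in the node text
    scored = sorted(nodes,
                    key=lambda n: sum(t in n["text"] for t in query_terms),
                    reverse=True)
    return scored[:top_k]

nodes = [
    {"text": "MongoDB announced an acquisition", "metadata": {"page_label": "2"}},
    {"text": "earnings grew", "metadata": {"page_label": "2"}},
    {"text": "MongoDB acquisition details", "metadata": {"page_label": "17"}},
]

hits = retrieve(nodes, ["MongoDB", "acquisition"], top_k=3,
                filters={"page_label": "2"})
# The page-17 node is filtered out before ranking, even though it scores well.
```

Filtering before similarity ranking is why the filtered queries in the diff can only ever surface content from page 2, regardless of how relevant other pages are.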
