Updating code blocks

ajosh0504 · ajosh0504 · commit aa3dd7263e2f · 2024-07-16T15:53:16.000-07:00
diff --git a/docs/50-prepare-the-data/2-load-data.mdx b/docs/50-prepare-the-data/2-load-data.mdx
@@ -1,5 +1,5 @@
 # 👐 Load the dataset
 
-First, let's download the dataset for our lab. We'll use four RAG-focused blogs from our Developer Center as the source data for our RAG application.
+First, let's download the dataset for our lab. We'll use a subset of articles from the MongoDB Developer Center as the source data for our RAG application.
 
-Run all the cells under the **Step 3: Load the dataset** section in the notebook to load the blog content as LangChain Document objects.
+Run all the cells under the **Step 3: Load the dataset** section in the notebook to load the articles as a list of Python objects consisting of the content and relevant metadata.
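The loaded articles are plain records. The text sits in `body`, and descriptive fields sit alongside it. Later steps address nested fields with MongoDB dot notation (for example `metadata.contentType` in the pre-filtering index). Below is a minimal sketch of that record shape and of how a dotted path resolves. The layout is inferred from the field names used in this commit, and `get_path` and all values are hypothetical.

```python
from typing import Any

# Hypothetical article record. Only the field names "body", "updated" and
# "metadata.contentType" come from the lab's answer code; the values are made up.
article: dict[str, Any] = {
    "body": "Atlas Vector Search lets you query vector embeddings...",
    "updated": "2024-06-03",
    "metadata": {"contentType": "Video"},
}


def get_path(doc: dict[str, Any], path: str) -> Any:
    """Resolve a MongoDB-style dotted path such as "metadata.contentType".

    Returns None when any segment of the path is missing.
    """
    value: Any = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


print(get_path(article, "metadata.contentType"))  # Video
print(get_path(article, "metadata.language"))  # None
```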
diff --git a/docs/50-prepare-the-data/3-chunk-data.mdx b/docs/50-prepare-the-data/3-chunk-data.mdx
@@ -2,7 +2,7 @@
 
 Since we are working with large documents, we first need to break them up into smaller chunks before embedding and storing them in MongoDB.
 
-Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 4: Chunk up the data** section in the notebook to chunk up the documents we loaded.
+Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 4: Chunk up the data** section in the notebook to chunk up the articles we loaded.
 
 The answers for code blocks in this section are as follows:
 
@@ -13,7 +13,7 @@ The answers for code blocks in this section are as follows:
 <div>
 ```python
 RecursiveCharacterTextSplitter.from_tiktoken_encoder(
-    encoding_name="cl100k_base", chunk_size=200, chunk_overlap=30
+    encoding_name="cl100k_base", separators=separators, chunk_size=200, chunk_overlap=30
 )
 ```
 </div>
@@ -25,7 +25,7 @@ RecursiveCharacterTextSplitter.from_tiktoken_encoder(
 <summary>Answer</summary>
 <div>
 ```python
-text_splitter.split_documents(docs)
+doc[text_field]
 ```
 </div>
 </details>
@@ -36,7 +36,34 @@ text_splitter.split_documents(docs)
 <summary>Answer</summary>
 <div>
 ```python
-doc.dict() for doc in split_docs
+text_splitter.split_text(text)
+```
+</div>
+</details>
+
+**CODE_BLOCK_6**
+
+<details>
+<summary>Answer</summary>
+<div>
+```python
+for chunk in chunks:
+    temp = doc.copy()
+    temp[text_field] = chunk
+    chunked_data.append(temp)
+```
+</div>
+</details>
+
+**CODE_BLOCK_7**
+
+<details>
+<summary>Answer</summary>
+<div>
+```python
+for doc in docs:
+    chunks = get_chunks(doc, "body")
+    split_docs.extend(chunks)
 ```
 </div>
 </details>
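The answers in this file appear to fill in pieces of one helper, `get_chunks(doc, text_field)`, plus the loop that calls it for every article (CODE_BLOCK_7). Below is a minimal sketch of how those pieces fit together. It uses a hypothetical word-based `split_text` in place of the LangChain splitter so that it runs without LangChain or tiktoken. That stand-in only mimics `chunk_size` and `chunk_overlap`, whereas the real `RecursiveCharacterTextSplitter` counts `cl100k_base` tokens and splits on `separators`.

```python
from typing import Any


def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 30) -> list[str]:
    """Hypothetical stand-in for text_splitter.split_text.

    Slides a fixed-size window over whitespace-separated words; consecutive
    windows share `chunk_overlap` words.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def get_chunks(doc: dict[str, Any], text_field: str) -> list[dict[str, Any]]:
    """Return one copy of `doc` per chunk of doc[text_field]."""
    text = doc[text_field]
    chunks = split_text(text)
    chunked_data = []
    for chunk in chunks:
        temp = doc.copy()  # shallow copy: nested metadata is shared, not duplicated
        temp[text_field] = chunk  # replace the full text with this chunk
        chunked_data.append(temp)
    return chunked_data


docs = [{"body": " ".join(f"w{i}" for i in range(450)), "metadata": {"contentType": "Video"}}]
split_docs = []
for doc in docs:
    chunks = get_chunks(doc, "body")
    split_docs.extend(chunks)

print(len(split_docs))  # 3
```

Copying the whole record for each chunk keeps `metadata` and `updated` next to every chunk, which is what lets the later pre-filtering steps filter chunks by those fields.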
diff --git a/docs/50-prepare-the-data/4-embed-data.mdx b/docs/50-prepare-the-data/4-embed-data.mdx
@@ -2,11 +2,11 @@
 
 To perform vector search on our data, we need to embed it (i.e. generate embedding vectors) before ingesting it into MongoDB.
 
-Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 5: Generate embeddings** section in the notebook to generate embeddings for the chunked documents.
+Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 5: Generate embeddings** section in the notebook to embed the chunked articles.
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_6**
+**CODE_BLOCK_8**
 
 <details>
 <summary>Answer</summary>
@@ -17,7 +17,7 @@ SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
 </div>
 </details>
 
-**CODE_BLOCK_7**
+**CODE_BLOCK_9**
 
 <details>
 <summary>Answer</summary>
@@ -29,15 +29,15 @@ return embedding.tolist()
 </div>
 </details>
 
-**CODE_BLOCK_8**
+**CODE_BLOCK_10**
 
 <details>
 <summary>Answer</summary>
 <div>
 ```python
 for doc in split_docs:
     temp = doc.copy()
-    temp["embedding"] = get_embedding(temp["page_content"])
+    temp["embedding"] = get_embedding(temp["body"])
     embedded_docs.append(temp)
 ```
 </div>
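The CODE_BLOCK_10 loop copies each chunk before attaching its vector. As a result, `split_docs` is left untouched and the step can be re-run. Below is a runnable sketch of that loop. It assumes a hypothetical hash-based `get_embedding` stub in place of the `SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")` model. The stub's vectors carry no meaning, and only the shape of the data matters here. The sample chunks, including the `"Article"` content type, are made up.

```python
import hashlib
from typing import Any


def get_embedding(text: str, dims: int = 8) -> list[float]:
    """Hypothetical stub: deterministic pseudo-vector derived from SHA-256.

    The lab encodes text with the sentence-transformers model and returns
    embedding.tolist(); this stub only returns a list of floats in [0, 1].
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:dims]]


split_docs: list[dict[str, Any]] = [
    {"body": "Atlas Vector Search basics", "metadata": {"contentType": "Video"}},
    {"body": "Chunking strategies for RAG", "metadata": {"contentType": "Article"}},
]

embedded_docs = []
for doc in split_docs:
    temp = doc.copy()  # leave the chunked input unchanged
    temp["embedding"] = get_embedding(temp["body"])
    embedded_docs.append(temp)
```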
diff --git a/docs/50-prepare-the-data/5-ingest-data.mdx b/docs/50-prepare-the-data/5-ingest-data.mdx
@@ -2,13 +2,13 @@ import Screenshot from "@site/src/components/Screenshot";
 
 # 👐 Ingest data into MongoDB
 
-The final step to build a MongoDB vector store for our RAG application is to ingest the embedded documents into MongoDB.
+The final step to build a MongoDB vector store for our RAG application is to ingest the embedded article chunks into MongoDB.
 
 Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 6: Ingest data into MongoDB** section in the notebook to ingest the embedded documents into MongoDB.
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_9**
+**CODE_BLOCK_11**
 
 <details>
 <summary>Answer</summary>
@@ -19,7 +19,7 @@ MongoClient(MONGODB_URI)
 </div>
 </details>
 
-**CODE_BLOCK_10**
+**CODE_BLOCK_12**
 
 <details>
 <summary>Answer</summary>
@@ -30,7 +30,7 @@ mongo_client[DB_NAME][COLLECTION_NAME]
 </div>
 </details>
 
-**CODE_BLOCK_11**
+**CODE_BLOCK_13**
 
 <details>
 <summary>Answer</summary>
@@ -41,7 +41,7 @@ collection.delete_many({})
 </div>
 </details>
 
-**CODE_BLOCK_12**
+**CODE_BLOCK_14**
 
 <details>
 <summary>Answer</summary>
diff --git a/docs/60-perform-semantic-search/3-vector-search.mdx b/docs/60-perform-semantic-search/3-vector-search.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 8:
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_13**
+**CODE_BLOCK_15**
 
 <details>
 <summary>Answer</summary>
@@ -17,7 +17,7 @@ get_embedding(user_query)
 </div>
 </details>
 
-**CODE_BLOCK_14**
+**CODE_BLOCK_16**
 
 <details>
 <summary>Answer</summary>
@@ -36,7 +36,7 @@ get_embedding(user_query)
     {
         "$project": {
             "_id": 0,
-            "page_content": 1,
+            "body": 1,
             "score": {"$meta": "vectorSearchScore"},
         }
     },
@@ -45,7 +45,7 @@ get_embedding(user_query)
 </div>
 </details>
 
-**CODE_BLOCK_15**
+**CODE_BLOCK_17**
 
 <details>
 <summary>Answer</summary>
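After this commit, the search pipeline projects `body` instead of `page_content`. Below is a sketch of a helper that assembles such a pipeline as plain Python data. The `path`, `numCandidates` and `limit` values are taken from the pre-filtering answers in this commit. The index name `vector_index` and the helper itself are hypothetical.

```python
from typing import Any


def build_vector_search_pipeline(
    query_embedding: list[float],
    index_name: str = "vector_index",  # hypothetical index name
    num_candidates: int = 150,
    limit: int = 5,
) -> list[dict[str, Any]]:
    """Return a $vectorSearch stage followed by a projection of body + score."""
    if num_candidates < limit:
        # Atlas expects numCandidates to be at least as large as limit.
        raise ValueError("num_candidates must be >= limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": "embedding",  # field written by the embedding step
                "queryVector": query_embedding,
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "body": 1,  # chunk text (was "page_content" before this commit)
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3])
# With a live PyMongo collection: results = list(collection.aggregate(pipeline))
```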
diff --git a/docs/60-perform-semantic-search/4-pre-filtering.mdx b/docs/60-perform-semantic-search/4-pre-filtering.mdx
@@ -10,7 +10,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_16**
+**CODE_BLOCK_18**
 
 <details>
 <summary>Answer</summary>
@@ -25,7 +25,7 @@ The answers for code blocks in this section are as follows:
             "type": "vector"
         },
         {
-            "path": "metadata.language"
+            "path": "metadata.contentType",
             "type": "filter"
         }
     ]
@@ -34,7 +34,7 @@ The answers for code blocks in this section are as follows:
 </div>
 </details>
 
-**CODE_BLOCK_17**
+**CODE_BLOCK_19**
 
 <details>
 <summary>Answer</summary>
@@ -48,13 +48,13 @@ The answers for code blocks in this section are as follows:
             "path": "embedding",
             "numCandidates": 150,
             "limit": 5,
-            "filter": {"metadata.language": "en"}
+            "filter": {"metadata.contentType": "Video"}
         }
     },
     {
         "$project": {
             "_id": 0,
-            "page_content": 1,
+            "body": 1,
             "score": {"$meta": "vectorSearchScore"}
         }
     }
@@ -63,7 +63,7 @@ The answers for code blocks in this section are as follows:
 </div>
 </details>
 
-**CODE_BLOCK_18**
+**CODE_BLOCK_20**
 
 <details>
 <summary>Answer</summary>
@@ -78,11 +78,11 @@ The answers for code blocks in this section are as follows:
             "type": "vector"
         },
         {
-            "path": "metadata.language"
+            "path": "metadata.contentType",
             "type": "filter"
         },
         {
-            "path": "type"
+            "path": "updated",
             "type": "filter"
         }
     ]
@@ -91,7 +91,7 @@ The answers for code blocks in this section are as follows:
 </div>
 </details>
 
-**CODE_BLOCK_19**
+**CODE_BLOCK_21**
 
 <details>
 <summary>Answer</summary>
@@ -107,16 +107,16 @@ The answers for code blocks in this section are as follows:
             "limit": 5,
             "filter": {
                 "$and": [
-                    {"metadata.language": "en"},
-                    {"type": "Document"}
+                    {"metadata.contentType": "Video"},
+                    {"updated": {"$gte": "2024-05-20"}}
                 ]
             }
         }
     },
     {
         "$project": {
             "_id": 0,
-            "page_content": 1,
+            "body": 1,
             "score": {"$meta": "vectorSearchScore"}
         }
     }
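The new pre-filter combines an equality match on `metadata.contentType` with `{"$gte": "2024-05-20"}` on `updated`. Because the bound is a string, it only means "on or after 20 May 2024" if `updated` values are zero-padded `YYYY-MM-DD` strings. In that format, lexicographic order matches date order. This is an assumption about the dataset, and not something the diff shows. Below is a sketch of the filtered stage as Python data, plus a plain-Python reading of the filter's intent. Neither `build_prefiltered_stage` nor `matches` is from the lab, the index name is hypothetical, and `matches` is not how Atlas evaluates filters.

```python
from typing import Any


def build_prefiltered_stage(
    query_embedding: list[float],
    content_type: str,
    updated_since: str,
    index_name: str = "vector_index",  # hypothetical index name
) -> dict[str, Any]:
    """$vectorSearch stage restricted by content type and last-update date.

    Both filter paths must be indexed with "type": "filter", as in the
    index definitions above.
    """
    return {
        "$vectorSearch": {
            "index": index_name,
            "path": "embedding",
            "queryVector": query_embedding,
            "numCandidates": 150,
            "limit": 5,
            "filter": {
                "$and": [
                    {"metadata.contentType": content_type},
                    {"updated": {"$gte": updated_since}},
                ]
            },
        }
    }


def matches(doc: dict[str, Any], content_type: str, updated_since: str) -> bool:
    """Illustrative plain-Python reading of the same filter."""
    return (
        doc.get("metadata", {}).get("contentType") == content_type
        and doc.get("updated", "") >= updated_since  # string comparison
    )


docs = [
    {"metadata": {"contentType": "Video"}, "updated": "2024-06-01"},
    {"metadata": {"contentType": "Video"}, "updated": "2023-12-31"},
    {"metadata": {"contentType": "Article"}, "updated": "2024-07-01"},
]
stage = build_prefiltered_stage([0.1, 0.2], "Video", "2024-05-20")
print([d["updated"] for d in docs if matches(d, "Video", "2024-05-20")])  # ['2024-06-01']
```

Without zero padding the ordering breaks. For example, `"2024-5-20"` sorts after `"2024-10-01"` because `"5" > "1"`.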
diff --git a/docs/70-build-rag-app/2-build-rag-app.mdx b/docs/70-build-rag-app/2-build-rag-app.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 9:
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_20**
+**CODE_BLOCK_22**
 
 <details>
 <summary>Answer</summary>
@@ -17,18 +17,18 @@ vector_search(user_query)
 </div>
 </details>
 
-**CODE_BLOCK_21**
+**CODE_BLOCK_23**
 
 <details>
 <summary>Answer</summary>
 <div>
 ```python
-"\n\n".join([d.get("page_content", "") for d in context])
+"\n\n".join([d.get("body", "") for d in context])
 ```
 </div>
 </details>
 
-**CODE_BLOCK_22**
+**CODE_BLOCK_24**
 
 <details>
 <summary>Answer</summary>
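The CODE_BLOCK_23 answer joins the `body` of each retrieved chunk (the documents returned by `vector_search(user_query)`) with blank lines between them. Below is a sketch of how that string could feed a prompt. The template wording and the `create_prompt` helper are hypothetical, because the lab's prompt is not shown in this diff. Note that `d.get("body", "")` turns a result without `body` into an empty string instead of raising `KeyError`.

```python
from typing import Any


def build_context(context: list[dict[str, Any]]) -> str:
    """Concatenate retrieved chunks with the CODE_BLOCK_23 expression."""
    return "\n\n".join([d.get("body", "") for d in context])


def create_prompt(user_query: str, context: list[dict[str, Any]]) -> str:
    # Hypothetical template; the lab's actual prompt text is not in this diff.
    return (
        "Answer the question based only on the following context.\n\n"
        f"Context:\n{build_context(context)}\n\n"
        f"Question: {user_query}"
    )


results = [
    {"body": "Chunk about vector indexes.", "score": 0.91},
    {"body": "Chunk about pre-filtering.", "score": 0.87},
]
print(create_prompt("How do I pre-filter a vector search?", results))
```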
diff --git a/docs/70-build-rag-app/3-stream-responses.mdx b/docs/70-build-rag-app/3-stream-responses.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_23**
+**CODE_BLOCK_25**
 
 <details>
 <summary>Answer</summary>
@@ -27,7 +27,7 @@ fw_client.chat.completions.create(
 </div>
 </details>
 
-**CODE_BLOCK_24**
+**CODE_BLOCK_26**
 
 <details>
 <summary>Answer</summary>
diff --git a/docs/80-add-memory/2-add-memory.mdx b/docs/80-add-memory/2-add-memory.mdx

@@ -10,7 +10,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_16**
+**CODE_BLOCK_18**
 
 <details>
 <summary>Answer</summary>
@@ -25,7 +25,7 @@ The answers for code blocks in this section are as follows:
             "type": "vector"
         },
         {
-            "path": "metadata.language"
+            "path": "metadata.contentType",
             "type": "filter"
         }
     ]
@@ -34,7 +34,7 @@ The answers for code blocks in this section are as follows:
 </div>
 </details>
 
-**CODE_BLOCK_17**
+**CODE_BLOCK_19**
 
 <details>
 <summary>Answer</summary>
@@ -48,13 +48,13 @@ The answers for code blocks in this section are as follows:
             "path": "embedding",
             "numCandidates": 150,
             "limit": 5,
-            "filter": {"metadata.language": "en"}
+            "filter": {"metadata.contentType": "Video"}
         }
     },
     {
         "$project": {
             "_id": 0,
-            "page_content": 1,
+            "body": 1,
             "score": {"$meta": "vectorSearchScore"}
         }
     }
@@ -63,7 +63,7 @@ The answers for code blocks in this section are as follows:
 </div>
 </details>
 
-**CODE_BLOCK_18**
+**CODE_BLOCK_20**
 
 <details>
 <summary>Answer</summary>
@@ -78,11 +78,11 @@ The answers for code blocks in this section are as follows:
             "type": "vector"
         },
         {
-            "path": "metadata.language"
+            "path": "metadata.contentType",
             "type": "filter"
         },
         {
-            "path": "type"
+            "path": "updated",
             "type": "filter"
         }
     ]
@@ -91,7 +91,7 @@ The answers for code blocks in this section are as follows:
 </div>
 </details>
 
-**CODE_BLOCK_19**
+**CODE_BLOCK_21**
 
 <details>
 <summary>Answer</summary>
@@ -107,16 +107,16 @@ The answers for code blocks in this section are as follows:
             "limit": 5,
             "filter": {
                 "$and": [
-                    {"metadata.language": "en"},
-                    {"type": "Document"}
+                    {"metadata.contentType": "Video"},
+                    {"updated": {"$gte": "2024-05-20"}}
                 ]
             }
         }
     },
     {
         "$project": {
             "_id": 0,
-            "page_content": 1,
+            "body": 1,
             "score": {"$meta": "vectorSearchScore"}
         }
     }