Commit aa3dd72

Updating code blocks

Author: ajosh0504
1 parent: f15c2a1

9 files changed (+74 additions, -47 deletions)

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 # 👐 Load the dataset
 
-First, let's download the dataset for our lab. We'll use four RAG-focused blogs from our Developer Center as the source data for our RAG application.
+First, let's download the dataset for our lab. We'll use a subset of articles from the MongoDB Developer Center as the source data for our RAG application.
 
-Run all the cells under the **Step 3: Load the dataset** section in the notebook to load the blog content as LangChain Document objects.
+Run all the cells under the **Step 3: Load the dataset** section in the notebook to load the articles as a list of Python objects consisting of the content and relevant metadata.

docs/50-prepare-the-data/3-chunk-data.mdx

Lines changed: 31 additions & 4 deletions
@@ -2,7 +2,7 @@
 
 Since we are working with large documents, we first need to break them up into smaller chunks before embedding and storing them in MongoDB.
 
-Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 4: Chunk up the data** section in the notebook to chunk up the documents we loaded.
+Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 4: Chunk up the data** section in the notebook to chunk up the articles we loaded.
 
 The answers for code blocks in this section are as follows:
 
@@ -13,7 +13,7 @@ The answers for code blocks in this section are as follows:
 <div>
 ```python
 RecursiveCharacterTextSplitter.from_tiktoken_encoder(
-    encoding_name="cl100k_base", chunk_size=200, chunk_overlap=30
+    encoding_name="cl100k_base", separators=separators, chunk_size=200, chunk_overlap=30
 )
 ```
 </div>
@@ -25,7 +25,7 @@ RecursiveCharacterTextSplitter.from_tiktoken_encoder(
 <summary>Answer</summary>
 <div>
 ```python
-text_splitter.split_documents(docs)
+doc[text_field]
 ```
 </div>
 </details>
@@ -36,7 +36,34 @@ text_splitter.split_documents(docs)
 <summary>Answer</summary>
 <div>
 ```python
-doc.dict() for doc in split_docs
+text_splitter.split_text(text)
+```
+</div>
+</details>
+
+**CODE_BLOCK_6**
+
+<details>
+<summary>Answer</summary>
+<div>
+```python
+for chunk in chunks:
+    temp = doc.copy()
+    temp[text_field] = chunk
+    chunked_data.append(temp)
+```
+</div>
+</details>
+
+**CODE_BLOCK_7**
+
+<details>
+<summary>Answer</summary>
+<div>
+```python
+for doc in docs:
+    chunks = get_chunks(doc, "body")
+    split_docs.extend(chunks)
 ```
 </div>
 </details>
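With this change, chunking works on plain article dicts instead of LangChain `Document` objects: `get_chunks(doc, text_field)` splits `doc[text_field]` and copies every other field of the article onto each chunk. The stdlib-only sketch below illustrates that pattern under two stated assumptions: `split_text` is a hypothetical word-window splitter standing in for `text_splitter.split_text` (the real `RecursiveCharacterTextSplitter` counts `cl100k_base` tokens and prefers the configured `separators`), and the sample article is invented for illustration.

```python
from typing import Any


def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 30) -> list[str]:
    """Hypothetical stand-in for text_splitter.split_text(text).

    Emits windows of `chunk_size` whitespace-separated words, each sharing
    `chunk_overlap` words with the previous window. The real splitter counts
    tokens and tries to break on separators instead.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def get_chunks(doc: dict[str, Any], text_field: str) -> list[dict[str, Any]]:
    """Split doc[text_field] into chunks; every chunk keeps the article's other fields."""
    chunks = split_text(doc[text_field])
    chunked_data = []
    for chunk in chunks:
        temp = doc.copy()  # shallow copy, so nested dicts such as metadata are shared
        temp[text_field] = chunk
        chunked_data.append(temp)
    return chunked_data


# Invented sample article: 450 distinct words plus some metadata.
docs = [{"body": " ".join(f"w{i}" for i in range(450)), "metadata": {"contentType": "Video"}}]

split_docs = []
for doc in docs:
    split_docs.extend(get_chunks(doc, "body"))
```

Because `doc.copy()` is shallow, every chunk shares the source article's `metadata` dict; use `copy.deepcopy` instead if chunks will be mutated independently.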

docs/50-prepare-the-data/4-embed-data.mdx

Lines changed: 5 additions & 5 deletions
@@ -2,11 +2,11 @@
 
 To perform vector search on our data, we need to embed it (i.e. generate embedding vectors) before ingesting it into MongoDB.
 
-Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 5: Generate embeddings** section in the notebook to generate embeddings for the chunked documents.
+Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 5: Generate embeddings** section in the notebook to embed the chunked articles.
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_6**
+**CODE_BLOCK_8**
 
 <details>
 <summary>Answer</summary>
@@ -17,7 +17,7 @@ SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
 </div>
 </details>
 
-**CODE_BLOCK_7**
+**CODE_BLOCK_9**
 
 <details>
 <summary>Answer</summary>
@@ -29,15 +29,15 @@ return embedding.tolist()
 </div>
 </details>
 
-**CODE_BLOCK_8**
+**CODE_BLOCK_10**
 
 <details>
 <summary>Answer</summary>
 <div>
 ```python
 for doc in split_docs:
     temp = doc.copy()
-    temp["embedding"] = get_embedding(temp["page_content"])
+    temp["embedding"] = get_embedding(temp["body"])
     embedded_docs.append(temp)
 ```
 </div>
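The embedding loop now reads the chunk text from `temp["body"]`, so the field name has to match the `text_field` used during chunking. The sketch below keeps that loop but swaps the notebook's `get_embedding` (the hunk headers show `SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")` and `return embedding.tolist()`) for a hypothetical `toy_embedding` that needs no model download; its vectors carry no semantic meaning and exist only so the example runs.

```python
from typing import Any, Callable


def toy_embedding(text: str, dims: int = 4) -> list[float]:
    """Hypothetical stand-in for get_embedding(); NOT a semantic embedding.

    Sums character codes into `dims` buckets and scales the result to unit length,
    returning a plain list of floats as embedding.tolist() would.
    """
    vec = [0.0] * dims
    for i, ch in enumerate(text):
        vec[i % dims] += ord(ch)
    norm = sum(v * v for v in vec) ** 0.5 or 1.0  # avoid dividing by zero for empty text
    return [v / norm for v in vec]


def embed_chunks(
    split_docs: list[dict[str, Any]],
    get_embedding: Callable[[str], list[float]],
    text_field: str = "body",
) -> list[dict[str, Any]]:
    """Return copies of the chunks with an added "embedding" field."""
    embedded_docs = []
    for doc in split_docs:
        temp = doc.copy()  # leave the input chunk untouched
        temp["embedding"] = get_embedding(temp[text_field])
        embedded_docs.append(temp)
    return embedded_docs


chunks = [{"body": "Atlas Vector Search", "metadata": {"contentType": "Video"}}]
embedded_docs = embed_chunks(chunks, toy_embedding)
```

Passing `text_field` explicitly keeps the chunking and embedding steps pointed at the same field, which is the coupling the `page_content` to `body` rename had to update in both places.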

docs/50-prepare-the-data/5-ingest-data.mdx

Lines changed: 5 additions & 5 deletions
@@ -2,13 +2,13 @@ import Screenshot from "@site/src/components/Screenshot";
 
 # 👐 Ingest data into MongoDB
 
-The final step to build a MongoDB vector store for our RAG application is to ingest the embedded documents into MongoDB.
+The final step to build a MongoDB vector store for our RAG application is to ingest the embedded article chunks into MongoDB.
 
 Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 6: Ingest data into MongoDB** section in the notebook to ingest the embedded documents into MongoDB.
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_9**
+**CODE_BLOCK_11**
 
 <details>
 <summary>Answer</summary>
@@ -19,7 +19,7 @@ MongoClient(MONGODB_URI)
 </div>
 </details>
 
-**CODE_BLOCK_10**
+**CODE_BLOCK_12**
 
 <details>
 <summary>Answer</summary>
@@ -30,7 +30,7 @@ mongo_client[DB_NAME][COLLECTION_NAME]
 </div>
 </details>
 
-**CODE_BLOCK_11**
+**CODE_BLOCK_13**
 
 <details>
 <summary>Answer</summary>
@@ -41,7 +41,7 @@ collection.delete_many({})
 </div>
 </details>
 
-**CODE_BLOCK_12**
+**CODE_BLOCK_14**
 
 <details>
 <summary>Answer</summary>

docs/60-perform-semantic-search/3-vector-search.mdx

Lines changed: 4 additions & 4 deletions
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 8:
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_13**
+**CODE_BLOCK_15**
 
 <details>
 <summary>Answer</summary>
@@ -17,7 +17,7 @@ get_embedding(user_query)
 </div>
 </details>
 
-**CODE_BLOCK_14**
+**CODE_BLOCK_16**
 
 <details>
 <summary>Answer</summary>
@@ -36,7 +36,7 @@ get_embedding(user_query)
 {
 "$project": {
 "_id": 0,
-"page_content": 1,
+"body": 1,
 "score": {"$meta": "vectorSearchScore"},
 }
 },
@@ -45,7 +45,7 @@ get_embedding(user_query)
 </div>
 </details>
 
-**CODE_BLOCK_15**
+**CODE_BLOCK_17**
 
 <details>
 <summary>Answer</summary>
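The vector search answer now projects `body` instead of `page_content`, matching the field that holds the chunk text. A minimal sketch of assembling the same two-stage pipeline in plain Python follows; the index name `vector_index` is a hypothetical placeholder, while `path`, `numCandidates`, and `limit` mirror the values visible in the pre-filtering answers. Atlas Vector Search rejects a `numCandidates` smaller than `limit`, so the sketch checks that up front.

```python
from typing import Any


def build_vector_search_pipeline(
    query_vector: list[float],
    index_name: str = "vector_index",  # hypothetical index name
    num_candidates: int = 150,
    limit: int = 5,
) -> list[dict[str, Any]]:
    """Build a $vectorSearch + $project pipeline that returns chunk text and score."""
    if num_candidates < limit:
        raise ValueError("numCandidates must be >= limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "queryVector": query_vector,
                "path": "embedding",  # field written by the embedding step
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "body": 1,  # chunk text field after this commit (was "page_content")
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = build_vector_search_pipeline([0.12, -0.03, 0.88])
```

With PyMongo, the resulting list would be passed to `collection.aggregate(pipeline)`, with `query_vector` coming from `get_embedding(user_query)`.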

docs/60-perform-semantic-search/4-pre-filtering.mdx

Lines changed: 12 additions & 12 deletions
@@ -10,7 +10,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_16**
+**CODE_BLOCK_18**
 
 <details>
 <summary>Answer</summary>
@@ -25,7 +25,7 @@ The answers for code blocks in this section are as follows:
 "type": "vector"
 },
 {
-"path": "metadata.language"
+"path": "metadata.contentType",
 "type": "filter"
 }
 ]
@@ -34,7 +34,7 @@ The answers for code blocks in this section are as follows:
 </div>
 </details>
 
-**CODE_BLOCK_17**
+**CODE_BLOCK_19**
 
 <details>
 <summary>Answer</summary>
@@ -48,13 +48,13 @@ The answers for code blocks in this section are as follows:
 "path": "embedding",
 "numCandidates": 150,
 "limit": 5,
-"filter": {"metadata.language": "en"}
+"filter": {"metadata.contentType": "Video"}
 }
 },
 {
 "$project": {
 "_id": 0,
-"page_content": 1,
+"body": 1,
 "score": {"$meta": "vectorSearchScore"}
 }
 }
@@ -63,7 +63,7 @@ The answers for code blocks in this section are as follows:
 </div>
 </details>
 
-**CODE_BLOCK_18**
+**CODE_BLOCK_20**
 
 <details>
 <summary>Answer</summary>
@@ -78,11 +78,11 @@ The answers for code blocks in this section are as follows:
 "type": "vector"
 },
 {
-"path": "metadata.language"
+"path": "metadata.contentType",
 "type": "filter"
 },
 {
-"path": "type"
+"path": "updated",
 "type": "filter"
 }
 ]
@@ -91,7 +91,7 @@ The answers for code blocks in this section are as follows:
 </div>
 </details>
 
-**CODE_BLOCK_19**
+**CODE_BLOCK_21**
 
 <details>
 <summary>Answer</summary>
@@ -107,16 +107,16 @@ The answers for code blocks in this section are as follows:
 "limit": 5,
 "filter": {
 "$and": [
-{"metadata.language": "en"},
-{"type": "Document"}
+{"metadata.contentType": "Video"},
+{"updated": {"$gte": "2024-05-20"}}
 ]
 }
 }
 },
 {
 "$project": {
 "_id": 0,
-"page_content": 1,
+"body": 1,
 "score": {"$meta": "vectorSearchScore"}
 }
 }
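Pre-filtering only works on fields that the vector index declares with `"type": "filter"`, which is why this commit changes the index definitions and the `filter` clauses together (`metadata.language` and `type` become `metadata.contentType` and `updated`). The sketch below is a hypothetical helper, not part of the lab: it collects the field paths referenced by a filter and reports any that the index definition does not declare. Only the `fields` entries visible in the diff are modelled; the vector field's remaining options are omitted.

```python
from typing import Any

# Index definition with the filter fields from the updated answer;
# the vector field's remaining options are omitted in this sketch.
index_definition = {
    "fields": [
        {"type": "vector", "path": "embedding"},
        {"type": "filter", "path": "metadata.contentType"},
        {"type": "filter", "path": "updated"},
    ]
}


def filter_paths(filter_expr: dict[str, Any]) -> set[str]:
    """Collect the field paths a $vectorSearch filter refers to.

    Recurses into $and/$or clause lists; other top-level operators are ignored here.
    """
    paths: set[str] = set()
    for key, value in filter_expr.items():
        if key in ("$and", "$or"):
            for clause in value:
                paths |= filter_paths(clause)
        elif not key.startswith("$"):
            paths.add(key)  # a field path such as "metadata.contentType"
    return paths


def unindexed_filter_paths(index_def: dict[str, Any], filter_expr: dict[str, Any]) -> set[str]:
    """Return filter paths that the index does not declare as "filter" fields."""
    declared = {f["path"] for f in index_def["fields"] if f["type"] == "filter"}
    return filter_paths(filter_expr) - declared


new_filter = {"$and": [{"metadata.contentType": "Video"}, {"updated": {"$gte": "2024-05-20"}}]}
old_filter = {"$and": [{"metadata.language": "en"}, {"type": "Document"}]}
```

Checking the old filter against the new index definition flags both `metadata.language` and `type`, the mismatch the commit avoids by updating the index and query snippets together.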

docs/70-build-rag-app/2-build-rag-app.mdx

Lines changed: 4 additions & 4 deletions
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 9:
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_20**
+**CODE_BLOCK_22**
 
 <details>
 <summary>Answer</summary>
@@ -17,18 +17,18 @@ vector_search(user_query)
 </div>
 </details>
 
-**CODE_BLOCK_21**
+**CODE_BLOCK_23**
 
 <details>
 <summary>Answer</summary>
 <div>
 ```python
-"\n\n".join([d.get("page_content", "") for d in context])
+"\n\n".join([d.get("body", "") for d in context])
 ```
 </div>
 </details>
 
-**CODE_BLOCK_22**
+**CODE_BLOCK_24**
 
 <details>
 <summary>Answer</summary>
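The RAG context string is now built from each search result's `body` field; `d.get("body", "")` means a result without that field contributes an empty string instead of raising a `KeyError`. The sketch below shows how that join might feed a prompt. The helper names and the prompt wording are hypothetical, not taken from the lab.

```python
from typing import Any


def build_context(results: list[dict[str, Any]]) -> str:
    """Join the chunk text of each search result, separated by blank lines."""
    return "\n\n".join([d.get("body", "") for d in results])


def build_prompt(user_query: str, results: list[dict[str, Any]]) -> str:
    """Hypothetical prompt template combining retrieved context and the question."""
    context = build_context(results)
    return (
        "Answer the question based only on the following context. "
        "If the context is not sufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}"
    )


results = [
    {"body": "Chunk about vector search.", "score": 0.91},
    {"body": "Chunk about pre-filtering.", "score": 0.87},
]
prompt = build_prompt("How do I filter vector search results?", results)
```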

docs/70-build-rag-app/3-stream-responses.mdx

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_23**
+**CODE_BLOCK_25**
 
 <details>
 <summary>Answer</summary>
@@ -27,7 +27,7 @@ fw_client.chat.completions.create(
 </div>
 </details>
 
-**CODE_BLOCK_24**
+**CODE_BLOCK_26**
 
 <details>
 <summary>Answer</summary>
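The streaming answer calls `fw_client.chat.completions.create(`. Assuming `fw_client` follows the OpenAI-compatible chat completions interface with `stream=True`, each streamed chunk carries its new text at `chunk.choices[0].delta.content`, which can be empty or `None` on some chunks. The sketch below consumes such a stream; because it cannot call the API here, it feeds in hypothetical fake chunks built with `SimpleNamespace`.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def collect_stream(chunks: Iterable[Any], echo: bool = True) -> str:
    """Print text deltas as they arrive and return the full response text."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some providers send chunks without choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # skip chunks that carry no new text
            parts.append(delta)
            if echo:
                print(delta, end="", flush=True)
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Hypothetical stand-in for one streamed chunk object."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


fake_stream = [fake_chunk("Vector "), fake_chunk("search"), fake_chunk(None)]
```

Against the real client, the iterable returned by `create(..., stream=True)` would take the place of `fake_stream`.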
