sakunaharinda · rampal-punia · Jul 12, 2024
diff --git a/README.md b/README.md
@@ -1,13 +1,40 @@
-# Learn RAG with Langchain 🦜⛓️‍💥
+# Master RAG with LangChain 🦜⛓️‍💥
 
-**Welcome to your ultimate guide for mastering Retrieval-Augmented Generation (RAG) with LangChain!**
+## Your Comprehensive Guide to Retrieval-Augmented Generation!
 
-In today's rapidly evolving landscape of artificial intelligence, the ability to generate highly accurate and contextually relevant information is paramount. Retrieval-Augmented Generation (RAG) is a cutting-edge technique that enhances the capabilities of generative models by integrating external knowledge sources. This not only improves the quality of the generated content but also ensures that it is grounded in reliable data. 
+Discover how to harness the power of Retrieval-Augmented Generation (RAG) using LangChain in this comprehensive tutorial series.
 
-This tutorial series is dedicated to providing you with a comprehensive, step-by-step guide to implementing RAG using LangChain, a powerful framework designed for building and deploying robust language model applications. We begin with an introduction to the basic RAG pipeline, providing a foundation for understanding how retrieval-based systems and generative models can be combined to produce accurate and contextually relevant responses. As we progress, we'll delve into the nuances of query transformation, a crucial step that refines user queries to ensure the language model comprehends and processes them accurately. This is followed by an exploration of hypothetical document embeddings, a technique used to generate vector representations of potential documents, which aids in assessing their relevance before retrieval.
+### Why RAG Matters
 
-Further enhancing the RAG pipeline, we'll discuss routing mechanisms that intelligently select the most appropriate data sources for answering queries. This dynamic selection ensures that the information retrieved is both relevant and comes from the best possible source. Additionally, we'll cover the construction of executable queries, effective indexing strategies, and various retrieval techniques such as self RAG, adaptive RAG, and CRAG (Conditional Retrieval-Augmented Generation), each offering unique advantages for different use cases. The final step in the pipeline is the generation phase, where the language model synthesizes the retrieved information to produce coherent and accurate responses.
+In the rapidly evolving AI landscape, RAG stands out as a cutting-edge technique that:
 
-Our tutorial culminates in a practical application: building a hospital management system. By integrating all the concepts learned throughout the blog, you'll see how to apply the RAG pipeline in a real-world scenario, showcasing its power and flexibility. Whether you're new to RAG or looking to refine your skills, this guide provides valuable insights and practical knowledge to help you succeed. Let’s embark on this exciting journey into the world of Retrieval-Augmented Generation with LangChain!
+- Enhances generative models with external knowledge sources
+- Improves content quality and accuracy
+- Grounds responses in reliable data 
 
-The organization and the content of this series is primarily based on [Langchain Tutoral Series](https://www.youtube.com/watch?v=wd7TZ4w1mSw&list=PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x) with some interesting improvements.
+### What You'll Learn
+
+This step-by-step guide covers:
+
+1. **RAG Pipeline Basics**: Understand the foundation of combining retrieval systems with generative models.
+2. **Query Transformation**: Refine user queries for optimal language model processing.
+3. **Hypothetical Document Embeddings**: Generate a hypothetical answer document and embed it to retrieve relevant real documents.
+4. **Intelligent Routing**: Select the most appropriate data sources for each query.
+5. **Advanced Techniques**:
+    - Executable query construction
+    - Effective indexing strategies
+    - Retrieval techniques: `Self RAG`, `Adaptive RAG`, and `CRAG` (Corrective Retrieval-Augmented Generation)
+
+
+### Practical Application
+Apply your knowledge by building a hospital management system, demonstrating RAG's real-world potential.
+
+### Who Is This For?
+
+- Beginners looking to understand RAG
+- Experienced practitioners aiming to refine their skills
+
+### Get Started
+Embark on your journey to master Retrieval-Augmented Generation with LangChain. Each tutorial builds on the previous one, moving from a solid foundation to advanced techniques.
+
+**This series is inspired by the [LangChain Tutorial Series](https://www.youtube.com/watch?v=wd7TZ4w1mSw&list=PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x), enhanced with additional insights and improvements.**
diff --git a/book/10_Final.ipynb b/book/10_Final.ipynb
@@ -6,11 +6,11 @@
    "source": [
     "# Putting it all together with Neo4J\n",
     "\n",
-    "In this section we put everything we learned in previous sections into practice by creating an LLM agent that will answer user questions about a hospital. To do that we use two datasources: a Neo4J vector indices that contains documents on user reviews of different hospitals and a Neo4J graph database containing the information about those hospitals, visits, doctors, payments, etc. Our RAG pipeline will be correctly re-direct the user query to each datasource and will answer the question in the end. Let's begin!\n",
+    "In this section, we put everything we learned in previous sections into practice by creating an LLM agent that answers user questions about a hospital. To do that, we use two data sources: a Neo4J vector index that contains documents on user reviews of different hospitals, and a Neo4J graph database containing information about those hospitals, visits, doctors, payments, etc. Our RAG pipeline will route each user query to the appropriate data source and then answer the question. Let's begin!\n",
     "\n",
     "| ![arag](resources/final.png) | \n",
     "|:--:| \n",
-    "| *RAG based hospital informtion system* |"
+    "| *RAG-based hospital information system* |"
    ]
   },
   {
@@ -59,13 +59,13 @@
     "\n",
     "- `hospitals.csv`: Contains the names of the hospitals, the state in which each hospital is located, and a unique id.\n",
     "- `physicians.csv`: Contains information about physicians including their names, date of birth, graduation year, medical school, and salary.\n",
-    "- `payers.csv`: Contains names and unique ids of five different insuarance companies that paid bills of patients.\n",
+    "- `payers.csv`: Contains names and unique ids of five different insurance companies that paid bills of patients.\n",
     "- `patients.csv`: Contains information about patients, including their sex, date of birth, and blood type, each identified by a unique id.\n",
-    "- `visits.csv`: This file connects all the mentioned files with infomation about each patient's visits, date of admission, billing amount, room number, admission type, discharge date, test results, visit id, physician id,payer id, hospital id, chief complaint, treatment description, primary diagnosis, and visit status.\n",
-    "- `reviews.csv`: Contains user reviews posted by patients in their respective vists treated by a physician in a hospital. \n",
+    "- `visits.csv`: This file connects all the mentioned files with information about each patient's visits, date of admission, billing amount, room number, admission type, discharge date, test results, visit id, physician id, payer id, hospital id, chief complaint, treatment description, primary diagnosis, and visit status.\n",
+    "- `reviews.csv`: Contains reviews posted by patients about their visits, each handled by a physician at a hospital. \n",
     "\n",
     "\n",
-    "Next we use the above CSV files and relationships between them are used to create a Neo4J graph in [Neo4J AuraDB](https://neo4j.com/cloud/platform/aura-graph-database/?utm_source=Google&utm_medium=PaidSearch&utm_campaign=Evergreen&utm_content=APAC-Search-SEMBrand-Evergreen-None-SEM-SEM-NonABM&utm_term=auradb&utm_adgroup=auradb&gad_source=1&gclid=Cj0KCQjw6auyBhDzARIsALIo6v-fHIGNfhxYHr6ZxUpuoE-wSFEfJHw93acnry6XSQ5JTZKMlJ84ojQaAthHEALw_wcB). You can create an account for free and create a DB instance hosted in GCP. You have to download the login information need for authentication through neo4j python library and langchain. The login information should contain: \n",
+    "Next, we use the above CSV files and the relationships between them to create a Neo4J graph in [Neo4J AuraDB](https://neo4j.com/cloud/platform/aura-graph-database/?utm_source=Google&utm_medium=PaidSearch&utm_campaign=Evergreen&utm_content=APAC-Search-SEMBrand-Evergreen-None-SEM-SEM-NonABM&utm_term=auradb&utm_adgroup=auradb&gad_source=1&gclid=Cj0KCQjw6auyBhDzARIsALIo6v-fHIGNfhxYHr6ZxUpuoE-wSFEfJHw93acnry6XSQ5JTZKMlJ84ojQaAthHEALw_wcB). You can create an account for free and create a DB instance hosted on GCP. You then need to download the login information required for authentication through the neo4j Python library and LangChain. The login information should contain: \n",
     "\n",
     "- NEO4J_URI\n",
     "- NEO4J_USERNAME\n",
@@ -102,7 +102,7 @@
    "source": [
     "`````{admonition} See also\n",
     ":class: tip\n",
-    "For more information about the Cypher Query Language refer the [documentation](https://neo4j.com/product/cypher-graph-query-language/?utm_source=Google&utm_medium=PaidSearch&utm_campaign=Evergreen&utm_content=AMS-Search-SEMBrand-Evergreen-None-SEM-SEM-NonABM&utm_term=cypher%20query%20language&utm_adgroup=cypher-language&gad_source=1&gclid=Cj0KCQjw6auyBhDzARIsALIo6v9vwpW2dCruoed6H21Hsv12uccW6jF9oAgfqPKAgzeN27_8Xnm6ecEaAk9LEALw_wcB). Python SDK documentation can be found [here](https://neo4j.com/docs/python-manual/current/).\n",
+    "For more information about the Cypher Query Language refer to the [documentation](https://neo4j.com/product/cypher-graph-query-language/?utm_source=Google&utm_medium=PaidSearch&utm_campaign=Evergreen&utm_content=AMS-Search-SEMBrand-Evergreen-None-SEM-SEM-NonABM&utm_term=cypher%20query%20language&utm_adgroup=cypher-language&gad_source=1&gclid=Cj0KCQjw6auyBhDzARIsALIo6v9vwpW2dCruoed6H21Hsv12uccW6jF9oAgfqPKAgzeN27_8Xnm6ecEaAk9LEALw_wcB). Python SDK documentation can be found [here](https://neo4j.com/docs/python-manual/current/).\n",
     "`````"
    ]
   },
@@ -339,7 +339,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "After creating our graph, you will be able to see all the nodes and relationships in the AuraDB dashboard. Also you can execute Cypher queries against your graph and get results. \n",
+    "After creating our graph, you will be able to see all the nodes and relationships in the AuraDB dashboard. Also, you can execute Cypher queries against your graph and get results. \n",
     "\n",
     "For instance, if you want to know the total number of visits in all hospitals in Texas that were paid by the company \"Cigna\" and the total billing amount, you can execute:\n",
     "\n",
@@ -358,7 +358,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "After we are satisfied with the graph, we can then create a `Neo4jVector` index through langchain to embedd the user reviews (which is a node in the graph) with properties `review`, `physician_name`, `hospital_name`, `patient_name`. Vector search indices were released as a public beta in Neo4j 5.11. They allow you to run semantic queries directly on your graph. This is really convenient for your chatbot because you can store review embeddings in the same place as your structured hospital system data. Here we have to provide the  `node_label` that we are going to embed, a name to the index, and the `embedding` in addition to the information required for authentication (`username`, `password`, and `url`). Finally we can create the retriever as we have done many times earlier."
+    "After we are satisfied with the graph, we can create a `Neo4jVector` index through LangChain to embed the user reviews (which are nodes in the graph) with the properties `review`, `physician_name`, `hospital_name`, and `patient_name`. Vector search indices were released as a public beta in Neo4j 5.11. They allow you to run semantic queries directly on your graph. This is really convenient for your chatbot because you can store review embeddings in the same place as your structured hospital system data. Here we have to provide the `node_label` that we are going to embed, a name for the index, and the `embedding`, in addition to the information required for authentication (`username`, `password`, and `url`). Finally, we can create the retriever as we have done many times earlier."
    ]
   },
   {
@@ -400,7 +400,7 @@
     {
      "data": {
       "text/plain": [
-       "'Patients have expressed mixed feelings about hospital efficiency. They appreciated the professionalism of the doctors and the caring nature of the nursing staff, but were disappointed by the lack of communication about treatment plans, confusing administrative processes, and constant interruptions during the night. Some also mentioned the lack of vegetarian options in the cafeteria.'"
+       "'Patients have expressed mixed feelings about hospital efficiency. They appreciated the professionalism of the doctors and the caring nature of the nursing staff but were disappointed by the lack of communication about treatment plans, confusing administrative processes, and constant interruptions during the night. Some also mentioned the lack of vegetarian options in the cafeteria.'"
       ]
      },
      "execution_count": 14,
@@ -413,9 +413,9 @@
     "\n",
     "\n",
     "# review_template = \"\"\"\n",
-    "# You are an assistant for answering questions based on the user reviews about an hospital.\n",
+    "# You are an assistant for answering questions based on the user reviews about a hospital.\n",
     "#     Use the following pieces of retrieved context to answer the question. \n",
-    "#     Be detailed as possible. \n",
+    "#     Be as detailed as possible. \n",
     "#     If you don't know the answer, just say that you don't know.\n",
     "#     \\nQuestion: {question} \\nContext: {context} \\nAnswer:\"\n",
     "# \"\"\"\n",
@@ -634,7 +634,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "After defining the cypher generation prompt we create a `graph_query_chain` that translates the user question into a Cypher query, executes it against the database, and returns the answer. You can set the `verbose` as `True`, if you want to see the intermediate steps like the generated query."
+    "After defining the Cypher generation prompt, we create a `graph_query_chain` that translates the user question into a Cypher query, executes it against the database, and returns the answer. Set `verbose` to `True` if you want to see intermediate steps such as the generated query."
    ]
   },
   {
@@ -665,7 +665,7 @@
     "    \n",
     ")\n",
     "\n",
-    "question = \"Who are the physicians who treated patients that have O- blood type?\"\n",
+    "question = \"Who are the physicians who treated patients having O- blood type?\"\n",
     "\n",
     "result = graph_query_chain.invoke({'query': question})\n",
     "intermediate_results = result['intermediate_steps']\n",
@@ -677,7 +677,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Chain to answer when fallback happens."
+    "### Chain to answer when fallback happens"
    ]
   },
   {
@@ -688,7 +688,7 @@
    "source": [
     "fallback_prompt = ChatPromptTemplate.from_template(\n",
     "    \"\"\"\n",
-    "    You are an assistant for question-answering tasks. Answer the question based upon your knowledge. Use three sentences maximum and keep the answer concise.\\n\\n\n",
+    "    You are an assistant for question-answering tasks. Answer the question based on your knowledge. Use three sentences maximum and keep the answer concise.\\n\\n\n",
     "    Question: {question}\n",
     "    \"\"\"\n",
     ")\n",
@@ -700,7 +700,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Query router that does the query analysis and re-direct the user question either to the Neo4J vector index, Neo4J graph database, or to the fallback."
+    "A query router that performs query analysis and redirects the user's question either to the Neo4J vector index, the Neo4J graph database, or to the fallback."
    ]
   },
   {
@@ -732,7 +732,7 @@
     "\n",
     "query_router_prompt = ChatPromptTemplate.from_template(\n",
     "    \"\"\"You are an expert at routing a user question to a vectorstore or to a graph database containing information from a hospital system. The vectorstore contains documents related to the user reviews of a hospital.\n",
-    "Use the vectorstore for questions that can be answered using peoples' opinions on the hospital. Otherwise, use graph to answer questions using the graph database containing information from a company database that manages several hospitals. If the question can be answered using LLM's internal knowledge, use fallback.\\n\\n\n",
+    "Use the vectorstore for questions that can be answered using people's opinions on the hospital. Otherwise, use graph to answer questions using the graph database containing information from a company database that manages several hospitals. If the question can be answered using the LLM's internal knowledge, use fallback.\\n\\n\n",
     "Question: {question}\"\"\"\n",
     ")\n",
     "\n",
@@ -750,7 +750,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Hallucination evaluator checks whether or not the generated answer is grounded by the facts."
+    "### Hallucination evaluator that checks whether the generated answer is grounded in the facts"
    ]
   },
   {
@@ -806,7 +806,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Graph state definition."
+    "### Graph state definition"
    ]
   },
   {
@@ -838,7 +838,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Creating methods for the nodes."
+    "### Creating methods for the nodes"
    ]
   },
   {
@@ -933,7 +933,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Creating conditional edges."
+    "### Creating conditional edges"
    ]
   },
   {
@@ -998,7 +998,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Creating the grpah with defined nodes and edges."
+    "### Creating the graph with defined nodes and edges"
    ]
   },
   {
@@ -1093,7 +1093,7 @@
       "> 🧠 Generating an answer ...\n",
       "> ✅ \u001b[92mAnswer addresses the question\u001b[0m\n",
       "\n",
-      "Question: What is total number visits in all hospitals in Texas that were paid by the company 'Cigna'?\n",
+      "Question: What is the total number of visits in all hospitals in Texas that were paid by the company 'Cigna'?\n",
       "Answer: The total number of visits in all hospitals in Texas that were paid by the company 'Cigna' is 204.\n"
      ]
     }