From 9020a934be35b0798c972eb77a22fb62ce654ca5 Mon Sep 17 00:00:00 2001
From: Farzad Sunavala <40604067+farzad528@users.noreply.github.com>
Date: Fri, 24 Jan 2025 06:56:26 -0600
Subject: [PATCH] docs: add Azure RAG example (#675)

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
Co-authored-by: Farzad Sunavala
---
 docs/examples/rag_azuresearch.ipynb | 893 ++++++++++++++++++++++++++++
 mkdocs.yml                          |  10 +-
 2 files changed, 899 insertions(+), 4 deletions(-)
 create mode 100644 docs/examples/rag_azuresearch.ipynb

diff --git a/docs/examples/rag_azuresearch.ipynb b/docs/examples/rag_azuresearch.ipynb
new file mode 100644
index 00000000..abbe3774
--- /dev/null
+++ b/docs/examples/rag_azuresearch.ipynb
@@ -0,0 +1,893 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "Ag9kcX2B_atc"
+   },
+   "source": [
+    "\"Open"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# RAG with Azure AI Search"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "| Step | Tech | Execution |\n",
+    "| ------------------ | ------------------ | --------- |\n",
+    "| Embedding | Azure OpenAI | 🌐 Remote |\n",
+    "| Vector Store | Azure AI Search | 🌐 Remote |\n",
+    "| Gen AI | Azure OpenAI | 🌐 Remote |"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) system using:\n",
+    "- [Docling](https://ds4sd.github.io/docling/) for document parsing and chunking\n",
+    "- [Azure AI Search](https://azure.microsoft.com/products/ai-services/ai-search/) for vector indexing and retrieval\n",
+    "- [Azure OpenAI](https://azure.microsoft.com/products/ai-services/openai-service) for embeddings and chat completion\n",
+    "\n",
+    "Specifically, it shows how to:\n",
+    "1. Parse a PDF with Docling.\n",
+    "2. Chunk the parsed text.\n",
+    "3. Use Azure OpenAI for embeddings.\n",
+    "4. Index and search in Azure AI Search.\n",
+    "5. Run a retrieval-augmented generation (RAG) query with Azure OpenAI GPT-4o.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# If running in a fresh environment (like Google Colab), run this cell to install the required packages:\n",
+    "%pip install \"docling~=2.12\" azure-search-documents==11.5.2 azure-identity openai rich torch python-dotenv"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Part 0: Prerequisites\n",
+    "- An **Azure AI Search** resource\n",
+    "- An **Azure OpenAI** resource with a deployed embedding and chat completion model (e.g. `text-embedding-3-small` and `gpt-4o`)\n",
+    "- **Docling 2.12+** (installs `docling_core` automatically; requires a Python 3.8+ environment)\n",
+    "- A **GPU-enabled environment** is preferred for faster parsing; Docling 2.12 automatically detects a GPU if present.\n",
+    "  - If you only have a CPU, parsing large PDFs can be slower.\n",
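+    "\n",
+    "The next cell reads its configuration from environment variables (or Colab secrets). For reference, a local `.env` file for this notebook could look like the sketch below; every value is an illustrative placeholder, not a real endpoint or key:\n",
+    "\n",
+    "```\n",
+    "AZURE_SEARCH_ENDPOINT=https://<your-search-service>.search.windows.net\n",
+    "AZURE_SEARCH_KEY=<your-search-admin-key>\n",
+    "AZURE_SEARCH_INDEX_NAME=docling-rag-sample\n",
+    "AZURE_OPENAI_ENDPOINT=https://<your-openai-resource>.openai.azure.com\n",
+    "AZURE_OPENAI_API_KEY=<your-azure-openai-key>\n",
+    "AZURE_OPENAI_API_VERSION=2024-10-21\n",
+    "AZURE_OPENAI_CHAT_MODEL=gpt-4o\n",
+    "AZURE_OPENAI_EMBEDDINGS=text-embedding-3-small\n",
+    "```"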
" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "from dotenv import load_dotenv\n", + "\n", + "load_dotenv()\n", + "\n", + "\n", + "def _get_env(key, default=None):\n", + " try:\n", + " from google.colab import userdata\n", + "\n", + " try:\n", + " return userdata.get(key)\n", + " except userdata.SecretNotFoundError:\n", + " pass\n", + " except ImportError:\n", + " pass\n", + " return os.getenv(key, default)\n", + "\n", + "\n", + "AZURE_SEARCH_ENDPOINT = _get_env(\"AZURE_SEARCH_ENDPOINT\")\n", + "AZURE_SEARCH_KEY = _get_env(\"AZURE_SEARCH_KEY\") # Ensure this your Admin Key\n", + "AZURE_SEARCH_INDEX_NAME = _get_env(\"AZURE_SEARCH_INDEX_NAME\", \"docling-rag-sample\")\n", + "AZURE_OPENAI_ENDPOINT = _get_env(\"AZURE_OPENAI_ENDPOINT\")\n", + "AZURE_OPENAI_API_KEY = _get_env(\"AZURE_OPENAI_API_KEY\")\n", + "AZURE_OPENAI_API_VERSION = _get_env(\"AZURE_OPENAI_API_VERSION\", \"2024-10-21\")\n", + "AZURE_OPENAI_CHAT_MODEL = _get_env(\n", + " \"AZURE_OPENAI_CHAT_MODEL\"\n", + ") # Using a deployed model named \"gpt-4o\"\n", + "AZURE_OPENAI_EMBEDDINGS = _get_env(\n", + " \"AZURE_OPENAI_EMBEDDINGS\", \"text-embedding-3-small\"\n", + ") # Using a deployed model named \"text-embeddings-3-small\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 1: Parse the PDF with Docling\n", + "\n", + "Weโ€™ll parse the **Microsoft GraphRAG Research Paper** (~15 pages). Parsing should be relatively quick, even on CPU, but it will be faster on a GPU or MPS device if available.\n", + "\n", + "*(If you prefer a different document, simply provide a different URL or local file path.)*" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
Parsing a ~15-page PDF. The process should be relatively quick, even on CPU...\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[1;33mParsing a ~\u001b[0m\u001b[1;33m15\u001b[0m\u001b[1;33m-page PDF. The process should be relatively quick, even on CPU\u001b[0m\u001b[1;33m...\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
╭─────────────────────────────────────────── Docling Markdown Preview ────────────────────────────────────────────╮\n",
+       "│ ## From Local to Global: A Graph RAG Approach to Query-Focused Summarization                                    │\n",
+       "│                                                                                                                 │\n",
+       "│ Darren Edge 1†                                                                                                  │\n",
+       "│                                                                                                                 │\n",
+       "│ Ha Trinh 1†                                                                                                     │\n",
+       "│                                                                                                                 │\n",
+       "│ Newman Cheng 2                                                                                                  │\n",
+       "│                                                                                                                 │\n",
+       "│ Joshua Bradley 2                                                                                                │\n",
+       "│                                                                                                                 │\n",
+       "│ Alex Chao 3                                                                                                     │\n",
+       "│                                                                                                                 │\n",
+       "│ Apurva Mody 3                                                                                                   │\n",
+       "│                                                                                                                 │\n",
+       "│ Steven Truitt 2                                                                                                 │\n",
+       "│                                                                                                                 │\n",
+       "│ ## Jonathan Larson 1                                                                                            │\n",
+       "│                                                                                                                 │\n",
+       "│ 1 Microsoft Research 2 Microsoft Strategic Missions and Technologies 3 Microsoft Office of the CTO              │\n",
+       "│                                                                                                                 │\n",
+       "│ { daedge,trinhha,newmancheng,joshbradley,achao,moapurva,steventruitt,jolarso } @microsoft.com                   │\n",
+       "│                                                                                                                 │\n",
+       "│ † These authors contributed equally to this work                                                                │\n",
+       "│                                                                                                                 │\n",
+       "│ ## Abstract                                                                                                     │\n",
+       "│                                                                                                                 │\n",
+       "│ The use of retrieval-augmented gen...                                                                           │\n",
+       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
+       "
\n" + ], + "text/plain": [ + "โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Docling Markdown Preview โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ\n", + "โ”‚ ## From Local to Global: A Graph RAG Approach to Query-Focused Summarization โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ Darren Edge 1โ€  โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ Ha Trinh 1โ€  โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ Newman Cheng 2 โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ Joshua Bradley 2 โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ Alex Chao 3 โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ Apurva Mody 3 โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ Steven Truitt 2 โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ ## Jonathan Larson 1 โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ 1 Microsoft Research 2 Microsoft Strategic Missions and Technologies 3 Microsoft Office of the CTO โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ { daedge,trinhha,newmancheng,joshbradley,achao,moapurva,steventruitt,jolarso } @microsoft.com โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ โ€  These authors contributed equally to this work โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ ## Abstract โ”‚\n", + "โ”‚ โ”‚\n", + "โ”‚ The use of retrieval-augmented gen... โ”‚\n", + "โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from rich.console import Console\n", + "from rich.panel import Panel\n", + "\n", + "from docling.document_converter import DocumentConverter\n", + "\n", + "console = Console()\n", + "\n", + "# This URL points to the Microsoft GraphRAG Research Paper (arXiv: 2404.16130), ~15 pages\n", + "source_url = \"https://arxiv.org/pdf/2404.16130\"\n", + "\n", + "console.print(\n", + " \"[bold yellow]Parsing a ~15-page PDF. The process should be relatively quick, even on CPU...[/bold yellow]\"\n", + ")\n", + "converter = DocumentConverter()\n", + "result = converter.convert(source_url)\n", + "\n", + "# Optional: preview the parsed Markdown\n", + "md_preview = result.document.export_to_markdown()\n", + "console.print(Panel(md_preview[:500] + \"...\", title=\"Docling Markdown Preview\"))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Part 3: Hierarchical Chunking\n", + "We convert the `Document` into smaller chunks for embedding & indexing. The built-in `HierarchicalChunker` preserves structure. " + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
Total chunks from PDF: 106\n",
+       "
\n" + ], + "text/plain": [ + "Total chunks from PDF: \u001b[1;36m106\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from docling.chunking import HierarchicalChunker\n", + "\n", + "chunker = HierarchicalChunker()\n", + "doc_chunks = list(chunker.chunk(result.document))\n", + "\n", + "all_chunks = []\n", + "for idx, c in enumerate(doc_chunks):\n", + " chunk_text = c.text\n", + " all_chunks.append((f\"chunk_{idx}\", chunk_text))\n", + "\n", + "console.print(f\"Total chunks from PDF: {len(all_chunks)}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Part 3: Create Azure Search Index and Push Chunk Embeddings\n", + "Weโ€™ll define a vector index in Azure AI Search, then embed each chunk using Azure OpenAI and upload in batches." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
Index 'docling-rag-sample-2' created.\n",
+       "
\n" + ], + "text/plain": [ + "Index \u001b[32m'docling-rag-sample-2'\u001b[0m created.\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from azure.core.credentials import AzureKeyCredential\n", + "from azure.search.documents.indexes import SearchIndexClient\n", + "from azure.search.documents.indexes.models import (\n", + " AzureOpenAIVectorizer,\n", + " AzureOpenAIVectorizerParameters,\n", + " HnswAlgorithmConfiguration,\n", + " SearchableField,\n", + " SearchField,\n", + " SearchFieldDataType,\n", + " SearchIndex,\n", + " SimpleField,\n", + " VectorSearch,\n", + " VectorSearchProfile,\n", + ")\n", + "from rich.console import Console\n", + "\n", + "console = Console()\n", + "\n", + "VECTOR_DIM = 1536 # Adjust based on your chosen embeddings model\n", + "\n", + "index_client = SearchIndexClient(\n", + " AZURE_SEARCH_ENDPOINT, AzureKeyCredential(AZURE_SEARCH_KEY)\n", + ")\n", + "\n", + "\n", + "def create_search_index(index_name: str):\n", + " # Define fields\n", + " fields = [\n", + " SimpleField(name=\"chunk_id\", type=SearchFieldDataType.String, key=True),\n", + " SearchableField(name=\"content\", type=SearchFieldDataType.String),\n", + " SearchField(\n", + " name=\"content_vector\",\n", + " type=SearchFieldDataType.Collection(SearchFieldDataType.Single),\n", + " searchable=True,\n", + " filterable=False,\n", + " sortable=False,\n", + " facetable=False,\n", + " vector_search_dimensions=VECTOR_DIM,\n", + " vector_search_profile_name=\"default\",\n", + " ),\n", + " ]\n", + " # Vector search config with an AzureOpenAIVectorizer\n", + " vector_search = VectorSearch(\n", + " algorithms=[HnswAlgorithmConfiguration(name=\"default\")],\n", + " profiles=[\n", + " VectorSearchProfile(\n", + " name=\"default\",\n", + " algorithm_configuration_name=\"default\",\n", + " vectorizer_name=\"default\",\n", + " )\n", + " ],\n", + " vectorizers=[\n", + " AzureOpenAIVectorizer(\n", + " vectorizer_name=\"default\",\n", + " parameters=AzureOpenAIVectorizerParameters(\n", + " resource_url=AZURE_OPENAI_ENDPOINT,\n", + " deployment_name=AZURE_OPENAI_EMBEDDINGS,\n", + " model_name=\"text-embedding-3-small\",\n", + " api_key=AZURE_OPENAI_API_KEY,\n", + " ),\n", + " )\n", + " ],\n", + " )\n", + "\n", + " # Create or update the index\n", + " new_index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search)\n", + " try:\n", + " index_client.delete_index(index_name)\n", + " except:\n", + " pass\n", + "\n", + " index_client.create_or_update_index(new_index)\n", + " console.print(f\"Index '{index_name}' created.\")\n", + "\n", + "\n", + "create_search_index(AZURE_SEARCH_INDEX_NAME)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Embed and Upsert to Azure AI Search\n" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
Uploaded batch 0 -> 50; all_succeeded: True, first_doc_status_code: 201\n",
+       "
\n" + ], + "text/plain": [ + "Uploaded batch \u001b[1;36m0\u001b[0m -> \u001b[1;36m50\u001b[0m; all_succeeded: \u001b[3;92mTrue\u001b[0m, first_doc_status_code: \u001b[1;36m201\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
Uploaded batch 50 -> 100; all_succeeded: True, first_doc_status_code: 201\n",
+       "
\n" + ], + "text/plain": [ + "Uploaded batch \u001b[1;36m50\u001b[0m -> \u001b[1;36m100\u001b[0m; all_succeeded: \u001b[3;92mTrue\u001b[0m, first_doc_status_code: \u001b[1;36m201\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
Uploaded batch 100 -> 106; all_succeeded: True, first_doc_status_code: 201\n",
+       "
\n" + ], + "text/plain": [ + "Uploaded batch \u001b[1;36m100\u001b[0m -> \u001b[1;36m106\u001b[0m; all_succeeded: \u001b[3;92mTrue\u001b[0m, first_doc_status_code: \u001b[1;36m201\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
All chunks uploaded to Azure Search.\n",
+       "
\n" + ], + "text/plain": [ + "All chunks uploaded to Azure Search.\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from azure.search.documents import SearchClient\n", + "from openai import AzureOpenAI\n", + "\n", + "search_client = SearchClient(\n", + " AZURE_SEARCH_ENDPOINT, AZURE_SEARCH_INDEX_NAME, AzureKeyCredential(AZURE_SEARCH_KEY)\n", + ")\n", + "openai_client = AzureOpenAI(\n", + " api_key=AZURE_OPENAI_API_KEY,\n", + " api_version=AZURE_OPENAI_API_VERSION,\n", + " azure_endpoint=AZURE_OPENAI_ENDPOINT,\n", + ")\n", + "\n", + "\n", + "def embed_text(text: str):\n", + " \"\"\"\n", + " Helper to generate embeddings with Azure OpenAI.\n", + " \"\"\"\n", + " response = openai_client.embeddings.create(\n", + " input=text, model=AZURE_OPENAI_EMBEDDINGS\n", + " )\n", + " return response.data[0].embedding\n", + "\n", + "\n", + "upload_docs = []\n", + "for chunk_id, chunk_text in all_chunks:\n", + " embedding_vector = embed_text(chunk_text)\n", + " upload_docs.append(\n", + " {\n", + " \"chunk_id\": chunk_id,\n", + " \"content\": chunk_text,\n", + " \"content_vector\": embedding_vector,\n", + " }\n", + " )\n", + "\n", + "\n", + "BATCH_SIZE = 50\n", + "for i in range(0, len(upload_docs), BATCH_SIZE):\n", + " subset = upload_docs[i : i + BATCH_SIZE]\n", + " resp = search_client.upload_documents(documents=subset)\n", + "\n", + " all_succeeded = all(r.succeeded for r in resp)\n", + " console.print(\n", + " f\"Uploaded batch {i} -> {i+len(subset)}; all_succeeded: {all_succeeded}, \"\n", + " f\"first_doc_status_code: {resp[0].status_code}\"\n", + " )\n", + "\n", + "console.print(\"All chunks uploaded to Azure Search.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Part 4: RAG Query with Azure OpenAI\n", + "Combine retrieval from Azure Search with Chat Completions (aka. grounding your LLM)" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
╭────────────────────────────────────────────────── RAG Prompt ───────────────────────────────────────────────────╮\n",
+       "│                                                                                                                 │\n",
+       "│ You are an AI assistant helping answering questions about Microsoft GraphRAG.                                   │\n",
+       "│ Use ONLY the text below to answer the user's question.                                                          │\n",
+       "│ If the answer isn't in the text, say you don't know.                                                            │\n",
+       "│                                                                                                                 │\n",
+       "│ Context:                                                                                                        │\n",
+       "│ Community summaries vs. source texts. When comparing community summaries to source texts using Graph RAG,      │\n",
+       "│ community summaries generally provided a small but consistent improvement in answer comprehensiveness and      │\n",
+       "│ diversity, except for root-level summaries. Intermediate-level summaries in the Podcast dataset and low-level  │\n",
+       "│ community summaries in the News dataset achieved comprehensiveness win rates of 57% and 64%, respectively.     │\n",
+       "│ Diversity win rates were 57% for Podcast intermediate-level summaries and 60% for News low-level community     │\n",
+       "│ summaries. Table 3 also illustrates the scalability advantages of Graph RAG compared to source text            │\n",
+       "│ summarization: for low-level community summaries ( C3 ), Graph RAG required 26-33% fewer context tokens, while │\n",
+       "│ for root-level community summaries ( C0 ), it required over 97% fewer tokens. For a modest drop in performance │\n",
+       "│ compared with other global methods, root-level Graph RAG offers a highly efficient method for the iterative    │\n",
+       "│ question answering that characterizes sensemaking activity, while retaining advantages in comprehensiveness    │\n",
+       "│ (72% win rate) and diversity (62% win rate) over naïve RAG.                                                     │\n",
+       "│ ---                                                                                                             │\n",
+       "│ We have presented a global approach to Graph RAG, combining knowledge graph generation, retrieval-augmented    │\n",
+       "│ generation (RAG), and query-focused summarization (QFS) to support human sensemaking over entire text corpora. │\n",
+       "│ Initial evaluations show substantial improvements over a naïve RAG baseline for both the comprehensiveness and │\n",
+       "│ diversity of answers, as well as favorable comparisons to a global but graph-free approach using map-reduce    │\n",
+       "│ source text summarization. For situations requiring many global queries over the same dataset, summaries of    │\n",
+       "│ root-level communities in the entity-based graph index provide a data index that is both superior to naïve RAG │\n",
+       "│ and achieves competitive performance to other global methods at a fraction of the token cost.                  │\n",
+       "│ ---                                                                                                             │\n",
+       "│ Trade-offs of building a graph index . We consistently observed Graph RAG achieve the best headto-head results │\n",
+       "│ against other methods, but in many cases the graph-free approach to global summarization of source texts       │\n",
+       "│ performed competitively. The real-world decision about whether to invest in building a graph index depends on  │\n",
+       "│ multiple factors, including the compute budget, expected number of lifetime queries per dataset, and value     │\n",
+       "│ obtained from other aspects of the graph index (including the generic community summaries and the use of other │\n",
+       "│ graph-related RAG approaches).                                                                                  │\n",
+       "│ ---                                                                                                             │\n",
+       "│ Future work . The graph index, rich text annotations, and hierarchical community structure supporting the      │\n",
+       "│ current Graph RAG approach offer many possibilities for refinement and adaptation. This includes RAG approaches │\n",
+       "│ that operate in a more local manner, via embedding-based matching of user queries and graph annotations, as    │\n",
+       "│ well as the possibility of hybrid RAG schemes that combine embedding-based matching against community reports  │\n",
+       "│ before employing our map-reduce summarization mechanisms. This 'roll-up' operation could also be extended      │\n",
+       "│ across more levels of the community hierarchy, as well as implemented as a more exploratory 'drill down'       │\n",
+       "│ mechanism that follows the information scent contained in higher-level community summaries.                    │\n",
+       "│ ---                                                                                                             │\n",
+       "│ Advanced RAG systems include pre-retrieval, retrieval, post-retrieval strategies designed to overcome the      │\n",
+       "│ drawbacks of Naïve RAG, while Modular RAG systems include patterns for iterative and dynamic cycles of         │\n",
+       "│ interleaved retrieval and generation (Gao et al., 2023). Our implementation of Graph RAG incorporates multiple │\n",
+       "│ concepts related to other systems. For example, our community summaries are a kind of self-memory (Selfmem,    │\n",
+       "│ Cheng et al., 2024) for generation-augmented retrieval (GAR, Mao et al., 2020) that facilitates future         │\n",
+       "│ generation cycles, while our parallel generation of community answers from these summaries is a kind of        │\n",
+       "│ iterative (Iter-RetGen, Shao et al., 2023) or federated (FeB4RAG, Wang et al., 2024) retrieval-generation      │\n",
+       "│ strategy. Other systems have also combined these concepts for multi-document summarization (CAiRE-COVID, Su et │\n",
+       "│ al., 2020) and multi-hop question answering (ITRG, Feng et al., 2023; IR-CoT, Trivedi et al., 2022; DSP,       │\n",
+       "│ Khattab et al., 2022). Our use of a hierarchical index and summarization also bears resemblance to further     │\n",
+       "│ approaches, such as generating a hierarchical index of text chunks by clustering the vectors of text embeddings │\n",
+       "│ (RAPTOR, Sarthi et al., 2024) or generating a 'tree of clarifications' to answer multiple interpretations of   │\n",
+       "│ ambiguous questions (Kim et al., 2023). However, none of these iterative or hierarchical approaches use the    │\n",
+       "│ kind of self-generated graph index that enables Graph RAG.                                                      │\n",
+       "│ ---                                                                                                             │\n",
+       "│ The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge    │\n",
+       "│ source enables large language models (LLMs) to answer questions over private and/or previously unseen document │\n",
+       "│ collections. However, RAG fails on global questions directed at an entire text corpus, such as 'What are the   │\n",
+       "│ main themes in the dataset?', since this is inherently a queryfocused summarization (QFS) task, rather than an │\n",
+       "│ explicit retrieval task. Prior QFS methods, meanwhile, fail to scale to the quantities of text indexed by      │\n",
+       "│ typical RAGsystems. To combine the strengths of these contrasting methods, we propose a Graph RAG approach to  │\n",
+       "│ question answering over private text corpora that scales with both the generality of user questions and the    │\n",
+       "│ quantity of source text to be indexed. Our approach uses an LLM to build a graph-based text index in two       │\n",
+       "│ stages: first to derive an entity knowledge graph from the source documents, then to pregenerate community     │\n",
+       "│ summaries for all groups of closely-related entities. Given a question, each community summary is used to      │\n",
+       "│ generate a partial response, before all partial responses are again summarized in a final response to the user. │\n",
+       "│ For a class of global sensemaking questions over datasets in the 1 million token range, we show that Graph RAG │\n",
+       "│ leads to substantial improvements over a naïve RAG baseline for both the comprehensiveness and diversity of    │\n",
+       "│ generated answers. An open-source, Python-based implementation of both global and local Graph RAG approaches is │\n",
+       "│ forthcoming at https://aka . ms/graphrag .                                                                      │\n",
+       "│ ---                                                                                                             │\n",
+       "│ Given the multi-stage nature of our Graph RAG mechanism, the multiple conditions we wanted to compare, and the │\n",
+       "│ lack of gold standard answers to our activity-based sensemaking questions, we decided to adopt a head-to-head  │\n",
+       "│ comparison approach using an LLM evaluator. We selected three target metrics capturing qualities that are      │\n",
+       "│ desirable for sensemaking activities, as well as a control metric (directness) used as a indicator of validity. │\n",
+       "│ Since directness is effectively in opposition to comprehensiveness and diversity, we would not expect any      │\n",
+       "│ method to win across all four metrics.                                                                          │\n",
+       "│ ---                                                                                                             │\n",
+       "│ Figure 1: Graph RAG pipeline using an LLM-derived graph index of source document text. This index spans nodes  │\n",
+       "│ (e.g., entities), edges (e.g., relationships), and covariates (e.g., claims) that have been detected,          │\n",
+       "│ extracted, and summarized by LLM prompts tailored to the domain of the dataset. Community detection (e.g.,     │\n",
+       "│ Leiden, Traag et al., 2019) is used to partition the graph index into groups of elements (nodes, edges,        │\n",
+       "│ covariates) that the LLM can summarize in parallel at both indexing time and query time. The 'global answer' to │\n",
+       "│ a given query is produced using a final round of query-focused summarization over all community summaries      │\n",
+       "│ reporting relevance to that query.                                                                              │\n",
+       "│ ---                                                                                                             │\n",
+       "│ Retrieval-augmented generation (RAG, Lewis et al., 2020) is an established approach to answering user questions │\n",
+       "│ over entire datasets, but it is designed for situations where these answers are contained locally within       │\n",
+       "│ regions of text whose retrieval provides sufficient grounding for the generation task. Instead, a more         │\n",
+       "│ appropriate task framing is query-focused summarization (QFS, Dang, 2006), and in particular, query-focused    │\n",
+       "│ abstractive summarization that generates natural language summaries and not just concatenated excerpts (Baumel │\n",
+       "│ et al., 2018; Laskar et al., 2020; Yao et al., 2017) . In recent years, however, such distinctions between     │\n",
+       "│ summarization tasks that are abstractive versus extractive, generic versus query-focused, and single-document  │\n",
+       "│ versus multi-document, have become less relevant. While early applications of the transformer architecture     │\n",
+       "│ showed substantial improvements on the state-of-the-art for all such summarization tasks (Goodwin et al., 2020; │\n",
+       "│ Laskar et al., 2022; Liu and Lapata, 2019), these tasks are now trivialized by modern LLMs, including the GPT  │\n",
+       "│ (Achiam et al., 2023; Brown et al., 2020), Llama (Touvron et al., 2023), and Gemini (Anil et al., 2023) series, │\n",
+       "│ all of which can use in-context learning to summarize any content provided in their context window.            │\n",
+       "│ ---                                                                                                             │\n",
+       "│ community descriptions provide complete coverage of the underlying graph index and the input documents it      │\n",
+       "│ represents. Query-focused summarization of an entire corpus is then made possible using a map-reduce approach: │\n",
+       "│ first using each community summary to answer the query independently and in parallel, then summarizing all     │\n",
+       "│ relevant partial answers into a final global answer.                                                            │\n",
+       "│                                                                                                                 │\n",
+       "│ Question: What are the main advantages of using the Graph RAG approach for query-focused summarization compared │\n",
+       "│ to traditional RAG methods?                                                                                     │\n",
+       "│ Answer:                                                                                                         │\n",
+       "│                                                                                                                 │\n",
+       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[1;31mโ•ญโ”€\u001b[0m\u001b[1;31mโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\u001b[0m RAG Prompt \u001b[1;31mโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\u001b[0m\u001b[1;31mโ”€โ•ฎ\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mYou are an AI assistant helping answering questions about Microsoft GraphRAG.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mUse ONLY the text below to answer the user's question.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mIf the answer isn't in the text, say you don't know.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mContext:\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mCommunity summaries vs. source texts. When comparing community summaries to source texts using Graph RAG, \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mcommunity summaries generally provided a small but consistent improvement in answer comprehensiveness and \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mdiversity, except for root-level summaries. Intermediate-level summaries in the Podcast dataset and low-level \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mcommunity summaries in the News dataset achieved comprehensiveness win rates of 57% and 64%, respectively. \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mDiversity win rates were 57% for Podcast intermediate-level summaries and 60% for News low-level community \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31msummaries. Table 3 also illustrates the scalability advantages of Graph RAG compared to source text \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31msummarization: for low-level community summaries ( C3 ), Graph RAG required 26-33% fewer context tokens, while \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mfor root-level community summaries ( C0 ), it required over 97% fewer tokens. 
For a modest drop in performance \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mcompared with other global methods, root-level Graph RAG offers a highly efficient method for the iterative \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mquestion answering that characterizes sensemaking activity, while retaining advantages in comprehensiveness \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m(72% win rate) and diversity (62% win rate) over naยจฤฑve RAG.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m---\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mWe have presented a global approach to Graph RAG, combining knowledge graph generation, retrieval-augmented \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mgeneration (RAG), and query-focused summarization (QFS) to support human sensemaking over entire text corpora. \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mInitial evaluations show substantial improvements over a naยจฤฑve RAG baseline for both the comprehensiveness and\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mdiversity of answers, as well as favorable comparisons to a global but graph-free approach using map-reduce \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31msource text summarization. For situations requiring many global queries over the same dataset, summaries of \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mroot-level communities in the entity-based graph index provide a data index that is both superior to naยจฤฑve RAG\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mand achieves competitive performance to other global methods at a fraction of the token cost.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m---\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mTrade-offs of building a graph index . We consistently observed Graph RAG achieve the best headto-head results \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31magainst other methods, but in many cases the graph-free approach to global summarization of source texts \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mperformed competitively. 
The real-world decision about whether to invest in building a graph index depends on \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mmultiple factors, including the compute budget, expected number of lifetime queries per dataset, and value \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mobtained from other aspects of the graph index (including the generic community summaries and the use of other \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mgraph-related RAG approaches).\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m---\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mFuture work . The graph index, rich text annotations, and hierarchical community structure supporting the \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mcurrent Graph RAG approach offer many possibilities for refinement and adaptation. This includes RAG approaches\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mthat operate in a more local manner, via embedding-based matching of user queries and graph annotations, as \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mwell as the possibility of hybrid RAG schemes that combine embedding-based matching against community reports \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mbefore employing our map-reduce summarization mechanisms. This 'roll-up' operation could also be extended \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31macross more levels of the community hierarchy, as well as implemented as a more exploratory 'drill down' \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mmechanism that follows the information scent contained in higher-level community summaries.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m---\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mAdvanced RAG systems include pre-retrieval, retrieval, post-retrieval strategies designed to overcome the \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mdrawbacks of Naยจฤฑve RAG, while Modular RAG systems include patterns for iterative and dynamic cycles of \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31minterleaved retrieval and generation (Gao et al., 2023). 
Our implementation of Graph RAG incorporates multiple \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mconcepts related to other systems. For example, our community summaries are a kind of self-memory (Selfmem, \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mCheng et al., 2024) for generation-augmented retrieval (GAR, Mao et al., 2020) that facilitates future \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mgeneration cycles, while our parallel generation of community answers from these summaries is a kind of \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31miterative (Iter-RetGen, Shao et al., 2023) or federated (FeB4RAG, Wang et al., 2024) retrieval-generation \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mstrategy. Other systems have also combined these concepts for multi-document summarization (CAiRE-COVID, Su et \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mal., 2020) and multi-hop question answering (ITRG, Feng et al., 2023; IR-CoT, Trivedi et al., 2022; DSP, \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mKhattab et al., 2022). Our use of a hierarchical index and summarization also bears resemblance to further \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mapproaches, such as generating a hierarchical index of text chunks by clustering the vectors of text embeddings\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m(RAPTOR, Sarthi et al., 2024) or generating a 'tree of clarifications' to answer multiple interpretations of \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mambiguous questions (Kim et al., 2023). However, none of these iterative or hierarchical approaches use the \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mkind of self-generated graph index that enables Graph RAG.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m---\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mThe use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31msource enables large language models (LLMs) to answer questions over private and/or previously unseen document \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mcollections. 
However, RAG fails on global questions directed at an entire text corpus, such as 'What are the \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mmain themes in the dataset?', since this is inherently a queryfocused summarization (QFS) task, rather than an \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mexplicit retrieval task. Prior QFS methods, meanwhile, fail to scale to the quantities of text indexed by \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mtypical RAGsystems. To combine the strengths of these contrasting methods, we propose a Graph RAG approach to \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mquestion answering over private text corpora that scales with both the generality of user questions and the \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mquantity of source text to be indexed. Our approach uses an LLM to build a graph-based text index in two \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mstages: first to derive an entity knowledge graph from the source documents, then to pregenerate community \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31msummaries for all groups of closely-related entities. Given a question, each community summary is used to \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mgenerate a partial response, before all partial responses are again summarized in a final response to the user.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mFor a class of global sensemaking questions over datasets in the 1 million token range, we show that Graph RAG \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mleads to substantial improvements over a naยจฤฑve RAG baseline for both the comprehensiveness and diversity of \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mgenerated answers. An open-source, Python-based implementation of both global and local Graph RAG approaches is\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mforthcoming at https://aka . 
ms/graphrag .\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m---\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mGiven the multi-stage nature of our Graph RAG mechanism, the multiple conditions we wanted to compare, and the \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mlack of gold standard answers to our activity-based sensemaking questions, we decided to adopt a head-to-head \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mcomparison approach using an LLM evaluator. We selected three target metrics capturing qualities that are \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mdesirable for sensemaking activities, as well as a control metric (directness) used as a indicator of validity.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mSince directness is effectively in opposition to comprehensiveness and diversity, we would not expect any \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mmethod to win across all four metrics.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m---\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mFigure 1: Graph RAG pipeline using an LLM-derived graph index of source document text. This index spans nodes \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m(e.g., entities), edges (e.g., relationships), and covariates (e.g., claims) that have been detected, \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mextracted, and summarized by LLM prompts tailored to the domain of the dataset. Community detection (e.g., \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mLeiden, Traag et al., 2019) is used to partition the graph index into groups of elements (nodes, edges, \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mcovariates) that the LLM can summarize in parallel at both indexing time and query time. 
The 'global answer' to\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31ma given query is produced using a final round of query-focused summarization over all community summaries \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mreporting relevance to that query.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m---\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mRetrieval-augmented generation (RAG, Lewis et al., 2020) is an established approach to answering user questions\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mover entire datasets, but it is designed for situations where these answers are contained locally within \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mregions of text whose retrieval provides sufficient grounding for the generation task. Instead, a more \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mappropriate task framing is query-focused summarization (QFS, Dang, 2006), and in particular, query-focused \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mabstractive summarization that generates natural language summaries and not just concatenated excerpts (Baumel \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31met al., 2018; Laskar et al., 2020; Yao et al., 2017) . In recent years, however, such distinctions between \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31msummarization tasks that are abstractive versus extractive, generic versus query-focused, and single-document \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mversus multi-document, have become less relevant. 
While early applications of the transformer architecture \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mshowed substantial improvements on the state-of-the-art for all such summarization tasks (Goodwin et al., 2020;\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mLaskar et al., 2022; Liu and Lapata, 2019), these tasks are now trivialized by modern LLMs, including the GPT \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m(Achiam et al., 2023; Brown et al., 2020), Llama (Touvron et al., 2023), and Gemini (Anil et al., 2023) series,\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mall of which can use in-context learning to summarize any content provided in their context window.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m---\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mcommunity descriptions provide complete coverage of the underlying graph index and the input documents it \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mrepresents. Query-focused summarization of an entire corpus is then made possible using a map-reduce approach: \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mfirst using each community summary to answer the query independently and in parallel, then summarizing all \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mrelevant partial answers into a final global answer.\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mQuestion: What are the main advantages of using the Graph RAG approach for query-focused summarization compared\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mto traditional RAG methods?\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mAnswer:\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ”‚\u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31m \u001b[0m\u001b[1;31mโ”‚\u001b[0m\n", + "\u001b[1;31mโ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
+       "╭──────────────────────────────────────────────────── RAG Response ─────────────────────────────────────────────────╮\n",
+       "│ The main advantages of using the Graph RAG approach for query-focused summarization compared to traditional RAG │\n",
+       "│ methods include:                                                                                                │\n",
+       "│                                                                                                                 │\n",
+       "│ 1. **Improved Comprehensiveness and Diversity**: Graph RAG shows substantial improvements over a naïve RAG      │\n",
+       "│ baseline in terms of the comprehensiveness and diversity of answers. This is particularly beneficial for global │\n",
+       "│ sensemaking questions over large datasets.                                                                      │\n",
+       "│                                                                                                                 │\n",
+       "│ 2. **Scalability**: Graph RAG provides scalability advantages, achieving efficient summarization with           │\n",
+       "│ significantly fewer context tokens required. For instance, it requires 26-33% fewer tokens for low-level        │\n",
+       "│ community summaries and over 97% fewer tokens for root-level summaries compared to source text summarization.   │\n",
+       "│                                                                                                                 │\n",
+       "│ 3. **Efficiency in Iterative Question Answering**: Root-level Graph RAG offers a highly efficient method for    │\n",
+       "│ iterative question answering, which is crucial for sensemaking activities, with only a modest drop in           │\n",
+       "│ performance compared to other global methods.                                                                   │\n",
+       "│                                                                                                                 │\n",
+       "│ 4. **Global Query Handling**: It supports handling global queries effectively, as it combines knowledge graph   │\n",
+       "│ generation, retrieval-augmented generation, and query-focused summarization, making it suitable for sensemaking │\n",
+       "│ over entire text corpora.                                                                                       │\n",
+       "│                                                                                                                 │\n",
+       "│ 5. **Hierarchical Indexing and Summarization**: The use of a hierarchical index and summarization allows for    │\n",
+       "│ efficient processing and summarizing of community summaries into a final global answer, facilitating a          │\n",
+       "│ comprehensive coverage of the underlying graph index and input documents.                                       │\n",
+       "│                                                                                                                 │\n",
+       "│ 6. **Reduced Token Cost**: For situations requiring many global queries over the same dataset, Graph RAG        │\n",
+       "│ achieves competitive performance to other global methods at a fraction of the token cost.                       │\n",
+       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
+       "\n"
\n" + ], + "text/plain": [ + "\u001b[1;32mโ•ญโ”€\u001b[0m\u001b[1;32mโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\u001b[0m RAG Response \u001b[1;32mโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€\u001b[0m\u001b[1;32mโ”€โ•ฎ\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mThe main advantages of using the Graph RAG approach for query-focused summarization compared to traditional RAG\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mmethods include:\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m1. **Improved Comprehensiveness and Diversity**: Graph RAG shows substantial improvements over a naรฏve RAG \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mbaseline in terms of the comprehensiveness and diversity of answers. This is particularly beneficial for global\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32msensemaking questions over large datasets.\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m2. **Scalability**: Graph RAG provides scalability advantages, achieving efficient summarization with \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32msignificantly fewer context tokens required. For instance, it requires 26-33% fewer tokens for low-level \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mcommunity summaries and over 97% fewer tokens for root-level summaries compared to source text summarization.\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m3. **Efficiency in Iterative Question Answering**: Root-level Graph RAG offers a highly efficient method for \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32miterative question answering, which is crucial for sensemaking activities, with only a modest drop in \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mperformance compared to other global methods.\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m4. 
**Global Query Handling**: It supports handling global queries effectively, as it combines knowledge graph \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mgeneration, retrieval-augmented generation, and query-focused summarization, making it suitable for sensemaking\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mover entire text corpora.\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m5. **Hierarchical Indexing and Summarization**: The use of a hierarchical index and summarization allows for \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mefficient processing and summarizing of community summaries into a final global answer, facilitating a \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mcomprehensive coverage of the underlying graph index and input documents.\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m6. **Reduced Token Cost**: For situations requiring many global queries over the same dataset, Graph RAG \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ”‚\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32machieves competitive performance to other global methods at a fraction of the token cost.\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mโ”‚\u001b[0m\n", + "\u001b[1;32mโ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from azure.search.documents.models import VectorizableTextQuery\n", + "\n", + "\n", + "def generate_chat_response(prompt: str, system_message: str = None):\n", + " \"\"\"\n", + " Generates a single-turn chat response using Azure OpenAI Chat.\n", + " If you need multi-turn conversation or follow-up queries, you'll have to\n", + " maintain the messages list externally.\n", + " \"\"\"\n", + " messages = []\n", + " if system_message:\n", + " messages.append({\"role\": \"system\", \"content\": system_message})\n", + " messages.append({\"role\": \"user\", \"content\": prompt})\n", + "\n", + " completion = openai_client.chat.completions.create(\n", + " model=AZURE_OPENAI_CHAT_MODEL, messages=messages, temperature=0.7\n", + " )\n", + " return completion.choices[0].message.content\n", + "\n", + "\n", + "user_query = \"What are the main advantages of using the Graph RAG approach for query-focused summarization compared to traditional RAG methods?\"\n", + "user_embed = embed_text(user_query)\n", + "\n", + "vector_query = VectorizableTextQuery(\n", + " 
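+    "# Hybrid retrieval + grounded generation: Azure AI Search fuses the keyword\n",
+    "# match on `search_text` with the vector match (the service embeds the query\n",
+    "# text), and the top retrieved chunks ground the chat model's answer.\n",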
+    "from azure.search.documents.models import VectorizableTextQuery\n",
+    "\n",
+    "\n",
+    "def generate_chat_response(prompt: str, system_message: str | None = None):\n",
+    "    \"\"\"\n",
+    "    Generates a single-turn chat response using Azure OpenAI Chat.\n",
+    "    If you need multi-turn conversation or follow-up queries, you'll have to\n",
+    "    maintain the messages list externally.\n",
+    "    \"\"\"\n",
+    "    messages = []\n",
+    "    if system_message:\n",
+    "        messages.append({\"role\": \"system\", \"content\": system_message})\n",
+    "    messages.append({\"role\": \"user\", \"content\": prompt})\n",
+    "\n",
+    "    completion = openai_client.chat.completions.create(\n",
+    "        model=AZURE_OPENAI_CHAT_MODEL, messages=messages, temperature=0.7\n",
+    "    )\n",
+    "    return completion.choices[0].message.content\n",
+    "\n",
+    "\n",
+    "user_query = \"What are the main advantages of using the Graph RAG approach for query-focused summarization compared to traditional RAG methods?\"\n",
+    "\n",
+    "vector_query = VectorizableTextQuery(\n",
+    "    text=user_query,  # the service vectorizes the query text; combined with search_text below for hybrid search\n",
+    "    k_nearest_neighbors=5,\n",
+    "    fields=\"content_vector\",\n",
+    ")\n",
+    "\n",
+    "search_results = search_client.search(\n",
+    "    search_text=user_query, vector_queries=[vector_query], select=[\"content\"], top=10\n",
+    ")\n",
+    "\n",
+    "retrieved_chunks = [result[\"content\"] for result in search_results]\n",
+    "\n",
+    "context_str = \"\\n---\\n\".join(retrieved_chunks)\n",
+    "rag_prompt = f\"\"\"\n",
+    "You are an AI assistant helping answer questions about Microsoft GraphRAG.\n",
+    "Use ONLY the text below to answer the user's question.\n",
+    "If the answer isn't in the text, say you don't know.\n",
+    "\n",
+    "Context:\n",
+    "{context_str}\n",
+    "\n",
+    "Question: {user_query}\n",
+    "Answer:\n",
+    "\"\"\"\n",
+    "\n",
+    "final_answer = generate_chat_response(rag_prompt)\n",
+    "\n",
+    "console.print(Panel(rag_prompt, title=\"RAG Prompt\", style=\"bold red\"))\n",
+    "console.print(Panel(final_answer, title=\"RAG Response\", style=\"bold green\"))"
+   ]
+  },
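+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Optional: Multi-Turn Follow-Up\n",
+    "\n",
+    "`generate_chat_response` is single-turn by design; as its docstring notes, follow-up questions require maintaining the `messages` list yourself. The cell below is a minimal sketch of that pattern. It simply reuses `rag_prompt` and `final_answer` from the previous cell rather than running a fresh search, and the follow-up question is only an illustration."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Minimal multi-turn sketch: carry the earlier turns forward explicitly.\n",
+    "followup_messages = [\n",
+    "    {\"role\": \"user\", \"content\": rag_prompt},\n",
+    "    {\"role\": \"assistant\", \"content\": final_answer},\n",
+    "    # Hypothetical follow-up question -- replace with your own.\n",
+    "    {\"role\": \"user\", \"content\": \"Summarize those advantages in one sentence.\"},\n",
+    "]\n",
+    "\n",
+    "followup_completion = openai_client.chat.completions.create(\n",
+    "    model=AZURE_OPENAI_CHAT_MODEL, messages=followup_messages, temperature=0.7\n",
+    ")\n",
+    "\n",
+    "console.print(\n",
+    "    Panel(\n",
+    "        followup_completion.choices[0].message.content,\n",
+    "        title=\"Follow-Up Response\",\n",
+    "        style=\"bold blue\",\n",
+    "    )\n",
+    ")"
+   ]
+  }
+ ],
+ "metadata": {
+  "accelerator": "GPU",
+  "colab": {
+   "gpuType": "T4",
+   "provenance": []
+  },
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/mkdocs.yml b/mkdocs.yml
index 8f8d86d9..0f3e9dd0 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -76,15 +76,17 @@ nav:
       - "Multimodal export": examples/export_multimodal.py
       - "Force full page OCR": examples/full_page_ocr.py
       - "Accelerator options": examples/run_with_accelerator.py
-      - "Simple translation": examples/translate.py
+      - "Simple translation": examples/translate.py
   - ✂️ Chunking:
-      - "Hybrid chunking": examples/hybrid_chunking.ipynb
-  - 💬 RAG / QA:
+      - examples/hybrid_chunking.ipynb
+  - 🤖 RAG with AI dev frameworks:
       - examples/rag_haystack.ipynb
-      - examples/rag_llamaindex.ipynb
       - examples/rag_langchain.ipynb
+      - examples/rag_llamaindex.ipynb
+  - 🗂️ More examples:
       - examples/rag_weaviate.ipynb
       - RAG with Granite [↗]: https://github.com/ibm-granite-community/granite-snack-cookbook/blob/main/recipes/RAG/Granite_Docling_RAG.ipynb
+      - examples/rag_azuresearch.ipynb
       - examples/retrieval_qdrant.ipynb
   - Integrations:
     - Integrations: integrations/index.md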