Remove aurora / pgvector / ECS embeddings for V2 (#561)
* remove aurora vector store

* remove ECS, cron schedule, vpc

* remove embedding

* fix api publish stack

* remove useBedrockKnowledgeBaseForRag option

* remove EmbeddingParams / SearchParams

* remove EmbeddingParams / SearchParams from frontend

* mv EDGE_SEARCH_PARAMS to KB

* update github workflow

* remove KB_ENABLED condition from frontend

* hide prev bots before v2

* fix: pass ci

* fix: usecase find_all_bots

* fix: kb attribute required on KBeditPage

* doc: Update readme

* doc: add tip

* doc: fix

* update bin.sh

* fix: link to the doc on bin.sh

* change default version to v2 (easy deployment)

* add replica option for opensearch serverless

* doc: add warning on migration guide

* fix: shorten vector collection name to meet the naming restriction (max 32 chars)

* fix: description on s3 datasource

* fix
statefb authored Oct 15, 2024
1 parent 101dc79 commit 7974b85
Showing 86 changed files with 605 additions and 11,348 deletions.
5 changes: 1 addition & 4 deletions .github/workflows/backend.yml
@@ -5,6 +5,7 @@ on:
branches:
- main
- v1
- v2
workflow_dispatch: {}

jobs:
@@ -60,7 +61,3 @@ jobs:
run: |
cd backend
docker build -t backend .
- name: Build backend/embedding/Dockerfile
run: |
cd backend
docker build -t embedding -f embedding/Dockerfile .
7 changes: 1 addition & 6 deletions .github/workflows/cdk.yml
@@ -5,6 +5,7 @@ on:
branches:
- main
- v1
- v2
workflow_dispatch: {}

jobs:
@@ -18,9 +19,6 @@ jobs:
node-version: "18.x"
- run: |
cd cdk
cd custom-resources/setup-pgvector
npm ci
cd ../..
npm ci
npx cdk synth
cdk-test:
@@ -33,8 +31,5 @@ jobs:
node-version: "18.x"
- run: |
cd cdk
cd custom-resources/setup-pgvector
npm ci
cd ../..
npm ci
npm run test
1 change: 1 addition & 0 deletions .github/workflows/deploy-redoc.yml
@@ -5,6 +5,7 @@ on:
branches:
- main
- v1
- v2
jobs:
deploy-docs:
runs-on: ubuntu-latest
1 change: 1 addition & 0 deletions .github/workflows/frontend.yml
@@ -5,6 +5,7 @@ on:
branches:
- main
- v1
- v2
workflow_dispatch: {}

jobs:
2 changes: 1 addition & 1 deletion .github/workflows/update-contributor-highlights.yml
@@ -156,4 +156,4 @@ jobs:
author: github-actions <[email protected]>
committer: github-actions <[email protected]>
signoff: true
base: v1
base: v2
67 changes: 21 additions & 46 deletions README.md
@@ -3,10 +3,7 @@
![](https://github.com/aws-samples/bedrock-claude-chat/actions/workflows/cdk.yml/badge.svg)

> [!Warning]
> A major update to v2 is planned soon. v2 will not be backward compatible with v1, and **existing RAG bots will no longer be usable.** For more details, please refer to the [migration guide](./docs/migration/V1_TO_V2.md).
> [!Warning]
> If you are using an old version (e.g. `v0.4.x`) and wish to use the latest version, refer to the [migration guide](./docs/migration/V0_TO_V1.md). Without it, **ALL DATA IN THE Aurora cluster WILL BE DESTROYED, and users will NO LONGER be able to use existing bots with knowledge or to create new bots**.
> **V2 released. To update, please carefully review the [migration guide](./docs/migration/V1_TO_V2.md).** Without it, **BOTS FROM V1 WILL BECOME UNUSABLE.**

This repository is a sample chatbot using [Claude](https://www.anthropic.com/), an LLM from Anthropic and one of the foundational models provided by [Amazon Bedrock](https://aws.amazon.com/bedrock/) for generative AI.

@@ -22,7 +19,7 @@ Not only text but also images are available with [Anthropic's Claude 3](https://

### Bot Personalization

Add your own instruction and give external knowledge as URL or files (a.k.a [RAG](./docs/RAG.md)). The bot can be shared among application users. The customized bot also can be published as stand-alone API (See the [detail](./docs/PUBLISH_API.md)).
Add your own instructions and give external knowledge as URLs or files (a.k.a. [RAG](https://aws.amazon.com/what-is/retrieval-augmented-generation/)). The bot can be shared among application users. The customized bot can also be published as a stand-alone API (see the [details](./docs/PUBLISH_API.md)).

![](./docs/imgs/bot_creation.png)
![](./docs/imgs/bot_chat.png)
@@ -86,7 +83,7 @@ chmod +x bin.sh
./bin.sh
```

- You will be asked if a new user or using v1. If you are not a continuing user from v0, please enter `y`.
- You will be asked whether you are a new user or are using v2. If you are not a continuing user from v0, please enter `y`.

### Optional Parameters

@@ -131,10 +128,11 @@ It's an architecture built on AWS managed services, eliminating the need for inf
- [Amazon CloudFront](https://aws.amazon.com/cloudfront/) + [S3](https://aws.amazon.com/s3/): Frontend application delivery ([React](https://react.dev/), [Tailwind CSS](https://tailwindcss.com/))
- [AWS WAF](https://aws.amazon.com/waf/): IP address restriction
- [Amazon Cognito](https://aws.amazon.com/cognito/): User authentication
- [Amazon Bedrock](https://aws.amazon.com/bedrock/): Managed service to utilize foundational models via APIs. Claude is used for chat response and Cohere for vector embedding
- [Amazon EventBridge Pipes](https://aws.amazon.com/eventbridge/pipes/): Receiving event from DynamoDB stream and launching ECS task to embed external knowledge
- [Amazon Elastic Container Service](https://aws.amazon.com/ecs/): Run crawling, parsing and embedding tasks. [Cohere Multilingual](https://txt.cohere.com/multilingual/) is the model used for embedding.
- [Amazon Aurora PostgreSQL](https://aws.amazon.com/rds/aurora/): Scalable vector store with [pgvector](https://github.com/pgvector/pgvector) plugin
- [Amazon Bedrock](https://aws.amazon.com/bedrock/): Managed service to utilize foundational models via APIs
- [Amazon Bedrock Knowledge Bases](https://aws.amazon.com/bedrock/knowledge-bases/): Provides a managed interface for Retrieval-Augmented Generation ([RAG](https://aws.amazon.com/what-is/retrieval-augmented-generation/)), offering services for embedding and parsing documents
- [Amazon EventBridge Pipes](https://aws.amazon.com/eventbridge/pipes/): Receiving event from DynamoDB stream and launching Step Functions to embed external knowledge
- [AWS Step Functions](https://aws.amazon.com/step-functions/): Orchestrating ingestion pipeline to embed external knowledge into Bedrock Knowledge Bases
- [Amazon OpenSearch Serverless](https://aws.amazon.com/opensearch-service/features/serverless/): Serves as the backend database for Bedrock Knowledge Bases, providing full-text search and vector search capabilities, enabling accurate retrieval of relevant information
- [Amazon Athena](https://aws.amazon.com/athena/): Query service to analyze S3 bucket

![](docs/imgs/arch.png)
@@ -231,30 +229,6 @@ DEFAULT_GENERATION_CONFIG = {

If you deployed with the CLI and CDK, run `cdk destroy`. If not, open [CloudFormation](https://console.aws.amazon.com/cloudformation/home) and delete `BedrockChatStack` and `FrontendWafStack` manually. Please note that `FrontendWafStack` is in the `us-east-1` region.

### Stopping Vector DB for RAG

By setting [cdk.json](./cdk/cdk.json) in the following CRON format, you can stop and restart Aurora Serverless resources created by the [VectorStore construct](./cdk/lib/constructs/vectorstore.ts). Applying this setting can reduce operating costs. By default, Aurora Serverless is always running. Note that it will be executed in UTC time.

```json
...
"rdbSchedules": {
"stop": {
"minute": "50",
"hour": "10",
"day": "*",
"month": "*",
"year": "*"
},
"start": {
"minute": "40",
"hour": "2",
"day": "*",
"month": "*",
"year": "*"
}
}
```

### Language Settings

This asset automatically detects the language using [i18next-browser-languageDetector](https://github.com/i18next/i18next-browser-languageDetector). You can switch languages from the application menu. Alternatively, you can use a query string to set the language as shown below.
@@ -273,14 +247,6 @@ By default, this sample does not restrict the domains for sign-up email addresse
"allowedSignUpEmailDomains": ["example.com"],
```

### Customize Number of NAT Gateway

By default, this sample deploys 2 NAT Gateways. If you don't need both, you can reduce the number to cut costs. Open `cdk.json` and change the `natgatewayCount` parameter.

```ts
"natgatewayCount": 2
```

### External Identity Provider

This sample supports external identity providers. Currently we support [Google](./docs/idp/SET_UP_GOOGLE.md) and [custom OIDC providers](./docs/idp/SET_UP_CUSTOM_OIDC.md).
@@ -301,6 +267,19 @@ If you want newly created users to automatically join groups, you can specify th

By default, newly created users will be joined to the `CreatingBotAllowed` group.
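A hedged sketch of how this could look in [cdk.json](./cdk/cdk.json); the key name `autoJoinUserGroups` is an assumption for illustration and is not shown in this diff:

```json
"autoJoinUserGroups": ["CreatingBotAllowed"],
```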

### Configure RAG Replicas

`enableRagReplicas` is an option in [cdk.json](./cdk/cdk.json) that controls the replica settings for the RAG database, specifically the Knowledge Bases using Amazon OpenSearch Serverless.

- **Default**: true
- **true**: Enhances availability by enabling additional replicas, making it suitable for production environments but increasing costs.
- **false**: Reduces costs by using fewer replicas, making it suitable for development and testing.

This is an account/region-level setting, affecting the entire application rather than individual bots.
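For example, a development deployment could disable replicas with a minimal [cdk.json](./cdk/cdk.json) entry like the sketch below (surrounding keys omitted):

```json
"enableRagReplicas": false,
```

Redeploying the stack afterwards should apply the setting to the OpenSearch Serverless collection backing the Knowledge Base.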

> [!Note]
> As of June 2024, Amazon OpenSearch Serverless supports 0.5 OCU, lowering entry costs for small-scale workloads. Production deployments can start with 2 OCUs, while dev/test workloads can use 1 OCU. OpenSearch Serverless automatically scales based on workload demands. For more details, see the [announcement](https://aws.amazon.com/jp/about-aws/whats-new/2024/06/amazon-opensearch-serverless-entry-cost-half-collection-types/).

### Local Development

See [LOCAL DEVELOPMENT](./docs/LOCAL_DEVELOPMENT.md).
@@ -316,10 +295,6 @@ Please also take a look at the following guidelines before contributing:
- [Local Development](./docs/LOCAL_DEVELOPMENT.md)
- [CONTRIBUTING](./CONTRIBUTING.md)

### RAG (Retrieval Augmented Generation)

See [here](./docs/RAG.md).

## Contacts

- [Takehiro Suzuki](https://github.com/statefb)
50 changes: 1 addition & 49 deletions backend/app/bedrock.py
@@ -5,7 +5,7 @@
import re
from pathlib import Path

from app.config import BEDROCK_PRICING, DEFAULT_EMBEDDING_CONFIG
from app.config import BEDROCK_PRICING
from app.config import DEFAULT_GENERATION_CONFIG as DEFAULT_CLAUDE_GENERATION_CONFIG
from app.config import DEFAULT_MISTRAL_GENERATION_CONFIG
from app.repositories.models.conversation import ContentModel, MessageModel
@@ -309,51 +309,3 @@ def get_model_id(model: type_model_name) -> str:
return "mistral.mixtral-8x7b-instruct-v0:1"
elif model == "mistral-large":
return "mistral.mistral-large-2402-v1:0"


def calculate_query_embedding(question: str) -> list[float]:
    model_id = DEFAULT_EMBEDDING_CONFIG["model_id"]

    # Currently only supports "cohere.embed-multilingual-v3"
    assert model_id == "cohere.embed-multilingual-v3"

    payload = json.dumps({"texts": [question], "input_type": "search_query"})
    accept = "application/json"
    content_type = "application/json"

    response = client.invoke_model(
        accept=accept, contentType=content_type, body=payload, modelId=model_id
    )
    output = json.loads(response.get("body").read())
    embedding = output.get("embeddings")[0]

    return embedding


def calculate_document_embeddings(documents: list[str]) -> list[list[float]]:
    def _calculate_document_embeddings(documents: list[str]) -> list[list[float]]:
        payload = json.dumps({"texts": documents, "input_type": "search_document"})
        accept = "application/json"
        content_type = "application/json"

        response = client.invoke_model(
            accept=accept, contentType=content_type, body=payload, modelId=model_id
        )
        output = json.loads(response.get("body").read())
        embeddings = output.get("embeddings")

        return embeddings

    BATCH_SIZE = 10
    model_id = DEFAULT_EMBEDDING_CONFIG["model_id"]

    # Currently only supports "cohere.embed-multilingual-v3"
    assert model_id == "cohere.embed-multilingual-v3"

    embeddings = []
    for i in range(0, len(documents), BATCH_SIZE):
        # Split documents into batches to avoid exceeding the payload size limit
        batch = documents[i : i + BATCH_SIZE]
        embeddings += _calculate_document_embeddings(batch)

    return embeddings
51 changes: 6 additions & 45 deletions backend/app/bot_remove.py
@@ -1,55 +1,23 @@
import json
import os
from typing import Any

import boto3
import pg8000
from app.repositories.api_publication import delete_api_key, find_usage_plan_by_id
from app.repositories.api_publication import (
    delete_api_key,
    delete_stack_by_bot_id,
    find_stack_by_bot_id,
    find_usage_plan_by_id,
)
from app.repositories.common import RecordNotFoundError, decompose_bot_id
from aws_lambda_powertools.utilities import parameters
from app.repositories.custom_bot import find_public_bot_by_id

DB_SECRETS_ARN = os.environ.get("DB_SECRETS_ARN", "")
DOCUMENT_BUCKET = os.environ.get("DOCUMENT_BUCKET", "documents")
BEDROCK_REGION = os.environ.get("BEDROCK_REGION", "us-east-1")

s3_client = boto3.client("s3", BEDROCK_REGION)


def delete_from_postgres(bot_id: str):
    """Delete data related to `bot_id` from vector store (i.e. PostgreSQL)."""

    secrets: Any = parameters.get_secret(DB_SECRETS_ARN)  # type: ignore
    db_info = json.loads(secrets)

    conn = pg8000.connect(
        database=db_info["dbname"],
        host=db_info["host"],
        port=db_info["port"],
        user=db_info["username"],
        password=db_info["password"],
    )

    try:
        with conn.cursor() as cursor:
            delete_query = "DELETE FROM items WHERE botid = %s"
            cursor.execute(delete_query, (bot_id,))
            conn.commit()
            print(f"Successfully deleted records for bot_id: {bot_id}")
    except Exception as e:
        conn.rollback()
        print(f"Error deleting records for bot_id: {bot_id}")
        print(e)
    finally:
        conn.close()


def delete_kb_stack_by_bot_id(bot_id: str):
    client = boto3.client("cloudformation")
def delete_custom_bot_stack_by_bot_id(bot_id: str):
    client = boto3.client("cloudformation", BEDROCK_REGION)
    stack_name = f"BrChatKbStack{bot_id}"
    try:
        response = client.delete_stack(StackName=stack_name)
@@ -106,16 +74,9 @@ def handler(event, context):
    bot_id = decompose_bot_id(sk)

    delete_from_s3(user_id, bot_id)
    delete_custom_bot_stack_by_bot_id(bot_id)

    try:
        print(f"Remove Bedrock Knowledge Base Stack.")
        # Remove Knowledge Base Stack
        delete_kb_stack_by_bot_id(bot_id)
    except RecordNotFoundError:
        print(f"Remove records from PostgreSQL.")
        delete_from_postgres(bot_id)

    # Check if cloudformation stack exists
    # Check if api published stack exists
    try:
        stack = find_stack_by_bot_id(bot_id)
    except RecordNotFoundError:
14 changes: 0 additions & 14 deletions backend/app/config.py
@@ -36,20 +36,6 @@ class EmbeddingConfig(TypedDict):
"stop_sequences": ["[INST]", "[/INST]"],
}

# Configure embedding parameter.
DEFAULT_EMBEDDING_CONFIG: EmbeddingConfig = {
# DO NOT change `model_id` (currently other models are not supported)
"model_id": "cohere.embed-multilingual-v3",
# NOTE: consider that cohere allows up to 2048 tokens per request
"chunk_size": 1000,
"chunk_overlap": 200,
"enable_partition_pdf": False,
}

# Configure search parameter to fetch relevant documents from vector store.
DEFAULT_SEARCH_CONFIG = {
"max_results": 20,
}

# Used for price estimation.
# NOTE: The following is based on 2024-03-07
Expand Down