Remove aurora / pgvector / ECS embeddings for V2 (#561)
* remove aurora vector store

* remove ECS, cron schedule, vpc

* remove embedding

* fix api publish stack

* remove useBedrockKnowledgeBaseForRag option

* remove EmbeddingParams / SearchParams

* remove EmbeddingParams / SearchParams from frontend

* mv EDGE_SEARCH_PARAMS to KB

* update github workflow

* remove KB_ENABLED condition from frontend

* hide prev bots before v2

* fix: pass ci

* fix: usecase find_all_bots

* fix: kb attribute required on KBeditPage

* doc: Update readme

* doc: add tip

* doc: fix

* update bin.sh

* fix: link to the doc on bin.sh

* change default version to v2 (easy deployment)

* add replica option for opensearch serverless

* doc: add warning on migration guide

* fix: shorten vector collection name to meet the naming restriction (max 32 chars)

* fix: description on s3 datasource

* fix
statefb authored Oct 15, 2024
1 parent 101dc79 commit 7974b85
Showing 86 changed files with 605 additions and 11,348 deletions.
5 changes: 1 addition & 4 deletions .github/workflows/backend.yml
@@ -5,6 +5,7 @@ on:
branches:
- main
- v1
- v2
workflow_dispatch: {}

jobs:
@@ -60,7 +61,3 @@ jobs:
run: |
cd backend
docker build -t backend .
- name: Build backend/embedding/Dockerfile
run: |
cd backend
docker build -t embedding -f embedding/Dockerfile .
7 changes: 1 addition & 6 deletions .github/workflows/cdk.yml
@@ -5,6 +5,7 @@ on:
branches:
- main
- v1
- v2
workflow_dispatch: {}

jobs:
@@ -18,9 +19,6 @@ jobs:
node-version: "18.x"
- run: |
cd cdk
cd custom-resources/setup-pgvector
npm ci
cd ../..
npm ci
npx cdk synth
cdk-test:
@@ -33,8 +31,5 @@ jobs:
node-version: "18.x"
- run: |
cd cdk
cd custom-resources/setup-pgvector
npm ci
cd ../..
npm ci
npm run test
1 change: 1 addition & 0 deletions .github/workflows/deploy-redoc.yml
@@ -5,6 +5,7 @@ on:
branches:
- main
- v1
- v2
jobs:
deploy-docs:
runs-on: ubuntu-latest
1 change: 1 addition & 0 deletions .github/workflows/frontend.yml
@@ -5,6 +5,7 @@ on:
branches:
- main
- v1
- v2
workflow_dispatch: {}

jobs:
2 changes: 1 addition & 1 deletion .github/workflows/update-contributor-highlights.yml
@@ -156,4 +156,4 @@ jobs:
author: github-actions <[email protected]>
committer: github-actions <[email protected]>
signoff: true
base: v1
base: v2
67 changes: 21 additions & 46 deletions README.md
@@ -3,10 +3,7 @@
![](https://github.com/aws-samples/bedrock-claude-chat/actions/workflows/cdk.yml/badge.svg)

> [!Warning]
> A major update to v2 is planned soon. v2 will not be backward compatible with v1, and **existing RAG bots will no longer be usable.** For more details, please refer to the [migration guide](./docs/migration/V1_TO_V2.md).
> [!Warning]
> If you are using an old version (e.g. `v0.4.x`) and wish to use the latest version, refer to the [migration guide](./docs/migration/V0_TO_V1.md). Without it, **ALL DATA IN THE Aurora cluster WILL BE DESTROYED, and users will NO LONGER be able to use existing bots with knowledge or to create new bots**.
> **V2 released. To update, please carefully review the [migration guide](./docs/migration/V1_TO_V2.md).** Without it, **BOTS FROM V1 WILL BECOME UNUSABLE.**

This repository is a sample chatbot using [Claude](https://www.anthropic.com/), an LLM from Anthropic and one of the foundational models provided by [Amazon Bedrock](https://aws.amazon.com/bedrock/) for generative AI.

@@ -22,7 +19,7 @@ Not only text but also images are available with [Anthropic's Claude 3](https://

### Bot Personalization

Add your own instruction and give external knowledge as URL or files (a.k.a [RAG](./docs/RAG.md)). The bot can be shared among application users. The customized bot also can be published as stand-alone API (See the [detail](./docs/PUBLISH_API.md)).
Add your own instructions and give external knowledge as URLs or files (a.k.a. [RAG](https://aws.amazon.com/what-is/retrieval-augmented-generation/)). The bot can be shared among application users. The customized bot can also be published as a stand-alone API (see the [details](./docs/PUBLISH_API.md)).

![](./docs/imgs/bot_creation.png)
![](./docs/imgs/bot_chat.png)
@@ -86,7 +83,7 @@ chmod +x bin.sh
./bin.sh
```

- You will be asked if a new user or using v1. If you are not a continuing user from v0, please enter `y`.
- You will be asked whether you are a new user or are using v2. If you are not a continuing user from v0, please enter `y`.

### Optional Parameters

@@ -131,10 +128,11 @@ It's an architecture built on AWS managed services, eliminating the need for inf
- [Amazon CloudFront](https://aws.amazon.com/cloudfront/) + [S3](https://aws.amazon.com/s3/): Frontend application delivery ([React](https://react.dev/), [Tailwind CSS](https://tailwindcss.com/))
- [AWS WAF](https://aws.amazon.com/waf/): IP address restriction
- [Amazon Cognito](https://aws.amazon.com/cognito/): User authentication
- [Amazon Bedrock](https://aws.amazon.com/bedrock/): Managed service to utilize foundational models via APIs. Claude is used for chat response and Cohere for vector embedding
- [Amazon EventBridge Pipes](https://aws.amazon.com/eventbridge/pipes/): Receiving event from DynamoDB stream and launching ECS task to embed external knowledge
- [Amazon Elastic Container Service](https://aws.amazon.com/ecs/): Run crawling, parsing and embedding tasks. [Cohere Multilingual](https://txt.cohere.com/multilingual/) is the model used for embedding.
- [Amazon Aurora PostgreSQL](https://aws.amazon.com/rds/aurora/): Scalable vector store with [pgvector](https://github.com/pgvector/pgvector) plugin
- [Amazon Bedrock](https://aws.amazon.com/bedrock/): Managed service to utilize foundational models via APIs
- [Amazon Bedrock Knowledge Bases](https://aws.amazon.com/bedrock/knowledge-bases/): Provides a managed interface for Retrieval-Augmented Generation ([RAG](https://aws.amazon.com/what-is/retrieval-augmented-generation/)), offering services for embedding and parsing documents
- [Amazon EventBridge Pipes](https://aws.amazon.com/eventbridge/pipes/): Receiving event from DynamoDB stream and launching Step Functions to embed external knowledge
- [AWS Step Functions](https://aws.amazon.com/step-functions/): Orchestrating ingestion pipeline to embed external knowledge into Bedrock Knowledge Bases
- [Amazon OpenSearch Serverless](https://aws.amazon.com/opensearch-service/features/serverless/): Serves as the backend database for Bedrock Knowledge Bases, providing full-text search and vector search capabilities, enabling accurate retrieval of relevant information
- [Amazon Athena](https://aws.amazon.com/athena/): Query service to analyze S3 bucket

![](docs/imgs/arch.png)
@@ -231,30 +229,6 @@ DEFAULT_GENERATION_CONFIG = {

If you deployed with the CLI and CDK, run `cdk destroy`. If not, open [CloudFormation](https://console.aws.amazon.com/cloudformation/home) and delete `BedrockChatStack` and `FrontendWafStack` manually. Please note that `FrontendWafStack` is in the `us-east-1` region.

### Stopping Vector DB for RAG

By setting [cdk.json](./cdk/cdk.json) in the following CRON format, you can stop and restart Aurora Serverless resources created by the [VectorStore construct](./cdk/lib/constructs/vectorstore.ts). Applying this setting can reduce operating costs. By default, Aurora Serverless is always running. Note that it will be executed in UTC time.

```json
...
"rdbSchedules": {
"stop": {
"minute": "50",
"hour": "10",
"day": "*",
"month": "*",
"year": "*"
},
"start": {
"minute": "40",
"hour": "2",
"day": "*",
"month": "*",
"year": "*"
}
}
```

### Language Settings

This asset automatically detects the language using [i18next-browser-languageDetector](https://github.com/i18next/i18next-browser-languageDetector). You can switch languages from the application menu. Alternatively, you can use a query string to set the language as shown below.
@@ -273,14 +247,6 @@ By default, this sample does not restrict the domains for sign-up email addresse
"allowedSignUpEmailDomains": ["example.com"],
```

### Customize Number of NAT Gateway

By default, this sample deploys 2 NAT Gateways. If you don't need both, you can reduce the number to cut costs. Open `cdk.json` and change the `natgatewayCount` parameter.

```ts
"natgatewayCount": 2
```

### External Identity Provider

This sample supports external identity providers. Currently we support [Google](./docs/idp/SET_UP_GOOGLE.md) and [custom OIDC providers](./docs/idp/SET_UP_CUSTOM_OIDC.md).
@@ -301,6 +267,19 @@ If you want newly created users to automatically join groups, you can specify th

By default, newly created users will be joined to the `CreatingBotAllowed` group.
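A hedged sketch of how this could look in [cdk.json](./cdk/cdk.json); the key name `autoJoinUserGroups` is an assumption for illustration and is not shown in this diff:

```json
"autoJoinUserGroups": ["CreatingBotAllowed"],
```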

### Configure RAG Replicas

`enableRagReplicas` is an option in [cdk.json](./cdk/cdk.json) that controls the replica settings for the RAG database, specifically the Knowledge Bases using Amazon OpenSearch Serverless.

- **Default**: true
- **true**: Enhances availability by enabling additional replicas, making it suitable for production environments but increasing costs.
- **false**: Reduces costs by using fewer replicas, making it suitable for development and testing.

This is an account/region-level setting, affecting the entire application rather than individual bots.
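For example, a development deployment could disable replicas with a minimal [cdk.json](./cdk/cdk.json) entry like the sketch below (surrounding keys omitted):

```json
"enableRagReplicas": false,
```

Redeploying the stack afterwards should apply the setting to the OpenSearch Serverless collection backing the Knowledge Base.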

> [!Note]
> As of June 2024, Amazon OpenSearch Serverless supports 0.5 OCU, lowering entry costs for small-scale workloads. Production deployments can start with 2 OCUs, while dev/test workloads can use 1 OCU. OpenSearch Serverless automatically scales based on workload demands. For more details, see the [announcement](https://aws.amazon.com/jp/about-aws/whats-new/2024/06/amazon-opensearch-serverless-entry-cost-half-collection-types/).

### Local Development

See [LOCAL DEVELOPMENT](./docs/LOCAL_DEVELOPMENT.md).
@@ -316,10 +295,6 @@ Please also take a look at the following guidelines before contributing:
- [Local Development](./docs/LOCAL_DEVELOPMENT.md)
- [CONTRIBUTING](./CONTRIBUTING.md)

### RAG (Retrieval Augmented Generation)

See [here](./docs/RAG.md).

## Contacts

- [Takehiro Suzuki](https://github.com/statefb)
50 changes: 1 addition & 49 deletions backend/app/bedrock.py
@@ -5,7 +5,7 @@
import re
from pathlib import Path

from app.config import BEDROCK_PRICING, DEFAULT_EMBEDDING_CONFIG
from app.config import BEDROCK_PRICING
from app.config import DEFAULT_GENERATION_CONFIG as DEFAULT_CLAUDE_GENERATION_CONFIG
from app.config import DEFAULT_MISTRAL_GENERATION_CONFIG
from app.repositories.models.conversation import ContentModel, MessageModel
@@ -309,51 +309,3 @@ def get_model_id(model: type_model_name) -> str:
return "mistral.mixtral-8x7b-instruct-v0:1"
elif model == "mistral-large":
return "mistral.mistral-large-2402-v1:0"


def calculate_query_embedding(question: str) -> list[float]:
    model_id = DEFAULT_EMBEDDING_CONFIG["model_id"]

    # Currently only supports "cohere.embed-multilingual-v3"
    assert model_id == "cohere.embed-multilingual-v3"

    payload = json.dumps({"texts": [question], "input_type": "search_query"})
    accept = "application/json"
    content_type = "application/json"

    response = client.invoke_model(
        accept=accept, contentType=content_type, body=payload, modelId=model_id
    )
    output = json.loads(response.get("body").read())
    embedding = output.get("embeddings")[0]

    return embedding


def calculate_document_embeddings(documents: list[str]) -> list[list[float]]:
    def _calculate_document_embeddings(documents: list[str]) -> list[list[float]]:
        payload = json.dumps({"texts": documents, "input_type": "search_document"})
        accept = "application/json"
        content_type = "application/json"

        response = client.invoke_model(
            accept=accept, contentType=content_type, body=payload, modelId=model_id
        )
        output = json.loads(response.get("body").read())
        embeddings = output.get("embeddings")

        return embeddings

    BATCH_SIZE = 10
    model_id = DEFAULT_EMBEDDING_CONFIG["model_id"]

    # Currently only supports "cohere.embed-multilingual-v3"
    assert model_id == "cohere.embed-multilingual-v3"

    embeddings = []
    for i in range(0, len(documents), BATCH_SIZE):
        # Split documents into batches to avoid exceeding the payload size limit
        batch = documents[i : i + BATCH_SIZE]
        embeddings += _calculate_document_embeddings(batch)

    return embeddings
51 changes: 6 additions & 45 deletions backend/app/bot_remove.py
@@ -1,55 +1,23 @@
import json
import os
from typing import Any

import boto3
import pg8000
from app.repositories.api_publication import delete_api_key, find_usage_plan_by_id
from app.repositories.api_publication import (
    delete_api_key,
    delete_stack_by_bot_id,
    find_stack_by_bot_id,
    find_usage_plan_by_id,
)
from app.repositories.common import RecordNotFoundError, decompose_bot_id
from aws_lambda_powertools.utilities import parameters
from app.repositories.custom_bot import find_public_bot_by_id

DB_SECRETS_ARN = os.environ.get("DB_SECRETS_ARN", "")
DOCUMENT_BUCKET = os.environ.get("DOCUMENT_BUCKET", "documents")
BEDROCK_REGION = os.environ.get("BEDROCK_REGION", "us-east-1")

s3_client = boto3.client("s3", BEDROCK_REGION)


def delete_from_postgres(bot_id: str):
    """Delete data related to `bot_id` from vector store (i.e. PostgreSQL)."""

    secrets: Any = parameters.get_secret(DB_SECRETS_ARN)  # type: ignore
    db_info = json.loads(secrets)

    conn = pg8000.connect(
        database=db_info["dbname"],
        host=db_info["host"],
        port=db_info["port"],
        user=db_info["username"],
        password=db_info["password"],
    )

    try:
        with conn.cursor() as cursor:
            delete_query = "DELETE FROM items WHERE botid = %s"
            cursor.execute(delete_query, (bot_id,))
            conn.commit()
            print(f"Successfully deleted records for bot_id: {bot_id}")
    except Exception as e:
        conn.rollback()
        print(f"Error deleting records for bot_id: {bot_id}")
        print(e)
    finally:
        conn.close()


def delete_kb_stack_by_bot_id(bot_id: str):
    client = boto3.client("cloudformation")
def delete_custom_bot_stack_by_bot_id(bot_id: str):
    client = boto3.client("cloudformation", BEDROCK_REGION)
    stack_name = f"BrChatKbStack{bot_id}"
    try:
        response = client.delete_stack(StackName=stack_name)
@@ -106,16 +74,9 @@ def handler(event, context):
    bot_id = decompose_bot_id(sk)

    delete_from_s3(user_id, bot_id)
    delete_custom_bot_stack_by_bot_id(bot_id)

    try:
        print(f"Remove Bedrock Knowledge Base Stack.")
        # Remove Knowledge Base Stack
        delete_kb_stack_by_bot_id(bot_id)
    except RecordNotFoundError:
        print(f"Remove records from PostgreSQL.")
        delete_from_postgres(bot_id)

    # Check if cloudformation stack exists
    # Check if api published stack exists
    try:
        stack = find_stack_by_bot_id(bot_id)
    except RecordNotFoundError:
14 changes: 0 additions & 14 deletions backend/app/config.py
@@ -36,20 +36,6 @@ class EmbeddingConfig(TypedDict):
"stop_sequences": ["[INST]", "[/INST]"],
}

# Configure embedding parameter.
DEFAULT_EMBEDDING_CONFIG: EmbeddingConfig = {
# DO NOT change `model_id` (currently other models are not supported)
"model_id": "cohere.embed-multilingual-v3",
# NOTE: consider that cohere allows up to 2048 tokens per request
"chunk_size": 1000,
"chunk_overlap": 200,
"enable_partition_pdf": False,
}

# Configure search parameter to fetch relevant documents from vector store.
DEFAULT_SEARCH_CONFIG = {
"max_results": 20,
}

# Used for price estimation.
# NOTE: The following is based on 2024-03-07
Expand Down