RAG CHATBOT

RAG chatbot is a backend application that lets users interact with their documents through chat. The app creates embeddings from the content of uploaded files (.pdf, .txt, .docx) and stores them in a vector database, so that the passages most relevant to a question can be retrieved quickly and accurately later on. The backend uses MongoDB for data storage and Pinecone as the vector database.
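The retrieval step described above is essentially a nearest-neighbour search over embeddings. As an illustration only, the sketch below performs that search in memory with cosine similarity. In the app, Pinecone does this at scale, and the metric it uses depends on how the index was created. `StoredChunk`, `cosineSimilarity`, and `topK` are hypothetical names, not the project's code.

```typescript
// Hypothetical in-memory stand-in for a vector-database query.
// Each stored chunk keeps its text alongside its embedding vector.
interface StoredChunk {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query embedding and keep the k best matches.
function topK(
  query: number[],
  chunks: StoredChunk[],
  k: number,
): Array<StoredChunk & { score: number }> {
  return chunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 3-dimensional embeddings (real embedding vectors are much longer).
const chunks: StoredChunk[] = [
  { id: "a", text: "SQL is a query language.", embedding: [1, 0, 0] },
  { id: "b", text: "Cats are mammals.", embedding: [0, 1, 0] },
  { id: "c", text: "SELECT retrieves rows.", embedding: [0.9, 0.1, 0] },
];
console.log(topK([1, 0, 0], chunks, 2).map((c) => c.id)); // [ 'a', 'c' ]
```

The texts of the top-scoring chunks are what a RAG pipeline would then pass to the chat model as context.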

Project Features

Processing file: Splits the contents of text-based files into chunks and stores their embeddings in the vector DB.
Chat with file: Users can ask questions about a file's content, and the app answers using the most relevant retrieved passages.
Chat history: Users can view the message history of a chat session.
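The "Processing file" step splits extracted text into chunks before embedding them. A minimal sketch of fixed-size chunking with overlap follows, assuming character-based sizes. The app may instead rely on a LangChain text splitter with different parameters; `chunkText`, `chunkSize`, and `overlap` are hypothetical names.

```typescript
// Hypothetical fixed-size chunker with overlap (sizes measured in characters).
// Overlap keeps text that straddles a boundary visible in both neighbouring chunks.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in [0, chunkSize)");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window already reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

console.log(chunkText("abcdefghij", 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```

Each chunk would then be embedded and upserted into the vector database individually, so a query can match a specific passage rather than the whole file.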

Tech Stack

Backend: Node.js, Express, TypeScript, LangChain
Database: Pinecone, MongoDB
Models: gemini-pro (chat) and text-embedding-004 (embeddings) by Google Generative AI

Libraries Used

@langchain/core, @langchain/google-genai: LangChain core abstractions and the integration with Google Generative AI models for embeddings and text generation.
@pinecone-database/pinecone: Client for storing and querying vector embeddings in Pinecone.
cors: Middleware for configuring Cross-Origin Resource Sharing.
dotenv: Loads environment variables from a .env file.
express: Fast, minimal web server framework.
nodemon: Automatically restarts the server on file changes.
multer: Middleware for handling multipart/form-data file uploads.
pdf-parse: Extracts the text of .pdf files.
mammoth: Extracts the text of .docx files.
typescript: The TypeScript compiler.
ts-node: Executes TypeScript directly, without a separate compile step.

Setup and Installation

Prerequisites

Node.js
MongoDB (local instance or MongoDB Atlas)
Pinecone account
Git (for version control)

Environment Variables

Create a .env file in the backend directory and copy the contents of .env.example into it.
Get a Pinecone API key from https://app.pinecone.io.
Run MongoDB locally, or get a connection URI string from MongoDB Atlas.
The remaining variables can be left as they are in .env.example.
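A small startup check can fail fast when a required variable is missing. The variable names below (`PINECONE_API_KEY`, `MONGODB_URI`, `GOOGLE_API_KEY`, `PORT`) and the default port are hypothetical; use the keys actually listed in .env.example. In the app, dotenv would populate `process.env` before a check like this runs. The function takes the environment as a parameter so it can be tested in isolation.

```typescript
// Hypothetical variable names: replace with the keys listed in .env.example.
const REQUIRED_KEYS = ["PINECONE_API_KEY", "MONGODB_URI", "GOOGLE_API_KEY"] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];

interface AppConfig {
  vars: Record<RequiredKey, string>;
  port: number;
}

// Validate an environment map (e.g. process.env after dotenv has loaded .env).
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  // Treat unset and empty-string values as missing.
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const vars = {} as Record<RequiredKey, string>;
  for (const key of REQUIRED_KEYS) {
    vars[key] = env[key] as string;
  }
  // Fall back to a hypothetical default port when PORT is unset.
  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return { vars, port };
}
```

Reporting every missing key at once, rather than failing on the first, saves a restart cycle when several values are absent.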

Steps

Clone the Repository:
```
git clone https://github.com/ayushjaiz/rag-chatbot
cd rag-chatbot
```

Install dependencies:
```
npm install
```
Run the Backend Application:
```
npm run build
npm run start
```

API Endpoints and Sample Requests

POST `/api/documents/process`

Processes an uploaded document and stores its embeddings in the vector database.

Request:

POST /api/documents/process

Body (form-data)

file: [the text-based file to upload: .pdf, .txt, or .docx]

Response:

```
{
    "assetId": "850bae1c-ef2f-48e7-af53-dcc67c086247",
    "message": "Document processed successfully"
}
```
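A minimal client-side sketch of calling this endpoint for a plain .txt file, assuming Node.js 18+ (global `fetch`, `FormData`, `Blob`) and a multipart form field named `file` as in the request above. `uploadDocument` is a hypothetical helper, and the base URL `http://localhost:3000` in the example is an assumption. The fetch function is injectable so the sketch can be exercised without a running server.

```typescript
interface ProcessResponse {
  assetId: string;
  message: string;
}

// Upload text content to /api/documents/process as multipart form data.
// fetchFn defaults to the global fetch (Node.js 18+) but can be swapped for testing.
async function uploadDocument(
  baseUrl: string,
  fileName: string,
  content: string,
  fetchFn: typeof fetch = fetch,
): Promise<ProcessResponse> {
  const form = new FormData();
  // The field name "file" matches the request body shown above.
  form.append("file", new Blob([content], { type: "text/plain" }), fileName);

  const res = await fetchFn(`${baseUrl}/api/documents/process`, {
    method: "POST",
    body: form, // fetch sets the multipart Content-Type and boundary itself
  });
  if (!res.ok) {
    throw new Error(`Upload failed with HTTP ${res.status}`);
  }
  return (await res.json()) as ProcessResponse;
}

// Example (requires the server to be running):
// const { assetId } = await uploadDocument("http://localhost:3000", "notes.txt", "SQL is ...");
```

The returned `assetId` is the value the chat endpoints expect.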

POST `/api/chat/start`

Accepts an assetId and creates a chat session for that document.

Request:

POST /api/chat/start

Body

```
{
  "assetId": "850bae1c-ef2f-48e7-af53-dcc67c086247"
}
```

Response:

```
{
    "chatThreadId": "84578f88-30ce-4454-a0ef-ff3973788cae"
}
```
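A matching sketch for starting a session, under the same assumptions as any Node.js 18+ fetch client. It also assumes the server parses JSON request bodies, as the sample body suggests. `startChat` is a hypothetical helper that returns only the `chatThreadId`.

```typescript
interface StartChatResponse {
  chatThreadId: string;
}

// Create a chat session bound to a previously processed document (assetId).
async function startChat(
  baseUrl: string,
  assetId: string,
  fetchFn: typeof fetch = fetch,
): Promise<string> {
  const res = await fetchFn(`${baseUrl}/api/chat/start`, {
    method: "POST",
    headers: { "Content-Type": "application/json" }, // body is sent as JSON
    body: JSON.stringify({ assetId }),
  });
  if (!res.ok) {
    throw new Error(`Starting chat failed with HTTP ${res.status}`);
  }
  const data = (await res.json()) as StartChatResponse;
  return data.chatThreadId;
}

// Example (requires the server to be running):
// const threadId = await startChat("http://localhost:3000", "850bae1c-ef2f-48e7-af53-dcc67c086247");
```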

POST `/api/chat/message`

Sends a user message to an active chat session.

Request:

POST /api/chat/message

Body

```
{
    "chatThreadId": "84578f88-30ce-4454-a0ef-ff3973788cae",
    "message": "What is SQL?"
}
```

Response (stream):

```
data: SQL stands for Structured Query Language, and it is used to communicate with the Database

data: . This is a standard language used to perform tasks such as retrieval, updates, insertion and deletion of data from a database.
```
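The `data:` prefixes suggest the reply is streamed as Server-Sent Events. Assuming that format with LF line endings, a client has to buffer incoming text, split it into events on blank lines, and concatenate the `data:` payloads. The sketch below does this incrementally, so it still works when a network chunk ends mid-line. `SseDataParser` is a hypothetical helper, and it ignores other SSE fields such as `event:` and `id:`.

```typescript
// Incremental parser for the data: fields of a Server-Sent Events stream.
// Assumes LF ("\n") line endings; events are separated by a blank line.
class SseDataParser {
  private buffer = "";

  // Feed raw text as it arrives; returns the data payload of each completed event.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const events: string[] = [];
    let boundary = this.buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const rawEvent = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2); // keep any partial event
      const dataLines = rawEvent
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        // Per the SSE format, drop one optional space after the colon.
        .map((line) => line.slice(5).replace(/^ /, ""));
      if (dataLines.length > 0) {
        // Multiple data: lines within one event are joined with newlines.
        events.push(dataLines.join("\n"));
      }
      boundary = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}

// The sample stream above, delivered in two network chunks split mid-field.
const parser = new SseDataParser();
const parts = [
  ...parser.push("data: SQL stands for Structured Query Language, and it is used to communicate with the Database\n\nda"),
  ...parser.push("ta: . This is a standard language used to perform tasks such as retrieval, updates, insertion and deletion of data from a database.\n\n"),
];
console.log(parts.join("")); // the reassembled reply
```

Joining the payloads reproduces the full answer that later appears as the AGENT message in the chat history. In a real client, the chunks fed to `push` would come from decoding `response.body` (for example with `TextDecoder`).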

POST `/api/chat/history`

Returns the chat history for a given chatThreadId.

Request:

POST /api/chat/history

Body

```
{
  "chatThreadId": "84578f88-30ce-4454-a0ef-ff3973788cae"
}
```

Response:

```
{
  "chatHistory": {
      "_id": "67307da4bc6d8095c8643e7b",
      "chatThreadId": "84578f88-30ce-4454-a0ef-ff3973788cae",
      "assetId": "850bae1c-ef2f-48e7-af53-dcc67c086247",
      "startedAt": "2024-11-10T09:32:20.564Z",
      "messages": [
          {
              "timeString": "2024-11-10T09:33:47.966Z",
              "role": "USER",
              "message": "What is SQL?",
              "_id": "67307dfbbc6d8095c8643e7f"
          },
          {
              "timeString": "2024-11-10T09:33:47.966Z",
              "role": "AGENT",
              "message": "SQL stands for Structured Query Language, and it is used to communicate with the Database. This is a standard language used to perform tasks such as retrieval, updates, insertion and deletion of data from a database.",
              "_id": "67307dfbbc6d8095c8643e80"
          }
      ],
      "__v": 0
  }
}
```
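The response shape above can be captured with TypeScript types on the client. The interfaces below are inferred from the sample response only; the server's actual Mongoose schema may define more fields or constraints. `formatTranscript` is a hypothetical helper that renders a thread as plain text in chronological order.

```typescript
// Types inferred from the sample chat history response.
type Role = "USER" | "AGENT";

interface ChatMessage {
  _id: string;
  timeString: string; // ISO 8601 timestamp
  role: Role;
  message: string;
}

interface ChatHistory {
  _id: string;
  chatThreadId: string;
  assetId: string;
  startedAt: string;
  messages: ChatMessage[];
  __v: number; // Mongoose version key
}

interface ChatHistoryResponse {
  chatHistory: ChatHistory;
}

// Render a thread as "ROLE: message" lines, oldest first.
// Array.prototype.sort is stable, so messages with equal timestamps keep server order.
function formatTranscript(res: ChatHistoryResponse): string {
  return [...res.chatHistory.messages]
    .sort((a, b) => Date.parse(a.timeString) - Date.parse(b.timeString))
    .map((m) => `${m.role}: ${m.message}`)
    .join("\n");
}
```

In the sample, both messages share the same `timeString`, which is why the sort relies on stability rather than on timestamps alone.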

Development Choices

Why Node.js?

Excellent package ecosystem
Strong async/await support
Easy deployment options

Why Typescript?

Catches type errors during development instead of at runtime
Type safety
Faster development

Why MongoDB?

Flexible schema for chat history data
Easy to scale
Free tier available on MongoDB Atlas

Why Pinecone?

Clear, easy-to-follow documentation
Free tier available on the Pinecone console

Deployment

This app is deployed on Render: https://rag-chatbot-0fjv.onrender.com

Potential Improvements

Split the backend into microservices
Upload files to Cloudinary/S3

Acknowledgements

This project was completed with the assistance of various online resources. I utilized the following tools and sources to support the development of this application:

Google + Stack Overflow: for debugging and library documentation
Mongoose and Pinecone docs
YouTube tutorials on LangChain and vector databases
