
RAG chatbot is a backend application that lets users interact with their documents. The app creates embeddings from the content of uploaded files (.pdf, .txt, .docx) and stores them in a vector database, so relevant passages can be retrieved quickly and accurately at query time. The backend uses MongoDB for data storage and Pinecone as the vector database.
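
Before embeddings are created, file content is usually split into smaller pieces. The following is a minimal sketch of character-based chunking with overlap, not the project's actual implementation (which may rely on a LangChain text splitter). The function name `splitIntoChunks` and the default sizes are assumptions made for illustration.

```typescript
// Hypothetical helper: split text into overlapping chunks before embedding.
// chunkSize and overlap are measured in characters; the defaults are assumptions.
function splitIntoChunks(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("chunkSize must be positive and overlap must be smaller than chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each iteration
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk already reached the end
  }
  return chunks;
}
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, which tends to help retrieval.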

Project Features

  • Processing file: Splits the contents of text-based files and stores their embeddings in the vector DB.
  • Chat with file: Users can ask questions about a file, and the app answers from the file's content.
  • Chat history: Users can view their chat history.
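
To illustrate what "querying among embeddings" means, here is a small in-memory sketch of top-k retrieval by cosine similarity. In the real app this search is delegated to Pinecone; the `StoredChunk` type, the `topK` function, and the choice of cosine similarity as the metric are assumptions for illustration only.

```typescript
// Hypothetical in-memory stand-in for a vector database query.
interface StoredChunk {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks whose embeddings are most similar to the query embedding.
function topK(query: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```

The text of the returned chunks would then be passed to the language model as context for the user's question.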

Tech Stack

  • Backend: Node.js, Express, TypeScript, LangChain
  • Database: Pinecone, MongoDB
  • Models: gemini-pro, text-embedding-004 (via google-genai)

Libraries Used

  • @langchain/core, @langchain/google-genai: AI-powered language generation and interaction with Google Generative AI.
  • @pinecone-database/pinecone: Connect to Pinecone for storing and retrieving vector embeddings.
  • cors: Cross-Origin Resource Sharing configuration middleware.
  • dotenv: Loads environment variables from .env file.
  • express: Fast, minimal web server framework.
  • nodemon: Automatically restarts the server on file changes.
  • multer: Middleware for handling file uploads.
  • pdf-parse: Extracts the contents of .pdf files.
  • mammoth: Extracts the contents of .docx files.
  • typescript: Used to write TypeScript code.
  • ts-node: Executes TypeScript code directly.

Setup and Installation


Prerequisites

  • Node.js
  • MongoDB
  • Pinecone
  • Git (for version control)

Environment Variables

  • Create a .env file in the backend directory and copy the content from .env.example into it.
  • Get the Pinecone API key from
  • Run MongoDB locally or get a connection URI from MongoDB Atlas.
  • All other variables can keep the values from .env.example.
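
A small startup check can catch a missing variable early. The sketch below is an illustration, not code from this repository: the variable names `PINECONE_API_KEY` and `MONGODB_URI` are assumptions, and the real names are the ones listed in .env.example.

```typescript
// Hypothetical config shape; the real variable names live in .env.example.
interface AppConfig {
  pineconeApiKey: string;
  mongoUri: string;
}

// Read required settings from an environment-like object and fail fast if one is missing.
function readConfig(env: Record<string, string | undefined>): AppConfig {
  const required = (name: string): string => {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
  };
  return {
    pineconeApiKey: required("PINECONE_API_KEY"), // assumed name
    mongoUri: required("MONGODB_URI"), // assumed name
  };
}
```

In the app, this would be called with process.env after dotenv has loaded the .env file.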


  1. Clone the Repository:
    git clone
    cd rag-chatbot
  2. Install dependencies:
    npm install
  3. Run the Backend Application:
    npm run build
    npm run start

API Endpoints and Sample Requests

POST /api/documents/process

Processes a document and stores its embeddings in the vector database.


Request:

    POST /api/documents/process

Body:

    file: [Upload your text-based file here]

Response:

    {
        "assetId": "850bae1c-ef2f-48e7-af53-dcc67c086247",
        "message": "Document processed successfully"
    }

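Since only .pdf, .txt, and .docx files are supported, a client or the upload handler may want to check the file name before processing. This is a minimal sketch under that assumption; `getSupportedExtension` is a hypothetical helper, not a function from this repository.

```typescript
// The three file types the README lists as supported.
type SupportedExtension = ".pdf" | ".txt" | ".docx";
const SUPPORTED_EXTENSIONS: readonly SupportedExtension[] = [".pdf", ".txt", ".docx"];

// Return the normalized extension if the file type is supported, otherwise null.
function getSupportedExtension(fileName: string): SupportedExtension | null {
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return null; // no extension, or a name that is only an extension
  const ext = fileName.slice(dot).toLowerCase();
  return SUPPORTED_EXTENSIONS.find((supported) => supported === ext) ?? null;
}
```

A .pdf result would be routed to pdf-parse and a .docx result to mammoth, while .txt content can be read as plain text.
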
POST /api/chat/start

Accepts an assetId and creates a chat session.


Request:

    POST /api/chat/start

Body:

    {
        "assetId": "850bae1c-ef2f-48e7-af53-dcc67c086247"
    }

Response:

    {
        "chatThreadId": "84578f88-30ce-4454-a0ef-ff3973788cae"
    }

POST /api/chat/message

Sends a user message to an active chat session.


Request:

    POST /api/chat/message

Body:

    {
        "chatThreadId": "84578f88-30ce-4454-a0ef-ff3973788cae",
        "message": "What is SQL?"
    }

Response:

    data: SQL stands for Structured Query Language, and it is used to communicate with the Database
    data: . This is a standard language used to perform tasks such as retrieval, updates, insertion and deletion of data from a database.
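
The sample response arrives as several `data:` lines. A client can reassemble the full answer by stripping the prefix and concatenating the pieces. The sketch below assumes each `data:` line carries one fragment of the answer and that fragments are joined without a separator, as the sample suggests; `joinStreamedData` is a hypothetical helper.

```typescript
// Reassemble an answer from a raw response body made of "data:" lines.
function joinStreamedData(raw: string): string {
  return raw
    .split(/\r?\n/)
    .filter((line) => line.startsWith("data:"))
    .map((line) => {
      const value = line.slice("data:".length);
      // Drop the single space that conventionally follows the colon.
      return value.startsWith(" ") ? value.slice(1) : value;
    })
    .join("");
}
```

In a live client the same logic would be applied incrementally as chunks arrive, so the answer can be displayed while it is still being generated.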

POST /api/chat/history

Access chat history via chatThreadId


Request:

    POST /api/chat/history

Body:

    {
        "chatThreadId": "84578f88-30ce-4454-a0ef-ff3973788cae"
    }

Response:

    {
        "chatHistory": {
            "_id": "67307da4bc6d8095c8643e7b",
            "chatThreadId": "84578f88-30ce-4454-a0ef-ff3973788cae",
            "assetId": "850bae1c-ef2f-48e7-af53-dcc67c086247",
            "startedAt": "2024-11-10T09:32:20.564Z",
            "messages": [
                {
                    "timeString": "2024-11-10T09:33:47.966Z",
                    "role": "USER",
                    "message": "What is SQL?",
                    "_id": "67307dfbbc6d8095c8643e7f"
                },
                {
                    "timeString": "2024-11-10T09:33:47.966Z",
                    "role": "AGENT",
                    "message": "SQL stands for Structured Query Language, and it is used to communicate with the Database. This is a standard language used to perform tasks such as retrieval, updates, insertion and deletion of data from a database.",
                    "_id": "67307dfbbc6d8095c8643e80"
                }
            ],
            "__v": 0
        }
    }

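The history response can be typed on the client side. The interfaces below are inferred from the sample response only, so they are a sketch rather than the project's actual Mongoose schema; `formatTranscript` is a hypothetical helper that shows how the typed data might be used.

```typescript
// Types inferred from the sample response; the real schema may contain more fields.
type ChatRole = "USER" | "AGENT";

interface ChatMessage {
  _id: string;
  role: ChatRole;
  message: string;
  timeString: string; // ISO 8601 timestamp
}

interface ChatHistory {
  _id: string;
  chatThreadId: string;
  assetId: string;
  startedAt: string; // ISO 8601 timestamp
  messages: ChatMessage[];
  __v: number;
}

// Render a chat history as one line per message, prefixed with the sender's role.
function formatTranscript(history: ChatHistory): string {
  return history.messages.map((m) => `[${m.role}] ${m.message}`).join("\n");
}
```
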
Development Choices

Why Node.js?

  • Excellent package ecosystem
  • Strong async/await support
  • Easy deployment options

Why TypeScript?

  • Catches many errors during development, before the code runs
  • Type safety
  • Faster development

Why MongoDB?

  • Flexible schema for chat and document data
  • Easy to scale
  • Free tier available on MongoDB Atlas

Why Pinecone?

  • Clear documentation
  • Free tier available on the Pinecone console


This app is deployed on Render:

Potential Improvements

  • Split the backend into microservices
  • Upload files to Cloudinary or S3


This project was completed with the help of various online resources, including:

  • Google and Stack Overflow, for debugging and library documentation
  • Mongoose and Pinecone docs
  • YouTube tutorials on LangChain and vector databases