A mini application in Next.js where a user can upload a PDF file and then ask questions about its content, answered with the Gemini API.

mr-crypter/Mini-PDF-Q-A-App-using-RAG

Mini PDF Q&A (Next.js + Pinecone + Gemini)

An end‑to‑end example that lets you:

  • Upload a PDF, extract its text, chunk it, embed with Gemini, and store in Pinecone
  • Ask a question, retrieve the most relevant chunks from Pinecone, and get a grounded answer with Gemini

Tech stack

  • Next.js (App Router)
  • Pinecone JS SDK
  • Google Generative AI SDK (Gemini) for embeddings and answers
  • pdf-parse for PDF text extraction

Prerequisites

  • Node.js 18+
  • Pinecone account with an index created
  • Google AI Studio API key (Gemini)

Environment variables

Create a .env.local file in the project root:

GEMINI_API_KEY=your_gemini_api_key
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX=your_pinecone_index_name

Important: your Pinecone index dimension must match your embedding model. This project uses the Gemini text-embedding-004 model, which produces 768-dimensional vectors, so the index must be created with dimension 768.
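If you still need to create the index, a minimal sketch of the parameters for the Pinecone JS SDK looks like this. The index name, cloud, and region are placeholders, not values from this project:

```typescript
// Sketch: index parameters whose dimension matches text-embedding-004.
// Name, cloud, and region below are assumptions -- adjust to your account.
const indexParams = {
  name: "pdf-qa",            // hypothetical index name
  dimension: 768,            // must match text-embedding-004
  metric: "cosine" as const,
  spec: { serverless: { cloud: "aws" as const, region: "us-east-1" } },
};

// With the SDK installed, creation would be roughly:
//   import { Pinecone } from "@pinecone-database/pinecone";
//   const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
//   await pc.createIndex(indexParams);
console.log(indexParams.dimension); // 768
```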

Install and run

npm install
npm run dev

Open http://localhost:3000

How it works

  • Upload flow (POST /api/upload)
    • Receives a PDF via multipart form-data
    • Extracts text with pdf-parse
    • Splits into ~500‑character chunks
    • Generates embeddings via Gemini text-embedding-004
    • Upserts vectors to Pinecone with text stored in metadata
  • Ask flow (POST /api/ask)
    • Embeds the question
    • Queries Pinecone (topK=3, includeMetadata=true)
    • Concatenates matched chunks as context
    • Calls Gemini (gemini-2.5-flash) to answer using the context
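The ~500-character chunking step above can be sketched with a simple regex split. The exact pattern in src/app/api/upload/route.ts may differ; this is only a minimal illustration:

```typescript
// Sketch: split extracted PDF text into chunks of at most `size` characters.
// [\s\S] matches any character including newlines, unlike `.` without the s flag.
function chunkText(text: string, size = 500): string[] {
  const re = new RegExp(`[\\s\\S]{1,${size}}`, "g");
  return (text.match(re) ?? [])
    .map((c) => c.trim())
    .filter((c) => c.length > 0); // drop empty chunks (avoids dimension-0 upserts)
}

const chunks = chunkText("a".repeat(1200));
console.log(chunks.map((c) => c.length)); // [ 500, 500, 200 ]
```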

Endpoints

  • POST /api/upload
    • Body: multipart form-data with field "file" (application/pdf)
    • Response: { message: string }
  • POST /api/ask
    • Body: { "question": string }
    • Response: { "answer": string, "context": PineconeMatch[] }
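The context array returned by /api/ask mirrors Pinecone matches. As a sketch (the match shape is assumed from the upload flow, which stores chunk text in metadata), the retrieved chunks can be concatenated into the prompt context like this:

```typescript
// Sketch: concatenate retrieved chunk text into a single context string.
// Field names are assumptions based on the flow described above.
interface PineconeMatch {
  id: string;
  score: number;
  metadata?: { text?: string };
}

function buildContext(matches: PineconeMatch[]): string {
  return matches
    .map((m) => m.metadata?.text ?? "")
    .filter((t) => t.length > 0)
    .join("\n\n"); // blank line between chunks
}

const context = buildContext([
  { id: "c1", score: 0.91, metadata: { text: "First chunk." } },
  { id: "c2", score: 0.87, metadata: { text: "Second chunk." } },
]);
console.log(context); // prints the two chunks separated by a blank line
```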

Quick API examples

# Upload a PDF
curl -F "file=@./sample.pdf" http://localhost:3000/api/upload

# Ask a question
curl -X POST http://localhost:3000/api/ask \
  -H "Content-Type: application/json" \
  -d '{"question":"What does the document say about X?"}'

Configuration knobs

  • Chunk size: src/app/api/upload/route.ts (regex for chunking)
  • Retrieval topK: src/app/api/ask/route.ts
  • Models: src/lib/embeddings.ts (text-embedding-004) and src/app/api/ask/route.ts (gemini-2.5-flash)

Project structure

src/
  app/
    api/
      upload/route.ts   # Upload + index PDF chunks to Pinecone
      ask/route.ts      # Retrieve + answer with Gemini
    page.tsx            # Simple UI for upload + ask
    globals.css         # Theme + Tailwind config
  lib/
    embeddings.ts       # Gemini embeddings (text-embedding-004)
    pdf.ts              # PDF text extraction
  services/
    pinecone.ts         # Pinecone client

Troubleshooting

  • PineconeBadRequestError: Vector dimension 0 or mismatch
    • Ensure chunks are non-empty
    • Ensure embeddings.ts returns a non-empty vector array
    • Confirm your Pinecone index is created with dimension 768
  • 401/permission errors
    • Verify GEMINI_API_KEY and PINECONE_API_KEY are set and valid
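A cheap guard before upserting (a hypothetical helper, not part of this project) can catch both failure modes early:

```typescript
// Sketch: validate an embedding before upserting to Pinecone.
// 768 matches text-embedding-004; an empty vector triggers "Vector dimension 0".
function assertEmbedding(vec: number[], expected = 768): void {
  if (vec.length === 0) {
    throw new Error("Empty embedding -- was the source chunk empty?");
  }
  if (vec.length !== expected) {
    throw new Error(`Embedding has ${vec.length} dims, index expects ${expected}`);
  }
}

assertEmbedding(new Array(768).fill(0.1)); // passes silently
try {
  assertEmbedding([]); // throws
} catch (e) {
  console.log((e as Error).message);
}
```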

Deployment

  • Vercel works well: set the environment variables in the project settings
  • Or run locally in production mode:
npm run build
npm start
