GitHub - mr-crypter/Mini-PDF-Q-A-App-using-RAG: a mini application in Next.js where a user can upload a PDF file and then ask questions based on its content using the OpenAI API/ Gemini

Mini PDF Q&A (Next.js + Pinecone + Gemini)

An end‑to‑end example that lets you:

Upload a PDF, extract its text, chunk it, embed with Gemini, and store in Pinecone
Ask a question, retrieve the most relevant chunks from Pinecone, and get a grounded answer with Gemini

Tech stack

Next.js (App Router)
Pinecone JS SDK
Google Generative AI SDK (Gemini) for embeddings and answers
pdf-parse for PDF text extraction

Prerequisites

Node.js 18+
Pinecone account with an index created
Google AI Studio API key (Gemini)

Environment variables

Create a .env.local file in the project root:

GEMINI_API_KEY=your_gemini_api_key
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX=your_pinecone_index_name

Important: Your Pinecone index dimension must match the output size of your embedding model. This project uses the Gemini model text-embedding-004, which produces 768-dimensional vectors, so create your index with dimension 768.
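
A mismatch is easier to diagnose if it is caught before the upsert. The following is a minimal sketch, not code from this repository: assertEmbeddingDimension and EXPECTED_DIMENSION are hypothetical names, and the value 768 is taken from the note above.

```typescript
// Assumption: 768 is the output size of text-embedding-004, as stated in this README.
const EXPECTED_DIMENSION = 768;

// Hypothetical helper: throws when a vector does not match the index dimension.
function assertEmbeddingDimension(
  values: number[],
  expected: number = EXPECTED_DIMENSION
): void {
  if (values.length !== expected) {
    throw new Error(
      `Embedding has ${values.length} dimensions, expected ${expected}`
    );
  }
}
```

Calling a check like this on each embedding just before the upsert would surface an empty or wrongly sized vector with a clear message instead of a PineconeBadRequestError.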

Install and run

npm install
npm run dev

Open http://localhost:3000

How it works

Upload flow (POST /api/upload)
- Receives a PDF via multipart form-data
- Extracts text with pdf-parse
- Splits into ~500‑character chunks
- Generates embeddings via Gemini text-embedding-004
- Upserts vectors to Pinecone with text stored in metadata
Ask flow (POST /api/ask)
- Embeds the question
- Queries Pinecone (topK=3, includeMetadata=true)
- Concatenates matched chunks as context
- Calls Gemini (gemini-2.5-flash) to answer using the context
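
The "concatenate matched chunks as context" step of the ask flow can be sketched as two small pure functions. This is an illustration under assumptions, not the code in src/app/api/ask/route.ts: the metadata key "text", the RetrievedMatch type, and the prompt wording are all hypothetical.

```typescript
// Assumed shape of a match whose chunk text was stored in metadata at upload time.
// The metadata key "text" is an assumption; use whatever key the upload route writes.
interface RetrievedMatch {
  id: string;
  score?: number;
  metadata?: { text?: string };
}

// Join the chunk text of each match, skipping matches without usable text.
function buildContext(matches: RetrievedMatch[]): string {
  return matches
    .map((m) => m.metadata?.text?.trim() ?? "")
    .filter((t) => t.length > 0)
    .join("\n\n");
}

// Hypothetical prompt wording; the project's actual prompt may differ.
function buildPrompt(question: string, context: string): string {
  return [
    "Answer the question using only the context below.",
    "If the context does not contain the answer, say so.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Keeping prompt assembly separate from the Pinecone and Gemini calls makes it possible to test grounding behavior without network access.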

Endpoints

POST /api/upload
- Body: multipart form-data with field "file" (application/pdf)
- Response: { message: string }
POST /api/ask
- Body: { "question": string }
- Response: { "answer": string, "context": PineconeMatch[] }
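
For callers written in TypeScript, the request and response bodies above can be described with a few types. This is a sketch: the fields of PineconeMatch are assumed (the README only names the type), and isAskRequest is a hypothetical guard, not part of the project.

```typescript
// Assumed fields; this README names PineconeMatch but does not define it.
interface PineconeMatch {
  id: string;
  score?: number;
  metadata?: Record<string, unknown>;
}

// Body of POST /api/ask.
interface AskRequest {
  question: string;
}

// Response of POST /api/ask.
interface AskResponse {
  answer: string;
  context: PineconeMatch[];
}

// Response of POST /api/upload.
interface UploadResponse {
  message: string;
}

// Hypothetical guard: accepts only objects with a non-blank string "question".
function isAskRequest(body: unknown): body is AskRequest {
  if (typeof body !== "object" || body === null) return false;
  const question = (body as { question?: unknown }).question;
  return typeof question === "string" && question.trim().length > 0;
}
```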

Quick API examples

# Upload a PDF
curl -F "file=@./sample.pdf" http://localhost:3000/api/upload

# Ask a question
curl -X POST http://localhost:3000/api/ask \
  -H "Content-Type: application/json" \
  -d '{"question":"What does the document say about X?"}'

Configuration knobs

Chunk size: src/app/api/upload/route.ts (regex for chunking)
Retrieval topK: src/app/api/ask/route.ts
Models: src/lib/embeddings.ts (text-embedding-004) and src/app/api/ask/route.ts (gemini-2.5-flash)
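
As an illustration of the chunk-size knob, regex-based chunking into pieces of at most 500 characters might look like the sketch below. The function name chunkText and the exact pattern are assumptions; the regex in src/app/api/upload/route.ts may differ.

```typescript
// Hypothetical helper: split text into chunks of at most `size` characters.
function chunkText(text: string, size: number = 500): string[] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  // [\s\S] matches any character, including newlines.
  const pattern = new RegExp(`[\\s\\S]{1,${size}}`, "g");
  const pieces = text.match(pattern) ?? [];
  // Drop whitespace-only chunks so nothing empty is sent for embedding.
  return pieces.map((c) => c.trim()).filter((c) => c.length > 0);
}
```

Fixed-size chunks can split sentences in the middle. Changing the size argument, or switching to sentence-aware splitting, trades retrieval precision against the amount of context each match carries.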

Project structure

src/
  app/
    api/
      upload/route.ts   # Upload + index PDF chunks to Pinecone
      ask/route.ts      # Retrieve + answer with Gemini
    page.tsx            # Simple UI for upload + ask
    globals.css         # Theme + Tailwind config
  lib/
    embeddings.ts       # Gemini embeddings (text-embedding-004)
    pdf.ts              # PDF text extraction
  services/
    pinecone.ts         # Pinecone client

Troubleshooting

PineconeBadRequestError: Vector dimension 0 or mismatch
- Ensure chunks are non-empty
- Ensure embeddings.ts returns a non-empty vector array
- Confirm your Pinecone index is created with dimension 768
401/permission errors
- Verify GEMINI_API_KEY and PINECONE_API_KEY are set and valid
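
A quick way to rule out missing configuration is to check the three variables from .env.local at startup. The helper below is a hypothetical sketch, not part of the project; it takes the environment as an argument so it can be tested without touching the real process environment.

```typescript
// Variable names as listed in the "Environment variables" section of this README.
const REQUIRED_ENV_VARS = [
  "GEMINI_API_KEY",
  "PINECONE_API_KEY",
  "PINECONE_INDEX",
];

// Hypothetical helper: returns the names of variables that are unset or blank.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a route handler or startup script this could be called as missingEnvVars(process.env), returning an error early if the result is non-empty. Note that it only detects missing values; an invalid or revoked key still shows up as a 401 from the provider.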

Deployment

Vercel works well: set the environment variables (GEMINI_API_KEY, PINECONE_API_KEY, PINECONE_INDEX) in the project settings
Or run locally in production mode:

npm run build
npm start
