An interactive command-line tool that parses PDFs, generates embeddings with Google Gemini, and upserts them into a Pinecone vector database. Built for modern AI data workflows.
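As a rough illustration of the parse, embed, and upsert flow, the sketch below assumes the PDF text has already been extracted. It splits the text into fixed-size, overlapping character chunks and embeds each one. The README does not say how the project chunks text, so `chunkText`, `buildRecords`, the chunk sizes, and the injected `embed` function (standing in for a Gemini embedding call) are all hypothetical. Only the `{ id, values, metadata }` record shape reflects how Pinecone stores vectors.

```javascript
// Hypothetical sketch of parse -> embed -> upsert, assuming the PDF text is
// already extracted. `embed` is an injected async function returning number[]
// (in practice a wrapper around a Gemini embedding call; not shown here).

// Split text into fixed-size character chunks that overlap, so text cut at a
// chunk boundary still appears whole in the neighbouring chunk.
function chunkText(text, size = 1000, overlap = 200) {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

// Embed every chunk and build records in the { id, values, metadata } shape
// that Pinecone upserts use. IDs combine the file name and chunk index.
async function buildRecords(fileName, chunks, embed) {
  const vectors = await Promise.all(chunks.map((chunk) => embed(chunk)));
  return chunks.map((chunk, i) => ({
    id: `${fileName}#${i}`,
    values: vectors[i],
    metadata: { source: fileName, chunk: i, text: chunk },
  }));
}
```

Passing a stub such as `async (text) => [text.length]` for `embed` lets you exercise the logic without API keys. In the real pipeline, the resulting records would be sent to the Pinecone client's upsert call.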
- Interactive CLI with ASCII art banner
- Create Pinecone indexes on the fly
- Parse and upsert PDFs from a folder (no manual path entry)
- Query your vector database with natural language
- Google Gemini embeddings for high-quality vectorization
- Modern, modular code (Node.js, ES Modules)
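Querying with natural language relies on nearest-neighbour search. The query text is embedded with the same model as the documents, and the stored vectors closest to it are returned. Pinecone performs this search server-side, so the local sketch below only illustrates the idea, assuming a cosine metric. `cosineSimilarity` and `topK` are hypothetical helpers, not part of this project or the Pinecone client.

```javascript
// Local illustration of a vector query, assuming a cosine similarity metric.
// Pinecone does this server-side; these helpers are hypothetical.

// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored record against the query vector and keep the k best.
function topK(queryVector, records, k = 3) {
  return records
    .map((r) => ({
      id: r.id,
      score: cosineSimilarity(queryVector, r.values),
      metadata: r.metadata,
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The dimension check mirrors a real constraint. Vectors upserted into or queried against a Pinecone index must match the dimension the index was created with, so the index should be created with the output dimension of the embedding model you use.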
- Clone the repo:
git clone https://github.com/Lilsax/pinecone-data-pipeline.git
cd pinecone-data-pipeline
- Install dependencies:
npm install
- Set up your environment variables. Create a `.env` file in the root directory:
PINECONE_API_KEY=your-pinecone-key
GOOGLE_API_KEY=your-google-api-key
- Add your PDFs to the `files/` directory.
- Run the CLI:
node index.js
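Because the CLI needs both keys before it can talk to Pinecone or Gemini, it helps to fail fast when one is missing. This sketch assumes the `.env` file has already been loaded into `process.env`, for example with a package such as dotenv, which this README does not mention. `requireEnv` is a hypothetical helper.

```javascript
// Hypothetical fail-fast check for required environment variables.
// Assumes .env has already been loaded into process.env (loading not shown).
function requireEnv(names, env = process.env) {
  // Treat unset, empty, and whitespace-only values as missing.
  const missing = names.filter((name) => !env[name] || env[name].trim() === "");
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  // Return only the requested variables as a plain object.
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}
```

At startup, `const { PINECONE_API_KEY, GOOGLE_API_KEY } = requireEnv(["PINECONE_API_KEY", "GOOGLE_API_KEY"]);` would stop the CLI with a clear message instead of failing later inside an API call.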
The CLI menu offers the following options:
- Create Index: Create a new Pinecone index interactively.
- Parse/Upsert PDF: Select a PDF from the `files/` folder and push its embeddings to Pinecone.
- Query: Enter a natural-language query to search your vector database.
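The Parse/Upsert option lets you pick a file instead of typing a path. One way to build that list is to filter the `files/` directory listing down to PDFs and number them. The sketch below uses a hypothetical `listPdfChoices` helper and a hypothetical `{ label, value }` choice shape.

```javascript
// Hypothetical helper that turns a directory listing into numbered menu
// choices. Only names ending in .pdf (any letter case) are kept.
function listPdfChoices(fileNames) {
  return fileNames
    .filter((name) => name.toLowerCase().endsWith(".pdf"))
    .sort((a, b) => a.localeCompare(b))
    .map((name, i) => ({ label: `${i + 1}. ${name}`, value: name }));
}
```

In the CLI, `fileNames` could come from `readdirSync("files")` imported from `node:fs`. Because the extension check ignores case, a file named `report.PDF` is listed too.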
Why this project:
- Showcases a real-world AI data engineering workflow
- Demonstrates modern Node.js practices
- A starting point for production pipelines or a portfolio project
- Easy to extend for your own use cases