
A modern CLI pipeline for parsing PDFs, generating AI embeddings with Google Gemini, and ingesting data into Pinecone vector databases. Features an interactive interface, easy PDF selection, and production-ready code for real-world AI data workflows.


Lilsax/pinecone-data-pipeline



Vector Database Ingestion Pipeline CLI


An interactive CLI tool for parsing PDFs, generating embeddings with Google Gemini, and pushing data to a Pinecone vector database. Built for modern AI data workflows.

Features

  • Interactive CLI with ASCII art banner
  • Create Pinecone Indexes on the fly
  • Parse and upsert PDFs from a folder (no manual path entry)
  • Query your vector database with natural language
  • Google Gemini Embeddings for high-quality vectorization
  • Modern, modular code (Node.js, ES Modules)
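The "no manual path entry" feature implies that the CLI builds its menu from the contents of a folder. A minimal sketch of that filtering step follows; the helper name `selectPdfNames` is hypothetical and not taken from the repository, and the actual implementation may differ.

```javascript
// Hypothetical helper (not from the repository): keep only PDF file names
// and sort them so the selection menu has a stable order.
function selectPdfNames(fileNames) {
  return fileNames
    .filter((name) => name.toLowerCase().endsWith(".pdf"))
    .sort((a, b) => a.localeCompare(b));
}

// Demonstration with a hard-coded listing. In a CLI, the names would
// typically come from a directory read such as fs.readdirSync("files").
const choices = selectPdfNames(["notes.txt", "report.PDF", "paper.pdf"]);
console.log(choices);
```

Matching the extension case-insensitively means files such as `report.PDF` are not silently skipped.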

Quick Start

  1. Clone the repo
    git clone https://github.com/Lilsax/pinecone-data-pipeline.git
    cd pinecone-data-pipeline
  2. Install dependencies
    npm install
  3. Set up your environment variables
    • Create a .env file in the root directory:
      PINECONE_API_KEY=your-pinecone-key
      GOOGLE_API_KEY=your-google-api-key
  4. Add your PDFs to the files/ directory.
  5. Run the CLI
    node index.js
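Both keys from step 3 must be present before the CLI can reach Pinecone or Gemini. As a sketch, a startup check along these lines reports which keys are missing; the function name `missingEnvKeys` is hypothetical, and how the repository itself loads `.env` is not stated in this README.

```javascript
// The two variable names come from the Quick Start above.
const REQUIRED_KEYS = ["PINECONE_API_KEY", "GOOGLE_API_KEY"];

// Hypothetical helper: return the required keys that are unset or blank.
function missingEnvKeys(env, required = REQUIRED_KEYS) {
  return required.filter((key) => !env[key] || env[key].trim() === "");
}

// Demonstration with a sample object; a real CLI would pass process.env
// after the .env file has been loaded.
const missing = missingEnvKeys({ PINECONE_API_KEY: "abc", GOOGLE_API_KEY: "" });
console.log(missing);
```

Checking up front gives a clear error message instead of an authentication failure partway through an upsert.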

Usage

  • Create Index: Create a new Pinecone index interactively.
  • Parse/Upsert PDF: Select a PDF from the files/ folder and push its embeddings to Pinecone.
  • Query: Enter a natural language query to search your vector database.
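Before embedding, PDF text is usually split into smaller pieces so that each vector covers a focused passage. The README does not say how this project splits text, so the following is only a sketch of one common approach, fixed-size chunks with overlap; the function name and the default sizes are assumptions.

```javascript
// Hypothetical helper: split text into chunks of `chunkSize` characters,
// where each chunk repeats the last `overlap` characters of the previous one.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("chunkSize must be positive and 0 <= overlap < chunkSize");
  }
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk reaches the end of the text, so no redundant
    // tail chunk made only of overlap is produced.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

console.log(chunkText("abcdefghij", 4, 1));
```

The overlap keeps a sentence that straddles a boundary at least partly visible in both neighboring chunks, at the cost of embedding a little more text.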

Why This Project?

  • Showcases real-world AI data engineering
  • Demonstrates modern Node.js best practices
  • Usable as a starting point for production work or as a portfolio piece
  • Easy to extend for your own use cases
