An end-to-end pipeline for semantic product search using multimodal CLIP embeddings and vector search.
This system enables matching an input image or text query against a product catalog by leveraging:
- CLIP (Contrastive Language-Image Pre-training): Generates 512-dimensional joint embeddings for images and text.
- Pinecone: A vector database used for efficient nearest-neighbor search over the embeddings.
- MongoDB: Stores and retrieves structured product metadata.
- ONNX Runtime: Optimizes CLIP for faster inference.
The matching process follows these steps:
- Input Processing:
  - An input image or text query is submitted through the Gradio interface and passed to the CLIP model.
  - CLIP generates a 512-dimensional embedding vector representing the input.
- Vector Search:
  - The generated embedding is used to query the Pinecone vector database.
  - Pinecone returns the nearest embeddings from the indexed product catalog, ranked by cosine similarity.
- Metadata Lookup:
  - For each of the top matches, the corresponding product details (name, price, category, etc.) are retrieved from MongoDB.
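The three stages above can be sketched end to end with in-memory stand-ins. This is only an illustration of the data flow, not the repository's code: `VECTOR_INDEX` stands in for Pinecone, `PRODUCTS` for MongoDB, and the tiny 3-dimensional vectors for CLIP's 512-dimensional embeddings. All names and sample values here are hypothetical.

```python
import math

# Hypothetical stand-in for the Pinecone index: (vector_id, product_id, embedding).
VECTOR_INDEX = [
    ("001_1", "001", [0.9, 0.1, 0.0]),
    ("002_1", "002", [0.1, 0.9, 0.1]),
    ("002_2", "002", [0.0, 0.8, 0.6]),
]

# Hypothetical stand-in for the MongoDB collection, keyed by product id.
PRODUCTS = {
    "001": {"name": "Placeholder Product", "category": "Placeholder", "price": 1.00},
    "002": {"name": "Vanilla & Coconut Shower Gel", "category": "Personal Care", "price": 5.49},
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match(query_embedding, top_k=2):
    """Vector search (brute force here) followed by metadata lookup."""
    scored = [
        (cosine_similarity(query_embedding, embedding), product_id)
        for _, product_id, embedding in VECTOR_INDEX
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"score": score, **PRODUCTS[product_id]}
        for score, product_id in scored[:top_k]
    ]


# In the real pipeline this vector would come from CLIP.
results = match([0.0, 1.0, 0.2], top_k=2)
for result in results:
    print(f"{result['name']}: {result['score']:.3f}")
```

Because each product image is indexed separately in this sketch, one product can appear more than once in the top-K list. Whether to deduplicate by product id is a design choice the sketch leaves open.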
.
├── src/
│   ├── app.py                 # Gradio demo interface
│   ├── ingest_data.py         # Script to process images & metadata
│   ├── matcher.py             # Semantic matching module
│   ├── mongo_db.py            # MongoDB client for metadata
│   ├── mongodb_logger.py      # MongoDB logger
│   └── vector_db.py           # Pinecone client for vector database
├── images/                    # Product images (for ingestion)
├── example_query_images/      # Example query images
├── media/                     # Demo GIFs and other assets
├── metadata/                  # JSON metadata files
├── models/                    # For storing quantized ONNX models (FP32 & FP16)
├── quantization/              # Scripts for model quantization
├── environment.yml            # Conda environment file
├── README.md                  # Readme
├── .env                       # Environment file with API keys for Pinecone & MongoDB
└── requirements.txt           # Python dependencies
Follow these steps to set up and run the project:
-
Clone the Repository and Install Dependencies:

    git clone https://github.com/ashwin-ned/product-matching.git
    cd product-matching
    conda env create -f environment.yml

or install with pip:

    pip install -r requirements.txt

-
Prepare Images and Metadata:
- Place all product images in the images/ directory.
- Create a single JSON file named products.json in the metadata/ directory. This file should contain an array of product entries.

Example products.json entry:

    {
      "id": "002",
      "name": "Vanilla & Coconut Shower Gel",
      "category": "Personal Care",
      "price": 5.49,
      "description": "Refreshing shower gel with vanilla fragrance and coconut extracts.",
      "images": ["002_1.jpg", "002_2.jpg"]
    }
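Before ingesting, it can be useful to check that the metadata file and the images directory agree. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and treating every field of the example entry as required is an assumption.

```python
import json
from pathlib import Path

# Fields taken from the example entry; treating all of them as required is an assumption.
REQUIRED_FIELDS = {"id", "name", "category", "price", "description", "images"}


def validate_products(products, images_dir):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    seen_ids = set()
    for index, product in enumerate(products):
        label = product.get("id", f"entry #{index}")
        missing = REQUIRED_FIELDS - set(product)
        if missing:
            problems.append(f"{label}: missing fields {sorted(missing)}")
        if label in seen_ids:
            problems.append(f"{label}: duplicate id")
        seen_ids.add(label)
        # Every listed image should exist in the images directory.
        for filename in product.get("images", []):
            if not (Path(images_dir) / filename).is_file():
                problems.append(f"{label}: image not found: {filename}")
    return problems


def validate_file(metadata_file, images_dir):
    """Load the JSON array from disk and validate it."""
    with open(metadata_file, encoding="utf-8") as handle:
        products = json.load(handle)
    if not isinstance(products, list):
        return ["top-level JSON value must be an array of product entries"]
    return validate_products(products, images_dir)
```

Calling `validate_file("metadata/products.json", "images")` from the project root would return an empty list when every entry has the expected fields and every listed image exists.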
-
Configure Environment Variables: Create a .env file in the project's src/ folder (or in the root, adjusting paths in the scripts if necessary) with the following keys:

    # Pinecone Configuration
    PINECONE_API_KEY="your_pinecone_api_key"
    PINECONE_ENV="your_pinecone_env"        # e.g., us-west1-gcp, aws-us-east-1, etc.
    PINECONE_INDEX="your_pinecone_index"    # Or your desired Pinecone index name

    # MongoDB Configuration
    MONGO_URI="your_mongo_db_connection_string"
    MONGO_DB_NAME="your_mongo_db_name"
    MONGO_COLLECTION="your_mongo_db_collection_name"
    LOGGER_MONGO_URI="your_logger_connection_string"
    LOGGER_DB_NAME="your_logger_db_name"
    LOGGER_MONGO_COLLECTION="your_logger_collection_name"

Warning: Make sure no value in the .env file is split across lines or ends with a stray newline character ('\n'). This can result in an invalid header value error.

-
Quantize the Model: To optimize the model for inference, run the quantization script ./quantization/quantize_clip.py (optionally, test that the models are quantized correctly with ./quantization/test_inference.py):

    python quantization/quantize_clip.py
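One common way to sanity-check a reduced-precision model is to compare its embedding against the full-precision one. The sketch below does not run either ONNX model and is not taken from test_inference.py. It only simulates FP16 rounding of an output vector with the standard library, using made-up values, to show the kind of comparison one might make.

```python
import math
import struct


def to_fp16(values):
    """Round each float to IEEE 754 half precision and back (simulates FP16 storage)."""
    return [struct.unpack("<e", struct.pack("<e", v))[0] for v in values]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical full-precision embedding values; a real check would use
# the outputs of the FP32 and FP16 models for the same input.
reference = [0.0123, -0.4567, 0.8910, 0.2345, -0.0678]
reduced = to_fp16(reference)

similarity = cosine_similarity(reference, reduced)
max_abs_error = max(abs(r - q) for r, q in zip(reference, reduced))
print(f"cosine similarity: {similarity:.6f}, max abs error: {max_abs_error:.2e}")
```

A similarity very close to 1 suggests the ranking from vector search should be largely unaffected, although the acceptable threshold is a judgment call for each catalog.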
-
Ingest Data: Run the data ingestion script. This script reads images from ./images/ and metadata from ./metadata/products.json, generates CLIP embeddings, upserts them to Pinecone, and stores metadata in MongoDB.

    cd src && python ingest_data.py --images_dir ../images/ --metadata_file ../metadata/products.json
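The ingestion flow described above can be pictured as turning each product image into one vector record and sending the records in batches. The sketch below is an assumption about how that might look and is not the logic of ingest_data.py: the vector id scheme, the metadata shape, the batch size, and the `fake_embed` function are all hypothetical.

```python
from itertools import islice


def build_records(products, embed_image):
    """Yield one (vector_id, embedding, metadata) tuple per product image."""
    for product in products:
        for filename in product["images"]:
            vector_id = filename.rsplit(".", 1)[0]  # e.g. "002_1.jpg" -> "002_1"
            yield vector_id, embed_image(filename), {"product_id": product["id"]}


def batched(iterable, size):
    """Group items into lists of at most `size` elements."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


def fake_embed(filename):
    """Hypothetical embedder; a real one would run CLIP and return 512 floats."""
    return [float(len(filename)), 0.0, 1.0]


products = [
    {"id": "002", "name": "Vanilla & Coconut Shower Gel", "images": ["002_1.jpg", "002_2.jpg"]},
    {"id": "003", "name": "Placeholder Product", "images": ["003_1.jpg"]},
]

# Each batch would be passed to the vector database's upsert call.
batches = list(batched(build_records(products, fake_embed), size=2))
for batch in batches:
    print([vector_id for vector_id, _, _ in batch])
```

Keeping the product id in each vector's metadata is what lets the matcher look up the full product document in MongoDB after a search.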
-
Run the Gradio Demo: Launch the web interface from the src/ directory:

    python app.py

This opens a local web interface where you can query by image or text.
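If startup fails with the invalid header value error mentioned in the .env warning, a quick check of the loaded values can help locate the offending key. This is a minimal sketch with a hypothetical `check_env` helper. It assumes the variables have already been loaded into the process environment, and it treats every key from the .env template as required.

```python
import os

# Key names copied from the .env template.
REQUIRED_KEYS = [
    "PINECONE_API_KEY", "PINECONE_ENV", "PINECONE_INDEX",
    "MONGO_URI", "MONGO_DB_NAME", "MONGO_COLLECTION",
    "LOGGER_MONGO_URI", "LOGGER_DB_NAME", "LOGGER_MONGO_COLLECTION",
]


def check_env(env):
    """Return a list of problems found in a mapping of environment variables."""
    problems = []
    for key in REQUIRED_KEYS:
        value = env.get(key)
        if not value:
            problems.append(f"{key} is missing or empty")
        elif value != value.strip() or "\n" in value or "\r" in value:
            problems.append(f"{key} contains stray whitespace or a newline")
    return problems


# Report any problems in the current process environment.
for problem in check_env(os.environ):
    print(problem)
```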
Once the Gradio demo is running:
- Image Query: Upload a photo of a product. The application will display either the top match only (via the check button) or the top-K matching products from the catalog.
- Text Query: Type a description of a product. The application will perform the same embedding and search process to find and display the top matches.
