
mithunparab/pytorch_siglip

PyTorch SigLIP

This repository contains implementations and experiments for training and fine-tuning the SigLIP model, a multimodal model for image-text tasks. The project is divided into two main directories:

  • pytorch_siglip_finetuning/: Contains scripts and utilities for fine-tuning a pre-trained SigLIP model using Distributed Data Parallel (DDP). Includes support for multilingual captions and zero-shot classification.

  • pytorch_siglip_from_scratch/: Provides an implementation for training the SigLIP model from scratch using Distributed Data Parallel (DDP), including custom dataset handling, model components, and training utilities.
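For orientation, the objective that gives SigLIP its name is a pairwise sigmoid loss: every image-text pair in a batch is scored independently, with matching pairs labeled +1 and all other pairs labeled -1. The following is a minimal, framework-free sketch of that loss and is not code from this repository. The function names are hypothetical, the default `temperature` and `bias` are assumed to follow the initial values reported in the SigLIP paper (both are learnable in practice), and the actual training scripts in the two directories may compute the loss differently.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def l2_normalize(vec):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(c * c for c in vec))
    return [c / norm for c in vec]


def siglip_loss(image_embs, text_embs, temperature=10.0, bias=-10.0):
    """Pairwise sigmoid loss over a batch of image and text embeddings.

    image_embs[i] and text_embs[i] are assumed to describe the same sample.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    total = 0.0
    for i, img in enumerate(images):
        for j, txt in enumerate(texts):
            similarity = sum(a * b for a, b in zip(img, txt))
            logit = temperature * similarity + bias
            label = 1.0 if i == j else -1.0  # +1 on the diagonal, -1 elsewhere
            total += log_sigmoid(label * logit)
    # Sum over all pairs, averaged over the batch size.
    return -total / len(images)
```

Because each pair is an independent binary decision, the loss needs no softmax normalization across the batch, which is the main difference from the CLIP-style contrastive loss.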

Dataset

The Flickr 8k dataset is used for training and evaluation. It consists of images and their corresponding captions in multiple languages. The dataset should be organized as follows:

flickr8k/
├── Images/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
└── captions.txt
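As a rough illustration of how this layout might be read, the sketch below groups captions by image path using only the standard library. It assumes `captions.txt` starts with a header row and then holds one `image_name,caption` pair per line, as in the commonly distributed CSV-style Flickr 8k captions file. The file format used by this repository (in particular for the translated captions) may differ, and `load_captions` is a hypothetical helper, not a function from the codebase.

```python
import csv
from collections import defaultdict
from pathlib import Path


def load_captions(root):
    """Map each image path under root/Images to its list of captions."""
    root = Path(root)
    captions = defaultdict(list)
    with open(root / "captions.txt", newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the assumed "image,caption" header row
        for row in reader:
            if len(row) < 2:
                continue  # ignore blank or malformed lines
            # Captions may contain unquoted commas, so rejoin the remainder.
            caption = ",".join(row[1:]).strip()
            captions[root / "Images" / row[0]].append(caption)
    return dict(captions)
```

Calling `load_captions("flickr8k")` would return a dictionary keyed by paths such as `flickr8k/Images/image1.jpg`, each mapped to the captions listed for that image.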

The captions were translated into Hindi, Marathi, and Hinglish using the IndicTrans2 model.


Refer to the individual README.md files in each directory for detailed documentation.
