
MaterialVision: Interactive Materials Science Retrieval System

Python 3.9+ · PyTorch · Streamlit · License: MIT

MaterialVision is an advanced multimodal retrieval system that bridges materials science textual descriptions with STEM (Scanning Transmission Electron Microscopy) imaging data. It features an interactive web application for text-to-image and image-to-text retrieval using state-of-the-art vision-language models.

📱 Web Application Preview

MaterialVision App Interface

Interactive web interface featuring text-to-image search, image upload, model comparison, and real-time similarity analysis.


🚀 Quick Start

🖥️ Web Application

# Clone and setup
git clone https://github.com/your-username/MaterialVision.git
cd MaterialVision
pip install -r requirements.txt

# Launch web app
cd webapp && streamlit run app.py

🐍 Python API

from models import load_clipp_scibert

# Load model (pass 'cpu' instead of 'cuda' if no GPU is available)
model, tokenizer, dataset = load_clipp_scibert('models/CLIPP_allenai/checkpoints/best_clipp.pth', 'cuda')

# Generate embeddings
text = "The chemical formula is LiGeS. The mbj_bandgap value is 0.0."
tokens = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to('cuda')
embeddings = model.get_text_features(tokens['input_ids'], tokens['attention_mask'])

🎯 Key Features

| Feature | Description |
|---------|-------------|
| 🔍 Text-to-Image Search | Find STEM images using materials descriptions |
| 🖼️ Image-to-Text Retrieval | Upload images to find matching descriptions |
| ⚖️ Multi-Model Comparison | Compare results across 4 different models |
| 📊 Real-time Analytics | Performance metrics and similarity heatmaps |
| 🔬 Bandgap Filtering | Filter materials by electronic properties |
| 📈 t-SNE Visualization | Explore embedding space alignment |
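
The image-to-text path mirrors the text-to-image quick start above: embed the uploaded image, then rank precomputed description embeddings by cosine similarity. A minimal sketch, assuming a get_image_features method exists alongside the get_text_features call shown above and that corpus_text_embeddings is a precomputed, L2-normalized tensor of description embeddings (both names are illustrative):

import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

# Basic preprocessing for a ViT-Base/16 input; the real pipeline's
# normalization statistics depend on the chosen model and are omitted here.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def retrieve_descriptions(image_path, top_k=5):
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        img_embed = model.get_image_features(image)        # assumed counterpart of get_text_features
    img_embed = F.normalize(img_embed, dim=-1)
    scores = img_embed @ corpus_text_embeddings.T          # cosine similarity on normalized vectors
    topk = torch.topk(scores.squeeze(0), k=top_k)
    return topk.indices.tolist(), topk.values.tolist()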

πŸ—οΈ Model Architectures

| Model | Text Encoder | Vision Encoder | Best For |
|-------|--------------|----------------|----------|
| CLIPP-SciBERT | SciBERT | ViT-Base/16 | Scientific text understanding |
| CLIPP-DistilBERT | DistilBERT | ViT-Base/16 | Fast inference |
| Apple MobileCLIP | MobileBERT | MobileViT | Mobile/edge deployment |
| BLIP (Salesforce) | BERT | ViT-Large/16 | Best overall performance |
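
Comparing models amounts to embedding the same query with each model and ranking the same corpus. A minimal sketch of scripting that outside the web app; only load_clipp_scibert is documented in this README, so the registry below is a placeholder for whatever loaders webapp/models.py actually exposes:

import torch
import torch.nn.functional as F
from models import load_clipp_scibert

# (loader, checkpoint) pairs; extend with the other loaders from webapp/models.py
MODEL_REGISTRY = {
    "CLIPP-SciBERT": (load_clipp_scibert, "models/CLIPP_allenai/checkpoints/best_clipp.pth"),
    # "CLIPP-DistilBERT": (load_clipp_distilbert, "models/CLIPP_bert/checkpoints/..."),  # hypothetical loader
}

def embed_query_per_model(query, device="cpu"):
    """Return one L2-normalized text embedding of the query per registered model."""
    results = {}
    for name, (loader, ckpt) in MODEL_REGISTRY.items():
        model, tokenizer, _ = loader(ckpt, device)
        tokens = tokenizer(query, return_tensors="pt", truncation=True, max_length=512).to(device)
        with torch.no_grad():
            emb = model.get_text_features(tokens["input_ids"], tokens["attention_mask"])
        results[name] = F.normalize(emb, dim=-1)
    return results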

📊 Performance Results

🎯 Validation Set Performance

| Model | Text→Image Top-1 | Text→Image Top-5 | Text→Image Top-10 | Image→Text Top-1 | Image→Text Top-5 | Image→Text Top-10 |
|-------|------------------|------------------|-------------------|------------------|------------------|-------------------|
| BLIP (Salesforce) 🥇 | 46.8% | 72.9% | 80.9% | 45.3% | 73.6% | 80.1% |
| Apple MobileCLIP | 38.0% | 67.0% | 76.7% | 35.9% | 65.4% | 77.6% |
| CLIPP-SciBERT | 36.9% | 65.1% | 74.9% | 36.6% | 66.2% | 74.9% |
| CLIPP-DistilBERT | 12.5% | 36.6% | 49.8% | 14.2% | 37.4% | 50.6% |

πŸ‹οΈ Training Set Performance

| Model | Text→Image Top-1 | Text→Image Top-5 | Text→Image Top-10 | Image→Text Top-1 | Image→Text Top-5 | Image→Text Top-10 |
|-------|------------------|------------------|-------------------|------------------|------------------|-------------------|
| Apple MobileCLIP 🥇 | 63.0% | 93.8% | 97.8% | 60.5% | 92.4% | 97.4% |
| BLIP (Salesforce) | 57.1% | 90.5% | 96.9% | 56.6% | 90.4% | 96.4% |
| CLIPP-SciBERT | 44.9% | 80.5% | 90.3% | 47.2% | 81.9% | 90.9% |
| CLIPP-DistilBERT | 14.6% | 39.3% | 52.6% | 14.3% | 40.3% | 54.5% |

πŸ” Key Performance Insights

  • 📈 Top-1 Accuracy: BLIP achieves the best exact-match performance (46.1% averaged over both directions on validation)
  • 🎯 Top-5 Accuracy: Most models achieve 65%+ recall within the top-5 candidates
  • 🏆 Top-10 Accuracy: BLIP leads on validation (80.5% average), while Apple MobileCLIP dominates the training set (97.6% average)
  • ⚡ Speed vs Accuracy: CLIPP-DistilBERT is the fastest but least accurate; BLIP offers the best accuracy for the cost

📚 Understanding Evaluation Metrics

| Metric | Definition | Interpretation |
|--------|------------|----------------|
| Top-1 Accuracy | Correct result ranked #1 | Exact-match precision: how often the perfect match appears first |
| Top-5 Accuracy | Correct result in top 5 | Practical retrieval: good results within a reasonable candidate set |
| Top-10 Accuracy | Correct result in top 10 | System recall: ability to find relevant matches in a broader search |

Example: For query "silicon carbide semiconductor"

  • Top-1 = 46.8%: Perfect match appears first 47% of the time
  • Top-5 = 72.9%: Perfect match appears in top-5 results 73% of the time
  • Top-10 = 80.9%: Perfect match appears in top-10 results 81% of the time
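
These metrics are simple to compute from the full similarity matrix over matched text-image pairs. A minimal sketch, assuming row i of each embedding matrix corresponds to the same material and that both matrices are L2-normalized:

import torch

def topk_accuracy(text_embeds, image_embeds, ks=(1, 5, 10)):
    """Text→Image retrieval accuracy for (N, D) embedding matrices of matched pairs."""
    sims = text_embeds @ image_embeds.T                # (N, N) cosine similarities
    ranks = sims.argsort(dim=1, descending=True)       # candidate images per text query
    targets = torch.arange(sims.size(0)).unsqueeze(1)  # the correct image shares the row index
    return {k: (ranks[:, :k] == targets).any(dim=1).float().mean().item() for k in ks}

# Image→Text accuracy is the same computation with the arguments swapped:
# topk_accuracy(image_embeds, text_embeds)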
📈 Complete Performance Breakdown by Model

🔬 BLIP (Salesforce) - Best Overall Performance

Validation Set:
├── Text→Image: Top-1: 46.8%  Top-5: 72.9%  Top-10: 80.9%
└── Image→Text: Top-1: 45.3%  Top-5: 73.6%  Top-10: 80.1%

Training Set:  
├── Text→Image: Top-1: 57.1%  Top-5: 90.5%  Top-10: 96.9%
└── Image→Text: Top-1: 56.6%  Top-5: 90.4%  Top-10: 96.4%

📱 Apple MobileCLIP - Best Training Performance

Validation Set:
├── Text→Image: Top-1: 38.0%  Top-5: 67.0%  Top-10: 76.7%
└── Image→Text: Top-1: 35.9%  Top-5: 65.4%  Top-10: 77.6%

Training Set:
├── Text→Image: Top-1: 63.0%  Top-5: 93.8%  Top-10: 97.8%
└── Image→Text: Top-1: 60.5%  Top-5: 92.4%  Top-10: 97.4%

🧪 CLIPP-SciBERT - Scientific Text Specialist

Validation Set:
├── Text→Image: Top-1: 36.9%  Top-5: 65.1%  Top-10: 74.9%
└── Image→Text: Top-1: 36.6%  Top-5: 66.2%  Top-10: 74.9%

Training Set:
├── Text→Image: Top-1: 44.9%  Top-5: 80.5%  Top-10: 90.3%
└── Image→Text: Top-1: 47.2%  Top-5: 81.9%  Top-10: 90.9%

⚡ CLIPP-DistilBERT - Fast Inference

Validation Set:
├── Text→Image: Top-1: 12.5%  Top-5: 36.6%  Top-10: 49.8%
└── Image→Text: Top-1: 14.2%  Top-5: 37.4%  Top-10: 50.6%

Training Set:
├── Text→Image: Top-1: 14.6%  Top-5: 39.3%  Top-10: 52.6%
└── Image→Text: Top-1: 14.3%  Top-5: 40.3%  Top-10: 54.5%

📊 Performance Summary

  • 🥇 Best Top-1: BLIP (46.8% Text→Image on validation)
  • 🥈 Best Top-5: BLIP (73.3% average on validation)
  • 🥉 Best Top-10: BLIP (80.5% average on validation); Apple MobileCLIP (97.6% average on training)
  • ⚡ Fastest: CLIPP-DistilBERT (lowest computational cost)
  • 🔬 Most Balanced: CLIPP-SciBERT (good accuracy plus a scientific vocabulary)

πŸ› οΈ Installation

Option 1: Using pip (Recommended)

# Clone repository
git clone https://github.com/your-username/MaterialVision.git
cd MaterialVision

# Install dependencies
pip install -r requirements.txt

# Launch web app
cd webapp && streamlit run app.py

Option 2: Using conda

# Clone repository
git clone https://github.com/your-username/MaterialVision.git
cd MaterialVision

# Create conda environment
conda env create -f environment.yml
conda activate clipp

# Launch web app
cd webapp && streamlit run app.py

🔧 Manual Installation

# Core dependencies
pip install torch torchvision "transformers>=4.30.0"
pip install streamlit pandas numpy pillow
pip install open-clip-torch timm scikit-learn
pip install matplotlib seaborn tqdm

📋 System Requirements

  • Python: 3.9+
  • GPU: CUDA-compatible GPU recommended
  • RAM: 8GB+ (16GB recommended)
  • Storage: 5GB+ for models and data
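
To confirm that PyTorch can see a usable GPU before launching the app:

import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))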

💻 Usage Examples

🖥️ Web Interface

  1. Text-to-Image Search

    • Enter materials description: "Silicon carbide semiconductor with 2.3 eV bandgap"
    • View top matching STEM images with similarity scores
    • Compare results across different models
  2. Image-to-Text Retrieval

    • Drag & drop STEM images
    • Get matching material descriptions
    • Explore chemical formulas and properties
  3. Bandgap Filtering

    • Use sliders to set bandgap range (0.0-10.0 eV)
    • Filter materials by electronic properties
    • Export filtered datasets as CSV

🐍 Python API

Basic Embedding Generation

from models import load_clipp_scibert
import torch

# Load model
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, tokenizer, dataset = load_clipp_scibert(
    'models/CLIPP_allenai/checkpoints/best_clipp.pth', device
)

# Generate text embeddings
texts = ["Silicon carbide semiconductor", "Iron oxide magnetic material"]
embeddings = []

model.eval()
with torch.no_grad():
    for text in texts:
        tokens = tokenizer(text, padding=True, truncation=True, 
                         return_tensors="pt", max_length=512).to(device)
        text_features = model.get_text_features(
            tokens['input_ids'], tokens['attention_mask']
        )
        embeddings.append(text_features.cpu().numpy())

Chemical Formula Processing

import re

def parse_chemical_formula(formula):
    """Convert Fe2O3 -> 2 Fe 3 O format"""
    pattern = r'([A-Z][a-z]?)(\d*)'
    matches = re.findall(pattern, formula)
    
    result_parts = []
    for element, count in matches:
        count = "1" if not count else count
        result_parts.extend([count, element])
    
    return ' '.join(result_parts)

# Example
parsed = parse_chemical_formula("Fe2O3")  # "2 Fe 3 O"

Similarity Search with Filtering

import torch
import torch.nn.functional as F

def search_by_bandgap(query_text, min_bg, max_bg, top_k=5):
    """Return the top-k corpus entries whose bandgap lies in [min_bg, max_bg].

    Relies on objects set up earlier: model, tokenizer, corpus_embeddings
    (an N x D tensor) and bandgaps (a list with one value per corpus entry).
    """
    # Generate query embedding
    tokens = tokenizer(query_text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        query_embed = model.get_text_features(tokens['input_ids'], tokens['attention_mask'])

    # Keep only corpus entries whose bandgap falls inside the requested range
    filtered_indices = [
        i for i, bg in enumerate(bandgaps)
        if bg is not None and min_bg <= bg <= max_bg
    ]

    # Compute cosine similarities against the filtered corpus embeddings
    filtered_embeds = corpus_embeddings[filtered_indices]
    similarities = F.cosine_similarity(query_embed, filtered_embeds)

    # Get top-k results, mapped back to indices in the full corpus
    topk = torch.topk(similarities, min(top_k, len(similarities)))
    result_indices = [filtered_indices[i] for i in topk.indices.tolist()]
    return result_indices, topk.values
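
Example call (the query and bandgap window are illustrative):

indices, scores = search_by_bandgap("silicon carbide semiconductor", 2.0, 3.5, top_k=5)
for idx, score in zip(indices, scores.tolist()):
    print(f"corpus item {idx}: similarity {score:.3f}")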

πŸ“ Project Structure

MaterialVision/
├── 📄 README.md
├── 📄 requirements.txt          # Python dependencies
├── 📄 environment.yml           # Conda environment
├── 📄 LICENSE
├── 🗂️ webapp/
│   ├── 🐍 app.py               # Streamlit web application
│   ├── 🐍 models.py            # Model loading utilities
│   ├── 📓 simple_text_embedding.ipynb
│   ├── 📓 test_model_loading.ipynb
│   └── 📁 embeddings/          # Cached embeddings
├── 🗂️ models/
│   ├── 📁 CLIPP_allenai/       # SciBERT-based model
│   ├── 📁 CLIPP_bert/          # DistilBERT-based model
│   ├── 📁 Apple_MobileCLIP/    # Apple MobileCLIP
│   └── 📁 Salesforce/          # BLIP model
├── 🗂️ data/
│   ├── 📊 alpaca_mbj_bandgap_train.csv
│   ├── 📊 alpaca_mbj_bandgap_test.csv
│   ├── 📁 train/               # Training images
│   └── 📁 test/                # Validation images
└── 🗂️ tests/                   # Development notebooks

📈 Advanced Features

🔬 Embedding Visualization

Each model includes t-SNE visualization for analyzing multimodal alignment:

  • Dimensionality Reduction: Projects high-dimensional embeddings to 2D
  • Alignment Quality: Shows how well text-image pairs cluster together
  • Model Comparison: Visual assessment across architectures
  • Pair Highlighting: Connected lines show corresponding embeddings

Generated visualizations:

  • clipp_scibert_tsne.png
  • clipp_distilbert_tsne.png
  • mobileclip_apple_tsne.png
  • salesforce_blip_tsne.png
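
A minimal sketch of producing such a plot with scikit-learn, assuming paired text_embeds and image_embeds NumPy arrays; this is illustrative rather than the exact script behind the figures above:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(text_embeds, image_embeds, out_path="tsne_alignment.png"):
    """text_embeds, image_embeds: (N, D) arrays for N matched text-image pairs."""
    joint = np.concatenate([text_embeds, image_embeds], axis=0)
    coords = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(joint)  # perplexity must be < 2N
    n = len(text_embeds)
    txt2d, img2d = coords[:n], coords[n:]

    plt.figure(figsize=(6, 6))
    plt.scatter(txt2d[:, 0], txt2d[:, 1], s=10, label="text")
    plt.scatter(img2d[:, 0], img2d[:, 1], s=10, label="image")
    for t, i in zip(txt2d, img2d):                     # connect each text-image pair
        plt.plot([t[0], i[0]], [t[1], i[1]], color="gray", linewidth=0.3)
    plt.legend()
    plt.savefig(out_path, dpi=200)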

📊 Performance Monitoring

Real-time metrics in the web application:

  • Similarity Heatmaps: Understand model behavior
  • Top-k Accuracy: Monitor retrieval performance
  • Comparison Charts: Side-by-side model analysis
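
The similarity heatmap is easy to reproduce offline as well. A minimal sketch, assuming L2-normalized torch tensors of matched text and image embeddings:

import matplotlib.pyplot as plt

def plot_similarity_heatmap(text_embeds, image_embeds, out_path="similarity_heatmap.png"):
    """A bright diagonal indicates well-aligned text-image pairs."""
    sims = (text_embeds @ image_embeds.T).cpu().numpy()
    plt.figure(figsize=(5, 5))
    plt.imshow(sims, cmap="viridis")
    plt.colorbar(label="cosine similarity")
    plt.xlabel("image index")
    plt.ylabel("text index")
    plt.savefig(out_path, dpi=200)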

🔧 Model Architecture Details

πŸ—οΈ CLIPP Architecture
# Schematic pseudocode of the dual-encoder design (component names are illustrative)
class CLIPPModel:
    def __init__(self):
        self.vision_encoder = ViT_Base_16()      # 224x224 image patches
        self.text_encoder = SciBERT()            # Scientific vocabulary
        self.projection_dim = 256                # Joint embedding space
        self.temperature = 0.07                  # Contrastive-loss temperature
        self.img_projection = Linear(768, self.projection_dim)  # ViT hidden size -> joint space
        self.txt_projection = Linear(768, self.projection_dim)  # BERT hidden size -> joint space

    def forward(self, images, texts):
        img_features = self.vision_encoder(images)
        txt_features = self.text_encoder(texts)

        # Project both modalities to the shared space and L2-normalize
        img_embeds = self.img_projection(img_features)
        txt_embeds = self.txt_projection(txt_features)

        return F.normalize(img_embeds), F.normalize(txt_embeds)
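
The temperature above feeds a standard CLIP-style symmetric contrastive (InfoNCE) loss over a batch of matched pairs. A minimal sketch of that objective, not necessarily the exact training code used here:

import torch
import torch.nn.functional as F

def contrastive_loss(img_embeds, txt_embeds, temperature=0.07):
    """Symmetric contrastive loss; embeddings are assumed L2-normalized, shape (B, D)."""
    logits = img_embeds @ txt_embeds.T / temperature   # (B, B) similarity logits
    targets = torch.arange(logits.size(0))             # matching pairs lie on the diagonal
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.T, targets)
    return (loss_i2t + loss_t2i) / 2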
πŸ—οΈ BLIP Architecture
# Schematic pseudocode of BLIP's retrieval path (component names are illustrative)
class BLIPModel:
    def __init__(self):
        self.vision_encoder = ViT_Large_16()     # Larger vision model
        self.text_encoder = BERT()               # Standard BERT
        self.cross_attention = MultiheadAttention() # Cross-modal fusion
        
    def forward(self, images, texts):
        # Dual-encoder + cross-attention
        vision_embeds = self.vision_encoder(images)
        text_embeds = self.text_encoder(texts)
        
        # Cross-modal attention for better alignment
        fused_embeds = self.cross_attention(vision_embeds, text_embeds)
        return fused_embeds

🔬 Applications

Materials Discovery

  • Interactive Search: Find materials by description or image
  • Property Filtering: Search by bandgap, composition, structure
  • Model Comparison: Evaluate different retrieval approaches
  • Dataset Export: Download filtered results for analysis

Research Use Cases

  • Literature Review: Find visual examples of described materials
  • Image Classification: Identify unknown materials from STEM images
  • Property Prediction: Infer properties from visual similarity
  • Dataset Augmentation: Generate paired text-image data

Educational Applications

  • Materials Science Teaching: Visual learning with real examples
  • Student Projects: Hands-on experience with ML models
  • Research Training: Understanding multimodal AI systems

πŸ› οΈ Development

Adding New Models

# Extend models.py
def load_custom_model(checkpoint_path, device):
    model = YourCustomModel()
    checkpoint = torch.load(checkpoint_path, map_location=device)
    model.load_state_dict(checkpoint['model_state_dict'])
    # Build the tokenizer and dataset that match your model before returning them
    return model, tokenizer, dataset

# Add to app.py
MODEL_PATHS['YourModel'] = 'path/to/checkpoint.pth'

Custom Datasets

# Process new datasets (load_model and generate_embedding are your own helpers)
import pandas as pd

def process_new_dataset(data_path, model_name):
    df = pd.read_csv(data_path)
    model, tokenizer, _ = load_model(model_name)

    # Embed every description with the chosen model
    embeddings = []
    for text in df['descriptions']:
        embedding = generate_embedding(model, tokenizer, text)
        embeddings.append(embedding)

    df['embeddings'] = embeddings
    return df

Testing

# Run tests
python -m pytest tests/
streamlit run webapp/app.py  # Manual UI testing

🤝 Contributing

We welcome contributions! Areas for improvement:

  • 🔬 New model architectures for better materials understanding
  • 📊 Dataset expansion with additional properties
  • ⚡ Performance optimizations for faster inference
  • 🎨 UI/UX improvements for better user experience
  • 📚 Documentation and tutorials

Development Workflow

# Fork and clone
git clone https://github.com/your-username/MaterialVision.git
cd MaterialVision

# Create feature branch
git checkout -b feature/your-improvement

# Make changes and test
pytest tests/
streamlit run webapp/app.py

# Submit pull request
git push origin feature/your-improvement

📚 Resources & Documentation

📓 Jupyter Notebooks

🔗 External Resources

📊 Datasets

  • Training: 1,000+ STEM images with materials descriptions
  • Validation: 500+ test samples for evaluation
  • Properties: Chemical formulas, bandgaps, structures
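
To take a quick look at the accompanying CSVs without assuming a particular schema:

import pandas as pd

train_df = pd.read_csv("data/alpaca_mbj_bandgap_train.csv")
test_df = pd.read_csv("data/alpaca_mbj_bandgap_test.csv")
print(train_df.shape, test_df.shape)
print(train_df.columns.tolist())
print(train_df.head())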

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📞 Support

πŸ™ Acknowledgments

  • Hugging Face for transformer implementations
  • Streamlit for the web framework
  • OpenAI for the CLIP architecture
  • Materials Science Community for domain expertise

🔬 Happy Materials Discovery! 🧪
