A machine learning-powered Chrome extension that detects phishing emails in real-time. Built as part of coursework at San José State University (CMPE 255 — Data Mining).
Phishing attacks remain one of the most common cybersecurity threats. This extension analyzes email content directly in the browser using a pre-trained ML model to classify emails as safe or phishing — giving users instant visual feedback.
- Real-Time Detection: Scans email content on the active page
- ML-Powered: Uses a scikit-learn model trained on 18,000+ labeled phishing/legitimate emails
- Chrome Extension: Manifest V3, lightweight popup UI
- Pre-trained Model: Serialized `phishing_model.pkl` and `vectorizer.pkl`, ready to use
- Visual Alerts: Clear popup interface showing phishing probability
PhishingDetectorExtension/
├── manifest.json # Chrome Extension config (Manifest V3)
├── popup.html # Extension popup UI
├── popup.js # Popup logic & API calls
├── background.js # Service worker for background tasks
├── main.js # Content script injected into web pages
├── model/
│ ├── phishing_model.pkl # Trained ML classification model
│ └── vectorizer.pkl # TF-IDF text vectorizer
├── src/
│ └── Main.java # Model training pipeline
└── Phishing_Email.csv # Dataset (18,000+ samples)
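The `model/` folder ships two pickled objects, and both are needed at prediction time: the classifier's features are column indices into the vectorizer's fitted vocabulary, so a model loaded without its matching vectorizer is unusable. A minimal save/load sketch follows, assuming the files are written with Python's standard `pickle` module (the repository may use `joblib` instead); `save_artifacts` and `load_artifacts` are hypothetical helper names, and the two-sentence training set is made up.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def save_artifacts(model, vectorizer, model_dir: Path) -> None:
    """Pickle the fitted classifier and vectorizer (assumed: stdlib pickle, not joblib)."""
    model_dir.mkdir(parents=True, exist_ok=True)
    with open(model_dir / "phishing_model.pkl", "wb") as f:
        pickle.dump(model, f)
    with open(model_dir / "vectorizer.pkl", "wb") as f:
        pickle.dump(vectorizer, f)


def load_artifacts(model_dir: Path):
    """Return (model, vectorizer). Only unpickle files from a source you trust."""
    with open(model_dir / "phishing_model.pkl", "rb") as f:
        model = pickle.load(f)
    with open(model_dir / "vectorizer.pkl", "rb") as f:
        vectorizer = pickle.load(f)
    return model, vectorizer


# Round trip with a throwaway model (hypothetical training data).
texts = ["verify your password now", "see you at the meeting"]
vec = TfidfVectorizer().fit(texts)
clf = LogisticRegression().fit(vec.transform(texts), [1, 0])

with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(clf, vec, Path(tmp) / "model")
    loaded_clf, loaded_vec = load_artifacts(Path(tmp) / "model")

# The reloaded pair should reproduce the original predictions exactly.
same = list(loaded_clf.predict(loaded_vec.transform(texts))) == list(clf.predict(vec.transform(texts)))
print(same)  # True
```

Keeping the two objects in separate files mirrors the layout above; bundling them in a scikit-learn `Pipeline` and pickling that single object would be an alternative that makes a vocabulary mismatch impossible.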
| Layer | Technology |
|---|---|
| Extension | Chrome Manifest V3, JavaScript, HTML/CSS |
| ML Model | Python, scikit-learn, TF-IDF Vectorization |
| Training | Java (data preprocessing), Jupyter Notebook |
| Dataset | 18,000+ labeled phishing/legitimate emails (CSV) |
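The dataset row above describes the CSV only as labeled phishing/legitimate emails. A minimal loading sketch follows; it assumes columns named `Email Text` and `Email Type` and a `Phishing Email` label value, which are guesses to check against the real header of `Phishing_Email.csv`, and `load_labeled_emails` is a hypothetical helper. The demo feeds it a small in-memory CSV so the block runs on its own.

```python
import csv
import io
from typing import Iterable, List, Tuple

# Assumed column names and label value (not confirmed by the README).
TEXT_COLUMN = "Email Text"
LABEL_COLUMN = "Email Type"
PHISHING_LABEL = "Phishing Email"


def load_labeled_emails(rows: Iterable[dict]) -> Tuple[List[str], List[int]]:
    """Return (texts, labels) with label 1 for phishing and 0 otherwise.

    Rows whose email body is missing or blank are skipped.
    """
    texts: List[str] = []
    labels: List[int] = []
    for row in rows:
        text = (row.get(TEXT_COLUMN) or "").strip()
        if not text:
            continue  # skip empty bodies rather than training on ""
        texts.append(text)
        labels.append(1 if row.get(LABEL_COLUMN) == PHISHING_LABEL else 0)
    return texts, labels


# Demo on an in-memory sample (hypothetical rows, not from the real dataset).
sample = io.StringIO(
    "Email Text,Email Type\n"
    "Verify your account now,Phishing Email\n"
    "Lunch at noon?,Safe Email\n"
    ",Safe Email\n"
)
texts, labels = load_labeled_emails(csv.DictReader(sample))
print(texts, labels)  # ['Verify your account now', 'Lunch at noon?'] [1, 0]
```

Against the real file, the same function would take `csv.DictReader(f)` for a file opened with `open("Phishing_Email.csv", newline="", encoding="utf-8")`. Dropping blank bodies up front keeps rows with no text from being counted as training documents.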
- Clone this repo: `git clone https://github.com/amanimran786/PhishingDetectorExtension.git`
- Open Chrome and go to `chrome://extensions/`
- Enable Developer Mode (toggle in the top-right corner)
- Click "Load unpacked"
- Select the `PhishingDetectorExtension` folder
- The extension icon will appear in your toolbar
- Navigate to a page with email content
- Click the extension icon in the Chrome toolbar
- The popup displays the phishing risk assessment
- Algorithm: scikit-learn classifier with TF-IDF feature extraction
- Dataset: `Phishing_Email.csv`, 18,000+ labeled samples
- Features: Text content vectorized using TF-IDF
- Output: Binary classification (Phishing / Legitimate) with confidence score
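The bullets above describe a TF-IDF front end feeding a scikit-learn classifier that outputs a label plus a confidence score. The sketch below shows that shape end to end under stated assumptions: the README does not name the classifier, so `LogisticRegression` is a stand-in; the six training sentences are made up; and the `classify` helper and its 0.5 decision threshold are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny illustrative training set (hypothetical, not drawn from Phishing_Email.csv).
train_texts = [
    "Your account is suspended, verify your password here",
    "Urgent: confirm your bank login to avoid closure",
    "Click this link to claim your prize now",
    "Meeting moved to 3pm, see the updated agenda",
    "Here are the notes from today's lecture",
    "Can you review my pull request when you have time",
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = phishing, 0 = legitimate

# TF-IDF turns each email into a sparse, L2-normalized term-weight vector.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X_train = vectorizer.fit_transform(train_texts)

# Assumed classifier; any scikit-learn model with predict_proba fits this slot.
model = LogisticRegression()
model.fit(X_train, train_labels)


def classify(text: str) -> dict:
    """Return a Phishing/Legitimate label and the model's probability for that label."""
    X = vectorizer.transform([text])  # reuse the fitted vocabulary and IDF weights
    proba = model.predict_proba(X)[0]
    # predict_proba columns follow model.classes_, so look up the phishing column.
    phishing_prob = float(proba[list(model.classes_).index(1)])
    label = "Phishing" if phishing_prob >= 0.5 else "Legitimate"  # hypothetical threshold
    confidence = phishing_prob if label == "Phishing" else 1.0 - phishing_prob
    return {"label": label, "confidence": round(confidence, 3)}


print(classify("Please verify your password to avoid account closure"))
print(classify("Notes from the meeting agenda"))
```

At prediction time the code calls `transform`, not `fit_transform`, so a new email is mapped into the vocabulary and IDF weights learned during training; refitting on a single message would produce features the classifier was never trained on.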
Aman Imran — B.S. Software Engineering, San José State University
This project is open source and available for educational purposes.