An intelligent fullstack web scraping and parsing application that combines Selenium-based scraping, a FastAPI + Chainlit backend, and a React + Tailwind frontend. The project integrates Ollama for AI-powered content extraction and supports CI/CD deployment to Fly.io and GitHub Pages.
🎥 Demo video: `bandicam.2025-09-10.16-17-33-775.mp4`
- 🤖 AI-Powered Extraction: Extract structured information using natural language queries.
- 🔒 Anti-Detection Scraping: Stealth Selenium options, rotating user agents, and human-like patterns.
- 🌐 Multi-Method Scraping: Requests + Selenium for maximum success rate.
- 📊 Chunk Processing: Splits large DOMs into manageable pieces for AI parsing (see the sketch after this list).
- 🎯 Site-Specific Handling: Special-cased scraping for GitHub and other popular sites.
- 🔄 CORS Enabled: Cross-origin requests from the frontend are supported (see the `main.py` sketch below).
- ⚡ Interactive UI: Enter a URL to scrape, then parse results with natural language.
- 🖼️ Responsive Design: Built with Tailwind (CDN mode).
- 🔔 Success/Error Feedback: Real-time status updates after scrape/parse.
- 🌍 Deployed on GitHub Pages with CI/CD.
- 🚀 Backend Deployment: Fly.io
- 🌐 Frontend Deployment: GitHub Pages
- 🔄 GitHub Actions CI/CD: Automated build & deploy on push to main
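As a rough illustration of the chunk-processing flow, here is a minimal sketch of how large scraped content can be split and fed to Ollama through `langchain-ollama`. The chunk size, prompt wording, and function names are illustrative assumptions, not the exact code in `utils/parse.py`:

```python
from langchain_ollama import OllamaLLM
from langchain_core.prompts import ChatPromptTemplate

def chunk_dom(text: str, max_chars: int = 6000) -> list[str]:
    """Split cleaned DOM text into pieces that fit the model's context window."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def parse_chunks(dom_text: str, query: str) -> str:
    """Run the user's natural-language query against each chunk and merge results."""
    model = OllamaLLM(model="llama3.2")
    prompt = ChatPromptTemplate.from_template(
        "Extract only the information matching this request: {query}\n\nContent:\n{chunk}"
    )
    chain = prompt | model
    results = [chain.invoke({"query": query, "chunk": c}) for c in chunk_dom(dom_text)]
    return "\n".join(r for r in results if r.strip())
```

Per-chunk prompting keeps each call within the local model's context limit, at the cost of one Ollama invocation per chunk.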
```
Ollama-Web-Scraper/
├── backend/                      # FastAPI + Chainlit backend
│   ├── main.py                   # Entry point with routes + CORS
│   ├── utils/
│   │   ├── web_scrape.py         # Scraping logic (Selenium + requests)
│   │   └── parse.py              # AI parsing with Ollama
│   ├── requirements.txt          # Backend dependencies
│   └── fly.toml                  # Fly.io config
├── frontend/                     # React + Vite frontend
│   ├── index.html
│   ├── src/
│   │   ├── App.jsx               # UI (Scrape + Parse forms + results)
│   │   └── main.jsx
│   ├── package.json
│   └── vite.config.js
├── .github/workflows/CICD.yml    # GitHub Actions CI/CD
└── README.md                     # This file
```
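Because the frontend (GitHub Pages) and backend (Fly.io) live on different origins, `main.py` enables CORS. A minimal sketch of that wiring, with an assumed origin and a hypothetical route name rather than the repo's exact code:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow the GitHub Pages frontend to call the Fly.io backend.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://bassemalyyy.github.io"],  # assumed origin; adjust to yours
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.post("/scrape")  # hypothetical route name
async def scrape(payload: dict) -> dict:
    # The real app delegates to utils/web_scrape.py here.
    return {"status": "ok"}
```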
Clone the repository and set up the backend:

```bash
git clone https://github.com/bassemalyyy/Ollama-Web-Scraper.git
cd Ollama-Web-Scraper/backend
python -m venv myenv
myenv\Scripts\activate       # Windows
source myenv/bin/activate    # Linux/Mac
pip install -r requirements.txt
ollama pull llama3.2
chainlit run main.py
```
Then, in a separate terminal, set up and start the frontend:

```bash
cd ../frontend
npm install
npm run dev
```
- Backend → Fly.io (`flyctl deploy`)
- Frontend → GitHub Pages (`npm run deploy`)

Both are automated via GitHub Actions (`.github/workflows/CICD.yml`).
1. Open the frontend (React app).
2. Enter a website URL → press Scrape.
3. Enter a natural language query → press Parse.
4. Results are displayed on the page.
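If you want to drive the backend directly instead of through the UI, the flow looks roughly like this. The base URL, endpoint names, and payload shapes below are assumptions for illustration; check `backend/main.py` for the actual API:

```python
import requests

BASE = "https://your-app.fly.dev"  # hypothetical Fly.io app URL

# Step 1: scrape a page.
resp = requests.post(
    f"{BASE}/scrape", json={"url": "https://github.com/username"}, timeout=60
)
resp.raise_for_status()

# Step 2: parse the scraped content with a natural-language query.
resp = requests.post(
    f"{BASE}/parse",
    json={"query": "repository names and descriptions"},
    timeout=120,
)
print(resp.json())
```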
- User: Scrape https://github.com/username
- Backend: ✅ Successfully scraped content.
- User: Parse repository names and descriptions
- Bot: 🎯 Results:
  - awesome-project → A cool Python project
  - web-scraper → AI-powered web scraping tool
- Backend: `chainlit`, `fastapi`, `selenium`, `webdriver-manager`, `beautifulsoup4`, `langchain`, `langchain-ollama`, `requests`
- Frontend: `react`, `vite`, `gh-pages`
- Rotating user agents
- Stealth Selenium flags
- Random delays for human-like browsing
- JS execution control
- Content quality detection (detects auth pages / bot blocks)
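A minimal sketch of what these stealth options can look like with Selenium and `webdriver-manager`. The user-agent pool, delay range, and block-detection markers are illustrative assumptions, not the exact code in `utils/web_scrape.py`:

```python
import random
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Illustrative user-agent pool for rotation.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
]

def make_stealth_driver() -> webdriver.Chrome:
    opts = Options()
    opts.add_argument("--headless=new")
    opts.add_argument(f"user-agent={random.choice(USER_AGENTS)}")
    # Hide common automation fingerprints.
    opts.add_argument("--disable-blink-features=AutomationControlled")
    opts.add_experimental_option("excludeSwitches", ["enable-automation"])
    opts.add_experimental_option("useAutomationExtension", False)
    driver = webdriver.Chrome(
        service=Service(ChromeDriverManager().install()), options=opts
    )
    # Mask the navigator.webdriver flag that bot detectors check.
    driver.execute_script(
        "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
    )
    return driver

def looks_blocked(page_text: str) -> bool:
    """Crude content-quality check for auth walls and bot blocks."""
    markers = ("sign in to continue", "verify you are human", "access denied")
    return any(m in page_text.lower() for m in markers)

if __name__ == "__main__":
    driver = make_stealth_driver()
    driver.get("https://example.com")
    time.sleep(random.uniform(1.0, 3.0))  # human-like pause before reading the DOM
    print("Blocked!" if looks_blocked(driver.page_source) else driver.title)
    driver.quit()
```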
- White page on GitHub Pages → Check `vite.config.js` and set the correct `base` path.
- No results from Parse → Enable debug mode in the backend to inspect the raw scraped content.
- Ollama errors → Ensure `ollama serve` is running.
- ChromeDriver issues → `webdriver-manager` handles driver versions automatically, but make sure Chrome is installed.
- Chainlit → Interactive backend UI
- Ollama → Local LLMs
- Selenium + BeautifulSoup → Reliable scraping stack
- React + Vite + Tailwind → Frontend
- Fly.io & GitHub Actions → Smooth deployment
🔥 Built with ❤️ by Bassem M. Aly