A high-performance Python scraper designed to extract product data from Linsol Australia. This tool navigates the WordPress REST API and individual product pages to generate a Shopify-ready CSV for seamless store imports.
- Dual-Layer Extraction: Combines fast API data retrieval with deep HTML parsing via BeautifulSoup.
- Shopify Formatted: Automatically maps source data to Shopify's CSV schema (Handles, Tags, Variants, and SEO).
- Resilient Networking: Built-in retry logic to handle temporary network fluctuations or server timeouts.
- Development Logging: Saves raw JSON and HTML locally for easy debugging and data verification.
- Automated Taxonomy Mapping: Resolves WordPress category/range IDs into human-readable tags.
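The retry behaviour described above can be sketched with `requests` and `urllib3`. The parameter values below are illustrative, not necessarily the ones `Scraper.py` uses:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session():
    """Build a requests.Session that retries transient failures with backoff."""
    retry = Retry(
        total=5,                  # up to 5 attempts per request
        backoff_factor=1,         # wait 1s, 2s, 4s, ... between retries
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET"],
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session

session = make_session()
# Example call (URL path is an assumption about the target site's WP REST API):
# resp = session.get("https://linsol.com.au/wp-json/wp/v2/product", timeout=30)
```

With this in place, a flaky 503 from the server is retried automatically instead of killing the whole scrape.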
- Open the repository: https://github.com/byte-dev404/Linsol-Products-Scraper
- Click the star icon if you want to bookmark the project.
- Click the green Code button and either:
  - Download the ZIP file and extract it, or
  - Copy the repository URL and clone it:

        git clone https://github.com/byte-dev404/Linsol-Products-Scraper.git

- Install an IDE such as VS Code: https://code.visualstudio.com/Download
- Install the latest version of Python for your operating system: https://www.python.org/downloads/
- Open the cloned or extracted project folder in your IDE.
- Open a terminal:
  - Press Ctrl + Shift + ` (backtick), or
  - Use the menu: Terminal → New Terminal
- Set up a virtual environment (recommended):
  - Create the virtual environment:

        python -m venv venv

  - Activate it:
    - On Windows (Command Prompt): venv\Scripts\activate (or source venv/Scripts/activate in Git Bash)
    - On Mac/Linux: source venv/bin/activate
- Install the required dependencies:

      pip install -r requirements.txt

- Run the scraper:

      python Scraper.py

  If you see ✅ Scraper started! in the terminal, the scraper has started successfully.
- Python 3.8+
- pip (Python package manager)
- Clone the repository:

      git clone https://github.com/byte-dev404/Linsol-Products-Scraper.git
      cd Linsol-Products-Scraper

- Set up a virtual environment (recommended):

      python -m venv venv
      # Activate on Windows (Command Prompt): venv\Scripts\activate
      # Activate on Mac/Linux: source venv/bin/activate

- Install dependencies:

      pip install -r requirements.txt

Run the main script to start the scraping process:

    python Scraper.py

The terminal will display real-time progress, including the current page being processed and successful extraction counts.
| File/Folder | Description |
|---|---|
| Scraper.py | The core logic for fetching, parsing, and saving data. |
| Products data.csv | The final Shopify-compatible output file. |
| Listings pages/ | Directory containing raw JSON responses from the API. |
| Products pages/ | Directory containing raw HTML files for each product. |
| requirements.txt | List of required Python libraries (requests, beautifulsoup4). |
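For reference, a minimal requirements.txt matching the table above might look like this (the version pins are illustrative, not the project's actual pins):

```
requests>=2.28
beautifulsoup4>=4.11
```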
- Initialization: Fetches total product counts and pagination metadata from the Linsol API headers.
- Taxonomy Mapping: Builds a lookup table for categories, product ranges, and spaces.
- Discovery: Loops through the listing API to find all product slugs and basic metadata.
- Deep Scraping: Visits each product's individual URL to extract:
  - High-resolution gallery images.
  - Technical specifications and feature lists.
  - Meta titles and descriptions for SEO.
- Transformation: Cleans and formats the data into the Shopify CSV structure, ensuring multiple images are handled as subsequent rows under the same handle.
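The transformation step above can be sketched as follows. The column subset and function names are illustrative (Shopify's import format defines many more columns than shown here); the key idea is that the first CSV row carries the full product data, and every additional image becomes a follow-up row that repeats only the Handle:

```python
import csv
import io

# Minimal subset of Shopify's CSV columns, for illustration only;
# a real export also needs Vendor, Variant SKU, SEO fields, etc.
FIELDS = ["Handle", "Title", "Body (HTML)", "Tags", "Image Src"]

def product_to_rows(product):
    """First row holds the product fields; each extra image becomes
    its own row sharing the same Handle (Shopify's multi-image format)."""
    images = product.get("images", [])
    first = {
        "Handle": product["handle"],
        "Title": product["title"],
        "Body (HTML)": product.get("description", ""),
        "Tags": ", ".join(product.get("tags", [])),
        "Image Src": images[0] if images else "",
    }
    rows = [first]
    for img in images[1:]:
        rows.append({"Handle": product["handle"], "Image Src": img})
    return rows

def write_csv(products):
    """Serialize a list of product dicts into Shopify-style CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for p in products:
        writer.writerows(product_to_rows(p))
    return buf.getvalue()
```

A product with three gallery images thus produces three rows under one handle, which Shopify merges back into a single product on import.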
- Non-Commercial: Free to use for personal or educational projects.
- Commercial: Please provide credit by linking to this repository or my GitHub profile.
Contributions are welcome! To maintain project quality:
- Open an Issue first to discuss the bug or feature.
- Ensure your code follows PEP 8 guidelines.
- Submit a Pull Request (PR) with a clear description of changes.
If the scraper breaks, needs customization, or if you want help scraping another website, feel free to reach out.
Email: [email protected]