A Python scraper for Linsol Australia. Extracts deep product data via WordPress API and BeautifulSoup into a Shopify-ready CSV with SEO, specs, and images.

Linsol Products Scraper

A Python scraper that extracts product data from Linsol Australia. It queries the WordPress REST API, then parses each individual product page, and writes the combined data to a Shopify-ready CSV for store imports.

Features

  • Dual-Layer Extraction: Combines fast API data retrieval with deep HTML parsing via BeautifulSoup.

  • Shopify Formatted: Automatically maps source data to Shopify's CSV schema (Handles, Tags, Variants, and SEO).

  • Resilient Networking: Built-in retry logic to handle temporary network fluctuations or server timeouts.

  • Development Logging: Saves raw JSON and HTML locally for easy debugging and data verification.

  • Automated Taxonomy Mapping: Resolves WordPress category/range IDs into human-readable tags.
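
The retry behaviour mentioned above can be pictured with a small sketch. This is not the code in Scraper.py; the function name `fetch_with_retries`, its parameters, and the exponential back-off schedule are assumptions chosen for illustration.

```python
import time


def fetch_with_retries(fetch, attempts=3, base_delay=1.0,
                       retry_on=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    Hypothetical helper: waits base_delay, then 2x, 4x, ... between tries.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on as error:
            last_error = error
            # Only wait if another attempt is still to come.
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    # Every attempt failed: surface the most recent error.
    raise last_error
```

With the requests library, the built-in exception types above would not match, so one would likely pass `retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout)` and wrap the call as `fetch_with_retries(lambda: requests.get(url, timeout=10))`.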

Setup and Usage Guide (for non-technical users)

  1. Open the repository: https://github.com/byte-dev404/Linsol-Products-Scraper

  2. Click the star icon if you want to bookmark the project.

  3. Click the green Code button and either:

    • Download the ZIP file and extract it, or

    • Copy the repository URL and clone it:

      git clone https://github.com/byte-dev404/Linsol-Products-Scraper.git
  4. Install an IDE such as VS Code: https://code.visualstudio.com/Download

  5. Install the latest version of Python for your operating system: https://www.python.org/downloads/

  6. Open the cloned or extracted project folder in your IDE.

  7. Open a terminal:

    • Press Ctrl + Shift + `
    • Or use the menu: Terminal → New Terminal
  8. Set Up a Virtual Environment (Recommended):

    1. Create a virtual environment:
    python -m venv venv
    2. Activate the virtual environment:

      • For Windows (Command Prompt or PowerShell):
      venv\Scripts\activate
      • For Windows (Git Bash):
      source venv/Scripts/activate
      • For Mac/Linux:
      source venv/bin/activate
  9. Install required dependencies:

    pip install -r requirements.txt
  10. Run the scraper:

    python Scraper.py

If you see ✅ Scraper started! in the terminal, the scraper has started successfully.

Getting Started (for seasoned devs)

Prerequisites

  • Python 3.8+
  • pip (Python package manager)

Installation

  1. Clone the Repository
git clone https://github.com/byte-dev404/Linsol-Products-Scraper.git
cd Linsol-Products-Scraper
  2. Set Up a Virtual Environment (Recommended)
python -m venv venv
# Activate on Windows (Command Prompt or PowerShell):
venv\Scripts\activate
# Activate on Windows (Git Bash):
source venv/Scripts/activate
# Activate on Mac/Linux:
source venv/bin/activate
  3. Install Dependencies
pip install -r requirements.txt

Usage

Run the main script to start the scraping process:

python Scraper.py

The terminal will display real-time progress, including the current page being processed and successful extraction counts.

Project Structure

| File/Folder | Description |
| --- | --- |
| Scraper.py | The core logic for fetching, parsing, and saving data. |
| Products data.csv | The final Shopify-compatible output file. |
| Listings pages/ | Directory containing raw JSON responses from the API. |
| Products pages/ | Directory containing raw HTML files for each product. |
| requirements.txt | List of required Python libraries (requests, beautifulsoup4). |

How It Works

The Workflow

  1. Initialization: Fetches total product counts and pagination metadata from the Linsol API headers.

  2. Taxonomy Mapping: Builds a lookup table for categories, product ranges, and spaces.

  3. Discovery: Loops through the listing API to find all product slugs and basic metadata.

  4. Deep Scraping: Visits each product's individual URL to extract:

    • High-resolution gallery images.
    • Technical specifications and feature lists.
    • Meta titles and descriptions for SEO.
  5. Transformation: Cleans and formats the data into the Shopify CSV structure, ensuring multiple images are handled as subsequent rows under the same handle.
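
As a rough illustration of the taxonomy and transformation steps, the sketch below resolves category IDs into tags and expands one product into Shopify CSV rows, with extra images on follow-up rows that share the handle. The product dictionary keys (`slug`, `title`, `category_ids`, `images`) and the function names are hypothetical and may not match Scraper.py; only a handful of Shopify's columns are shown.

```python
import csv
import io

# A small subset of Shopify's product CSV columns, for illustration only.
FIELDNAMES = ["Handle", "Title", "Tags", "Image Src", "Image Position"]


def resolve_tags(term_ids, lookup):
    """Map taxonomy IDs to readable names, skipping IDs not in the lookup."""
    return [lookup[term_id] for term_id in term_ids if term_id in lookup]


def to_shopify_rows(product, lookup):
    """Expand one product into one row per image (hypothetical input shape)."""
    images = product.get("images") or [""]
    tags = ", ".join(resolve_tags(product.get("category_ids", []), lookup))
    rows = []
    for position, src in enumerate(images, start=1):
        first = position == 1
        rows.append({
            "Handle": product["slug"],
            # Only the first row carries product-level fields.
            "Title": product["title"] if first else "",
            "Tags": tags if first else "",
            "Image Src": src,
            "Image Position": position if src else "",
        })
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, a product with two images yields two rows: the first holds the title and tags, the second holds only the handle, the image URL, and position 2.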

License & Usage

  • Non-Commercial: Free to use for personal or educational projects.
  • Commercial: Please provide credit by linking to this repository or my GitHub profile.

Contributing

Contributions are welcome! To maintain project quality:

  1. Open an Issue first to discuss the bug or feature.
  2. Ensure your code follows PEP 8 guidelines.
  3. Submit a Pull Request (PR) with a clear description of changes.

Contact

If the scraper breaks, needs customization, or if you want help scraping another website, feel free to reach out.

Email: [email protected]

LinkedIn: https://www.linkedin.com/in/vishwas-batra/
