A high-performance Python scraper designed to extract product data from Linsol Australia. This tool navigates the WordPress REST API and individual product pages to generate a Shopify-ready CSV for seamless store imports.
- Dual-Layer Extraction: Combines fast API data retrieval with deep HTML parsing via BeautifulSoup.
- Shopify Formatted: Automatically maps source data to Shopify's CSV schema (Handles, Tags, Variants, and SEO).
- Resilient Networking: Built-in retry logic to handle temporary network fluctuations or server timeouts.
- Development Logging: Saves raw JSON and HTML locally for easy debugging and data verification.
- Automated Taxonomy Mapping: Resolves WordPress category/range IDs into human-readable tags.
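The retry behaviour described above can be sketched with `requests` and `urllib3`. The parameter values below are illustrative, not necessarily the ones `Scraper.py` uses:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session():
    """Build a requests.Session that retries transient failures with backoff."""
    retry = Retry(
        total=5,                  # up to 5 attempts per request
        backoff_factor=1,         # wait 1s, 2s, 4s, ... between retries
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET"],
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session

session = make_session()
# Example call (URL path is an assumption about the target site's WP REST API):
# resp = session.get("https://linsol.com.au/wp-json/wp/v2/product", timeout=30)
```

With this in place, a flaky 503 from the server is retried automatically instead of killing the whole scrape.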
- Open the repository: https://github.com/byte-dev404/Linsol-Products-Scraper
- Click the star icon if you want to bookmark the project.
- Click the green Code button and either:
  - Download the ZIP file and extract it, or
  - Copy the repository URL and clone it:

        git clone https://github.com/byte-dev404/Linsol-Products-Scraper.git

- Install an IDE such as VS Code: https://code.visualstudio.com/Download
- Install the latest version of Python for your operating system: https://www.python.org/downloads/
- Open the cloned or extracted project folder in your IDE.
- Open a terminal:
  - Press Ctrl + Shift + ` (backtick), or
  - Use the menu: Terminal → New Terminal
- Set up a virtual environment (recommended):
  - Create the virtual environment:

        python -m venv venv

  - Activate it:
    - On Windows (Command Prompt): venv\Scripts\activate (or source venv/Scripts/activate in Git Bash)
    - On Mac/Linux: source venv/bin/activate
- Install the required dependencies:

      pip install -r requirements.txt

- Run the scraper:

      python Scraper.py

  If you see ✅ Scraper started! in the terminal, the scraper has started successfully.
- Python 3.8+
- pip (Python package manager)
- Clone the repository:

      git clone https://github.com/byte-dev404/Linsol-Products-Scraper.git
      cd Linsol-Products-Scraper

- Set up a virtual environment (recommended):

      python -m venv venv
      # Activate on Windows (Command Prompt): venv\Scripts\activate
      # Activate on Mac/Linux: source venv/bin/activate

- Install dependencies:

      pip install -r requirements.txt

Run the main script to start the scraping process:

    python Scraper.py

The terminal will display real-time progress, including the current page being processed and successful extraction counts.
| File/Folder | Description |
|---|---|
| Scraper.py | The core logic for fetching, parsing, and saving data. |
| Products data.csv | The final Shopify-compatible output file. |
| Listings pages/ | Directory containing raw JSON responses from the API. |
| Products pages/ | Directory containing raw HTML files for each product. |
| requirements.txt | List of required Python libraries (requests, beautifulsoup4). |
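For reference, a minimal requirements.txt matching the table above might look like this (the version pins are illustrative, not the project's actual pins):

```
requests>=2.28
beautifulsoup4>=4.11
```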
- Initialization: Fetches total product counts and pagination metadata from the Linsol API headers.
- Taxonomy Mapping: Builds a lookup table for categories, product ranges, and spaces.
- Discovery: Loops through the listing API to find all product slugs and basic metadata.
- Deep Scraping: Visits each product's individual URL to extract:
  - High-resolution gallery images.
  - Technical specifications and feature lists.
  - Meta titles and descriptions for SEO.
- Transformation: Cleans and formats the data into the Shopify CSV structure, ensuring multiple images are handled as subsequent rows under the same handle.
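The transformation step above can be sketched as follows. The column subset and function names are illustrative (Shopify's import format defines many more columns than shown here); the key idea is that the first CSV row carries the full product data, and every additional image becomes a follow-up row that repeats only the Handle:

```python
import csv
import io

# Minimal subset of Shopify's CSV columns, for illustration only;
# a real export also needs Vendor, Variant SKU, SEO fields, etc.
FIELDS = ["Handle", "Title", "Body (HTML)", "Tags", "Image Src"]

def product_to_rows(product):
    """First row holds the product fields; each extra image becomes
    its own row sharing the same Handle (Shopify's multi-image format)."""
    images = product.get("images", [])
    first = {
        "Handle": product["handle"],
        "Title": product["title"],
        "Body (HTML)": product.get("description", ""),
        "Tags": ", ".join(product.get("tags", [])),
        "Image Src": images[0] if images else "",
    }
    rows = [first]
    for img in images[1:]:
        rows.append({"Handle": product["handle"], "Image Src": img})
    return rows

def write_csv(products):
    """Serialize a list of product dicts into Shopify-style CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for p in products:
        writer.writerows(product_to_rows(p))
    return buf.getvalue()
```

A product with three gallery images thus produces three rows under one handle, which Shopify merges back into a single product on import.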
- Non-Commercial: Free to use for personal or educational projects.
- Commercial: Please provide credit by linking to this repository or my GitHub profile.
Contributions are welcome! To maintain project quality:
- Open an Issue first to discuss the bug or feature.
- Ensure your code follows PEP 8 guidelines.
- Submit a Pull Request (PR) with a clear description of changes.
If the scraper breaks, needs customization, or if you want help scraping another website, feel free to reach out.
Email: [email protected]