This repository serves as a proof of concept (PoC) of web scraping.
This app uses the web scraping tool Puppeteer to scrape items from the Torfs website. Users choose which category to scrape by entering the URL of that category. For testing purposes, a development mode is available: when it is enabled, the user can choose how many pages to scrape instead of scraping every page. This speeds up the process, which is useful during testing.
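One way to picture development mode is as a cap on the list of result pages the scraper visits. The sketch below is hypothetical and not taken from the repository: it assumes the total number of pages is already known and that pages are selected with a page query parameter, which may not match the real Torfs URLs (example.com stands in for a category URL).

```javascript
// HYPOTHETICAL helper, not the repository's code: lists the result pages to scrape.
// ASSUMPTION: pages are selected with a "page" query parameter; the real site may differ.
function pageUrlsToScrape(categoryUrl, totalPages, devMode, maxPages) {
  // Development mode caps the page count at the user's limit; otherwise scrape all pages.
  const count = devMode ? Math.min(totalPages, Math.max(0, maxPages)) : totalPages;

  const urls = [];
  for (let page = 1; page <= count; page++) {
    const url = new URL(categoryUrl); // throws a TypeError on an invalid URL
    url.searchParams.set("page", String(page)); // overwrites any existing "page" value
    urls.push(url.toString());
  }
  return urls;
}

// Usage: a category with 12 result pages, development mode limited to 2 pages.
console.log(pageUrlsToScrape("https://example.com/category", 12, true, 2));
```

Each URL in the list would then be opened with Puppeteer's page.goto() and its items collected; capping the list up front is what keeps a development run short.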
The app displays all scraped items in a table containing their names, types, number of colors, prices, images and a link to the original item. The user can also download this data as a JSON file.
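A minimal sketch of turning scraped items into table rows and a downloadable JSON string. It assumes each item is a plain object with the hypothetical fields name, type, colors, price, image and url; the repository's actual field names may differ. Scraped text is HTML-escaped before it goes into the table, since content from another website should not be trusted.

```javascript
// HYPOTHETICAL item shape: { name, type, colors, price, image, url }.
// The field names are assumptions, not taken from the repository.

// Escape characters that have special meaning in HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build one <tr> string per item for the results table.
function itemsToTableRows(items) {
  return items.map((item) =>
    "<tr>" +
    `<td>${escapeHtml(item.name)}</td>` +
    `<td>${escapeHtml(item.type)}</td>` +
    `<td>${escapeHtml(item.colors)}</td>` +
    `<td>${escapeHtml(item.price)}</td>` +
    `<td><img src="${escapeHtml(item.image)}" alt="${escapeHtml(item.name)}"></td>` +
    `<td><a href="${escapeHtml(item.url)}">View item</a></td>` +
    "</tr>"
  );
}

// Serialize the items for the JSON download, indented with 2 spaces.
function itemsToJson(items) {
  return JSON.stringify(items, null, 2);
}

// Browser only: offer the JSON as a file download via a temporary object URL.
function downloadJson(items, filename = "items.json") {
  const blob = new Blob([itemsToJson(items)], { type: "application/json" });
  const link = document.createElement("a");
  link.href = URL.createObjectURL(blob);
  link.download = filename;
  link.click();
  // Release the object URL once the click has been handled.
  setTimeout(() => URL.revokeObjectURL(link.href), 0);
}

// Usage with one illustrative item.
const sample = [{ name: "Sneaker", type: "Shoes", colors: 3, price: "59.99",
  image: "https://example.com/a.jpg", url: "https://example.com/a" }];
console.log(itemsToTableRows(sample)[0]);
```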
- To run the app locally, clone this repository and navigate to the folder
git clone https://github.com/SandroBarillaPXL/expertlab-sprint2-scraping
cd expertlab-sprint2-scraping
- Install the dependencies
npm install
- Start the backend API server, which is then accessible at http://localhost:3000
node scripts/api.js
- Start the frontend with a simple HTTP server of your choice; for local use, the "Live Server" extension for Visual Studio Code is one option.
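💡 Note: Puppeteer needs a Chromium-based browser to run. The puppeteer package normally downloads a compatible browser during npm install; if it does not, install Chromium on your system.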
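With the backend API server running on http://localhost:3000, the frontend sends it HTTP requests. The route and query parameters in this sketch (/scrape, url and pages) are hypothetical placeholders, not the repository's documented API; scripts/api.js defines the real routes. The sketch only shows how such a request URL can be built with correct encoding and sent with the built-in fetch (browsers and Node 18+).

```javascript
// HYPOTHETICAL endpoint: "/scrape" with "url" and "pages" query parameters.
// Check scripts/api.js for the routes the backend actually exposes.
const API_BASE = "http://localhost:3000";

function buildScrapeRequest(categoryUrl, pageLimit) {
  const request = new URL("/scrape", API_BASE);
  // searchParams percent-encodes the category URL so it survives as one value.
  request.searchParams.set("url", categoryUrl);
  if (pageLimit !== undefined) {
    request.searchParams.set("pages", String(pageLimit)); // development mode only
  }
  return request.toString();
}

// Send the request and parse the JSON body; fails loudly on non-2xx responses.
async function requestItems(categoryUrl, pageLimit) {
  const response = await fetch(buildScrapeRequest(categoryUrl, pageLimit));
  if (!response.ok) {
    throw new Error(`API responded with HTTP ${response.status}`);
  }
  return response.json();
}

// Usage: build the request for a development run limited to 2 pages.
console.log(buildScrapeRequest("https://example.com/category", 2));
```

Encoding the category URL as a query value matters because it contains its own ":", "/" and possibly "?" characters, which would otherwise be misread as part of the API URL.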
- To run the app with Docker instead, clone this repository and navigate to the directory
git clone https://github.com/SandroBarillaPXL/expertlab-sprint2-scraping
cd expertlab-sprint2-scraping
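- Build the frontend and backend Docker images, replacing the placeholders in angle brackets with your own values
docker build -t <username>/<imagename-frontend>:<tag> -f docker/Dockerfile-fe .
docker build -t <username>/<imagename-backend>:<tag> -f docker/Dockerfile-be .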
- Run the Docker containers (the backend on port 3000, the frontend on a host port of your choice)
docker run -d -p 3000:3000 <username>/<imagename-backend>:<tag>
docker run -d -p <port>:80 <username>/<imagename-frontend>:<tag>
Alternatively, you can use the docker-compose.yml file to run the containers. By default, the app is then available at http://localhost:8500.
docker compose -f ./docker/docker-compose.yml up -d
Sources:
- https://serpapi.com/blog/web-scraping-in-javascript-complete-tutorial-for-beginner/
- https://www.freecodecamp.org/news/web-scraping-in-javascript-with-puppeteer/
- https://javascript.plainenglish.io/scraping-for-images-using-puppeteer-9a3700bd5a2d/
- https://www.scrapingbee.com/blog/web-scraping-javascript/
- https://pptr.dev/
- https://pptr.dev/guides/docker/
- ChatGPT conversation 1
- ChatGPT conversation 2
- ChatGPT conversation 3