This project is a Python-based Amazon web scraper that extracts product details like title, price, rating, reviews, and links based on user input. The data is then analyzed to recommend the best product using numerical analysis and visualization techniques.
- ✅ Scrapes product data from Amazon using BeautifulSoup and requests
- ✅ Allows users to input search keywords and number of pages to scrape
- ✅ Performs data analysis on extracted product data (Price, Rating, Reviews)
- ✅ Uses Seaborn & Matplotlib for visualization
- ✅ Recommends the best product based on a calculated score
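The extraction step can be illustrated with a minimal BeautifulSoup sketch. The HTML below and the CSS selectors (`div.product`, `a.title`, `span.price`, and so on) are made up for illustration. Amazon's real markup is different and changes often, so the selectors in `amazon_scraper.py` will not match these.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not Amazon's real page structure.
SAMPLE_HTML = """
<div class="product">
  <a class="title" href="/dp/EXAMPLE1">Example Laptop 14</a>
  <span class="price">$499.99</span>
  <span class="rating">4.5 out of 5 stars</span>
  <span class="reviews">1,234</span>
</div>
<div class="product">
  <a class="title" href="/dp/EXAMPLE2">Example Laptop 15</a>
  <span class="price">$649.00</span>
  <span class="rating">4.2 out of 5 stars</span>
  <span class="reviews">87</span>
</div>
"""


def text_of(card, selector):
    """Return the stripped text of the first match, or None if it is missing."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_products(html):
    """Collect title, price, rating, reviews, and link for each product card."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.select("div.product"):
        link = card.select_one("a.title")
        products.append({
            "title": text_of(card, "a.title"),
            "price": text_of(card, "span.price"),
            "rating": text_of(card, "span.rating"),
            "reviews": text_of(card, "span.reviews"),
            "link": link["href"] if link else None,
        })
    return products


products = parse_products(SAMPLE_HTML)
```

Returning `None` for a missing field keeps one incomplete product card from aborting the whole page.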
amazon-scraper/
│── amazon_scraper.py # Web scraper script
│── form_server.py # Flask server for user input
│── templates/
│ ├── index.html # Web form UI
│── static/
│ ├── style.css # Styling for web pages
│── amazon_products.csv # Scraped data (Generated)
│── README.md # Project Documentation
│── requirements.txt # Dependencies
- Python (BeautifulSoup, requests, pandas)
- Flask (for the web interface)
- Matplotlib & Seaborn (for data visualization)
- GitHub (for version control)
- Run the Flask server → Opens a webpage to enter a search keyword & number of pages
- Scrapes Amazon → Extracts product details and saves them in a CSV file
- Performs Data Analysis → Determines the best product using a score metric
- Displays Results → Shows analysis graphs and the best product link
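As a rough sketch of how the keyword and page count from the form could become a list of pages to fetch, the helper below builds one search URL per page. The function name `build_search_urls` and the `k` and `page` query parameters are assumptions for illustration; check them against what `amazon_scraper.py` actually requests.

```python
from urllib.parse import urlencode

# Assumed search endpoint; not confirmed by this README.
BASE_URL = "https://www.amazon.com/s"


def build_search_urls(keyword, pages):
    """Return one search URL per result page, starting at page 1."""
    if pages < 1:
        raise ValueError("pages must be at least 1")
    return [
        f"{BASE_URL}?{urlencode({'k': keyword, 'page': page})}"
        for page in range(1, pages + 1)
    ]


urls = build_search_urls("gaming laptop", 2)
```

`urlencode` takes care of escaping spaces and special characters in the keyword, so user input can be passed through unchanged.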
Ensure you have Python 3+ and install dependencies:
pip install -r requirements.txt
Run the following command in your terminal:
python form_server.py
- Open your browser and go to:
http://localhost:5000
- Enter the search keyword (e.g., "laptop") and the number of pages to scrape.
- Click the "Scrape" button to start the process.
- The scraper will extract product details and save them in a CSV file.
- Once completed, the page will display:
- ✅ A link to the scraped data
- 📊 Visualizations of the analysis
- 🏆 A link to the best-recommended product
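The CSV step can be sketched with the standard library alone. The column names below are an assumption based on the fields this README lists (title, price, rating, reviews, link), and the sample row is made up. The actual script may use pandas and different headers.

```python
import csv
import io

# Assumed column names; the real CSV header may differ.
FIELDNAMES = ["title", "price", "rating", "reviews", "link"]


def write_products(products, handle):
    """Write product dicts as CSV rows to an open text file or buffer."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    for product in products:
        writer.writerow(product)


# Made-up row, written to an in-memory buffer instead of a file.
sample = [{
    "title": "Example Laptop 14",
    "price": 499.99,
    "rating": 4.5,
    "reviews": 1234,
    "link": "https://example.com/p/1",
}]
buffer = io.StringIO()
write_products(sample, buffer)
csv_text = buffer.getvalue()
```

To write `amazon_products.csv` instead, pass a handle from `open("amazon_products.csv", "w", newline="", encoding="utf-8")`.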
- X-axis: Number of Reviews
- Y-axis: Rating
- Bubble Size: Product Price
- Purpose: Shows the relationship between customer ratings and the number of reviews.
- X-axis: Price
- Y-axis: Score (Calculated as (Rating * Reviews) / Price)
- Color Gradient: Score Intensity
- Purpose: Identifies the most cost-effective product based on rating and popularity.
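A minimal sketch of the price-versus-score plot described above, using Matplotlib only (the project also uses Seaborn). The product rows are made up, and the function name `plot_price_vs_score` and the `viridis` colormap are illustrative choices, not taken from the project.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render without a display, e.g. on a server
import matplotlib.pyplot as plt

# Made-up sample data for illustration.
SAMPLE = [
    {"title": "A", "price": 500.0, "rating": 4.5, "reviews": 1000},
    {"title": "B", "price": 650.0, "rating": 4.2, "reviews": 100},
    {"title": "C", "price": 300.0, "rating": 3.9, "reviews": 450},
]


def score(product):
    """Score = (Rating * Reviews) / Price."""
    return product["rating"] * product["reviews"] / product["price"]


def plot_price_vs_score(products, target):
    """Scatter price against score, colored by score; save as PNG to target."""
    prices = [p["price"] for p in products]
    scores = [score(p) for p in products]
    fig, ax = plt.subplots(figsize=(8, 5))
    points = ax.scatter(prices, scores, c=scores, cmap="viridis")
    fig.colorbar(points, ax=ax, label="Score")
    ax.set_xlabel("Price")
    ax.set_ylabel("Score")
    ax.set_title("Price vs. Score")
    fig.savefig(target, format="png")
    plt.close(fig)  # free the figure so repeated requests do not leak memory
    return scores


# target may be a file path or a binary file-like object.
buffer = io.BytesIO()
scores = plot_price_vs_score(SAMPLE, buffer)
```

Selecting the `Agg` backend before importing `pyplot` matters when the plot is generated inside a Flask request, where no GUI is available.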
- The best product is determined using the formula:
$$\text{Score} = \frac{\text{Rating} \times \text{Reviews}}{\text{Price}}$$
- The product with the highest score is recommended.
- The analysis graphs are generated using Matplotlib & Seaborn.
- The graphs are displayed on the results page along with the best product link.
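The scoring formula above can be sketched in plain Python. This version avoids pandas to stay dependency-free, the sample rows are made up, and treating a zero or missing price as a score of 0 is an assumption rather than the project's documented behavior.

```python
def score(product):
    """Score = (Rating * Reviews) / Price."""
    if product["price"] <= 0:
        # Assumption: a product without a valid price cannot be ranked.
        return 0.0
    return product["rating"] * product["reviews"] / product["price"]


def best_product(products):
    """Return the product with the highest score, or None for an empty list."""
    if not products:
        return None
    return max(products, key=score)


# Made-up sample data for illustration.
sample = [
    {"title": "A", "price": 500.0, "rating": 4.5, "reviews": 1000},
    {"title": "B", "price": 250.0, "rating": 4.0, "reviews": 800},
    {"title": "C", "price": 0.0, "rating": 5.0, "reviews": 10},
]
best = best_product(sample)
```

Here product B wins: it is rated slightly lower than A, but it costs half as much and has nearly as many reviews. Because the score divides by price, cheap products with many reviews are favored heavily.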
- ✅ Implement Scrapy for faster scraping
- ✅ Enhance error handling for blocked requests
- ✅ Deploy on cloud (Heroku/AWS) for remote access
This project is for educational purposes only. Amazon does not allow automated scraping, so use it responsibly.

