Web scraping (also called web harvesting or web data extraction) is a form of data scraping that extracts data from websites by working with their HTML structure. In this project, I start by introducing the common steps involved in web scraping with Beautiful Soup.
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
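The snippet below is a minimal sketch of the basic Beautiful Soup workflow: parse a document, navigate to a tag, and search for repeated elements. It uses a small inline HTML string so it runs without a network connection. The markup, the `book` class name and the links are made up for illustration and are not taken from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page
html = """
<html>
  <head><title>Example Books</title></head>
  <body>
    <div class="book"><a href="/book/1">First Novel</a></div>
    <div class="book"><a href="/book/2">Second Novel</a></div>
  </body>
</html>
"""

# Parse with Python's built-in parser (no extra dependency such as lxml)
soup = BeautifulSoup(html, "html.parser")

# Navigate the tree: go straight to the <title> tag and read its text
page_title = soup.title.string

# Search the tree: find every <div class="book">, then read the link inside it
books = [
    {"title": div.a.get_text(strip=True), "href": div.a["href"]}
    for div in soup.find_all("div", class_="book")
]

print(page_title)
for book in books:
    print(book["title"], book["href"])
```

`find_all` returns every matching tag, while attribute access such as `soup.title` or `div.a` returns only the first match. That distinction matters when a page has many repeated blocks like book entries or reviews.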
The second part of the project extracts reviews from the Goodreads site, specifically for novels in the historical section.
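Here is a hedged sketch of how the review extraction could be structured. It separates downloading a page with `requests` from parsing it with Beautiful Soup, so the parsing step can be tested offline. Goodreads' actual markup is not described here and changes over time, so `REVIEW_SELECTOR` (`div.review-text`) is a hypothetical placeholder to be replaced after inspecting the live page. The `User-Agent` string and the 10-second timeout are likewise assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical selector: inspect the live Goodreads page to find the real one
REVIEW_SELECTOR = "div.review-text"


def fetch_html(url: str) -> str:
    """Download a page; requests is imported here so parsing works without it."""
    import requests

    headers = {"User-Agent": "Mozilla/5.0 (review-scraper sketch)"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def extract_reviews(html: str, selector: str = REVIEW_SELECTOR) -> list[str]:
    """Return the cleaned text of every element matching the CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for node in soup.select(selector):
        # Join nested text pieces with spaces and trim surrounding whitespace
        text = node.get_text(" ", strip=True)
        if text:  # skip empty containers
            reviews.append(text)
    return reviews


if __name__ == "__main__":
    sample = """
    <div class="review-text">A vivid portrait of <b>Tudor</b> England.</div>
    <div class="review-text">   </div>
    <div class="review-text">Slow start, but the ending is worth it.</div>
    """
    print(extract_reviews(sample))
```

On a live run, `extract_reviews(fetch_html(url))` would be called once per book page, with `url` pointing at a historical novel's Goodreads page.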
Once the extraction is complete, we'll analyse the extracted reviews. The words used most frequently across the reviews are identified and compared, and a collage (word cloud) is then constructed to showcase this data.
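The next snippet sketches the frequency analysis and the word-cloud step. Words are counted with `collections.Counter` after lowercasing each review and dropping a small hand-picked stopword list. The stopword list, the tokenising regex and the output filename `reviews_wordcloud.png` are assumptions for illustration, not the project's exact choices. The image is rendered with the `wordcloud` package's `WordCloud.generate_from_frequencies` and `to_file`.

```python
import re
from collections import Counter

# Small illustrative stopword list; wordcloud.STOPWORDS offers a fuller one
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "it", "in", "but", "this", "was"}


def word_frequencies(reviews: list[str]) -> Counter:
    """Count lowercase words across all reviews, ignoring stopwords and 1-letter tokens."""
    counts = Counter()
    for review in reviews:
        for word in re.findall(r"[a-z']+", review.lower()):
            if word not in STOPWORDS and len(word) > 1:
                counts[word] += 1
    return counts


def save_word_cloud(counts: Counter, path: str = "reviews_wordcloud.png") -> None:
    """Render the frequencies as a word-cloud image (requires the wordcloud package)."""
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(counts)  # font size scales with each word's count
    cloud.to_file(path)


if __name__ == "__main__":
    reviews = [
        "The war scenes were gripping and the characters felt real.",
        "Gripping history, though the war chapters dragged.",
    ]
    counts = word_frequencies(reviews)
    print(counts.most_common(3))
```

Counting first and passing the frequencies to `generate_from_frequencies` keeps the comparison (for example `counts.most_common(20)`) and the collage based on exactly the same numbers.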
- Python
- BeautifulSoup
- requests
- WordCloud