This project scrapes news headlines from several well-known news websites and aims to predict the virality of those headlines. The dataset is news-scrape.csv, which contains about 9,000 news items. The scraped websites are InShorts, BBC News, ABC News, Washington Post, Daily Mail, Google News, and FOX News.
Install the required Python libraries:
Beautiful Soup
pip install bs4
lxml
pip install lxml
Requests
pip install requests
Pandas (for data handling)
pip install pandas
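After installing, it can help to confirm that all four libraries are importable before running a scraper. The sketch below is not part of the project: the helper name `missing_packages` is hypothetical, and it only checks the import names `bs4`, `lxml`, `requests`, and `pandas` without loading them.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for the libraries listed above.
required = ["bs4", "lxml", "requests", "pandas"]

missing = missing_packages(required)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

Any name reported as missing can be installed with the matching pip command from the list above.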
Files with the .py extension can be run directly with Python. Files with the .ipynb extension are Jupyter notebooks and need Jupyter Notebook to run.
Important: The final size of the dataset will not necessarily match the figure mentioned above; it depends on how much content is available on the websites at the time of scraping.
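Because the size varies between runs, it may be worth counting the rows in the output rather than assuming a figure. The following is a minimal sketch using only the standard library; the helper name `count_records` is hypothetical, the sample column names are made up for illustration, and it assumes the CSV begins with a header row.

```python
import csv
import io


def count_records(handle):
    """Count data rows in an open CSV file, excluding the header row."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row (assumed to be present)
    return sum(1 for row in reader if row)  # ignore blank lines


# Stand-in data; the real column names in news-scrape.csv may differ.
sample = io.StringIO(
    "headline,source\n"
    "First story,BBC News\n"
    "Second story,InShorts\n"
)
print(count_records(sample))
```

For the real dataset, the same function could be called on a file opened with `open("news-scrape.csv", newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.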