Crawling Top 800 Companies Data from Forbes.com using Scrapy

HistoriFy/ForbesScrapy

ForbesScrapy

Crawling Top 800 Best Employers Data from Forbes.com using Scrapy.

Introduction

Details of all 800 entries in the World's Best Employers list hosted by Forbes were crawled using Scrapy, sorted by rank, and stored in a JSON file named forbes1.json.
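The sort-and-store step can be sketched in plain Python as follows. This is only an illustration, not the repository's code: the `rank` and `name` field names and the `save_ranked` helper are assumptions, and the real spider may rely on Scrapy's own feed export instead.

```python
import json
from pathlib import Path


def save_ranked(entries, path="forbes1.json"):
    """Sort scraped entries by rank and write them to a JSON file.

    `entries` is a list of dicts; the "rank" key is an assumed field name.
    """
    # Cast to int so that rank "10" sorts after rank "9" even if the
    # scraped values are strings.
    ranked = sorted(entries, key=lambda entry: int(entry["rank"]))
    Path(path).write_text(json.dumps(ranked, indent=2), encoding="utf-8")
    return ranked
```

For example, `save_ranked([{"rank": "2", "name": "B"}, {"rank": "1", "name": "A"}])` would write the two entries to forbes1.json with rank 1 first.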

The same spider then crawls relevant data for the top 20 companies by rank, following the profile links fetched from the original list. The result is stored in a second JSON file named company.json, this time using a different parser function.
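Picking which profile links to follow could look roughly like the sketch below. The `rank` and `profile_url` field names and the `top_profile_links` helper are hypothetical; in a Scrapy spider, each returned URL would typically be passed to `scrapy.Request` with the second parser function as its callback.

```python
def top_profile_links(entries, limit=20):
    """Return profile URLs of the `limit` best-ranked entries, in rank order.

    "rank" and "profile_url" are assumed field names.
    """
    ranked = sorted(entries, key=lambda entry: int(entry["rank"]))
    # Entries among the top `limit` that lack a profile link are skipped,
    # so the result may contain fewer than `limit` URLs.
    return [
        entry["profile_url"]
        for entry in ranked[:limit]
        if entry.get("profile_url")
    ]
```

Keeping this selection separate from the parsing logic makes it easy to change the cutoff from 20 to another number without touching either parser.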

The spider is defined in spider1.py, in the spiders folder.

To crawl all the data in one go, change to the project's parent folder and run the following command in a terminal:

scrapy crawl spider1

The output files will be stored in the parent folder.

Requirements

Please run the project in a virtual environment with the packages from requirements.txt installed first to avoid any issues.

Libraries specifically used are:

  • Scrapy - v2.6.2

    pip install scrapy
  • Fake-Useragent - v1.1.1

    pip install fake-useragent

    Or if you have multiple Python / pip versions installed, use pip3:

    pip3 install fake-useragent
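One common way to combine the two libraries is a downloader middleware that sets a random User-Agent header on each request. The sketch below is an assumption about how they might be wired together, not the repository's actual code; the class name `RandomUserAgentMiddleware` and the `pick_user_agent` parameter are hypothetical.

```python
class RandomUserAgentMiddleware:
    """Downloader-middleware-style class that sets a random User-Agent."""

    def __init__(self, pick_user_agent=None):
        # pick_user_agent: zero-argument callable returning a UA string.
        # When omitted, fall back to fake-useragent.
        self._pick = pick_user_agent or self._default_picker()

    @staticmethod
    def _default_picker():
        # Imported lazily so the class can be exercised without the package.
        from fake_useragent import UserAgent

        ua = UserAgent()
        return lambda: ua.random

    def process_request(self, request, spider):
        # Overwrite the header on every outgoing request.
        request.headers["User-Agent"] = self._pick()
        # Returning None tells Scrapy to continue processing the request.
        return None
```

To try it, the class would be registered under `DOWNLOADER_MIDDLEWARES` in the project's settings.py, using whatever module path it is saved under.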
