ForbesScrapy

Crawling data on the top 800 World's Best Employers from Forbes.com using Scrapy.

Introduction

Details for all 800 entries in the World's Best Employers list hosted by Forbes are crawled with Scrapy, sorted by rank, and stored in a JSON file named forbes1.json.
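The sort-and-store step can be sketched roughly as follows. This is a minimal illustration, not the spider's actual code: the `rank` field name and the helper names `sort_by_rank` and `save_json` are assumptions made for the example.

```python
import json
from pathlib import Path


def sort_by_rank(entries):
    """Return a new list of entries ordered by ascending rank."""
    # "rank" is an assumed field name; int() also accepts ranks scraped as strings.
    return sorted(entries, key=lambda entry: int(entry["rank"]))


def save_json(entries, path):
    """Write the entries to a JSON file and return the resulting Path."""
    path = Path(path)
    path.write_text(
        json.dumps(entries, indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return path
```

For example, `save_json(sort_by_rank(items), "forbes1.json")` would write the ranked list, where `items` stands for the list of scraped entries.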

The same spider then crawls the relevant data for the top 20 companies by rank, following the profile links fetched from the original list. The result is stored in a second JSON file named company.json, this time using a different parser function.
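Picking the profile links to follow could look something like the sketch below. The field names `rank` and `profile_url` and the helper `top_profile_links` are hypothetical; the real spider may store these values under different keys.

```python
def top_profile_links(entries, limit=20):
    """Return profile links of the `limit` best-ranked entries, in rank order."""
    # "rank" and "profile_url" are assumed field names for this illustration.
    ranked = sorted(entries, key=lambda entry: int(entry["rank"]))
    # Take the top `limit` entries first, then skip any that have no link.
    return [entry["profile_url"] for entry in ranked[:limit] if entry.get("profile_url")]
```

In a Scrapy spider, each returned link would typically be requested with a second callback that parses the company profile page.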

The spider is defined in spider1.py in the spiders folder.

To run the whole crawl at once, change to the project's parent folder in a terminal and run:

scrapy crawl spider1

Data will be stored in the parent folder.

Requirements

Please run the project in a virtual environment with the packages from requirements.txt installed first to avoid any issues.

Libraries specifically used are:

Scrapy - v2.6.2
```
pip install scrapy
```
Fake-Useragent - v1.1.1
```
pip install fake-useragent
```
Or if you have multiple Python / pip versions installed, use pip3:
```
pip3 install fake-useragent
```
