If you're looking for a job, say a data science role, you're probably using LinkedIn Jobs. But with hundreds of jobs posted every day, it can be hard to find the ones that best match your skills.
The main purpose of this project is to help you find the best-matching jobs automatically.
In this project we:

- Built a LinkedIn job scraper using `Selenium`, `Requests` and `BeautifulSoup`.
- Analysed the text of your resume and of LinkedIn job postings using `spaCy`.
- Developed a `Flask` app to display data visualisations, including a word cloud of in-demand skills. The app highlights job-specific skills and keywords, compares them to your skills, and generates a list of the most relevant job matches.
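As a rough illustration of the parsing step, here is a minimal `BeautifulSoup` sketch. The HTML fragment and class names below are hypothetical stand-ins for a LinkedIn search-results page; the selectors used by the project's actual scraper may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for a job search-results page.
html = """
<ul>
  <li class="job-card"><a href="/jobs/view/123">Data Scientist</a>
      <span class="location">Montreal, Quebec, Canada</span></li>
  <li class="job-card"><a href="/jobs/view/456">ML Engineer</a>
      <span class="location">Toronto, Ontario, Canada</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
# Collect one dict per job card: title, link and location.
jobs = [
    {
        "title": card.a.get_text(strip=True),
        "url": card.a["href"],
        "location": card.find("span", class_="location").get_text(strip=True),
    }
    for card in soup.find_all("li", class_="job-card")
]
print(jobs[0]["title"], jobs[0]["url"])  # Data Scientist /jobs/view/123
```

In the real scraper, `Selenium` first renders the JavaScript-heavy page and `Requests` fetches job details, with `BeautifulSoup` doing the parsing as above.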
This project requires Python 3 and the following Python libraries installed:

- Web scraping: `Selenium`, `Requests`, `BeautifulSoup`
- NLP: `spaCy`, `NLTK`
- Web app and visualization: `Flask`, `Plotly`, `Matplotlib`, `Wordcloud`
- Other: `pandas`, `numpy`, `json`

Install the trained English pipeline from spacy.io as follows:

```
python -m spacy download en_core_web_lg
```
The full list of requirements can be found in the requirements.txt file.
- `FLASK_app` folder: contains our responsive Flask web application.
  - `run.py`: main file to run the web application.
  - `scraping_linkedin.py`: code for scraping LinkedIn jobs with `Selenium` and `Requests`, and `BeautifulSoup` for parsing HTML content.
  - `Spacy_text_analayzer.py`: code to analyse text with `spaCy`, search for keywords and skills, compare them with your own and return the most relevant job matches.
  - `plotly_figures.py`: returns the configuration (data and layout) of `Plotly` figures.
  - `templates` folder: contains 9 HTML pages.
  - `static` folder: contains our customized CSS file and Bootstrap (compiled and minified CSS bundles and JS plugins).
- `chromedriver` folder: contains the chromedriver executable used by `Selenium` to control Chrome.
- `data` folder: contains the following files:
  - `user_credentials.txt`: your LinkedIn credentials (email address and password).
  - `Skills_in_Demand.txt`: list of in-demand skills (you can update this list).
  - `Skill_patterns.jsonl`: skill patterns in JSON format, used to create an entity ruler in the `spaCy` model.
  - `Job_Ids.csv` and `linkedin_jobs_scraped.json`: scraped LinkedIn job IDs and job details (description, seniority level, number of candidates, etc.).
- `notebooks` folder: contains the project notebooks.
- `resume` folder: place your resume (PDF format) here to analyse it with `spaCy` and get a list of your skills.
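To give a feel for how an entity ruler recognises skills, here is a minimal sketch. It uses a blank English pipeline and two inline patterns so it runs without the large model; the project instead loads `en_core_web_lg` and reads its patterns from `Skill_patterns.jsonl`, so treat the pattern contents below as illustrative assumptions.

```python
import spacy

# Blank pipeline (tokenizer only) plus an entity ruler; the project adds
# the ruler to en_core_web_lg instead.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Two example patterns; Skill_patterns.jsonl holds one such pattern per line.
ruler.add_patterns([
    {"label": "SKILL", "pattern": [{"LOWER": "python"}]},
    {"label": "SKILL", "pattern": [{"LOWER": "machine"}, {"LOWER": "learning"}]},
])

doc = nlp("Looking for Python and machine learning experience.")
print([(ent.text, ent.label_) for ent in doc.ents])
# [('Python', 'SKILL'), ('machine learning', 'SKILL')]
```

A JSONL patterns file can also be loaded in one step with `ruler.from_disk("Skill_patterns.jsonl")`.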
1. Save your LinkedIn credentials (email address and password) in `user_credentials.txt`.
2. Run the following command in the `FLASK_app` directory to scrape LinkedIn jobs:

   ```
   python scraping_linkedin.py "data scientist" "Montreal, Quebec, Canada" 120
   ```

   You can replace "data scientist" and "Montreal, Quebec, Canada" with the job title and the location, respectively. `120` is a timer in seconds that allows supplementary loading time for the webpage; adjust it depending on your Internet speed.
3. Run the following command in the `FLASK_app` directory to start the web application:

   ```
   python run.py
   ```

4. Go to http://127.0.0.1:3001/
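The three positional arguments above (job title, location, timer) could be read with plain `sys.argv` handling. This is a hypothetical sketch of that step; the real `scraping_linkedin.py` may parse its arguments differently.

```python
import sys

def parse_args(argv):
    """Read job title, location and an optional page-load timer (seconds)."""
    job_title, location = argv[1], argv[2]
    # Third argument is the extra loading time; assume a default if omitted.
    timer = int(argv[3]) if len(argv) > 3 else 60
    return job_title, location, timer

# Simulated command line matching the README example.
title, location, timer = parse_args(
    ["scraping_linkedin.py", "data scientist", "Montreal, Quebec, Canada", "120"]
)
print(title, timer)  # data scientist 120
```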
- The `Dashboard` page displays the distribution of seniority levels and the number of days since each job was posted. It also shows a word cloud of in-demand skills, which helps you see what to learn next to further broaden your skills.
- The `Resume_Analyzer` page uploads your resume (PDF format), displays your skills and assesses them against the most in-demand skills. The best-matching jobs are showcased in a carousel that emphasises the matching scores.
- The `display_Job` page presents the LinkedIn job role with an emphasis on the match score, highlighting essential skills required for the position that are not listed in your resume.
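One simple way such a match score could work is the fraction of a job's required skills that also appear in the resume. The following is a hedged sketch of that idea only; the project's actual scoring in `Spacy_text_analayzer.py` may be computed differently.

```python
def match_score(job_skills, resume_skills):
    """Percentage of the job's skills that the resume covers (case-insensitive)."""
    job = {s.lower() for s in job_skills}
    resume = {s.lower() for s in resume_skills}
    if not job:
        return 0.0
    return round(100 * len(job & resume) / len(job), 1)

# Example skill lists (illustrative, not scraped data).
job = ["Python", "SQL", "Machine Learning", "Docker"]
resume = ["python", "sql", "pandas"]

print(match_score(job, resume))  # 50.0
# Skills the job requires that the resume lacks (as highlighted on display_Job):
missing = {s.lower() for s in job} - {s.lower() for s in resume}
print(sorted(missing))  # ['docker', 'machine learning']
```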



