If you're looking for a job, say a data science role, you're probably using LinkedIn Jobs. But with hundreds of jobs posted every day, it can be hard to find the ones that best match your skills.
The main purpose of this project is to help you find the best-matching jobs automatically.
In this project we:

- Built a LinkedIn job scraper using `Selenium`, `Requests` and `BeautifulSoup`.
- Analysed the text of your resume and of LinkedIn job postings using `spaCy`.
- Developed a `Flask` app to display data visualisations, including a word cloud of in-demand skills. The app highlights job-specific skills and keywords, compares them to your skills, and generates a list of the most relevant job matches.
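As a rough illustration of the parsing step, here is a minimal `BeautifulSoup` sketch. The HTML fragment and class names below are hypothetical stand-ins for a LinkedIn search-results page; the selectors used by the project's actual scraper may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for a job search-results page.
html = """
<ul>
  <li class="job-card"><a href="/jobs/view/123">Data Scientist</a>
      <span class="location">Montreal, Quebec, Canada</span></li>
  <li class="job-card"><a href="/jobs/view/456">ML Engineer</a>
      <span class="location">Toronto, Ontario, Canada</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
# Collect one dict per job card: title, link and location.
jobs = [
    {
        "title": card.a.get_text(strip=True),
        "url": card.a["href"],
        "location": card.find("span", class_="location").get_text(strip=True),
    }
    for card in soup.find_all("li", class_="job-card")
]
print(jobs[0]["title"], jobs[0]["url"])  # Data Scientist /jobs/view/123
```

In the real scraper, `Selenium` first renders the JavaScript-heavy page and `Requests` fetches job details, with `BeautifulSoup` doing the parsing as above.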
This project requires Python 3 and the following Python libraries installed:

- Web scraping: `Selenium`, `Requests`, `BeautifulSoup`
- NLP: `spaCy`, `NLTK`
- Web app and visualization: `Flask`, `Plotly`, `Matplotlib`, `Wordcloud`
- Other: `pandas`, `numpy`, `json`

Install the trained English pipeline from spacy.io as follows:

```
python -m spacy download en_core_web_lg
```
The full list of requirements can be found in the requirements.txt file.
- `FLASK_app` folder: contains our responsive Flask web application.
  - `run.py`: main file to run the web application.
  - `scraping_linkedin.py`: code for scraping LinkedIn jobs with `Selenium` and `Requests`, and `BeautifulSoup` for parsing HTML content.
  - `Spacy_text_analayzer.py`: code to analyse text with `spaCy`, search for keywords and skills, compare them with your own and return the most relevant job matches.
  - `plotly_figures.py`: returns the configuration (data and layout) of `Plotly` figures.
  - `templates` folder: contains 9 HTML pages.
  - `static` folder: contains our customized CSS file and Bootstrap (compiled and minified CSS bundles and JS plugins).
- `chromedriver` folder: contains the chromedriver executable used by `Selenium` to control Chrome.
- `data` folder: contains the following files:
  - `user_credentials.txt`: your LinkedIn credentials (email address and password).
  - `Skills_in_Demand.txt`: list of in-demand skills (you can update this list).
  - `Skill_patterns.jsonl`: skill patterns in JSON format, used to create an entity ruler in the `spaCy` model.
  - `Job_Ids.csv` and `linkedin_jobs_scraped.json`: scraped LinkedIn job IDs and job details (description, seniority level, number of candidates, etc.).
- `notebooks` folder: contains the project notebooks.
- `resume` folder: place your resume (PDF format) here to analyse it with `spaCy` and get a list of your skills.
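To give a feel for how an entity ruler recognises skills, here is a minimal sketch. It uses a blank English pipeline and two inline patterns so it runs without the large model; the project instead loads `en_core_web_lg` and reads its patterns from `Skill_patterns.jsonl`, so treat the pattern contents below as illustrative assumptions.

```python
import spacy

# Blank pipeline (tokenizer only) plus an entity ruler; the project adds
# the ruler to en_core_web_lg instead.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Two example patterns; Skill_patterns.jsonl holds one such pattern per line.
ruler.add_patterns([
    {"label": "SKILL", "pattern": [{"LOWER": "python"}]},
    {"label": "SKILL", "pattern": [{"LOWER": "machine"}, {"LOWER": "learning"}]},
])

doc = nlp("Looking for Python and machine learning experience.")
print([(ent.text, ent.label_) for ent in doc.ents])
# [('Python', 'SKILL'), ('machine learning', 'SKILL')]
```

A JSONL patterns file can also be loaded in one step with `ruler.from_disk("Skill_patterns.jsonl")`.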
1. Save your LinkedIn credentials (email address and password) in `user_credentials.txt`.
2. Run the following command in the `FLASK_app` directory to scrape LinkedIn jobs:

   ```
   python scraping_linkedin.py "data scientist" "Montreal, Quebec, Canada" 120
   ```

   You can replace "data scientist" and "Montreal, Quebec, Canada" with the job title and the location, respectively. `120` is a timer in seconds that allows supplementary loading time for the webpage; adjust it depending on your Internet speed.
3. Run the following command in the `FLASK_app` directory to start the web application:

   ```
   python run.py
   ```

4. Go to http://127.0.0.1:3001/
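The three positional arguments above (job title, location, timer) could be read with plain `sys.argv` handling. This is a hypothetical sketch of that step; the real `scraping_linkedin.py` may parse its arguments differently.

```python
import sys

def parse_args(argv):
    """Read job title, location and an optional page-load timer (seconds)."""
    job_title, location = argv[1], argv[2]
    # Third argument is the extra loading time; assume a default if omitted.
    timer = int(argv[3]) if len(argv) > 3 else 60
    return job_title, location, timer

# Simulated command line matching the README example.
title, location, timer = parse_args(
    ["scraping_linkedin.py", "data scientist", "Montreal, Quebec, Canada", "120"]
)
print(title, timer)  # data scientist 120
```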
- The `Dashboard` page displays the distribution of seniority levels and the number of days since each job was posted. It also shows a word cloud of in-demand skills, which helps you see what to learn next to further broaden your skills.
- The `Resume_Analyzer` page uploads your resume (PDF format), displays your skills and assesses them against the most in-demand skills. The best-matching jobs are showcased in a carousel that emphasises the matching scores.
- The `display_Job` page presents the LinkedIn job role with an emphasis on the match score, highlighting essential skills required for the position that are not listed in your resume.
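One simple way such a match score could work is the fraction of a job's required skills that also appear in the resume. The following is a hedged sketch of that idea only; the project's actual scoring in `Spacy_text_analayzer.py` may be computed differently.

```python
def match_score(job_skills, resume_skills):
    """Percentage of the job's skills that the resume covers (case-insensitive)."""
    job = {s.lower() for s in job_skills}
    resume = {s.lower() for s in resume_skills}
    if not job:
        return 0.0
    return round(100 * len(job & resume) / len(job), 1)

# Example skill lists (illustrative, not scraped data).
job = ["Python", "SQL", "Machine Learning", "Docker"]
resume = ["python", "sql", "pandas"]

print(match_score(job, resume))  # 50.0
# Skills the job requires that the resume lacks (as highlighted on display_Job):
missing = {s.lower() for s in job} - {s.lower() for s in resume}
print(sorted(missing))  # ['docker', 'machine learning']
```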



