CSE-120-Task-Web-Scraping

Description

This repository uses Node.js web scraping libraries to extract information about professional opportunities from several websites and stores the results in JSON files. Two libraries, Puppeteer and Cheerio, are each used to scrape data from three example websites.

How to Run the Project

Puppeteer Library

Open a Node.js script in an IDE (e.g., VS Code)
Make sure the package.json file is in the same directory as the script
In the package.json file, add "dependencies": { "puppeteer": "^19.6.2" }
Navigate to the project directory in a CLI/terminal
Install Puppeteer with the command "npm install puppeteer"
Run "node {name}.js" in the terminal, where {name} is the script's file name (for example, index1_puppeteer)
Expected Behaviors:
1. The script launches a Google Chrome browser window, loads the corresponding website, and closes the window as soon as scraping finishes
2. The scraped data is printed to the terminal window as a nested structure
3. A new JSON file containing all the data is created if one does not already exist in the directory; otherwise, the existing file is overwritten
4. NOTE: each script extracts data from only one website and stores it in its own JSON file.
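
The behaviors listed above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the `a` selector, the `scrape` and `saveResults` names, the default `results.json` output name, and the command-line arguments are all assumptions made for the example, and each real page needs selectors matched to its own markup.

```javascript
const fs = require('fs');

// Hypothetical selector: each real page needs selectors matched to its markup.
const ITEM_SELECTOR = 'a';

// Write the records as pretty-printed JSON. writeFileSync creates the file
// if it is missing and overwrites it otherwise.
function saveResults(filePath, records) {
  fs.writeFileSync(filePath, JSON.stringify(records, null, 2));
}

async function scrape(url, outFile) {
  // Loaded here so saveResults can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  // headless: false opens a visible browser window.
  const browser = await puppeteer.launch({ headless: false });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // The callback runs inside the page and returns plain objects.
    const records = await page.$$eval(ITEM_SELECTOR, (nodes) =>
      nodes.map((node) => ({ name: node.textContent.trim(), link: node.href }))
    );
    console.dir(records, { depth: null }); // print nested data in full
    saveResults(outFile, records);
    return records;
  } finally {
    await browser.close(); // close the browser even if scraping fails
  }
}

// Assumed CLI shape: node sketch.js <url> [output.json]
if (/^https?:\/\//.test(process.argv[2] || '')) {
  scrape(process.argv[2], process.argv[3] || 'results.json').catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

Because `fs.writeFileSync` replaces the file contents on every run, rerunning the script matches behavior 3 above without any extra existence check.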

Cheerio Library

Open a Node.js script in an IDE (e.g., VS Code)
Make sure the package.json file is in the same directory as the script
In the package.json file, add "dependencies": { "cheerio": "^1.0.0-rc.12", "axios": "^1.5.1" }
Navigate to the project directory in a CLI/terminal
Install Cheerio and Axios with the command "npm install cheerio axios"
Run "node {name}.js" in the terminal, where {name} is the script's file name (for example, index1_cheerio)
Expected Behaviors:
1. The scraped data is printed to the terminal window as a nested structure
2. A new JSON file containing all the data is created if one does not already exist in the directory; otherwise, the existing file is overwritten
3. NOTE: each script extracts data from only one website and stores it in its own JSON file.
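
A Cheerio-based script following the steps above might look roughly like this. It is a sketch under assumptions rather than the repository's code: the `a[href]` selector, the `scrape` and `toAbsoluteUrl` names, and the default `results.json` output name are hypothetical. Unlike Puppeteer, no browser is launched; Axios downloads the HTML and Cheerio parses it, so content that a page renders with client-side JavaScript will not be present.

```javascript
const fs = require('fs');

// Hypothetical selector: each real page needs selectors matched to its markup.
const ITEM_SELECTOR = 'a[href]';

// Resolve a possibly relative href against the URL of the page it came from.
function toAbsoluteUrl(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

async function scrape(pageUrl, outFile) {
  // Loaded here so toAbsoluteUrl can be used without the packages installed.
  const axios = require('axios');
  const cheerio = require('cheerio');

  const { data: html } = await axios.get(pageUrl); // raw HTML as a string
  const $ = cheerio.load(html);

  // Build one plain object per matched element.
  const records = $(ITEM_SELECTOR)
    .map((i, el) => ({
      name: $(el).text().trim(),
      link: toAbsoluteUrl($(el).attr('href'), pageUrl),
    }))
    .get();

  console.dir(records, { depth: null }); // print nested data in full
  // Creates the file if it is missing, overwrites it otherwise.
  fs.writeFileSync(outFile, JSON.stringify(records, null, 2));
  return records;
}

// Assumed CLI shape: node sketch.js <url> [output.json]
if (/^https?:\/\//.test(process.argv[2] || '')) {
  scrape(process.argv[2], process.argv[3] || 'results.json').catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```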

Website Example Details

Example 1: NASA Jet Propulsion Laboratory Internship: https://www.jpl.nasa.gov/edu/intern/apply

Relevant Data Extracted (Format: JSON variable name --> actual meaning):

name --> title of internship
link --> application link to internship
academic level --> academic level (undergraduate/graduate)
session --> time of internship program
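
As a rough illustration of how one such record could be handled in Node.js, the sketch below uses the four key names from the list above. The sample values and the `missingKeys` helper are hypothetical and are not taken from the repository's output files. Because the `academic level` key contains a space, it has to be quoted and read with bracket notation.

```javascript
// Key names from the field list; "academic level" contains a space.
const INTERNSHIP_KEYS = ['name', 'link', 'academic level', 'session'];

// Return the expected keys that a record does not define.
function missingKeys(record) {
  return INTERNSHIP_KEYS.filter(
    (key) => !Object.prototype.hasOwnProperty.call(record, key)
  );
}

// Placeholder values for illustration only.
const sample = {
  name: 'Example Internship',
  link: 'https://example.com/apply',
  'academic level': 'undergraduate',
  session: 'summer',
};

console.log(sample['academic level']); // dot notation cannot reach this key
console.log(missingKeys(sample));
```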

Example 2: Top 142 STEM Scholarship in October 2023: https://scholarships360.org/scholarships/stem-scholarships/

Relevant Data Extracted (Format: JSON variable name --> actual meaning):

nameText --> title of scholarship
linkText --> application link to scholarship
platOfferText --> scholarship platform
awardText --> scholarship award amount
deadlineText --> deadline of the application

Example 3: The Muse Job Search Website: https://www.themuse.com/search/

Relevant Data Extracted (Format: JSON variable name --> actual meaning):

titleText --> title of job
appLinkText --> application link to the job
desLinkText --> job description link about the job
NameLocateText --> name of the company and its location, combined in one field

Ethical Considerations

Only a limited amount of data was extracted, to avoid affecting the performance of the website servers.
The information was extracted for educational purposes only.
The JSON data contains only public information; no personal or sensitive data was collected.
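
One common way to keep the load on a server low is to send requests one at a time with a pause between them. The sketch below illustrates that idea only; the `fetchSequentially` helper, its `fetchOne` callback, and the one-second default delay are assumptions for the example and are not described in the repository.

```javascript
// Resolve after the given number of milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Call fetchOne for each URL strictly one at a time, waiting delayMs
// between consecutive requests. Results keep the order of the input.
async function fetchSequentially(urls, fetchOne, delayMs = 1000) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) {
      await sleep(delayMs); // pause before every request except the first
    }
    results.push(await fetchOne(urls[i]));
  }
  return results;
}
```

Here `fetchOne` can be any function that returns a promise for a single page, for example one that wraps an HTTP request for that URL.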

Repository files:
README.md
index1_cheerio.js
index1_puppeteer.js
index2_cheerio.js
index2_puppeteer.js
index3_cheerio.js
index3_puppeteer.js
package.json
results1_cheerio.json
results1_puppeteer.json
results2_cheerio.json
results2_puppeteer.json
results3_cheerio.json
results3_puppeteer.json

StevenG777/UCM-CSE120-Web-Scraper
