- Technologies: Python
- Description: Developed a Python-based web crawler with a focus on minimizing runtime complexity. Utilized the `os` and `json` modules for efficient data handling and reduced I/O operations (see the sketch below). This project was created by Ethan Li and Bowen Zhang.
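As a rough sketch of the data handling described above, the snippet below caches fetched pages with the `json` and `os` modules so repeat URLs avoid extra network and disk I/O. The cache file name `page_cache.json` and the helper functions are hypothetical and are not part of the project's actual code.

```python
import json
import os
from urllib.request import urlopen

CACHE_FILE = "page_cache.json"  # hypothetical cache file, not part of the project

def load_cache():
    # Reuse previously fetched pages if a cache file already exists on disk.
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            return json.load(f)
    return {}

def fetch(url, cache):
    # Serve repeat requests from the in-memory cache to avoid a second download.
    if url not in cache:
        cache[url] = urlopen(url).read().decode("utf-8")
    return cache[url]

def save_cache(cache):
    # A single json.dump call at the end keeps disk writes to one operation per run.
    with open(CACHE_FILE, "w") as f:
        json.dump(cache, f)
```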
- Open the Terminal:
  - Ensure your command line is in the directory containing the project's Python files.
- Prepare the Configuration:
  - Create a file named `crawler_config.txt` in the same directory (see the sketch after this step). This file should contain the seed URL for the crawler, without quotation marks.
  - Example seed URL: `http://people.scs.carleton.ca/~davidmckenney/fruits2/N-0.html`
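A minimal sketch of reading this configuration, assuming the file holds only the seed URL on its first line; the actual parsing inside `crawler.py` may differ.

```python
def read_seed_url(path="crawler_config.txt"):
    # The config file is expected to hold only the unquoted seed URL.
    with open(path) as f:
        return f.readline().strip()

print("Seed URL:", read_seed_url())
```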
- Execute the Crawler:
  - In the terminal, type `python crawler.py` and press Enter.
  - The crawler will start processing the seed URL, and the output will be saved in `crawler_output.txt` (a quick way to inspect it is sketched below).
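To check that the run produced output, the snippet below looks for `crawler_output.txt` and previews its first lines. The output format itself is not described here, so this is only a sanity check.

```python
import os

if os.path.exists("crawler_output.txt"):
    with open("crawler_output.txt") as f:
        lines = f.readlines()
    print(f"Crawl finished: {len(lines)} lines written.")
    print("".join(lines[:5]))  # preview of the first few lines
else:
    print("crawler_output.txt not found; the crawl may not have run yet.")
```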
- Prepare the Configuration:
  - Ensure you have a file named `search_config.txt` in the directory (a parsing sketch follows this step). This file should contain:
    - The search phrase on the first line.
    - The boost value (`True` or `False`) on the second line.
  - Example configuration: the phrase `apple tomato tomato tomato` on the first line and `True` on the second line.
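A sketch of how the two-line configuration might be parsed, treating the first line as the full search phrase and the second line as the literal string `True` or `False`; the actual logic in `search.py` may differ.

```python
def read_search_config(path="search_config.txt"):
    # First line: search phrase; second line: boost flag written as "True" or "False".
    with open(path) as f:
        phrase = f.readline().strip()
        boost = f.readline().strip() == "True"
    return phrase, boost

phrase, boost = read_search_config()
print("Searching for:", phrase, "| boost =", boost)
```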
- Execute the Search:
  - In the terminal, ensure you are in the directory with `search.py` and `searchdata.py`.
  - Type `python search.py` and press Enter.
  - The search results will be stored in `search_results.json` in the same directory (see the loading sketch below).
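Because the results are written as JSON, they can be loaded back with the standard `json` module. The structure of the file depends on `search.py` and is not assumed here.

```python
import json

with open("search_results.json") as f:
    results = json.load(f)

# Pretty-print whatever structure search.py produced.
print(json.dumps(results, indent=2))
```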