A ProductHunt.com miner in Python3.
Execute the following commands:
$ git clone https://github.com/collab-uniba/PH_miner.git
$ cd PH_miner
$ git submodule init
$ git submodule update
- Register two apps using the Product Hunt API dashboard, PH_miner and PH_updater.
- For the first app, in the root folder, create the file credentials_miner.yml with the following structure:
api:
  key: CLIENT_KEY
  secret: CLIENT_SECRET
  redirect_uri: APP_REDIRECT_URI
  dev_token: DEVELOPER_TOKEN
- For the second app, follow the same steps as above to create the file credentials_updater.yml.
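A missing or misspelled entry in either credentials file is easy to overlook until the miner fails at run time. The following is a minimal sketch of a sanity check, not part of PH_miner itself: the helper name `missing_api_keys` is hypothetical, and it assumes the file has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`, which the project may or may not use).

```python
# Hypothetical helper, not part of PH_miner: checks that a parsed
# credentials file contains every entry listed in the structure above.
REQUIRED_API_KEYS = ("key", "secret", "redirect_uri", "dev_token")


def missing_api_keys(config):
    """Return the required entries that are absent or empty in the `api` section."""
    api_section = (config or {}).get("api") or {}
    return [name for name in REQUIRED_API_KEYS if not api_section.get(name)]


# Assumed shape of credentials_miner.yml after parsing; the values are
# the placeholders from the structure above, not real credentials.
example_config = {
    "api": {
        "key": "CLIENT_KEY",
        "secret": "CLIENT_SECRET",
        "redirect_uri": "APP_REDIRECT_URI",
        "dev_token": "DEVELOPER_TOKEN",
    }
}

print(missing_api_keys(example_config))         # []
print(missing_api_keys({"api": {"key": "x"}}))  # ['secret', 'redirect_uri', 'dev_token']
```

The same check applies unchanged to credentials_updater.yml, since both files share one structure.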
- Create the folder db/cfg/, then create therein the file dbsetup.yml to set up the connection to the MySQL database:
mysql:
  host: 127.0.0.1
  user: root
  passwd: *******
  db: producthunt
  recycle: 3600
NOTE: If you're using a MySQL database, the pool_recycle value above (3600 seconds), which controls how often the
database connection is reset, is fine, since wait_timeout is set to 28800 seconds by default. If you're using MariaDB,
however, wait_timeout is set by default to 600 seconds. Edit the my.cnf file and change wait_timeout to a value
larger than the one chosen for pool_recycle.
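The rule in the note boils down to one comparison: connections must be recycled before the server times them out. A minimal sketch of that check, using the values quoted above (the function name `recycle_is_safe` is hypothetical and not part of PH_miner):

```python
def recycle_is_safe(pool_recycle, wait_timeout):
    """Return True when connections are recycled before the server drops them.

    Both arguments are in seconds. Hypothetical helper for illustration only.
    """
    return 0 < pool_recycle < wait_timeout


# recycle: 3600 against the MySQL default wait_timeout of 28800 seconds
print(recycle_is_safe(3600, 28800))  # True

# recycle: 3600 against the 600 seconds cited above for MariaDB
print(recycle_is_safe(3600, 600))    # False
```

The server's current setting can be read with the SQL statement `SHOW VARIABLES LIKE 'wait_timeout';` and compared against the recycle value in dbsetup.yml.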
- Install packages via pip:
$ pip install -r requirements.txt
- Enable execution via crontab:
$ crontab -e
Add the following lines. Make sure to enter the correct path.
SHELL=bash
# New products are uploaded at 12:01 AM PST (just past midnight, i.e. 9:01 AM in the CET timezone):
# minute hour day-of-month month day-of-week command
35 8 * * * /path/.../to/PH_miner/cronjob.sh >> /var/log/ph_miner.log 2>&1
05 20 * * * /path/.../to/PH_miner/cronjob.sh --update -c credentials_updater.yml >> /var/log/ph_miner_updates.log 2>&1
*/30 * * * * /path/.../to/PH_miner/cronjob.sh --newest -c credentials_updater.yml >> /var/log/ph_miner.log 2>&1
- Enable the rotation of the log files:
$ sudo ln -s /fullpath/to/../ph_miner.logrotate /etc/logrotate.d/ph_miner
- Install Chromium browser and the chromedriver
This step depends on the OS. On Ubuntu boxes, run:
$ sudo apt-get install chromium-browser chromium-chromedriver
$ sudo ln -s /usr/lib/chromium-browser/chromedriver /usr/bin/chromedriver
- Product Hunt API
- ph_py - ProductHunt.com API wrapper in Python
- Scrapy - A scraping and web-crawling framework
- Selenium - A suite of tools for automating web browsers
- ChromeDriver - Tool to connect to Chromium web browser
- Beautiful Soup 4 - HTML parser
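Since Selenium needs to locate the ChromeDriver binary, it can help to confirm that the chromedriver symlink created during installation is reachable on PATH before scheduling the cron jobs. A minimal sketch using only the standard library (the helper name `find_driver` is hypothetical and not part of PH_miner):

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the absolute path of `name` on PATH, or None if it is not found."""
    return shutil.which(name)


path = find_driver()
if path is None:
    print("chromedriver not found on PATH; check the symlink in /usr/bin")
else:
    print(f"chromedriver found at {path}")
```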
The project is licensed under the MIT license.