Concurrent Web Scraping with Selenium Grid and Docker Swarm
https://github.com/coding-to-music/selenium-grid-docker-swarm
Want to learn how to build this project?
Check out the blog post.
https://testdriven.io/blog/concurrent-web-scraping-with-selenium-grid-and-docker-swarm/
Want to use this project?
Fork/Clone
https://github.com/coding-to-music/selenium-grid-docker-swarm
Create and activate a virtual environment
sudo apt-get install python3-virtualenv
virtualenv -p python3 myApp
optionally pass --no-site-packages to isolate the environment from system packages (note: this is the default behavior in recent virtualenv releases, and the flag was removed in virtualenv 20+)
virtualenv --no-site-packages -p python3 myApp
source myApp/bin/activate
$ cd myApp/
$ source bin/activate
(myApp)debian@hostname:~/myApp$
Install the requirements
pip install -r requirements.txt
Sign up for Digital Ocean and generate an access token
Add the token to your environment:
(env)$ export DIGITAL_OCEAN_ACCESS_TOKEN=[your_token]
Spin up four droplets and deploy Docker Swarm:
(env)$ sh project/create.sh
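create.sh is not reproduced here; as a rough sketch of what a script like it typically does (node names, driver flags, and the compose file name below are assumptions for illustration, not copied from the project):

```shell
#!/bin/sh
# Hypothetical provisioning sketch: create four DigitalOcean droplets
# with docker-machine, form a swarm, and deploy the Selenium Grid stack.

for i in 1 2 3 4; do
  docker-machine create \
    --driver digitalocean \
    --digitalocean-access-token "${DIGITAL_OCEAN_ACCESS_TOKEN}" \
    node-$i
done

# Initialize the swarm on node-1 and capture the worker join token.
eval "$(docker-machine env node-1)"
docker swarm init --advertise-addr "$(docker-machine ip node-1)"
TOKEN=$(docker swarm join-token -q worker)

# Join the remaining nodes as workers.
for i in 2 3 4; do
  eval "$(docker-machine env node-$i)"
  docker swarm join --token "$TOKEN" "$(docker-machine ip node-1):2377"
done

# Deploy the stack (the "selenium" stack name matches the
# selenium_hub service referenced later in this README).
eval "$(docker-machine env node-1)"
docker stack deploy --compose-file=docker-compose.yml selenium
```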
Run the scraper:
(env)$ docker-machine env node-1
(env)$ eval $(docker-machine env node-1)
(env)$ NODE=$(docker service ps --format "{{.Node}}" selenium_hub)
(env)$ for i in {1..8}; do {
python project/script.py ${NODE} &
};
done
The node name is passed to the script so it can reach the Selenium hub, and the trailing & launches the eight scraper runs concurrently.
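For the scraping to actually be concurrent, each loop iteration should launch the script in the background (a trailing &), with wait collecting the jobs afterwards. The pattern in isolation, with a harmless placeholder command:

```shell
# Fan-out pattern: launch 8 background jobs, then wait for all to finish.
for i in {1..8}; do {
  echo "job $i started" &
};
done
wait
echo "all jobs finished"
```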
Bring down the resources:
(env)$ sh project/destroy.sh
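destroy.sh likewise is not reproduced here; teardown typically amounts to deleting the droplets via docker-machine (a sketch assuming the node names used above):

```shell
#!/bin/sh
# Hypothetical teardown sketch: force-remove all four droplets.
# This permanently deletes the remote machines (and stops billing for them).
docker-machine rm -f node-1 node-2 node-3 node-4
```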