Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/testdrivenio/selenium-grid-docker-swarm
web scraping in parallel with Selenium Grid and Docker
https://github.com/testdrivenio/selenium-grid-docker-swarm
docker docker-swarm selenium selenium-grid selenium-webdriver webscraping
Last synced: 2 months ago
JSON representation
web scraping in parallel with Selenium Grid and Docker
- Host: GitHub
- URL: https://github.com/testdrivenio/selenium-grid-docker-swarm
- Owner: testdrivenio
- License: mit
- Created: 2018-02-28T21:07:12.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2023-05-23T03:29:40.000Z (over 1 year ago)
- Last Synced: 2024-08-13T03:05:50.098Z (6 months ago)
- Topics: docker, docker-swarm, selenium, selenium-grid, selenium-webdriver, webscraping
- Language: Python
- Homepage:
- Size: 20.5 KB
- Stars: 35
- Watchers: 4
- Forks: 16
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Concurrent Web Scraping with Selenium Grid and Docker Swarm
## Want to learn how to build this project?
Check out the [blog post](https://testdriven.io/concurrent-web-scraping-with-selenium-grid-and-docker-swarm).
## Want to use this project?
1. Fork/Clone
1. Create and activate a virtual environment
1. Install the requirements
1. [Sign up](https://m.do.co/c/d8f211a4b4c2) for Digital Ocean and [generate](https://www.digitalocean.com/community/tutorials/how-to-use-the-digitalocean-api-v2) an access token
1. Add the token to your environment:
```sh
(env)$ export DIGITAL_OCEAN_ACCESS_TOKEN=[your_token]
```1. Spin up four droplets and deploy Docker Swarm:
```sh
(env)$ sh project/create.sh
```1. Run the scraper:
```sh
(env)$ docker-machine env node-1
(env)$ eval $(docker-machine env node-1)
(env)$ NODE=$(docker service ps --format "{{.Node}}" selenium_hub)
(env)$ for i in {1..8}; do {
python project/script.py ${i} $(docker-machine ip $NODE) &
};
done
```1. Bring down the resources:
```sh
(env)$ sh project/destroy.sh
```