Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tempesta-tech/webbot
Website crawler for performance and security tasks
bots python-selenium scraping web-performance web-security
- Host: GitHub
- URL: https://github.com/tempesta-tech/webbot
- Owner: tempesta-tech
- License: gpl-2.0
- Created: 2023-10-01T10:55:31.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-11-30T20:54:22.000Z (12 months ago)
- Last Synced: 2024-03-26T06:10:54.129Z (8 months ago)
- Topics: bots, python-selenium, scraping, web-performance, web-security
- Language: Python
- Homepage: https://tempesta-tech.com/network-security-performance-analysis/#networksecurity
- Size: 13.7 KB
- Stars: 1
- Watchers: 6
- Forks: 1
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# webbot
Website crawler for performance and security tasks:
* reveal dead links on a target website
* warm web accelerator's cache
* emulate bot behaviour (e.g. scrapers) to test bot mitigation software

## Installation
Prerequisites for Ubuntu 22:
```sh
sudo apt update
sudo apt install -y unzip xvfb libxi6 libgconf-2-4
sudo apt install -y default-jdk
curl -sS https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo bash -c "echo 'deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main' >> /etc/apt/sources.list.d/google-chrome.list"
sudo apt -y update
sudo apt -y install google-chrome-stable
```

[Download](https://chromedriver.storage.googleapis.com/index.html) the latest ChromeDriver
and install it:
```sh
unzip chromedriver_linux64.zip
sudo mv chromedriver /usr/bin/chromedriver
sudo chown root:root /usr/bin/chromedriver
sudo chmod +x /usr/bin/chromedriver
```

Install the [Selenium Python bindings](https://pypi.org/project/selenium/) with:
```sh
pip install selenium
```

You might experience the following exception:
```
selenium.common.exceptions.SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 114
Current browser version is 117.0.5938.132 with binary path /usr/bin/google-chrome
```
In this case, install a Chrome or Chromium build whose major version matches the one
ChromeDriver supports (114 in the example above). [Here](https://chromium.cypress.io/) is a
list of older browser binaries.

You can then run `wbot` with the custom browser binary like this:
```sh
./wbot.py --chrome_bin /opt/google/chromium-114/chrome
```

## References
* [Selenium documentation](https://www.selenium.dev/documentation/), including Python API
### Related crawlers that reveal dead links
* https://github.com/EndlessTrax/pyanchor
* https://github.com/stevenvachon/broken-link-checker
* https://github.com/untitaker/hyperlink
* https://github.com/w3c/node-linkchecker
* https://github.com/bem-site/broken-links-checker
* https://github.com/emmanuelroecker/php-linkchecker
* https://github.com/deptagency/octopus
* https://pypi.org/project/LinkChecker/
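The crawlers listed above, like webbot's first use case, share the same basic loop: extract the `<a href>` targets of a page, resolve them to absolute URLs, and flag the ones that fail to load. Below is a minimal stdlib-only sketch of that loop. It assumes static HTML; unlike webbot, which drives Chrome through Selenium, it does not execute JavaScript. `extract_links`, `http_status` and `find_dead_links` are hypothetical names for illustration, not part of webbot.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.error import HTTPError, URLError
from urllib.parse import urldefrag, urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect unique absolute <a href> targets, resolved against base_url."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip empty hrefs, in-page anchors and non-HTTP schemes.
        if not href or href.startswith(("#", "mailto:", "tel:", "javascript:")):
            return
        # Resolve relative links and drop the #fragment part.
        url, _fragment = urldefrag(urljoin(self.base_url, href))
        if url not in self.links:
            self.links.append(url)


def extract_links(html: str, base_url: str) -> List[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if it is unreachable."""
    for method in ("HEAD", "GET"):
        try:
            with urlopen(Request(url, method=method), timeout=timeout) as resp:
                return resp.status
        except HTTPError as err:
            if err.code == 405 and method == "HEAD":
                continue  # server rejects HEAD; retry the same URL with GET
            return err.code
        except (URLError, OSError):
            return None  # DNS failure, refused connection, timeout, ...
    return None


def find_dead_links(
    html: str,
    base_url: str,
    status: Callable[[str], Optional[int]] = http_status,
) -> Dict[str, Optional[int]]:
    """Map each broken link to its status code (None means unreachable)."""
    dead: Dict[str, Optional[int]] = {}
    for url in extract_links(html, base_url):
        code = status(url)
        if code is None or code >= 400:
            dead[url] = code
    return dead
```

Passing a different `status` callable (for example a dict's `get`) keeps the parsing and classification logic testable without network access. A real crawler would also recurse into same-site links and rate-limit its requests, which this sketch leaves out.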