Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/healeycodes/Broken-Link-Crawler
:robot: Python bot that crawls your website looking for dead stuff
- Host: GitHub
- URL: https://github.com/healeycodes/Broken-Link-Crawler
- Owner: healeycodes
- License: MIT
- Created: 2019-03-31T08:03:53.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-09-29T19:55:48.000Z (about 2 years ago)
- Last Synced: 2024-04-30T10:27:11.158Z (8 months ago)
- Topics: bot, crawler, python
- Language: Python
- Homepage: https://healeycodes.com/python/beginners/tutorial/webdev/2019/04/02/dead-link-bot.html
- Size: 148 KB
- Stars: 42
- Watchers: 3
- Forks: 14
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
README
This was built for my **[tutorial](https://healeycodes.com/python/beginners/tutorial/webdev/2019/04/02/dead-link-bot.html)** on writing a dead link checker, so its scope has been kept quite small.
# Broken Link Crawler
![Desktop](https://github.com/healeycodes/Broken-Link-Crawler/blob/master/bot-in-action.gif)
Let's say I have a website and I want to find any dead links and images on this website.
```bash
$ python deadseeker.py 'https://healeycodes.com/'
> 404 - https://docs.python.org/3/library/missing.html
> 404 - https://github.com/microsoft/solitare2
```

The website is crawled, and a request is sent to every URL found in `href` and `src` attributes. Any errors are reported. This bot doesn't observe `robots.txt`, but _you should_.
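The approach can be sketched roughly like this. This is a hypothetical minimal version for illustration, not the repo's actual code; it uses only the standard library and checks a single page rather than crawling the whole site:

```python
# Minimal sketch: fetch one page, collect href/src URLs, report errors.
import sys
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the values of href and src attributes from a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def check_links(page_url):
    html = urllib.request.urlopen(page_url).read().decode("utf-8", "replace")
    parser = LinkParser()
    parser.feed(html)
    for link in parser.links:
        url = urljoin(page_url, link)  # resolve relative links
        try:
            req = urllib.request.Request(url, method="HEAD")
            urllib.request.urlopen(req)
        except urllib.error.HTTPError as err:
            print(f"> {err.code} - {url}")


if __name__ == "__main__" and len(sys.argv) > 1:
    check_links(sys.argv[1])
```

A real crawler would also queue same-domain pages it discovers, deduplicate visited URLs, and rate-limit its requests.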
### It is not a clever bot. But it is a good bot.
Accepting (small) PRs and issues!
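Since the bot skips `robots.txt`, here is a quick sketch of how you could honor it yourself using the standard library's `urllib.robotparser`. The rule set and the `"deadseeker"` user-agent name below are hypothetical examples:

```python
# Sketch: checking robots.txt rules with urllib.robotparser.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Normally you would fetch the real file:
#   rp.set_url("https://example.com/robots.txt"); rp.read()
# Here we parse an inline example rule set instead.
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("deadseeker", "https://example.com/"))           # True
print(rp.can_fetch("deadseeker", "https://example.com/private/x"))  # False
```

Calling `can_fetch` before each request keeps a crawler polite with only a few extra lines.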