https://github.com/healeycodes/Broken-Link-Crawler

:robot: Python bot that crawls your website looking for dead stuff

bot crawler python

This project accompanies my **[tutorial](https://healeycodes.com/python/beginners/tutorial/webdev/2019/04/02/dead-link-bot.html)** on building a dead link checker, so its scope has been kept quite small.

# Broken Link Crawler

![Desktop](https://github.com/healeycodes/Broken-Link-Crawler/blob/master/bot-in-action.gif)

Let's say I have a website and I want to find any dead links and images on it.

```bash
$ python deadseeker.py 'https://healeycodes.com/'
> 404 - https://docs.python.org/3/library/missing.html
> 404 - https://github.com/microsoft/solitare2
```

The website is crawled, and a request is sent to every URL found in an `href` or `src` attribute. Any errors are reported. This bot doesn't observe `robots.txt`, but _you should_.
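
The idea can be sketched for a single page using only the standard library. This is a rough illustration, not necessarily how `deadseeker.py` does it: the names `LinkExtractor`, `check`, and `report` are hypothetical, the sketch does not follow links recursively, and it skips anything that isn't `http` or `https` (an assumption made here so that `mailto:` links aren't flagged).

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) URLs from every href and src attribute."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative paths against the page URL.
                absolute = urljoin(self.base_url, value)
                if urlparse(absolute).scheme in ("http", "https"):
                    self.links.append(absolute)


def check(url, timeout=10):
    """Return an error label for a dead URL, or None if it responded."""
    # The User-Agent value is a placeholder chosen for this sketch.
    request = Request(url, headers={"User-Agent": "dead-link-sketch"})
    try:
        with urlopen(request, timeout=timeout):
            return None
    except HTTPError as error:  # must come before URLError (its parent class)
        return str(error.code)
    except URLError as error:
        return str(error.reason)


def report(page_url, html, check_fn=check):
    """Return one '> error - url' line per link on the page that failed."""
    extractor = LinkExtractor(page_url)
    extractor.feed(html)
    lines = []
    for link in extractor.links:
        error = check_fn(link)
        if error is not None:
            lines.append(f"> {error} - {link}")
    return lines
```

Passing the checker in as `check_fn` keeps the parsing logic testable without touching the network. If you want to be a polite bot, `urllib.robotparser.RobotFileParser` from the standard library can tell you whether a URL may be fetched before you request it.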

### It is not a clever bot. But it is a good bot.


Accepting (small) PRs and issues!