https://github.com/danhje/dead-link-crawler
An efficient, asynchronous crawler that identifies broken links on a given domain.
async broken-links crawler dead-links python python3
Last synced: 12 months ago
- Host: GitHub
- URL: https://github.com/danhje/dead-link-crawler
- Owner: danhje
- License: mit
- Created: 2018-12-20T18:27:07.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2025-01-03T11:17:40.000Z (over 1 year ago)
- Last Synced: 2025-06-10T05:07:11.271Z (about 1 year ago)
- Topics: async, broken-links, crawler, dead-links, python, python3
- Language: Python
- Homepage:
- Size: 133 KB
- Stars: 14
- Watchers: 3
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
- awesome-starred - danhje/dead-link-crawler - An efficient, asynchronous crawler that identifies broken links on a given domain. (python3)
README
# Dead Link Crawler
An efficient, asynchronous crawler that identifies broken links on a given domain.
## Installation
```shell
git clone https://github.com/danhje/dead-link-crawler.git
cd dead-link-crawler
pipenv install
```
## Usage
To start Python from within the virtual environment:
```shell
pipenv run python
```
To start a crawl, print the results, and keep them for further processing:
```python
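from deadLinkCrawler import DeadLinkCrawler

crawler = DeadLinkCrawler()
# Crawl starting from the given URL; verbose=True enables verbose output.
crawler.startCrawl('http://danielhjertholm.me/prosjekter.htm', verbose=True)
# Print the dead links that were found.
crawler.printDeadLinks()
# The results are also available as attributes for further processing.
checkedLinks = crawler.checkedLinks
deadLinks = list(crawler.deadLinks)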
```
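For readers curious about what an asynchronous dead-link check can look like, here is a minimal sketch using only the standard library. It is not this project's implementation: `LinkExtractor`, `find_dead_links`, the `fetch` parameter and `max_concurrency` are hypothetical names chosen for illustration. `fetch` is assumed to be a coroutine that takes a URL and returns a `(status_code, html_text)` tuple; a real crawler would back it with an async HTTP client.

```python
import asyncio
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


async def find_dead_links(start_url, fetch, max_concurrency=10):
    """Crawl pages on start_url's domain and return the set of dead links.

    `fetch` is a hypothetical coroutine: fetch(url) -> (status_code, html_text).
    """
    domain = urlparse(start_url).netloc
    semaphore = asyncio.Semaphore(max_concurrency)  # cap simultaneous requests
    checked, dead = set(), set()

    async def check(url):
        async with semaphore:
            status, html = await fetch(url)
        if status >= 400:
            dead.add(url)
            return []
        if urlparse(url).netloc != domain:
            return []  # external links are checked but not crawled further
        parser = LinkExtractor()
        parser.feed(html)
        # Resolve relative links and keep only http(s) URLs.
        resolved = (urljoin(url, href) for href in parser.links)
        return [u for u in resolved if urlparse(u).scheme in ("http", "https")]

    frontier = {start_url}
    while frontier:
        checked |= frontier
        # Check the whole frontier concurrently, then collect unseen links.
        results = await asyncio.gather(*(check(url) for url in frontier))
        frontier = {link for links in results for link in links} - checked
    return dead
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library and lets it be exercised with a stub that serves canned pages.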