Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/danhje/dead-link-crawler
An efficient, asynchronous crawler that identifies broken links on a given domain.
- Host: GitHub
- URL: https://github.com/danhje/dead-link-crawler
- Owner: danhje
- License: mit
- Created: 2018-12-20T18:27:07.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-10-17T23:03:31.000Z (about 1 year ago)
- Last Synced: 2024-08-02T12:44:48.060Z (3 months ago)
- Topics: async, broken-links, crawler, dead-links, python, python3
- Language: Python
- Homepage:
- Size: 218 KB
- Stars: 14
- Watchers: 4
- Forks: 1
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
- project-awesome - danhje/dead-link-crawler - An efficient, asynchronous crawler that identifies broken links on a given domain. (Python)
- awesome-starred - danhje/dead-link-crawler - An efficient, asynchronous crawler that identifies broken links on a given domain. (python3)
README
# Dead Link Crawler
An efficient, asynchronous crawler that identifies broken links on a given domain.

## Installation
```shell
git clone https://github.com/danhje/dead-link-crawler.git
cd dead-link-crawler
pipenv install
```

## Usage
To start Python from within the virtual environment:
```shell
pipenv run python
```
To start the crawl and print the results:
```python
from deadLinkCrawler import DeadLinkCrawler

crawler = DeadLinkCrawler()
crawler.startCrawl('http://danielhjertholm.me/prosjekter.htm', verbose=True)
crawler.printDeadLinks()
checkedLinks = crawler.checkedLinks
deadLinks = list(crawler.deadLinks)
```
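The "efficient, asynchronous" design the README describes can be illustrated with a minimal sketch: many link checks issued concurrently with `asyncio.gather`, collecting the URLs that return an error status. This is an illustrative sketch only, not the project's implementation; `check_url`, `crawl`, and the stubbed status table are hypothetical names, and a real crawler would perform actual HTTP requests.

```python
import asyncio

# Stubbed HTTP responses so the sketch is self-contained and deterministic.
# A real crawler would fetch each URL over the network instead.
KNOWN_STATUS = {
    "http://example.com/ok": 200,
    "http://example.com/gone": 404,
    "http://example.com/moved": 301,
}

async def check_url(url: str) -> tuple[str, int]:
    """Stub for an HTTP request; returns (url, status code)."""
    await asyncio.sleep(0)  # yield control, as a real request would while waiting
    return url, KNOWN_STATUS.get(url, 404)

async def crawl(urls):
    # Issue all checks concurrently; gather preserves input order.
    results = await asyncio.gather(*(check_url(u) for u in urls))
    # Treat 4xx/5xx responses as dead links.
    return [url for url, status in results if status >= 400]

dead = asyncio.run(crawl(list(KNOWN_STATUS)))
print(dead)  # → ['http://example.com/gone']
```

Because the checks run concurrently rather than one after another, total crawl time is bounded by the slowest request in each batch instead of the sum of all request times.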