# LinkCrawler
Simple C# console application that crawls a given webpage for broken image tags and hyperlinks. The results are written to one or more outputs; currently supported outputs are console, CSV, and Slack.
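To illustrate the core idea, here is a minimal, hypothetical sketch of checking a single URL with ```HttpClient``` (this is not the project's actual code, and the URL is made up):

```csharp
// Hypothetical sketch: request a URL and classify it by HTTP status.
// This mirrors the idea of the crawler's link check, not its real implementation.
using System;
using System.Net.Http;
using System.Threading.Tasks;

class LinkCheckSketch
{
    static async Task Main()
    {
        var url = "https://example.com/"; // made-up URL to check
        using (var client = new HttpClient())
        {
            var response = await client.GetAsync(url);
            Console.WriteLine(response.IsSuccessStatusCode
                ? string.Format("OK   {0} {1}", (int)response.StatusCode, url)
                : string.Format("DEAD {0} {1}", (int)response.StatusCode, url));
        }
    }
}
```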

## Why?
Because it can be useful to know when a webpage you are responsible for displays broken links to its users. I have this running continuously, but you don't have to. For instance, after upgrading your CMS, changing the database schema, migrating content etc., it can be relevant to know whether this introduced broken links. Just run this tool once and you will know exactly how many links are broken, where they link to, and where they are located.

## Build
Clone repo :point_right: open solution in Visual Studio :point_right: build :facepunch:

AppVeyor is used for CI, so when code is pushed to this repo the solution is built and all tests are run.

| Branch | Build status |
| :----- | :---------------------------------------|
| develop | [![Build status](https://ci.appveyor.com/api/projects/status/syw3l7xeicy7xc0b/branch/develop?svg=true)](https://ci.appveyor.com/project/hmol/linkcrawler/branch/develop) |
| master | [![Build status](https://ci.appveyor.com/api/projects/status/syw3l7xeicy7xc0b/branch/master?svg=true)](https://ci.appveyor.com/project/hmol/linkcrawler/branch/master) |

## AppSettings

| Key | Usage |
| :-------------------------- | :---------------------------------------|
| ```BaseUrl``` | Base URL for the site to crawl |
| ```SuccessHttpStatusCodes``` | HTTP status codes that are considered "successful". Example: "1xx,2xx,302,303" |
| ```CheckImages``` | If true, ```<img>``` tags are checked in addition to hyperlinks |

The configuration also contains a section that controls which of the outputs (console, CSV, Slack) should be used.
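As an illustration, in a classic .NET console app these keys would live in the ```appSettings``` section of App.config. A minimal sketch, with key names taken from the table above and invented example values:

```xml
<configuration>
  <appSettings>
    <!-- Sketch only: values are examples, not defaults -->
    <add key="BaseUrl" value="https://example.com" />
    <add key="SuccessHttpStatusCodes" value="1xx,2xx,302,303" />
    <add key="CheckImages" value="true" />
  </appSettings>
</configuration>
```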

## Output to file
Running ```LinkCrawler.exe >> crawl.log``` will append the output to a file.
![Output to file](http://henrikm.com/content/images/2016/Feb/as-file.png "Output to file")

## Output to slack
If configured, the specified Slack webhook will be notified about broken links.
![Slack](http://henrikm.com/content/images/2016/Feb/blurred1.jpg "Slack")
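For reference, a Slack incoming webhook accepts an HTTP POST with a JSON payload containing a ```text``` field (see the Slack docs linked at the bottom of this README). A hedged C# sketch of such a notification, not LinkCrawler's actual code:

```csharp
// Hypothetical sketch: post a message to a Slack incoming webhook.
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class SlackNotifySketch
{
    // webhookUrl is the URL Slack generates when you create the webhook.
    static async Task NotifyAsync(string webhookUrl, string message)
    {
        using (var client = new HttpClient())
        {
            // Incoming webhooks expect JSON like {"text": "..."}.
            // A real implementation should JSON-escape the message.
            var payload = "{\"text\": \"" + message + "\"}";
            var content = new StringContent(payload, Encoding.UTF8, "application/json");
            await client.PostAsync(webhookUrl, content);
        }
    }
}
```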

## How I use it
I have it running as a WebJob in Azure, scheduled to run every 4 days. It notifies the Slack channel where the editors of the website dwell.

Creating a WebJob is simple: put your compiled project files (/bin/) inside a .zip and upload it. A sketch of the packaging step follows the screenshot below.
![WebJob](http://henrikm.com/content/images/2016/Feb/azure-webjob-setup-1.PNG "WebJob")
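For example, on a machine with PowerShell 5+ the zip can be created like this (the build output path is an assumption about your local setup):

```powershell
# Sketch: zip the compiled output for upload as a WebJob (path is an assumption)
Compress-Archive -Path .\LinkCrawler\bin\Release\* -DestinationPath LinkCrawler.zip
```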

Schedule it.

![Schedule](http://henrikm.com/content/images/2016/Feb/azure-scheduele.PNG)

The output of a WebJob is available afterwards, because Azure saves it in log files.
![WebJob log](http://henrikm.com/content/images/2016/Feb/azure-log.PNG)

Read more about Azure WebJobs: https://azure.microsoft.com/en-us/documentation/articles/web-sites-create-web-jobs/

Read more about Slack incoming webhooks: https://api.slack.com/incoming-webhooks