# Crawl errors [![Build Status](https://secure.travis-ci.org/ekampp/crawl_errors.png)](http://travis-ci.org/ekampp/crawl_errors)

This is a simple Ruby application that crawls through a website on a domain, reporting the links it visits along the way.

You can use it to find errors (such as 404 and 500 responses) on your site.
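
Conceptually, the crawl works roughly like this. The following is a minimal sketch using only Ruby's standard library, not the project's actual code:

    require "net/http"
    require "uri"

    # Sketch of the crawling idea: fetch each page, print its HTTP status,
    # and queue same-host links found in the body.
    def crawl(start_url)
      queue   = [URI(start_url)]
      visited = {}

      until queue.empty?
        uri = queue.shift
        next if visited[uri.to_s]
        visited[uri.to_s] = true

        response = Net::HTTP.get_response(uri)
        puts "#{response.code} #{uri}"
        next unless response.is_a?(Net::HTTPSuccess)

        # Naive link extraction; a real crawler would use an HTML parser.
        response.body.scan(/href="([^"]+)"/).flatten.each do |href|
          link = URI.join(uri, href) rescue next
          queue << link if link.host == uri.host
        end
      end
    end

    crawl(ARGV[0]) if ARGV[0]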

## Setup

Setting up is as simple as cloning the project and installing its dependencies:

    git clone git://github.com/BenjaminMedia/crawl_errors.git
    cd crawl_errors
    bundle install

Then you can move on to the usage section.

## Usage

You run it with this command:

    ./crawl_errors.rb http://example.com/

You can format the domain as you like, adding a port (`http://example.com:8080`) or a subdomain (`http://something.example.com`), but you must include the protocol (`http://`).
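
For example, both of these are valid invocations:

    ./crawl_errors.rb http://example.com:8080
    ./crawl_errors.rb http://something.example.com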

If you want the crawler to report only actual errors (not `200 OK` responses), pass the `--report-errors-only` flag when you run the script, like this:

    ./crawl_errors.rb http://example.com --report-errors-only

If you need to log the crawl's error output to the `log.txt` file for later use, pass the `--log-errors` flag:

    ./crawl_errors.rb http://example.com --log-errors

## Limitations

For now it only performs GET requests, and it doesn't respect `rel="nofollow"` annotations. This is something that could be expanded on in later versions; see the sketch below.
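
For instance, a future version could skip nofollow links during link extraction. This is a hypothetical sketch using the Nokogiri gem (not a dependency of this project):

    require "nokogiri"

    # Hypothetical helper: extract only followable links from an HTML page,
    # skipping anchors annotated with rel="nofollow".
    def followable_links(html)
      doc = Nokogiri::HTML(html)
      doc.css("a[href]")
         .reject { |a| a["rel"].to_s.split.include?("nofollow") }
         .map { |a| a["href"] }
    end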