{"id":22606688,"url":"https://github.com/benjaminmedia/crawl_errors","last_synced_at":"2025-10-03T13:31:56.550Z","repository":{"id":2314055,"uuid":"3273873","full_name":"BenjaminMedia/crawl_errors","owner":"BenjaminMedia","description":"A simple HTTP crawler that reports 500 and 404 errors for a domain.","archived":false,"fork":false,"pushed_at":"2022-10-19T08:41:35.000Z","size":12,"stargazers_count":1,"open_issues_count":2,"forks_count":2,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-26T14:54:34.694Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BenjaminMedia.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-01-26T13:16:46.000Z","updated_at":"2014-03-26T10:38:06.000Z","dependencies_parsed_at":"2023-01-11T16:09:17.342Z","dependency_job_id":null,"html_url":"https://github.com/BenjaminMedia/crawl_errors","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenjaminMedia%2Fcrawl_errors","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenjaminMedia%2Fcrawl_errors/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenjaminMedia%2Fcrawl_errors/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenjaminMedia%2Fcrawl_errors/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BenjaminMedia","download_url":"https://codeload.github.com/BenjaminMedia/crawl_errors/tar.gz/refs/heads/master","host":
{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246106689,"owners_count":20724401,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-08T14:14:36.865Z","updated_at":"2025-10-03T13:31:56.457Z","avatar_url":"https://github.com/BenjaminMedia.png","language":"Ruby","readme":"# Crawl errors [![Build Status](https://secure.travis-ci.org/ekampp/crawl_errors.png)](http://travis-ci.org/ekampp/crawl_errors)\n\n\nThis is a simple Ruby command-line application that crawls a website on a given domain, reporting the links it visits along the way.\n\nYou can use it to find errors (such as 404s and 500s) on your site.\n\n## Setup\n\nSetting up is as simple as cloning the project and installing its dependencies:\n\n    git clone git://github.com/BenjaminMedia/crawl_errors.git\n    bundle install\n\nThen continue to the Usage section.\n\n## Usage\n\nRun it with this command:\n\n    ./crawl_errors.rb http://example.com/\n\nYou can format the domain as you like, adding a port (`http://example.com:8080`) or a subdomain (`http://something.example.com`), but you must include the protocol (`http://`).\n\nIf you want the crawler to report only actual errors (omitting 200 OK responses), pass the `--report-errors-only` flag when you run the script:\n\n    ./crawl_errors.rb http://example.com --report-errors-only\n\nIf you need to log the crawl's error output to the `log.txt` file for later use, pass the `--log-errors` flag:\n\n    ./crawl_errors.rb http://example.com --log-errors\n\n## Limitations\n\nFor now the crawler only performs GET requests, and it does not honor rel-nofollow rules. Both could be addressed in later versions.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenjaminmedia%2Fcrawl_errors","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbenjaminmedia%2Fcrawl_errors","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenjaminmedia%2Fcrawl_errors/lists"}