{"id":15486518,"url":"https://github.com/dblock/ruby-link-checker","last_synced_at":"2025-06-18T18:05:24.373Z","repository":{"id":149992782,"uuid":"620974105","full_name":"dblock/ruby-link-checker","owner":"dblock","description":"Fast ruby link checker.","archived":false,"fork":false,"pushed_at":"2023-06-13T19:48:54.000Z","size":226,"stargazers_count":2,"open_issues_count":5,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-05T05:38:53.276Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dblock.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-29T18:33:53.000Z","updated_at":"2024-09-25T03:00:02.000Z","dependencies_parsed_at":"2024-10-08T23:10:39.138Z","dependency_job_id":"f0f40ec6-7a4c-4157-83d3-6ffa1f625762","html_url":"https://github.com/dblock/ruby-link-checker","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/dblock/ruby-link-checker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dblock%2Fruby-link-checker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dblock%2Fruby-link-checker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dblock%2Fruby-link-checker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dblock%2Fruby-link-checker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/owners/dblock","download_url":"https://codeload.github.com/dblock/ruby-link-checker/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dblock%2Fruby-link-checker/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260606473,"owners_count":23035350,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-02T06:08:44.015Z","updated_at":"2025-06-18T18:05:19.363Z","avatar_url":"https://github.com/dblock.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"Fast Ruby Link Checker\n======================\n\n[![Gem Version](http://img.shields.io/gem/v/ruby-link-checker.svg)](http://badge.fury.io/rb/ruby-link-checker)\n[![Build Status](https://github.com/dblock/ruby-link-checker/workflows/test/badge.svg?branch=main)](https://github.com/dblock/ruby-link-checker/actions)\n[![Code Climate](https://codeclimate.com/github/dblock/ruby-link-checker.svg)](https://codeclimate.com/github/dblock/ruby-link-checker)\n[![Test Coverage](https://api.codeclimate.com/v1/badges/164f1e23fc706b6efa63/test_coverage)](https://codeclimate.com/github/dblock/ruby-link-checker/test_coverage)\n\nA fast Ruby link checker with support for multiple HTTP libraries. Does not parse documents, just checks links. Fast. 
Anecdotal benchmarking on an M1 Mac and a T1 Internet connection yields ~50 URLs per second with `LinkChecker::Typhoeus::Hydra`.\n\n## Table of Contents\n\n- [Usage](#usage)\n  - [Dependencies](#dependencies)\n  - [Basic Usage](#basic-usage)\n  - [Passing Options](#passing-options)\n  - [Checkers](#checkers)\n    - [LinkChecker::Typhoeus::Hydra](#linkcheckertyphoeushydra)\n    - [LinkChecker::Net::HTTP](#linkcheckernethttp)\n  - [Options](#options)\n    - [Retries](#retries)\n    - [Results](#results)\n    - [Methods](#methods)\n    - [Logger](#logger)\n    - [User-Agent](#user-agent)\n  - [Global Configuration](#global-configuration)\n  - [Callbacks and Events](#callbacks-and-events)\n- [Contributing](#contributing)\n- [Copyright and License](#copyright-and-license)\n\n## Usage\n\n### Dependencies\n\nThe [`LinkChecker::Typhoeus::Hydra`](lib/ruby-link-checker/typhoeus/hydra/checker.rb) link checker is recommended. \n\nAdd `typhoeus` and `ruby-link-checker` to your `Gemfile` and run `bundle install`.\n\n```ruby\ngem 'typhoeus'\ngem 'ruby-link-checker'\n```\n\n### Basic Usage\n\n```ruby\nrequire 'typhoeus'\nrequire 'ruby-link-checker'\n\n# create a new checker instance\nchecker = LinkChecker::Typhoeus::Hydra::Checker.new\n\n# queue URLs to check\nlinks = [...]\nlinks.each do |url|\n  checker.check url\nend\n\n# run the checks\nchecker.run\n\n# display buckets of results\nchecker.results.each_pair do |bucket, results|\n  puts \"#{bucket}: #{results.size}\"\nend\n```\n\n### Passing Options\n\nYou can pipe custom options through `check` and retrieve them in events as follows.\n\n```ruby\nchecker.check 'https://www.example.org', { location: 'page.html' }\n\nchecker.on :success do |result|\n  result.options # contains { location: 'page.html' }\nend\n```\n\n### Checkers\n\n#### [LinkChecker::Typhoeus::Hydra](lib/ruby-link-checker/typhoeus/hydra/checker.rb)\n\nFast link checker that uses [Typhoeus](https://typhoeus.github.io/). 
\n\n```ruby\nrequire 'typhoeus'\nrequire 'ruby-link-checker'\n\n# create a new instance of a checker\nchecker = LinkChecker::Typhoeus::Hydra::Checker.new(\n  hydra: {\n    # lower than the Typhoeus default of 200, seems to start breaking around 50+\n    max_concurrency: 25\n  }\n)\n\n# log requests and response codes\nchecker.logger.level = Logger::INFO\n\nlinks = [...] # array of URLs\nlinks.each do |url|\n  checker.check url\nend\n\n# examine failures and errors as they come\nchecker.on :error, :failure do |result|\n  puts \"FAIL: #{result}\"\nend    \n\n# execute Hydra#run, will block until all requests have completed\nchecker.run\n\n# examine results\nchecker.results.each_pair do |bucket, results|\n  puts \"#{bucket}: #{results.size}\"\nend\n```\n\nYou can pass `Typhoeus` timeout options into a new instance of a checker, or configure timeouts globally.\n\n```ruby\nLinkChecker::Typhoeus::Hydra.configure do |config|\n  config.timeout = 5\n  config.connecttimeout = 10\nend\n```\n\n#### [LinkChecker::Net::HTTP](lib/ruby-link-checker/net/http/checker.rb)\n\nSlow, sequential checker.\n\n```ruby\nrequire 'net/http'\nrequire 'ruby-link-checker'\n\n# create a new instance of a checker\nchecker = LinkChecker::Net::HTTP::Checker.new\n\n# log requests and response codes\nchecker.logger.level = Logger::INFO\n\nlinks = [...] # array of URLs\nlinks.each do |url|\n  checker.check url\nend\n\n# examine results\nchecker.results.each_pair do |bucket, results|\n  puts \"#{bucket}: #{results.size}\"\nend\n```\n\nYou can pass `Net::HTTP` timeout options into a new instance of a checker, or configure timeouts globally.\n\n```ruby\nLinkChecker::Net::HTTP.configure do |config|\n  config.read_timeout = 5\n  config.open_timeout = 10\nend\n```\n\n### Options\n\n#### Retries\n\nBy default link checkers do not retry. 
You can set the number of times to retry all errors and failures with `retries`.\n\n```ruby\nchecker = LinkChecker::Net::HTTP::Checker.new(retries: 1)\n```\n\n#### Results\n\nBy default checkers collect results. \n\n```ruby\nchecker = LinkChecker::Net::HTTP::Checker.new\n...\nchecker.run\n\nchecker.results # =\u003e { error: [...], failure: [...], success: [...] }\n```\n\nYou can disable this with `results: false`.\n\n```ruby\nchecker = LinkChecker::Net::HTTP::Checker.new(results: false)\n...\nchecker.run\n\nchecker.results # =\u003e nil\n```\n\n#### Methods\n\nBy default checkers try a `HEAD` request, followed by a `GET` if `HEAD` fails. You can change this behavior by specifying other methods.\n\nThe following example disables `GET` and only makes `HEAD` requests.\n\n```ruby\nchecker = LinkChecker::Net::HTTP::Checker.new(methods: %w[HEAD])\n```\n\n#### Logger\n\nPass your own logger.\n\n```ruby\nchecker = LinkChecker::Net::HTTP::Checker.new(logger: Logger.new(STDOUT))\n```\n\n#### User-Agent\n\nPass your own user-agent. Default is `Ruby Link Checker/x.y.z`.\n\n```ruby\nchecker = LinkChecker::Net::HTTP::Checker.new(user_agent: 'Custom Agent/1.0')\n```\n\n### Global Configuration\n\nAll options can also be configured globally.\n\n```ruby\nLinkChecker.configure do |config|\n  config.user_agent = 'Custom Agent/1.0'\n  config.methods = ['HEAD', 'GET']\n  config.logger = ::Logger.new(STDOUT)\nend\n```\n\n### Callbacks and Events\n\nEvents enable processing of results as they become available.\n\n```ruby\nchecker.on :result do |result|\n  puts result # any result\nend\n\nchecker.on :error, :failure do |result|\n  puts result # error or failure\nend\n```\n\nCheckers support the following events.\n\n| Event    | Description                                                    |\n|----------|----------------------------------------------------------------|\n| :retry   | A request is being retried on failure or error.                
|\n| :result  | A new result, any of success, failure, or error.               |\n| :success | A valid URL, usually a 2xx response from the server.           |\n| :failure | A failed URL, usually a 4xx or a 5xx response from the server. |\n| :error   | An error, such as an invalid URL or a network timeout.         |\n\nEvents are called with results, which contain the following properties.\n\n| Property          | Description                                                     |\n|-------------------|-----------------------------------------------------------------|\n| :url              | The original URL before redirects.                              |\n| :result_url       | The last URL, different from `url` in case of redirects.        |\n| :method           | The result HTTP method.                                         |\n| :code             | HTTP status code.                                               |\n| :request_headers  | Request headers.                                                |\n| :redirect_to      | A redirect URL in case of redirects.                            |\n| :error            | A raised error in case of errors.                               |\n\nSee [result.rb](lib/ruby-link-checker/result.rb) for more details.\n\n## Contributing\n\nYou're encouraged to contribute to ruby-link-checker. See [CONTRIBUTING](CONTRIBUTING.md) for details.\n\n## Copyright and License\n\nCopyright (c) Daniel Doubrovkine and [Contributors](CHANGELOG.md).\n\nThis project is licensed under the [MIT License](LICENSE.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdblock%2Fruby-link-checker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdblock%2Fruby-link-checker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdblock%2Fruby-link-checker/lists"}