{"id":15563063,"url":"https://github.com/dkam/probot","last_synced_at":"2026-02-04T15:02:13.711Z","repository":{"id":193780720,"uuid":"689485221","full_name":"dkam/probot","owner":"dkam","description":"A Ruby robots.txt parser.","archived":false,"fork":false,"pushed_at":"2024-12-23T23:55:10.000Z","size":24,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-09T06:36:35.254Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dkam.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-10T00:28:58.000Z","updated_at":"2024-12-23T23:55:13.000Z","dependencies_parsed_at":"2024-12-09T03:40:45.647Z","dependency_job_id":"6767e447-3b00-4900-8332-f2b7b74b2400","html_url":"https://github.com/dkam/probot","commit_stats":{"total_commits":7,"total_committers":1,"mean_commits":7.0,"dds":0.0,"last_synced_commit":"ad48a4e3351371741567f5f836aad3be9479f4af"},"previous_names":["dkam/probot"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dkam%2Fprobot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dkam%2Fprobot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dkam%2Fprobot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dkam%2Fprobot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/owners/dkam","download_url":"https://codeload.github.com/dkam/probot/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248483451,"owners_count":21111454,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-02T16:17:16.983Z","updated_at":"2026-02-04T15:02:13.644Z","avatar_url":"https://github.com/dkam.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Probot\n\nOMG another Ruby robots.txt parser? It was an accident: I didn't mean to make it, and I shouldn't have, but here we are. It started out tiny and grew. Yes, I should have used one of the other gems.\n\nDoes this even deserve a gem? Feel free to just copy and paste the single file which implements this - one less dependency, eh?\n\nOn the plus side of this yak shaving, there are some nice features I don't think the others have.\n\n1. Support for consecutive user agents making up a single record:\n\n```txt\nUser-agent: first-agent\nUser-agent: second-agent\nDisallow: /\n```\n\nThis record blocks both first-agent and second-agent from the site.\n\n2. It selects the most specific allow / disallow rule, using rule length as a proxy for specificity. You can also ask it to show you the matching rules and their scores. 
\n\n```ruby\ntxt = %Q{\nUser-agent: *\nDisallow: /dir1\nAllow: /dir1/dir2\nDisallow: /dir1/dir2/dir3\n}\nProbot.new(txt).matches(\"/dir1/dir2/dir3\")\n=\u003e {:disallowed=\u003e{/\\/dir1/=\u003e5, /\\/dir1\\/dir2\\/dir3/=\u003e15}, :allowed=\u003e{/\\/dir1\\/dir2/=\u003e10}}\n```\n\nIn this case, we can see the Disallow rule with length 15 would be followed.\n\n3. It sets the User-Agent string when fetching robots.txt\n\n## Installation\n\nInstall the gem and add to the application's Gemfile by executing:\n\n    $ bundle add probot\n\nIf bundler is not being used to manage dependencies, install the gem by executing:\n\n    $ gem install probot\n\n## Usage\n\nIt's straightforward to use. Instantiate it if you'll make a few requests:\n\n```ruby\n\u003e r = Probot.new('https://booko.info', agent: 'BookScraper')\n\u003e r.rules\n=\u003e  {\"*\"=\u003e{\"disallow\"=\u003e[/\\/search/, /\\/products\\/search/, /\\/.*\\/refresh_prices/, /\\/.*\\/add_to_cart/, /\\/.*\\/get_prices/, /\\/lists\\/add/, /\\/.*\\/add$/, /\\/api\\//, /\\/users\\/bits/, /\\/users\\/create/, /\\/prices\\//, /\\/widgets\\/issue/], \"allow\"=\u003e[], \"crawl_delay\"=\u003e0, \"crawl-delay\"=\u003e0.1},\n \"YandexBot\"=\u003e{\"disallow\"=\u003e[], \"allow\"=\u003e[], \"crawl_delay\"=\u003e0, \"crawl-delay\"=\u003e300.0}}\n\n\u003e r.allowed?(\"/abc/refresh_prices\")\n=\u003e false\n\u003e r.allowed?(\"https://booko.info/9780765397522/All-Systems-Red\")\n=\u003e true\n\u003e r.allowed?(\"https://booko.info/9780765397522/refresh_prices\")\n=\u003e false\n```\n\nOr just one-shot it for one-offs: \n\n```ruby\nProbot.allowed?(\"https://booko.info/9780765397522/All-Systems-Red\", agent: \"BookScraper\")\n```\n\n\n## Development\n\nAfter checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. 
You can also run `bin/console` for an interactive prompt that will allow you to experiment.\n\nTo install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).\n\n## Contributing\n\nBug reports and pull requests are welcome on GitHub at https://github.com/dkam/probot.\n\n## Further Reading\n\n*  https://moz.com/learn/seo/robotstxt\n*  https://stackoverflow.com/questions/45293419/order-of-directives-in-robots-txt-do-they-overwrite-each-other-or-complement-ea\n*  https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt\n*  https://developers.google.com/search/docs/crawling-indexing/robots/create-robots-txt\n\n*  https://github.com/google/robotstxt  - Google's official parser\n\n\n## License\n\nThe gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdkam%2Fprobot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdkam%2Fprobot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdkam%2Fprobot/lists"}