{"id":15675141,"url":"https://github.com/jonathanhefner/grubby","last_synced_at":"2025-11-11T20:25:29.257Z","repository":{"id":56875409,"uuid":"102290983","full_name":"jonathanhefner/grubby","owner":"jonathanhefner","description":"Fail-fast web scraping","archived":false,"fork":false,"pushed_at":"2023-05-14T19:56:16.000Z","size":110,"stargazers_count":13,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-24T03:06:14.485Z","etag":null,"topics":["mechanize","ruby","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jonathanhefner.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-09-03T20:06:32.000Z","updated_at":"2024-01-05T23:43:18.000Z","dependencies_parsed_at":"2024-10-03T16:07:30.156Z","dependency_job_id":null,"html_url":"https://github.com/jonathanhefner/grubby","commit_stats":{"total_commits":113,"total_committers":2,"mean_commits":56.5,"dds":0.2654867256637168,"last_synced_commit":"41d26a84dbbc201e6103bf781fdd68bd447761e5"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathanhefner%2Fgrubby","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathanhefner%2Fgrubby/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathanhefner%2Fgrubby/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonathanhefner%2Fgrubby/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jonathanhefner","download_url":"https://codeload.github.com/jonathanhefner/grubby/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252776573,"owners_count":21802467,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["mechanize","ruby","web-scraping"],"created_at":"2024-10-03T15:57:14.175Z","updated_at":"2025-11-11T20:25:24.205Z","avatar_url":"https://github.com/jonathanhefner.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# grubby\n\n[Fail-fast] web scraping.  *grubby* adds a layer of utility and\nerror-checking atop the marvelous [Mechanize gem].  See API listing\nbelow, or browse the [full documentation].\n\n[Fail-fast]: https://en.wikipedia.org/wiki/Fail-fast\n[Mechanize gem]: https://rubygems.org/gems/mechanize\n[full documentation]: https://www.rubydoc.info/gems/grubby/\n\n\n## Examples\n\nThe following code scrapes stories from the [Hacker News](\nhttps://news.ycombinator.com/news) front page:\n\n```ruby\nrequire \"grubby\"\n\nclass HackerNews \u003c Grubby::PageScraper\n  scrapes(:items) do\n    page.search!(\".athing\").map{|element| Item.new(element) }\n  end\n\n  class Item \u003c Grubby::Scraper\n    scrapes(:story_link){ source.at!(\"a.storylink\") }\n\n    scrapes(:story_url){ expand_url(story_link[\"href\"]) }\n\n    scrapes(:title){ story_link.text }\n\n    scrapes(:comments_link, optional: true) do\n      source.next_sibling.search!(\".subtext a\").find do |link|\n        link.text.match?(/comment|discuss/)\n      end\n    end\n\n    scrapes(:comments_url, if: :comments_link) do\n      expand_url(comments_link[\"href\"])\n    end\n\n    scrapes(:comment_count, if: :comments_link) do\n      comments_link.text.to_i\n    end\n\n    def expand_url(url)\n      url.include?(\"://\") ? url : source.document.uri.merge(url).to_s\n    end\n  end\nend\n\n# The following line will raise an exception if anything goes wrong\n# during the scraping process.  For example, if the structure of the\n# HTML does not match expectations due to a site change, the script will\n# terminate immediately with a helpful error message.  This prevents bad\n# data from propagating and causing hard-to-trace errors.\nhn = HackerNews.scrape(\"https://news.ycombinator.com/news\")\n\n# Your processing logic goes here:\nhn.items.take(10).each do |item|\n  puts \"* #{item.title}\"\n  puts \"  #{item.story_url}\"\n  puts \"  #{item.comment_count} comments: #{item.comments_url}\" if item.comments_url\n  puts\nend\n```\n\nHacker News also offers a [JSON API](https://github.com/HackerNews/API),\nwhich may be more robust for scraping purposes.  *grubby* can scrape\nJSON just as well:\n\n```ruby\nrequire \"grubby\"\n\nclass HackerNews \u003c Grubby::JsonScraper\n  scrapes(:items) do\n    # API returns array of top 500 item IDs, so limit as necessary\n    json.take(10).map do |item_id|\n      Item.scrape(\"https://hacker-news.firebaseio.com/v0/item/#{item_id}.json\")\n    end\n  end\n\n  class Item \u003c Grubby::JsonScraper\n    scrapes(:story_url){ json[\"url\"] || hn_url }\n\n    scrapes(:title){ json[\"title\"] }\n\n    scrapes(:comments_url, optional: true) do\n      hn_url if json[\"descendants\"]\n    end\n\n    scrapes(:comment_count, optional: true) do\n      json[\"descendants\"]\u0026.to_i\n    end\n\n    def hn_url\n      \"https://news.ycombinator.com/item?id=#{json[\"id\"]}\"\n    end\n  end\nend\n\nhn = HackerNews.scrape(\"https://hacker-news.firebaseio.com/v0/topstories.json\")\n\n# Your processing logic goes here:\nhn.items.each do |item|\n  puts \"* #{item.title}\"\n  puts \"  #{item.story_url}\"\n  puts \"  #{item.comment_count} comments: #{item.comments_url}\" if item.comments_url\n  puts\nend\n```\n\n\n## Core API\n\n- [Grubby](https://www.rubydoc.info/gems/grubby/Grubby)\n  - [#fulfill](https://www.rubydoc.info/gems/grubby/Grubby:fulfill)\n  - [#get_mirrored](https://www.rubydoc.info/gems/grubby/Grubby:get_mirrored)\n  - [#ok?](https://www.rubydoc.info/gems/grubby/Grubby:ok%3F)\n  - [#time_between_requests](https://www.rubydoc.info/gems/grubby/Grubby:time_between_requests)\n- [Scraper](https://www.rubydoc.info/gems/grubby/Grubby/Scraper)\n  - [.each](https://www.rubydoc.info/gems/grubby/Grubby/Scraper.each)\n  - [.scrape](https://www.rubydoc.info/gems/grubby/Grubby/Scraper.scrape)\n  - [.scrapes](https://www.rubydoc.info/gems/grubby/Grubby/Scraper.scrapes)\n  - [#[]](https://www.rubydoc.info/gems/grubby/Grubby/Scraper:[])\n  - [#to_h](https://www.rubydoc.info/gems/grubby/Grubby/Scraper:to_h)\n- [PageScraper](https://www.rubydoc.info/gems/grubby/Grubby/PageScraper)\n  - [.scrape_file](https://www.rubydoc.info/gems/grubby/Grubby/PageScraper.scrape_file)\n  - [#page](https://www.rubydoc.info/gems/grubby/Grubby/PageScraper:page)\n- [JsonScraper](https://www.rubydoc.info/gems/grubby/Grubby/JsonScraper)\n  - [.scrape_file](https://www.rubydoc.info/gems/grubby/Grubby/JsonScraper.scrape_file)\n  - [#json](https://www.rubydoc.info/gems/grubby/Grubby/JsonScraper:json)\n- Mechanize::File\n  - [#save_to](https://www.rubydoc.info/gems/grubby/Mechanize/Parser:save_to)\n  - [#save_to!](https://www.rubydoc.info/gems/grubby/Mechanize/Parser:save_to%21)\n- Mechanize::Page\n  - [#at!](https://www.rubydoc.info/gems/grubby/Mechanize/Page:at%21)\n  - [#search!](https://www.rubydoc.info/gems/grubby/Mechanize/Page:search%21)\n- Mechanize::Page::Link\n  - [#to_absolute_uri](https://www.rubydoc.info/gems/grubby/Mechanize/Page/Link#to_absolute_uri)\n- URI\n  - [#basename](https://www.rubydoc.info/gems/grubby/URI:basename)\n  - [#query_param](https://www.rubydoc.info/gems/grubby/URI:query_param)\n\n\n## Auxiliary API\n\n*grubby* loads several gems that extend Ruby objects with utility\nmethods.  Some of those methods are listed below.  See each gem's\ndocumentation for a complete API listing.\n\n- [Active Support](https://rubygems.org/gems/activesupport)\n  ([docs](https://www.rubydoc.info/gems/activesupport/))\n  - [Enumerable#index_by](https://www.rubydoc.info/gems/activesupport/Enumerable:index_by)\n  - [File.atomic_write](https://www.rubydoc.info/gems/activesupport/File:atomic_write)\n  - [Object#presence](https://www.rubydoc.info/gems/activesupport/Object:presence)\n  - [String#blank?](https://www.rubydoc.info/gems/activesupport/String:blank%3F)\n  - [String#squish](https://www.rubydoc.info/gems/activesupport/String:squish)\n- [casual_support](https://rubygems.org/gems/casual_support)\n  ([docs](https://www.rubydoc.info/gems/casual_support/))\n  - [Enumerable#index_to](https://www.rubydoc.info/gems/casual_support/Enumerable:index_to)\n  - [String#after](https://www.rubydoc.info/gems/casual_support/String:after)\n  - [String#after_last](https://www.rubydoc.info/gems/casual_support/String:after_last)\n  - [String#before](https://www.rubydoc.info/gems/casual_support/String:before)\n  - [String#before_last](https://www.rubydoc.info/gems/casual_support/String:before_last)\n  - [String#between](https://www.rubydoc.info/gems/casual_support/String:between)\n  - [Time#to_hms](https://www.rubydoc.info/gems/casual_support/Time:to_hms)\n  - [Time#to_ymd](https://www.rubydoc.info/gems/casual_support/Time:to_ymd)\n- [gorge](https://rubygems.org/gems/gorge)\n  ([docs](https://www.rubydoc.info/gems/gorge/))\n  - [Pathname#file_crc32](https://www.rubydoc.info/gems/gorge/Pathname:file_crc32)\n  - [Pathname#file_md5](https://www.rubydoc.info/gems/gorge/Pathname:file_md5)\n  - [Pathname#file_sha1](https://www.rubydoc.info/gems/gorge/Pathname:file_sha1)\n- [mini_sanity](https://rubygems.org/gems/mini_sanity)\n  ([docs](https://www.rubydoc.info/gems/mini_sanity/))\n  - [Enumerator#result!](https://www.rubydoc.info/gems/mini_sanity/Enumerator:result%21)\n  - [Enumerator#results!](https://www.rubydoc.info/gems/mini_sanity/Enumerator:results%21)\n  - [Object#assert!](https://www.rubydoc.info/gems/mini_sanity/Object:assert%21)\n  - [Object#refute!](https://www.rubydoc.info/gems/mini_sanity/Object:refute%21)\n  - [String#match!](https://www.rubydoc.info/gems/mini_sanity/String:match%21)\n- [pleasant_path](https://rubygems.org/gems/pleasant_path)\n  ([docs](https://www.rubydoc.info/gems/pleasant_path/))\n  - [Pathname#available_name](https://www.rubydoc.info/gems/pleasant_path/Pathname:available_name)\n  - [Pathname#existence](https://www.rubydoc.info/gems/pleasant_path/Pathname:existence)\n  - [Pathname#make_dirname](https://www.rubydoc.info/gems/pleasant_path/Pathname:make_dirname)\n  - [Pathname#move_as](https://www.rubydoc.info/gems/pleasant_path/Pathname:move_as)\n  - [Pathname#rename_basename](https://www.rubydoc.info/gems/pleasant_path/Pathname:rename_basename)\n  - [Pathname#rename_extname](https://www.rubydoc.info/gems/pleasant_path/Pathname:rename_extname)\n- [ryoba](https://rubygems.org/gems/ryoba)\n  ([docs](https://www.rubydoc.info/gems/ryoba/))\n  - [Nokogiri::XML::Node#matches!](https://www.rubydoc.info/gems/ryoba/Nokogiri/XML/Node:matches%21)\n  - [Nokogiri::XML::Node#text!](https://www.rubydoc.info/gems/ryoba/Nokogiri/XML/Node:text%21)\n  - [Nokogiri::XML::Node#uri](https://www.rubydoc.info/gems/ryoba/Nokogiri/XML/Node:uri)\n  - [Nokogiri::XML::Searchable#ancestor!](https://www.rubydoc.info/gems/ryoba/Nokogiri/XML/Searchable:ancestor%21)\n  - [Nokogiri::XML::Searchable#ancestors!](https://www.rubydoc.info/gems/ryoba/Nokogiri/XML/Searchable:ancestors%21)\n  - [Nokogiri::XML::Searchable#at!](https://www.rubydoc.info/gems/ryoba/Nokogiri/XML/Searchable:at%21)\n  - [Nokogiri::XML::Searchable#search!](https://www.rubydoc.info/gems/ryoba/Nokogiri/XML/Searchable:search%21)\n\n\n## Installation\n\nInstall the [`grubby` gem](https://rubygems.org/gems/grubby).\n\n\n## Contributing\n\nRun `rake test` to run the tests.\n\n\n## License\n\n[MIT License](LICENSE.txt)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjonathanhefner%2Fgrubby","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjonathanhefner%2Fgrubby","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjonathanhefner%2Fgrubby/lists"}