{"id":20061664,"url":"https://github.com/tylerrick/scraper","last_synced_at":"2026-05-07T23:46:16.220Z","repository":{"id":1590783,"uuid":"2104988","full_name":"TylerRick/scraper","owner":"TylerRick","description":"A ruby scraping library using Mechanize","archived":false,"fork":false,"pushed_at":"2011-08-31T18:02:47.000Z","size":92,"stargazers_count":1,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-12T22:28:30.848Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TylerRick.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2011-07-26T04:05:14.000Z","updated_at":"2013-10-09T08:06:23.000Z","dependencies_parsed_at":"2022-07-16T18:30:28.557Z","dependency_job_id":null,"html_url":"https://github.com/TylerRick/scraper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TylerRick%2Fscraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TylerRick%2Fscraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TylerRick%2Fscraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TylerRick%2Fscraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TylerRick","download_url":"https://codeload.github.com/TylerRick/scraper/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241488198,"owners_count":1997082
9,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-13T13:21:26.772Z","updated_at":"2026-05-07T23:46:11.183Z","avatar_url":"https://github.com/TylerRick.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"Scraper\n=======\n\nGetting started\n---------------\n\nAdd to your Gemfile:\n\n    gem 'scraper', :git =\u003e 'https://github.com/TylerRick/scraper.git'\n\nSubclass `Scraper::Page` and provide, at a minimum, `process_page` and `continue` methods.\n\nExample:\n\n    class ThingPage \u003c Scraper::Page\n      attr_reader :thing\n\n      def process_page\n        thing_id = doc.at('#thing_id').try(:inner_text) or raise UnexpectedPageStructureError.new(\"Couldn't find thing_id\")\n        @thing = Thing.find_by_thing_id(thing_id) || Thing.new(thing_id: thing_id)\n\n        get_name\n        get_url\n\n        save_record\n      end\n\n      def continue\n        doc.search('#children_things a').select do |a|\n          a['href'] =~ %r(^/things/)\n        end.each do |a|\n          crawl_child ThingPage, a['href']\n        end\n      end\n    end\n\n`parent` will automatically be available to the child `Page` object when you use `crawl_child`.\n\nTo start crawling:\n\n    ThingPage.new(url).crawl\n\nMotivation\n----------\n\nAfter surveying the existing Ruby scraping libraries, I decided that none of them did what I needed. 
So I extracted some patterns from scrapers I had already written with Mechanize and Nokogiri, and this library was born!\n\nOther libraries I looked at:\n\n* **scrubyt** (no longer maintained and doesn't even run on Ruby 1.9, but otherwise looked interesting)\n* **scrapi** (a nice DSL in some ways, but in the end it seemed like too much sugar and not enough meat: it was hard to figure out how to do anything beyond its simple examples, it didn't seem able to help with what I was trying to do, and it didn't use Nokogiri)\n\nLicense\n-------\n\nThis is free software available under the terms of the MIT license.\n\nTo do\n-----\n\n* Write tests\n* etc.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftylerrick%2Fscraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftylerrick%2Fscraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftylerrick%2Fscraper/lists"}