{"id":19050286,"url":"https://github.com/rogerluo410/gcrawler","last_synced_at":"2026-06-22T13:31:55.748Z","repository":{"id":59983048,"uuid":"540428745","full_name":"rogerluo410/gcrawler","owner":"rogerluo410","description":"Google search crawler for Ruby version. Crawling each links' text and url by keywords on Google.com.","archived":false,"fork":false,"pushed_at":"2022-09-27T02:54:31.000Z","size":42,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-03-04T10:33:35.077Z","etag":null,"topics":["crawler","crawling","google","ruby"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rogerluo410.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-09-23T12:26:28.000Z","updated_at":"2022-09-27T01:35:40.000Z","dependencies_parsed_at":"2022-09-25T12:20:53.327Z","dependency_job_id":null,"html_url":"https://github.com/rogerluo410/gcrawler","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/rogerluo410/gcrawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rogerluo410%2Fgcrawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rogerluo410%2Fgcrawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rogerluo410%2Fgcrawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rogerluo410%2Fgcrawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rogerluo410","dow
nload_url":"https://codeload.github.com/rogerluo410/gcrawler/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rogerluo410%2Fgcrawler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34651747,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-22T02:00:06.391Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","crawling","google","ruby"],"created_at":"2024-11-08T23:13:40.670Z","updated_at":"2026-06-22T13:31:55.726Z","avatar_url":"https://github.com/rogerluo410.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Gcrawler\n\n[![Gem Version](https://badge.fury.io/rb/gcrawler.svg)](https://badge.fury.io/rb/gcrawler)\n[![Coverage Status](https://coveralls.io/repos/github/rogerluo410/gcrawler/badge.svg?branch=master)](https://coveralls.io/github/rogerluo410/gcrawler?branch=master)\n[![GitHub license](https://img.shields.io/github/license/rogerluo410/gcrawler)](https://img.shields.io/github/license/rogerluo410/gcrawler)\n\nGoogle search crawler for Ruby version. 
It crawls the text and URL of each result link for the given keywords on Google.com.\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n```ruby\ngem 'gcrawler'\n```\n\nAnd then execute:\n\n    $ bundle install\n\nOr install it yourself as:\n\n    $ gem install gcrawler\n\n## Usage\n\n```ruby\n    require 'gcrawler'\n\n    # Set proxy servers; multiple IPs are much safer than a single IP.\n    proxies = [\n        { ip: '127.0.0.1', port: '7890' },\n        ...\n    ]\n\n    # Exclude these hosts from the result links.\n    exclude_hosts = [\n        'accounts.google.com',\n        'support.google.com'\n    ]\n\n    # Never send searches to these blacklisted domains.\n    black_domains = [\n        'www.google.at',\n        'www.google.bf'\n    ]\n\n    google_crawler = GoogleCrawler.new(\n        proxies: proxies,\n        black_domains: black_domains,\n        exclude_hosts: exclude_hosts\n    )\n\n    # Output: Mechanize::Page, see https://github.com/sparklemotion/mechanize\n    pp google_crawler.search_as_page('お肉とチーズの専門店', 'ミートダルマ札幌店')\n\n    # Output: [{text: , url:}, ...]\n    pp google_crawler.search_as_object('お肉とチーズの専門店', 'ミートダルマ札幌店', country: 'ja')\n\n    # Output: ['url1', 'url2', ...]\n    pp google_crawler.search_as_url('お肉とチーズの専門店', 'ミートダルマ札幌店', country: 'ja')\n\n    # Get the second page:\n    pp google_crawler.search_as_url('お肉とチーズの専門店', 'ミートダルマ札幌店', country: 'ja', start: 10)\n\n```\n\nFunction inputs and outputs:\n\n    search_as_page:\n        Args:\n            keywords (varargs): kw1, kw2, kw3, ...\n            language (str, optional): Query language. Defaults to nil.\n            num (uint, optional): Number of results per page (default is 10 per page). Defaults to nil.\n            start (int, optional): Offset. Defaults to 0.\n            country (str, optional): Query country. Defaults to nil. Example: countryCN, cn, or CN.\n            pause (uint, optional): Delay in seconds between two crawling requests. 
\n                                    Too short a delay may get requests blocked by Google's crawl monitoring. \n                                    Defaults to 0.\n    \n        Return:\n            Mechanize::Page, see https://github.com/sparklemotion/mechanize\n\n  \n    search_as_url:\n        Args:\n            keywords (varargs): kw1, kw2, kw3, ...\n            language (str, optional): Query language. Defaults to nil.\n            num (uint, optional): Number of results per page (default is 10 per page). Defaults to nil.\n            start (int, optional): Offset. Defaults to 0.\n            country (str, optional): Query country. Defaults to nil. Example: countryCN, cn, or CN.\n            pause (uint, optional): Delay in seconds between two crawling requests. \n                                    Too short a delay may get requests blocked by Google's crawl monitoring. \n                                    Defaults to 0.\n    \n        Return:\n            ['url1', 'url2', ...]\n\n    \n    search_as_object:\n        Args:\n            keywords (varargs): kw1, kw2, kw3, ...\n            language (str, optional): Query language. Defaults to nil.\n            num (uint, optional): Number of results per page (default is 10 per page). Defaults to nil.\n            start (int, optional): Offset. Defaults to 0.\n            country (str, optional): Query country. Defaults to nil. Example: countryCN, cn, or CN.\n            pause (uint, optional): Delay in seconds between two crawling requests. \n                                    Too short a delay may get requests blocked by Google's crawl monitoring. \n                                    Defaults to 0.\n    \n        Return:\n            [{text: xxx, url: xxx}, ...]\n\n\n## Development\n\nAfter checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. 
You can also run `bin/console` for an interactive prompt that will allow you to experiment.\n\nTo install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).\n\n## Contributing\n\nBug reports and pull requests are welcome on GitHub at https://github.com/rogerluo410/gcrawler. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/rogerluo410/gcrawler/blob/master/CODE_OF_CONDUCT.md).\n\n## Inspiration\n\ngcrawler is a Ruby library greatly inspired by the Python project [magic_google](https://github.com/howie6879/magic_google).\n\n## License\n\nThe gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).\n\n## Code of Conduct\n\nEveryone interacting in the Gcrawler project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/rogerluo410/gcrawler/blob/master/CODE_OF_CONDUCT.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frogerluo410%2Fgcrawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frogerluo410%2Fgcrawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frogerluo410%2Fgcrawler/lists"}