{"id":21130472,"url":"https://github.com/serpapi/clauneck","last_synced_at":"2025-04-06T10:11:41.581Z","repository":{"id":179098793,"uuid":"662964050","full_name":"serpapi/clauneck","owner":"serpapi","description":"A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.","archived":false,"fork":false,"pushed_at":"2024-03-19T09:33:06.000Z","size":35,"stargazers_count":176,"open_issues_count":0,"forks_count":13,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-05T19:35:45.180Z","etag":null,"topics":["automation","command-line","command-line-tool","data-extraction","data-extractor","email","email-extract-with-proxy","email-extraction","email-extractor","email-marketing","email-scraper","open-source","ruby","rubygem","serp","social-media-scraper","web-crawling","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/serpapi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-06T09:05:53.000Z","updated_at":"2025-04-02T02:34:23.000Z","dependencies_parsed_at":null,"dependency_job_id":"6a8018c3-2659-408f-b712-4d43647793ba","html_url":"https://github.com/serpapi/clauneck","commit_stats":null,"previous_names":["serpapi/clauneck"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serpapi%2Fclauneck","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serpapi%2Fclauneck/tags","releases_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serpapi%2Fclauneck/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serpapi%2Fclauneck/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/serpapi","download_url":"https://codeload.github.com/serpapi/clauneck/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247393557,"owners_count":20931809,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","command-line","command-line-tool","data-extraction","data-extractor","email","email-extract-with-proxy","email-extraction","email-extractor","email-marketing","email-scraper","open-source","ruby","rubygem","serp","social-media-scraper","web-crawling","webscraping"],"created_at":"2024-11-20T05:34:01.707Z","updated_at":"2025-04-06T10:11:41.557Z","avatar_url":"https://github.com/serpapi.png","language":"Ruby","readme":"\u003ch1 align=\"center\"\u003eClauneck\u003c/h1\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n  \u003ca href=\"\"\u003e[![Gem Version][gem-shield]][gem-url]\u003c/a\u003e\n  \u003ca href=\"\"\u003e[![Contributors][contributors-shield]][contributors-url] \u003c/a\u003e\n  \u003ca href=\"\"\u003e[![Forks][forks-shield]][forks-url]\u003c/a\u003e\n  \u003ca href=\"\"\u003e[![Stargazers][stars-shield]][stars-url]\u003c/a\u003e\n  \u003ca href=\"\"\u003e[![Issues][issues-shield]][issues-url]\u003c/a\u003e\n  \u003ca href=\"\"\u003e[![Issues][issuesclosed-shield]][issuesclosed-url]\u003c/a\u003e\n  \u003ca href=\"\"\u003e[![MIT 
License][license-shield]][license-url]\u003c/a\u003e\n\n\u003c/div\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://user-images.githubusercontent.com/73674035/251452240-e80b12d7-0c7a-40fc-9cbc-bb3bcb7986a8.png\" alt=\"Clauneck Information Scraper\" width=\"50%\"/\u003e\n\u003c/p\u003e\n\n\n`Clauneck` is a Ruby gem designed to scrape specific information from a series of URLs, either directly provided or fetched from Google search results via [SerpApi's Google Search API](https://serpapi.com/search-api). It extracts and matches patterns such as email addresses and social media handles from the web pages, and stores the results in a CSV file.\n\nUnlike Google Chrome extensions that require you to visit webpages one by one, Clauneck excels at bringing the list of websites to you by leveraging [SerpApi’s Google Search API](https://serpapi.com/search-api).\n\n- [Cold Email Marketing with Open-Source Email Extractor](https://serpapi.com/blog/cold-email-marketing-with-open-source-email-extractor/): A blog post about a use case for the tool\n\n---\n\n\n## The End Result\n\nThe script writes the results to a CSV file. If it cannot find a given piece of information on a website, it labels that field as `null`. For unexpected errors encountered along the way (connection errors, encoding errors, etc.), 
the affected fields will be filled with `error`.\n\n\n| Website             | Information          | Type of Information |\n|---------------------|----------------------|-----------------|\n| serpapi.com     | `contact@serpapi.com`  | Email           |\n| serpapi.com     | `serpapicom`           | Instagram       |\n| serpapi.com     | `serpapicom`           | Facebook        |\n| serpapi.com     | `serp_api`             | Twitter         |\n| serpapi.com     | `null`                 | TikTok          |\n| serpapi.com     | `channel/UCUgIHlYBOD3yA3yDIRhg_mg` | YouTube |\n| serpapi.com     | `serpapi`              | GitHub          |\n| serpapi.com     | `serpapi`              | Medium          |\n\n---\n\n## Prerequisites\nSince [SerpApi](https://serpapi.com) offers free credits that renew every month, and free public proxy lists are available online, this tool can technically be used at no cost. You may extract data from approximately 10,000 pages (100 results per page, and up to 100 pages) with a free account from [SerpApi](https://serpapi.com).\n\n- For collecting URLs to scrape, one of the following is required:\n  - SerpApi API Key: You may [Register to Claim Free Credits](https://serpapi.com/users/sign_up)\n  - List of URLs in a text document (The URLs should be Google web cache links that start with `https://webcache.googleusercontent.com`)\n- For scraping URLs, one of the following is required:\n  - List of Proxies in a text document (You may use public proxies. Only HTTP proxies are accepted.)\n  - Rotating Proxy IP\n\n---\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n```ruby\ngem 'clauneck'\n```\n\nAnd then execute:\n\n```\n$ bundle install\n```\n\nOr install it yourself as:\n\n```\n$ gem install clauneck\n```\n\n---\n\n## Basic Usage\n\nYou can use `Clauneck` as a command-line tool or within your Ruby scripts. 
\n\n### Basic Command line usage\n\nIn the command line, use the `clauneck` command with options as follows:\n\n```\nclauneck --api_key YOUR_SERPAPI_KEY --output results.csv --q \"site:*.ai AND inurl:/contact OR inurl:/contact-us\"\n```\n\n### Basic Ruby script usage\n\nIn your Ruby script, call the `Clauneck.run` method:\n\n```ruby\nrequire 'clauneck'\n\napi_key = \"\u003cSerpApi API Key\u003e\" # Visit https://serpapi.com/users/sign_up to get free credits.\nparams = {\n  \"q\": \"site:*.ai AND inurl:/contact OR inurl:/contact-us\"\n}\n\nClauneck.run(api_key: api_key, params: params)\n```\n\n---\n\n## Advanced Usage\n\n### Using Advanced Search Parameters\nYou can visit the documentation for [SerpApi's Google Search API](https://serpapi.com/search-api) for insight into which parameters you can use to construct searches.\n\n\u003cimg width=\"1470\" alt=\"image\" src=\"https://user-images.githubusercontent.com/73674035/251473233-4be601c1-846b-4ae6-bb65-4c45aa22667d.png\"\u003e\n\n### Using Advanced Search Operators\n\nGoogle supports various search operators in queries. These enhance your ability to customize your search and get more precise results. For example, this search query:\n`\"site:*.ai AND inurl:/contact OR inurl:/contact-us\"`\nwill search for websites on `.ai` domains with paths at `/contact` or `/contact-us`.\n\nYou may check out [Google Search Operators: The Complete List (44 Advanced Operators)](https://ahrefs.com/blog/google-advanced-search-operators/) for a longer list of operators.\n\n### Using Proxies for Scraping in a Text Document\nYou can utilize your own proxies for scraping web caches of the links you have acquired. Only HTTP proxies are accepted. 
The proxies should be in the following format:\n```\nhttp://username:password@ip:port\nhttp://username:password@another-ip:another-port\n```\nor, if they are public proxies:\n```\nhttp://ip:port\nhttp://another-ip:another-port\n```\n\nYou can add the `--proxy` option in the command line to utilize the file:\n```\nclauneck --api_key YOUR_SERPAPI_KEY --proxy proxies.txt --output results.csv --q \"site:*.ai AND inurl:/contact OR inurl:/contact-us\"\n```\n\nor use the rotating proxy link directly:\n```\nclauneck --api_key YOUR_SERPAPI_KEY --proxy \"http://username:password@ip:port\" --output results.csv --q \"site:*.ai AND inurl:/contact OR inurl:/contact-us\"\n```\n\nYou may also use it in a script:\n```rb\napi_key = \"\u003cSerpApi API Key\u003e\" # Visit https://serpapi.com/users/sign_up to get free credits.\nparams = {\n  \"q\": \"site:*.ai AND inurl:/contact OR inurl:/contact-us\"\n}\nproxy = \"proxies.txt\"\n\nClauneck.run(api_key: api_key, params: params, proxy: proxy)\n```\n\nor directly use the rotating proxy link:\n```rb\napi_key = \"\u003cSerpApi API Key\u003e\" # Visit https://serpapi.com/users/sign_up to get free credits.\nparams = {\n  \"q\": \"site:*.ai AND inurl:/contact OR inurl:/contact-us\"\n}\nproxy = \"http://username:password@ip:port\"\n\nClauneck.run(api_key: api_key, params: params, proxy: proxy)\n```\n\nIf no proxy is provided, the system IP address will be used. This may suffice for small-scale projects, but it is not recommended.\n\n### Using a Google Search URL to Scrape Links with SerpApi\n\nInstead of providing search parameters, you can directly feed a Google Search URL from which [SerpApi's Google Search API](https://serpapi.com/search-api) will collect the web cache links.\n\n### Using URLs to Scrape in a Text Document\n\nYou may supply your own list of URLs to be scraped. Each URL should start with `https://webcache.googleusercontent.com`, one per line. 
For example:\n\n```\nhttps://webcache.googleusercontent.com/search?q=cache:LItv_3DO2N8J:https://serpapi.com/\u0026cd=10\u0026hl=en\u0026ct=clnk\u0026gl=cy\nhttps://webcache.googleusercontent.com/search?q=cache:_gaXFsYVmCgJ:https://serpapi.com/search-api\u0026cd=9\u0026hl=en\u0026ct=clnk\u0026gl=cy\n```\n\nYou can find cached links manually from Google Searches as shown below:\n\n![image](https://user-images.githubusercontent.com/73674035/251461862-5cc1e279-9d5c-4885-aebd-317512ae62ea.png)\n\n---\n\n## Options\n\n`Clauneck` accepts the following options:\n\n- `--api_key`: Your SerpApi key. It is required if you're not providing the `--urls` option.\n- `--proxy`: Your proxy file or proxy URL. Defaults to the system IP if not provided.\n- `--pages`: The number of pages to fetch from Google using SerpApi. Defaults to `1`.\n- `--output`: The CSV file in which to store the results. Defaults to `output.csv`.\n- `--google_url`: The Google URL that contains the webpages you want to scrape. It should be a Google Search Results URL.\n- `--urls`: The URLs you want to scrape. 
If provided, the gem will not fetch URLs from Google.\n- `--help`: Shows the help message and exits.\n\n---\n\n## Contributing\n\nBug reports and pull requests are welcome on GitHub at https://github.com/serpapi/clauneck.\n\n---\n\n## License\n\nThe gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).\n\n[gem-shield]: https://img.shields.io/gem/v/clauneck.svg\n[gem-url]: https://rubygems.org/gems/clauneck\n[contributors-shield]: https://img.shields.io/github/contributors/serpapi/clauneck.svg\n[contributors-url]: https://github.com/serpapi/clauneck/graphs/contributors\n[forks-shield]: https://img.shields.io/github/forks/serpapi/clauneck.svg\n[forks-url]: https://github.com/serpapi/clauneck/network/members\n[stars-shield]: https://img.shields.io/github/stars/serpapi/clauneck.svg\n[stars-url]: https://github.com/serpapi/clauneck/stargazers\n[issues-shield]: https://img.shields.io/github/issues/serpapi/clauneck.svg\n[issues-url]: https://github.com/serpapi/clauneck/issues\n[issuesclosed-shield]: https://img.shields.io/github/issues-closed/serpapi/clauneck.svg\n[issuesclosed-url]: https://github.com/serpapi/clauneck/issues?q=is%3Aissue+is%3Aclosed\n[license-shield]: https://img.shields.io/github/license/serpapi/clauneck.svg\n[license-url]: https://github.com/serpapi/clauneck/blob/master/LICENSE\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fserpapi%2Fclauneck","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fserpapi%2Fclauneck","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fserpapi%2Fclauneck/lists"}