{"id":17923826,"url":"https://github.com/nelsonfigueroa/github-email-scraper","last_synced_at":"2025-03-24T02:33:20.136Z","repository":{"id":129042171,"uuid":"420004841","full_name":"nelsonfigueroa/github-email-scraper","owner":"nelsonfigueroa","description":"Scrape contributor emails from a GitHub repository.","archived":false,"fork":false,"pushed_at":"2023-07-26T07:25:26.000Z","size":21,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-19T01:11:22.836Z","etag":null,"topics":["ruby","scrape","scraper"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nelsonfigueroa.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-22T07:25:28.000Z","updated_at":"2024-01-24T12:09:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"ab446f5b-1fcf-418e-978e-3a31aee49b50","html_url":"https://github.com/nelsonfigueroa/github-email-scraper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nelsonfigueroa%2Fgithub-email-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nelsonfigueroa%2Fgithub-email-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nelsonfigueroa%2Fgithub-email-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nelsonfigueroa%2Fgithub-email-scraper/manifests"
,"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nelsonfigueroa","download_url":"https://codeload.github.com/nelsonfigueroa/github-email-scraper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245198923,"owners_count":20576469,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ruby","scrape","scraper"],"created_at":"2024-10-28T20:45:42.162Z","updated_at":"2025-03-24T02:33:20.131Z","avatar_url":"https://github.com/nelsonfigueroa.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GitHub Email Scraper\n\nScrape contributor emails from a GitHub repository.\n\n## Note\n\nI found a better way of doing this with a CLI command instead of this Ruby script: https://nelson.cloud/scrape-contributor-emails-from-any-git-repository/\n\nTL;DR: just run this command from within a Git repository:\n\n```shell\ngit shortlog -sea | grep -E -o \"\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,6}\\b\" | awk '{print tolower($0)}' | sort | uniq | grep -wv 'users.noreply.github.com'\n```\n\n## Motivation\n\nI noticed that the GitHub API exposes email addresses used in commits. I wanted to see how easily someone could scrape for emails using a script. I talk about GitHub email scraping and protecting yourself in a [blog post](https://nelson.cloud/scraping-github-contributor-emails/).\n\n## Disclaimer\n\nThis was created for demonstration purposes. 
What you do with this script or any emails gathered with it is entirely your responsibility.\n\n## Usage\n\nThis script uses the commits API endpoint documented here: https://docs.github.com/en/rest/reference/repos#commits\n\n*Note that GitHub limits unauthenticated API calls to 60 per hour per IP address.*\n\n*Rate limiting info: https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting*\n\nYou'll need Ruby installed on your system. Then run:\n\n```\nruby main.rb -u \u003cgithub-username\u003e -r \u003cgithub-repository\u003e -p \u003ccommit-page\u003e\n```\n\nTo see usage instructions in the command line, run:\n\n```\n$ ruby main.rb -h\n\nUsage: example.rb [options]\n    -u, --username=USERNAME          Specify GitHub username\n    -r, --repository=REPOSTORY       Specify GitHub repository\n    -p, --page=PAGE                  Specify the commit page to begin scraping from\n```\n\n## Examples\n\nA regular scraping operation would look like this. If you do not specify `-p`, the scraper will begin from page 1. 
On large repositories, the rate limit will be reached before every page is scraped:\n\n```\n$ ruby main.rb -u torvalds -r linux\n\n\t+-------------------+\n\t|   GitHub          |\n\t|       Email       |\n\t|         Scraper   |\n\t+-------------------+\n\n\nScraping https://github.com/torvalds/linux/\nRate limit exceeded.\nPages scraped: 1-58 out of 10447\n43 emails written to torvalds-linux.txt\n\n```\n\nAn example that specifies the page of commits to begin scraping from:\n\n```\n$ ruby main.rb -u torvalds -r linux -p 100\n\n\t+-------------------+\n\t|   GitHub          |\n\t|       Email       |\n\t|         Scraper   |\n\t+-------------------+\n\n\nScraping https://github.com/torvalds/linux/\nRate limit exceeded.\nPages scraped: 100-159 out of 10447\n39 emails written to torvalds-linux.txt\n```\n\nAn example where the IP address is already rate limited:\n\n```\n$ ruby main.rb -u torvalds -r linux\n\n\t+-------------------+\n\t|   GitHub          |\n\t|       Email       |\n\t|         Scraper   |\n\t+-------------------+\n\n\nScraping https://github.com/torvalds/linux/\nError, got status code 403\nResponse message:\nAPI rate limit exceeded for \u003cyour_IP\u003e. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnelsonfigueroa%2Fgithub-email-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnelsonfigueroa%2Fgithub-email-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnelsonfigueroa%2Fgithub-email-scraper/lists"}