{"id":13823498,"url":"https://github.com/khuyentran1401/top-github-scraper","last_synced_at":"2025-08-01T18:11:05.154Z","repository":{"id":62584929,"uuid":"342754299","full_name":"khuyentran1401/top-github-scraper","owner":"khuyentran1401","description":"Scape top GitHub repositories and users based on keywords","archived":false,"fork":false,"pushed_at":"2023-06-27T15:13:38.000Z","size":463,"stargazers_count":86,"open_issues_count":0,"forks_count":25,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-07-31T05:21:45.116Z","etag":null,"topics":["github","github-api","python","scraping","web-scraper","web-scraping"],"latest_commit_sha":null,"homepage":"https://khuyentran1401.github.io/top-github-scraper/html/top_github_scraper/index.html","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/khuyentran1401.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-02-27T02:43:25.000Z","updated_at":"2025-07-26T10:43:12.000Z","dependencies_parsed_at":"2024-01-13T16:01:06.843Z","dependency_job_id":"cfbe017e-dbf9-49e1-9df8-3804adb0a9de","html_url":"https://github.com/khuyentran1401/top-github-scraper","commit_stats":{"total_commits":32,"total_committers":1,"mean_commits":32.0,"dds":0.0,"last_synced_commit":"c49c5a0692ffc9c95d3aea121d60a9444fd4061f"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/khuyentran1401/top-github-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/khuyentran1401%2Ftop-github-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kh
uyentran1401%2Ftop-github-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/khuyentran1401%2Ftop-github-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/khuyentran1401%2Ftop-github-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/khuyentran1401","download_url":"https://codeload.github.com/khuyentran1401/top-github-scraper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/khuyentran1401%2Ftop-github-scraper/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268272989,"owners_count":24223790,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-01T02:00:08.611Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["github","github-api","python","scraping","web-scraper","web-scraping"],"created_at":"2024-08-04T09:00:35.601Z","updated_at":"2025-08-01T18:11:05.119Z","avatar_url":"https://github.com/khuyentran1401.png","language":"HTML","readme":" [![Medium article](https://img.shields.io/badge/Medium-View%20on%20Medium-red?logo=medium)](https://pub.towardsai.net/top-github-scraper-scrape-top-github-users-and-repositories-based-on-a-keyword-in-one-line-of-code-d48b29954aac?sk=56eb80d2b436c8b901c6eec25d1dd6e6)\n# Top Github Scraper\n\nScrape top Github repositories and users based on 
keywords. \n\nI used this tool to [analyze the top 1k machine learning users](https://towardsdatascience.com/i-scraped-more-than-1k-top-machine-learning-github-profiles-and-this-is-what-i-found-1ab4fb0c0474?sk=68156d6b1c05614d356645728fe02584) and [create an interactive map to search for users](https://pub.towardsai.net/top-github-scraper-scrape-top-github-users-and-repositories-based-on-a-keyword-in-one-line-of-code-d48b29954aac?sk=56eb80d2b436c8b901c6eec25d1dd6e6) based on their location. \n\n![demo](https://github.com/khuyentran1401/top-github-scraper/blob/master/figures/demo.gif?raw=True)\n\n## Setup\n\n**Installation**\n```bash\npip install top-github-scraper\n```\n**Add Credentials**\n\nTo make sure you can scrape many repositories and users, add your GitHub's credentials to `.env` file.\n```bash\ntouch .env\n```\nAdd your username and [token](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token) to `.env` file:\n```bash\nGITHUB_USERNAME=yourusername\nGITHUB_TOKEN=yourtoken\n```\n## Usage\nView full documentation [here](https://khuyentran1401.github.io/top-github-scraper/html/top_github_scraper/index.html).\n\n### Get Top Github Repositories' URLs\n```python\nfrom top_github_scraper import get_top_repo_urls\n\nget_top_repo_urls(keyword=\"machine learning\", stop_page=10)\n```\n\nOutput at `top_repo_urls_\u003ckeyword\u003e_\u003csort_by\u003e_\u003cstart_page\u003e_\u003cend_page\u003e.json`:\n```python\n[\n    \"/josephmisiti/awesome-machine-learning\",\n    \"/wepe/MachineLearning\",\n    \"/udacity/machine-learning\",\n    \"/Jack-Cherish/Machine-Learning\",\n    \"/ZuzooVn/machine-learning-for-software-engineers\",\n    \"/rasbt/python-machine-learning-book\",\n    \"/lawlite19/MachineLearning_Python\",\n    \"/lazyprogrammer/machine_learning_examples\",\n    \"/trekhleb/homemade-machine-learning\",\n    \"/ujjwalkarn/Machine-Learning-Tutorials\"\n]\n```\n\n### Get Top Github Repositories' 
Information\n```python\nfrom top_github_scraper import get_top_repos\n\nget_top_repos(\"machine learning\", stop_page=10)\n```\nOutput for 1 repository at `top_repo_info_\u003ckeyword\u003e_\u003csort_by\u003e_\u003cstart_page\u003e_\u003cend_page\u003e.json` :\n```python\n{\n        \"stargazers_count\": 48620,\n        \"forks_count\": 12155,\n        \"contributors\": {\n            \"login\": [\n                \"josephmisiti\",\n                \"josephmmisiti\",\n                \"hslatman\",\n                \"0asa\",\n                \"ajkl\",\n                \"ipcenas\",\n                \"cogmission\",\n                \"spekulatius\",\n                \"basickarl\",\n                \"NathanEpstein\"\n            ],\n            \"url\": [\n                \"https://api.github.com/users/josephmisiti\",\n                \"https://api.github.com/users/josephmmisiti\",\n                \"https://api.github.com/users/hslatman\",\n                \"https://api.github.com/users/0asa\",\n                \"https://api.github.com/users/ajkl\",\n                \"https://api.github.com/users/ipcenas\",\n                \"https://api.github.com/users/cogmission\",\n                \"https://api.github.com/users/spekulatius\",\n                \"https://api.github.com/users/basickarl\",\n                \"https://api.github.com/users/NathanEpstein\"\n            ],\n            \"contributions\": [\n                671,\n                105,\n                21,\n                12,\n                11,\n                9,\n                8,\n                7,\n                7,\n                7\n            ]\n        }\n    }\n```\n\n### Get Top Github Contributors' Profiles\n```python\nfrom top_github_scraper import get_top_contributors\n\nget_top_contributors(\"machine learning\", stop_page=10)\n```\nOutput at `top_contributor_info_\u003ckeyword\u003e_\u003csort_by\u003e_\u003cstart_page\u003e_\u003cend_page\u003e.csv`:\n\n|| login | url | type | name | 
company | location | email | hireable | bio | public_repos | public_gists | followers |following\n| ------------- |:-------------:|:-------------:| :-----:| :-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|\n| 0 | josephmisiti | https://api.github.com/users/josephmisiti | User | Joseph Misiti | Math \u0026 Pencil |\"Brooklyn, NY\"|  | True | Mathematician \u0026 Co-founder of Math \u0026 Pencil|229|142|2705|275\n1|josephmmisiti|https://api.github.com/users/josephmmisiti|User|||||||0|0|2|0\n2|hslatman|https://api.github.com/users/hslatman|User|Herman Slatman|DistributIT|||||133|20|469|67\n3|0asa|https://api.github.com/users/0asa|User|Vincent Botta| | Belgium|||\"Innovation Engineer @evs-broadcast, previously Data Scientist @kensuio, E-Marketing Tools Manager @Diagenode, cofounder @Antibody-Adviser and photographer\"|35|15|25|16\n4|ajkl|https://api.github.com/users/ajkl|User|Ajinkya Kale|||kaleajinkya@gmail.com|||58|1|29|4\n5|ipcenas|https://api.github.com/users/ipcenas|User|||||||79|0|1|0\n6|cogmission|https://api.github.com/users/cogmission|User|David Ray||Third planet from the sun...|cognitionmission@gmail.com||Humanity's freedom and abundance through the pursuit of technological innovation in the area of cognitive applications - Cognition Mission|30|19|54|44\n7|spekulatius|https://api.github.com/users/spekulatius|User|Peter Thaleikis|@bringyourownideas |127.0.0.1||True|Software engineer focused on solutions using open source and simply filling in the gaps to fulfill the requirements.|42|1|232|920\n8|basickarl|https://api.github.com/users/basickarl|User|Karl Morrison||\"Malmö, Sweden\"|karl@basickarl.io||The question is: Will you take me seriously|5|1|12|6\n9|NathanEpstein|https://api.github.com/users/NathanEpstein|User|Nathan Epstein||\"New York, NY\"|nathanepst@gmail.com|True||23|12|208|0\n\n### Get Top Github Users' Profiles\n```python\nfrom 
top_github_scraper import get_top_users\n\nget_top_users(\"machine learning\", stop_page=10)\n```\nOutput at `top_user_info_\u003ckeyword\u003e_\u003cstart_page\u003e_\u003cend_page\u003e.csv`\n\n|| login | url | type | name | company | location | email | hireable | bio | public_repos | public_gists | followers |following\n| ------------- |:-------------:|:-------------:| :-----:| :-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|\n0|rasbt|https://api.github.com/users/rasbt|User|Sebastian Raschka|UW-Madison|\"Madison, WI\"|||\"Machine Learning researcher \u0026 open source contributor. Author of \"\"Python Machine Learning.\"\" Asst. Prof. of Statistics @ UW-Madison.\"|71|5|13888|35\n1|tqchen|https://api.github.com/users/tqchen|User|Tianqi Chen|\"CMU, OctoML\"||||Large scale Machine Learning|28|1|8611|126\n2|halfrost|https://api.github.com/users/halfrost|User|halfrost|@Alibaba | Shanghai China|i@halfrost.com||💪天道酬勤，勤能补拙。博观而约取，厚积而薄发。Gopher / Rustacean / iOS Dev. / Machine Learning / Retired acmer / Math / Philosophy / Technical Writer.|22|0|8566|314\n3|ageron|https://api.github.com/users/ageron|User|Aurélien Geron||Paris|||Author of the book Hands-On Machine Learning with Scikit-Learn and TensorFlow. Former PM of YouTube video classification and founder \u0026 CTO of a telco operator.|43|16|8383|2\n4|chiphuyen|https://api.github.com/users/chiphuyen|User|Chip Huyen|https://snorkel.ai|\"Mountain View, CA\"||True|Developing tools and best practices for machine learning production.|19|1|7839|15\n5|rhiever|https://api.github.com/users/rhiever|User|Randy Olson|FOXO BioScience|\"Vancouver, WA\"|rso@randalolson.com||\"Chief Data Scientist, @FOXOBioScience. AI, Machine Learning, and Data Visualization specialist. 
Community leader for /r/DataIsBeautiful.\"|77|17|5363|13\n6|lexfridman|https://api.github.com/users/lexfridman|User|Lex Fridman|MIT|\"Cambridge, MA\"|||\"AI researcher working on autonomous vehicles, human-robot interaction, and machine learning at MIT and beyond.\"|2|0|5031|0\n7|eriklindernoren|https://api.github.com/users/eriklindernoren|User|Erik Linder-Norén||\"Stockholm, Sweden\"|eriklindernoren@gmail.com||\"ML engineer at Apple. Excited about machine learning, basketball and building things.\"|24|0|3764|11\n8|roboticcam|https://api.github.com/users/roboticcam|User|A/Prof Richard Xu                 徐亦达教授|University of Technology Sydney|Sydney Australia|||\"I am an A/Professor in Machine Learning at UTS. manage a large research team of postdoc, PhD students close to 30 people\"|10|0|3561|0\n9|ogrisel|https://api.github.com/users/ogrisel|User|Olivier Grisel|Inria|\"Paris, France\"|olivier.grisel@ensta.org||Machine Learning Engineer a Inria Saclay (Parietal team).|174|93|3237|116\n\n### Parameters\n\nView a full list of parameters [here](./PARAMETERS.md).\n## How the Data is Scraped\n\n`top-github-scraper` scrapes the owners as well as the contributors of the top repositories that pop up in the search when searching for a specific keyword on GitHub.\n\n![image](https://github.com/khuyentran1401/top-github-scraper/blob/master/figures/machine_learning_results.png?raw=True)\n\nFor each user, `top-github-scraper` scrapes 16 data points:\n* `login`: username\n* `url`: URL of the user\n* `type`: Whether this account is a user or an organization\n* `name`: Name of the user\n* `company`: User's company\n* `location`: User's location\n* `email`: User's email\n* `hireable`: Whether the user is hireable\n* `bio`: Short description of the user\n* `public_repos`: Number of public repositories the user has (including forked repositories)\n* `public_gists`: Number of public gists the user has (including forked gists)\n* `followers`: Number of followers the user has\n* 
`following`: Number of people the user is following\n\n","funding_links":[],"categories":["HTML"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkhuyentran1401%2Ftop-github-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkhuyentran1401%2Ftop-github-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkhuyentran1401%2Ftop-github-scraper/lists"}