{"id":19285978,"url":"https://github.com/chenjiandongx/github-spider","last_synced_at":"2025-04-13T06:41:41.293Z","repository":{"id":92058715,"uuid":"89732505","full_name":"chenjiandongx/Github-spider","owner":"chenjiandongx","description":"GitHub repository and user analysis crawler","archived":false,"fork":false,"pushed_at":"2017-05-07T13:23:40.000Z","size":99,"stargazers_count":267,"open_issues_count":2,"forks_count":91,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-03-24T11:43:23.920Z","etag":null,"topics":["crawler","github","scrapy"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chenjiandongx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2017-04-28T18:09:47.000Z","updated_at":"2025-02-15T21:23:02.000Z","dependencies_parsed_at":"2024-01-14T15:23:14.961Z","dependency_job_id":"3f14cc31-c4aa-49d3-9006-520c66d0dc8b","html_url":"https://github.com/chenjiandongx/Github-spider","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chenjiandongx%2FGithub-spider","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chenjiandongx%2FGithub-spider/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chenjiandongx%2FGithub-spider/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chenjiandongx%2FGithub-spider/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chenjiandongx","download_url":"https://codeload
.github.com/chenjiandongx/Github-spider/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248675434,"owners_count":21143763,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","github","scrapy"],"created_at":"2024-11-09T21:47:34.736Z","updated_at":"2025-04-13T06:41:41.272Z","avatar_url":"https://github.com/chenjiandongx.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GitHub User and Repository Analysis Crawler  \n\n### About the crawler\nHaving finished a Stack Overflow crawler, I decided to write one for GitHub next. It uses the Scrapy framework to crawl GitHub user and repository information, and downloads images through a Scrapy pipeline.  \nGitHub is a great community where you can find plenty of excellent projects and useful libraries. It is a paradise for coders and, as the joke goes, the world's largest same-sex dating site?\nThe crawled data falls into two main categories, User and Repo, covering user statistics and repository information respectively.\n\n## User\n\nFirst, let's see which heavyweights make up the site-wide top 10 by follower count.\n\n| Avatar | User                            | Repos | Stars | Followers | Following |\n|--------|---------------------------------|-------|-------|-----------|-----------|\n|![](http://oog4yfyu0.bkt.clouddn.com/torvalds.jpg)      | https://github.com/torvalds     |     4 |     2 |   53500 |    0 |\n|![](http://oog4yfyu0.bkt.clouddn.com/JakeWharton.jpg)   | https://github.com/JakeWharton  |    93 |   213 |   34000 |   12 |\n|![](http://oog4yfyu0.bkt.clouddn.com/tj.jpg)            | https://github.com/tj           |   253 |  1700 |   27600 |   46 |\n|![](http://oog4yfyu0.bkt.clouddn.com/ruanyf.jpg)        | https://github.com/ruanyf       |    43 |   125 |   24700 |    0 |\n|![](http://oog4yfyu0.bkt.clouddn.com/addyosmani.jpg)    | https://github.com/addyosmani   |   292 |   732 |   24700 |  241 
|\n|![](http://oog4yfyu0.bkt.clouddn.com/paulirish.jpg)     | https://github.com/paulirish    |   261 |   683 |   23300 |  239 |\n|![](http://oog4yfyu0.bkt.clouddn.com/mojombo.jpg)       | https://github.com/mojombo      |    61 |   121 |   20100 |   11 |\n|![](http://oog4yfyu0.bkt.clouddn.com/gaearon.jpg)       | https://github.com/gaearon      |   202 |  1100 |   17000 |  171 |\n|![](http://oog4yfyu0.bkt.clouddn.com/sindresorhus.jpg)  | https://github.com/sindresorhus |   877 |  2200 |   16900 |   40 |\n|![](http://oog4yfyu0.bkt.clouddn.com/daimajia.jpg)      | https://github.com/daimajia     |    60 |  2900 |   16400 |  236 |\n\nLinus takes first place by an overwhelming margin. Honestly, anyone who has never heard of Linus should be embarrassed to say they write code; he is an article of faith. He is also rather aloof, following nobody at all; perhaps he is simply too good to have friends. After all, ***talk is cheap, show me the code***. JakeWharton is second with 34000+ followers. I have dabbled in Android before, and all I can say is: respect.  \n\nTwo users from China still made it into the top 10: 阮一峰 (ruanyf) and 代码家 (daimajia).  \n\nGitHub's location field is free-form, so it is hard to count registered accounts per country. Searching by keyword finds 77473 users for China and 48667 for USA.  \n \nNow for the situation in China: among those 77473 users, the top 10 by follower count are as follows.  \n\n|  Avatar |         User    |\tFollowing | Followers |\n|---------|-----------------|-------------|-----------|\n|![](http://oog4yfyu0.bkt.clouddn.com/ruanyf.jpg)       |https://github.com/ruanyf\t    |  0\t |  25.2k  |\n|![](http://oog4yfyu0.bkt.clouddn.com/daimajia.jpg)     |https://github.com/daimajia\t|  236   |  16.5k  |\n|![](http://oog4yfyu0.bkt.clouddn.com/yyx990803.jpg)    |https://github.com/yyx990803\t|  89\t |  16.2k  |\n|![](http://oog4yfyu0.bkt.clouddn.com/michaelliao.jpg)  |https://github.com/michaelliao |  0\t |  12.4k  |\n|![](http://oog4yfyu0.bkt.clouddn.com/JacksonTian.jpg)  |https://github.com/JacksonTian\t|  145\t |  12.1k  |\n|![](http://oog4yfyu0.bkt.clouddn.com/Trinea.jpg)       |https://github.com/Trinea\t    |  37    |  11.9k  |\n|![](http://oog4yfyu0.bkt.clouddn.com/lifesinger.jpg)   |https://github.com/lifesinger\t|  12\t |  10k    |\n|![](http://oog4yfyu0.bkt.clouddn.com/stormzhang.jpg)   |https://github.com/stormzhang\t|  88    |  9.6k   
|\n|![](http://oog4yfyu0.bkt.clouddn.com/cloudwu.jpg)      |https://github.com/cloudwu\t    |  1\t |  9.5k   |\n|![](http://oog4yfyu0.bkt.clouddn.com/onevcat.jpg)      |https://github.com/onevcat\t    |  120   |  9k     |\n\n尤雨溪 (yyx990803), the author of Vue.js, ranks third. 廖雪峰 (michaelliao) follows closely in fourth; I have read his Python tutorial myself.\n\n\nTop 10 by number of personal repositories. Exact repository counts are not visible for organizations, so only individual accounts are included.\n\n|               User                | Repos   |\n| ----------------------------------|-------  |\n| https://github.com/pombredanne    |  35.4k  |\n| https://github.com/gitter-badger  |  27.1k  |\n| https://github.com/carriercomm    |  18.8k  |\n| https://github.com/digideskio     |  16.9k  |\n| https://github.com/bestwpw        |  13.8k  |\n| https://github.com/modulexcite    |  10.7k  |\n| https://github.com/happyqq        |  9.1k   |\n| https://github.com/kleopatra999   |  8.2k   |\n| https://github.com/treejames      |  7.2k   |\n| https://github.com/carabina       |  7.2k   |  \n\nThe top two have an enormous number of projects, both above 27k. Impressive; how did they manage that?\n\n\n## Repo\nTop 10 repositories by stars  \n\n| Repo                                          | Fork      | Star      | Watch      |\n|-----------------------------------------------|-----------|-----------|------------|\n| https://github.com/freeCodeCamp/freeCodeCamp  |     11121 |    261439 |       7638 |\n| https://github.com/twbs/bootstrap             |     50468 |    109702 |       6833 |\n| https://github.com/vhf/free-programming-books |     20950 |     83871 |       6221 |\n| https://github.com/facebook/react             |     12036 |     65030 |       4402 |\n| https://github.com/d3/d3                      |     16709 |     63463 |       3171 |\n| https://github.com/getify/You-Dont-Know-JS    |      9232 |     57138 |       3279 |\n| https://github.com/sindresorhus/awesome       |      7113 |     57119 |       3787 |\n| https://github.com/angular/angular.js         |     27738 |     55503 |       4407 |\n| https://github.com/tensorflow/tensorflow      |     26135 |     54976 |       4968 |\n| https://github.com/robbyrussell/oh-my-zsh    
 |     12298 |     52575 |       1895 |\n  \n\nTop 10 repositories by forks  \n\n| Repo                                                  | Fork      | Star      | Watch      |\n|-------------------------------------------------------|-----------|-----------|------------|\n| https://github.com/jtleek/datasharing                 |    170171 |      3858 |        546 |\n| https://github.com/rdpeng/ProgrammingAssignment2      |    101258 |       469 |        117 |\n| https://github.com/octocat/Spoon-Knife                |     90787 |      9969 |        308 |\n| https://github.com/twbs/bootstrap                     |     50468 |    109702 |       6833 |\n| https://github.com/rdpeng/ExData_Plotting1            |     43190 |       136 |         18 |\n| https://github.com/angular/angular.js                 |     27738 |     55503 |       4407 |\n| https://github.com/rdpeng/RepData_PeerAssessment1     |     27072 |        57 |         17 |\n| https://github.com/tensorflow/tensorflow              |     26135 |     54976 |       4968 |\n| https://github.com/DataScienceSpecialization/courses  |     24094 |      2538 |        819 |\n| https://github.com/udacity/frontend-nanodegree-resume |     24044 |       706 |        118 |  \n\nHow many repositories appear in both top 10 lists? The answer is 3.  \n\n| Repo                                     | Star      | Fork      | Watch      |\n|------------------------------------------|-----------|-----------|------------|\n| https://github.com/twbs/bootstrap        |    109702 |     50468 |       6833 |\n| https://github.com/angular/angular.js    |     55503 |     27738 |       4407 |\n| https://github.com/tensorflow/tensorflow |     54976 |     26135 |       4968 |  \n\nAnd how many overlap between the two top 100 lists? The answer is 51, and for the top 500 it is 270.\n\nA total of 1586 repositories have more than 1000 forks. The bar chart below shows the count per language for the 10 most common languages.  \n\n![](https://github.com/chenjiandongx/Github/blob/master/images/l_forks_1000.png)  \n\nRaising the threshold to 10000 forks leaves 41 repositories.  
\n\n![](https://github.com/chenjiandongx/Github/blob/master/images/l_forks_10000.png)\n\nJavaScript, Java, and Python hold the top 3 spots fairly consistently. JavaScript in particular is enormously popular, though Python clearly has plenty of potential too.  \n\n10410 repositories have more than 1000 stars.  \n\n![](https://github.com/chenjiandongx/Github/blob/master/images/L_stars_1000.png)\n\n402 have more than 10000.  \n\n![](https://github.com/chenjiandongx/Github/blob/master/images/L_stars_10000.png)  \n\nThe distribution across languages is largely the same as for forks. The only difference is that CSS takes the place of HTML, which changes little since the two languages go hand in hand.  \n\nHere is a fun ranking: the top 3 repositories site-wide by amount of code.  \n\n| Repo                                        |\n|---------------------------------------------|\n|https://github.com/opengapps/arm             |\n|https://github.com/kiang/data.fda.gov.tw     |\n|https://github.com/hanxiao/hanxiao.github.io |  \n\n\n### A look at Python\nTop 10 Python repositories by stars  \n\n| Repo                                                     | Fork      | Star      | Watch      |\n|----------------------------------------------------------|-----------|-----------|------------|\n| https://github.com/vinta/awesome-python                  |      6215 |     33163 |       2957 |\n| https://github.com/jakubroztocil/httpie                  |      1949 |     29302 |        856 |\n| https://github.com/pallets/flask                         |      8430 |     26618 |       1681 |\n| https://github.com/nvbn/thefuck                          |      1273 |     26200 |        554 |\n| https://github.com/rg3/youtube-dl                        |      4846 |     25453 |       1064 |\n| https://github.com/django/django                         |     10298 |     25208 |       1523 |\n| https://github.com/kennethreitz/requests                 |      4462 |     24600 |       1007 |\n| https://github.com/ansible/ansible                       |      7496 |     22732 |       1634 |\n| https://github.com/josephmisiti/awesome-machine-learning |      5320 |     21963 |       2221 |\n| https://github.com/scrapy/scrapy                         |      5338 |     20053 |       1430 
|\n\nTop 10 Python repositories by forks  \n\n| Repo                                                     | Fork      | Star      | Watch      |\n|----------------------------------------------------------|-----------|-----------|------------|\n| https://github.com/shadowsocks/shadowsocks               |     10533 |     17302 |       1520 |\n| https://github.com/django/django                         |     10298 |     25208 |       1523 |\n| https://github.com/scikit-learn/scikit-learn             |      9952 |     18159 |       1646 |\n| https://github.com/pallets/flask                         |      8430 |     26618 |       1681 |\n| https://github.com/ansible/ansible                       |      7496 |     22732 |       1634 |\n| https://github.com/udacity/fullstack-nanodegree-vm       |      6495 |       122 |         22 |\n| https://github.com/vinta/awesome-python                  |      6215 |     33163 |       2957 |\n| https://github.com/odoo/odoo                             |      6045 |      6481 |       1130 |\n| https://github.com/scrapy/scrapy                         |      5338 |     20053 |       1430 |\n| https://github.com/josephmisiti/awesome-machine-learning |      5320 |     21963 |       2221 |  \n\nshadowsocks misses the stars top 10 yet takes first place in forks; this 'ladder' has made the dream of getting over the firewall come true for a lot of people. The other ladder, XX-Net, unfortunately made neither top 10, which stings.  \n\n| Repo                                                     | Fork | Star | Watch |\n|----------------------------------------------------------|-----------|-----------|------------|\n| https://github.com/XX-net/XX-Net                         |   4682    |   13787   |       1343 |  \n\nAs before, let's look at the intersection of the two top 10 lists: there are 6 repositories, listed below. (The two top 100 lists share 52.)   \n\n| Repo                                                     | Star | Fork | Watch |\n|----------------------------------------------------------|-----------|-----------|------------|\n| https://github.com/django/django                         |     25208 |     10298 |       1523 |\n| https://github.com/pallets/flask        
                 |     26618 |      8430 |       1681 |\n| https://github.com/ansible/ansible                       |     22732 |      7496 |       1634 |\n| https://github.com/vinta/awesome-python                  |     33163 |      6215 |       2957 |\n| https://github.com/scrapy/scrapy                         |     20053 |      5338 |       1430 |\n| https://github.com/josephmisiti/awesome-machine-learning |     21963 |      5320 |       2221 |\n\nThe two major web frameworks, Django and Flask, live up to expectations, and the awesome list series is popular in every language.  \n\n#### Thanks for reading (ง •̀_•́)ง  (,,• ₃ •,,)  \n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchenjiandongx%2Fgithub-spider","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchenjiandongx%2Fgithub-spider","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchenjiandongx%2Fgithub-spider/lists"}