https://github.com/madnight/githut
Github Language Statistics
- Host: GitHub
- URL: https://github.com/madnight/githut
- Owner: madnight
- License: agpl-3.0
- Created: 2016-10-09T01:57:53.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2024-04-03T10:10:56.000Z (10 months ago)
- Last Synced: 2025-01-12T09:04:55.139Z (14 days ago)
- Topics: bigquery, dataset, functional-reactive-programming, github-language-statistics, github-pages-website, jamstack, languages, programming-languages, react, react-hooks, serverless, sql-query, statistics
- Language: JavaScript
- Homepage: https://madnight.github.io/githut
- Size: 38.4 MB
- Stars: 980
- Watchers: 25
- Forks: 129
- Open Issues: 14
Metadata Files:
- Readme: README.md
- License: LICENSE
README
GitHub Language Statistics
## Data Generation
### Languages
Get the top list of languages on GitHub:
```SQL
SELECT language.name, COUNT(language.name) AS count
FROM [bigquery-public-data:github_repos.languages]
GROUP BY language.name
ORDER BY count DESC
```
First 10 of 322 results:
```JavaScript
{"language_name":"JavaScript","count":"1006022"}
{"language_name":"CSS","count":"745573"}
{"language_name":"HTML","count":"663315"}
{"language_name":"Shell","count":"593461"}
{"language_name":"Python","count":"492715"}
{"language_name":"Ruby","count":"365413"}
{"language_name":"Java","count":"340622"}
{"language_name":"PHP","count":"328907"}
{"language_name":"C","count":"286272"}
{"language_name":"C++","count":"267552"}
...
```
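The result rows above are newline-delimited JSON with the counts stored as strings. As a minimal sketch of how such output could be post-processed, the snippet below parses a few of those rows and computes each language's share. The helper names `parseRows` and `withShare` are hypothetical and are not taken from the GitHut code base, and the share is relative to the rows passed in, not to all 322 languages.

```javascript
// Sample rows copied from the result above (first three only).
const ndjson = `
{"language_name":"JavaScript","count":"1006022"}
{"language_name":"CSS","count":"745573"}
{"language_name":"HTML","count":"663315"}
`;

// Hypothetical helper: turns newline-delimited JSON into plain objects.
function parseRows(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line))
    // The counts are strings in the result above, so convert them to numbers.
    .map((row) => ({ name: row.language_name, count: Number(row.count) }));
}

// Hypothetical helper: adds each row's share of the total of the given rows.
function withShare(rows) {
  const total = rows.reduce((sum, row) => sum + row.count, 0);
  return rows.map((row) => ({ ...row, share: row.count / total }));
}

const rows = withShare(parseRows(ndjson));
for (const row of rows) {
  console.log(`${row.name}: ${row.count} (${(row.share * 100).toFixed(1)}%)`);
}
```

Converting the counts up front keeps later sorting and arithmetic numeric rather than lexicographic.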
### Licenses
Get the top list of licenses on GitHub:
```SQL
SELECT license, COUNT(license) AS count
FROM [bigquery-public-data:github_repos.licenses]
GROUP BY license
ORDER BY count DESC
```
Full result:
```JavaScript
{"license":"mit","count":"1551711"}
{"license":"apache-2.0","count":"455316"}
{"license":"gpl-2.0","count":"376453"}
{"license":"gpl-3.0","count":"284761"}
{"license":"bsd-3-clause","count":"161041"}
{"license":"bsd-2-clause","count":"57412"}
{"license":"unlicense","count":"43899"}
{"license":"lgpl-3.0","count":"38213"}
{"license":"agpl-3.0","count":"38034"}
{"license":"cc0-1.0","count":"28600"}
{"license":"epl-1.0","count":"24074"}
{"license":"lgpl-2.1","count":"23872"}
{"license":"isc","count":"17690"}
{"license":"mpl-2.0","count":"17421"}
{"license":"artistic-2.0","count":"9413"}
```
### Pull Requests
Get the number of pull requests per language for each year and quarter:
```SQL
SELECT language as name, year, quarter, count FROM ( SELECT * FROM (
SELECT lang as language, y as year, q as quarter, type,
COUNT(*) as count FROM (SELECT a.type type, b.lang lang, a.y y, a.q q FROM (
SELECT type, actor.login, YEAR(created_at) as y, QUARTER(created_at) as q,
STRING(REGEXP_REPLACE(repo.url, r'https:\/\/github\.com\/|https:\/\/api\.github\.com\/repos\/', '')) as name
FROM [githubarchive:month.201901] WHERE NOT LOWER(actor.login) LIKE "%bot%") a
JOIN ( SELECT repo_name as name, lang FROM ( SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY repo_name ORDER BY lang) as num FROM (
SELECT repo_name, FIRST_VALUE(language.name) OVER (
partition by repo_name order by language.bytes DESC) AS lang
FROM [bigquery-public-data:github_repos.languages]))
WHERE num = 1 order by repo_name)
WHERE lang != 'null') b ON a.name = b.name)
GROUP by type, language, year, quarter
order by year, quarter, count DESC)
WHERE count >= 100) WHERE type = 'PullRequestEvent'
```
### Manual
Google's BigQuery is free for public datasets such as GitHub, Reddit or Stack Overflow, up to a limit of 1000 GB of query volume per month. A single query from the ones above uses about 50-200 MB of query volume. The public dataset for GitHub is available here: https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=samples&t=github_nested&page=table
### URL Schema
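Each GitHut URL carries an event type, a year, a quarter and a comma-separated language list after the `#/`, as the diagram below lays out. As a minimal sketch, such a URL can be split into those parts like this. The function `parseGithutRoute` and the `EVENT_TYPES` list are hypothetical names for illustration and are not GitHut's own router code, and the sketch does not handle percent-encoded language names.

```javascript
// Event types taken from the URL schema diagram.
const EVENT_TYPES = ["pull_requests", "pushes", "stars", "issues"];

// Hypothetical helper: splits a GitHut URL into its route parts.
function parseGithutRoute(url) {
  // Everything after "#/" is the route; the part before it is the page address.
  const hashIndex = url.indexOf("#/");
  if (hashIndex === -1) return null;

  const [type, year, quarter, languages = ""] = url
    .slice(hashIndex + 2)
    .split("/");
  if (!EVENT_TYPES.includes(type)) return null;

  return {
    type,
    year: Number(year),
    quarter: Number(quarter),
    // Drop empty entries so a missing language list becomes an empty array.
    languages: languages.split(",").filter((name) => name.length > 0),
  };
}

const route = parseGithutRoute(
  "madnight.github.io/githut/#/pull_requests/2021/1/Python,Lua,JavaScript"
);
console.log(route);
```

Returning `null` for a URL without a route or with an unknown event type leaves it to the caller to decide on a fallback.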
```
madnight.github.io/githut/#/pull_requests/2021/1/Python,Lua,JavaScript
                            ▲             ▲    ▲ ▲
                            │             │    │ │
           pull_requests ───┘       year ─┘    │ └─ languages
           pushes                              └─ quarter
           stars
           issues
```
### BibTeX
If you wish to cite this project, you may use the following BibTeX entry.
```
@misc{githuttwo,
author = {Fabian Beuke},
title = {GitHut 2.0: GitHub Language Statistics},
year = {2023},
note = {GitHub repository},
howpublished = {\url{https://madnight.github.io/githut/#/}}
}
```