https://github.com/thibaudcolas/commits-analysis
git github statistical-analysis
- Host: GitHub
- URL: https://github.com/thibaudcolas/commits-analysis
- Owner: thibaudcolas
- Created: 2016-02-08T16:19:41.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2016-09-11T19:58:09.000Z (almost 10 years ago)
- Last Synced: 2025-11-12T01:06:01.774Z (7 months ago)
- Topics: git, github, statistical-analysis
- Language: HTML
- Size: 59.6 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
commits-analysis
================
> An analysis of code commits on GitHub over a year.
## Workflow
### 1. Retrieve a list of all of the user's repositories
> In this section we retrieve **all** repositories the user has access to, not just the account's public ones. If public repositories are enough, simpler unauthenticated API calls can list them.
- Generate a token: https://github.com/settings/tokens
- Basic auth access: https://developer.github.com/v3/auth/#basic-authentication
- List your repositories: https://developer.github.com/v3/repos/#list-your-repositories
- Pagination notes: https://developer.github.com/v3/#pagination
Required token scopes:
- `repo` – Full control
- `read:org` – Read org and team membership
```sh
# First set your basic auth parameters. They are kept in an environment variable so that they are never committed anywhere.
export GH_AUTH="login:token"
touch data/repositories.json
# Retrieve as many pages as necessary, incrementing `page` on each call.
curl -u "$GH_AUTH" "https://api.github.com/user/repos?per_page=100&page=1" >> data/repositories.json
# When do you stop? When curl only retrieves a couple of bytes, like:
#   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
#                                  Dload  Upload   Total   Spent    Left  Speed
# 100     2  100     2    0     0      2      0  0:00:01 --:--:--  0:00:01     2
# Concatenate the JSON arrays with sed:
sed -i -e 's/\]\[/,/g' data/repositories.json
sed -i -e 's/},\]/}\]/g' data/repositories.json
```
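The manual "keep fetching until the response is nearly empty" loop above can also be scripted. The following is a minimal sketch, not part of the repository: `fetchAllPages`, `fetchPage`, and `maxPages` are hypothetical names, and the callback is assumed to resolve to the parsed JSON array for a given page number (for example by requesting `https://api.github.com/user/repos?per_page=100&page=N` with your credentials).

```javascript
// Hypothetical helper: collect every page of a paginated listing into one array.
// `fetchPage(page)` is an assumed callback that resolves to the parsed JSON
// array for that page; an empty array means there is nothing left to fetch.
async function fetchAllPages(fetchPage, maxPages = 50) {
  const all = [];
  for (let page = 1; page <= maxPages; page += 1) {
    const items = await fetchPage(page);
    if (!Array.isArray(items) || items.length === 0) {
      break; // Same stop condition as the near-empty response in the curl output.
    }
    all.push(...items);
  }
  return all;
}
```

Because the pages are merged as parsed arrays, this approach would not need the `sed` concatenation step; the result can be written out once with `JSON.stringify`.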
### 2. Retrieve commit activity for each repository
- The API endpoint we use: https://developer.github.com/v3/repos/statistics/#get-the-last-year-of-commit-activity-data
```sh
node retrieve-stats.js
# If the files come back empty, run this again, several times if needed.
```
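The statistics endpoint linked above computes its data in the background: when nothing is cached yet, GitHub answers with a `202` response instead of the data, which is presumably why the script may need several runs. A hedged sketch of automating that retry follows. `retryWhileAccepted`, `request`, `attempts`, and `delayMs` are hypothetical names, and `request` is assumed to resolve to an object with `status` and `body` fields; none of this is taken from `retrieve-stats.js`.

```javascript
// Hypothetical helper: repeat a request while the server answers 202.
// `request()` is an assumed callback resolving to { status, body }.
async function retryWhileAccepted(request, attempts = 5, delayMs = 2000) {
  for (let i = 0; i < attempts; i += 1) {
    const response = await request();
    if (response.status !== 202) {
      return response; // Data is ready, or a real error the caller should handle.
    }
    if (i < attempts - 1) {
      // Give the server time to compute the statistics before asking again.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Still not ready after ${attempts} attempts`);
}
```

Keeping the HTTP call behind a callback leaves the choice of client open and makes the retry logic easy to exercise without touching the network.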
The account to retrieve data for and the type of repository to get can be configured within `retrieve-stats.js`:
```js
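// Assumption: `_` is lodash; the excerpt does not show where it is imported.
const _ = require('lodash');
// GitHub account to retrieve data for.
const USER = 'thibaudcolas';
const API_DOMAIN = 'https://api.github.com';
const allRepos = require('./data/repositories.json');
// Keep only the account's own public repositories (no forks, nothing private).
const filteredRepos = _.filter(allRepos, { fork: false, private: false });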
```
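For readers unfamiliar with the object shorthand passed to `_.filter`, the filter above keeps only repositories whose `fork` and `private` fields are both `false`. Below is a plain-JavaScript sketch of the same selection. The `sampleRepos` entries and the `selectOwnPublic` name are made up for illustration and are not real API output.

```javascript
// Illustrative data only: real entries come from data/repositories.json.
const sampleRepos = [
  { name: 'own-public', fork: false, private: false },
  { name: 'forked', fork: true, private: false },
  { name: 'own-private', fork: false, private: true },
];

// Same selection as _.filter(allRepos, { fork: false, private: false }).
const selectOwnPublic = (repos) =>
  repos.filter((repo) => repo.fork === false && repo.private === false);

console.log(selectOwnPublic(sampleRepos).map((repo) => repo.name));
// prints [ 'own-public' ]
```

Changing either condition (for example `private: true`) is how the type of repository to analyse would be adjusted.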
### 3. Processing
Now we have the data and it's time to do something with it!
```sh
node processing.js
```
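The contents of `processing.js` are not shown here, so the following is only one possible kind of processing, sketched under assumptions. The commit activity endpoint returns, for each repository, an array of weekly entries with a `days` array (seven counts, starting on Sunday), a `total`, and a `week` timestamp. `commitsPerWeekday` is a hypothetical helper that sums those daily counts across repositories.

```javascript
// Hypothetical helper: sum commits per weekday across repositories.
// `statsByRepo` is assumed to be an array in which each element is one
// repository's weekly activity: [{ days: [7 counts], total, week }, ...].
function commitsPerWeekday(statsByRepo) {
  const totals = [0, 0, 0, 0, 0, 0, 0]; // Index 0 is Sunday.
  for (const weeks of statsByRepo) {
    if (!Array.isArray(weeks)) continue; // Skip repositories with no data yet.
    for (const week of weeks) {
      week.days.forEach((count, day) => {
        totals[day] += count;
      });
    }
  }
  return totals;
}
```

Calling it with the parsed statistics of every repository yields a seven-element array that can be charted or compared directly.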