https://github.com/anvaka/allgithub

Crawling github data

Host: GitHub
URL: https://github.com/anvaka/allgithub
Owner: anvaka
License: mit
Created: 2015-05-13T04:38:11.000Z (about 10 years ago)
Default Branch: master
Last Pushed: 2024-02-25T23:06:07.000Z (over 1 year ago)
Last Synced: 2025-05-07T21:09:52.740Z (about 1 month ago)
Language: JavaScript
Size: 22.5 KB
Stars: 29
Watchers: 3
Forks: 9
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-viz - Crawler for GitHub - Crawling github data for Software Galaxies visualisation (Graph Datasets)

README

# allgithub

Crawling GitHub data for [Software Galaxies](https://github.com/anvaka/pm/).

# usage

## Prerequisites

1. Make sure Redis is installed and running on the default port (6379).
2. [Create a GitHub access token](https://help.github.com/articles/creating-an-access-token-for-command-line-use/)
   and set it in the `GH_TOKEN` environment variable.
3. Install the crawler:

```
git clone https://github.com/anvaka/ghcrawl
cd ghcrawl
npm i
```
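The token from step 2 ends up in the headers of every API request. A minimal sketch, assuming Node.js: the helper name `githubHeaders` and the `User-Agent` value are hypothetical and not taken from the crawler's code, while the `Authorization: token …` form and the mandatory `User-Agent` header are standard GitHub REST API conventions.

```javascript
// Hypothetical helper: build GitHub REST API request headers from GH_TOKEN.
function githubHeaders(env = process.env) {
  const token = env.GH_TOKEN;
  if (!token) {
    throw new Error('GH_TOKEN is not set; create a token and export it first');
  }
  return {
    // GitHub accepts personal access tokens in the Authorization header.
    Authorization: `token ${token}`,
    Accept: 'application/vnd.github+json',
    // The REST API rejects requests that have no User-Agent header.
    'User-Agent': 'allgithub-crawler-sketch', // hypothetical value
  };
}
```

On Node.js 18+ you can check that the token works with `fetch('https://api.github.com/rate_limit', { headers: githubHeaders() })`, which reports the remaining request quota.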

Now we are ready to index.

## Find all users with more than 2 followers

This uses the GitHub Search API to go through all users who have more than two
followers. At the time of writing there were [more than 400k users](https://github.com/search?q=followers%3A%3E2&type=Users&utf8=%E2%9C%93).

Each search request can return up to 100 records per page, which gives us
`400,000 / 100 = 4,000` requests to make. The Search API is rate limited to 30
requests per minute, which means the indexing will take `4,000 / 30 ≈ 133` minutes,
a bit more than two hours:

```
node findUsersWithFollowers.js
```
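The arithmetic above, plus the shape of a single Search API page request, as a hedged JavaScript sketch. `searchPageUrl` and `estimateSearchCrawl` are hypothetical names; the endpoint `https://api.github.com/search/users` and its `q`, `per_page` and `page` parameters are GitHub's documented ones. One caveat: GitHub documents that a single search returns at most 1,000 results, so a real crawler has to split the query (for example into follower-count ranges). This sketch does not do that and only shows the URL shape and the estimate.

```javascript
// Sketch: Search API page URLs and a crawl-time estimate.
const SEARCH_URL = 'https://api.github.com/search/users';

// URL for one page of "users with more than two followers".
function searchPageUrl(page, perPage = 100) {
  const params = new URLSearchParams({
    q: 'followers:>2',
    per_page: String(perPage),
    page: String(page),
  });
  return `${SEARCH_URL}?${params}`;
}

// Number of requests, and minutes needed at a fixed per-minute rate limit.
function estimateSearchCrawl(totalUsers, perPage = 100, requestsPerMinute = 30) {
  const requests = Math.ceil(totalUsers / perPage);
  const minutes = requests / requestsPerMinute;
  return { requests, minutes };
}

const { requests, minutes } = estimateSearchCrawl(400000);
console.log(requests, Math.round(minutes)); // 4000 133
```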

## Find all followers

Now that we have all users who have more than two followers, let's index
those followers. The bad news: we have to make one request per user.
The good news: the regular (non-search) API allows 5,000 requests per hour, which
gives an estimated `400,000 / 5,000 = 80` hours of work, and more than that in
practice since there are more than 400k users:

```
node indexUserFollowers.js
```
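To stay under 5,000 requests per hour, requests can be spaced at least `3,600,000 / 5,000 = 720` ms apart. A sketch assuming Node.js: `crawlFollowers` and its `fetchPage` callback are hypothetical and not necessarily how `indexUserFollowers.js` is written. `GET /users/{login}/followers` is the real REST endpoint; users with more than 100 followers need additional pages, which this sketch skips.

```javascript
// Sketch: pace follower crawling under the 5,000 requests/hour limit.
const HOURLY_LIMIT = 5000;

// Minimum spacing (ms) between requests that keeps us under the hourly limit.
function minDelayMs(requestsPerHour = HOURLY_LIMIT) {
  return Math.ceil(3600000 / requestsPerHour);
}

// URL for one page of a user's followers; the REST API allows at most 100 per page.
function followersUrl(login, page = 1) {
  return `https://api.github.com/users/${encodeURIComponent(login)}/followers?per_page=100&page=${page}`;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical driver: one request per login, issued sequentially with a pause.
// `fetchPage(url)` is an assumed async callback that performs the HTTP call.
async function crawlFollowers(logins, fetchPage, delayMs = minDelayMs()) {
  const results = new Map();
  for (const login of logins) {
    results.set(login, await fetchPage(followersUrl(login)));
    await sleep(delayMs);
  }
  return results;
}

console.log(minDelayMs());          // 720
console.log(400000 / HOURLY_LIMIT); // 80 (hours)
```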

## Time to get the graph

Now that we have all users indexed, we can construct the graph:

```
node makeFollowersGraph.js > github.dot
```
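A guess at what producing the DOT output could look like, assuming the followers are available as a plain object that maps each login to an array of follower logins (a hypothetical shape; the real script presumably reads its data from Redis). Each follow relation becomes a directed edge from follower to followed user; `toDot` is a hypothetical name.

```javascript
// Sketch: turn a { login: [followerLogin, ...] } map into Graphviz DOT text.
function toDot(followersByUser) {
  const lines = ['digraph G {'];
  for (const [user, followers] of Object.entries(followersByUser)) {
    for (const follower of followers) {
      // Edge direction: follower -> followed user. GitHub logins contain only
      // letters, digits and hyphens, so JSON-style quoting yields valid DOT IDs.
      lines.push(`  ${JSON.stringify(follower)} -> ${JSON.stringify(user)};`);
    }
  }
  lines.push('}');
  return lines.join('\n');
}

console.log(toDot({ anvaka: ['alice', 'bob'] }));
```

For that input the sketch prints a `digraph G { … }` block with the two edges `"alice" -> "anvaka";` and `"bob" -> "anvaka";`.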

# Layout

Convert the graph to a binary format:

```
node --max-old-space-size=4096 ./toBinary.js
```
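`--max-old-space-size=4096` raises V8's old-generation heap limit to about 4 GB, presumably because the whole graph is held in memory during conversion. The binary format itself is not described here. As a hedged illustration only, one common compact encoding gives each login a dense integer index and stores edges as index pairs in an `Int32Array`; this is not necessarily what `toBinary.js` writes, and `encodeEdges` is a hypothetical name.

```javascript
// Sketch of one compact edge encoding (NOT necessarily toBinary.js's format).
function encodeEdges(edges) {
  const ids = new Map(); // login -> dense index
  const labels = [];     // index -> login
  const idOf = (name) => {
    if (!ids.has(name)) {
      ids.set(name, labels.length);
      labels.push(name);
    }
    return ids.get(name);
  };
  // Two 32-bit integers per edge: [from, to, from, to, ...].
  const buffer = new Int32Array(edges.length * 2);
  edges.forEach(([from, to], i) => {
    buffer[2 * i] = idOf(from);
    buffer[2 * i + 1] = idOf(to);
  });
  return { labels, buffer };
}

const { labels, buffer } = encodeEdges([['alice', 'anvaka'], ['bob', 'anvaka']]);
console.log(labels);             // [ 'alice', 'anvaka', 'bob' ]
console.log(Array.from(buffer)); // [ 0, 1, 2, 1 ]
```

The `labels` array would go to a separate JSON or text file, and the edges could be written with `fs.writeFileSync('links.bin', Buffer.from(buffer.buffer))` (file name hypothetical; byte order is the platform's native one).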

Then use [ngraph.native](https://github.com/anvaka/ngraph.native) for faster
graph layout.

# license

MIT
