https://github.com/prayerslayer/gh-license-scraper
Go through public repositories and fetch their licenses.
- Host: GitHub
- URL: https://github.com/prayerslayer/gh-license-scraper
- Owner: prayerslayer
- License: mit
- Created: 2015-03-13T22:51:32.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2015-03-14T20:17:35.000Z (almost 11 years ago)
- Last Synced: 2024-04-15T02:57:48.008Z (over 1 year ago)
- Language: JavaScript
- Size: 152 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Github License Scraper
A tool to fetch licenses from GitHub repositories.
## What does it do?
1. Get repositories from GitHub using one of two available strategies
2. Write a subset of each repository's information to a CSV file
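
The second step could be sketched as follows. This is only an illustration: the field list `FIELDS` and the helpers `csvEscape` and `toCsvLine` are hypothetical, since this README does not say which fields end up in the file.

```javascript
// Hypothetical field list; the README does not state which fields are kept.
var FIELDS = ['id', 'full_name', 'license'];

// Quote a value if it contains a comma, a double quote, or a line break.
function csvEscape(value) {
  var text = value === null || value === undefined ? '' : String(value);
  if (/[",\r\n]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

// Turn one repository object into a single CSV line, keeping only FIELDS.
function toCsvLine(repo) {
  return FIELDS.map(function (field) {
    return csvEscape(repo[field]);
  }).join(',');
}

// The extra `stars` property is dropped because it is not in FIELDS.
console.log(toCsvLine({ id: 1, full_name: 'octo/hello, world', license: 'mit', stars: 5 }));
// prints: 1,"octo/hello, world",mit
```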
## Prerequisites
* Node 0.10.x or later
* A GitHub [Access Token](https://help.github.com/articles/creating-an-access-token-for-command-line-use/) that can access private repositories; the `repo` scope is sufficient.
## Installation and usage
Clone this repository. Then run `node app.js` with the necessary parameters. Stop it using SIGINT, i.e. `ctrl+c`.
## Parameters
* **`token`**: The Access Token. Mandatory.
* `out`: The file to write to. Defaults to `repos.csv`.
* `timeout`: Pause (in ms) between API calls; the GitHub API allows only 5000 requests per hour. Defaults to 20 seconds (i.e. `20000`).
* `strategy`: Either `popular` or `sample`. Defaults to `popular`.
* `size`: How many repositories to sample. Defaults to 10K. Only applicable with `strategy=sample`.
* `pool`: How many repositories to consider when sampling. Defaults to roughly 32M (the id of this repo). Only applicable with `strategy=sample`.
* `page`: The page to start from. Only applicable with `strategy=popular`. Defaults to 0.
* `before`: Consider repositories created before `before`. Only applicable with `strategy=popular`. Defaults to `2015-01-01`.
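
To make the defaults concrete, here is a minimal sketch of how the parameters above might be read from the command line. `parseArgs` and `callsPerHour` are hypothetical helpers, not the tool's own code, and the exact `pool` default is an assumption (the list only says "roughly 32M").

```javascript
// Defaults as documented in the parameter list; `token` has no default.
var DEFAULTS = {
  out: 'repos.csv',
  timeout: 20000,      // 20 seconds, expressed in ms
  strategy: 'popular',
  size: 10000,         // "10K"
  pool: 32000000,      // assumption: "roughly 32M", exact value not stated
  page: 0,
  before: '2015-01-01'
};

// Parse `--key value` pairs and fill in defaults for anything missing.
function parseArgs(argv) {
  var options = {};
  Object.keys(DEFAULTS).forEach(function (key) {
    options[key] = DEFAULTS[key];
  });
  for (var i = 0; i < argv.length - 1; i += 1) {
    if (argv[i].indexOf('--') === 0) {
      var key = argv[i].slice(2);
      var value = argv[i + 1];
      // Keep numeric parameters numeric, everything else as a string.
      options[key] = typeof DEFAULTS[key] === 'number' ? Number(value) : value;
      i += 1; // skip the value that was just consumed
    }
  }
  if (!options.token) {
    throw new Error('token is mandatory');
  }
  return options;
}

// Number of calls per hour implied by a given pause between calls.
function callsPerHour(timeoutMs) {
  return Math.floor(3600000 / timeoutMs);
}

var options = parseArgs(['--token', 'abcdefgh', '--before', '2015-01-03', '--out', 'data.csv']);
console.log(options.out, options.before, callsPerHour(options.timeout));
// prints: data.csv 2015-01-03 180
```

With the default pause the tool would make about 180 calls per hour, well below the 5000 allowed; a pause of 720 ms would sit exactly at that limit.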
## Strategies
There are two strategies available:
### Popular
Uses the Search API to get the most popular (= most starred) repositories created before a certain date (see parameters above). As of March 2015, the GitHub Search API returns at most 1000 results.
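
As a rough illustration, a request for this strategy might be built like the sketch below. `searchUrl`, `lastUsefulPage` and the page size of 100 are assumptions made for the example, not taken from the tool's code. Note that GitHub numbers result pages from 1; how the tool's `page` default of 0 maps onto that is not covered here.

```javascript
var RESULT_CAP = 1000; // result limit mentioned above (as of March 2015)
var PER_PAGE = 100;    // assumed page size; 100 is the largest GitHub accepts

// Build a Search API URL for the most starred repositories created before `before`.
function searchUrl(before, page) {
  var query = 'created:<' + before;
  return 'https://api.github.com/search/repositories' +
    '?q=' + encodeURIComponent(query) +
    '&sort=stars&order=desc' +
    '&per_page=' + PER_PAGE +
    '&page=' + page;
}

// Pages beyond this one cannot return anything because of the result cap.
function lastUsefulPage() {
  return Math.ceil(RESULT_CAP / PER_PAGE);
}

console.log(searchUrl('2015-01-01', 1));
console.log(lastUsefulPage()); // prints: 10
```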
### Sample
Takes a random sample of repositories (defaults to 10K out of 32M).
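
One way to draw such a sample is to pick distinct random repository ids between 1 and `pool`. The sketch below assumes exactly that; `sampleIds` is a hypothetical helper, and the tool itself may select repositories differently. Some ids may not resolve to a public repository, so fewer than `size` rows could result.

```javascript
// Draw `size` distinct integers from 1..pool by rejecting duplicates.
// `random` is optional and defaults to Math.random.
function sampleIds(size, pool, random) {
  if (size > pool) {
    throw new Error('size must not exceed pool');
  }
  var next = random || Math.random;
  var seen = {};
  var ids = [];
  while (ids.length < size) {
    var id = 1 + Math.floor(next() * pool);
    if (!seen[id]) {
      seen[id] = true;
      ids.push(id);
    }
  }
  return ids;
}

// Small demonstration; the documented defaults would be sampleIds(10000, 32000000).
console.log(sampleIds(5, 1000));
```

Rejecting duplicates stays cheap here because the sample is tiny compared to the pool, so collisions are rare.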
## Example
    node app.js --token abcdefgh --before 2015-01-03 --out data.csv