https://github.com/alextanhongpin/github-scraper

Scrapes GitHub user data from Malaysia for data mining and analytics purposes.

babel github nedb nodejs onion-architecture scraper typescript

# github-scraper

Scrapes public repos based on a particular keyword or location.

## Issues

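The GitHub Search API only exposes the first 1000 results of a query. Paging past that limit returns the following error:

```json
{"message":"Only the first 1000 search results are available","documentation_url":"https://developer.github.com/v3/search/"}
```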
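
One common way to work around this limit is to split a broad search into several narrower ones, so that each query is more likely to stay under 1000 results. The sketch below is not taken from this repository: `buildYearlyQueries` is a hypothetical helper, and it assumes the scraper searches users with GitHub's `location:` and `created:` search qualifiers.

```typescript
// Hypothetical helper (not part of this repository): build one search query
// per account-creation year instead of a single broad query.
function buildYearlyQueries(location: string, fromYear: number, toYear: number): string[] {
  const queries: string[] = [];
  for (let year = fromYear; year <= toYear; year++) {
    // `location:` and `created:` are GitHub search qualifiers for users.
    queries.push(`location:${location} created:${year}-01-01..${year}-12-31`);
  }
  return queries;
}

const queries = buildYearlyQueries("Malaysia", 2008, 2010);
console.log(queries);
```

If a single year still matches more than 1000 users, the same idea can be applied with smaller ranges, such as months.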

## Installation

First, you have to install [Yarn](https://yarnpkg.com/lang/en/docs/install/). Then:

```bash
# This will install all dependencies from package.json
$ yarn install

# We use foreman to load environment variables from the `.env` file.
# This helps prevent accidentally committing sensitive data to GitHub.
$ yarn global add foreman
```

## Add/Remove packages

```bash
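# Add a runtime dependency
$ yarn add <package>
# Add a development dependency
$ yarn add --dev <package>
# Remove a dependency
$ yarn remove <package>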
```

## Environment

For development, store all environment variables in the `.env` file. This file is listed in `.gitignore` so that it is not committed to GitHub.
Make sure you create the `.env` file, or the service will not run.

The `.env` file should contain the following:

```bash
# Create a personal access token on GitHub.
# It needs at least the `public_repo` scope (under `repo`) and the `read:user` scope (under `user`).
ACCESS_TOKEN=
```
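
Since the service does not run without this variable, it can help to fail fast with a clear message when it is missing. The following is a minimal sketch under that assumption, not the repository's actual startup code; `requireEnv` is a hypothetical helper.

```typescript
// Hypothetical helper (not part of this repository): return the value of a
// required environment variable, or throw if it is missing or blank.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In the real service this would be `requireEnv(process.env, "ACCESS_TOKEN")`.
// A literal object is used here so the example runs on its own.
const token = requireEnv({ ACCESS_TOKEN: "example-token" }, "ACCESS_TOKEN");
```

Passing the environment in as an argument keeps the helper easy to test without touching the real process environment.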

## Start

```bash
# If you do not have `foreman` installed globally
$ yarn global add foreman

# or
$ npm i -g foreman

# Start
$ nf start
```

## API Calls

### Analytics Endpoint

`GET /analytics?type=<type>`, where `<type>` is one of:

- user_count
- user_count_by_years
- repo_count
- leaderboard_last_updated_repos
- leaderboard_most_stars_repos
- leaderboard_most_watchers_repos
- leaderboard_most_repos
- leaderboard_most_repos_by_language
- leaderboard_languages

`GET /analytics/profiles?login=`
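
As an illustration, the request URLs for these two endpoints could be built as follows. This is a sketch only: `BASE_URL` is an assumption (the README does not state the host or port), and `analyticsUrl` and `profileUrl` are hypothetical helper names.

```typescript
// The analytics types listed above, as a union so typos fail at compile time.
type AnalyticsType =
  | "user_count"
  | "user_count_by_years"
  | "repo_count"
  | "leaderboard_last_updated_repos"
  | "leaderboard_most_stars_repos"
  | "leaderboard_most_watchers_repos"
  | "leaderboard_most_repos"
  | "leaderboard_most_repos_by_language"
  | "leaderboard_languages";

// Assumption: adjust to wherever the service is actually running.
const BASE_URL = "http://localhost:3000";

// Build the URL for `GET /analytics?type=<type>`.
function analyticsUrl(type: AnalyticsType): string {
  return `${BASE_URL}/analytics?type=${encodeURIComponent(type)}`;
}

// Build the URL for `GET /analytics/profiles?login=<login>`.
function profileUrl(login: string): string {
  return `${BASE_URL}/analytics/profiles?login=${encodeURIComponent(login)}`;
}

console.log(analyticsUrl("leaderboard_languages"));
console.log(profileUrl("alextanhongpin"));
```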

### Users Endpoint

`GET /users/`

### Repos Endpoint

`GET /repos/`

## TODO

- [ ] Ensure only unique repos are added for a particular user (no duplicates)
- [ ] Evaluate a language-agnostic storage solution
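
For the first item, one possible approach is to drop duplicates by repository id before saving. The sketch below is only an illustration: the `Repo` shape and the `uniqueRepos` helper are hypothetical and not part of this repository.

```typescript
// Hypothetical shape: only the fields needed for this sketch.
interface Repo {
  id: number;
  owner: string;
  name: string;
}

// Keep the first occurrence of each repo id, preserving the input order.
function uniqueRepos(repos: Repo[]): Repo[] {
  const seen: Record<number, boolean> = {};
  return repos.filter((repo) => {
    if (seen[repo.id]) {
      return false;
    }
    seen[repo.id] = true;
    return true;
  });
}

const deduped = uniqueRepos([
  { id: 1, owner: "alice", name: "scraper" },
  { id: 2, owner: "alice", name: "notes" },
  { id: 1, owner: "alice", name: "scraper" },
]);
console.log(deduped.map((repo) => repo.name));
```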