https://github.com/alextanhongpin/github-scraper
Scrapes GitHub users' data from Malaysia for data mining and analytics purposes
babel github nedb nodejs onion-architecture scraper typescript
- Host: GitHub
- URL: https://github.com/alextanhongpin/github-scraper
- Owner: alextanhongpin
- License: other
- Created: 2017-11-05T17:28:09.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-05-27T12:19:42.000Z (about 7 years ago)
- Last Synced: 2025-01-29T21:54:20.426Z (4 months ago)
- Topics: babel, github, nedb, nodejs, onion-architecture, scraper, typescript
- Language: TypeScript
- Size: 332 KB
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# github-scraper
Scrapes public repos based on a particular keyword or location.
## Issues
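A possible workaround for the result cap reported in this section is to split one broad search into several narrower ones, each restricted to a range of account creation dates. The sketch below is not taken from the repository: `DateRange`, `yearlyRanges` and `buildSearchUrl` are hypothetical names, and it assumes the scraper queries GitHub's user search endpoint with the `location:` and `created:` qualifiers.

```typescript
// Hypothetical helpers; none of these names come from the repository.
interface DateRange {
  from: string; // inclusive, YYYY-MM-DD
  to: string; // inclusive, YYYY-MM-DD
}

// Produce one date range per calendar year between startYear and endYear.
function yearlyRanges(startYear: number, endYear: number): DateRange[] {
  const ranges: DateRange[] = [];
  for (let year = startYear; year <= endYear; year++) {
    ranges.push({ from: `${year}-01-01`, to: `${year}-12-31` });
  }
  return ranges;
}

// Build a user-search URL limited to one creation-date range.
// per_page=100 is the largest page size the search API accepts.
function buildSearchUrl(location: string, range: DateRange, page: number = 1): string {
  const query = `location:${location} created:${range.from}..${range.to}`;
  const params = new URLSearchParams({
    q: query,
    per_page: "100",
    page: String(page),
  });
  return `https://api.github.com/search/users?${params.toString()}`;
}

// One URL per year; each query has its own 1000-result budget.
const searchUrls = yearlyRanges(2008, 2018).map((range) => buildSearchUrl("malaysia", range));
```

Each range is still subject to the same cap, so a year that matches more than 1000 users would need to be split further, for example by month.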
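GitHub's search API rejects requests that go past the first 1000 results of a query and responds with:
```json
{"message":"Only the first 1000 search results are available","documentation_url":"https://developer.github.com/v3/search/"}
```
## Installation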
First, you have to install [Yarn](https://yarnpkg.com/lang/en/docs/install/). Then:
```bash
# Install all dependencies listed in package.json
$ yarn install

# We use foreman to load the environment variables from the `.env` file.
# This helps prevent sensitive data from being committed to GitHub by accident
$ yarn global add foreman
```
## Add/Remove packages
```bash
$ yarn add <package>
$ yarn add --dev <package>
$ yarn remove <package>
```
## Environment
For development, store all environment variables in the `.env` file. The file is included in `.gitignore` so that it will not be committed to GitHub.
Make sure you create the `.env` file, or the service will not run. The `.env` file should contain the following:
```bash
# Create a personal access token on GitHub.
# It should contain the minimum scope repo::public_repo and user::read:user
ACCESS_TOKEN=
```
## Start
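Because the service depends on `ACCESS_TOKEN`, a startup check can fail fast with a readable message when the variable is missing. The following is only a sketch under assumptions: `requireAccessToken` and `githubHeaders` are hypothetical names that do not come from the repository, and it assumes foreman has already loaded `.env` into the process environment.

```typescript
// Hypothetical startup check; the function names are illustrative only.
function requireAccessToken(env: Record<string, string | undefined>): string {
  const token = env.ACCESS_TOKEN;
  if (!token || token.trim() === "") {
    throw new Error("ACCESS_TOKEN is not set; create a .env file before starting the service");
  }
  return token.trim();
}

// Headers for authenticated requests to the GitHub REST API.
function githubHeaders(token: string): Record<string, string> {
  return {
    Authorization: `token ${token}`,
    Accept: "application/vnd.github.v3+json",
  };
}
```

In a Node.js entry point this could be called as `githubHeaders(requireAccessToken(process.env))` before any request is sent.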
```bash
# If you do not have `foreman` installed globally
$ yarn global add foreman
# or
$ npm i -g foreman

# Start
$ nf start
```
## API Calls
### Analytics Endpoint
`GET /analytics?type=`, where `type` is one of:
- user_count
- user_count_by_years
- repo_count
- leaderboard_last_updated_repos
- leaderboard_most_stars_repos
- leaderboard_most_watchers_repos
- leaderboard_most_repos
- leaderboard_most_repos_by_language
- leaderboard_languages

`GET /analytics/profiles?login=`
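A client of the analytics routes might build its request URLs as sketched below. The `type` values are the ones listed in this section; everything else is an assumption, including the helper names and the base URL, since the README does not state which host or port the service listens on.

```typescript
// The `type` values documented for GET /analytics.
const ANALYTICS_TYPES = [
  "user_count",
  "user_count_by_years",
  "repo_count",
  "leaderboard_last_updated_repos",
  "leaderboard_most_stars_repos",
  "leaderboard_most_watchers_repos",
  "leaderboard_most_repos",
  "leaderboard_most_repos_by_language",
  "leaderboard_languages",
] as const;

type AnalyticsType = typeof ANALYTICS_TYPES[number];

// Runtime guard for values that arrive as plain strings (hypothetical helper).
function isAnalyticsType(value: string): value is AnalyticsType {
  return (ANALYTICS_TYPES as readonly string[]).indexOf(value) !== -1;
}

// Build the URL for GET /analytics?type=...
function analyticsUrl(baseUrl: string, type: AnalyticsType): string {
  return `${baseUrl}/analytics?type=${encodeURIComponent(type)}`;
}

// Build the URL for GET /analytics/profiles?login=...
function profileUrl(baseUrl: string, login: string): string {
  return `${baseUrl}/analytics/profiles?login=${encodeURIComponent(login)}`;
}

// Assumed base URL, for illustration only.
const exampleUrl = analyticsUrl("http://localhost:3000", "user_count");
```

Typing `type` as a union lets the compiler reject misspelled report names, while `isAnalyticsType` covers input that is only known at runtime.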
### Users Endpoint
`GET /users/`
### Repos Endpoint
`GET /repos/`
## TODO
- [ ] ensure only unique repos for a particular user are added (no duplications)
- [ ] check for language-agnostic storage solution
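For the first TODO item, one way to keep a user's repos unique is to filter each scraped batch by a stable key before storing it. This is a minimal sketch, not the repository's implementation: the `Repo` shape and the `dedupeRepos` name are hypothetical, and it assumes an owner plus repo name identifies a repo.

```typescript
// Hypothetical record shape; the real stored fields may differ.
interface Repo {
  owner: string;
  name: string;
  stars: number;
}

// Keep the first occurrence of each owner/name pair, preserving input order.
function dedupeRepos(repos: Repo[]): Repo[] {
  const seen = new Map<string, Repo>();
  for (const repo of repos) {
    // GitHub treats owner and repo names case-insensitively.
    const key = `${repo.owner}/${repo.name}`.toLowerCase();
    if (!seen.has(key)) {
      seen.set(key, repo);
    }
  }
  return Array.from(seen.values());
}
```

If the data stays in nedb, a unique index created with `ensureIndex({ fieldName: ..., unique: true })` could enforce the same rule at the storage layer; the field to index would depend on the stored schema.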