https://github.com/hadley/r-on-github
An exploration of R code and packages on GitHub, using the GitHub search and repo APIs
- Host: GitHub
- URL: https://github.com/hadley/r-on-github
- Owner: hadley
- License: mit
- Created: 2013-07-25T17:50:05.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2013-10-30T16:48:04.000Z (over 11 years ago)
- Last Synced: 2024-11-01T10:43:11.474Z (5 months ago)
- Language: R
- Size: 17.4 MB
- Stars: 54
- Watchers: 8
- Forks: 15
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - hadley/r-on-github - An exploration of R code and packages on GitHub, using the GitHub search and repo APIs (R)
README
# R on github
This project uses GitHub's search and repository APIs to collect information about
all GitHub repositories that use R.

## Getting the data
Getting all repo data is a two-step process:
* `1a-search.r` runs searches to find all R language repos. Searches are done by
  month to stay within the search API's limits (a single search returns at most
  1,000 results), and are cached in the `cache/` directory.
* `1b-repos.r` takes each repo found by the search and creates a list with 5
  components:

    * `info`: general information about the repository.
      http://developer.github.com/v3/repos/#get
    * `lng`: languages used in the repo.
      http://developer.github.com/v3/repos/#list-languages
    * `dir`: a directory listing of all files and directories in the repo root.
      http://developer.github.com/v3/repos/contents/#get-contents
    * `desc`: if a `DESCRIPTION` file is found, the result of parsing that
      file with `read.dcf` and converting it into a list.
    * `tags`: any tags used by the repo.
      http://developer.github.com/v3/repos/#list-tags

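As a rough illustration of the five-component record described above, here is a minimal sketch that assembles one repo's list. The helper `make_repo_record()` and its argument names are hypothetical, not functions from `1b-repos.r`. It assumes the API responses have already been fetched and decoded into R lists, and that the lines of any `DESCRIPTION` file are available as a character vector. Only the `desc` step follows the approach named in the list: parse with `read.dcf`, then convert to a list.

```r
# Hypothetical helper (not from 1b-repos.r): assemble the five-component
# record for one repo. info, lng, dir and tags are assumed to be
# already-decoded API responses; desc_lines holds the lines of the repo's
# DESCRIPTION file, or NULL if the repo has none.
make_repo_record <- function(info, lng, dir, desc_lines = NULL, tags = list()) {
  desc <- NULL
  if (!is.null(desc_lines)) {
    con <- textConnection(desc_lines)
    on.exit(close(con))
    # read.dcf() returns a character matrix with one row per record; a
    # DESCRIPTION file has a single record, so row 1 becomes a named list.
    dcf <- read.dcf(con)
    desc <- setNames(as.list(dcf[1, ]), colnames(dcf))
  }
  list(info = info, lng = lng, dir = dir, desc = desc, tags = tags)
}

# Usage with made-up data:
rec <- make_repo_record(
  info = list(full_name = "someone/example"),
  lng  = list(R = 12345),
  dir  = list(list(name = "DESCRIPTION", type = "file")),
  desc_lines = c(
    "Package: example",
    "Version: 0.1",
    "Title: An Example Package",
    "Depends: R (>= 2.15)"
  )
)
str(rec$desc)
```

Keeping `desc` as `NULL` for repos without a `DESCRIPTION` file means every record has the same five names, which makes it easy to filter for R packages later.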
The data on each repo is cached in `cache-repo/`.
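The caching and throttling can be sketched together, since only cache misses need to touch the network. This is a minimal sketch under stated assumptions, not the code in `1b-repos.r`: `cached_fetch()`, the `.throttle` environment, the one-`.rds`-file-per-repo naming scheme, and the default `min_interval` of 1 second are all hypothetical, and that interval is an arbitrary illustration rather than GitHub's actual rate limit.

```r
# Hypothetical cache-or-fetch helper (not the code in 1b-repos.r).
# Each key (e.g. "owner/repo") is stored as one .rds file in cache_dir.
# fetch is a zero-argument function that performs the real API request.
# Real requests are spaced at least min_interval seconds apart; the
# default of 1 second is an arbitrary illustration, not GitHub's limit.
.throttle <- new.env()

cached_fetch <- function(key, fetch, cache_dir = "cache-repo", min_interval = 1) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  # Replace "/" so "owner/repo" maps to a single file name.
  path <- file.path(cache_dir, paste0(gsub("/", "-", key, fixed = TRUE), ".rds"))
  if (file.exists(path)) return(readRDS(path))  # cache hit: no request

  # Cache miss: wait until min_interval seconds have passed since the
  # previous real request, then fetch and store the result.
  if (!is.null(.throttle$last)) {
    since_last <- as.numeric(difftime(Sys.time(), .throttle$last, units = "secs"))
    if (since_last < min_interval) Sys.sleep(min_interval - since_last)
  }
  result <- fetch()
  .throttle$last <- Sys.time()
  saveRDS(result, path)
  result
}

# Usage with a fake fetch function and a throwaway cache directory:
cache_dir <- tempfile("cache-repo")
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  list(full_name = "someone/example")
}
a <- cached_fetch("someone/example", fake_fetch, cache_dir)
b <- cached_fetch("someone/example", fake_fetch, cache_dir)  # read from disk
```

In this sketch the second call is served from disk, so `fake_fetch()` runs only once; rerunning an interrupted download would likewise skip every repo that already has a cache file.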

To update repos for the current month, run `source("1a-search.r")`, then
`source("1b-repos.r")`. You'll need to store your GitHub user name and password
in the environment variables `GITHUB_USER` and `GITHUB_PASS`. All requests are
throttled to stay within GitHub's rate limits, so downloading all repo info
from scratch takes several hours.

## Exploring the data
If you just want to use the already cached data, see `2-languages.r` and
`2-packages.r` for example exploratory analyses.
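To give a taste of the kind of question the cached data can answer, here is a hedged sketch that totals bytes of code per language across repo records. It is not taken from `2-languages.r`, which may analyse the data quite differently. It assumes each record has an `lng` component shaped like the list-languages response, a named list mapping language name to bytes (for example `list(R = 5000, C = 1000)`); `language_totals()` and the sample records are made up for illustration.

```r
# Hypothetical example: total bytes of code per language across repos.
# Each repo record is assumed to have an `lng` component: a named list
# mapping language name to bytes, as returned by the list-languages endpoint.
language_totals <- function(repos) {
  lng <- lapply(repos, function(repo) unlist(repo$lng))
  bytes <- unlist(lng, use.names = FALSE)
  langs <- unlist(lapply(lng, names), use.names = FALSE)
  if (length(bytes) == 0) return(numeric(0))  # no languages recorded at all
  # Sum bytes within each language, then order from most to least code.
  totals <- vapply(split(bytes, langs), sum, numeric(1))
  sort(totals, decreasing = TRUE)
}

# Usage with made-up records:
repos <- list(
  list(lng = list(R = 5000, C = 1000)),
  list(lng = list(R = 2000, JavaScript = 300)),
  list(lng = list())  # a repo with no detected languages
)
language_totals(repos)
```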