Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ropensci-archive/datapkg
:no_entry: ARCHIVED :no_entry: Read and Write Data Packages
r r-package rstats
- Host: GitHub
- URL: https://github.com/ropensci-archive/datapkg
- Owner: ropensci-archive
- License: other
- Archived: true
- Created: 2016-04-26T10:13:39.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2022-05-10T13:53:28.000Z (over 2 years ago)
- Last Synced: 2024-08-13T07:15:37.418Z (5 months ago)
- Topics: r, r-package, rstats
- Language: R
- Homepage: https://docs.ropensci.org/datapkg
- Size: 40 KB
- Stars: 40
- Watchers: 11
- Forks: 6
- Open Issues: 11
Metadata Files:
- Readme: README-NOT.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - ropensci-archive/datapkg - :no_entry: ARCHIVED :no_entry: Read and Write Data Packages (R)
README
## Data Package in R
[![Project Status: Inactive – The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.](http://www.repostatus.org/badges/latest/inactive.svg)](http://www.repostatus.org/#inactive)
Data Package is a [standard format](http://frictionlessdata.io/data-packages/) for describing metadata for a collection of datasets. The `datapkg` package provides convenience functions for retrieving and parsing data packages in R. To install it in R:
```r
# install_github() is available from devtools
library(devtools)
install_github("hadley/readr")
install_github("ropenscilabs/jsonvalidate")
install_github("ropensci-archive/datapkg")
```
## Reading data
The `datapkg_read` function retrieves and parses data packages from local or remote sources. A few example packages are available from the [datasets](https://github.com/datasets) and [testsuite-py](https://github.com/frictionlessdata/testsuite-py) repositories. The path must point to the root of the data package, given as a directory on disk, a git remote, or a URL.
```r
# Load client
library(datapkg)

# Clone via git
# (GitHub has since disabled the unauthenticated git:// protocol,
# so this form may no longer work)
cities <- datapkg_read("git://github.com/datasets/world-cities")

# Same data but download over HTTPS
cities <- datapkg_read("https://raw.githubusercontent.com/datasets/world-cities/master")
```
The returned object contains both the data and the metadata of the data package; the datasets themselves are stored in the `$data` field.
```r
# Package info
print(cities)

# Open the actual data in the RStudio Viewer
View(cities$data[[1]])
```
When a package contains multiple datasets, each one can be referenced by index or, if the package provides names, by name (names are optional in data packages).
```r
# Package with many datasets
euribor <- datapkg_read("https://raw.githubusercontent.com/datasets/euribor/master")

# List datasets in this package
names(euribor$data)
View(euribor$data[[1]])
```
## Writing data
The package also has basic functionality to save a data frame into a data package and update the `datapackage.json` file accordingly.
```r
# Create new data package
pkgdir <- tempfile()
datapkg_write(mtcars, path = pkgdir)
datapkg_write(iris, path = pkgdir)

# Read it back
mypkg <- datapkg_read(pkgdir)
print(mypkg$data$mtcars)
```
From here you can edit the `datapackage.json` file to add other metadata.
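As a minimal sketch of that step, extra metadata can be merged into the descriptor with the `jsonlite` package (assumed to be installed). This assumes `datapackage.json` is a JSON object with top-level properties such as `name` and `resources`, as the Data Package spec describes; `update_datapackage_metadata()` is a hypothetical helper, not part of `datapkg`.

```r
library(jsonlite)

# Hypothetical helper (not part of datapkg): set or overwrite top-level
# properties of datapackage.json, leaving everything else unchanged.
update_datapackage_metadata <- function(pkgdir, ...) {
  json_path <- file.path(pkgdir, "datapackage.json")
  # simplifyVector = FALSE keeps JSON arrays and objects as R lists,
  # so the nested structure is preserved when written back
  descriptor <- read_json(json_path, simplifyVector = FALSE)
  extra <- list(...)
  for (field in names(extra)) {
    descriptor[[field]] <- extra[[field]]
  }
  # auto_unbox = TRUE writes length-one vectors as JSON scalars;
  # digits = NA avoids rounding numeric values on output
  write_json(descriptor, json_path, auto_unbox = TRUE, pretty = TRUE, digits = NA)
  invisible(descriptor)
}

# Example, assuming pkgdir was created with datapkg_write() as above
# (the title and description are placeholder values):
# update_datapackage_metadata(pkgdir,
#   title = "Example datasets",
#   description = "mtcars and iris written with datapkg_write()")
```

Reading with `simplifyVector = FALSE` trades convenience for fidelity: the descriptor stays a plain nested list, which is awkward to browse but writes back with the same shape.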
## Status
This package is a work in progress. Current open issues:
- Make `readr` parse `0`/`1` values for booleans: [PR#406](https://github.com/hadley/readr/pull/406)
- Support "year only" dates (`%Y`), although it is not clear whether a bare year constitutes a valid date: [PR#407](https://github.com/hadley/readr/pull/407)
- R and `readr` require specifying which strings are interpreted as missing values. The defaults are the empty string `""` and `"NA"`. A similar property needs to be defined in the spec.
- It is unclear how to handle parsing errors, or fields in `datapackage.json` that do not match the CSV data. Examples: [s-and-p-500](https://github.com/datasets/s-and-p-500) and [currency-codes](https://raw.githubusercontent.com/frictionlessdata/testsuite-py/master/datasets/currency-codes)

Features:
- Writing data packages from data frames.
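For the `0`/`1` boolean issue in the list above, one workaround until `readr` handles it is to convert such columns after reading. This is a hypothetical helper, not part of `datapkg` or `readr`; `bool_cols` stands for the columns the package schema declares as boolean, however those are obtained.

```r
# Hypothetical helper: convert columns holding 0/1 (read as numbers or
# strings) into logical vectors, keeping missing values as NA.
coerce_01_to_logical <- function(df, bool_cols) {
  for (col in bool_cols) {
    x <- df[[col]]
    # refuse to guess if anything other than 0, 1 or NA is present
    if (!all(is.na(x) | x %in% c(0, 1))) {
      stop("column '", col, "' contains values other than 0/1")
    }
    df[[col]] <- as.numeric(x) == 1
  }
  df
}

df <- data.frame(id = 1:3, active = c(1, 0, NA))
coerce_01_to_logical(df, "active")$active
# → TRUE FALSE NA
```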
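The "year only" issue is easy to trip over in base R as well: when a format such as `%Y` leaves the month and day unspecified, `as.Date()` fills them in from the current date rather than using January 1. The sketch below shows that behaviour and one possible convention, mapping a bare year to January 1 of that year. `parse_year_only()` is hypothetical, and whether January 1 is the right interpretation is exactly the open question about the spec.

```r
# Base R: month and day are taken from today's date, not January 1
as.Date("2016", format = "%Y")

# Hypothetical helper: treat a four-digit year as January 1 of that year;
# anything else (including NA) becomes NA
parse_year_only <- function(x) {
  x <- as.character(x)
  ok <- grepl("^[0-9]{4}$", x)
  as.Date(ifelse(ok, paste0(x, "-01-01"), NA_character_), format = "%Y-%m-%d")
}

parse_year_only(c("2016", "16", NA))
# → 2016-01-01, NA, NA
```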
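For the missing-values issue, the markers currently have to be supplied by hand when reading. The sketch below uses base R's `read.csv()` so that it runs without extra packages; `readr` accepts the same kind of vector through its `na` argument. The `"-"` marker and the inline CSV are made up for illustration.

```r
# Markers that should be read as missing values; in a data package these
# would ideally come from the metadata (hypothetical, hard-coded here)
missing_markers <- c("", "NA", "-")

csv_text <- "city,population\nOslo,709037\nBergen,-\nTromso,"

# na.strings turns every listed marker into NA while parsing,
# so the population column can still be read as numbers
df <- read.csv(text = csv_text, na.strings = missing_markers)
str(df)

# readr equivalent: readr::read_csv(file, na = missing_markers)
```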
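For the issue of `datapackage.json` fields that do not match the CSV data, a first step is simply detecting the mismatch. `check_fields()` is a hypothetical helper, not part of `datapkg`; `schema_fields` stands for the field names listed in the resource's schema, however they are extracted. What to do once a mismatch is found remains the open question.

```r
# Hypothetical helper: compare declared schema field names with the
# header row of the CSV file
check_fields <- function(csv_path, schema_fields) {
  # one data row is enough to get the column names;
  # check.names = FALSE keeps names exactly as written in the file
  header <- names(utils::read.csv(csv_path, nrows = 1, check.names = FALSE))
  list(
    missing_in_csv = setdiff(schema_fields, header),
    extra_in_csv = setdiff(header, schema_fields)
  )
}

# Example with a throwaway file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,value", "1,x,2.5"), csv_path)
check_fields(csv_path, c("id", "name", "unit"))
# → missing_in_csv "unit", extra_in_csv "value"
```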
[![rOpenSci](http://ropensci.org/public_images/github_footer.png)](http://ropensci.org)
[![OKFN](http://assets.okfn.org/p/labs/img/logo.png)](https://okfn.org)