https://github.com/dehowell/catalogr
Manage a catalogue of datasets in AWS with R.
- Host: GitHub
- URL: https://github.com/dehowell/catalogr
- Owner: dehowell
- License: mit
- Archived: true
- Created: 2018-04-17T12:45:48.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2018-04-25T12:44:31.000Z (almost 7 years ago)
- Last Synced: 2024-08-13T07:11:34.087Z (8 months ago)
- Language: R
- Size: 7.81 KB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - dehowell/catalogr - Manage a catalogue of datasets in AWS with R. (R)
README
# catalogr
Manage a catalogue of datasets in AWS with R.
```r
library(catalogr)

# Initialize `catalogr` with the name of your dataset bucket and AWS profile.
initialize(bucket = 'us-east-1-datasets-example', profile = 'default')

# Write a dataset.
write_dataset(mtcars)

# List defined datasets.
datasets()
# [1] "mtcars"

# Read dataset from S3 into memory.
df <- read_dataset("mtcars")
```

## How does it work?
`catalogr` stores date-stamped versions of each dataset in S3, under a prefix named after the dataset:

`dataset_name/yyyymmdd-dataset_name.feather`

It uses the feather format by default unless you specify CSV. When you read a dataset, the most recent version is returned.
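The key layout and "most recent wins" rule can be sketched in plain R. This is a minimal sketch that assumes only the key pattern shown above: `dataset_key()` and `latest_key()` are hypothetical helpers, not catalogr functions, the `type` argument is an illustrative stand-in for however catalogr lets you choose CSV, and the sketch works on a character vector of keys instead of listing the bucket, so it makes no AWS calls.

```r
# Hypothetical helpers illustrating the key layout described above;
# these are NOT part of catalogr's exported API.

# Build the S3 key for one version of a dataset,
# e.g. "mtcars/20180417-mtcars.feather".
# `type` is an assumed argument for choosing feather (default) or CSV.
dataset_key <- function(name, date = Sys.Date(), type = c("feather", "csv")) {
  type <- match.arg(type)
  stamp <- format(as.Date(date), "%Y%m%d")
  sprintf("%s/%s-%s.%s", name, stamp, name, type)
}

# Given keys found in the bucket, return the most recent version of `name`.
# The stamp is a zero-padded yyyymmdd string, so sorting stamps as strings
# also sorts them chronologically.
latest_key <- function(keys, name) {
  keys <- keys[startsWith(keys, paste0(name, "/"))]
  if (length(keys) == 0) stop("no versions found for dataset: ", name)
  stamps <- substr(basename(keys), 1, 8)
  keys[order(stamps, decreasing = TRUE)][1]
}

# Usage: two versions of mtcars and one CSV version of iris.
keys <- c(
  dataset_key("mtcars", "2018-04-17"),
  dataset_key("mtcars", "2018-04-25"),
  dataset_key("iris", "2018-04-30", type = "csv")
)
latest_key(keys, "mtcars")
# [1] "mtcars/20180425-mtcars.feather"
```

Putting the date stamp at the front of the file name means the newest version can be found by sorting names alone, without reading S3 object metadata such as last-modified times.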