https://github.com/hrbrmstr/sparrow
Temporary Shortcut For Reading Arrow/Parquet Bits Into R via 'reticulate'
- Host: GitHub
- URL: https://github.com/hrbrmstr/sparrow
- Owner: hrbrmstr
- Created: 2018-05-02T21:11:55.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-05-02T21:12:06.000Z (over 7 years ago)
- Last Synced: 2024-12-25T04:24:39.455Z (12 months ago)
- Topics: arrow, pandas-dataframe, parquet, r, rstats
- Language: R
- Homepage:
- Size: 11.7 KB
- Stars: 15
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
README
---
output: rmarkdown::github_document
---
# sparrow
Temporary Shortcut For Reading Arrow/Parquet Bits Into R via 'reticulate'
## Description
Work is under way to make Parquet/Arrow a first-class R citizen,
but -- until then -- I don't always want a Drill server round trip just
to read in some data, and the same goes for firing up a Spark instance (srsly).
So, this is a quick hack until the R packages are done.
## NOTE
**Requires** Python 3.5+, `pyarrow` and `pandas`.
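Since the package leans on a working Python setup, it can help to confirm that 'reticulate' can actually see the required modules before trying to read anything. The helper below is a minimal sketch, not part of sparrow: `check_sparrow_prereqs` is a hypothetical name, and it assumes the `reticulate` package is installed and pointed at the Python you intend to use.

```{r eval=FALSE}
# Hypothetical helper (NOT part of sparrow): report whether each required
# Python module can be found by the Python that reticulate is bound to.
check_sparrow_prereqs <- function(modules = c("pyarrow", "pandas")) {
  # py_module_available() returns TRUE/FALSE for a single module name;
  # vapply() keeps the module names on the resulting logical vector.
  vapply(modules, reticulate::py_module_available, logical(1))
}

# Example: returns a named logical vector, one entry per module
check_sparrow_prereqs()
```

If a module comes back `FALSE`, installing it into the active environment (for example with `reticulate::py_install()`) is one way to fix that.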
## What's Inside The Tin
The following functions are implemented:
- `read_parquet`: Read in data from Parquet into an R data frame via 'reticulate'
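For readers curious what "via 'reticulate'" might look like in practice, here is a rough sketch of the general approach. It is an assumption about how such a reader could be written, not the package's actual source: `read_parquet_sketch` is a hypothetical name, and it relies on `pyarrow.parquet.read_table()` and the pyarrow table's `to_pandas()` method on the Python side.

```{r eval=FALSE}
# Hypothetical stand-in for a reticulate-backed Parquet reader
# (NOT the actual sparrow implementation).
read_parquet_sketch <- function(path, columns = NULL) {
  # Import the Python module; with the default convert = TRUE, reticulate
  # turns the final pandas DataFrame into an R data frame.
  pq <- reticulate::import("pyarrow.parquet")

  # pyarrow expects a list of column names. A length-1 character vector
  # would otherwise be passed as a bare Python string, so coerce to a list.
  if (!is.null(columns)) columns <- as.list(columns)

  # Read the file (all columns when `columns` is NULL, i.e. Python None)
  tbl <- pq$read_table(path.expand(path), columns = columns)

  # pyarrow Table -> pandas DataFrame -> R data frame
  tbl$to_pandas()
}
```

Going through pandas keeps the R side tiny, at the cost of an extra in-memory copy of the data on the Python side.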
## Installation
```{r eval=FALSE}
devtools::install_github("hrbrmstr/sparrow")
```
```{r message=FALSE, warning=FALSE, error=FALSE, include=FALSE}
options(width=120)
```
## Usage
```{r message=FALSE, warning=FALSE, error=FALSE}
library(sparrow)
# current version
packageVersion("sparrow")
```
```{r cache=TRUE}
read_parquet("/tmp/honeypot.parquet")
```
```{r cache=TRUE}
read_parquet("/tmp/honeypot.parquet", c("src", "duration"))
```