Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/nathaneastwood/sparkts

sparklyr interface to the spark-ts package

Keywords: r, sparklyr

Last synced: about 2 months ago


README

---
output: github_document
---

```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#",
  fig.path = "tools/images/README-"
)
library(sparkts)
```

# sparkts

[![Project Status: Active - The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/sparkts)](http://cran.r-project.org/package=sparkts)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

The goal of `sparkts` is to provide a test bed of `sparklyr` extensions for the [`spark-ts`](https://github.com/srussell91/SparkTS) framework, which was modified from the [`spark-timeseries`](https://github.com/sryza/spark-timeseries) framework.

## Installation

You can install `sparkts` from GitHub with:

```{r installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("nathaneastwood/sparkts")
```
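Equivalently, the lighter-weight `remotes` package (which `devtools::install_github()` wraps) can perform the same installation:

```{r installation-remotes, eval = FALSE}
# install.packages("remotes")
remotes::install_github("nathaneastwood/sparkts")
```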

For details on how to set up the package for further development, please see the development vignette.
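If the package was installed together with its vignettes, they can be listed from R; a minimal sketch (the exact vignette title may differ):

```{r vignettes, eval = FALSE}
# List the vignettes installed with the package
browseVignettes(package = "sparkts")
```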

## Example

This basic example shows how to calculate the standard error for some time series data:

```{r example, cache = TRUE, message = FALSE}
# sparklyr provides spark_read_json(), spark_dataframe(), spark_disconnect()
# and the %>% pipe used below
library(sparklyr)
library(sparkts)

# Set up a local Spark connection
sc <- sparklyr::spark_connect(
  master = "local",
  version = "2.2.0",
  config = list(sparklyr.gateway.address = "127.0.0.1")
)

# Read the example data shipped with the package into Spark
std_data <- spark_read_json(
  sc,
  "std_data",
  path = system.file(
    "data_raw/StandardErrorDataIn.json",
    package = "sparkts"
  )
) %>%
  spark_dataframe()

# Call the standard error method
p <- sdf_standard_error(
  sc = sc, data = std_data,
  x_col = "xColumn", y_col = "yColumn", z_col = "zColumn",
  new_column_name = "StandardError"
)

# Collect the result into the local R session
p %>% dplyr::collect()

# Disconnect from the Spark connection
spark_disconnect(sc = sc)
```
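In non-interactive scripts it can be worth guaranteeing that the connection is closed even when an intermediate step fails. A minimal sketch of that pattern, reusing the connection settings from the example above:

```{r example-cleanup, eval = FALSE}
sc <- sparklyr::spark_connect(master = "local", version = "2.2.0")
tryCatch({
  # ... read data and call sdf_standard_error() as in the example above ...
}, finally = {
  # Always runs, whether or not the computation above succeeded
  sparklyr::spark_disconnect(sc = sc)
})
```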