Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/thiyangt/tsdataleaks

R Package for detecting data leakages in time series forecasting competitions.
https://github.com/thiyangt/tsdataleaks

Last synced: 8 days ago
JSON representation

R Package for detecting data leakages in time series forecasting competitions.

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# tsdataleaks

![CRAN status](https://www.r-pkg.org/badges/version/tsdataleaks)](https://CRAN.R-project.org/package=tsdataleaks)

R Package for detecting data leakages in time series forecasting competitions.

## Installation

The development version from [GitHub](https://github.com/) with:

```r
install.packages("tsdataleaks")
library(tsdataleaks)
```

or

``` r
# install.packages("devtools")
devtools::install_github("thiyangt/tsdataleaks")
library(tsdataleaks)
```
## Example

To demonstrate the package functions, I created a small data set with 4 time series.

```{r example, comment=NA, warning=FALSE, message=FALSE}
set.seed(2020)
a <- rnorm(15)
d <- rnorm(10)
lst <- list(
a = a,
b = c(a[10:15]+rep(8,6), rnorm(10), a[1:5], a[1:5]),
c = c(rnorm(10), -a[1:5]),
d = d,
e = d)

```

## `find_dataleaks`: Exploit data leaks

```{r, comment=NA, message=FALSE, warning=FALSE}
library(tsdataleaks)
library(magrittr)
library(tidyverse)
library(viridis)
# h - I assume test period length is 5 and took that as wind size, h.
f1 <- find_dataleaks(lstx = lst, h=5, cutoff=1)
f1
```

Interpretation: The first element in the list means the last 5 observations of the time series `a` correlates with time series `b` observarion from 2 to 6.

## `viz_dataleaks`: Visualise the data leaks

```{r, comment=NA, message=FALSE, warning=FALSE}
viz_dataleaks(f1)
```

## `reason_dataleaks`

Display the reasons for data leaks and evaluate usefulness of data leaks towards the winning of the competition

```{r, comment=NA, message=FALSE, warning=FALSE}
r1 <- reason_dataleaks(lstx = lst, finddataleaksout = f1, h=5)
r1
```

# A list without naming element

```{r, warning=FALSE, message=FALSE}
a = rnorm(15)
lst <- list(
a,
c(a[10:15], rnorm(10), a[1:5], a[1:5]),
c(rnorm(10), a[1:5])
)
f1 <- find_dataleaks(lst, h=5)
```

```{r, warning=FALSE, message=FALSE}
viz_dataleaks(f1)
```

```{r, warning=FALSE, message=FALSE}
reason_dataleaks(lst, f1, h=5)
```

# Application to M-Competition data

## M1 Competition - Yearly data

```{r, warning=FALSE, message=FALSE}
library(Mcomp)
data("M1")
M1Y <- subset(M1, "yearly")
M1Y_x <- lapply(M1Y, function(temp){temp$x})
m1y_f1 <- find_dataleaks(M1Y_x, h=6, cutoff = 1)
m1y_f1
```

```{r, warning=FALSE, message=FALSE}
viz_dataleaks(m1y_f1)
```

```{r, warning=FALSE, message=FALSE, fig.width=12}
reason_dataleaks(M1Y_x, m1y_f1, h=6, ang=90)
```