https://github.com/thiyangt/tsdataleaks
R Package for detecting data leakages in time series forecasting competitions.
- Host: GitHub
- URL: https://github.com/thiyangt/tsdataleaks
- Owner: thiyangt
- Created: 2020-07-25T11:36:09.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-02-19T08:34:24.000Z (11 months ago)
- Last Synced: 2024-09-26T13:48:24.431Z (3 months ago)
- Language: R
- Homepage:
- Size: 7.62 MB
- Stars: 3
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# tsdataleaks

[![CRAN status](https://www.r-pkg.org/badges/version/tsdataleaks)](https://CRAN.R-project.org/package=tsdataleaks)

R Package for detecting data leakages in time series forecasting competitions.
## Installation
You can install the released version of tsdataleaks from [CRAN](https://CRAN.R-project.org) with:

```r
install.packages("tsdataleaks")
library(tsdataleaks)
```

or the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("thiyangt/tsdataleaks")
library(tsdataleaks)
```
## Example

To demonstrate the package functions, I created a small data set with 5 time series.

```{r example, comment=NA, warning=FALSE, message=FALSE}
set.seed(2020)
a <- rnorm(15)
d <- rnorm(10)
lst <- list(
  a = a,
  b = c(a[10:15] + rep(8, 6), rnorm(10), a[1:5], a[1:5]),
  c = c(rnorm(10), -a[1:5]),
  d = d,
  e = d
)
```

## `find_dataleaks`: Exploit data leaks
```{r, comment=NA, message=FALSE, warning=FALSE}
library(tsdataleaks)
library(magrittr)
library(tidyverse)
library(viridis)
# h: the test period is assumed to have length 5, so the window size h is set to 5.
f1 <- find_dataleaks(lstx = lst, h=5, cutoff=1)
f1
```

Interpretation: the first element of the list indicates that the last 5 observations of time series `a` correlate with observations 2 to 6 of time series `b`.

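The interpretation above can be checked by hand in base R. The following is a minimal sketch, not the package's implementation: `matching_windows` is a hypothetical helper that slides a window of length `h` over a series and reports the start positions whose correlation with the target tail reaches the cutoff. The tolerance `tol` is an assumption added to absorb floating-point rounding.

```r
# Hypothetical helper, not part of tsdataleaks: slide a window of length h
# over `series` and return the start positions whose absolute correlation
# with `target_tail` reaches the cutoff (up to a small tolerance).
matching_windows <- function(target_tail, series, cutoff = 1, tol = 1e-8) {
  h <- length(target_tail)
  if (length(series) < h) {
    return(integer(0))
  }
  starts <- seq_len(length(series) - h + 1)
  cors <- vapply(starts, function(s) {
    cor(target_tail, series[s:(s + h - 1)])
  }, numeric(1))
  # which() drops NA correlations (for example, from constant windows)
  starts[which(abs(cors) >= cutoff - tol)]
}

# Rebuild series a and b exactly as in the example data set
set.seed(2020)
a <- rnorm(15)
b <- c(a[10:15] + 8, rnorm(10), a[1:5], a[1:5])

# b[2:6] equals a[11:15] + 8, so the correlation is 1 up to rounding
cor(tail(a, 5), b[2:6])

# Start positions in b that match the last 5 observations of a
matching_windows(tail(a, 5), b)
```

Adding a constant does not change a correlation, which is why the shifted copy in `b` is still detected with `cutoff = 1`.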
## `viz_dataleaks`: Visualise the data leaks
```{r, comment=NA, message=FALSE, warning=FALSE}
viz_dataleaks(f1)
```

## `reason_dataleaks`

Displays the reasons for the data leaks and evaluates how useful each leak is for winning the competition.

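To make the idea of a "reason" concrete, here is a rough base-R sketch that labels how two matching windows relate to each other. It mirrors the three kinds of leak planted in the example data set (an identical copy, a copy shifted by 8, and a sign-flipped copy). The helper `describe_relation` and its labels are hypothetical. They are not the package's own output or wording.

```r
# Hypothetical helper, not part of tsdataleaks: describe how two
# equal-length windows relate to each other.
describe_relation <- function(x, y, tol = 1e-8) {
  stopifnot(length(x) == length(y))
  diffs <- y - x
  if (all(abs(diffs) < tol)) {
    "exact match"
  } else if (abs(max(diffs) - min(diffs)) < tol) {
    "constant added"
  } else if (cor(x, y) < 0) {
    "negatively related"
  } else {
    "other linear transformation"
  }
}

set.seed(2020)
a <- rnorm(15)
d <- rnorm(10)

# Window copied with an offset of 8, as in series b
shifted <- describe_relation(tail(a, 5), a[11:15] + 8)
# Sign-flipped copy, as at the end of series c
flipped <- describe_relation(a[1:5], -a[1:5])
# Identical series, as with d and e
same <- describe_relation(tail(d, 5), tail(d, 5))

c(shifted = shifted, flipped = flipped, same = same)
```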
```{r, comment=NA, message=FALSE, warning=FALSE}
r1 <- reason_dataleaks(lstx = lst, finddataleaksout = f1, h=5)
r1
```

# A list without named elements

```{r, warning=FALSE, message=FALSE}
a <- rnorm(15)
lst <- list(
  a,
  c(a[10:15], rnorm(10), a[1:5], a[1:5]),
  c(rnorm(10), a[1:5])
)
f1 <- find_dataleaks(lst, h=5)
```

```{r, warning=FALSE, message=FALSE}
viz_dataleaks(f1)
```

```{r, warning=FALSE, message=FALSE}
reason_dataleaks(lst, f1, h=5)
```

# Application to M-Competition data

## M1 Competition - Yearly data
```{r, warning=FALSE, message=FALSE}
library(Mcomp)
data("M1")
M1Y <- subset(M1, "yearly")
M1Y_x <- lapply(M1Y, function(temp){temp$x})
m1y_f1 <- find_dataleaks(M1Y_x, h=6, cutoff = 1)
m1y_f1
```

```{r, warning=FALSE, message=FALSE}
viz_dataleaks(m1y_f1)
```

```{r, warning=FALSE, message=FALSE, fig.width=12}
reason_dataleaks(M1Y_x, m1y_f1, h=6, ang=90)
```
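
The `lapply` call in the M1 example keeps only the training part `x` of every series before the search for leaks. The sketch below shows that extraction step on a small mock object, so it runs without `Mcomp`. The object `mock_comp` and its values are made up for illustration, and the field names `x`, `xx`, and `h` are assumed to follow the layout of an `Mcomp` series.

```r
# Mock object imitating the assumed layout of an Mcomp collection: each
# entry holds a training series `x`, a test series `xx`, and the horizon `h`.
mock_comp <- list(
  Y1 = list(
    x = ts(c(10, 12, 15, 19, 24, 30, 37, 45), start = 1970),
    xx = ts(c(54, 64), start = 1978),
    h = 2
  ),
  Y2 = list(
    x = ts(c(5, 4, 6, 5, 7, 6, 8, 7), start = 1975),
    xx = ts(c(9, 8), start = 1983),
    h = 2
  )
)

# Keep only the training part of every series, as in the M1 example
train_only <- lapply(mock_comp, function(s) s$x)

# Horizons stored with the series; a single shared value is a natural
# choice for the window size h
horizons <- vapply(mock_comp, function(s) s$h, numeric(1))
unique(horizons)
```

The resulting `train_only` is a named list of series, the same shape as the `M1Y_x` object passed to `find_dataleaks` above.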