https://github.com/hendersontrent/theft

R package for Tools for Handling Extraction of Features from Time series (theft)
data-visualisation data-visualization dimensionality-reduction machine-learning r time-series


---
output: rmarkdown::github_document
---

# theft

[![CRAN version](https://www.r-pkg.org/badges/version/theft)](https://www.r-pkg.org/pkg/theft)
[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/theft)](https://www.r-pkg.org/pkg/theft)
[![DOI](https://zenodo.org/badge/351259952.svg)](https://zenodo.org/badge/latestdoi/351259952)

Tools for Handling Extraction of Features from Time series (theft)

```{r, include = FALSE}
knitr::opts_chunk$set(
  comment = NA, fig.width = 12, fig.height = 8, cache = FALSE)
```

## Installation

You can install the stable version of `theft` from CRAN:

```{r eval = FALSE}
install.packages("theft")
```

You can install the development version of `theft` from GitHub using the following:

```{r eval = FALSE}
devtools::install_github("hendersontrent/theft")
```

Please also check out our paper [Feature-Based Time-Series Analysis in R using the theft Package](https://arxiv.org/abs/2208.06146) which discusses the motivation and theoretical underpinnings of `theft` and walks through all of its functionality using the [Bonn EEG dataset](https://pubmed.ncbi.nlm.nih.gov/11736210/) --- a well-studied neuroscience dataset.

## General purpose

`theft` is a software package for R that facilitates user-friendly access to a consistent interface for the extraction of time-series features. The package provides a single point of access to $>1200$ time-series features from a range of existing R and Python packages. The packages which `theft` 'steals' features from currently are:

* [catch22](https://link.springer.com/article/10.1007/s10618-019-00647-x) (R; [see `Rcatch22` for the native implementation on CRAN](https://github.com/hendersontrent/Rcatch22))
* [feasts](https://feasts.tidyverts.org) (R)
* [tsfeatures](https://github.com/robjhyndman/tsfeatures) (R)
* [Kats](https://facebookresearch.github.io/Kats/) (Python)
* [tsfresh](https://tsfresh.com) (Python)
* [TSFEL](https://tsfel.readthedocs.io/en/latest/) (Python)

Note that `Kats`, `tsfresh` and `TSFEL` are Python packages. `theft` has built-in functionality for helping you install these libraries---all you need to do is install Python 3.9 on your machine. If you wish to access the Python feature sets, please run `?install_python_pkgs` in R after downloading `theft` or consult the vignette in the package for more information. For a comprehensive comparison of these six feature sets across a range of domains (including computation speed, within-set feature composition, and between-set feature correlations), please refer to the paper [An Empirical Evaluation of Time-Series Feature Sets](https://ieeexplore.ieee.org/document/9679937).

As of `v0.6.1`, users can also supply their own features to `theft` (see the vignette for more information)!
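
As a minimal sketch of what a user-supplied feature set can look like, the snippet below builds a named list of functions, the same shape passed to the `features` argument in the quick tour further down. It assumes each function takes a numeric vector and returns a single number (check the vignette for the exact requirements). `mean_abs_change` and `apply_features` are hypothetical helpers written for illustration and are not part of `theft`.

```{r}
# A named list of feature functions: each takes a numeric vector and
# returns a single numeric value (assumed contract; see the vignette)
mean_abs_change <- function(x) mean(abs(diff(x)))  # hypothetical custom feature

my_features <- list(
  "mean" = mean,
  "sd" = sd,
  "mean_abs_change" = mean_abs_change
)

# Hypothetical helper (not part of theft): evaluates every feature on one
# series so the list can be sanity-checked before it is passed on as
# calculate_features(..., features = my_features)
apply_features <- function(x, features) {
  vapply(features, function(f) f(x), numeric(1))
}

set.seed(123)
x <- cumsum(rnorm(100))  # toy random-walk series
apply_features(x, my_features)
```

Checking the list on a single series first makes it easy to spot a function that returns `NA` or more than one value before running a full extraction.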

## Package extensibility

The companion package [`theftdlc`](https://github.com/hendersontrent/theftdlc) ('`theft` downloadable content'---just like you get [DLCs and expansions](https://en.bandainamcoent.eu/elden-ring/elden-ring/shadow-of-the-erdtree) for video games) contains an extensive suite of functions for analysing, interpreting, and visualising time-series features calculated from `theft`. Collectively, these packages are referred to as the '`theft` ecosystem'.

Hex stickers of the theft and theftdlc packages for R

A high-level overview of how users typically access the `theft` ecosystem for R is shown below. Note that prior to `v0.6.1`, many of the `theftdlc` functions were contained in `theft` under other names. To keep the `theft` ecosystem as user-friendly as possible and able to scale to future demands, `theft` has been refactored to handle only feature extraction, while `theftdlc` handles all analysis of the extracted features. The deprecated names (such as `tsfeature_classifier()`, the outdated version of `classify()`) remain available in `theftdlc` for now.

Schematic of the theft ecosystem in R

Many more functions and customisation options are available within the packages; users are encouraged to explore the vignettes and help files for more information.

## Quick tour

`theft` and `theftdlc` combine to create an intuitive and efficient tidy feature-based workflow. Here is a single code chunk that calculates features using [`catch22`](https://github.com/hendersontrent/Rcatch22) plus a custom feature set consisting of the mean and standard deviation, then projects the feature space into an interpretable two-dimensional space using principal components analysis:

```{r, message = FALSE, warning = FALSE, fig.height=6, fig.width=6}
library(dplyr)
library(theft)
library(theftdlc)

# Extract catch22 plus two custom features, project to 2D with PCA, and plot
calculate_features(data = theft::simData,
                   group_var = "process",
                   feature_set = "catch22",
                   features = list("mean" = mean, "sd" = sd)) %>%
  project(norm_method = "RobustSigmoid",
          unit_int = TRUE,
          low_dim_method = "PCA") %>%
  plot()
```

In that example, `calculate_features` comes from `theft`, while `project` and the `plot` generic come from `theftdlc`.
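
For intuition about what a PCA projection does conceptually, here is a base-R sketch. It is not `theftdlc`'s implementation: it z-scores each column of a toy time-series-by-feature matrix (a plain substitute for the `RobustSigmoid` normalisation used above) and keeps the first two principal components from `stats::prcomp()`. The matrix `feature_matrix` and its values are simulated purely for illustration.

```{r}
set.seed(42)

# Toy feature matrix: 20 time series (rows) x 5 features (columns);
# values are simulated, not real extracted features
feature_matrix <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5,
                         dimnames = list(paste0("ts_", 1:20),
                                         paste0("feature_", 1:5)))

# z-score each feature (swapped in for RobustSigmoid), then run PCA
pca <- prcomp(feature_matrix, center = TRUE, scale. = TRUE)

# Coordinates of each time series on the first two principal components
low_dim <- pca$x[, 1:2]
head(low_dim)

# Proportion of variance captured by each of the two components
summary(pca)$importance["Proportion of Variance", 1:2]
```

Each row of `low_dim` is one time series placed in the two-dimensional space, which is the kind of layout a low-dimensional projection plot displays.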

We can likewise perform time-series classification with an equally simple workflow, here comparing the performance of `catch22` against our custom set of the first two moments of the distribution:

```{r, message = FALSE, warning = FALSE}
# Extract the same features, classify by feature set, and compare each set
# against the null
calculate_features(data = theft::simData,
                   group_var = "process",
                   feature_set = "catch22",
                   features = list("mean" = mean, "sd" = sd)) %>%
  classify(by_set = TRUE,
           n_resamples = 5,
           use_null = TRUE) %>%
  compare_features(by_set = TRUE,
                   hypothesis = "null") %>%
  head()
```

In this example, `classify` and `compare_features` come from `theftdlc`.

Please see the vignette for more information and the full functionality of both packages.

## Citation

If you use `theft` or `theftdlc` in your own work, please cite both the paper:

T. Henderson and B. D. Fulcher. [Feature-Based Time-Series Analysis in R using the theft Package](https://arxiv.org/abs/2208.06146). arXiv, (2022).

and the software:

```{r, echo = FALSE}
citation("theft")
citation("theftdlc")
```