https://github.com/hendersontrent/theft
R package for Tools for Handling Extraction of Features from Time series (theft)
- Host: GitHub
- URL: https://github.com/hendersontrent/theft
- Owner: hendersontrent
- License: other
- Created: 2021-03-25T00:17:37.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2024-10-03T08:07:02.000Z (4 months ago)
- Last Synced: 2025-01-12T13:06:34.186Z (13 days ago)
- Topics: data-visualisation, data-visualization, dimensionality-reduction, machine-learning, r, time-series
- Language: R
- Homepage: https://hendersontrent.github.io/theft/
- Size: 53.7 MB
- Stars: 40
- Watchers: 2
- Forks: 5
- Open Issues: 5
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
- awesome-time-series - theft
README
---
output: rmarkdown::github_document
---

# theft
[![CRAN version](https://www.r-pkg.org/badges/version/theft)](https://www.r-pkg.org/pkg/theft)
[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/theft)](https://www.r-pkg.org/pkg/theft)
[![DOI](https://zenodo.org/badge/351259952.svg)](https://zenodo.org/badge/latestdoi/351259952)

Tools for Handling Extraction of Features from Time series (theft)
```{r, include = FALSE}
knitr::opts_chunk$set(
comment = NA, fig.width = 12, fig.height = 8, cache = FALSE)
```

## Installation
You can install the stable version of `theft` from CRAN:
```{r eval = FALSE}
install.packages("theft")
```

You can install the development version of `theft` from GitHub using the following:
```{r eval = FALSE}
devtools::install_github("hendersontrent/theft")
```

Please also check out our paper [Feature-Based Time-Series Analysis in R using the theft Package](https://arxiv.org/abs/2208.06146), which discusses the motivation and theoretical underpinnings of `theft` and walks through all of its functionality using the [Bonn EEG dataset](https://pubmed.ncbi.nlm.nih.gov/11736210/), a well-studied neuroscience dataset.
## General purpose
`theft` is a software package for R that facilitates user-friendly access to a consistent interface for the extraction of time-series features. The package provides a single point of access to $>1200$ time-series features from a range of existing R and Python packages. The packages which `theft` 'steals' features from currently are:
* [catch22](https://link.springer.com/article/10.1007/s10618-019-00647-x) (R; [see `Rcatch22` for the native implementation on CRAN](https://github.com/hendersontrent/Rcatch22))
* [feasts](https://feasts.tidyverts.org) (R)
* [tsfeatures](https://github.com/robjhyndman/tsfeatures) (R)
* [Kats](https://facebookresearch.github.io/Kats/) (Python)
* [tsfresh](https://tsfresh.com) (Python)
* [TSFEL](https://tsfel.readthedocs.io/en/latest/) (Python)

Note that `Kats`, `tsfresh` and `TSFEL` are Python packages. `theft` has built-in functionality for helping you install these libraries; all you need to do is install Python 3.9 on your machine. If you wish to access the Python feature sets, please run `?install_python_pkgs` in R after downloading `theft` or consult the vignette in the package for more information. For a comprehensive comparison of these six feature sets across a range of domains (including computation speed, within-set feature composition, and between-set feature correlations), please refer to the paper [An Empirical Evaluation of Time-Series Feature Sets](https://ieeexplore.ieee.org/document/9679937).
As of `v0.6.1`, users can also supply their own features to `theft` (see the vignette for more information)!
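As a rough illustration of what a user-supplied feature set can look like, the sketch below builds a named list of functions and applies each one to a toy series in base R. It assumes, following the `features = list("mean" = mean, "sd" = sd)` argument used in the Quick tour, that each custom feature takes a numeric vector and returns a single number. `range_width` is a hypothetical feature invented for this example and is not part of `theft`; consult the vignette for the exact requirements.

```r
# Hypothetical custom features: each takes a numeric vector and returns one number.
custom_features <- list(
  "mean" = mean,
  "sd" = sd,
  "range_width" = function(x) diff(range(x))  # illustrative only, not part of theft
)

# A toy random walk standing in for one time series in a dataset.
set.seed(123)
x <- cumsum(rnorm(100))

# Apply every feature to the series and collect a named numeric vector.
feature_values <- vapply(custom_features, function(f) f(x), numeric(1))
print(feature_values)
```

Checking a feature list this way on a single series before passing it to `calculate_features()` is a quick way to confirm that every function returns exactly one finite value.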
## Package extensibility
The companion package [`theftdlc`](https://github.com/hendersontrent/theftdlc) ('`theft` downloadable content'---just like you get [DLCs and expansions](https://en.bandainamcoent.eu/elden-ring/elden-ring/shadow-of-the-erdtree) for video games) contains an extensive suite of functions for analysing, interpreting, and visualising time-series features calculated from `theft`. Collectively, these packages are referred to as the '`theft` ecosystem'.
A high-level overview of how the `theft` ecosystem for R is typically accessed by users is shown below. Note that prior to `v0.6.1` of `theft`, many of the `theftdlc` functions were contained in `theft` under other names. To keep the `theft` ecosystem as user-friendly as possible and able to scale to meet future demands, `theft` has been refactored to handle only feature extraction, while `theftdlc` handles all analysis of the extracted features. The deprecated names (for example, `tsfeature_classifier()`, the outdated version of `classify()`) are still available for now in `theftdlc`.
Many more functions and options for customisation are available within the packages, and users are encouraged to explore the vignettes and help files for more information.
## Quick tour
`theft` and `theftdlc` combine to create an intuitive and efficient tidy feature-based workflow. Here is an example of a single code chunk that calculates features using [`catch22`](https://github.com/hendersontrent/Rcatch22) and a custom set of mean and standard deviation, and projects the feature space into an interpretable two-dimensional space using principal components analysis:
```{r, message = FALSE, warning = FALSE, fig.height=6, fig.width=6}
library(dplyr)
library(theft)
library(theftdlc)

calculate_features(data = theft::simData,
                   group_var = "process",
                   feature_set = "catch22",
                   features = list("mean" = mean, "sd" = sd)) %>%
  project(norm_method = "RobustSigmoid",
          unit_int = TRUE,
          low_dim_method = "PCA") %>%
  plot()
```

In that example, `calculate_features` comes from `theft`, while `project` and the `plot` generic come from `theftdlc`.
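To make the normalisation and projection steps more concrete, here is a base R sketch of what a pipeline like `project(norm_method = "RobustSigmoid", unit_int = TRUE, low_dim_method = "PCA")` might do conceptually. It assumes that `"RobustSigmoid"` refers to the outlier-robust sigmoid transform (centred on the median and scaled by IQR / 1.35) and that `unit_int = TRUE` linearly rescales each feature to $[0, 1]$ afterwards. The helper names `robust_sigmoid` and `rescale_unit` and the toy feature matrix are hypothetical, and the package's internals may differ.

```r
# Outlier-robust sigmoid: centre on the median, scale by IQR / 1.35.
# Assumption: this mirrors the "RobustSigmoid" option; check the package docs.
robust_sigmoid <- function(x) {
  1 / (1 + exp(-(x - median(x)) / (IQR(x) / 1.35)))
}

# Linearly rescale to the unit interval [0, 1] (the assumed effect of unit_int = TRUE).
rescale_unit <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Toy feature matrix: 30 time series (rows) by 4 features (columns).
set.seed(123)
feature_matrix <- matrix(rnorm(120), nrow = 30, ncol = 4,
                         dimnames = list(NULL, paste0("feature_", 1:4)))
feature_matrix[1, 1] <- 50  # inject an outlier to show why a robust transform helps

# Normalise each feature (column) independently.
normalised <- apply(feature_matrix, 2, function(col) rescale_unit(robust_sigmoid(col)))

# Principal components analysis on the normalised features; keep the first two PCs.
pca <- prcomp(normalised, center = TRUE, scale. = FALSE)
scores_2d <- pca$x[, 1:2]
head(scores_2d)
```

Because the sigmoid saturates, the injected outlier maps close to 1 instead of stretching the whole feature range, so the remaining series stay spread out in the two-dimensional projection.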
We can perform time-series classification with an equally simple workflow, here comparing the performance of `catch22` against our custom set of the first two moments of the distribution:
```{r, message = FALSE, warning = FALSE}
calculate_features(data = theft::simData,
                   group_var = "process",
                   feature_set = "catch22",
                   features = list("mean" = mean, "sd" = sd)) %>%
  classify(by_set = TRUE,
           n_resamples = 5,
           use_null = TRUE) %>%
  compare_features(by_set = TRUE,
                   hypothesis = "null") %>%
  head()
```

In this example, `classify` and `compare_features` come from `theftdlc`.
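The idea behind comparing against a null can be sketched without either package. The example below assumes that `use_null = TRUE` fits additional models on shuffled class labels, so that their accuracy approximates chance performance. It uses a nearest-centroid classifier on simulated two-class data as a deliberately simple stand-in, not the classifier `theftdlc` uses. The function `holdout_accuracy` and all data here are hypothetical.

```r
set.seed(123)

# Toy data: two classes of 40 series each, described by two features.
n_per_class <- 40
features <- rbind(
  matrix(rnorm(n_per_class * 2, mean = 0), ncol = 2),
  matrix(rnorm(n_per_class * 2, mean = 1.5), ncol = 2)
)
labels <- rep(c("A", "B"), each = n_per_class)

# One resample: random 75/25 train-test split, nearest-centroid prediction.
# Nearest-centroid is a stand-in chosen for brevity, not what theftdlc fits.
holdout_accuracy <- function(features, labels) {
  test_idx <- sample(nrow(features), size = floor(0.25 * nrow(features)))
  train_x <- features[-test_idx, , drop = FALSE]
  train_y <- labels[-test_idx]
  classes <- sort(unique(train_y))
  # Class centroids (one row per class) estimated from the training split only.
  centroids <- t(sapply(classes, function(cl) {
    colMeans(train_x[train_y == cl, , drop = FALSE])
  }))
  # Assign each test row to the class with the closest centroid.
  predicted <- apply(features[test_idx, , drop = FALSE], 1, function(row) {
    distances <- rowSums(sweep(centroids, 2, row)^2)
    classes[which.min(distances)]
  })
  mean(predicted == labels[test_idx])
}

n_resamples <- 5
# Main models use the true labels; null models use shuffled labels.
main_acc <- replicate(n_resamples, holdout_accuracy(features, labels))
null_acc <- replicate(n_resamples, holdout_accuracy(features, sample(labels)))

c(main = mean(main_acc), null = mean(null_acc))
```

If the features carry real class information, the main accuracies should sit clearly above the null accuracies; a formal comparison of the two sets of resampled accuracies is what a hypothesis test against the null would then quantify.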
Please see the vignette for more information and the full functionality of both packages.
## Citation
If you use `theft` or `theftdlc` in your own work, please cite both the paper:
T. Henderson and Ben D. Fulcher. [Feature-Based Time-Series Analysis in R using the theft Package](https://arxiv.org/abs/2208.06146). arXiv, (2022).
and the software:
```{r, echo = FALSE}
citation("theft")
citation("theftdlc")
```