https://github.com/business-science/anomalize

Tidy anomaly detection
anomaly anomaly-detection decomposition detect-anomalies iqr r-package time-series


---
output: github_document
---

# Anomalize is being Superseded by Timetk

# anomalize

[![R-CMD-check](https://github.com/business-science/anomalize/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/business-science/anomalize/actions/workflows/R-CMD-check.yaml)
[![Lifecycle Status](https://img.shields.io/badge/lifecycle-superceded-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html)
[![Coverage status](https://codecov.io/gh/business-science/anomalize/branch/master/graph/badge.svg)](https://app.codecov.io/github/business-science/anomalize?branch=master)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/anomalize)](https://cran.r-project.org/package=anomalize)
![](http://cranlogs.r-pkg.org/badges/anomalize?color=brightgreen)
![](http://cranlogs.r-pkg.org/badges/grand-total/anomalize?color=brightgreen)

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  dpi = 200,
  message = FALSE,
  warning = FALSE
)
library(anomalize)
library(dplyr) # for pipe
```

The `anomalize` package functionality has been superseded by `timetk`. We suggest you begin using `timetk::anomalize()` to benefit from its enhanced functionality and from future improvements. [Learn more about Anomaly Detection with `timetk` here.](https://business-science.github.io/timetk/articles/TK08_Automatic_Anomaly_Detection.html)

The original `anomalize` package functionality will be maintained for previous code bases that use the legacy functionality.

To prevent the new `timetk` functionality from conflicting with old `anomalize` code, use these lines:

``` r
library(anomalize)

anomalize <- anomalize::anomalize
plot_anomalies <- anomalize::plot_anomalies
```
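Why this helps, as a minimal sketch: a name bound in your own workspace is found before the packages on the search path, so it keeps pointing at the namespaced function even if another package exporting the same name is attached later. The example uses `stats::filter` purely as a stand-in (an assumption for illustration, since it ships with every R installation); it is not part of `anomalize` or `timetk`.

```r
# Bind the name to one specific implementation.
# `stats::filter` is only a stand-in here; the same idea applies to
# `anomalize::anomalize` and `anomalize::plot_anomalies`.
filter <- stats::filter

# When run at the top level, the binding lives in the global environment,
# which R searches before any attached package.
exists("filter", envir = globalenv(), inherits = FALSE)

# The name still resolves to the function from the stats namespace.
environmentName(environment(filter))
```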

> Tidy anomaly detection

`anomalize` enables a tidy workflow for detecting anomalies in data. The main functions are `time_decompose()`, `anomalize()`, and `time_recompose()`. When combined, it's quite simple to decompose time series, detect anomalies, and create bands separating the "normal" data from the anomalous data.

## Anomalize In 2 Minutes (YouTube)

Check out our entire [Software Intro Series](https://www.youtube.com/watch?v=Gk_HwjhlQJs&list=PLo32uKohmrXsYNhpdwr15W143rX6uMAze) on YouTube!

## Installation

You can install the development version with `devtools` or the most recent CRAN version with `install.packages()`:

``` r
# devtools::install_github("business-science/anomalize")
install.packages("anomalize")
```

## How It Works

`anomalize` has three main functions:

- `time_decompose()`: Separates the time series into seasonal, trend, and remainder components
- `anomalize()`: Applies anomaly detection methods to the remainder component.
- `time_recompose()`: Calculates limits that separate the "normal" data from the anomalies!
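The three steps above can be sketched in base R to show the idea. This is an illustration under stated assumptions, not the package's implementation: the synthetic weekly series, the injected spikes, and the 3x IQR multiplier are all chosen for the example.

```r
# Synthetic daily series with a weekly cycle, a slow trend, and two spikes.
set.seed(42)
n <- 7 * 20
x <- 100 + 0.1 * seq_len(n) + 10 * sin(2 * pi * seq_len(n) / 7) + rnorm(n)
x[c(30, 95)] <- x[c(30, 95)] + c(60, -60)  # injected anomalies (assumption)
series <- ts(x, frequency = 7)

# Step 1: decompose into seasonal, trend, and remainder components.
fit <- stats::stl(series, s.window = "periodic", robust = TRUE)
seasonal  <- as.numeric(fit$time.series[, "seasonal"])
trend     <- as.numeric(fit$time.series[, "trend"])
remainder <- as.numeric(fit$time.series[, "remainder"])

# Step 2: flag remainders outside IQR-based fences.
# The multiplier of 3 is an assumption for this sketch.
q <- stats::quantile(remainder, c(0.25, 0.75))
fence <- 3 * (q[[2]] - q[[1]])
lower <- q[[1]] - fence
upper <- q[[2]] + fence
is_anomaly <- remainder < lower | remainder > upper

# Step 3: recompose the fences into bands on the scale of the observed data.
band_lower <- seasonal + trend + lower
band_upper <- seasonal + trend + upper

which(is_anomaly)
```

Because the observed value equals seasonal + trend + remainder, a point falls outside the recomposed band exactly when its remainder falls outside the fences.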

## Getting Started

Load the `anomalize` package. Usually, you will load the tidyverse as well!

```{r, eval = F}
library(anomalize)
library(tidyverse)
# NOTE: timetk now has anomaly detection built in, which
# will get the new functionality going forward.
# Use this script to prevent overwriting legacy anomalize:

anomalize <- anomalize::anomalize
plot_anomalies <- anomalize::plot_anomalies
```

Next, let's get some data. `anomalize` ships with a data set called `tidyverse_cran_downloads` that contains the daily CRAN download counts for 15 "tidy" packages from 2017-01-01 to 2018-03-01.

Suppose we want to determine which daily download "counts" are anomalous. It's as easy as using the three main functions (`time_decompose()`, `anomalize()`, and `time_recompose()`) along with a visualization function, `plot_anomalies()`.

```{r tidyverse_anoms_1, fig.height=8}
tidyverse_cran_downloads %>%
  # Data Manipulation / Anomaly Detection
  time_decompose(count, method = "stl") %>%
  anomalize(remainder, method = "iqr") %>%
  time_recompose() %>%
  # Anomaly Visualization
  plot_anomalies(time_recomposed = TRUE, ncol = 3, alpha_dots = 0.25) +
  ggplot2::labs(title = "Tidyverse Anomalies", subtitle = "STL + IQR Methods")
```

Check out the [`anomalize` Quick Start Guide](https://business-science.github.io/anomalize/articles/anomalize_quick_start_guide.html).

## Reducing Forecast Error by 32%

Yes! Anomalize has a new function, `clean_anomalies()`, that can be used to repair time series prior to forecasting. We have a [brand new vignette - Reduce Forecast Error (by 32%) with Cleaned Anomalies](https://business-science.github.io/anomalize/articles/forecasting_with_cleaned_anomalies.html).

```{r}
tidyverse_cran_downloads %>%
  dplyr::filter(package == "lubridate") %>%
  dplyr::ungroup() %>%
  time_decompose(count) %>%
  anomalize(remainder) %>%
  # New function that cleans & repairs anomalies!
  clean_anomalies() %>%
  dplyr::select(date, anomaly, observed, observed_cleaned) %>%
  dplyr::filter(anomaly == "Yes")
```
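To make the idea of "repairing" concrete, here is a base-R sketch that replaces each flagged observation with the sum of its seasonal and trend components. It is a simplified stand-in rather than the code behind `clean_anomalies()`; the synthetic series, the spike position, and the 3x IQR fences are assumptions for illustration.

```r
# Synthetic weekly-seasonal series with one injected spike (assumption).
set.seed(7)
n <- 7 * 12
observed <- 50 + 5 * sin(2 * pi * seq_len(n) / 7) + rnorm(n, sd = 0.5)
observed[40] <- observed[40] + 40

# Decompose, then flag remainders outside IQR-based fences.
fit <- stats::stl(ts(observed, frequency = 7), s.window = "periodic", robust = TRUE)
season    <- as.numeric(fit$time.series[, "seasonal"])
trend     <- as.numeric(fit$time.series[, "trend"])
remainder <- as.numeric(fit$time.series[, "remainder"])

q <- stats::quantile(remainder, c(0.25, 0.75))
fence <- 3 * (q[[2]] - q[[1]])
anomaly <- remainder < q[[1]] - fence | remainder > q[[2]] + fence

# Repair: anomalous points take the value implied by season + trend,
# all other points are left untouched.
observed_cleaned <- ifelse(anomaly, season + trend, observed)

data.frame(observed, observed_cleaned)[anomaly, ]
```

The cleaned series keeps the seasonal shape and the trend, which is what a forecasting model needs, while the one-off spike no longer distorts it.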

## But Wait, There's More!

There are several extra capabilities:

- `plot_anomaly_decomposition()` for visualizing the inner workings of how the algorithm detects anomalies in the "remainder".

```{r, fig.height=7}
tidyverse_cran_downloads %>%
  dplyr::filter(package == "lubridate") %>%
  dplyr::ungroup() %>%
  time_decompose(count) %>%
  anomalize(remainder) %>%
  plot_anomaly_decomposition() +
  ggplot2::labs(title = "Decomposition of Anomalized Lubridate Downloads")
```

For more information on the `anomalize` methods and their inner workings, please see the ["Anomalize Methods" vignette](https://business-science.github.io/anomalize/articles/anomalize_methods.html).

## References

Several other packages were instrumental in developing anomaly detection methods used in `anomalize`:

- Twitter's `AnomalyDetection`, which implements decomposition using median spans and the Generalized Extreme Studentized Deviation (GESD) test for anomalies.
- The `forecast::tsoutliers()` function, which implements the IQR method.
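For readers unfamiliar with GESD, the following base-R sketch shows the textbook form of the test, using the mean and standard deviation. It is not taken from either package listed above, and their variants may differ in detail; the helper name `gesd_outliers()` and its arguments are hypothetical.

```r
# Textbook GESD sketch: repeatedly remove the most extreme value, compare
# each step's test statistic to its critical value, and report the outliers
# up to the last step that was significant.
gesd_outliers <- function(x, max_outliers = 10, alpha = 0.05) {
  n <- length(x)
  idx <- seq_len(n)       # positions of the values still under test
  removed <- integer(0)   # positions removed so far, most extreme first
  significant <- logical(max_outliers)

  for (i in seq_len(max_outliers)) {
    dev <- abs(x[idx] - mean(x[idx]))
    worst <- which.max(dev)
    r_i <- dev[worst] / stats::sd(x[idx])

    # Critical value for step i, based on the t distribution.
    p <- 1 - alpha / (2 * (n - i + 1))
    t_crit <- stats::qt(p, df = n - i - 1)
    lambda_i <- (n - i) * t_crit / sqrt((n - i - 1 + t_crit^2) * (n - i + 1))

    significant[i] <- r_i > lambda_i
    removed <- c(removed, idx[worst])
    idx <- idx[-worst]
  }

  n_out <- if (any(significant)) max(which(significant)) else 0L
  removed[seq_len(n_out)]
}

# Usage on synthetic data with three injected outliers (assumption).
set.seed(1)
x <- rnorm(100)
x[c(10, 50, 90)] <- c(12, -11, 10)
sort(gesd_outliers(x))
```

Testing up to `max_outliers` candidates in sequence is what lets GESD find several outliers at once, where a single-pass test can be masked by the largest one.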

# Interested in Learning Anomaly Detection?

Business Science offers two 1-hour courses on Anomaly Detection:

- [Learning Lab 18](https://university.business-science.io/p/learning-labs-pro) - Time Series Anomaly Detection with `anomalize`

- [Learning Lab 17](https://university.business-science.io/p/learning-labs-pro) - Anomaly Detection with `H2O` Machine Learning