Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/mhahsler/arules

Mining Association Rules and Frequent Itemsets with R
https://github.com/mhahsler/arules

arules association-rules cran frequent-itemsets r

Last synced: about 17 hours ago
JSON representation

Mining Association Rules and Frequent Itemsets with R

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r echo=FALSE, results = 'asis'}
pkg <- "arules"

source("https://raw.githubusercontent.com/mhahsler/pkg_helpers/main/pkg_helpers.R")
pkg_title(pkg, anaconda = "r-arules", stackoverflow = "arules")
```

## Introduction

The arules package family for R provides the infrastructure for representing,
manipulating and analyzing transaction data and patterns
using [frequent itemsets and association rules](https://en.wikipedia.org/wiki/Association_rule_learning).
The package also provides a wide range of
[interest measures](https://mhahsler.github.io/arules/docs/measures) and mining algorithms including the code of
Christian Borgelt's popular and efficient C implementations of the association mining algorithms [Apriori](https://borgelt.net/apriori.html) and [Eclat](https://borgelt.net/eclat.html). In addition, the following mining algorithms are
available via [fim4r](https://borgelt.net/fim4r.html):

* Apriori
* Eclat
* Carpenter
* FPgrowth
* IsTa
* RElim
* SaM

Code examples can be found in
[Chapter 5 of the web book R Companion for Introduction to Data
Mining](https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/association-analysis-basic-concepts-and-algorithms.html).

```{r echo=FALSE, results = 'asis'}
pkg_citation(pkg, 2)
```

## Packages

### arules core packages

* [arules](https://cran.r-project.org/package=arules): arules base package with data structures, mining algorithms (APRIORI and ECLAT), interest measures.
* [arulesViz](https://github.com/mhahsler/arulesViz): Visualization of association rules.
* [arulesCBA](https://github.com/ianstenbit/arulesCBA): Classification algorithms based on association rules (includes CBA).
* [arulesSequences](https://cran.r-project.org/package=arulesSequences): Mining frequent sequences (cSPADE).

### Other related packages

Additional mining algorithms

* [arulesNBMiner](https://github.com/mhahsler/arulesNBMiner): Mining NB-frequent itemsets and NB-precise rules.
* [fim4r](https://borgelt.net/fim4r.html): Provides fast implementations for several mining algorithms. An interface function called `fim4r()` is provided in `arules`.
* [opusminer](https://cran.r-project.org/package=opusminer): OPUS Miner algorithm for finding the op k productive, non-redundant itemsets. Call `opus()` with `format = 'itemsets'`.
* [RKEEL](https://cran.r-project.org/package=RKEEL): Interface to KEEL's association rule mining algorithm.
* [RSarules](https://cran.r-project.org/package=RSarules): Mining algorithm which randomly samples association rules with one pre-chosen item as the consequent from a transaction dataset.

In-database analytics

* [ibmdbR](https://cran.r-project.org/package=ibmdbR): IBM in-database analytics for R can calculate association rules from a database table.
* [rfml](https://cran.r-project.org/package=rfml): Mine frequent itemsets or association rules using a MarkLogic server.

Interface

* [rattle](https://cran.r-project.org/package=rattle): Provides a graphical user interface for association rule mining.
* [pmml](https://cran.r-project.org/package=pmml): Generates PMML (predictive model markup language) for association rules.

Classification

* [arc](https://cran.r-project.org/package=arc): Alternative CBA implementation.
* [inTrees](https://cran.r-project.org/package=inTrees): Interpret Tree Ensembles provides functions for: extracting, measuring and pruning rules; selecting a compact rule set; summarizing rules into a learner.
* [rCBA](https://cran.r-project.org/package=rCBA): Alternative CBA implementation.
* [qCBA](https://cran.r-project.org/package=qCBA): Quantitative Classification by Association Rules.
* [sblr](https://cran.r-project.org/package=sbrl): Scalable Bayesian rule lists algorithm for classification.

Outlier Detection

* [fpmoutliers](https://cran.r-project.org/package=fpmoutliers): Frequent Pattern Mining Outliers.

Recommendation/Prediction

* [recommenerlab](https://github.com/mhahsler/recommenderlab): Supports creating predictions using association rules.

```{r echo=FALSE, results = 'asis'}
pkg_usage(pkg)
```

```{r echo=FALSE, results = 'asis'}
pkg_install(pkg)
```

## Usage

Load package and mine some association rules.
```{r }
library("arules")
data("IncomeESL")

trans <- transactions(IncomeESL)
trans

rules <- apriori(trans, supp = 0.1, conf = 0.9, target = "rules")
```

Inspect the rules with the highest lift.
```{r }
inspect(head(rules, n = 3, by = "lift"))
```

## Using arules with tidyverse

`arules` works seamlessly with [tidyverse](https://www.tidyverse.org/). For example:

* `dplyr` can be used for cleaning and preparing the transactions.
* `transaction()` and other functions accept `tibble` as input.
* Functions in arules can be connected with the pipe operator `|>`.
* [arulesViz](https://github.com/mhahsler/arulesViz) provides visualizations based on `ggplot2`.

For example, we can remove the ethnic information column before creating transactions and then mine and inspect rules.
```{r }
library("tidyverse")
library("arules")
data("IncomeESL")

trans <- IncomeESL |>
select(-`ethnic classification`) |>
transactions()
rules <- trans |>
apriori(
supp = 0.1, conf = 0.9, target = "rules",
control = list(verbose = FALSE)
)
rules |>
head(3, by = "lift") |>
as("data.frame") |>
tibble()
```

## Using arules from Python

`arules` and `arulesViz` can now be used directly from Python with the Python
package [`arulespy`](https://pypi.org/project/arulespy/) available form PyPI.

## Support

Please report bugs [here on GitHub.](https://github.com/mhahsler/arules/issues)
Questions should be posted on [stackoverflow and tagged with arules](https://stackoverflow.com/questions/tagged/arules).

## References

* Michael Hahsler. [ARULESPY: Exploring association rules and frequent itemsets in
Python.](http://dx.doi.org/10.48550/arXiv.2305.15263) arXiv:2305.15263 [cs.DB], May 2023.
* Michael Hahsler. [An R Companion for Introduction to Data Mining: Chapter 5](https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/association-analysis-basic-concepts-and-algorithms.html), 2021, URL: https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/
* Hahsler, Michael. [A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules](https://mhahsler.github.io/arules/docs/measures), 2015, URL: https://mhahsler.github.io/arules/docs/measures.
* Michael Hahsler, Sudheer Chelluboina, Kurt Hornik, and Christian Buchta. [The arules R-package ecosystem: Analyzing interesting patterns from large transaction datasets.](https://jmlr.csail.mit.edu/papers/v12/hahsler11a.html) _Journal of Machine Learning Research,_ 12:1977-1981, 2011.
* Michael Hahsler, Bettina Grün and Kurt Hornik. [arules - A Computational Environment for Mining Association Rules and Frequent Item Sets.](https://dx.doi.org/10.18637/jss.v014.i15) _Journal of Statistical Software,_ 14(15), 2005.