https://github.com/mlindsk/molic
Multivariate Outlierdetection In Contingency Tables
- Host: GitHub
- URL: https://github.com/mlindsk/molic
- Owner: mlindsk
- License: gpl-3.0
- Created: 2019-03-26T06:41:30.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2022-03-04T07:10:54.000Z (almost 4 years ago)
- Last Synced: 2025-10-16T21:26:37.454Z (3 months ago)
- Topics: categorical-data, contingency-tables, decomposable-graphical-models, high-dimensional-data, outlier-detection
- Language: R
- Size: 14.1 MB
- Stars: 6
- Watchers: 0
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
README
---
title: "molic: Multivariate OutLIerdetection In Contingency tables"
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE, # the knitr chunk option is `warning`, not `warnings`
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
[Build status](https://github.com/mlindsk/molic/actions)
[CRAN](https://cran.r-project.org/package=molic)
[JOSS paper](https://joss.theoj.org/papers/9fa65ced7bf3db01343d68b4488196d8)
[Zenodo DOI](https://zenodo.org/badge/latestdoi/177729633)
## About molic
An **R** package to perform outlier detection in contingency tables (i.e. categorical data) using decomposable graphical models (DGMs): models for which the underlying association between all variables can be depicted by an undirected graph. **molic** is designed to work with undirected decomposable graphs returned from `fit_graph` in the [ess](https://github.com/mlindsk/ess) package. Compute-intensive procedures are implemented using [Rcpp](http://www.rcpp.org/)/C++ for better run-time performance.
## Installation
You can install the current stable release of the package by using the `devtools` package:
```{r, eval = FALSE}
devtools::install_github("mlindsk/molic", build_vignettes = FALSE)
```
## Articles
- [The Outlier Model](https://mlindsk.github.io/molic/articles/outlier_intro.html): The "behind the scenes" of the outlier model.
- [Detecting Skin Diseases](https://mlindsk.github.io/molic/articles/dermatitis.html): An example of using the outlier model to detect skin diseases.
- [Outlier Detection in Genetic Data](https://mlindsk.github.io/molic/articles/genetic_example.html): An example of how to conduct an outlier analysis in genetic data.
## Example of Usage
```{r}
library(dplyr)
library(molic)
library(ess) # For the fit_graph function
set.seed(7) # For reproducibility
```
Subsetting the data to the psoriasis patients:
```{r}
d <- derma %>%
  filter(ES == "psoriasis") %>%
  select(-ES) %>%
  as_tibble()
```
Fitting the interaction graph:
```{r}
g <- fit_graph(d, trace = FALSE) # see package ess for details
plot(g, vertex.size = 15)
```
This plot shows how the variables are 'associated' in the psoriasis class; see [ess](https://github.com/mlindsk/ess) for more information about `fit_graph`. The outlier model exploits this knowledge instead of assuming independence between all variables (which, judging from the graph, would clearly be wrong). The graph may look very different for classes other than psoriasis.
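To see why ignoring associations can hide unusual observations, here is a minimal base-R sketch on a toy table. The variables `a` and `b` and the helper functions are hypothetical, and `-2 * log(probability)` is used only as a simple stand-in for a deviance-like score; this is not how **molic** computes its test statistic.

```r
# Toy data: two binary symptoms that almost always agree.
# (Hypothetical variables, not taken from the derma data.)
toy <- data.frame(
  a = c(rep("yes", 45), rep("no", 45), rep("yes", 5), rep("no", 5)),
  b = c(rep("yes", 45), rep("no", 45), rep("no", 5), rep("yes", 5))
)

joint  <- prop.table(table(toy$a, toy$b)) # model with an edge between a and b
marg_a <- prop.table(table(toy$a))        # independence model uses p(a) * p(b)
marg_b <- prop.table(table(toy$b))

# -2 * log-probability of a single cell under each model (illustrative score)
neg2logp_joint <- function(a, b) as.numeric(-2 * log(joint[a, b]))
neg2logp_indep <- function(a, b) as.numeric(-2 * log(marg_a[a] * marg_b[b]))

neg2logp_joint("yes", "no")  # rare combination: clearly larger score
neg2logp_joint("yes", "yes") # common combination: small score
neg2logp_indep("yes", "no")  # under independence both cells score the same
neg2logp_indep("yes", "yes")
```

Under the independence model every cell gets the same score, so the rare combination is indistinguishable from a common one; the model that keeps the association flags it.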
## Example 1 - Testing which observations within the psoriasis class are outliers
We start by fitting an outlier model that takes advantage of the fitted graph `g`, which holds information about the psoriasis patients. The print method prints information about the distribution of the (deviance) test statistic.
```{r}
m1 <- fit_outlier(d, g)
print(m1)
```
Notice that `m1` is of class 'outlier'. This means that the procedure has tested which observations _within_ the data are outliers; this is what is most often simply referred to as outlier detection. The outliers, at a 5% significance level, can now be extracted as follows:
```{r}
outs <- outliers(m1)
douts <- d[which(outs), ]
douts
```
The following plot shows the distribution of the test statistic, corresponding to the information retrieved using the print method. As an analogy, think of a simple t-test, where the test statistic follows a t-distribution: to reach a conclusion about the hypothesis, one finds the critical value and checks whether the observed test statistic is greater or less than it.
```{r}
plot(m1)
```
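The comparison against a critical value can be sketched in base R as follows. This is only an illustration: the chi-square draws are an assumed stand-in for a simulated null distribution, and `is_outlier` is a hypothetical helper, not part of **molic**.

```r
set.seed(1)

# Stand-in for test statistics simulated under the null hypothesis.
# (Assumption: chi-square draws are used purely for illustration.)
sim_stats <- rchisq(10000, df = 5)

alpha <- 0.05
crit  <- unname(quantile(sim_stats, 1 - alpha)) # empirical critical value

# An observation is flagged when its statistic exceeds the critical value
is_outlier <- function(stat) stat > crit

is_outlier(c(2.3, 25)) # first value lies in the acceptance region, second in the critical region
```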
Retrieving the observed test statistics for the individual observations:
```{r}
x1 <- douts[1, ] %>% unlist() # an outlier
x2 <- d[1, ] %>% unlist() # an inlier
dev1 <- deviance(m1, x1) # falls within the critical region in the plot (the red area)
dev2 <- deviance(m1, x2) # falls within the acceptable region in the plot
dev1
dev2
```
Retrieving the p-values:
```{r}
pval(m1, dev1)
pval(m1, dev2)
```
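One common way to turn a simulated distribution into a p-value is to take the share of simulated statistics that are at least as large as the observed one. The sketch below shows that idea in base R; `empirical_pval` is a hypothetical helper and the chi-square draws are an assumed stand-in, so this should not be read as the implementation of `pval`.

```r
set.seed(1)

# Stand-in for simulated deviances (assumption, for illustration only)
sim_stats <- rchisq(10000, df = 5)

# Empirical p-value: proportion of simulated statistics >= the observed statistic
empirical_pval <- function(observed, sims) mean(sims >= observed)

p_extreme <- empirical_pval(25, sim_stats)  # extreme statistic: p-value near 0
p_typical <- empirical_pval(2.3, sim_stats) # typical statistic: large p-value

p_extreme
p_typical
```

A small p-value corresponds to a statistic deep in the critical region of the plot, while a large one corresponds to a statistic in the acceptable region.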
## Example 2 - Testing if a new observation is an outlier
An observation from class chronic dermatitis:
```{r}
z <- derma %>%
  filter(ES == "chronic dermatitis") %>%
  select(-ES) %>%
  slice(1) %>%
  unlist()
```
Test if `z` is an outlier in class psoriasis:
```{r}
m2 <- fit_outlier(d, g, z)
print(m2)
plot(m2)
```
Notice that `m2` is of class 'novelty'. The term _novelty detection_ is sometimes used in the literature when the goal is to verify whether a new, unseen observation is an outlier in a homogeneous dataset. Retrieving the test statistic and p-value for `z`:
```{r}
dz <- deviance(m2, z)
pval(m2, dz)
```
## How To Cite
If you want to cite the **outlier method** please use
```latex
@article{lindskououtlier,
title={Outlier Detection in Contingency Tables Using Decomposable Graphical Models},
author={Lindskou, Mads and Svante Eriksen, Poul and Tvedebrink, Torben},
journal={Scandinavian Journal of Statistics},
publisher={Wiley Online Library},
doi={10.1111/sjos.12407},
year={2019}
}
```
If you want to cite the **molic** package please use
```latex
@software{lindskoumolic,
author = {Mads Lindskou},
title = {{molic: An R package for multivariate outlier
detection in contingency tables}},
month = oct,
year = 2019,
publisher = {Journal of Open Source Software},
doi = {10.21105/joss.01665},
url = {https://doi.org/10.21105/joss.01665}
}
```