{"id":32200782,"url":"https://github.com/mlindsk/molic","last_synced_at":"2025-10-22T03:55:44.895Z","repository":{"id":43380736,"uuid":"177729633","full_name":"mlindsk/molic","owner":"mlindsk","description":"Multivariate Outlierdetection In Contingency Tables","archived":false,"fork":false,"pushed_at":"2022-03-04T07:10:54.000Z","size":14775,"stargazers_count":6,"open_issues_count":0,"forks_count":6,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-10-16T21:26:37.454Z","etag":null,"topics":["categorical-data","contingency-tables","decomposable-graphical-models","high-dimensional-data","outlier-detection"],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mlindsk.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-03-26T06:41:30.000Z","updated_at":"2022-11-17T18:49:36.000Z","dependencies_parsed_at":"2022-09-02T12:51:04.539Z","dependency_job_id":null,"html_url":"https://github.com/mlindsk/molic","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mlindsk/molic","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlindsk%2Fmolic","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlindsk%2Fmolic/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlindsk%2Fmolic/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlindsk%2Fmolic/manifests","owner_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mlindsk","download_url":"https://codeload.github.com/mlindsk/molic/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlindsk%2Fmolic/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280361492,"owners_count":26317693,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-21T02:00:06.614Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["categorical-data","contingency-tables","decomposable-graphical-models","high-dimensional-data","outlier-detection"],"created_at":"2025-10-22T03:55:43.878Z","updated_at":"2025-10-22T03:55:44.890Z","avatar_url":"https://github.com/mlindsk.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\ntitle: \"molic: Multivariate OutLIerdetection In Contingency tables\"\noutput:\n  github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  message = FALSE,\n  warning = FALSE,\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n[![R build status](https://github.com/mlindsk/molic/workflows/R-CMD-check/badge.svg)](https://github.com/mlindsk/molic/actions)\n[![](https://www.r-pkg.org/badges/version/molic?color=green)](https://cran.r-project.org/package=molic)\n[![status](https://joss.theoj.org/papers/9fa65ced7bf3db01343d68b4488196d8/status.svg)](https://joss.theoj.org/papers/9fa65ced7bf3db01343d68b4488196d8)\n[![DOI](https://zenodo.org/badge/177729633.svg)](https://zenodo.org/badge/latestdoi/177729633)\n\n## About molic\n\nAn **R** package to perform outlier detection in contingency tables (i.e. categorical data) using decomposable graphical models (DGMs), that is, models for which the underlying association between all variables can be depicted by an undirected graph. **molic** is designed to work with undirected decomposable graphs returned from `fit_graph` in the [ess](https://github.com/mlindsk/ess) package. Compute-intensive procedures are implemented using [Rcpp](http://www.rcpp.org/)/C++ for better run-time performance.\n\n## Installation\n\nYou can install the current stable release of the package by using the `devtools` package: \n\n```{r, eval = FALSE}\ndevtools::install_github(\"mlindsk/molic\", build_vignettes = FALSE)\n```\n\n## Articles\n\n - [The Outlier Model](https://mlindsk.github.io/molic/articles/outlier_intro.html): The \"behind the scenes\" of the outlier model.\n - [Detecting Skin Diseases](https://mlindsk.github.io/molic/articles/dermatitis.html): An example of using the outlier model to detect skin diseases. 
\n - [Outlier Detection in Genetic Data](https://mlindsk.github.io/molic/articles/genetic_example.html): An example of how to conduct an outlier analysis in genetic data.\n\n\n## Example of Usage\n\n```{r}\nlibrary(dplyr)\nlibrary(molic)\nlibrary(ess)   # For the fit_graph function\nset.seed(7)    # For reproducibility\n```\n\nPsoriasis patients\n\n```{r}\nd \u003c- derma %\u003e%\n  filter(ES == \"psoriasis\") %\u003e%\n  select(-ES) %\u003e%\n  as_tibble()\n```\n\nFitting the interaction graph\n\n```{r}\ng \u003c- fit_graph(d, trace = FALSE) # see package ess for details\nplot(g, vertex.size = 15) \n```\n\nThis plot shows how the variables are 'associated' in the psoriasis class; see [ess](https://github.com/mlindsk/ess) for more information about `fit_graph`. The outlier model exploits this knowledge instead of assuming independence between all variables (which would clearly be a wrong assumption looking at the graph). The graph may look very different for classes other than psoriasis.\n\n## Example 1 - Testing which observations within the psoriasis class are outliers\n\nWe start by fitting an outlier model taking advantage of the fitted graph `g`, which holds information about the psoriasis patients. The print method prints information about the distribution of the (deviance) test statistic.\n\n```{r}\nm1 \u003c- fit_outlier(d, g)\nprint(m1)\n```\n\nNotice that `m1` is of class 'outlier'. This means that the procedure has tested which observations _within_ the data are outliers. This method is most often just referred to as outlier detection. The outliers, at a 5% significance level, can now be extracted as follows:\n\n```{r}\nouts  \u003c- outliers(m1)\ndouts \u003c- d[which(outs), ]\ndouts\n```\n\nThe following plot shows the distribution of the test statistic corresponding to the information retrieved using the print method. One can think of a simple t-test, where the distribution of the test statistic is a t-distribution. 
To draw a conclusion about the hypothesis, one finds the critical value and checks whether the test statistic is greater or less than it.\n\n```{r}\nplot(m1) \n```\n\nRetrieving the observed test statistics for the individual observations:\n\n```{r}\nx1   \u003c- douts[1, ] %\u003e% unlist() # an outlier\nx2   \u003c- d[1, ] %\u003e% unlist()     # an inlier\ndev1 \u003c- deviance(m1, x1) # falls within the critical region in the plot (the red area)\ndev2 \u003c- deviance(m1, x2) # falls within the acceptable region in the plot\ndev1\ndev2\n```\n\nRetrieving the p-values:\n\n```{r}\npval(m1, dev1)\npval(m1, dev2)\n```\n\n## Example 2 - Testing if a new observation is an outlier\n \nAn observation from class chronic dermatitis: \n\n```{r}\nz \u003c- derma %\u003e%\n  filter(ES == \"chronic dermatitis\") %\u003e%\n  select(-ES) %\u003e%\n  slice(1) %\u003e%\n  unlist()\n```\n\nTest if `z` is an outlier in class psoriasis:\n\n```{r}\nm2 \u003c- fit_outlier(d, g, z)\nprint(m2)\nplot(m2)\n```\n\nNotice that `m2` is of class 'novelty'. The term _novelty detection_ is sometimes used in the literature when the goal is to verify if a new unseen observation is an outlier in a homogeneous dataset. 
Retrieving the test statistic and p-value for `z`\n\n```{r}\ndz \u003c- deviance(m2, z)\npval(m2, dz)\n```\n\n## How To Cite\n\nIf you want to cite the **outlier method** please use\n\n```latex\n@article{lindskououtlier,\n  title={Outlier Detection in Contingency Tables Using Decomposable Graphical Models},\n  author={Lindskou, Mads and Svante Eriksen, Poul and Tvedebrink, Torben},\n  journal={Scandinavian Journal of Statistics},\n  publisher={Wiley Online Library},\n  doi={10.1111/sjos.12407},\n  year={2019}\n}\n```\n\nIf you want to cite the **molic** package please use\n\n```latex\n@software{lindskoumolic,\n  author       = {Mads Lindskou},\n  title        = {{molic: An R package for multivariate outlier \n                   detection in contingency tables}},\n  month        = oct,\n  year         = 2019,\n  publisher    = {Journal of Open Source Software},\n  doi          = {10.21105/joss.01665},\n  url          = {https://doi.org/10.21105/joss.01665}\n}\n```\n\n\u003c!-- Also, see the [jti](https://github.com/mlindsk/jti) package which is used for making inference in Bayesian networks or DGMs; in the latter case, one can exploit the **ess** package. --\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmlindsk%2Fmolic","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmlindsk%2Fmolic","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmlindsk%2Fmolic/lists"}