{"id":18501597,"url":"https://github.com/const-ae/mixdir","last_synced_at":"2025-04-09T18:33:14.336Z","repository":{"id":56936728,"uuid":"118273118","full_name":"const-ae/mixdir","owner":"const-ae","description":"Cluster high dimensional categorical datasets","archived":false,"fork":false,"pushed_at":"2023-09-11T18:19:38.000Z","size":344,"stargazers_count":14,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-23T09:08:14.823Z","etag":null,"topics":["categorical-data","clustering","questionnaires","r-package","variational-inference"],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/const-ae.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-01-20T19:17:24.000Z","updated_at":"2024-02-18T12:08:42.000Z","dependencies_parsed_at":"2022-08-21T05:50:51.393Z","dependency_job_id":"d4973802-6980-4612-9e37-68912d00b7a7","html_url":"https://github.com/const-ae/mixdir","commit_stats":{"total_commits":60,"total_committers":1,"mean_commits":60.0,"dds":0.0,"last_synced_commit":"52e8747aa5d2f0cf8c70cab6719ff1176d4b19eb"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/const-ae%2Fmixdir","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/const-ae%2Fmixdir/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/const-ae%2Fmixdir/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/const-ae%2Fmixdir/manifests","owner_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/owners/const-ae","download_url":"https://codeload.github.com/const-ae/mixdir/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248087907,"owners_count":21045608,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["categorical-data","clustering","questionnaires","r-package","variational-inference"],"created_at":"2024-11-06T13:54:22.830Z","updated_at":"2025-04-09T18:33:14.319Z","avatar_url":"https://github.com/const-ae.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput:\n  md_document:\n    variant: markdown_github\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, echo = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README_plots/\"\n)\n```\n\n# mixdir\n\nThe goal of mixdir is to cluster high-dimensional categorical datasets.\n\nIt can\n\n* handle missing data\n* infer a reasonable number of latent classes (try `mixdir(select_latent=TRUE)`)\n* cluster datasets with more than 70,000 observations and 60 features\n* propagate uncertainty and produce a soft clustering\n\n\nA detailed description of the algorithm and the features of the package can \nbe found in the accompanying [paper](http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=8631438\u0026isnumber=8631391).\nIf you find the package useful, please cite\n\n\u003eC. Ahlmann-Eltze and C. 
Yau, \"MixDir: Scalable Bayesian Clustering for High-Dimensional Categorical Data\", \n2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), Turin, Italy, 2018, pp. 526-539.\n\n## Installation\n\n```{r installation, eval=FALSE, include=TRUE}\ninstall.packages(\"mixdir\")\n\n# Or to get the latest version from GitHub\ndevtools::install_github(\"const-ae/mixdir\")\n```\n\n\n## Example\n\nClustering the [mushroom](https://archive.ics.uci.edu/ml/datasets/mushroom) data set.\n\n![](man/figures/README_plots/clustering_overview.png)\n\n```{r example_load}\n# Loading the library and the data\nlibrary(mixdir)\nset.seed(1)\n\ndata(\"mushroom\")\n# High-dimensional dataset: 8124 mushrooms and 23 different features\nmushroom[1:10, 1:5]\n```\n\nCalling the clustering function `mixdir` on a subset of the data:\n\n```{r}\n# Clustering into 3 latent classes\nresult \u003c- mixdir(mushroom[1:1000,  1:5], n_latent=3)\n```\n\n\nAnalyzing the result\n\n```{r example}\n# Latent class of the first 10 mushrooms\nhead(result$pred_class, n=10)\n\n# Soft clustering for the first 10 mushrooms\nhead(result$class_prob, n=10)\npheatmap::pheatmap(result$class_prob, cluster_cols=FALSE,\n                  labels_col = paste(\"Class\", 1:3))\n\n# Structure of latent class 1\n# (bruises, cap color either yellow or white, edible etc.)\npurrr::map(result$category_prob, 1)\n\n# The most predictive features for each class\nfind_predictive_features(result, top_n=3)\n# For example: if all I know about a mushroom is that it has a\n# yellow cap, then I am 99% certain that it will be in class 1\npredict(result, c(`cap-color`=\"yellow\"))\n\n# Note that the most predictive features are different from the most typical ones\nfind_typical_features(result, top_n=3)\n```\n\nDimensionality Reduction\n\n```{r fig.width=8, fig.asp=0.31}\n# Defining Features\ndef_feat \u003c- find_defining_features(result, mushroom[1:1000,  1:5], n_features = 3)\nprint(def_feat)\n\n# Plotting the most important 
features gives an immediate impression\n# of how the clusters differ\nplot_features(def_feat$features, result$category_prob)\n```\n\n\n\n# Underlying Model\n\nThe package implements a variational inference algorithm to solve a Bayesian latent class model (LCM). \n\n\n\u003cdiv class=\"figure\"\u003e\u003cimg src=\"man/figures/README_plots/equations_model.png\" align=\"center\" style=\"height: 150px\" \u003e\u003c/div\u003e\n\n\n\n![](man/figures/README_plots/model_plate_notation.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconst-ae%2Fmixdir","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fconst-ae%2Fmixdir","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconst-ae%2Fmixdir/lists"}