https://github.com/dlab-berkeley/Unsupervised-Learning-in-R

Workshop (6 hours): Clustering (Hdbscan, LCA, Hopach), dimension reduction (UMAP, GLRM), and anomaly detection (isolation forests).
clustering dimensionality-reduction glrm hdbscan isolation-forests latent-class-analysis umap unsupervised-learning

# Unsupervised Learning in R

Unsupervised machine learning is a class of algorithms that identifies patterns
in unlabeled data, i.e. without reference to an outcome or target variable. This
workshop describes and demonstrates unsupervised learning algorithms for
**clustering** (HDBSCAN, latent class analysis, HOPACH), **dimensionality
reduction** (UMAP, generalized low-rank models), and **anomaly detection**
(isolation forests). Participants will learn how to structure unsupervised
learning analyses and will gain familiarity with example code that can be
adapted to their own projects.
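
As a rough illustration of the three task types named above, the following sketch uses only base R stand-ins (k-means, PCA, and a distance-to-center rule), not the algorithms or code taught in the workshop. The choice of the built-in `iris` data, three clusters, and a 95th-percentile cutoff are assumptions made for the example.

```r
# Illustrative only: base R stand-ins, not the workshop's own code.
set.seed(1)

# Drop the label column (Species) so the analysis sees only unlabeled features.
x <- scale(iris[, 1:4])

# Dimensionality reduction: project onto the first two principal components.
pca <- prcomp(x)
scores <- pca$x[, 1:2]

# Clustering: partition observations into 3 groups without using Species.
km <- kmeans(x, centers = 3, nstart = 20)

# Anomaly detection (crude): flag points unusually far from their cluster center.
dist_to_center <- sqrt(rowSums((x - km$centers[km$cluster, ])^2))
anomalies <- which(dist_to_center > quantile(dist_to_center, 0.95))

table(km$cluster)  # cluster sizes
head(scores)       # first rows of the 2-D projection
anomalies          # row indices of flagged observations
```

The same structure (prepare features, reduce, cluster, flag outliers) should carry over when the stand-ins are swapped for the methods covered in the workshop.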

**Author**: [Chris Kennedy](http://github.com/ck37)

## Prerequisites

This is an intermediate machine learning workshop. Participants should have
significant prior experience with R and RStudio, including manipulation of data
frames, installation of packages, and plotting.

**Prerequisite workshops**

* [R Fundamentals](https://github.com/dlab-berkeley/R-Fundamentals) or similar training in R basics.

**Recommended workshops**

* [Machine Learning in R](https://github.com/dlab-berkeley/Machine-Learning-in-R) or other supervised learning experience.

## Technology requirements

Participants should have access to a computer with the following software:

* [R version 3.6](https://cran.rstudio.com/) or greater
* [RStudio](https://rstudio.com/products/rstudio/download/#download)
* [RTools](https://cran.r-project.org/bin/windows/Rtools/) - if using Windows
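
One way to confirm the R version requirement from within an R session is sketched below. The variable names `r_ok` and `on_windows` are illustrative, and the check does not verify that RStudio or RTools is actually installed.

```r
# Compare the running R version against the 3.6 minimum listed above.
r_ok <- getRversion() >= "3.6.0"
if (!r_ok) {
  warning("R ", as.character(getRversion()),
          " is older than 3.6; please upgrade before the workshop.")
}

# RTools is only relevant on Windows.
on_windows <- .Platform$OS.type == "windows"
message("R version OK: ", r_ok, "; Windows (RTools needed): ", on_windows)
```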

## Initial steps for participants

To prepare for the workshop, please download the materials and work through the package installation in `0-install.Rmd`. Please report any errors to the [GitHub issue queue](https://github.com/dlab-berkeley/Unsupervised-Learning-in-R/issues).

There is also an [RStudio Cloud workspace](https://rstudio.cloud/project/930459) that can be used.

## Reporting errors or giving feedback

Please [create a GitHub issue](https://github.com/dlab-berkeley/Unsupervised-Learning-in-R/issues) to report any errors or give feedback on this workshop.

## Resources

**Books**

* Boehmke & Greenwell (2019). [Hands-on Machine Learning with R](https://bradleyboehmke.github.io/HOML/) - free online version
* Hennig et al. (2015). [Handbook of Cluster Analysis](https://smile.amazon.com/Handbook-Cluster-Analysis-Handbooks-Statistical-ebook/dp/B019FNKOJ4) - thorough and highly recommended
* Aggarwal & Reddy (2014). [Data Clustering: Algorithms and Applications](https://smile.amazon.com/Data-Clustering-Algorithms-Applications-Knowledge-ebook/dp/B00EYROAQU/) - great complement to Hennig et al.
* Dolnicar et al. (2018). [Market Segmentation Analysis](https://smile.amazon.com/Market-Segmentation-Analysis-Understanding-Professionals-ebook/dp/B07FQDSF3X/) - free, closely tied to R, and chapter 7 is especially helpful
* Izenman (2013). [Modern Multivariate Statistical Techniques](https://www.amazon.com/Modern-Multivariate-Statistical-Techniques-Classification-ebook/dp/B00HWUR9CS/)
* Everitt et al. (2011). [Cluster Analysis](https://www.amazon.com/Cluster-Analysis-Wiley-Probability-Statistics-ebook/dp/B005CPJSME)