Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dlab-berkeley/Unsupervised-Learning-in-R
Workshop (6 hours): Clustering (Hdbscan, LCA, Hopach), dimension reduction (UMAP, GLRM), and anomaly detection (isolation forests).
clustering dimensionality-reduction glrm hdbscan isolation-forests latent-class-analysis umap unsupervised-learning
- Host: GitHub
- URL: https://github.com/dlab-berkeley/Unsupervised-Learning-in-R
- Owner: dlab-berkeley
- License: other
- Created: 2020-01-10T18:20:34.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-06-08T16:45:20.000Z (over 4 years ago)
- Last Synced: 2024-08-03T06:03:23.753Z (3 months ago)
- Topics: clustering, dimensionality-reduction, glrm, hdbscan, isolation-forests, latent-class-analysis, umap, unsupervised-learning
- Language: R
- Homepage: https://dlab-berkeley.github.io/Unsupervised-Learning-in-R/slides.html
- Size: 472 KB
- Stars: 45
- Watchers: 10
- Forks: 12
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Unsupervised Learning in R
Unsupervised machine learning is a class of algorithms that identifies patterns
in unlabeled data, i.e., without considering an outcome or target. This workshop
will describe and demonstrate powerful unsupervised learning algorithms used for
**clustering** (HDBSCAN, latent class analysis, HOPACH), **dimensionality
reduction** (UMAP, generalized low-rank models), and **anomaly detection** (isolation forests).
Participants will learn how to structure unsupervised
learning analyses and will gain familiarity with example code that can be
adapted to their own projects.

**Author**: [Chris Kennedy](http://github.com/ck37)
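The three task types listed above can be illustrated with a minimal base-R sketch on the built-in `iris` data. This is not the workshop's code. To keep it dependency-free, it swaps in principal component analysis (`prcomp`) for UMAP/GLRM, Ward hierarchical clustering (`hclust`) for HDBSCAN/LCA/HOPACH, and squared Mahalanobis distance for isolation forests. The 1% outlier cutoff is an arbitrary assumption.

```r
# Minimal sketch of an unsupervised workflow using only base R (stats package).
# Stand-in techniques (assumptions, not the workshop's methods):
#   PCA instead of UMAP/GLRM, Ward hierarchical clustering instead of
#   HDBSCAN/LCA/HOPACH, Mahalanobis distance instead of isolation forests.

# 1. Keep only the numeric features and standardize them;
#    the Species label is dropped, so the data are unlabeled.
x <- scale(iris[, 1:4])

# 2. Dimensionality reduction: project onto the first two principal components.
pca <- prcomp(x)
embedding <- pca$x[, 1:2]

# 3. Clustering: Ward hierarchical clustering on Euclidean distances,
#    cut into 3 groups.
hc <- hclust(dist(x), method = "ward.D2")
clusters <- cutree(hc, k = 3)

# 4. Anomaly detection: squared Mahalanobis distance from the multivariate mean;
#    observations above the 99th percentile are flagged as candidate outliers.
scores <- mahalanobis(x, center = colMeans(x), cov = cov(x))
outliers <- which(scores > quantile(scores, 0.99))

table(clusters)
head(embedding)
outliers
```

Swapping each stand-in for the workshop's algorithms keeps the same shape: prepare the unlabeled features, then reduce, cluster, and score them.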
## Prerequisites
This is an intermediate machine learning workshop. Participants should have
significant prior experience with R and RStudio, including manipulation of data
frames, installation of packages, and plotting.

**Prerequisite workshops**
* [R Fundamentals](https://github.com/dlab-berkeley/R-Fundamentals) or similar training in R basics.

**Recommended workshops**

* [Machine Learning in R](https://github.com/dlab-berkeley/Machine-Learning-in-R) or other supervised learning experience.
## Technology requirements
Participants should have access to a computer with the following software:
* [R version 3.6](https://cran.rstudio.com/) or greater
* [RStudio](https://rstudio.com/products/rstudio/download/#download)
* [Rtools](https://cran.r-project.org/bin/windows/Rtools/) - if using Windows
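The R version requirement can be confirmed from within R with the sketch below, which uses only base R. The Windows branch looks for `make` on the `PATH`. That is a rough heuristic for whether Rtools is available, not a definitive check.

```r
# Check that the installed R meets the stated minimum (R 3.6).
# getRversion() returns a numeric_version, which compares correctly with "3.6.0".
r_ok <- getRversion() >= "3.6.0"
message("R version: ", getRversion(),
        if (r_ok) " (OK)" else " (please upgrade to R 3.6 or later)")

# On Windows, compiling packages from source requires Rtools.
# Sys.which() returns "" when a program is not found on the PATH,
# so this only tests whether a 'make' executable is visible to R.
if (.Platform$OS.type == "windows") {
  has_make <- nzchar(Sys.which("make"))
  message("'make' found on PATH (Rtools heuristic): ", has_make)
}
```

For a more thorough check of the Windows build toolchain, the pkgbuild package provides `pkgbuild::has_build_tools(debug = TRUE)`.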
## Initial steps for participants

To prepare for the workshop, please download the materials and work through the package installation in `0-install.Rmd`. Please report any errors to the [GitHub issue queue](https://github.com/dlab-berkeley/Unsupervised-Learning-in-R/issues).
There is also an [RStudio Cloud workspace](https://rstudio.cloud/project/930459) that can be used.
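Before working through the installation, it can help to see which packages are still missing. The helper `missing_packages()` below is hypothetical. The package names in `workshop_pkgs` are assumptions chosen for illustration, and `0-install.Rmd` remains the authoritative list.

```r
# Hypothetical helper: return the packages from `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE, rather than raising an error,
  # when a package is not installed; quietly = TRUE suppresses its message.
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Assumed package names, for illustration only; see 0-install.Rmd for the real list.
workshop_pkgs <- c("dbscan", "poLCA", "uwot", "h2o")
to_install <- missing_packages(workshop_pkgs)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install the missing packages from CRAN:
  # install.packages(to_install)
}
```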
## Reporting errors or giving feedback

Please [create a GitHub issue](https://github.com/dlab-berkeley/Unsupervised-Learning-in-R/issues) to report any errors or give feedback on this workshop.
## Resources
Books
* Boehmke & Greenwell (2019). [Hands-On Machine Learning with R](https://bradleyboehmke.github.io/HOML/) - free online version
* Hennig et al. (2015). [Handbook of Cluster Analysis](https://smile.amazon.com/Handbook-Cluster-Analysis-Handbooks-Statistical-ebook/dp/B019FNKOJ4) - thorough and highly recommended
* Aggarwal & Reddy (2014). [Data Clustering: Algorithms and Applications](https://smile.amazon.com/Data-Clustering-Algorithms-Applications-Knowledge-ebook/dp/B00EYROAQU/) - great complement to Hennig et al.
* Dolnicar et al. (2018). [Market Segmentation Analysis](https://smile.amazon.com/Market-Segmentation-Analysis-Understanding-Professionals-ebook/dp/B07FQDSF3X/) - free, closely tied to R, and chapter 7 is especially helpful
* Izenman (2013). [Modern Multivariate Statistical Techniques](https://www.amazon.com/Modern-Multivariate-Statistical-Techniques-Classification-ebook/dp/B00HWUR9CS/)
* Everitt et al. (2011). [Cluster Analysis](https://www.amazon.com/Cluster-Analysis-Wiley-Probability-Statistics-ebook/dp/B005CPJSME)