https://github.com/bearloga/dpmclust
Bayesian nonparametric clustering in R using DP-means
- Host: GitHub
- URL: https://github.com/bearloga/dpmclust
- Owner: bearloga
- License: other
- Created: 2018-10-08T16:08:43.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2020-03-06T17:43:25.000Z (over 6 years ago)
- Last Synced: 2025-04-10T16:13:03.929Z (about 1 year ago)
- Language: R
- Size: 141 KB
- Stars: 12
- Watchers: 2
- Forks: 3
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# DP-means clustering in R
This package implements the DP-means algorithm introduced by Kulis and Jordan in their article *[Revisiting k-means: New Algorithms via Bayesian Nonparametrics](https://arxiv.org/abs/1111.0352)*. Instead of specifying how many clusters to partition the data into, as one would with k-means, the user specifies a penalty parameter λ, which controls if and when new clusters are created during iterations.

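The algorithm starts with a single cluster and then processes the data points one at a time: each point is assigned to its nearest cluster center, unless its squared Euclidean distance to every existing center exceeds λ, in which case the point starts a new cluster. The centers are then recomputed as cluster means, and both steps repeat until the assignments stop changing.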
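To make the loop concrete, here is a minimal sketch of DP-means in base R, following the description in Kulis and Jordan's paper. It is not the code used by this package: the function name `dp_means_sketch()`, the `max_iter` argument, and the returned list are hypothetical, and the sketch assumes a numeric matrix input and squared Euclidean distance.

```R
# Illustrative DP-means sketch; a hypothetical helper, NOT the dpmclust source.
dp_means_sketch <- function(x, lambda, max_iter = 100) {
  x <- as.matrix(x)
  n <- nrow(x)
  centers <- matrix(colMeans(x), nrow = 1)  # one initial cluster at the global mean
  cluster <- rep(1L, n)
  for (iter in seq_len(max_iter)) {
    old <- cluster
    for (i in seq_len(n)) {
      # squared Euclidean distance from point i to every current center
      xi <- matrix(x[i, ], nrow(centers), ncol(x), byrow = TRUE)
      d2 <- rowSums((centers - xi)^2)
      if (min(d2) > lambda) {
        # too far from all centers: point i starts a new cluster
        centers <- rbind(centers, x[i, ])
        cluster[i] <- nrow(centers)
      } else {
        cluster[i] <- which.min(d2)
      }
    }
    # drop clusters that lost all their points, relabel as 1..k
    used <- sort(unique(cluster))
    cluster <- match(cluster, used)
    # recompute each center as the mean of its assigned points
    centers <- do.call(rbind, lapply(seq_along(used), function(k) {
      colMeans(x[cluster == k, , drop = FALSE])
    }))
    if (identical(cluster, old)) break  # assignments unchanged: converged
  }
  list(cluster = cluster, centers = centers)
}
```

In this sketch λ is compared against squared distances, so it is on the squared scale of the data; a larger λ makes new clusters harder to create and tends to give fewer clusters.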
## Installation
```R
# install.packages("remotes")
remotes::install_github("bearloga/dpmclust")
```
## Usage
`dp_means()` returns an object with the same class and components as `kmeans()` does, which makes it easy to use other packages that support the `kmeans` object (e.g. [`autoplot()` in the `ggfortify` package](https://cran.r-project.org/web/packages/ggfortify/vignettes/plot_pca.html)).
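```R
library(dpmclust)
x <- as.matrix(iris[, 1:4])   # example data; a numeric matrix is assumed here
y <- dp_means(x, lambda = 1)
y$cluster                     # cluster assignment for each row of x
```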
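For reference, the class and components in question can be inspected on a base R `kmeans()` fit. The snippet below uses only `stats::kmeans()` and the built-in `iris` data (the data set is chosen purely for illustration); a `dp_means()` result is described above as exposing the same structure, which this snippet does not itself verify.

```R
# What a kmeans object looks like in base R (no dpmclust needed).
set.seed(42)
x <- as.matrix(iris[, 1:4])
km <- kmeans(x, centers = 3)

class(km)         # the S3 class that downstream packages dispatch on
names(km)         # components such as cluster, centers, size, tot.withinss
table(km$cluster) # number of points per cluster
km$centers        # one row per cluster center
```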
## Future Work
The [lambda means](https://ieeexplore.ieee.org/document/7899984) algorithm for choosing an optimal λ still needs to be implemented.