BayesianNonparametrics.jl
===========
[![Build Status](https://travis-ci.org/OFAI/BayesianNonparametrics.jl.svg?branch=master)](https://travis-ci.org/OFAI/BayesianNonparametrics.jl)
[![Coverage Status](https://coveralls.io/repos/github/OFAI/BayesianNonparametrics.jl/badge.svg?branch=master)](https://coveralls.io/github/OFAI/BayesianNonparametrics.jl?branch=master)

*BayesianNonparametrics* is a Julia package implementing state-of-the-art Bayesian nonparametric models for medium-sized unsupervised problems. It aims to make Bayesian nonparametric models accessible to non-specialists, with an emphasis on consistency, performance and ease of use.

*BayesianNonparametrics* allows you to

- explain discrete or continuous data using Dirichlet Process Mixtures or Hierarchical Dirichlet Process Mixtures
- analyse variable dependencies using the Variable Clustering Model
- fit multivariate or univariate distributions for discrete or continuous data with conjugate priors
- compute point estimates from Dirichlet Process Mixture posterior samples

#### News
*BayesianNonparametrics* is Julia 0.7 / 1.0 compatible

Installation
------------
You can install the package using Julia's built-in package manager, i.e. from the Pkg REPL:

```julia
pkg> add BayesianNonparametrics
```
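
Equivalently, it can be installed from a script or the Julia prompt via the Pkg API:

```julia
using Pkg
Pkg.add("BayesianNonparametrics")
```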

Documentation
-------------
Documentation is available in Markdown:
[documentation](docs/README.md)

Example
-------
The following example illustrates the use of *BayesianNonparametrics* for clustering of continuous observations using a Dirichlet Process Mixture of Gaussians.

After loading the package:

```julia
using BayesianNonparametrics
```

we can generate a 2D synthetic dataset (or use a multivariate continuous dataset of interest):

```julia
(X, Y) = bloobs(randomize = false)
```
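
Here `X` holds the observations row-wise, i.e. it is an `N × D` matrix (the same layout assumed by `size(X)` further below), and `Y` contains the corresponding labels, which are not needed for the unsupervised example. A quick dimension check:

```julia
(N, D) = size(X)  # number of observations and their dimensionality
```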

Next, we construct the parameters of our base distribution:

```julia
using Statistics  # mean and cov live in the Statistics stdlib on Julia ≥ 0.7

μ0 = vec(mean(X, dims = 1))  # prior mean
κ0 = 5.0                     # prior strength (pseudo-count) on the mean
ν0 = 9.0                     # degrees of freedom
Σ0 = cov(X)                  # prior scale matrix
H = WishartGaussian(μ0, κ0, ν0, Σ0)
```

After defining the base distribution we can specify the model:

```julia
model = DPM(H)
```

which in this case is a Dirichlet Process Mixture. Each model has to be initialised; one possible initialisation approach for Dirichlet Process Mixtures is a k-means initialisation:

```julia
modelBuffer = init(X, model, KMeansInitialisation(k = 10))
```

The resulting buffer object can now be used to perform posterior inference on the model given `X`. In the following, we apply Gibbs sampling for 500 iterations without burn-in or thinning:

```julia
models = train(modelBuffer, DPMHyperparam(), Gibbs(maxiter = 500))
```

You should see the progress of the sampling process in the command line. After applying Gibbs sampling, the resulting posterior samples can be explored, for example by their posterior densities:

```julia
densities = map(m -> m.energy, models)
```
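
For instance, to continue with a single representative sample, one might pick the one with the highest density value (a minimal sketch, assuming that a larger `energy` corresponds to a more probable configuration):

```julia
# pick the posterior sample with the largest stored density value
bestmodel = models[argmax(densities)]
```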

Similarly, we can inspect the number of active components of each sample:

```julia
activeComponents = map(m -> sum(m.weights .> 0), models)
```
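
These counts can be summarised further, e.g. by tallying how often each number of active components occurs across the posterior samples (plain Julia, no additional packages required):

```julia
# how often does each component count appear among the posterior samples?
tally = Dict(k => count(==(k), activeComponents) for k in unique(activeComponents))
```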

The groupings of the observations are available as well:

```julia
assignments = map(m -> m.assignments, models)
```

The following animation illustrates posterior samples obtained by a Dirichlet Process Mixture:

![alt text](posteriorSamples.gif "Posterior Sample")

Alternatively, one can compute a point estimate based on the posterior similarity matrix:

```julia
# posterior similarity matrix: PSM[i, j] is the fraction of posterior samples
# in which observations i and j are assigned to the same component
A = reduce(hcat, assignments)  # N × M matrix, one column per posterior sample
(N, D) = size(X)
PSM = ones(N, N)
M = size(A, 2)
for i in 1:N
    for j in 1:i-1
        PSM[i, j] = sum(A[i, :] .== A[j, :]) / M
        PSM[j, i] = PSM[i, j]
    end
end
```
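
The resulting matrix is symmetric with ones on its diagonal. As a rough sketch, it can be visualised as a heatmap, assuming Plots.jl is installed (it is not a dependency of this package):

```julia
using Plots  # assumed to be installed separately
heatmap(PSM, title = "Posterior similarity matrix")
```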

We then find the optimal partition which minimizes the lower bound of the variation of information:

```julia
mink = minimum(length(m.weights) for m in models)
maxk = maximum(length(m.weights) for m in models)
(peassignments, _) = pointestimate(PSM, method = :average, mink = mink, maxk = maxk)
```
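
The returned `peassignments` is a single partition of the observations; as an illustrative snippet (not part of the package API), its number of clusters and their sizes can be read off directly:

```julia
# number of clusters and their sizes in the point estimate
nclusters = length(unique(peassignments))
sizes = [count(==(k), peassignments) for k in unique(peassignments)]
```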

The grouping which minimizes the lower bound of the variation of information is illustrated in the following image:
![alt text](pointestimate.png "Point Estimate")