https://github.com/tmaklin/rcgpar

c++ library for parallel and distributed estimation of mixture model components using variational inference.
https://github.com/tmaklin/rcgpar

c-plus-plus hpc mixture-model mpi openmp variational-inference

Last synced: 5 months ago
JSON representation

c++ library for parallel and distributed estimation of mixture model components using variational inference.

Host: GitHub
URL: https://github.com/tmaklin/rcgpar
Owner: tmaklin
License: lgpl-2.1
Created: 2021-11-03T13:28:24.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2025-11-15T12:53:10.000Z (7 months ago)
Last Synced: 2025-11-15T14:35:00.248Z (7 months ago)
Topics: c-plus-plus, hpc, mixture-model, mpi, openmp, variational-inference
Language: C++
Homepage:
Size: 1.19 MB
Stars: 0
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          **Note** rcgpar has been superceded by

[mixt](https://codeberg.org/themaklin/mixt) as of 24 November 2025.

# rcgpar - Fit mixture models in HPC environments

rcgpar provides CPU and GPU implementations of a variational

inference algorithm for estimating mixture model components from a

likelihood matrix in parallel.

## Installation

rcgpar is currently only available compiled from source.

## Compiling from source

### Requirements

- C++17 compliant compiler.

- cmake

#### Optional

- Compiler with OpenMP support.

- [LibTorch](https://pytorch.org/get-started/locally/)

- CUDA Toolkit or ROCm

### Compiling

Clone the rcgpar repository

```

git clone https://github.com/tmaklin/rcgpar

```

enter the directory and run

```

cd rcgpar

mkdir build

cd build

```

... and follow the instructions below.

#### CPU estimation only

in the `build/` directory, run

```

cmake ..

make

```

This creates the `librcgomp` library in `build/lib/`.

#### GPU acceleration

instead of the above, run

```

cmake -DCMAKE_LIBTORCH_PATH=/absolute/path/to/libtorch ..

make

```

where `/absolute/path/to/libtorch` should be the absolute (!) path to the LibTorch distribution.

This creates the `librcgomp` and `librcggpu` libraries in `build/lib/`.

## Usage

Link against `librcgomp` and/or `librcggpu` and include the

`rcgpar.hpp` header in your project. This header provides four

functions:

- 'rcgpar::rcg\_optl\_omp' using OpenMP

- 'rcgpar::rcg\_optl\_torch' using LibTorch

- 'rcgpar::em\_torch' a different algorithm using LibTorch.

The LibTorch algorithms will run on the GPU if one is

present. Otherwise, they will run on the CPU. These algorithms are

faster even when ran on the CPU but `rcg_optl_torch` consumes more

memory than `rcg_optl_omp`.

### rcg\_optl\_omp, rcg\_optl\_mpi, rcg\_optl\_torch, and em\_torch

These four functions perform the actual model fitting. All have to be called with the following

arguments:

```

const rcgpar::Matrix &logl:

    KxN row-major order matrix containing the log-likelihoods for theobservations,

    where K is the number of components and N is the number of observations.

const std::vector &log_times_observed:

    N-dimensional vector which contains the natural logarithm of the number

	of times that the N:th row in `logl` should be counted. Useful if many

	rows in the log-likelihood matrix are identical - they can be compressed

	by counting them several times via this argument.

const std::vector &alpha0:

    N-dimensional vector containing the prior parameters of the Dirichlet

	distribution that is used as a conjugate prior in the model. Good

	default choice is to set all entries to 1.

const double &tol:

    The estimation process will terminate once the evidence lower bound

	ELBO changes by less than this value from one iteration to the next.

	Good choices are around 1e-6 and 1e-8, adjust according to your needs.

const uint16_t maxiters:

    Maximum number of iterations to run the optimizer for if the tolerance

	criterion is not fulfilled.

std::ostream &log:

    Print status messages here. Silence the messages by supplying a

	std::ofstream that has not been assigned to any file.

```

'em\_torch' requires the extra argument:

```

std::string precision:

    Either "float" or "double", which determines the precision of the algorithm.

```

The optimizers return a KxN `rcgpar::Matrix` type row-major order

matrix, where each row is a probability vector assigning the row to

the mixture components.

Note: rcg\_optl\_mpi assumes that the root process holds the full

'logl' and 'log\_times\_observed values', which are then distributed

from the root process to other processes. Contrary to this, 'alpha0',

'tol', and 'maxiters' are assumed to be present on all processes when

calling rcg\_optl\_mpi.

### mixture\_components and mixture\_components\_torch

Use 'rcgpar::mixture\_components\(_torch)' to transform the matrix from

rcg\_optl\_omp/mpi/torch into a probability vector containing the relative

contributions of each mixture component. 'mixture\_components\(_torch)' takes

the following input arguments:

```

const rcgpar::Matrix &probs:

    The matrix returned from rcg_optl_omp/torch, em_torch, or rcg_optl_mpi.

const std::vector &log_times_observed:

    The N-dimensional vector of log times observed that was used

	as input to the call to rcg_optl_omp/torch, em_torch, or rcg_optl_mpi.

```

'mixture\_components\(_torch)' will return a N-dimensional probability vector

containing the mixture component proportions.

### Creating the input matrix

rcgpar requires the input log-likelihood matrix formatted with the

internal rcgpar::Matrix class. If your input log-likelihoods are

stored in a flattened vector, you can construct the input object to

rcg\_optl\_omp/mpi with the constructor:

```

Matrix(std::vector &flattened_logl,

               uint16_t n_mixture_components, uint32_t n_observations)

```

If your data is stored in a 2D vector, use the following constructor:

```

Matrix(std::vector> &logl_2D)

```

Note that both constructors assume the data is stored in row-major

order.

## License

The source code from this project is subject to the terms of the

LGPL-2.1 license. A copy of the LGPL-2.1 license is supplied with the

project, or can be obtained at

https://opensource.org/licenses/LGPL-2.1.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/tmaklin/rcgpar

Awesome Lists containing this project

README