
# StatsLib   [![Mentioned in Awesome Cpp](https://awesome.re/mentioned-badge.svg)](https://github.com/fffaraz/awesome-cpp#math) [![Build Status](https://github.com/kthohr/stats/actions/workflows/main.yml/badge.svg)](https://github.com/kthohr/stats/actions/workflows/main.yml) [![Coverage Status](https://codecov.io/github/kthohr/stats/coverage.svg?branch=master)](https://codecov.io/github/kthohr/stats?branch=master) [![License](https://img.shields.io/badge/Licence-Apache%202.0-blue.svg)](./LICENSE) [![Documentation Status](https://readthedocs.org/projects/statslib/badge/?version=latest)](https://statslib.readthedocs.io/en/latest/?badge=latest)

StatsLib is a templated C++ library of statistical distribution functions, featuring unique compile-time computing capabilities and seamless integration with several popular linear algebra libraries.

Features:
* A header-only library of probability density functions, cumulative distribution functions, quantile functions, and random sampling methods.
* Functions are written in C++11 `constexpr` format, enabling the library to operate as both a compile-time and run-time computation engine.
* Designed with a simple **R**-like syntax.
* Optional vector-matrix functionality with wrappers to support:
    * STL Vectors (`std::vector`)
    * [Armadillo](http://arma.sourceforge.net/)
    * [Blaze](https://bitbucket.org/blaze-lib/blaze)
    * [Eigen](http://eigen.tuxfamily.org/index.php)
* Matrix-based operations are parallelizable with OpenMP.
* Released under a permissive, non-GPL license.

### Contents:
* [Distributions](#distributions)
* [Installation](#installation-and-dependencies)
* [Documentation](#documentation)
* [Jupyter Notebook](#jupyter-notebook)
* [Options](#options)
* [Syntax and Examples](#syntax-and-examples)
* [Compile-time Computation Capabilities](#compile-time-computing-capabilities)
* [Author and License](#author)

## Distributions

Functions to compute the cdf, pdf, and quantile, as well as random sampling methods, are available for the following distributions:

* Bernoulli
* Beta
* Binomial
* Cauchy
* Chi-squared
* Exponential
* F
* Gamma
* Inverse-Gamma
* Inverse-Gaussian
* Laplace
* Logistic
* Log-Normal
* Normal (Gaussian)
* Poisson
* Rademacher
* Student's t
* Uniform
* Weibull

In addition, pdf and random sampling functions are available for several multivariate distributions:

* Inverse-Wishart
* Multivariate Normal
* Wishart

## Installation and Dependencies

StatsLib is a header-only library. Simply add the header files to your project using
```cpp
#include "stats.hpp"
```

The only dependencies are the latest version of [GCEM](https://github.com/kthohr/gcem) and a C++11-compatible compiler.

## Documentation

Full documentation is available online:

[![Documentation Status](https://readthedocs.org/projects/statslib/badge/?version=latest)](https://statslib.readthedocs.io/en/latest/?badge=latest)

A PDF version of the documentation is available [here](https://buildmedia.readthedocs.org/media/pdf/statslib/latest/statslib.pdf).

## Jupyter Notebook

You can test the library online using an interactive Jupyter notebook:

[![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/kthohr/stats/master?filepath=notebooks%2Fstats.ipynb)

## Options

The following options should be declared **before** including the StatsLib header files.

* For inline-only functionality (i.e., no `constexpr` specifiers):
```cpp
#define STATS_GO_INLINE
```

* OpenMP functionality is enabled by default if the `_OPENMP` macro is detected (e.g., by invoking `-fopenmp` with GCC or Clang). To explicitly enable OpenMP features use:
```cpp
#define STATS_USE_OPENMP
```

* To disable OpenMP functionality:
```cpp
#define STATS_DONT_USE_OPENMP
```

* To use StatsLib with Armadillo, Blaze or Eigen:
```cpp
#define STATS_ENABLE_ARMA_WRAPPERS
#define STATS_ENABLE_BLAZE_WRAPPERS
#define STATS_ENABLE_EIGEN_WRAPPERS
```

* To enable wrappers for `std::vector`:
```cpp
#define STATS_ENABLE_STDVEC_WRAPPERS
```

## Syntax and Examples

Functions are called using an **R**-like syntax. Some general rules:

* density functions: `stats::d*`. For example, the Normal (Gaussian) density is called using
``` cpp
stats::dnorm(<value>,<mean parameter>,<standard deviation>);
```
* cumulative distribution functions: `stats::p*`. For example, the Gamma CDF is called using
``` cpp
stats::pgamma(<value>,<shape parameter>,<scale parameter>);
```
* quantile functions: `stats::q*`. For example, the Beta quantile is called using
``` cpp
stats::qbeta(<value>,<a parameter>,<b parameter>);
```
* random sampling: `stats::r*`. For example, to generate a single draw from the Logistic distribution:
``` cpp
stats::rlogis(<location parameter>,<scale parameter>,<seed value or random number engine>);
```
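
To make the `d*`/`p*`/`q*`/`r*` naming convention concrete, here is a minimal standard-library sketch of the four function families for the Logistic distribution. The `sketch` namespace and the function bodies are assumptions written for illustration only; they are not StatsLib's implementation and do not share its `constexpr` or templated design.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical namespace: illustrative stand-ins, NOT StatsLib code.
namespace sketch {

// d*: density of Logistic(mu, sigma) evaluated at x
inline double dlogis(double x, double mu, double sigma) {
    assert(sigma > 0.0);
    const double e = std::exp(-(x - mu) / sigma);
    return e / (sigma * (1.0 + e) * (1.0 + e));
}

// p*: cumulative distribution function evaluated at x
inline double plogis(double x, double mu, double sigma) {
    assert(sigma > 0.0);
    return 1.0 / (1.0 + std::exp(-(x - mu) / sigma));
}

// q*: quantile function (inverse of plogis) for p in (0, 1)
inline double qlogis(double p, double mu, double sigma) {
    assert(p > 0.0 && p < 1.0 && sigma > 0.0);
    return mu + sigma * std::log(p / (1.0 - p));
}

// r*: one random draw via inverse-transform sampling
inline double rlogis(double mu, double sigma, std::mt19937_64& engine) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    double u = unif(engine);
    while (u <= 0.0) {  // the quantile is undefined at p = 0, so redraw
        u = unif(engine);
    }
    return qlogis(u, mu, sigma);
}

}  // namespace sketch
```

The quantile and CDF are inverses of each other, so `sketch::plogis(sketch::qlogis(p, mu, sigma), mu, sigma)` should recover `p` up to floating-point error.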


All of these functions have matrix-based equivalents using Armadillo, Blaze, and Eigen dense matrices.

* The pdf, cdf, and quantile functions can take matrix-valued arguments. For example,

```cpp
// Using Armadillo:
arma::mat norm_pdf_vals = stats::dnorm(arma::ones(10,20),1.0,2.0);
```

* The randomization functions (`r*`) can output random matrices of arbitrary size. For example, the following code will generate a 100-by-50 matrix of iid draws from a Gamma(3,2) distribution:

```cpp
// Armadillo:
arma::mat gamma_rvs = stats::rgamma<arma::mat>(100,50,3.0,2.0);

// Blaze:
blaze::DynamicMatrix<double> gamma_rvs = stats::rgamma<blaze::DynamicMatrix<double>>(100,50,3.0,2.0);

// Eigen:
Eigen::MatrixXd gamma_rvs = stats::rgamma<Eigen::MatrixXd>(100,50,3.0,2.0);
```

* All matrix-based operations are parallelizable with OpenMP. For GCC and Clang compilers, simply include the `-fopenmp` option during compilation.
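
As a rough picture of what an element-wise, optionally parallel density evaluation looks like, the following sketch applies a Normal pdf to every entry of a `std::vector`. The `sketch::dnorm` and `sketch::dnorm_vec` names are hypothetical and use only the standard library; StatsLib's own wrappers, return types, and OpenMP handling may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical namespace: illustrative stand-ins, NOT StatsLib code.
namespace sketch {

// Scalar Normal(mu, sigma) density evaluated at x
inline double dnorm(double x, double mu, double sigma) {
    const double z = (x - mu) / sigma;
    const double two_pi = 2.0 * 3.141592653589793;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(two_pi));
}

// Element-wise density over a vector. Each element is independent,
// so the loop can be split across threads when OpenMP is enabled.
inline std::vector<double> dnorm_vec(const std::vector<double>& x,
                                     double mu, double sigma) {
    assert(sigma > 0.0);
    std::vector<double> out(x.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
#ifdef _OPENMP
    #pragma omp parallel for
#endif
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        out[k] = dnorm(x[k], mu, sigma);
    }
    return out;
}

}  // namespace sketch
```

Compiled without `-fopenmp`, the pragma is skipped and the loop runs serially; the results are the same either way because no iteration depends on another.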

### Seeding Values

Random number seeding is available in two forms: seed values and random number engines.

* Seed values are passed as unsigned integers. For example, to generate a draw from a normal distribution N(1,2) with seed value 1776:
``` cpp
stats::rnorm(1,2,1776);
```
* Random engines in StatsLib use the 64-bit Mersenne-Twister generator (`std::mt19937_64`) and are passed by reference. Example:
``` cpp
std::mt19937_64 engine(1776);
stats::rnorm(1,2,engine);
```
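
The difference between the two seeding forms can be sketched with the standard library alone. The helpers below (`rnorm_engine`, `rnorm_seed`) are hypothetical names, and they use `std::normal_distribution`, which is an assumption made for illustration; the values they return are not expected to match StatsLib's draws.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical namespace: illustrative stand-ins, NOT StatsLib code.
namespace sketch {

// Engine form: the caller owns the engine and passes it by reference,
// so successive calls keep advancing the same random stream.
inline double rnorm_engine(double mu, double sigma, std::mt19937_64& engine) {
    assert(sigma > 0.0);
    std::normal_distribution<double> dist(mu, sigma);
    return dist(engine);
}

// Seed form: construct a fresh engine from the seed, then draw once.
// Calling this twice with the same seed repeats the same draw.
inline double rnorm_seed(double mu, double sigma, std::uint64_t seed) {
    std::mt19937_64 engine(seed);
    return rnorm_engine(mu, sigma, engine);
}

}  // namespace sketch
```

In this sketch, the seed form suits a single reproducible draw, while the engine form is the better fit when many draws should come from one reproducible stream.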

### Examples

More examples with code:
```cpp
// evaluate the normal PDF at x = 1, mu = 0, sigma = 1
double dval_1 = stats::dnorm(1.0,0.0,1.0);

// evaluate the normal PDF at x = 1, mu = 0, sigma = 1, and return the log value
double dval_2 = stats::dnorm(1.0,0.0,1.0,true);

// evaluate the normal CDF at x = 1, mu = 0, sigma = 1
double pval = stats::pnorm(1.0,0.0,1.0);

// evaluate the Laplace quantile at p = 0.1, mu = 0, sigma = 1
double qval = stats::qlaplace(0.1,0.0,1.0);

// draw from a t-distribution with dof = 30
double rval = stats::rt(30);

// matrix output
arma::mat beta_rvs = stats::rbeta<arma::mat>(100,100,3.0,2.0);

// matrix input
arma::mat beta_cdf_vals = stats::pbeta(beta_rvs,3.0,2.0);
```

## Compile-time Computing Capabilities

StatsLib is designed to operate equally well as a compile-time computation engine. Compile-time computation allows the compiler to replace function calls (e.g., `dnorm(0,0,1)`) with static values in the source code. That is, functions are evaluated during compilation rather than at run-time. This capability is made possible by the templated `constexpr` design of the library and can be verified by inspecting the assembly code generated by the compiler.

The compile-time features are enabled using the `constexpr` specifier. The example below computes the pdf, cdf, and quantile function of the Laplace distribution.
```cpp
#include "stats.hpp"

int main()
{
    constexpr double dens_1  = stats::dlaplace(1.0,1.0,2.0); // answer = 0.25
    constexpr double prob_1  = stats::plaplace(1.0,1.0,2.0); // answer = 0.5
    constexpr double quant_1 = stats::qlaplace(0.1,1.0,2.0); // answer = -2.218875...

    return 0;
}
```
Assembly code generated by Clang without any optimization:
```assembly
LCPI0_0:
    .quad   -4611193153885729483    ## double -2.2188758248682015
LCPI0_1:
    .quad   4602678819172646912     ## double 0.5
LCPI0_2:
    .quad   4598175219545276417     ## double 0.25000000000000006
    .section    __TEXT,__text,regular,pure_instructions
    .globl  _main
    .p2align    4, 0x90
_main:                                  ## @main
    push    rbp
    mov     rbp, rsp
    xor     eax, eax
    movsd   xmm0, qword ptr [rip + LCPI0_0] ## xmm0 = mem[0],zero
    movsd   xmm1, qword ptr [rip + LCPI0_1] ## xmm1 = mem[0],zero
    movsd   xmm2, qword ptr [rip + LCPI0_2] ## xmm2 = mem[0],zero
    mov     dword ptr [rbp - 4], 0
    movsd   qword ptr [rbp - 16], xmm2
    movsd   qword ptr [rbp - 24], xmm1
    movsd   qword ptr [rbp - 32], xmm0
    pop     rbp
    ret
```
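
For readers who want to see how a C++11 `constexpr` density can be written at all, the sketch below builds a Laplace pdf and cdf from a recursive Taylor-series exponential. Everything in the `sketch` namespace is a simplified assumption for illustration: `exp_c` is accurate only for moderate arguments, and it is a stand-in for, not a copy of, the GCEM routines that StatsLib depends on.

```cpp
// Hypothetical namespace: illustrative stand-ins, NOT StatsLib or GCEM code.
namespace sketch {

// constexpr absolute value
constexpr double abs_c(double x) { return x < 0.0 ? -x : x; }

// Taylor series for exp(x), summed recursively so that the function
// body is a single return statement, as C++11 constexpr requires.
constexpr double exp_c(double x, int n = 0, double term = 1.0, double sum = 0.0) {
    return n > 40 ? sum : exp_c(x, n + 1, term * x / (n + 1), sum + term);
}

// Laplace(mu, sigma) density evaluated at x
constexpr double dlaplace(double x, double mu, double sigma) {
    return exp_c(-abs_c(x - mu) / sigma) / (2.0 * sigma);
}

// Laplace(mu, sigma) cumulative distribution function evaluated at x
constexpr double plaplace(double x, double mu, double sigma) {
    return x < mu ? 0.5 * exp_c(-(mu - x) / sigma)
                  : 1.0 - 0.5 * exp_c(-(x - mu) / sigma);
}

}  // namespace sketch

// These checks are evaluated entirely by the compiler.
static_assert(sketch::abs_c(sketch::dlaplace(1.0, 1.0, 2.0) - 0.25) < 1e-12,
              "Laplace pdf at the location parameter");
static_assert(sketch::abs_c(sketch::plaplace(1.0, 1.0, 2.0) - 0.5) < 1e-12,
              "Laplace cdf at the location parameter");
```

Because the `static_assert` conditions must be constant expressions, a successful build is itself evidence that both functions were evaluated at compile time.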

## Author

Keith O'Hara

## License

Apache Version 2