# PyIBS: Inverse Binomial Sampling in Python

## What is it?

PyIBS is a Python implementation of the Inverse Binomial Sampling (IBS) estimator for obtaining unbiased, efficient estimates of the log-likelihood of a model by simulation, originally implemented [in MATLAB](https://github.com/acerbilab/ibs). [[1](#references)]

## When should I use PyIBS?

The typical scenario is one in which you have a *simulator*, that is, a model from which you can randomly draw synthetic observations (for a given parameter vector), but whose log-likelihood you cannot evaluate analytically or numerically. In other words, IBS affords likelihood-based inference for models without explicit likelihood functions (also known as implicit models).
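For concreteness, here is a minimal sketch of what such a simulator might look like (the model and response rule are purely hypothetical):

```python
import numpy as np

# Hypothetical simulator: not part of PyIBS, just an illustration of
# the kind of generative model IBS works with.
def sample_from_model(params, design_matrix):
    # Probability of responding "1" depends on the stimulus in column 0
    # through a logistic rule with slope params[0] (purely illustrative).
    p_resp = 1.0 / (1.0 + np.exp(-params[0] * design_matrix[:, 0]))
    # Draw one random binary response per trial (row of design_matrix).
    return np.random.rand(design_matrix.shape[0]) < p_resp
```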

IBS is commonly used as a part of an algorithm for maximum-likelihood estimation or Bayesian inference.

- For maximum-likelihood (or maximum-a-posteriori) estimation, we recommend using PyIBS combined with [Bayesian Adaptive Direct Search (PyBADS)](https://github.com/acerbilab/pybads), as sketched below.
- For Bayesian inference of posterior distributions and model evidence, we recommend using PyIBS with [Variational Bayesian Monte Carlo (PyVBMC)](https://github.com/acerbilab/pyvbmc).
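For the PyBADS case, the wiring might look like this (a sketch only: the bounds and starting point are placeholders, and the argument and option names follow the PyBADS documentation at the time of writing):

```python
import numpy as np
from pybads import BADS  # requires PyBADS

# `ibs` is an initialized IBS object (see Quick start below); by default
# calling it returns a noisy estimate of the negative log-likelihood.
target = lambda theta: ibs(np.atleast_1d(theta))

bads = BADS(
    target,
    x0=np.array([0.5]),
    lower_bounds=np.array([-5.0]),
    upper_bounds=np.array([5.0]),
    options={"uncertainty_handling": True},  # the target is stochastic
)
result = bads.optimize()
print(result["x"])  # maximum-likelihood estimate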

This repository contains Python implementations and examples of IBS.

## Installation

PyIBS is available via `pip` and `conda-forge`.

Install with:
```console
python -m pip install pyibs
```
or:
```console
conda install --channel=conda-forge pyibs
```
PyIBS requires Python version 3.9 or newer.

## Quick start

The typical workflow of PyIBS follows four steps:

1. Define the model function (a generative function from which you can draw synthetic observations);
2. Set up the problem configuration (response matrix, design matrix);
3. Initialize and then run the estimator OR use the estimator as objective function in an optimization (for example [Bayesian Adaptive Direct Search (PyBADS)](https://github.com/acerbilab/pybads) or [Variational Bayesian Monte Carlo (PyVBMC)](https://github.com/acerbilab/pyvbmc));
4. Examine and visualize the results.

Initializing and running the estimator in step 3 only involves a couple of lines of code:

```python
from pyibs import IBS
# ...
ibs = IBS(sample_from_model, response_matrix, design_matrix)
neg_logl = ibs(params, num_reps, additional_output, return_positive)
```

with input arguments for the initialization:

- ``sample_from_model``: the model function; it takes as input a parameter vector ``params`` and ``design_matrix``, and generates a matrix of simulated model responses (one row per trial, corresponding to the rows of ``design_matrix``);
- ``response_matrix``: the observed responses;
- ``design_matrix``: the matrix used as input to sample from the model;

and the following optional input arguments:
- ``vectorized``: indicates whether to use a vectorized sampling algorithm with acceleration. If it is not given, the vectorized algorithm is used if the time to generate samples for each trial is less than ``vectorized_threshold``.
- ``acceleration``: acceleration factor for vectorized sampling, default = ``1.5``;
- ``num_samples_per_call``: number of starting samples per trial per function call. If equal to ``0`` the number of starting samples is chosen automatically, default = ``0``;
- ``max_iter``: maximum number of iterations (per trial and estimate), default = ``1e5``;
- ``max_time``: maximum time for an IBS call (in seconds), default = ``np.inf``;
- ``max_samples``: maximum number of samples per function call, default = ``1e4``;
- ``acceleration_threshold``: threshold at which to stop accelerating (in seconds), default = ``0.1``;
- ``vectorized_threshold``: maximum threshold for using the vectorized algorithm (in seconds), default = ``0.1``;
- ``max_mem``: maximum number of samples for the vectorized implementation, default = ``1e6``;
- ``neg_logl_threshold``: threshold for the negative log-likelihood (works differently in vectorized version), default = ``np.inf``.
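For example, using the optional arguments above, one could force the vectorized algorithm and tighten the sampling budget (the values here are illustrative, and keyword usage is assumed from the parameter names above):

```python
ibs = IBS(
    sample_from_model, response_matrix, design_matrix,
    vectorized=True,     # skip the automatic timing heuristic
    max_iter=10**4,      # fewer iterations per trial and estimate
    max_samples=10**3,   # tighter cap on samples per function call
)
```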

Input arguments for running the estimator:

- ``params``: parameter vector used to simulate the model's responses;
- ``num_reps``: number of independent log-likelihood estimates to calculate; the average of the repetitions is returned, default = ``10``;
- ``additional_output``: the output type; if not given, only the negative log-likelihood is returned. If equal to:
  - ``var``, the negative log-likelihood and the variance of the estimate are returned;
  - ``std``, the negative log-likelihood and the standard deviation of the estimate are returned;
  - ``full``, a dictionary is returned with additional information about the estimate;
- ``return_positive``: boolean indicating whether to return the positive log-likelihood. If not given, the negative log-likelihood estimate is returned.
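Putting these together, typical calls might look as follows (keyword usage is assumed from the parameter names above):

```python
neg_logl = ibs(params)                                # default: 10 repetitions
neg_logl, var = ibs(params, additional_output="var")  # estimate + variance
neg_logl, std = ibs(params, num_reps=20,
                    additional_output="std")          # 20 reps + std. deviation
logl = ibs(params, return_positive=True)              # positive log-likelihood
```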

The outputs are:

- ``neg_logl``: the negative log-likelihood (or the positive log-likelihood if ``return_positive`` is ``True``);
- ``neg_logl_var``: the variance of the negative log-likelihood estimate (if ``additional_output`` is ``var``);
- ``neg_logl_std``: the standard deviation of the negative log-likelihood estimate (if ``additional_output`` is ``std``).

If ``additional_output`` is ``full``, a dictionary is returned with the following additional information about the sampling:

- ``exit_flag``: the exit flag (0 = correct termination, 1 = negative log-likelihood threshold reached, 2 = maximum runtime reached, 3 = maximum iterations reached);
- ``message``: the exit message;
- ``elapsed_time``: the elapsed time (in seconds);
- ``num_samples_per_trial``: the number of samples per trial;
- ``fun_count``: the number of ``sample_from_model`` function evaluations in the call.
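As a self-contained toy example (the coin-flip model below is hypothetical, chosen only because its exact log-likelihood is easy to check):

```python
import numpy as np
from pyibs import IBS

rng = np.random.default_rng(0)

# Toy simulator: every trial is a coin flip with probability params[0] of "1".
def sample_from_model(params, design_matrix):
    return rng.random(design_matrix.shape[0]) < params[0]

num_trials = 500
design_matrix = np.zeros((num_trials, 1))        # no stimulus dependence here
response_matrix = rng.random(num_trials) < 0.7   # synthetic "observed" data

ibs = IBS(sample_from_model, response_matrix, design_matrix)
neg_logl, neg_logl_std = ibs(np.array([0.7]), num_reps=20,
                             additional_output="std")
print(f"IBS estimate: {neg_logl:.2f} +/- {neg_logl_std:.2f}")
```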

## Code

- `ibs_basic.py` is a bare-bones implementation of IBS for didactic purposes.
- `psycho_generator.py` and `psycho_neg_logl.py` are functions implementing, respectively, the generative model (simulator) and the negative log-likelihood function for the orientation discrimination model used in the example notebooks.
- `ibs.py` is an advanced vectorized implementation of IBS which supports several additional features: repeated sampling, early stopping via a negative log-likelihood threshold, and returning the variance or standard deviation of the estimate.
  - Initialize an IBS object by passing it a generative model, a response matrix, and a design matrix. Call the object with a parameter vector to return an estimate of the *negative* log-likelihood.
  - Note that by default it returns the *negative* log-likelihood, as it is meant to be used with an optimization method such as [PyBADS](https://github.com/acerbilab/pybads). Set `return_positive=True` to return the *positive* log-likelihood.
  - If you want to run IBS with [PyVBMC](https://github.com/acerbilab/pyvbmc), note that you need to pass the following arguments when calling the IBS object (see the sketch after this list):
    - `return_positive=True` to return the *positive* log-likelihood;
    - `additional_output="std"` to return the standard deviation of the estimate as the second output.
- `ibs_simple_example.ipynb` is an example notebook for running `ibs_basic.py`. It is only for didactic purposes.
- `ibs_example_1_basic_use.ipynb` is an example notebook for running `ibs.py`. It shows how to run the estimation and how to obtain the different output types, with examples using the orientation discrimination model and a binomial model, and it checks the unbiasedness of the estimator; this notebook is only for didactic purposes.
- `ibs_example_2_parameter_estimation.ipynb` is a full working example of IBS usage. It requires the installation of [PyBADS](https://github.com/acerbilab/pybads) and [PyVBMC](https://github.com/acerbilab/pyvbmc).
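For the PyVBMC case mentioned above, the wiring might look like this (a sketch only: the bounds are placeholders, the argument and option names follow the PyVBMC documentation at the time of writing, and for full Bayesian inference you would add a log-prior to the target):

```python
import numpy as np
from pyvbmc import VBMC  # requires PyVBMC

# IBS as a noisy target for VBMC: positive log-likelihood plus the
# standard deviation of the estimate (`ibs` as in the Quick start).
def noisy_log_density(theta):
    logl, logl_std = ibs(np.atleast_1d(theta), return_positive=True,
                         additional_output="std")
    return logl, logl_std  # VBMC expects (value, noise estimate)

vbmc = VBMC(
    noisy_log_density,
    x0=np.array([0.5]),
    lower_bounds=np.array([0.0]),
    upper_bounds=np.array([1.0]),
    plausible_lower_bounds=np.array([0.2]),
    plausible_upper_bounds=np.array([0.8]),
    options={"specify_target_noise": True},  # the target is stochastic
)
vp, results = vbmc.optimize()
```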

## References

1. van Opheusden\*, B., Acerbi\*, L. & Ma, W.J. (2020). Unbiased and efficient log-likelihood estimation with inverse binomial sampling. *PLoS Computational Biology* 16(12): e1008483. (\* equal contribution) ([link](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008483))