Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.
https://github.com/adamhaber/pybrms

Pythonic interface to R's brms
https://github.com/adamhaber/pybrms
Last synced: 4 days ago
JSON representation
Pythonic interface to R's brms
Host: GitHub
URL: https://github.com/adamhaber/pybrms
Owner: adamhaber
License: apache-2.0
Created: 2019-12-10T10:11:57.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2022-02-24T05:31:38.000Z (over 2 years ago)
Last Synced: 2024-06-14T16:29:45.668Z (8 days ago)
Language: Jupyter Notebook
Size: 1.45 MB
Stars: 43
Watchers: 1
Forks: 7
Open Issues: 7
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Lists

awesome-stars - pybrms
README

        # pybrms 

> a pythonic interface for R's brms

brms is a fantastic R package that allows users to fit many kinds of Bayesian regression models - linear models, GLMs, survival analysis, etc - all in a multilevel context. Models are concisely specified using R's formula syntax, and the corresponding Stan program and data are automatically generated.

pybrms aims to bring the ease-of-use of brms to python users; more sampling, less index-gymnastics and shape errors.

## Install

Install via pip:

`pip install pybrms`

This installs the python package along with its pythonic dependencies; when you first call `pybrms`, it'll install `brms` and its dependencies. Don't worry if you don't have R or brms installed - they will be installed in your current virtual environment.

## Example 

Let's use pybrms to fit a poisson regression model, including an interaction term and by-patient varying intercept. `pybrms` can import all datasets that are included in `brms`:

```python

epilepsy = pybrms.get_brms_data("epilepsy")

```

Fitting the model is as simple as it is in `brms`:

```python

model = pybrms.fit(

    formula = "count ~ zAge + zBase * Trt + (1 | patient)",

    data = epilepsy, 

    family = "poisson"

)

```

    INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_6f531b7e8a9bc73464e98930b52f4547 NOW.

The user can also specify a list of priors, a `family` argument (is it a gaussian regression, binomial, poisson, etc), and optional `pystan` arguments like the number of chains, samples, etc. 

When sampling is completed, `fit` returns a `pystan` `StanFit4Model` object. The generated stan code is also available:

```python

model.get_stanmodel().show()

```

    StanModel object 'anon_model_6f531b7e8a9bc73464e98930b52f4547' coded as follows:

    // generated with brms 2.10.5

    functions {

    }

    data {

      int N;  // number of observations

      int Y[N];  // response variable

      int K;  // number of population-level effects

      matrix[N, K] X;  // population-level design matrix

      // data for group-level effects of ID 1

      int N_1;  // number of grouping levels

      int M_1;  // number of coefficients per level

      int J_1[N];  // grouping indicator per observation

      // group-level predictor values

      vector[N] Z_1_1;

      int prior_only;  // should the likelihood be ignored?

    }

    transformed data {

      int Kc = K - 1;

      matrix[N, Kc] Xc;  // centered version of X without an intercept

      vector[Kc] means_X;  // column means of X before centering

      for (i in 2:K) {

        means_X[i - 1] = mean(X[, i]);

        Xc[, i - 1] = X[, i] - means_X[i - 1];

      }

    }

    parameters {

      vector[Kc] b;  // population-level effects

      // temporary intercept for centered predictors

      real Intercept;

      vector[M_1] sd_1;  // group-level standard deviations

      // standardized group-level effects

      vector[N_1] z_1[M_1];

    }

    transformed parameters {

      // actual group-level effects

      vector[N_1] r_1_1 = (sd_1[1] * (z_1[1]));

    }

    model {

      // initialize linear predictor term

      vector[N] mu = Intercept + Xc * b;

      for (n in 1:N) {

        // add more terms to the linear predictor

        mu[n] += r_1_1[J_1[n]] * Z_1_1[n];

      }

      // priors including all constants

      target += student_t_lpdf(Intercept | 3, 1, 10);

      target += student_t_lpdf(sd_1 | 3, 0, 10)

        - 1 * student_t_lccdf(0 | 3, 0, 10);

      target += normal_lpdf(z_1[1] | 0, 1);

      // likelihood including all constants

      if (!prior_only) {

        target += poisson_log_lpmf(Y | mu);

      }

    }

    generated quantities {

      // actual population-level intercept

      real b_Intercept = Intercept - dot_product(means_X, b);

    }

    

## How it works

Behind the scene, `pybrms` calls `brms` via `rpy2`, handling the python-to-R-objects transitions in both directions and making sure that Stan gets the dtypes it expects by parsing the model's data block. 

More specifically, `pybrms` calls two `brms` functions: `make_stancode` and `make_standata`, which are used to generate the appropriate model code, design matrices, etc. These are then "pulled back" to python and fed into `pystan`.

## Adding priors

By defaults, `brms` uses non- or weakly-informative priors on model parameters. You can specify more informative priors using the following syntax:

```python

model = pybrms.fit("count ~ zAge + zBase * Trt + (1|patient) + (1|obs)",

                                           data = epilepsy, family = "poisson",

                                           priors = [("student_t(5,0,10)", "b"),

                                                     ("cauchy(0,2)", "sd")]

                                          )

```

    INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_3b2730a3196dc4b804b959d98397ba09 NOW.

Priors are passed as a list of tuples that conform to `brms` `set_prior` order of arguments - the first element is a Stan distribution, second is the class (b, Intercept, sd), third is coef, etc.

## Error handling

Error handling happens at the `rpy2` level, which catches the (R) error and displays it as a python `RRuntimeError` exception:

```python

model = pybrms.fit("count_typo ~ zAge + zBase * Trt + (1|patient) + (1|obs)",

                                           data = epilepsy, family = "poisson",

                                           priors = [("student_t(5,0,10)", "b"),

                                                     ("cauchy(0,2)", "sd")]

                                          )

```

    WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: Error: The following variables are missing in 'data':

    'count_typo'

    

    ---------------------------------------------------------------------------

    RRuntimeError                             Traceback (most recent call last)

     in 

          2                                            data = epilepsy, family = "poisson",

          3                                            priors = [("student_t(5,0,10)", "b"),

    ----> 4                                                      ("cauchy(0,2)", "sd")]

          5                                           )

    ~/projects/pybrms/pybrms/pybrms.py in fit(formula, data, priors, family, sample_prior, sample, **pystan_args)

        128         family=family,

        129         priors=brms_prior,

    --> 130         sample_prior=sample_prior,

        131     )

        132     model_data = _convert_R_to_python(formula, data, family)

    ~/projects/pybrms/pybrms/pybrms.py in get_stan_code(formula, data, priors, family, sample_prior)

         54     if len(priors)>0:

         55         return brms.make_stancode(

    ---> 56             formula=formula, data=data, prior=priors, family=family, sample_prior=sample_prior

         57         )[0]

         58     else:

    ~/miniconda3/envs/rpy/lib/python3.6/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)

        190                 kwargs[r_k] = v

        191         return (super(SignatureTranslatedFunction, self)

    --> 192                 .__call__(*args, **kwargs))

        193 

        194 

    ~/miniconda3/envs/rpy/lib/python3.6/site-packages/rpy2/robjects/functions.py in __call__(self, *args, **kwargs)

        119             else:

        120                 new_kwargs[k] = conversion.py2rpy(v)

    --> 121         res = super(Function, self).__call__(*new_args, **new_kwargs)

        122         res = conversion.rpy2py(res)

        123         return res

    ~/miniconda3/envs/rpy/lib/python3.6/site-packages/rpy2/rinterface_lib/conversion.py in _(*args, **kwargs)

         26 def _cdata_res_to_rinterface(function):

         27     def _(*args, **kwargs):

    ---> 28         cdata = function(*args, **kwargs)

         29         # TODO: test cdata is of the expected CType

         30         return _cdata_to_rinterface(cdata)

    ~/miniconda3/envs/rpy/lib/python3.6/site-packages/rpy2/rinterface.py in __call__(self, *args, **kwargs)

        783                     error_occured))

        784             if error_occured[0]:

    --> 785                 raise embedded.RRuntimeError(_rinterface._geterrmessage())

        786         return res

        787 

    RRuntimeError: Error: The following variables are missing in 'data':

    'count_typo'

This makes sure you can debug your data/formula/etc without actually leaving python.

## Visualization

Since `pybrms` returns a `pystan` object, we can easily visualize the results using `arviz`:

```python

import arviz as az

inference_data = az.from_pystan(model)

az.plot_posterior(inference_data, var_names=['b', 'Intercept']);

```

![png](docs/images/output_21_0.png)

For a more detailed walkthrough, see the accompanying [blog post](link).