Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/pymc-devs/pymc

Bayesian Modeling and Probabilistic Programming in Python
https://github.com/pymc-devs/pymc

bayesian-inference mcmc probabilistic-programming pytensor python statistical-analysis variational-inference

Last synced: 5 days ago
JSON representation

Bayesian Modeling and Probabilistic Programming in Python

Awesome Lists containing this project

README

        

.. image:: https://cdn.rawgit.com/pymc-devs/pymc/main/docs/logos/svg/PyMC_banner.svg
:height: 100px
:alt: PyMC logo
:align: center

|Build Status| |Coverage| |NumFOCUS_badge| |Binder| |Dockerhub| |DOIzenodo| |Conda Downloads|

PyMC (formerly PyMC3) is a Python package for Bayesian statistical modeling
focusing on advanced Markov chain Monte Carlo (MCMC) and variational inference (VI)
algorithms. Its flexibility and extensibility make it applicable to a
large suite of problems.

Check out the `PyMC overview `__, or
one of `the many examples `__!
For questions on PyMC, head on over to our `PyMC Discourse `__ forum.

Features
========

- Intuitive model specification syntax, for example, ``x ~ N(0,1)``
translates to ``x = Normal('x',0,1)``
- **Powerful sampling algorithms**, such as the `No U-Turn
Sampler `__, allow complex models
with thousands of parameters with little specialized knowledge of
fitting algorithms.
- **Variational inference**: `ADVI `__
for fast approximate posterior estimation as well as mini-batch ADVI
for large data sets.
- Relies on `PyTensor `__ which provides:
* Computation optimization and dynamic C or JAX compilation
* NumPy broadcasting and advanced indexing
* Linear algebra operators
* Simple extensibility
- Transparent support for missing value imputation

Linear Regression Example
==========================

Plant growth can be influenced by multiple factors, and understanding these relationships is crucial for optimizing agricultural practices.

Imagine we conduct an experiment to predict the growth of a plant based on different environmental variables.

.. code-block:: python

import pymc as pm

# Taking draws from a normal distribution
seed = 42
x_dist = pm.Normal.dist(shape=(100, 3))
x_data = pm.draw(x_dist, random_seed=seed)

# Independent Variables:
# Sunlight Hours: Number of hours the plant is exposed to sunlight daily.
# Water Amount: Daily water amount given to the plant (in milliliters).
# Soil Nitrogen Content: Percentage of nitrogen content in the soil.

# Dependent Variable:
# Plant Growth (y): Measured as the increase in plant height (in centimeters) over a certain period.

# Define coordinate values for all dimensions of the data
coords={
"trial": range(100),
"features": ["sunlight hours", "water amount", "soil nitrogen"],
}

# Define generative model
with pm.Model(coords=coords) as generative_model:
x = pm.Data("x", x_data, dims=["trial", "features"])

# Model parameters
betas = pm.Normal("betas", dims="features")
sigma = pm.HalfNormal("sigma")

# Linear model
mu = x @ betas

# Likelihood
# Assuming we measure deviation of each plant from baseline
plant_growth = pm.Normal("plant growth", mu, sigma, dims="trial")

# Generating data from model by fixing parameters
fixed_parameters = {
"betas": [5, 20, 2],
"sigma": 0.5,
}
with pm.do(generative_model, fixed_parameters) as synthetic_model:
idata = pm.sample_prior_predictive(random_seed=seed) # Sample from prior predictive distribution.
synthetic_y = idata.prior["plant growth"].sel(draw=0, chain=0)

# Infer parameters conditioned on observed data
with pm.observe(generative_model, {"plant growth": synthetic_y}) as inference_model:
idata = pm.sample(random_seed=seed)

summary = pm.stats.summary(idata, var_names=["betas", "sigma"])
print(summary)

From the summary, we can see that the mean of the inferred parameters are very close to the fixed parameters

===================== ====== ===== ======== ========= =========== ========= ========== ========== =======
Params mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
===================== ====== ===== ======== ========= =========== ========= ========== ========== =======
betas[sunlight hours] 4.972 0.054 4.866 5.066 0.001 0.001 3003 1257 1
betas[water amount] 19.963 0.051 19.872 20.062 0.001 0.001 3112 1658 1
betas[soil nitrogen] 1.994 0.055 1.899 2.107 0.001 0.001 3221 1559 1
sigma 0.511 0.037 0.438 0.575 0.001 0 2945 1522 1
===================== ====== ===== ======== ========= =========== ========= ========== ========== =======

.. code-block:: python

# Simulate new data conditioned on inferred parameters
new_x_data = pm.draw(
pm.Normal.dist(shape=(3, 3)),
random_seed=seed,
)
new_coords = coords | {"trial": [0, 1, 2]}

with inference_model:
pm.set_data({"x": new_x_data}, coords=new_coords)
pm.sample_posterior_predictive(
idata,
predictions=True,
extend_inferencedata=True,
random_seed=seed,
)

pm.stats.summary(idata.predictions, kind="stats")

The new data conditioned on inferred parameters would look like:

================ ======== ======= ======== =========
Output mean sd hdi_3% hdi_97%
================ ======== ======= ======== =========
plant growth[0] 14.229 0.515 13.325 15.272
plant growth[1] 24.418 0.511 23.428 25.326
plant growth[2] -6.747 0.511 -7.740 -5.797
================ ======== ======= ======== =========

.. code-block:: python

# Simulate new data, under a scenario where the first beta is zero
with pm.do(
inference_model,
{inference_model["betas"]: inference_model["betas"] * [0, 1, 1]},
) as plant_growth_model:
new_predictions = pm.sample_posterior_predictive(
idata,
predictions=True,
random_seed=seed,
)

pm.stats.summary(new_predictions, kind="stats")

The new data, under the above scenario would look like:

================ ======== ======= ======== =========
Output mean sd hdi_3% hdi_97%
================ ======== ======= ======== =========
plant growth[0] 12.149 0.515 11.193 13.135
plant growth[1] 29.809 0.508 28.832 30.717
plant growth[2] -0.131 0.507 -1.121 0.791
================ ======== ======= ======== =========

Getting started
===============

If you already know about Bayesian statistics:
----------------------------------------------

- `API quickstart guide `__
- The `PyMC tutorial `__
- `PyMC examples `__ and the `API reference `__

Learn Bayesian statistics with a book together with PyMC
--------------------------------------------------------

- `Bayesian Analysis with Python `__ (third edition) by Osvaldo Martin: Great introductory book.
- `Probabilistic Programming and Bayesian Methods for Hackers `__: Fantastic book with many applied code examples.
- `PyMC port of the book "Doing Bayesian Data Analysis" by John Kruschke `__ as well as the `first edition `__.
- `PyMC port of the book "Statistical Rethinking A Bayesian Course with Examples in R and Stan" by Richard McElreath `__
- `PyMC port of the book "Bayesian Cognitive Modeling" by Michael Lee and EJ Wagenmakers `__: Focused on using Bayesian statistics in cognitive modeling.

Audio & Video
-------------

- Here is a `YouTube playlist `__ gathering several talks on PyMC.
- You can also find all the talks given at **PyMCon 2020** `here `__.
- The `"Learning Bayesian Statistics" podcast `__ helps you discover and stay up-to-date with the vast Bayesian community. Bonus: it's hosted by Alex Andorra, one of the PyMC core devs!

Installation
============

To install PyMC on your system, follow the instructions on the `installation guide `__.

Citing PyMC
===========
Please choose from the following:

- |DOIpaper| *PyMC: A Modern and Comprehensive Probabilistic Programming Framework in Python*, Abril-Pla O, Andreani V, Carroll C, Dong L, Fonnesbeck CJ, Kochurov M, Kumar R, Lao J, Luhmann CC, Martin OA, Osthege M, Vieira R, Wiecki T, Zinkov R. (2023)
- |DOIzenodo| A DOI for all versions.
- DOIs for specific versions are shown on Zenodo and under `Releases `_

.. |DOIpaper| image:: https://img.shields.io/badge/DOI-10.7717%2Fpeerj--cs.1516-blue.svg
:target: https://doi.org/10.7717/peerj-cs.1516
.. |DOIzenodo| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.4603970.svg
:target: https://doi.org/10.5281/zenodo.4603970

Contact
=======

We are using `discourse.pymc.io `__ as our main communication channel.

To ask a question regarding modeling or usage of PyMC we encourage posting to our Discourse forum under the `“Questions” Category `__. You can also suggest feature in the `“Development” Category `__.

You can also follow us on these social media platforms for updates and other announcements:

- `LinkedIn @pymc `__
- `YouTube @PyMCDevelopers `__
- `X @pymc_devs `__
- `Mastodon @[email protected] `__

To report an issue with PyMC please use the `issue tracker `__.

Finally, if you need to get in touch for non-technical information about the project, `send us an e-mail `__.

License
=======

`Apache License, Version
2.0 `__

Software using PyMC
===================

General purpose
---------------

- `Bambi `__: BAyesian Model-Building Interface (BAMBI) in Python.
- `calibr8 `__: A toolbox for constructing detailed observation models to be used as likelihoods in PyMC.
- `gumbi `__: A high-level interface for building GP models.
- `SunODE `__: Fast ODE solver, much faster than the one that comes with PyMC.
- `pymc-learn `__: Custom PyMC models built on top of pymc3_models/scikit-learn API

Domain specific
---------------

- `Exoplanet `__: a toolkit for modeling of transit and/or radial velocity observations of exoplanets and other astronomical time series.
- `beat `__: Bayesian Earthquake Analysis Tool.
- `CausalPy `__: A package focussing on causal inference in quasi-experimental settings.

Please contact us if your software is not listed here.

Papers citing PyMC
==================

See Google Scholar `here `__ and `here `__ for a continuously updated list.

Contributors
============

See the `GitHub contributor
page `__. Also read our `Code of Conduct `__ guidelines for a better contributing experience.

Support
=======

PyMC is a non-profit project under NumFOCUS umbrella. If you want to support PyMC financially, you can donate `here `__.

Professional Consulting Support
===============================

You can get professional consulting support from `PyMC Labs `__.

Sponsors
========

|NumFOCUS|

|PyMCLabs|

|Mistplay|

|ODSC|

Thanks to our contributors
==========================

|contributors|

.. |Binder| image:: https://mybinder.org/badge_logo.svg
:target: https://mybinder.org/v2/gh/pymc-devs/pymc/main?filepath=%2Fdocs%2Fsource%2Fnotebooks
.. |Build Status| image:: https://github.com/pymc-devs/pymc/workflows/pytest/badge.svg
:target: https://github.com/pymc-devs/pymc/actions
.. |Coverage| image:: https://codecov.io/gh/pymc-devs/pymc/branch/main/graph/badge.svg
:target: https://codecov.io/gh/pymc-devs/pymc
.. |Dockerhub| image:: https://img.shields.io/docker/automated/pymc/pymc.svg
:target: https://hub.docker.com/r/pymc/pymc
.. |NumFOCUS_badge| image:: https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A
:target: http://www.numfocus.org/
.. |NumFOCUS| image:: https://github.com/pymc-devs/brand/blob/main/sponsors/sponsor_logos/sponsor_numfocus.png?raw=true
:target: http://www.numfocus.org/
.. |PyMCLabs| image:: https://github.com/pymc-devs/brand/blob/main/sponsors/sponsor_logos/sponsor_pymc_labs.png?raw=true
:target: https://pymc-labs.io
.. |Mistplay| image:: https://github.com/pymc-devs/brand/blob/main/sponsors/sponsor_logos/sponsor_mistplay.png?raw=true
:target: https://www.mistplay.com/
.. |ODSC| image:: https://github.com/pymc-devs/brand/blob/main/sponsors/sponsor_logos/odsc/sponsor_odsc.png?raw=true
:target: https://odsc.com/california/?utm_source=pymc&utm_medium=referral
.. |contributors| image:: https://contrib.rocks/image?repo=pymc-devs/pymc
:target: https://github.com/pymc-devs/pymc/graphs/contributors
.. |Conda Downloads| image:: https://anaconda.org/conda-forge/pymc/badges/downloads.svg
:target: https://anaconda.org/conda-forge/pymc