Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/alexioannides/pymc-example-project
Example PyMC3 project for performing Bayesian data analysis using a probabilistic programming approach to machine learning.
https://github.com/alexioannides/pymc-example-project
bayesian-data-analysis bayesian-inference data-science machine-learning numpy pandas probabilistic-programming pymc3 python scikit-learn
Last synced: 3 months ago
JSON representation
Example PyMC3 project for performing Bayesian data analysis using a probabilistic programming approach to machine learning.
- Host: GitHub
- URL: https://github.com/alexioannides/pymc-example-project
- Owner: AlexIoannides
- Created: 2018-01-16T00:26:47.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2019-02-21T01:05:57.000Z (almost 6 years ago)
- Last Synced: 2024-10-13T00:07:08.235Z (3 months ago)
- Topics: bayesian-data-analysis, bayesian-inference, data-science, machine-learning, numpy, pandas, probabilistic-programming, pymc3, python, scikit-learn
- Language: Jupyter Notebook
- Homepage:
- Size: 77.3 MB
- Stars: 106
- Watchers: 5
- Forks: 19
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Example PyMC3 Project for Bayesian Data Analysis
This notebook contains the code required to conduct a Bayesian data analysis on data collected from a set of multiple-lot online auction events executed in Europen markets, over the course of a year. We will build interpretable models of the average Return-on-Reserve (RoR), using variables that describe various facets of multiple-lot online auction events - e.g. the average number of bidders per-lot, the number of lots offered, etc..
The ultimate aim of this endeavour is for this notebook (and it's surrounding directory structure) to serve as a template workflow for conducting an end-to-end Bayesian data analysis using PyMC3. It includes many helper functions for automating otherwise tedious tasks (e.g. interacting with Theano to score models on test data) and examples of how data pre-processing can be integrated with Scikit-Learn.
The overall approach has been heavily inspired by the book 'Statistical Rethinking - a Bayesian Course with Examples in R and Stan', by Richard McElreath http://xcelab.net/rm/statistical-rethinking/, which is the most significant book on statistics and modelling that I have read during a career spanning more than a decade. Credit also goes to 'Bayesian Data Analysis', by Andrew Gelman & Co. http://www.stat.columbia.edu/~gelman/book/, which I refer to on an on-going basis.
## Project Dependencies
We use [pipenv](https://docs.pipenv.org) for managing project dependencies and Python environments (i.e. virtual environments). All of the direct packages dependencies required to run the code (e.g. NumPy for arrays/tensors and Pandas for DataFrames), as well as all the packages used during development (e.g. IPython and JupyterLab as the chosen development environment), are described in the `Pipfile`. Their **precise** downstream dependencies are described in `Pipfile.lock`.
### Installing Pipenv
To get started with Pipenv, first of all download it - assuming that there is a global version of Python available on your system and on the PATH, then this can be achieved by running the following command,
```bash
pip3 install pipenv
```Pipenv is also available to install from many non-Python package managers. For example, on OS X it can be installed using the [Homebrew](https://brew.sh) package manager, with the following terminal command,
```bash
brew install pipenv
```For more information, including advanced configuration options, see the [official pipenv documentation](https://docs.pipenv.org).
### Installing this Projects' Dependencies
Make sure that you're in the project's root directory (the same one in which the `Pipfile` resides), and then run,
```bash
pipenv install --dev
```This will install all of the direct project dependencies as well as the development dependencies (the latter a consequence of the `--dev` flag).
### Running Python, IPython and JupyterLab from the Project's Virtual Environment
In order to continue development in a Python environment that precisely mimics the one the project was initially developed with, use Pipenv from the command line as follows,
```bash
pipenv run python3
```The `python3` command could just as well be `ipython3` or the JupterLab, for example,
```bash
pipenv run jupyter lab
```This will fire-up JupyterLab *where the default Python 3 kernel includes all of the direct and development project dependencies*. This is how we advise that the notebooks within this project are used.
### Pipenv Shells
Prepending `pipenv` to every command you want to run within the context of your Pipenv-managed virtual environment, can get very tedious. This can be avoided by entering into a Pipenv-managed shell,
```bash
pipenv shell
```which is equivalent to 'activating' the virtual environment. Any command will now be executed within the virtual environment. Use `exit` to leave the shell session.