# Bayesian computation framework from scratch

Project for the Bayesian computation course (MATH-435).

The purpose of this project was to write the low-level functions and algorithms used in Bayesian
statistics and to use them on a simple dataset.

(Jupyter Lab is recommended when running `testing.ipynb` for a better visual experience.)

## Libraries used
The following libraries were used for this project, with Python 3.6.8:

Computational:

- numpy (1.16.4)
- scipy (1.3.0)
- autograd (1.2)
- pandas (0.24.2)

Beware: at the time of writing, there is a compatibility issue between autograd and scipy that can easily be fixed by following [this issue](https://github.com/HIPS/autograd/issues/501) on the autograd GitHub page.

Graphical:

- matplotlib (3.1.0)
- seaborn (0.9.0)
- plotly (3.9.0)

## Prerequisites

The folder structure is the following:

    ├── data            # csv of the dataset
    ├── results
    │   ├── plots
    │   └── csv
    ├── src             # source files (actual framework)
    │   ├── helpers.py
    │   ├── models
    │   ├── optimization
    │   ├── approximation
    │   └── sampling
    ├── project.ipynb   # project interface and use on the dataset
    └── README.md

## Elements of the source files

### models

Contains the following classes:

Model:

A model consists of a prior and a conditional model, plus the data corresponding
to the problem. Using autograd, one can compute the posterior and the log posterior,
as well as the gradient and the Hessian of these quantities, directly by calling:

- `model.log_posterior`
- `model.log_posterior_grad`
- `model.log_posterior_hessian`
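
As a rough illustration of what autograd provides here (a toy standalone function, not the actual `Model` class):

```python
import autograd.numpy as np   # autograd's drop-in wrapper around numpy
from autograd import grad, hessian

# A toy (unnormalized) log posterior of a 2-dimensional parameter theta
def log_posterior(theta):
    return -0.5 * np.sum(theta ** 2) - 0.1 * np.sum(theta ** 4)

log_posterior_grad = grad(log_posterior)        # callable returning the gradient
log_posterior_hessian = hessian(log_posterior)  # callable returning the Hessian matrix

theta = np.array([0.5, -1.0])
print(log_posterior(theta))
print(log_posterior_grad(theta))      # shape (2,)
print(log_posterior_hessian(theta))   # shape (2, 2)
```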

Prior:

- `Gaussian_exp_prior(Prior)`: implementation of the Gaussian prior for the parameters beta and the
  exponential prior for sigma: beta ~ N(mean, std**2), sigma ~ Exp(lambda)
- `Gaussian_gamma_prior(Prior)`: implementation of the Gaussian prior for the parameters beta and the
  gamma prior for the degrees of freedom of the Student noise: beta ~ N(mean, std**2), df ~ Gamma(alpha, beta)
- `Gaussian_prior(Prior)`: implementation of the vanilla Gaussian prior: theta ~ N(mean, std**2)

Each prior provides:

- `prior.log_prior`
- `prior.log_prior_grad`
- `prior.log_prior_hessian`

Conditional_model:

- Gaussian linear model
- Student linear model
- Logistic model
- Gamma model (not checked for errors)

Each conditional model provides:

- `cond_mod.log_l`
- `cond_mod.log_l_grad`
- `cond_mod.log_l_hessian`

See this [issue](https://github.com/dufourc1/Bayesian_computation/issues/2) for the gamma model.
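
To make the composition concrete, here is a hand-rolled sketch of the Gaussian linear model combined with the Gaussian/exponential prior described above (hypothetical function names, with the prior mean taken as 0; this is a sketch, not the actual code in `src/models`):

```python
import autograd.numpy as np
from autograd import grad, hessian

def log_prior(params, std=10.0, lam=1.0):
    """beta ~ N(0, std**2) componentwise, sigma ~ Exp(lam), up to additive constants."""
    beta, sigma = params[:-1], params[-1]
    return -0.5 * np.sum(beta ** 2) / std ** 2 - lam * sigma

def log_likelihood(params, X, y):
    """Gaussian linear model: y | X ~ N(X beta, sigma**2 I), up to additive constants."""
    beta, sigma = params[:-1], params[-1]
    resid = y - np.dot(X, beta)
    return -y.shape[0] * np.log(sigma) - 0.5 * np.sum(resid ** 2) / sigma ** 2

def log_posterior(params, X, y):
    """Unnormalized log posterior = log prior + log likelihood (sigma assumed > 0)."""
    return log_prior(params) + log_likelihood(params, X, y)

# As above, autograd turns this directly into gradient / Hessian callables:
log_posterior_grad = grad(log_posterior)       # gradient w.r.t. params
log_posterior_hessian = hessian(log_posterior)
```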

### optimization

Implementation of the optimization routines seen in the course, specialized to maximize the posterior
of a model by minimizing its negative log posterior:

- `vanilla_gd`
- `line_search_gd`
- `Wolfe_cond_gd`
- `stochastic_gd`
- `newton_gd` (unstable)

See this [issue](https://github.com/dufourc1/Bayesian_computation/issues/4).
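
A minimal sketch of what the vanilla variant boils down to, assuming a gradient callable built as above, e.g. `grad(lambda t: -log_posterior(t, X, y))` (the actual routines in `src/optimization` may use different signatures and stopping rules):

```python
import numpy as np

def vanilla_gradient_descent(neg_log_post_grad, theta0, step=1e-3, n_iter=5000, tol=1e-8):
    """Fixed-step gradient descent on the negative log posterior (MAP search)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        g = neg_log_post_grad(theta)
        if np.linalg.norm(g) < tol:   # stop once the gradient (almost) vanishes
            break
        theta = theta - step * g
    return theta

# e.g. theta_map = vanilla_gradient_descent(grad(lambda t: -log_posterior(t, X, y)), theta0)
```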

### approximation

Implementation of various approximation techniques. These methods take a model as argument and
return an approximation of its normalized posterior:

- `GVA` (not entirely checked for errors)
- `laplace_approx` (see this [issue](https://github.com/dufourc1/Bayesian_computation/issues/3))
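
For reference, a minimal Laplace-approximation sketch under the same assumptions (a `log_post` callable differentiable by autograd; the names below are hypothetical and do not reflect the actual `laplace_approx` signature):

```python
import autograd.numpy as np
from autograd import grad, hessian

def laplace_approximation(log_post, theta0, step=1e-3, n_iter=5000):
    """Gaussian approximation N(theta_map, H^{-1}), where H is the Hessian of the
    negative log posterior evaluated at the MAP estimate theta_map."""
    neg_log_post = lambda th: -log_post(th)
    g = grad(neg_log_post)
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):              # crude fixed-step MAP search
        theta = theta - step * g(theta)
    H = hessian(neg_log_post)(theta)     # curvature at the mode
    return theta, np.linalg.inv(H)       # mean and covariance of the approximation
```

The approximation is only meaningful when the Hessian of the negative log posterior is positive definite at the mode.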

### sampling

Implementation of the sampling routines. These methods take a model as argument and return samples
from its unnormalized posterior:

- `importance_sampling`
- `rejection_sampling`
- `Gibbs_sampling` (not implemented)
- `random_walk_MH`
- `Langevin_MH`
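
A minimal random-walk Metropolis-Hastings sketch targeting an unnormalized `log_posterior` callable (hypothetical signature; the project's `random_walk_MH` takes a model instead):

```python
import numpy as np

def random_walk_mh(log_posterior, theta0, n_samples=10000, scale=0.1, seed=0):
    """Symmetric Gaussian proposal; accept with probability min(1, pi(theta')/pi(theta))."""
    rng = np.random.RandomState(seed)
    theta = np.asarray(theta0, dtype=float)
    logp = log_posterior(theta)
    samples = np.empty((n_samples, theta.size))
    for i in range(n_samples):
        proposal = theta + scale * rng.randn(theta.size)
        logp_prop = log_posterior(proposal)
        # acceptance test on the log scale to avoid numerical under/overflow
        if np.log(rng.rand()) < logp_prop - logp:
            theta, logp = proposal, logp_prop
        samples[i] = theta
    return samples
```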

## project.ipynb

The details of the statistical analysis are written in the notebook; only the dataset is presented here.

### Dataset
Wages in the USA dataset (May 1985), from _E. R. Berndt, The Practice of Econometrics: Classic and Contemporary, Addison-Wesley Pub. Co., 1991_.

The dataset contains the following data:

- Continuous variables

| ED | EX | LNWAGE|
|-------|--------| -----|
| education in years | experience in years | the logarithm of the hourly wage |

- Categorical variables about the type of employment

| MANAG | SALES | CLER | SERV | PROF |
|-------|--------| -----| --| ----|
|managerial | sales worker | clerical worker | service worker | technical worker |

- Categorical data about the industry

| MANUF | CONSTR |
|-------|--------|
| manufacturing | construction |

- Other categorical data

| FE | MARR | MARRFE | SOUTH | NONWH | HISP |
|-------|--------|-----|---|---|---|
| sex (1 if female) | marital status | marital status of females | geographical data (South) | non-white | Hispanic |
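
For example, the dataset can be loaded with pandas along these lines (the csv filename under `data/` is hypothetical; adjust it to the actual file):

```python
import pandas as pd

# hypothetical filename: replace with the actual csv shipped in data/
wages = pd.read_csv("data/wages.csv")

continuous = wages[["ED", "EX", "LNWAGE"]]                      # continuous covariates and response
occupation = wages[["MANAG", "SALES", "CLER", "SERV", "PROF"]]  # employment-type dummies

print(wages.shape)
print(continuous.describe())
```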

## Author

* *Charles Dufour*