# Computational Statistics (["Estatística Computacional"](https://emap.fgv.br/disciplina/doutorado/estatistica-computacional))
Course materials for Computational Statistics, a PhD-level course at [EMAp](http://emap.fgv.br/).
## Lecture notes and other resources
- We will be using the excellent [materials](http://www.stats.ox.ac.uk/~rebeschi/teaching/AdvSim/18/index.html) from Professor Patrick Rebeschini (Oxford University) as a general guide for our course.
As complementary material:
- These lecture [notes](https://web.archive.org/web/20131215003910/https://statweb.stanford.edu/~susan/courses/s227/) by the stellar statistician [Professor Susan Holmes](https://susan.su.domains/) are also well worth a look.
- [Monte Carlo theory, methods and examples](https://artowen.su.domains/mc/) by [Professor Art Owen](https://artowen.su.domains/) gives a nice, complete treatment of simulation topics, including a whole chapter on variance reduction.
Other materials, including lecture notes and slides, may be posted here as the course progresses.
[Here](https://github.com/maxbiostat/Computational_Statistics/blob/master/annotated_bibliography.md) you can find a nascent annotated bibliography with landmark papers in the field.
[This](http://hedibert.org/wp-content/uploads/2021/02/MC-MCMC-references.pdf) review paper by Professor Hedibert Lopes is far better than anything I could conjure, however.

## Books
Books marked with [a] are advanced material.
**Main**
- Gamerman, D., & Lopes, H. F. (2006). [Markov chain Monte Carlo: stochastic simulation for Bayesian inference](http://www.dme.ufrj.br/mcmc/). Chapman and Hall/CRC.
- Robert, C. P., & Casella, G. (2004). [Monte Carlo Statistical Methods](https://www.researchgate.net/profile/Christian_Robert2/publication/2681158_Monte_Carlo_Statistical_Methods/links/00b49535ccaf6ccc8f000000/Monte-Carlo-Statistical-Methods.pdf). Springer.

**Supplementary**
- Givens, G. H., & Hoeting, J. A. (2012). [Computational Statistics](https://www.stat.colostate.edu/computationalstatistics/) (Vol. 710). John Wiley & Sons.
- [a] Meyn, S. P., & Tweedie, R. L. (2012). [Markov chains and stochastic stability](https://www.springer.com/gp/book/9781447132691). Springer Science & Business Media. [[PDF](http://probability.ca/MT/BOOK.pdf)].
- [a] Nummelin, E. (2004). [General irreducible Markov chains and non-negative operators](https://www.cambridge.org/core/books/general-irreducible-markov-chains-and-nonnegative-operators/0557D49C011AA90B761FC854D5C14983) (Vol. 83). Cambridge University Press.

## News
## Simulation
- [Random Number Generation](https://www.iro.umontreal.ca/~lecuyer/myftp/papers/handstat.pdf) by [Pierre L'Ecuyer](http://www-labs.iro.umontreal.ca/~lecuyer/);
- [Non-Uniform Random Variate Generation](http://www.nrbook.com/devroye/) by the great [Luc Devroye](http://luc.devroye.org/);
- Walker's [Alias method](https://en.wikipedia.org/wiki/Alias_method) is a fast way to generate discrete random variables; a minimal sketch is given just after this list;
- [Rejection Control and Sequential importance sampling](https://statweb.rutgers.edu/rongchen/publications/98JASA_rejection-control.pdf) (1998), by Liu et al. discusses how to improve importance sampling by controlling rejections.
- [This](https://doi.org/10.1214/18-STS676) is a nice general comment about the role of simulation in numerical integration.
- [This](https://arxiv.org/pdf/2502.07396) survey by Luca Martino and Fernando Llorente is an excellent overview of the optimality results for importance sampling, including the optimal proposal for self-normalising importance sampling.
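As promised above, here is a minimal sketch of Walker's alias method. It is my own illustration (the function names are not from any particular library): the probability vector is preprocessed into `prob`/`alias` tables in O(K), after which each variate costs one uniform draw and one comparison.

```python
import numpy as np

def build_alias_tables(p):
    """Preprocess probabilities p (summing to 1) into alias tables."""
    K = len(p)
    prob = np.asarray(p, dtype=float) * K                 # scaled probabilities
    alias = np.zeros(K, dtype=int)
    small = [i for i in range(K) if prob[i] < 1.0]
    large = [i for i in range(K) if prob[i] >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                                      # s "borrows" mass from l
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    return prob, alias

def alias_draw(prob, alias, size, rng):
    """Draw `size` categories using the alias tables."""
    K = len(prob)
    i = rng.integers(K, size=size)                        # pick a column uniformly
    u = rng.random(size)
    return np.where(u < prob[i], i, alias[i])             # keep the column or take its alias

rng = np.random.default_rng(42)
p = np.array([0.1, 0.2, 0.3, 0.4])
prob, alias = build_alias_tables(p)
draws = alias_draw(prob, alias, 100_000, rng)
print(np.bincount(draws) / len(draws))                    # should be close to p
```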
### Markov chains
- [These](https://pages.uoregon.edu/dlevin/MARKOV/markovmixing.pdf) notes from David Levin and Yuval Peres are excellent and cover a lot of material one might find interesting on Markov processes.
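For a quick feel for the central object in those notes, here is a tiny sketch (my own illustration, with an arbitrary 3-state chain) that finds the stationary distribution by power iteration.

```python
import numpy as np

# A small 3-state transition matrix (rows sum to 1); purely illustrative.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Power iteration: push any starting distribution through P until it stops moving.
pi = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    pi = pi @ P

print("stationary distribution:", pi)
print("check pi = pi P:", pi @ P)      # should match the line above
```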
### Markov chain Monte Carlo
- Charlie Geyer's [website](http://users.stat.umn.edu/~geyer/) is a treasure trove of material on Statistics in general, MCMC methods in particular.
See, for instance, [On the Bogosity of MCMC Diagnostics](http://users.stat.umn.edu/~geyer/mcmc/diag.html).
- [Efficient construction of reversible jump Markov chain Monte Carlo proposal distributions](http://www2.stat.duke.edu/~scs/Courses/Stat376/Papers/TransdimMCMC/BrooksRobertsRJ.pdf) is a nice paper on the construction of efficient proposals for reversible jump/transdimensional MCMC.
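Since everything in this section builds on the Metropolis-Hastings kernel, here is a minimal random-walk Metropolis sketch (my own illustration; the target and step size are arbitrary choices, not anything prescribed by the references above).

```python
import numpy as np

def rw_metropolis(logpost, x0, n_iter, step, rng):
    """Random-walk Metropolis with Gaussian proposals."""
    x, lp = x0, logpost(x0)
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = x + step * rng.normal()
        lp_prop = logpost(prop)
        # Accept with probability min(1, pi(prop) / pi(x)).
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain[t] = x
    return chain

rng = np.random.default_rng(0)
logpost = lambda x: -0.5 * x**2                      # standard normal target (up to a constant)
chain = rw_metropolis(logpost, x0=5.0, n_iter=20_000, step=1.0, rng=rng)
print(chain[1000:].mean(), chain[1000:].var())       # roughly 0 and 1 after burn-in
```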
#### Hamiltonian Monte Carlo
The two definitive texts on HMC are [Neal (2011)](https://arxiv.org/pdf/1206.1901.pdf) and [Betancourt (2017)](https://arxiv.org/pdf/1701.02434.pdf).
A nice set of notes is [Vishnoi (2021)](https://arxiv.org/pdf/2108.12107.pdf). Moreover, [Hoffman & Gelman (2014)](https://jmlr.org/papers/volume15/hoffman14a/hoffman14a.pdf) describes the No-U-Turn sampler.
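To make the leapfrog-plus-accept/reject structure described in those references concrete, here is a minimal HMC sketch for a standard normal target. It is my own illustration, with a fixed step size and path length and unit-mass momenta; there is no adaptation and no NUTS here.

```python
import numpy as np

def hmc_step(x, log_p, grad_log_p, eps, L, rng):
    """One HMC transition with unit-mass momenta and a leapfrog integrator."""
    p = rng.normal(size=x.shape)                            # resample momentum
    x_new, p_new = x.copy(), p.copy()
    # Leapfrog: half step in momentum, L full steps, then trim back to a half step.
    p_new += 0.5 * eps * grad_log_p(x_new)
    for _ in range(L):
        x_new += eps * p_new
        p_new += eps * grad_log_p(x_new)
    p_new -= 0.5 * eps * grad_log_p(x_new)
    # Metropolis correction on the joint (position, momentum) energy.
    log_accept = (log_p(x_new) - 0.5 * p_new @ p_new) - (log_p(x) - 0.5 * p @ p)
    return (x_new, True) if np.log(rng.random()) < log_accept else (x, False)

rng = np.random.default_rng(1)
log_p = lambda x: -0.5 * x @ x                              # standard normal (up to a constant)
grad_log_p = lambda x: -x
x = np.array([3.0, -3.0])
samples = []
for _ in range(5_000):
    x, _ = hmc_step(x, log_p, grad_log_p, eps=0.2, L=10, rng=rng)
    samples.append(x.copy())
samples = np.array(samples)
print(samples[500:].mean(axis=0), samples[500:].var(axis=0))   # roughly 0 and 1
```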
#### Normalising Constants
[This](https://radfordneal.wordpress.com/2008/08/17/the-harmonic-mean-of-the-likelihood-worst-monte-carlo-method-ever/) post by Radford Neal explains why the Harmonic Mean Estimator (HME) is a _terrible_ estimator of the evidence.
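For a hands-on version of the argument in that post, here is a small demonstration of my own construction: a conjugate normal-normal model where the evidence is known exactly and the HME is computed from exact posterior draws.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Conjugate model: theta ~ N(0, tau^2), y | theta ~ N(theta, 1), one observation.
tau, y = 10.0, 2.0
evidence = np.exp(-0.5 * y**2 / (tau**2 + 1)) / np.sqrt(2 * np.pi * (tau**2 + 1))

# Exact posterior: theta | y ~ N(m, v).
v = 1.0 / (1.0 / tau**2 + 1.0)
m = v * y

print("true evidence:", evidence)
for rep in range(5):
    theta = rng.normal(m, np.sqrt(v), size=100_000)          # exact posterior draws
    lik = np.exp(-0.5 * (y - theta)**2) / np.sqrt(2 * np.pi)
    hme = 1.0 / np.mean(1.0 / lik)                           # harmonic mean estimator
    print("HME replicate:", hme)
# 1/lik has infinite variance under the posterior in this configuration, so the
# replicates are erratic and sit far from the true evidence even at this sample size.
```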
#### Sequential Monte Carlo and Dynamic models
- [This](https://link.springer.com/book/10.1007/978-3-030-47845-2) book by Nicolas Chopin and Omiros Papaspiliopoulos is a great introduction (as its title says) to SMC.
SMC finds application in many areas, but dynamic (linear) models deserve a special mention. The seminal 1997 [book](https://link.springer.com/book/10.1007/b98971) by West and Harrison remains the _de facto_ standard text on the subject.
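To make the propagate/weight/resample cycle concrete, here is a minimal bootstrap particle filter of my own (not code from either book) for a toy linear-Gaussian state-space model; in practice the Kalman filter is exact for this model, but that is exactly what makes it easy to check.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy state-space model: x_t = 0.9 x_{t-1} + N(0, 0.5^2), y_t = x_t + N(0, 1).
T, phi, sx, sy = 100, 0.9, 0.5, 1.0
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = phi * x_true[t - 1] + sx * rng.normal()
y = x_true + sy * rng.normal(size=T)

# Bootstrap particle filter: propagate from the transition kernel, weight by the
# likelihood, then resample multinomially.
N = 2_000
particles = rng.normal(0.0, 1.0, size=N)                     # draws from an x_0 prior
filter_means = np.zeros(T)
for t in range(T):
    if t > 0:
        particles = phi * particles + sx * rng.normal(size=N)   # propagate
    logw = -0.5 * (y[t] - particles) ** 2 / sy**2                # weight (up to a constant)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    filter_means[t] = np.sum(w * particles)
    particles = particles[rng.choice(N, size=N, p=w)]            # resample

print("RMSE of filtering means vs. true states:",
      np.sqrt(np.mean((filter_means - x_true) ** 2)))
```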
## Optimisation
#### The EM algorithm
- This [tutorial](https://zhwa.github.io/tutorial-of-em-algorithm.html) is elementary but effective.
- The book [The EM algorithm and Extensions](https://books.google.com.br/books?hl=en&lr=&id=NBawzaWoWa8C&oi=fnd&pg=PR3&dq=The+EM+algorithm+and+Extensions&ots=tp68LOYAvP&sig=iCEMt5YUIMToTSESxLctWcob8VM#v=onepage&q=The%20EM%20algorithm%20and%20Extensions&f=false) is a well-cited resource.
- [Monte Carlo EM](https://github.com/bob-carpenter/case-studies/blob/master/monte-carlo-em/mcem.pdf) by Bob Carpenter (Columbia).
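A compact EM sketch for a two-component Gaussian mixture with known unit variances (my own illustration; the references above develop the algorithm in full generality) shows the E-step/M-step alternation.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated data: a 50/50 mixture of N(-2, 1) and N(2, 1).
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])

# Unknowns: mixing weight pi and component means (variances fixed at 1).
pi_, mu = 0.5, np.array([-1.0, 1.0])
for _ in range(100):
    # E-step: posterior probability that each point belongs to component 1.
    d1 = np.exp(-0.5 * (x - mu[0]) ** 2)
    d2 = np.exp(-0.5 * (x - mu[1]) ** 2)
    r = pi_ * d1 / (pi_ * d1 + (1 - pi_) * d2)
    # M-step: update the parameters using the responsibilities as weights.
    pi_ = r.mean()
    mu = np.array([np.sum(r * x) / r.sum(), np.sum((1 - r) * x) / (1 - r).sum()])

print(pi_, mu)   # should be close to 0.5 and (-2, 2)
```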
#### Simulated Annealing
- The original 1983 [paper](https://www.science.org/doi/10.1126/science.220.4598.671) in Science ([open link](http://wexler.free.fr/library/files/kirkpatrick%20(1983)%20optimization%20by%20simulated%20annealing.pdf)) by Kirkpatrick et al. is a great read.
- [These](https://youtu.be/NPE3zncXA5s) visualisations of the traveling salesman problem might prove useful.
- [These](https://www.ime.usp.br/~yambar/MAE5704/Aula10SimulatedAnnealing/aula10slidesT.pdf) notes have a little bit of theory on the cooling scheme.
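A minimal simulated-annealing sketch (my own illustration: a one-dimensional multimodal objective and a geometric cooling schedule, both arbitrary choices) connects the paper to the cooling-scheme notes above.

```python
import numpy as np

rng = np.random.default_rng(5)

# Objective to minimise: multimodal, with global minimum 0 at x = 0.
f = lambda x: x**2 + 10 * np.sin(3 * x) ** 2

x, temp = 8.0, 5.0                      # start far from the optimum; initial temperature
best = x
for it in range(20_000):
    prop = x + rng.normal(scale=0.5)    # local random move
    delta = f(prop) - f(x)
    # Accept downhill moves always, uphill moves with probability exp(-delta / temp).
    if delta < 0 or rng.random() < np.exp(-delta / temp):
        x = prop
        if f(x) < f(best):
            best = x
    temp *= 0.9995                      # geometric cooling schedule

print(best, f(best))                    # should end up near 0 and 0
```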
## Bootstrap
- [Efron (1979)](https://projecteuclid.org/journals/annals-of-statistics/volume-7/issue-1/Bootstrap-Methods-Another-Look-at-the-Jackknife/10.1214/aos/1176344552.full) is a great resource and a seminal paper.
- A good introductory book is [An introduction to the bootstrap](https://www.routledge.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317) by Efron and Tibshirani (1993). [PDF](https://cindy.informatik.uni-bremen.de/cosy/teaching/CM_2011/Eval3/pe_efron_93.pdf).
- The technical justification of the bootstrap relies on the [Glivenko-Cantelli](https://en.wikipedia.org/wiki/Glivenko%E2%80%93Cantelli_theorem) theorem. The proof given in class is taken from [here](http://www.ms.uky.edu/~mai/sta709/NoteGC.pdf).
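As a self-contained companion to Efron and Tibshirani, here is a nonparametric bootstrap sketch (my own illustration, with an arbitrary skewed sample) for the standard error and a percentile interval of a sample median.

```python
import numpy as np

rng = np.random.default_rng(6)

x = rng.exponential(scale=2.0, size=200)                   # observed sample (skewed on purpose)
theta_hat = np.median(x)

B = 5_000
boot = np.empty(B)
for b in range(B):
    resample = rng.choice(x, size=len(x), replace=True)    # resample n points with replacement
    boot[b] = np.median(resample)

print("estimate:", theta_hat)
print("bootstrap SE:", boot.std(ddof=1))
print("95% percentile interval:", np.percentile(boot, [2.5, 97.5]))
```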
## Miscellanea
- In [these](https://terrytao.wordpress.com/2010/01/03/254a-notes-1-concentration-of-measure/) notes, [Terence Tao](https://en.wikipedia.org/wiki/Terence_Tao) gives insights into **concentration of measure**, which is the reason why integrating with respect to a probability measure in high-dimensional spaces is _hard_ (a small numerical illustration follows this list).
- [A Primer for the Monte Carlo Method](https://archive.org/details/APrimerForTheMonteCarloMethod), by the great [Ilya Sobol](https://en.wikipedia.org/wiki/Ilya_M._Sobol), is one of the first texts on the Monte Carlo method.
- The Harris inequality, `E[fg] >= E[f]E[g]`, for `f` and `g` increasing, is a special case of the [FKG inequality](https://en.wikipedia.org/wiki/FKG_inequality).
- In [Markov Chain Monte Carlo Maximum Likelihood](https://www.stat.umn.edu/geyer/f05/8931/c.pdf), Charlie Geyer shows how one can use MCMC to do maximum likelihood estimation when the likelihood cannot be written in closed-form.
This paper is an example of MCMC methods being used outside of Bayesian statistics.
- [This](https://github.com/maxbiostat/Computational_Statistics/blob/master/supporting_material/1997_Dunbar_CollegeMaths.pdf) paper discusses the solution of Problem A in [assignment 0 (2021)](https://github.com/maxbiostat/Computational_Statistics/blob/master/assignments/warmup_assignment.pdf).
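The small experiment below (my own illustration) shows the concentration phenomenon mentioned in the first item: the norm of a standard Gaussian vector in d dimensions concentrates tightly around sqrt(d), so essentially none of the probability mass sits near the mode.

```python
import numpy as np

rng = np.random.default_rng(7)

# Norms of standard Gaussian vectors concentrate around sqrt(d) as d grows.
for d in (2, 20, 200, 2000):
    x = rng.normal(size=(10_000, d))
    norms = np.linalg.norm(x, axis=1)
    print(d, np.sqrt(d), norms.mean(), norms.std())
# The relative spread norms.std() / norms.mean() shrinks like 1 / sqrt(2d):
# in high dimensions virtually all the mass lives in a thin spherical shell.
```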
#### Reparametrisation
Sometimes a clever way to make it easier to compute expectations with respect to a target distribution is to _reparametrise_ it. Here are some resources (a small sketch follows the list):
- A YouTube video with an [introduction to the concepts and a simple example](https://www.youtube.com/watch?v=gSd1msFFZTw);
- [Hamiltonian Monte Carlo for Hierarchical Models](https://arxiv.org/abs/1312.0906) by M. J. Betancourt and Mark Girolami;
- [A General Framework for the Parametrization of Hierarchical Models](https://projecteuclid.org/journals/statistical-science/volume-22/issue-1/A-General-Framework-for-the-Parametrization-of-Hierarchical-Models/10.1214/088342307000000014.full) by Omiros Papaspiliopoulos, Gareth O. Roberts, and Martin Sköld;
- [Efficient parametrisations for normal linear mixed models](https://www.jstor.org/stable/2337527?seq=1#metadata_info_tab_contents) by Alan E. Gelfand, Sujit K. Sahu and Bradley P. Carlin.

See [#4](https://github.com/maxbiostat/Computational_Statistics/issues/4). Contributed by @lucasmoschen.
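A concrete instance of the trick, in the hierarchical setting those papers treat: the non-centred parameterisation rewrites `theta_i ~ N(mu, tau^2)` as `theta_i = mu + tau * eta_i` with `eta_i ~ N(0, 1)`, removing the strong prior dependence between `theta_i` and `tau` (the "funnel" geometry) that makes samplers struggle when the data are weak. The sketch below is my own illustration of just that transformation, not code from any of the papers.

```python
import numpy as np

rng = np.random.default_rng(8)

n, mu = 50_000, 0.0
log_tau = rng.normal(0.0, 1.5, size=n)        # a diffuse prior on log tau
tau = np.exp(log_tau)

# Centred parameterisation: theta | tau ~ N(mu, tau^2), so theta and log tau are
# strongly dependent a priori (the funnel).
theta_centred = rng.normal(mu, tau)

# Non-centred parameterisation: theta = mu + tau * eta with eta ~ N(0, 1) independent
# of tau; a sampler explores (eta, log_tau) instead of (theta, log_tau).
eta = rng.normal(0.0, 1.0, size=n)
theta_noncentred = mu + tau * eta             # same marginal distribution for theta

print(theta_centred.std(), theta_noncentred.std())            # same marginal scale
print(np.corrcoef(np.abs(theta_centred), log_tau)[0, 1])      # strong dependence
print(np.corrcoef(eta, log_tau)[0, 1])                        # essentially zero
```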
#### Variance reduction
- [Rao-Blackwellisation](http://www.columbia.edu/~im2131/ps/rao-black.pdf) is a popular technique for obtaining estimators with lower variance. I recommend the recent International Statistical Review [article](https://arxiv.org/abs/2101.01011) by Christian Robert and Gareth Roberts on the topic.
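A small numerical illustration of the idea (my own construction): to estimate E[X] where Y ~ Exponential(1) and X | Y ~ Poisson(Y), replacing X by its conditional expectation E[X | Y] = Y gives an estimator with the same mean and strictly smaller variance.

```python
import numpy as np

rng = np.random.default_rng(9)

# Target: E[X] with Y ~ Exponential(1) and X | Y ~ Poisson(Y); the true value is 1.
n = 100_000
y = rng.exponential(1.0, size=n)
x = rng.poisson(y)

naive = x.mean()                 # plain Monte Carlo average of X
rao_blackwell = y.mean()         # average of E[X | Y] = Y instead

print(naive, rao_blackwell)                      # both are close to 1 ...
print(x.var(ddof=1) / n, y.var(ddof=1) / n)      # ... but the RB estimator has half the variance
```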
### Extra (fun) resources
- A [Visualisation](https://chi-feng.github.io/mcmc-demo/app.html) of MCMC for various algorithms and targets.
In these blogs and websites you will often find interesting discussions on computational, numerical and statistical aspects of applied Statistics and Mathematics.
- Christian Robert's [blog](https://xianblog.wordpress.com/);
- John Cook's [website](https://www.johndcook.com/blog/);
- [Statisfaction](https://statisfaction.wordpress.com/) blog.