{"id":13477309,"url":"https://github.com/ermongroup/a-nice-mc","last_synced_at":"2025-04-10T20:11:05.713Z","repository":{"id":84350764,"uuid":"94019605","full_name":"ermongroup/a-nice-mc","owner":"ermongroup","description":"Code for \"A-NICE-MC: Adversarial Training for MCMC\"","archived":false,"fork":false,"pushed_at":"2018-07-20T19:22:46.000Z","size":6737,"stargazers_count":126,"open_issues_count":1,"forks_count":28,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-03-24T17:52:48.489Z","etag":null,"topics":["bayesian-inference","bayesian-machine-learning","generative-adversarial-network","generative-models","markov-chain","markov-chain-monte-carlo","neural-networks","tensorflow","tensorflow-experiments"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ermongroup.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-06-11T17:09:42.000Z","updated_at":"2024-06-26T14:37:11.000Z","dependencies_parsed_at":"2023-03-04T10:00:11.661Z","dependency_job_id":null,"html_url":"https://github.com/ermongroup/a-nice-mc","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ermongroup%2Fa-nice-mc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ermongroup%2Fa-nice-mc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ermongroup%2Fa-nice-mc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ermongroup%2Fa-nice-mc/manifests","ow
ner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ermongroup","download_url":"https://codeload.github.com/ermongroup/a-nice-mc/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248288357,"owners_count":21078903,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bayesian-inference","bayesian-machine-learning","generative-adversarial-network","generative-models","markov-chain","markov-chain-monte-carlo","neural-networks","tensorflow","tensorflow-experiments"],"created_at":"2024-07-31T16:01:40.902Z","updated_at":"2025-04-10T20:11:05.690Z","avatar_url":"https://github.com/ermongroup.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"# A-NICE-MC: Adversarial Training for MCMC\n\nTensorflow implementation for the paper [A-NICE-MC: Adversarial Training for MCMC](https://arxiv.org/abs/1706.07561), NIPS 2017.\n\nby [Jiaming Song](http://tsong.me), [Shengjia Zhao](http://szhao.me) and [Stefano Ermon](http://cs.stanford.edu/~ermon), Stanford Artificial Intelligence Laboratory\n\n---\n\n**A-NICE-MC** is a framework that trains a *parametric* Markov Chain Monte Carlo proposal.\nIt achieves higher performance than traditional nonparametric proposals, such as Hamiltonian Monte Carlo (HMC).\nThis repository provides code to replicate the experiments, as well as providing grounds for further research.\n\nA-NICE-MC stands for *Adversarial Non-linear Independent Component Estimation Monte Carlo*, in that:\n- The framework utilizes a parametric proposal 
for Markov Chain Monte Carlo (MC).\n- The proposal is represented through Non-linear Independent Component Estimation (NICE).\n- The NICE network is trained through adversarial methods (A); see [jiamings/markov-chain-gan](https://github.com/jiamings/markov-chain-gan).\n\n## Running the Experiments\nThe code depends on tensorflow \u003e= 1.0, numpy, scipy, matplotlib, and pandas.\nIt has been tested on both Python 2 and Python 3.\n\nThe Effective Sample Size metric for evaluating MCMC algorithms will appear on screen, and is stored in `logs/[experiment_name]/ess.csv`.\n\n### Analytical Expression Targets\n\nTo run the Ring experiments:\n```\npython examples/nice_ring2d.py\npython examples/nice_lord_of_rings.py\n```\n\nTo run the Mixture of Gaussian experiments:\n```\npython examples/nice_mog2.py\npython examples/nice_mog6.py\n```\n\n### Bayesian Logistic Regression Posterior Inference\n\nTo run the experiment on Australian dataset:\n```\npython examples/nice_australian.py\n```\n\nTo run the experiment on the German dataset:\n```\npython examples/nice_german.py\n```\n\nTo run the experiment on the Heart dataset:\n```\npython examples/nice_heart.py\n```\n\nThe resulting ESS should be at least as good as reported in the paper (if not better, train it for longer iterations).\n\nThe running time depends on the machine, so only the ratio between running times of A-NICE-MC and HMC is particularly meaningful. **Sanity check**: during one update HMC computes the entire dataset for 40 + 1 times (HMC steps + MH step), while A-NICE-MC computes the entire dataset for only 1 time (only for MH step); so A-NICE-MC at this stage should not be 40x faster, but it seems reasonable that it is 10x faster.\n\n### Visualization\nVisualizing samples from a single chain (in the 2d case). 
Details are in [figs/animation.ipynb](figs/animation.ipynb) (install ffmpeg if necessary).\n\n![](figs/lord_of_rings.gif)\n\n![](figs/mog2.gif)\n\n![](figs/mog6.gif)\n\n\n## How A-NICE-MC Works\nIn general, [Markov Chain Monte Carlo](https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo) methods estimate a density `p(x)` by sampling through a Markov Chain, where the transition kernel has two components:\n- A proposal `p(x_|x)` that proposes a new `x_` given the previous `x`. The proposal should satisfy [detailed balance](https://en.wikipedia.org/wiki/Detailed_balance).\n- A [Metropolis-Hastings](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm) acceptance step (MH step), which accepts or rejects `x_` according to `p(x)` and `p(x_)`.\n\nIt might be tempting to use any generative model as the proposal; however, training is difficult because the kernel is non-differentiable, and score-based gradient estimators\nare not effective when the initial rejection rate is high.\n\nTherefore, we draw ideas from [Hamiltonian Monte Carlo](https://arxiv.org/pdf/1206.1901.pdf) and [NICE](https://arxiv.org/abs/1410.8516).\nNICE is a *deterministic, invertible* transformation that preserves volume; HMC introduces an auxiliary variable that reduces random walk behavior.\n\n### A NICE Proposal\n\nWe can therefore use a NICE network `x_, v_ = f(x, v)` as our proposal, where `v` is the auxiliary variable we sample independently from `x` at every step.\nHence, we can treat `f(x, v)` as some \"implicit generative model\", which can be used to construct `p(x_|x)`.\n\nWe use the following proposal to ensure `p(x_, v_|x, v) = p(x, v|x_, v_)` for all `(x, v)` and `(x_, v_)` pairs,\nthereby satisfying the *detailed balance* condition directly.\n- With probability `0.5`, `x_, v_ = f(x, v)`\n- With probability `0.5`, `x_, v_ = f^{-1}(x, v)`\n\n### Training\n\nThen, we can utilize adversarial training to train the Markov Chain from `f(x, v)` (not the proposal), \nthereby making the entire 
objective differentiable.\n\n\u003e Wait! How can you train on a differentiable model that is totally different from the MCMC kernel that you sample from?\n\n![](figs/nice_xv.png)\n\nDue to the invertibility of the NICE network, if the forward operation transforms a point in the `(x, v)` manifold to another point in the `(x, v)` manifold, then the backward operation will do the same. Meanwhile, the forward operation will encourage the points to move toward `p(x, v)` and the MH step tends to reject backward operations, thereby removing random-walk behavior.\n\n### Increasing ESS with Pairwise Discriminator\n\nIdeally we would like to reduce autocorrelation between the samples from the chain. \nThis can be done by simply providing a pair of correlated data to the discriminator as generated data, so that the generator has the incentive to generate samples that are less correlated.\n\nConsider two generation settings:\n\n- `x -\u003e z1`\n- `z -\u003e z2 -\u003e stop_gradient(z2) -\u003e z3`\n\nwhere `x` is the \"true data\", `z` is the starting distribution, and `z1`, `z2`, and `z3` are the distributions generated by the model. In the case of the pairwise discriminator, we consider two types of pairs: `(x, z1)` and `(z2, z3)`. The optimal solution for the generator (given a perfect discriminator) is for `p(z1)`, `p(z2)`, and `p(z3)` to match the data distribution and for `z2` and `z3` to be uncorrelated.\n\nThis is illustrated in the following figure:\n\n![](figs/pairwise.png)\n\n### Bootstrap\nIn order to obtain samples to begin training, we adopt a bootstrap technique to obtain samples from our own model, which allows us to improve the sample quality and the model quality iteratively. \n\nCurrently, we draw the initial samples from the untrained model (with randomly initialized samples). This sounds a bit crazy, but it works in our experiments. 
For domains with higher dimensions it might be better to start with a chain that has higher acceptance rate.\n\n## Citation\nIf you use this code for your research, please cite our [paper](https://arxiv.org/abs/1706.07561):\n\n```\n@article{song2017nice,\n  title={A-NICE-MC: Adversarial Training for MCMC},\n  author={Song, Jiaming and Zhao, Shengjia and Ermon, Stefano},\n  journal={arXiv preprint arXiv:1706.07561},\n  year={2017}\n}\n```\n\n## Related Projects\n[markov-chain-gan](https://github.com/jiamings/markov-chain-gan): training a transition operator for a Markov chain. This contains part of the image generation experiments for this paper.\n\n## Contact\n[tsong@cs.stanford.edu](mailto:tsong@cs.stanford.edu)\n\nThis method is very new and experimental, so there might be cases where this fails (or because of poor parameter choices). We welcome all kinds of suggestions - including but not limited to \n\n- improving the method (MMD loss for `v`? other bootstrap techniques?) \n- additional experiments in other domains (some other applications that this method would shine?)\n- and how to improve the current code to make experiments more scalable (`save` and `load` feature?)\n\nIf something does not work as you would expect - please let me know. It helps everyone to know the strengths as well as weaknesses of the method.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fermongroup%2Fa-nice-mc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fermongroup%2Fa-nice-mc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fermongroup%2Fa-nice-mc/lists"}