# Awesome Information Bottleneck Paper List [![Awesome](https://awesome.re/badge-flat2.svg)](https://awesome.re) [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com)

*In memory of Professor Naftali Tishby.*\
*Last updated in October 2022.*

## 0. Introduction
![illustration](./illustration.png)
**To learn, you must forget.** This is perhaps the most intuitive lesson from Naftali Tishby's Information Bottleneck (IB) method, which grew out of the fundamental tradeoff (rate *vs.* distortion) in Claude Shannon's information theory, and later offered a creative account of the learning behavior of deep neural networks through the fitting & compression framework.
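
Throughout, $X$ denotes the input, $Y$ the target, and $T$ the learned representation (our notation, not any single paper's); the IB objective in its usual Lagrangian form reads:

```tex
% The IB Lagrangian: compress X into a representation T that stays
% predictive of Y; beta >= 0 trades off compression I(X;T) against
% prediction I(T;Y).
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta\, I(T;Y)
```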

It has been four years since the dazzling talk on [Opening the Black Box of Deep Neural Networks](https://www.youtube.com/watch?v=FSfN2K3tnJU), and more than twenty years since the [first paper](https://arxiv.org/abs/physics/0004057) on the Information Bottleneck method. It is time for us to take a look back, to celebrate what has been established, and to prepare for the future.

This repository is organized as follows:
- [Classics](#1-classics)
- [Reviews](#2-reviews)
- [Theories](#3-theories)
- [Models](#4-models)
- [Applications (General)](#5-applications-general)
- [Applications (RL)](#6-applications-rl)
- [Methods for Mutual Information Estimation](#7-methods-for-mutual-information-estimation) (😣 MI is notoriously hard to estimate!)
- [Other Information Theory Driven Work](#8-other-information-theory-driven-work) (verbose)
- [Citation](#9-citation)

All papers are selected and sorted by topic/conference/year/importance. Please send a pull request if you would like to add any paper.

We also made [slides on theory, applications and controversy](https://github.com/ZIYU-DEEP/Awesome-Information-Bottleneck/blob/main/IB-Intro-Ye.pdf) for the original Information Bottleneck principle in deep learning (*p.s.*, some of the controversy has been addressed by recent publications, *e.g.*, [Lorenzen et al., 2021](http://arxiv.org/abs/2106.12912v1)).

## 1. Classics
**Agglomerative Information Bottleneck** [[link](https://papers.nips.cc/paper/1999/file/be3e9d3f7d70537357c67bb3f4086846-Paper.pdf)] \
Noam Slonim, Naftali Tishby\
*NIPS, 1999*

🐤 **The Information Bottleneck Method** [[link](https://arxiv.org/abs/physics/0004057)] \
Naftali Tishby, Fernando C. Pereira, William Bialek\
*Preprint, 2000*

**Predictability, complexity and learning** [[link](https://pubmed.ncbi.nlm.nih.gov/11674845/)] \
William Bialek, Ilya Nemenman, Naftali Tishby\
*Neural Computation, 2001*

**Sufficient Dimensionality Reduction: A novel analysis principle** [[link](https://www.cs.huji.ac.il/labs/learning/Papers/sdr_ICML.pdf)] \
Amir Globerson, Naftali Tishby\
*ICML, 2002*

**The information bottleneck: Theory and applications** [[link](http://www.yaroslavvb.com/papers/slonim-information.pdf)] \
Noam Slonim\
*PhD Thesis, 2002*

**An Information Theoretic Tradeoff between Complexity and Accuracy** [[link](https://www.cs.huji.ac.il/labs/learning/Papers/ib_theory.pdf)] \
Ran Gilad-Bachrach, Amir Navot, Naftali Tishby\
*COLT, 2003*

**Information Bottleneck for Gaussian Variables** [[link](https://www.cs.huji.ac.il/labs/learning/Papers/GIB_JMLR2004.pdf)] \
Gal Chechik, Amir Globerson, Naftali Tishby, Yair Weiss\
*NIPS, 2003*

**Information and Fitness** [[link](https://www.cs.huji.ac.il/labs/learning/Papers/info+fitness.pdf)] \
Samuel F. Taylor, Naftali Tishby and William Bialek\
*Preprint, 2007*

**Efficient representation as a design principle for neural coding and computation** [[link](https://arxiv.org/abs/0712.4381)] \
William Bialek, Rob R. de Ruyter van Steveninck, and Naftali Tishby\
*Preprint, 2007*

**The Information Bottleneck Revisited or How to Choose a Good Distortion Measure** [[link](https://www.cs.huji.ac.il/labs/learning/Papers/flaske2.pdf)] \
Peter Harremoes and Naftali Tishby\
*ISIT, 2007*

🐤 **Learning and Generalization with the Information Bottleneck** [[link](https://www.cs.huji.ac.il/labs/learning/Papers/ibgen_full.pdf)] \
Ohad Shamir, Sivan Sabato, Naftali Tishby\
*Theoretical Computer Science, 2009*

🐤 **Information-Theoretic Bounded Rationality** [[link](https://arxiv.org/abs/1512.06789)] \
Pedro A. Ortega, Daniel A. Braun, Justin Dyer, Kee-Eung Kim, Naftali Tishby\
*Preprint, 2015*

🐤 **Opening the Black Box of Deep Neural Networks via Information** [[link](https://arxiv.org/abs/1703.00810)] \
Ravid Shwartz-Ziv, Naftali Tishby\
*ICRI, 2017*

## 2. Reviews
**Information Bottleneck and its Applications in Deep Learning** [[link](https://arxiv.org/abs/1904.03743)] \
Hassan Hafez-Kolahi, Shohreh Kasaei\
*Preprint, 2019*

**The Information Bottleneck Problem and Its Applications in Machine Learning** [[link](https://arxiv.org/abs/2004.14941)] \
Ziv Goldfeld, Yury Polyanskiy\
*Preprint, 2020*

**On the Information Bottleneck Problems: Models, Connections, Applications and Information Theoretic Views** [[link](https://www.mdpi.com/1099-4300/22/2/151)] \
Abdellatif Zaidi, Iñaki Estella-Aguerri, Shlomo Shamai\
*Entropy, 2020*

**Information Bottleneck: Theory and Applications in Deep Learning** [[link](https://www.mdpi.com/1099-4300/22/12/1408)] \
Bernhard C. Geiger, Gernot Kubin\
*Entropy, 2020*

**On Information Plane Analyses of Neural Network Classifiers – A Review** [[link](https://arxiv.org/abs/2003.09671)] \
Bernhard C. Geiger\
*Preprint, 2021*
> Table 1 (p. 2) gives a nice summary of the effect of different architectures & MI estimators on the existence of the compression phase, and of the causal links between compression and generalization.

**A Critical Review of Information Bottleneck Theory and its Applications to Deep Learning** [[link](https://arxiv.org/abs/2105.04405v1)] \
Mohammad Ali Alomrani\
*Preprint, 2021*

**Information Flow in Deep Neural Networks** [[link](https://arxiv.org/abs/2202.06749)] \
Ravid Shwartz-Ziv\
*PhD Thesis, 2022*

## 3. Theories
**Gaussian Lower Bound for the Information Bottleneck Limit** [[link](https://www.jmlr.org/papers/volume18/17-398/17-398.pdf)] \
Amichai Painsky, Naftali Tishby\
*JMLR, 2017*

**Information-theoretic analysis of generalization capability of learning algorithms** [[link](https://arxiv.org/pdf/1705.07809.pdf)] \
Aolin Xu, Maxim Raginsky\
*NeurIPS, 2017*

**Caveats for information bottleneck in deterministic scenarios** [[link](https://arxiv.org/abs/1808.07593)] [[ICLR version](https://openreview.net/forum?id=rke4HiAcY7)]\
Artemy Kolchinsky, Brendan D. Tracey, Steven Van Kuyk\
*UAI, 2018*

🐤🔥 **Emergence of Invariance and Disentanglement in Deep Representations** [[link](https://arxiv.org/abs/1706.01350)] \
Alessandro Achille, Stefano Soatto\
*JMLR, 2018*
> - This paper is a gem. At a high level, it relates generalization to the **information bottleneck in weights** (IIW).
> - Be aware of how this differs from Tishby's original definition of the information bottleneck in representations.
> - Specifically, if we approximate SGD by a stochastic differential equation, we can see that SGD naturally minimizes IIW.
> - The authors argue that *an* optimal representation should have 4 properties: *sufficiency*, *minimality*, *invariance*, and *disentanglement*. Notably, the last two emerge naturally from minimizing the mutual information between the dataset and the network weights, i.e., IIW (see the schematic objective below).
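
As a rough paraphrase (our notation; see the paper for the exact form), the resulting objective trades off fitting the data against the information the weights $w$ retain about the training set $\mathcal{D}$:

```tex
% Schematic IB-in-weights Lagrangian (a paraphrase, not the paper's
% exact formulation): fit the data via the cross-entropy term, while
% limiting the information stored in the weights, I(w; D).
\min \; H_{p,q}(y \mid x, w) \;+\; \beta\, I(w; \mathcal{D})
```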


**On the Information Bottleneck Theory of Deep Learning** [[link](https://openreview.net/forum?id=ry_WPG-A-)] \
Andrew Michael Saxe, Yamini Bansal, Joel Dapello, Madhu Advani, Artemy Kolchinsky, Brendan Daniel Tracey, David Daniel Cox\
*ICLR, 2018*

**The Dual Information Bottleneck** [[link](https://arxiv.org/abs/2006.04641v1)] \
Zoe Piran, Ravid Shwartz-Ziv, Naftali Tishby\
*Preprint, 2020*

🐤 **Learnability for the Information Bottleneck** [[link](https://arxiv.org/abs/1907.07331)] [[slides](https://docs.google.com/presentation/d/1sBYA6V-mL6cwYxEWxA5oMDOKYEq1FIjvZ3jOoXDlVD8/edit?usp=sharing)] [[poster](https://docs.google.com/presentation/d/1jkMxI7j8YXTxtUdy9PAtRfYRaLvFdT7hI1qP9m-qDbE/edit?usp=sharing)] [[journal version](https://www.mdpi.com/1099-4300/21/10/924)] [[workshop version](https://openreview.net/forum?id=SJePKo5HdV)] \
Tailin Wu, Ian Fischer, Isaac L. Chuang, Max Tegmark\
*UAI, 2019*

🐤 **Phase Transitions for the Information Bottleneck in Representation Learning** [[link](https://openreview.net/forum?id=HJloElBYvB)] [[video](https://media.mis.mpg.de/mml/2021-02-04)] \
Tailin Wu, Ian Fischer\
*ICLR, 2020*

**Bottleneck Problems: Information and Estimation-Theoretic View** [[link](http://arxiv.org/abs/2011.06208v1)] \
Shahab Asoodeh, Flavio Calmon\
*Preprint, 2020*

**Information Bottleneck: Exact Analysis of (Quantized) Neural Networks** [[link](http://arxiv.org/abs/2106.12912v1)] \
Stephan Sloth Lorenzen, Christian Igel, Mads Nielsen\
*Preprint, 2021*
> - This paper shows that **different ways of binning** when **computing the mutual information** lead to qualitatively different results.
> - It then confirms the original IB paper's **fitting & compression** phases using quantized nets, with exact computation of mutual information.


**Perturbation Theory for the Information Bottleneck** [[link](http://arxiv.org/abs/2105.13977v1)] \
Vudtiwat Ngampruetikorn, David J. Schwab\
*Preprint, 2021*

**PAC-Bayes Information Bottleneck** [[link](https://openreview.net/forum?id=iLHOIDsPv1P)] \
Zifeng Wang, Shao-Lun Huang, Ercan Engin Kuruoglu, Jimeng Sun, Xi Chen, Yefeng Zheng\
*ICLR, 2022*
> - This paper discusses using $I(w, S)$ instead of $I(T, X)$ as the information bottleneck.
> - However, ***activations*** should in effect play a crucial role in a network's generalization, yet they are not explicitly captured by $I(w, S)$.


## 4. Models
**Deep Variational Information Bottleneck** [[link](https://openreview.net/forum?id=HyxQzBceg)] \
Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy\
*ICLR, 2017*

**The Deterministic Information Bottleneck** [[link](https://direct.mit.edu/neco/article/29/6/1611/8273/The-Deterministic-Information-Bottleneck)] [[UAI Version](https://www.auai.org/uai2016/proceedings/papers/319.pdf)] \
DJ Strouse, David J. Schwab\
*Neural Computation, 2017*
> This replaces the mutual information term $I(X;T)$ with the entropy $H(T)$ in the original IB objective (schematically below).
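
In symbols (a schematic in the notation of the introduction, not the paper's exact formulation):

```tex
% IB:  min I(X;T) - beta * I(T;Y)
% DIB: min H(T)   - beta * I(T;Y)
% Since I(X;T) = H(T) - H(T|X), dropping H(T|X) removes the reward for
% noisy encoders, so the optimal encoder becomes deterministic.
\min_{p(t \mid x)} \; H(T) \;-\; \beta\, I(T;Y)
```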


**Learning Sparse Latent Representations with the Deep Copula Information Bottleneck** [[link](https://openreview.net/forum?id=Hk0wHx-RW)] \
Aleksander Wieczorek, Mario Wieser, Damian Murezzan, Volker Roth \
*ICLR, 2018*

**Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck** [[link](https://papers.nips.cc/paper/2019/hash/e2ccf95a7f2e1878fcafc8376649b6e8-Abstract.html)] \
Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin, Katja Hofmann\
*NeurIPS, 2019*

**Information bottleneck through variational glasses** [[link](http://bayesiandeeplearning.org/2019/papers/75.pdf)]\
Slava Voloshynovskiy, Mouad Kondah, Shideh Rezaeifar, Olga Taran, Taras Holotyak, Danilo Jimenez Rezende\
*NeurIPS Bayesian Deep Learning Workshop, 2019*

🐤 **Variational Discriminator Bottleneck** [[link](https://openreview.net/forum?id=HyxPx3R9tm)] \
Xue Bin Peng, Angjoo Kanazawa, Sam Toyer, Pieter Abbeel, Sergey Levine\
*ICLR, 2019*

**Nonlinear Information Bottleneck** [[link](https://www.mdpi.com/1099-4300/21/12/1181)] \
Artemy Kolchinsky, Brendan Tracey, David Wolpert\
*Entropy, 2019*
> This formulation shows better performance than VIB.


**General Information Bottleneck Objectives and their Applications to Machine Learning** [[link](https://arxiv.org/pdf/1912.06248.pdf)] \
Sayandev Mukherjee\
*Preprint, 2019*
> This paper synthesizes IB and Predictive IB, and provides a new variational bound.


🐤 **Graph Information Bottleneck** [[link](https://arxiv.org/abs/2010.12811)] [[code](https://github.com/snap-stanford/GIB)] [[slides](https://docs.google.com/presentation/d/1yGs6kfaFHKlZdu0REuSpZTN4Pqt7b0bAnh2y8lvYm3A/edit)] \
Tailin Wu, Hongyu Ren, Pan Li, Jure Leskovec\
*NeurIPS, 2020*

🐤 **Learning Optimal Representations with the Decodable Information Bottleneck** [[link](https://papers.nips.cc/paper/2020/hash/d8ea5f53c1b1eb087ac2e356253395d8-Abstract.html)] \
Yann Dubois, Douwe Kiela, David J. Schwab, Ramakrishna Vedantam\
*NeurIPS, 2020*

🐤 **Concept Bottleneck Models** [[link](https://arxiv.org/abs/2007.04612v3)] \
Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang\
*ICML, 2020*

**Disentangled Representations for Sequence Data using Information Bottleneck Principle** [[link](http://proceedings.mlr.press/v129/yamada20a.html)] [[talk](https://papertalk.org/papertalks/13911)] \
Masanori Yamada, Heecheol Kim, Kosuke Miyoshi, Tomoharu Iwata, Hiroshi Yamakawa\
*ACML, 2020*

🐤 **IBA: Restricting the Flow: Information Bottlenecks for Attribution** [[link](https://openreview.net/forum?id=S1xWh1rYwB)] [[code](https://github.com/BioroboticsLab/IBA)] \
Karl Schulz, Leon Sixt, Federico Tombari, Tim Landgraf\
*ICLR, 2020*

**On the Difference between the Information Bottleneck and the Deep Information Bottleneck** [[link](https://www.mdpi.com/1099-4300/22/2/131)] \
Aleksander Wieczorek, Volker Roth\
*Entropy, 2020*

**The Convex Information Bottleneck Lagrangian** [[link](http://arxiv.org/abs/1911.11000v2)] \
Borja Rodríguez Gálvez, Ragnar Thobaben, Mikael Skoglund\
*Preprint, 2020*

**The HSIC Bottleneck: Deep Learning without Back-Propagation** [[link](https://arxiv.org/abs/1908.01580)] [[code](https://github.com/choasma/HSIC-bottleneck)] \
Wan-Duo Kurt Ma, J.P. Lewis, W. Bastiaan Kleijn\
*AAAI, 2020*
> - This paper uses the Hilbert-Schmidt independence criterion (HSIC) as a surrogate for mutual information in the IB objective (a toy sketch of the estimator follows below).
> - It shows an alternative way to train a neural network without backpropagation, inspired by the IB principle.
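
For intuition, here is a minimal sketch (ours, not the paper's code) of the biased empirical HSIC estimator with Gaussian kernels, $\mathrm{HSIC} = \frac{1}{(n-1)^2}\,\mathrm{tr}(KHLH)$; the function names are our own:

```python
# A toy sketch of the biased empirical HSIC estimator (Gaussian kernels).
# HSIC is ~0 iff the variables are independent, so it can stand in for MI.
import numpy as np

def gaussian_gram(x, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    k, l = gaussian_gram(x, sigma), gaussian_gram(y, sigma)
    return float(np.trace(k @ h @ l @ h) / (n - 1) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))
print(hsic(x, x + 0.1 * rng.normal(size=x.shape)))  # large: dependent
print(hsic(x, rng.normal(size=(200, 5))))           # near zero: independent
```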

**Disentangled Information Bottleneck** [[link](https://ojs.aaai.org/index.php/AAAI/article/view/17120/16927)] [[code](https://github.com/PanZiqiAI/disentangled-information-bottleneck)] \
Ziqi Pan, Li Niu, Jianfu Zhang, Liqing Zhang\
*AAAI, 2021*

🐤 **IB-GAN: Disentangled Representation Learning** [[link](https://ojs.aaai.org/index.php/AAAI/article/view/16967)] [[code](https://github.com/insuj3on/IB-GAN)][[talk](https://papertalk.org/papertalks/30379)]\
Insu Jeon, Wonkwang Lee, Myeongjang Pyeon, Gunhee Kim\
*AAAI, 2021*
> This model adds an additional IB constraint on top of InfoGAN.


**Deciding What to Learn: A Rate-Distortion Approach** [[link](https://arxiv.org/abs/2101.06197v3)] \
Dilip Arumugam, Benjamin Van Roy\
*ICML, 2021*

🐤 **Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization** [[link](http://arxiv.org/abs/2106.06607v1)] \
Kartik Ahuja, Ethan Caballero, Dinghuai Zhang, Yoshua Bengio, Ioannis Mitliagkas, Irina Rish\
*Preprint, 2021*

**Multi-Task Variational Information Bottleneck** [[link](http://arxiv.org/abs/2007.00339v4)] \
Weizhu Qian, Bowei Chen, Yichao Zhang, Guanghui Wen, Franck Gechter\
*Preprint, 2021*

## 5. Applications (General)
🐤 **Analyzing neural codes using the information bottleneck method** [[link](https://www.cs.huji.ac.il/labs/learning/Papers/nips01_sub.pdf)] \
Elad Schneidman, Noam Slonim, Naftali Tishby, Rob R. deRuyter van Steveninck, William Bialek\
*NIPS, 2001*

**Past-future information bottleneck in dynamical systems** [[link](https://journals.aps.org/pre/abstract/10.1103/PhysRevE.79.041925)] \
Felix Creutzig, Amir Globerson, Naftali Tishby\
*Physical Review E, 2009*

**Compressing Neural Networks using the Variational Information Bottleneck** [[link](http://proceedings.mlr.press/v80/dai18d.html)] \
Bin Dai, Chen Zhu, Baining Guo, David Wipf \
*ICML, 2018*

🐤 **InfoMask: Masked Variational Latent Representation to Localize Chest Disease** [[link](https://arxiv.org/abs/1903.11741)] \
Saeid Asgari Taghanaki, Mohammad Havaei, Tess Berthier, Francis Dutil, Lisa Di Jorio, Ghassan Hamarneh, Yoshua Bengio\
*MICCAI, 2019*
> Note how this differs from the [IBA](https://openreview.net/forum?id=S1xWh1rYwB) paper.


**Past–future information bottleneck for sampling molecular reaction coordinate simultaneously with thermodynamics and kinetics** [[link](https://www.nature.com/articles/s41467-019-11405-4)] \
Yihang Wang, João Marcelo Lamim Ribeiro, Pratyush Tiwary\
*Nature Communications, 2019*

**Kernelized information bottleneck leads to biologically plausible 3-factor Hebbian learning in deep networks** [[link](https://papers.nips.cc/paper/2020/hash/517f24c02e620d5a4dac1db388664a63-Abstract.html)] \
Roman Pogodin, Peter Latham\
*NeurIPS, 2020*

**Training Normalizing Flows with the Information Bottleneck for Competitive Generative Classification** [[link](https://papers.nips.cc/paper/2020/hash/593906af0d138e69f49d251d3e7cbed0-Abstract.html)] \
Lynton Ardizzone, Radek Mackowiak, Carsten Rother, Ullrich Köthe\
*NeurIPS, 2020*

**Unsupervised Speech Decomposition via Triple Information Bottleneck** [[link](https://proceedings.mlr.press/v119/qian20a.html)] [[code](https://github.com/auspicious3000/SpeechSplit)] \
Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson, David Cox\
*ICML, 2020*

**Learning Efficient Multi-agent Communication: An Information Bottleneck Approach** [[link](https://proceedings.mlr.press/v119/wang20i.html)] \
Rundong Wang, Xu He, Runsheng Yu, Wei Qiu, Bo An, Zinovi Rabinovich\
*ICML, 2020*

🐤 **Inserting Information Bottlenecks for Attribution in Transformers** [[link](https://arxiv.org/abs/2012.13838v2)] \
Zhiying Jiang, Raphael Tang, Ji Xin, Jimmy Lin\
*EMNLP, 2020*

**Information Bottleneck for Estimating Treatment Effects with Systematically Missing Covariates** [[link](https://www.mdpi.com/1099-4300/22/4/389)] \
Sonali Parbhoo, Mario Wieser, Aleksander Wieczorek, and Volker Roth\
*Entropy, 2020*

**Variational Information Bottleneck for Unsupervised Clustering: Deep Gaussian Mixture Embedding** [[link](http://arxiv.org/abs/1905.11741v3)] \
Yigit Ugur, George Arvanitakis, Abdellatif Zaidi\
*Entropy, 2020*

**Learning to Learn with Variational Information Bottleneck for Domain Generalization** [[link](http://arxiv.org/abs/2007.07645v1)] \
Yingjun Du, Jun Xu, Huan Xiong, Qiang Qiu, Xiantong Zhen, Cees G. M. Snoek, Ling Shao\
*ECCV, 2020*

**The information bottleneck and geometric clustering** [[link](http://arxiv.org/abs/1712.09657v2)] \
DJ Strouse, David J Schwab\
*Preprint, 2020*

**Causal learning with sufficient statistics: an information bottleneck approach** [[link](http://arxiv.org/abs/2010.05375v1)] \
Daniel Chicharro, Michel Besserve, Stefano Panzeri\
*Preprint, 2020*

**Learning Robust Representations via Multi-View Information Bottleneck** [[link](http://arxiv.org/abs/2002.07017v2)] \
Marco Federici, Anjan Dutta, Patrick Forré, Nate Kushman, Zeynep Akata\
*Preprint, 2020*

🐤 **Information Bottleneck Disentanglement for Identity Swapping** [[link](https://openaccess.thecvf.com/content/CVPR2021/html/Gao_Information_Bottleneck_Disentanglement_for_Identity_Swapping_CVPR_2021_paper.html)] \
Gege Gao, Huaibo Huang, Chaoyou Fu, Zhaoyang Li, Ran He\
*CVPR, 2021*

**A Variational Information Bottleneck Based Method to Compress Sequential Networks for Human Action Recognition** [[link](https://openaccess.thecvf.com/content/WACV2021/papers/Srivastava_A_Variational_Information_Bottleneck_Based_Method_to_Compress_Sequential_Networks_WACV_2021_paper.pdf)] \
Ayush Srivastava, Oshin Dutta, Jigyasa Gupta, Sumeet Agarwal, Prathosh AP\
*WACV, 2021*

**The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget** [[link](https://openreview.net/forum?id=Hye1kTVFDS)] \
Anirudh Goyal, Yoshua Bengio, Matthew Botvinick, Sergey Levine\
*ICLR, 2020*

**Variational Information Bottleneck for Effective Low-Resource Fine-Tuning** [[link](https://openreview.net/forum?id=kvhzKz-_DMF)] \
Rabeeh Karimi Mahabadi, Yonatan Belinkov, James Henderson\
*ICLR, 2021*

**Dynamic Bottleneck for Robust Self-Supervised Exploration** [[link](https://proceedings.neurips.cc/paper/2021/hash/8d3369c4c086f236fabf61d614a32818-Abstract.html)] \
Chenjia Bai, Lingxiao Wang, Lei Han, Animesh Garg, Jianye Hao, Peng Liu, Zhaoran Wang\
*NeurIPS, 2021*

**Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck** [[link](https://openreview.net/forum?id=90M-91IZ0JC)] [[talk](https://papertalk.org/papertalks/35620)] \
Junho Kim, Byung-Kwan Lee, Yong Man Ro\
*NeurIPS, 2021*

**Revisiting Hilbert-Schmidt Information Bottleneck for Adversarial Robustness** [[link](https://proceedings.neurips.cc/paper/2021/file/055e31fa43e652cb4ab6c0ee845c8d36-Paper.pdf)] [[talk](https://papertalk.org/papertalks/36111)] \
Zifeng Wang, Tong Jian, Aria Masoomi, Stratis Ioannidis, Jennifer Dy\
*NeurIPS, 2021*

**A Variational Information Bottleneck Approach to Multi-Omics Data Integration** [[link](http://arxiv.org/abs/2102.03014v2)] \
Changhee Lee, Mihaela van der Schaar\
*AISTATS, 2021*

**Information Bottleneck Approach to Spatial Attention Learning** [[link](http://arxiv.org/abs/2108.03418v1)] \
Qiuxia Lai, Yu Li, Ailing Zeng, Minhao Liu, Hanqiu Sun, Qiang Xu\
*IJCAI, 2021*

**Unsupervised Hashing with Contrastive Information Bottleneck** [[link](http://arxiv.org/abs/2105.06138v2)] \
Zexuan Qiu, Qinliang Su, Zijing Ou, Jianxing Yu, Changyou Chen\
*IJCAI, 2021*

🐤 **Neuron Campaign for Initialization Guided by Information Bottleneck Theory** [[link](http://arxiv.org/abs/2108.06530v1)] \
Haitao Mao, Xu Chen, Qiang Fu, Lun Du, Shi Han, Dongmei Zhang\
*CIKM, 2021*

**Information Theoretic Meta Learning with Gaussian Processes** [[link](https://arxiv.org/pdf/2009.03228.pdf)] \
Michalis K. Titsias, Francisco J. R. Ruiz, Sotirios Nikoloutsopoulos, Alexandre Galashov\
*UAI, 2021*

**A Closer Look at the Adversarial Robustness of Information Bottleneck Models** [[link](http://arxiv.org/abs/2107.05712v1)] \
Iryna Korshunova, David Stutz, Alexander A. Alemi, Olivia Wiles, Sven Gowal\
*ICML Workshop on A Blessing in Disguise, 2021*

**Information Bottleneck Attribution for Visual Explanations of Diagnosis and Prognosis** [[link](http://arxiv.org/abs/2104.02869v2)] \
Ugur Demir, Ismail Irmakci, Elif Keles, Ahmet Topcu, Ziyue Xu, Concetto Spampinato, Sachin Jambawalikar, Evrim Turkbey, Baris Turkbey, Ulas Bagci\
*Preprint, 2021*

**State Predictive Information Bottleneck** [[link](https://arxiv.org/abs/2011.10127)] [[code](https://github.com/tiwarylab/State-Predictive-Information-Bottleneck)] \
Dedi Wang, Pratyush Tiwary\
*Preprint, 2021*

**Disentangled Variational Information Bottleneck for Multiview Representation Learning** [[link](http://arxiv.org/abs/2105.07599v1)] [[code](https://github.com/feng-bao-ucsf/DVIB)] \
Feng Bao\
*Preprint, 2021*

**Invariant Information Bottleneck for Domain Generalization** [[link](http://arxiv.org/abs/2106.06333v2)] \
Bo Li, Yifei Shen, Yezhen Wang, Wenzhen Zhu, Colorado J. Reed, Jun Zhang, Dongsheng Li, Kurt Keutzer, Han Zhao\
*Preprint, 2021*

**Information-Bottleneck-Based Behavior Representation Learning for Multi-agent Reinforcement learning** [[link](https://arxiv.org/abs/2109.14188)] \
Yue Jin, Shuangqing Wei, Jian Yuan, Xudong Zhang\
*Preprint, 2021*

**Generalization in Quantum Machine Learning: a Quantum Information Perspective** [[link](http://arxiv.org/abs/2102.08991)] \
Leonardo Banchi, Jason Pereira, Stefano Pirandola\
*Preprint, 2021*

**Causal Effect Estimation using Variational Information Bottleneck** [[link](https://arxiv.org/abs/2110.13705)] \
Zhenyu Lu, Yurong Cheng, Mingjun Zhong, George Stoian, Ye Yuan, Guoren Wang\
*Preprint, 2021*

**Improving Subgraph Recognition with Variational Graph Information Bottleneck** [[link](https://arxiv.org/abs/2112.09899)] \
Junchi Yu, Jie Cao, Ran He\
*CVPR, 2022*

**Graph Structure Learning with Variational Information Bottleneck** [[link](https://arxiv.org/abs/2112.08903)] \
Qingyun Sun, Jianxin Li, Hao Peng, Jia Wu, Xingcheng Fu, Cheng Ji, Philip S. Yu\
*AAAI, 2022*

**Renyi Fair Information Bottleneck for Image Classification** [[link](https://arxiv.org/abs/2203.04950)] \
Adam Gronowski, William Paul, Fady Alajaji, Bahman Gharesifard, Philippe Burlina\
*Preprint, 2022*

**The Distributed Information Bottleneck reveals the explanatory structure of complex systems** [[link](https://arxiv.org/abs/2204.07576)] \
Kieran A. Murphy, Dani S. Bassett\
*Preprint, 2022*

**Sparsity-Inducing Categorical Prior Improves Robustness of the Information Bottleneck** [[link](https://arxiv.org/abs/2203.02592)] \
Anirban Samaddar, Sandeep Madireddy, Prasanna Balaprakash\
*Preprint, 2022*

**Pareto-optimal clustering with the primal deterministic information bottleneck** [[link](https://arxiv.org/abs/2204.02489)] \
Andrew K. Tan, Max Tegmark, Isaac L. Chuang\
*Preprint, 2022*

**Information-Theoretic Odometry Learning** [[link](https://arxiv.org/abs/2203.05724)] \
Sen Zhang, Jing Zhang, Dacheng Tao\
*Preprint, 2022*

## 6. Applications (RL)
**InfoBot: Transfer and Exploration via the Information Bottleneck** [[paper](https://openreview.net/forum?id=rJg8yhAqKm)] [[code](https://github.com/maximecb/gym-minigrid)]\
Anirudh Goyal, Riashat Islam, DJ Strouse, Zafarali Ahmed, Hugo Larochelle, Matthew Botvinick, Yoshua Bengio, Sergey Levine\
*ICLR, 2019*
> The idea is simply to constrain the policy's dependence on a given goal, so that the agent can learn a *default behavior* (schematically below).
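
Schematically (our paraphrase in the IB notation above, not the paper's exact formulation), the agent maximizes return while penalizing how much the action $A$ depends on the goal $G$ given the state $S$:

```tex
% Schematic InfoBot-style objective (a paraphrase): reward minus an
% information cost I(A; G | S), which encourages goal-independent
% "default" behavior.
\max_{\pi} \; \mathbb{E}_{\pi}[\, r \,] \;-\; \beta\, I(A; G \mid S)
```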


**Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck** [[link](https://arxiv.org/abs/1910.12911)] [[code](https://github.com/microsoft/IBAC-SNI)] [[talk](https://www.youtube.com/watch?v=tWtM4Dq05ZA)]\
Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin, Katja Hofmann\
*NeurIPS, 2019*

**Learning Task-Driven Control Policies via Information Bottlenecks** [[link](https://arxiv.org/abs/2002.01428)] [[spotlight talk](https://www.youtube.com/watch?v=nzLyRHON24E)]\
Vincent Pacelli, Anirudha Majumdar\
*RSS, 2020*

🐤 **The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach** [[journal '20](https://www.jair.org/index.php/jair/article/view/12463/26616)] [[arxiv '18](https://arxiv.org/abs/1807.04723)] \
Iulian Vlad Serban, Chinnadhurai Sankar, Michael Pieper, Joelle Pineau, Yoshua Bengio\
*Journal of Artificial Intelligence Research (JAIR), 2020*

**Learning Robust Representations via Multi-View Information Bottleneck** [[link](https://openreview.net/forum?id=B1xwcyHFDr)] [[code](https://github.com/mfederici/Multi-View-Information-Bottleneck)] [[talk](https://iclr.cc/virtual_2020/poster_B1xwcyHFDr.html)]\
Marco Federici, Anjan Dutta, Patrick Forré, Nate Kushman, Zeynep Akata\
*ICLR, 2020*

**DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck** [[paper](https://openreview.net/forum?id=Py8WbvKH_wv)] [[code](https://github.com/JmfanBU/DRIBO)]\
Jiameng Fan, Wenchao Li\
*ICML, 2022*

**Learning Representations in Reinforcement Learning: an Information Bottleneck Approach** [[link](https://openreview.net/forum?id=Syl-xpNtwS)] [[code](https://github.com/AnonymousSubmittedCode/SVIB)]\
Yingjun Pei, Xinwen Hou\
*Rejected by ICLR, 2020*

**Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning** [[link](https://arxiv.org/abs/2008.00614)]\
Xingyu Lu, Kimin Lee, Pieter Abbeel, Stas Tiomkin\
*ArXiv, 2020*

**Dynamic Bottleneck for Robust Self-Supervised Exploration** [[paper](https://openreview.net/forum?id=-t6TeG3A6Do)] [[code](https://github.com/Baichenjia/DB)]\
Chenjia Bai, Lingxiao Wang, Lei Han, Animesh Garg, Jianye Hao, Peng Liu, Zhaoran Wang\
*NeurIPS, 2021*

**Regret Bounds for Information-Directed Reinforcement Learning** [[paper](https://arxiv.org/abs/2206.04640)]\
Botao Hao, Tor Lattimore\
*ArXiv, 2022*

## 7. Methods for Mutual Information Estimation
> 😣😣😣 Mutual information is notoriously hard to estimate!

🐤 **Benchmarking Mutual Information** [[link](https://arxiv.org/pdf/2306.11078.pdf)] [[code](https://github.com/cbg-ethz/bmi)] [[doc](https://cbg-ethz.github.io/bmi/#getting-started)] \
Paweł Czyż, Frederic Grabowski, Julia E. Vogt, Niko Beerenwinkel, Alexander Marx\
*NeurIPS, 2023*

**Variational f-Divergence and Derangements for Discriminative Mutual Information Estimation** [[link](https://arxiv.org/abs/2305.20025)] [[code](https://github.com/tonellolab/fDIME)] \
Nunzio A. Letizia, Nicola Novello, Andrea M. Tonello\
*ArXiv, 2023*

**Estimating Mutual Information** [[link](https://arxiv.org/abs/cond-mat/0305641)] [[code](https://github.com/ravidziv/IDNNs)] \
Alexander Kraskov, Harald Stoegbauer, Peter Grassberger\
*Physical Review E, 2004*

**Efficient Estimation of Mutual Information for Strongly Dependent Variables** [[link](https://arxiv.org/abs/1411.2003)] [[code](https://github.com/ravidziv/IDNNs)] \
Shuyang Gao, Greg Ver Steeg, Aram Galstyan\
*AISTATS, 2015*
> - This shows that KNN-based estimators require a number of samples that scales *exponentially* with the true MI; that is, they become inaccurate as MI gets large.
> - Thus, the more dependent the variables, the less accurate the MI estimate. In other words, KNN-based estimators are only good at *detecting independence of variables* (a toy comparison follows below).
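
To see this in practice, here is a small sketch (ours, assuming scikit-learn's KSG-style `mutual_info_regression`, not the paper's code) comparing a KNN-based estimate with the closed-form MI of a bivariate Gaussian:

```python
# KNN-based (KSG-style) MI estimates vs. the exact MI of a bivariate
# Gaussian, I = -0.5 * log(1 - rho^2) nats. As the true MI grows, the
# estimate increasingly falls short at a fixed sample size.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 1000
for true_mi in [1.0, 2.0, 4.0, 8.0]:
    rho = np.sqrt(1.0 - np.exp(-2.0 * true_mi))  # correlation giving this MI
    x = rng.normal(size=n)
    y = rho * x + np.sqrt(1.0 - rho ** 2) * rng.normal(size=n)
    est = mutual_info_regression(x.reshape(-1, 1), y, n_neighbors=3)[0]
    print(f"true MI = {true_mi:.1f} nats, KNN estimate = {est:.2f} nats")
```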


🐤 **`MINE`: Mutual Information Neural Estimation** [[link](https://arxiv.org/abs/1801.04062)] [[code](https://github.com/gtegner/mine-pytorch)] \
Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R Devon Hjelm\
*ICML, 2018*

**Evaluating Capability of Deep Neural Networks for Image Classification via Information Plane** [[link](https://openaccess.thecvf.com/content_ECCV_2018/html/Hao_Cheng_Evaluating_Capability_of_ECCV_2018_paper.html)] [[code](https://github.com/haochenglouis/ib_cnn)] \
Hao Cheng, Dongze Lian, Shenghua Gao, Yanlin Geng\
*ECCV, 2018*

🐤 **`InfoMax`: Learning Deep Representations by Mutual Information Estimation and Maximization** [[link](https://arxiv.org/abs/1808.06670)] [[code](https://github.com/rdevon/DIM)] \
R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, Yoshua Bengio\
*ICLR, 2019 (Oral)*

🐤 **On Variational Bounds of Mutual Information** [[link](https://arxiv.org/pdf/1905.06922.pdf)] [[PyTorch](https://github.com/arashabzd/milib)] \
Ben Poole, Sherjil Ozair, Aaron van den Oord, Alexander A. Alemi, George Tucker\
*ICML, 2019*

🐤 **Estimating Information Flow in Deep Neural Networks** [[link](https://arxiv.org/abs/1810.05728)] [[PyTorch](https://github.com/ankithmo/estimateMI)] \
Ziv Goldfeld, Ewout van den Berg, Kristjan Greenewald, Igor Melnyk, Nam Nguyen, Brian Kingsbury, Yury Polyanskiy\
*ICML, 2019*

**Neural Estimators for Conditional Mutual Information Using Nearest Neighbors Sampling** [[link](https://arxiv.org/abs/2006.07225v3)] [[code](https://github.com/smolavipour/CMI_Neural_Estimator)] \
Sina Molavipour, Germán Bassi, Mikael Skoglund\
*Preprint, 2020*

**`CCMI`: Classifier based Conditional Mutual Information Estimation** [[link](https://arxiv.org/abs/1906.01824)] [[code](https://github.com/sudiptodip15/CCMI)] \
Sudipto Mukherjee, Himanshu Asnani, Sreeram Kannan\
*UAI, 2020*

**`MIGE`: Mutual Information Gradient Estimation for Representation Learning** [[link](https://arxiv.org/abs/2005.01123)] [[code](https://github.com/zhouyiji/MIGE)] \
Liangjian Wen, Yiji Zhou, Lirong He, Mingyuan Zhou, Zenglin Xu\
*ICLR, 2020*

🐤 **Information Bottleneck: Exact Analysis of (Quantized) Neural Networks** [[link](https://arxiv.org/abs/2106.12912v1)]\
Stephan Sloth Lorenzen, Christian Igel, Mads Nielsen\
*Preprint, 2021*
> - This paper shows that different ways of binning when computing the mutual information lead to qualitatively different results (a toy illustration follows below).
> - It then confirms the original IB paper's fitting & compression phases using quantized nets, with exact computation of mutual information.
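
To illustrate the binning sensitivity, a toy sketch (ours, not the paper's setup): estimate $I(X;T)$ for a deterministic "activation" $T=\tanh(2X)$ by plug-in estimation on a 2D histogram, varying the number of bins:

```python
# A toy plug-in (histogram) MI estimator. For a deterministic T = f(X),
# the binned estimate of I(X;T) keeps growing with the number of bins,
# so different binning choices tell qualitatively different stories.
import numpy as np

def binned_mi(x, t, bins):
    """Plug-in MI estimate (nats) from a 2D histogram."""
    pxy, _, _ = np.histogram2d(x, t, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)          # marginal of x
    pt = pxy.sum(axis=0, keepdims=True)          # marginal of t
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ pt)[mask])))

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
t = np.tanh(2.0 * x)                             # deterministic "activation"
for bins in [10, 30, 100, 300]:
    print(f"bins = {bins}: I(X;T) ~ {binned_mi(x, t, bins):.2f} nats")
```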


🐤 **Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization** [[link](https://arxiv.org/abs/2107.01131)] [[code](https://github.com/qingguo666/FLO)]\
Qing Guo, Junya Chen, Dong Wang, Yuewei Yang, Xinwei Deng, Lawrence Carin, Fan Li, Chenyang Tao\
*Preprint, 2021*

**Entropy and mutual information in models of deep neural networks** [[link](https://arxiv.org/abs/1805.09785)] \
Marylou Gabrié, Andre Manoel, Clément Luneau, Jean Barbier, Nicolas Macris, Florent Krzakala, Lenka Zdeborová\
*NeurIPS, 2018*

🐤 **Understanding the Limitations of Variational Mutual Information Estimators** [[link](https://iclr.cc/virtual_2020/poster_B1x62TNtDS.html)] [[PyTorch](https://github.com/ermongroup/smile-mi-estimator)] \
Jiaming Song, Stefano Ermon \
*ICLR, 2020*
> - This implementation includes `InfoNCE`, `NWJ`, `NWJ-JS`, `MINE`, and their own method `SMILE`.
> - Basically, they show that the *variance* of traditional MI estimators can grow exponentially with the true MI. In other words, just as with KNN estimators: the more dependent the variables (the higher the MI), the less accurate the estimate.
> - Also, those estimators ***do not satisfy*** some important self-consistency properties, such as the *data processing inequality*.
> - They propose `SMILE`, which aims to reduce the variance issue (a minimal sketch of the related `InfoNCE` bound follows below).
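
For concreteness, a minimal sketch (ours, with a fixed dot-product critic rather than a learned one) of the `InfoNCE` bound, which has low variance but is capped at $\log K$ for batch size $K$:

```python
# InfoNCE lower bound on I(X;Y) with a fixed critic f(x, y) = x . y:
#   I(X;Y) >= log K + mean_i [ f(x_i, y_i) - logsumexp_j f(x_i, y_j) ].
# The bound can never exceed log K, whatever the true MI is.
import numpy as np

def infonce_bound(x, y):
    """InfoNCE estimate (nats) for paired samples x[i] <-> y[i]."""
    scores = x @ y.T                              # scores[i, j] = f(x_i, y_j)
    k = scores.shape[0]
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(np.mean(np.diag(log_softmax)) + np.log(k))

rng = np.random.default_rng(0)
k, rho = 256, 0.99
x = rng.normal(size=(k, 1))
y = rho * x + np.sqrt(1.0 - rho ** 2) * rng.normal(size=(k, 1))
true_mi = -0.5 * np.log(1.0 - rho ** 2)
print(f"true MI = {true_mi:.2f} nats, log K = {np.log(k):.2f}, "
      f"InfoNCE = {infonce_bound(x, y):.2f}")
```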


🐤🐤 **Sliced Mutual Information: A Scalable Measure of Statistical Dependence** [[link](https://openreview.net/forum?id=27qon5Ut4PSl)] \
Ziv Goldfeld, Kristjan Greenewald\
*NeurIPS, 2021 (**spotlight**)*

🐤 **Improving Mutual Information Estimation with Annealed and Energy-Based Bounds** [[link](https://openreview.net/forum?id=T0B9AoM_bFg)]\
Qing Guo, Junya Chen, Dong Wang, Yuewei Yang, Xinwei Deng, Lawrence Carin, Fan Li, Chenyang Tao\
*ICLR, 2022*

**Assessing Neural Network Representations During Training Using Noise-Resilient `Diffusion Spectral Entropy`** [[link](https://arxiv.org/abs/2312.04823)] [[code](https://github.com/ChenLiu-1996/DiffusionSpectralEntropy)]\
Danqi Liao\*, Chen Liu\*, Benjamin W Christensen, Alexander Tong, Guillaume Huguet, Guy Wolf, Maximilian Nickel, Ian Adelstein, Smita Krishnaswamy\
*ICML Workshop, 2023*
> This paper leverages diffusion geometry to estimate entropy and MI in high-dimensional representations of modern neural networks.

## 8. Other Information Theory Driven Work
**f-GANs in an Information Geometric Nutshell** [[link](https://papers.nips.cc/paper/2017/hash/2f2b265625d76a6704b08093c652fd79-Abstract.html)] \
Richard Nock, Zac Cranko, Aditya K. Menon, Lizhen Qu, Robert C. Williamson\
*NeurIPS, 2017*

**Fully Decentralized Policies for Multi-Agent Systems: An Information Theoretic Approach** [[link](https://papers.nips.cc/paper/2017/hash/8bb88f80d334b1869781beb89f7b73be-Abstract.html)] \
Roel Dobbe, David Fridovich-Keil, Claire Tomlin\
*NeurIPS, 2017*

**Information Theoretic Properties of Markov Random Fields, and their Algorithmic Applications** [[link](https://papers.nips.cc/paper/2017/hash/8fb5f8be2aa9d6c64a04e3ab9f63feee-Abstract.html)] \
Linus Hamilton, Frederic Koehler, Ankur Moitra\
*NeurIPS, 2017*

**Information-theoretic analysis of generalization capability of learning algorithms** [[link](https://papers.nips.cc/paper/2017/hash/ad71c82b22f4f65b9398f76d8be4c615-Abstract.html)] \
Aolin Xu, Maxim Raginsky\
*NeurIPS, 2017*

**Learning Discrete Representations via Information Maximizing Self-Augmented Training** [[link](https://proceedings.mlr.press/v70/hu17b.html)] \
Weihua Hu, Takeru Miyato, Seiya Tokui, Eiichi Matsumoto, Masashi Sugiyama\
*ICML, 2017*

🐣 **Nonparanormal Information Estimation** [[link](https://proceedings.mlr.press/v70/singh17a.html)] \
Shashank Singh, Barnabás Póczos\
*ICML, 2017*
> This paper shows how to robustly estimate mutual information using i.i.d. samples from an unknown distribution.


**Entropy and mutual information in models of deep neural networks** [[link](https://papers.nips.cc/paper/2018/hash/6d0f846348a856321729a2f36734d1a7-Abstract.html)] \
Marylou Gabrié, Andre Manoel, Clément Luneau, Jean Barbier, Nicolas Macris, Florent Krzakala, Lenka Zdeborová\
*NeurIPS, 2018*

**Chaining Mutual Information and Tightening Generalization Bounds** [[link](https://papers.nips.cc/paper/2018/hash/8d7628dd7a710c8638dbd22d4421ee46-Abstract.html)] \
Amir Asadi, Emmanuel Abbe, Sergio Verdu\
*NeurIPS, 2018*

**Information Constraints on Auto-Encoding Variational Bayes** [[link](https://papers.nips.cc/paper/2018/hash/9a96a2c73c0d477ff2a6da3bf538f4f4-Abstract.html)] \
Romain Lopez, Jeffrey Regier, Michael I. Jordan, Nir Yosef\
*NeurIPS, 2018*

**Adaptive Learning with Unknown Information Flows** [[link](https://papers.nips.cc/paper/2018/hash/9e740b84bb48a64dde25061566299467-Abstract.html)] \
Yonatan Gur, Ahmadreza Momeni\
*NeurIPS, 2018*

**Information-based Adaptive Stimulus Selection to Optimize Communication Efficiency in Brain-Computer Interfaces** [[link](https://papers.nips.cc/paper/2018/hash/a3eb043e7bf775de87763e9f8121c953-Abstract.html)] \
Boyla Mainsah, Dmitry Kalika, Leslie Collins, Siyuan Liu, Chandra Throckmorton\
*NeurIPS, 2018*

**Information Theoretic Guarantees for Empirical Risk Minimization with Applications to Model Selection and Large-Scale Optimization** [[link](https://proceedings.mlr.press/v80/alabdulmohsin18a.html)] \
Ibrahim Alabdulmohsin\
*ICML, 2018*

**Mutual Information Neural Estimation** [[link](https://proceedings.mlr.press/v80/belghazi18a.html)] \
Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeshwar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, Devon Hjelm\
*ICML, 2018*

**Learning to Explain: An Information-Theoretic Perspective on Model Interpretation** [[link](https://proceedings.mlr.press/v80/chen18j.html)]\
Jianbo Chen, Le Song, Martin Wainwright, Michael Jordan\
*ICML, 2018*

**Fast Information-theoretic Bayesian Optimisation** [[link](https://proceedings.mlr.press/v80/ru18a.html)] \
Binxin Ru, Michael A. Osborne, Mark Mcleod, Diego Granziol\
*ICML, 2018*

**Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond** [[link](https://papers.nips.cc/paper/2019/hash/21b29648a47a45ad16bb0da0c004dfba-Abstract.html)] \
Lin Chen, Hossein Esfandiari, Gang Fu, Vahab Mirrokni\
*NeurIPS, 2019*

**Information-Theoretic Confidence Bounds for Reinforcement Learning** [[link](https://papers.nips.cc/paper/2019/hash/411ae1bf081d1674ca6091f8c59a266f-Abstract.html)] \
Xiuyuan Lu, Benjamin Van Roy\
*NeurIPS, 2019*

**L-DMI: A Novel Information-theoretic Loss Function for Training Deep Nets Robust to Label Noise** [[link](https://papers.nips.cc/paper/2019/hash/8a1ee9f2b7abe6e88d1a479ab6a42c5e-Abstract.html)] \
Yilun Xu, Peng Cao, Yuqing Kong, Yizhou Wang\
*NeurIPS, 2019*

**Connections Between Mirror Descent, Thompson Sampling and the Information Ratio** [[link](https://papers.nips.cc/paper/2019/hash/92cf3f7ef90630755b955924254e6ec4-Abstract.html)] \
Julian Zimmert, Tor Lattimore\
*NeurIPS, 2019*

**Region Mutual Information Loss for Semantic Segmentation** [[link](https://papers.nips.cc/paper/2019/hash/a67c8c9a961b4182688768dd9ba015fe-Abstract.html)] \
Shuai Zhao, Yang Wang, Zheng Yang, Deng Cai\
*NeurIPS, 2019*

**Learning Representations by Maximizing Mutual Information Across Views** [[link](https://papers.nips.cc/paper/2019/hash/ddf354219aac374f1d40b7e760ee5bb7-Abstract.html)]\
Philip Bachman, R Devon Hjelm, William Buchwalter\
*NeurIPS, 2019*

**Icebreaker: Element-wise Efficient Information Acquisition with a Bayesian Deep Latent Gaussian Model** [[link](https://papers.nips.cc/paper/2019/hash/c055dcc749c2632fd4dd806301f05ba6-Abstract.html)] \
Wenbo Gong, Sebastian Tschiatschek, Sebastian Nowozin, Richard E. Turner, José Miguel Hernández-Lobato, Cheng Zhang\
*NeurIPS, 2019*

**Thompson Sampling with Information Relaxation Penalties** [[link](https://papers.nips.cc/paper/2019/hash/e5b294b70c9647dcf804d7baa1903918-Abstract.html)] \
Seungki Min, Costis Maglaras, Ciamac C. Moallemi\
*NeurIPS, 2019*

**InfoMax: Learning deep representations by mutual information estimation and maximization** [[link](https://openreview.net/forum?id=Bklr3j0cKX)][[code](https://github.com/rdevon/DIM)] \
R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, Yoshua Bengio\
*ICLR, 2019*

**Max-MIG: an Information Theoretic Approach for Joint Learning from Crowds** [[link](https://openreview.net/forum?id=BJg9DoR9t7)] \
Peng Cao, Yilun Xu, Yuqing Kong, Yizhou Wang\
*ICLR, 2019*

**Information-Directed Exploration for Deep Reinforcement Learning** [[link](https://openreview.net/forum?id=Byx83s09Km)] \
Nikolay Nikolov, Johannes Kirschner, Felix Berkenkamp, Andreas Krause\
*ICLR, 2019*

**Soft Q-Learning with Mutual-Information Regularization** [[link](https://openreview.net/forum?id=HyEtjoCqFX)] \
Jordi Grau-Moya, Felix Leibfried, Peter Vrancx\
*ICLR, 2019*

**Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization** [[link](https://openreview.net/forum?id=Hyl_vjC5KQ)] \
Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama\
*ICLR, 2019*

**Information Asymmetry in KL-regularized RL** [[link](https://openreview.net/forum?id=S1lqMn05Ym)] \
Alexandre Galashov, Siddhant M. Jayakumar, Leonard Hasenclever, Dhruva Tirumala, Jonathan Schwarz, Guillaume Desjardins, Wojciech M. Czarnecki, Yee Whye Teh, Razvan Pascanu, Nicolas Heess\
*ICLR, 2019*

**Adaptive Estimators Show Information Compression in Deep Neural Networks** [[link](https://openreview.net/forum?id=SkeZisA5t7)] \
Ivan Chelombiev, Conor Houghton, Cian O'Donnell\
*ICLR, 2019*

**Information Theoretic lower bounds on negative log likelihood** [[link](https://openreview.net/forum?id=rkemqsC9Fm)] \
Luis A. Lastras-Montaño\
*ICLR, 2019*

**New results on information theoretic clustering** [[link](https://proceedings.mlr.press/v97/cicalese19a.html)] [[code](https://github.com/lmurtinho/RatioGreedyClustering/tree/ICML_submission)] \
Ferdinando Cicalese, Eduardo Laber, Lucas Murtinho\
*ICML, 2019*

**Estimating Information Flow in Deep Neural Networks** [[link](https://proceedings.mlr.press/v97/goldfeld19a.html)] \
Ziv Goldfeld, Ewout Van Den Berg, Kristjan Greenewald, Igor Melnyk, Nam Nguyen, Brian Kingsbury, Yury Polyanskiy\
*ICML, 2019*

🐣 **The information-theoretic value of unlabeled data in semi-supervised learning** [[link](https://proceedings.mlr.press/v97/golovnev19a.html)] \
Alexander Golovnev, David Pal, Balazs Szorenyi\
*ICML, 2019*

**EMI: Exploration with Mutual Information** [[link](https://proceedings.mlr.press/v97/kim19a.html)] [[code](https://github.com/snu-mllab/EMI)] \
Hyoungseok Kim, Jaekyeom Kim, Yeonwoo Jeong, Sergey Levine, Hyun Oh Song\
*ICML, 2019*

🐣 **On Variational Bounds of Mutual Information** [[link](https://proceedings.mlr.press/v97/poole19a.html)] \
Ben Poole, Sherjil Ozair, Aaron Van Den Oord, Alex Alemi, George Tucker\
*ICML, 2019*

**Where is the Information in a Deep Neural Network?** [[link](https://arxiv.org/abs/1905.12213)] \
Alessandro Achille, Giovanni Paolini, Stefano Soatto\
*Preprint, 2020*

**Information Maximization for Few-Shot Learning** [[link](https://papers.nips.cc/paper/2020/hash/196f5641aa9dc87067da4ff90fd81e7b-Abstract.html)] \
Malik Boudiaf, Imtiaz Ziko, Jérôme Rony, Jose Dolz, Pablo Piantanida, Ismail Ben Ayed\
*NeurIPS, 2020*

**Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information** [[link](https://papers.nips.cc/paper/2020/hash/7f2be1b45d278ac18804b79207a24c53-Abstract.html)] \
Genevieve Flaspohler, Nicholas A. Roy, John W. Fisher III\
*NeurIPS, 2020*

**Predictive Information Accelerates Learning in RL** [[link](https://papers.nips.cc/paper/2020/hash/89b9e0a6f6d1505fe13dea0f18a2dcfa-Abstract.html)] \
Kuang-Huei Lee, Ian Fischer, Anthony Liu, Yijie Guo, Honglak Lee, John Canny, Sergio Guadarrama\
*NeurIPS, 2020*
> "The **predictive information** is the **mutual information** between the *past* and the *future*, $I(X_{\text{past}}; X_{\text{future}})$."


**Information Theoretic Regret Bounds for Online Nonlinear Control** [[link](https://papers.nips.cc/paper/2020/hash/aee5620fa0432e528275b8668581d9a8-Abstract.html)] \
Sham Kakade, Akshay Krishnamurthy, Kendall Lowrey, Motoya Ohnishi, Wen Sun\
*NeurIPS, 2020*

**Conditioning and Processing: Techniques to Improve Information-Theoretic Generalization Bounds** [[link](https://papers.nips.cc/paper/2020/hash/befe5b0172188ad14d48c3ebe9cf76bf-Abstract.html)] \
Hassan Hafez-Kolahi, Zeinab Golgooni, Shohreh Kasaei, Mahdieh Soleymani\
*NeurIPS, 2020*

**Variational Interaction Information Maximization for Cross-domain Disentanglement** [[link](https://papers.nips.cc/paper/2020/hash/fe663a72b27bdc613873fbbb512f6f67-Abstract.html)] \
HyeongJoo Hwang, Geon-Hyeong Kim, Seunghoon Hong, Kee-Eung Kim\
*NeurIPS, 2020*

**Information theoretic limits of learning a sparse rule** [[link](https://papers.nips.cc/paper/2020/hash/713fd63d76c8a57b16fc433fb4ae718a-Abstract.html)] \
Clément Luneau, Jean Barbier, Nicolas Macris\
*NeurIPS, 2020*

**Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks** [[link](https://papers.nips.cc/paper/2020/hash/7b41bfa5085806dfa24b8c9de0ce567f-Abstract.html)] \
Ryo Karakida, Kazuki Osawa\
*NeurIPS, 2020*

🐣 **On Mutual Information Maximization for Representation Learning** [[link](https://openreview.net/forum?id=rkxoh24FPH)] \
Michael Tschannen, Josip Djolonga, Paul K. Rubenstein, Sylvain Gelly, Mario Lucic\
*ICLR, 2020*

🐣 **Understanding the Limitations of Variational Mutual Information Estimators** [[link](https://openreview.net/forum?id=B1x62TNtDS)] \
Jiaming Song, Stefano Ermon\
*ICLR, 2020*

**Expected Information Maximization: Using the I-Projection for Mixture Density Estimation** [[link](https://openreview.net/forum?id=ByglLlHFDS)] \
Philipp Becker, Oleg Arenz, Gerhard Neumann \
*ICLR, 2020*

**Mutual Information Gradient Estimation for Representation Learning** [[link](https://openreview.net/forum?id=ByxaUgrFvH)] \
Liangjian Wen, Yiji Zhou, Lirong He, Mingyuan Zhou, Zenglin Xu\
*ICLR, 2020*

**InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization** [[link](https://openreview.net/forum?id=r1lfF2NYvH)] \
Fan-Yun Sun, Jordan Hoffman, Vikas Verma, Jian Tang\
*ICLR, 2020*

**A Mutual Information Maximization Perspective of Language Representation Learning** [[link](https://openreview.net/forum?id=Syx79eBKwr)] \
Lingpeng Kong, Cyprien de Masson d'Autume, Lei Yu, Wang Ling, Zihang Dai, Dani Yogatama\
*ICLR, 2020*

**CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information** [[link](https://proceedings.mlr.press/v119/cheng20b.html)] [[code](https://github.com/Linear95/CLUB)] \
Pengyu Cheng, Weituo Hao, Shuyang Dai, Jiachang Liu, Zhe Gan, Lawrence Carin\
*ICML, 2020*

**Information Particle Filter Tree: An Online Algorithm for POMDPs with Belief-Based Rewards on Continuous Domains** [[link](https://proceedings.mlr.press/v119/fischer20a.html)] [[code](https://github.com/johannes-fischer/icml2020_ipft)] \
Johannes Fischer, Ömer Sahin Tas\
*ICML, 2020*

**Bayesian Experimental Design for Implicit Models by Mutual Information Neural Estimation** [[link](https://proceedings.mlr.press/v119/kleinegesse20a.html)] [[code](https://github.com/stevenkleinegesse/minebed)] \
Steven Kleinegesse, Michael U. Gutmann\
*ICML, 2020*

**FR-Train: A Mutual Information-Based Approach to Fair and Robust Training** [[link](https://proceedings.mlr.press/v119/roh20a.html)] [[code](https://github.com/yuji-roh/fr-train)] \
Yuji Roh, Kangwook Lee, Steven Whang, Changho Suh\
*ICML, 2020*

**Learning Discrete Structured Representations by Adversarially Maximizing Mutual Information** [[link](https://proceedings.mlr.press/v119/stratos20a.html)] [[code](https://github.com/karlstratos/ammi)] \
Karl Stratos, Sam Wiseman\
*ICML, 2020*

**Learning Structured Latent Factors from Dependent Data: A Generative Model Framework from Information-Theoretic Perspective** [[link](https://proceedings.mlr.press/v119/zhang20m.html)] \
Ruixiang Zhang, Masanori Koyama, Katsuhiko Ishiguro\
*ICML, 2020*

**Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization** [[link](https://proceedings.mlr.press/v119/zhu20e.html)] [[code](https://github.com/schzhu/learning-adversarially-robust-representations)] \
Sicheng Zhu, Xiao Zhang, David Evans\
*ICML, 2020*

**Usable Information and Evolution of Optimal Representations During Training** [[link](https://openreview.net/forum?id=p8agn6bmTbr)] \
Michael Kleinman, Alessandro Achille, Daksh Idnani, Jonathan Kao\
*ICLR, 2021*

**Domain-Robust Visual Imitation Learning with Mutual Information Constraints** [[link](https://openreview.net/forum?id=QubpWYfdNry)] \
Edoardo Cetin, Oya Celiktutan\
*ICLR, 2021*

**Multi-Class Uncertainty Calibration via Mutual Information Maximization-based Binning** [[link](https://openreview.net/forum?id=AICNpd8ke-m)] \
Kanil Patel, William H. Beluch, Bin Yang, Michael Pfeiffer, Dan Zhang\
*ICLR, 2021*

**Graph Information Bottleneck for Subgraph Recognition** [[link](https://openreview.net/forum?id=bM4Iqfg8M2k)] \
Junchi Yu, Tingyang Xu, Yu Rong, Yatao Bian, Junzhou Huang, Ran He\
*ICLR, 2021*

**InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective** [[link](https://openreview.net/forum?id=hpH98mK5Puk)]\
Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu\
*ICLR, 2021*

**Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information** [[link](https://icml.cc/Conferences/2021/Schedule?showEvent=10675)] [[slides](https://icml.cc/media/icml-2021/Slides/10675.pdf)] \
Willie Neiswanger, Ke Alexander Wang, Stefano Ermon\
*ICML, 2021*

**Decomposed Mutual Information Estimation for Contrastive Representation Learning** [[link](http://proceedings.mlr.press/v139/sordoni21a/sordoni21a.pdf)] \
Alessandro Sordoni, Nouha Dziri, Hannes Schulz, Geoff Gordon, Philip Bachman, Remi Tachet Des Combes\
*ICML, 2021*

**ReduNet: A White-box Deep Network from the Principle of Maximizing Rate Reduction** [[link](https://arxiv.org/abs/2105.10446)] [[code](https://github.com/ryanchankh/ReduNet)] \
Kwan Ho Ryan Chan, Yaodong Yu, Chong You, Haozhi Qi, John Wright, Yi Ma\
*Preprint, 2021*

**Intelligence, physics and information – the tradeoff between accuracy and simplicity in machine learning** [[link](https://arxiv.org/abs/2001.03780)] \
Tailin Wu\
*PhD Thesis, 2021*

**The Information Geometry of Unsupervised Reinforcement Learning** [[link](https://arxiv.org/abs/2110.02719v1)] \
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine\
*Preprint, 2021*

## 9. Citation
If you would like to cite this repository 🐣:
```tex
@misc{git2022ib,
  title        = {Awesome Information Bottleneck},
  author       = {Ziyu Ye},
  howpublished = {\url{https://github.com/ZIYU-DEEP/Awesome-Information-Bottleneck}},
  year         = {2022}
}
```