https://github.com/cbg-ethz/bmi
Mutual information estimators and benchmark
- Host: GitHub
- URL: https://github.com/cbg-ethz/bmi
- Owner: cbg-ethz
- License: MIT
- Created: 2022-08-21T08:48:03.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2025-01-18T21:09:17.000Z (9 months ago)
- Last Synced: 2025-04-02T06:54:03.130Z (6 months ago)
- Topics: benchmark, estimator, mutual-information, python
- Language: Python
- Homepage: https://cbg-ethz.github.io/bmi/
- Size: 1.09 MB
- Stars: 46
- Watchers: 4
- Forks: 6
- Open Issues: 10
Metadata Files:
- Readme: README.md
- Contributing: docs/contributing.md
- License: LICENSE
[](https://openreview.net/forum?id=LdflD41Gn8)
[](https://neurips.cc/virtual/2023/poster/72978)
[](https://www.repostatus.org/#active)
[](https://pypi.org/project/benchmark-mi/)
[](https://github.com/cbg-ethz/bmi/actions/workflows/build.yml)
[](https://github.com/psf/black)

# Benchmarking Mutual Information
BMI is a Python package for estimating mutual information between continuous random variables and for testing new estimators.
- **Documentation**: [https://cbg-ethz.github.io/bmi/](https://cbg-ethz.github.io/bmi/)
- **Source code**: [https://github.com/cbg-ethz/bmi](https://github.com/cbg-ethz/bmi)
- **Bug reports**: [https://github.com/cbg-ethz/bmi/issues](https://github.com/cbg-ethz/bmi/issues)
- **PyPI package**: [https://pypi.org/project/benchmark-mi](https://pypi.org/project/benchmark-mi)

## Getting started
While we recommend reading the [documentation](https://cbg-ethz.github.io/bmi/) to learn about the full package capabilities, below we present the main features of the Python package.
(Note that BMI can also be used to test non-Python mutual information estimators.)

You can install the package using:
```bash
$ pip install benchmark-mi
```

Alternatively, you can install the development version from source using:
```bash
$ pip install "benchmark-mi @ git+https://github.com/cbg-ethz/bmi"
```

Note: BMI uses [JAX](https://github.com/google/jax) and by default installs the CPU version of it.
If you have a device supporting CUDA, you can [install the CUDA version of JAX](https://github.com/google/jax#pip-installation-gpu-cuda-installed-via-pip-easier).

Now let's take one of the predefined distributions included in the benchmark (named "tasks") and sample 1,000 data points.
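As an aside, for bivariate normal distributions the ground-truth mutual information has a closed form, I(X; Y) = -1/2 log(1 - rho^2), where rho is the correlation coefficient. A quick standalone check (this sketch does not use BMI itself, and that the `0.75` in the task name below denotes this correlation is our assumption):

```python
import math

def bivariate_normal_mi(rho: float) -> float:
    """Mutual information (in nats) of a bivariate normal with correlation rho."""
    return -0.5 * math.log(1.0 - rho**2)

# Correlation assumed to match the '1v1-normal-0.75' task name.
print(f"{bivariate_normal_mi(0.75):.2f}")  # → 0.41
```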
Then, we will run two estimators on this task.

```python
import bmi

task = bmi.benchmark.BENCHMARK_TASKS['1v1-normal-0.75']
print(f"Task {task.name} with dimensions {task.dim_x} and {task.dim_y}")
print(f"Ground truth mutual information: {task.mutual_information:.2f}")

X, Y = task.sample(1000, seed=42)

cca = bmi.estimators.CCAMutualInformationEstimator()
print(f"Estimate by CCA: {cca.estimate(X, Y):.2f}")

ksg = bmi.estimators.KSGEnsembleFirstEstimator(neighborhoods=(5,))
print(f"Estimate by KSG: {ksg.estimate(X, Y):.2f}")
```

## Evaluating a new estimator
The above code snippet may be convenient for estimating mutual information on a given data set or for the development of a new mutual information estimator.
However, for extensive benchmarking it may be more convenient to use one of the benchmark suites available in the `workflows/benchmark/` subdirectory.

For example, you can install [Snakemake](https://snakemake.readthedocs.io/en/stable/) and run a small benchmark suite on several estimators using:
```bash
$ snakemake -c4 -s workflows/benchmark/demo/run.smk
```

In about a minute it should generate minibenchmark results in the `generated/benchmark/demo` directory.
Note that the configuration file, `workflows/benchmark/demo/config.py`, explicitly defines the estimators and tasks used, as well as the number of samples.
Hence, it is easy to benchmark a custom estimator by importing it and including it in the configuration dictionary.
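For illustration, a custom estimator can be as small as a class exposing an `estimate(X, Y)` method (the exact interface expected by the benchmark is described in the documentation; the toy class below is hypothetical and not part of BMI). It assumes one-dimensional, jointly Gaussian data and plugs the sample correlation into the closed-form Gaussian expression:

```python
import math


class GaussianToyEstimator:
    """Toy MI estimator: assumes x and y are 1-D sequences of jointly Gaussian samples."""

    def estimate(self, x, y):
        n = len(x)
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
        var_x = sum((a - mean_x) ** 2 for a in x) / n
        var_y = sum((b - mean_y) ** 2 for b in y) / n
        rho = cov / math.sqrt(var_x * var_y)
        # Closed-form MI (in nats) of a bivariate normal with correlation rho.
        return -0.5 * math.log(1.0 - rho**2)
```

Importing such a class in `workflows/benchmark/demo/config.py` and adding it next to the built-in estimators should be all that is needed for the demo workflow to pick it up.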
More information is available [here](https://cbg-ethz.github.io/bmi/benchmarking-new-estimator/), where we cover evaluating new Python as well as non-Python estimators.

Similarly, it is easy to change the number of samples or adjust the tasks included in the benchmark.
We defined several benchmark suites with shared structure.

## List of implemented estimators
(Your estimator could be here too! Please reach out to us if you would like to contribute.)
- The neighborhood-based KSG estimator proposed in [Estimating Mutual Information](https://arxiv.org/abs/cond-mat/0305641) by Kraskov et al. (2003).
- Donsker-Varadhan and MINE estimators proposed in [MINE: Mutual Information Neural Estimation](https://arxiv.org/abs/1801.04062) by Belghazi et al. (2018).
- InfoNCE estimator proposed in [Representation Learning with Contrastive Predictive Coding](https://arxiv.org/abs/1807.03748) by Oord et al. (2018).
- NWJ estimator proposed in [Estimating divergence functionals and the likelihood ratio by convex risk minimization](https://arxiv.org/abs/0809.0853) by Nguyen et al. (2008).
- Estimator based on canonical correlation analysis described in [Feature discovery under contextual supervision using mutual information](https://ieeexplore.ieee.org/document/227286) by Kay (1992) and in [Some data analyses using mutual information](https://www.jstor.org/stable/43601047) by Brillinger (2004).

## References
### ✨ New! ✨ On the properties and estimation of pointwise mutual information profiles
[](https://openreview.net/forum?id=LdflD41Gn8)
[](https://openreview.net/pdf?id=LdflD41Gn8)

In this manuscript we discuss the *pointwise mutual information profile*, an invariant which can be used to diagnose limitations of the previous mutual information benchmark, and a flexible distribution family of *Bend and Mix Models*. These distributions can be used to create *more expressive benchmark tasks* and provide *model-based Bayesian estimates* of mutual information.
Workflows:
- To run the updated version of the benchmark, using Bend and Mix Models, see [`workflows/benchmark/v2`](./workflows/benchmark/v2/).
- To reproduce the experimental results from the manuscript, see [`workflows/projects/Mixtures`](./workflows/projects/Mixtures/).

```
@article{
pmi-profiles-2025,
title={On the Properties and Estimation of Pointwise Mutual Information Profiles},
author={Czy{\.z}, Pawe{\l} and Grabowski, Frederic and Vogt, Julia and Beerenwinkel, Niko and Marx, Alexander},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2025},
url={https://openreview.net/forum?id=LdflD41Gn8},
}
```

### Beyond normal: On the evaluation of mutual information estimators
[](https://neurips.cc/virtual/2023/poster/72978)
[](https://papers.nips.cc/paper_files/paper/2023/hash/36b80eae70ff629d667f210e13497edf-Abstract-Conference.html)

In this manuscript we discuss a benchmark for mutual information estimators.
Workflows:
- To run the benchmark, see [`workflows/benchmark/v1`](./workflows/benchmark/v1/).
- To reproduce the experimental results from the manuscript, see [`workflows/projects/Beyond_Normal`](./workflows/projects/Beyond_Normal/).

```
@inproceedings{beyond-normal-2023,
title = {Beyond Normal: On the Evaluation of Mutual Information Estimators},
author = {Czy\.{z}, Pawe{\l} and Grabowski, Frederic and Vogt, Julia and Beerenwinkel, Niko and Marx, Alexander},
booktitle = {Advances in Neural Information Processing Systems},
editor = {A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
pages = {16957--16990},
publisher = {Curran Associates, Inc.},
url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/36b80eae70ff629d667f210e13497edf-Paper-Conference.pdf},
volume = {36},
year = {2023}
}
```