https://github.com/oelin/ar-mnist
An autoregressive model for compressing MNIST digits.
- Host: GitHub
- URL: https://github.com/oelin/ar-mnist
- Owner: oelin
- License: MIT
- Created: 2023-01-09T17:01:42.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-01-19T17:45:46.000Z (about 3 years ago)
- Last Synced: 2023-02-27T20:20:34.915Z (about 3 years ago)
- Topics: data-compression, data-science, machine-learning, mnist
- Language: Jupyter Notebook
- Homepage:
- Size: 104 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# AR MNIST
AR MNIST (autoregressive MNIST) is a simple autoregressive model for compressing MNIST handwritten digits. The model is mostly intended as an educational tool to demonstrate the close relationship between machine learning and data compression.

## Overview
Autoregressive models are statistical models that predict future data from past data. Most generally, they aim to approximate the distribution $p(x_n \mid x_1,\dots,x_{n-1})$ for some sample $x_1,\dots,x_n$. However, conditioning on arbitrarily long contexts is usually infeasible due to computational limitations, so most models operate on a fixed (or bounded) length context $x_{n-k},\dots,x_{n-1}$, where $k$ is the context (or history) length. Examples of autoregressive models include GPT-3 and Facebook's Prophet. Many autoregressive models are also generative models: their predictions can be applied recursively to sample from the approximate distribution $\hat p$.
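For concreteness, here is a minimal count-based sketch of such a fixed-context model for binarized pixels. The function names and the Laplace smoothing are our own illustrative choices, not taken from the repository, which uses a learned model:

```python
def fit_ar_model(sequences, k=3):
    """Estimate p(x_n | x_{n-k}, ..., x_{n-1}) by counting, for binary pixels.

    `sequences` is an iterable of flattened binary images (0/1 values).
    Returns a dict mapping each length-k context to P(next pixel = 1).
    """
    counts = {}  # context tuple -> [count of next=0, count of next=1]
    for seq in sequences:
        for n in range(k, len(seq)):
            c = counts.setdefault(tuple(seq[n - k:n]), [1, 1])  # Laplace smoothing
            c[seq[n]] += 1
    return {ctx: c[1] / (c[0] + c[1]) for ctx, c in counts.items()}


def predict(model, context, k=3):
    """Probability that the next pixel is 1, given the last k pixels."""
    return model.get(tuple(context[-k:]), 0.5)  # unseen context: fall back to uniform
```

Replacing the counting table with a neural network (as in this repository) changes how $\hat p$ is estimated, but not how it is used downstream.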
Compressing data with an autoregressive model is rather straightforward: entropy code the data using $\hat p$ as the likelihood model. The expected code length is then the cross-entropy $H(p, \hat p) = H(p) + D_{\mathrm{KL}}(p \,\|\, \hat p)$, so if the KL divergence between $\hat p$ and the true distribution $p$ is small, this will result in very good compression. In the limit $\hat p \to p$, the code length approaches the entropy $H(p)$, which is **optimal** according to Shannon's source coding theorem.
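The connection to code length can be made explicit: under an ideal entropy coder (e.g. an arithmetic coder), pixel $x_n$ costs $-\log_2 \hat p(x_n \mid \text{context})$ bits, so the total cost is the empirical cross-entropy. A small self-contained sketch (the function name is illustrative):

```python
import math


def ideal_code_length_bits(pixels, probs):
    """Ideal entropy-coded length in bits of a binary pixel sequence,
    where probs[n] is the model's probability that pixels[n] == 1.

    An arithmetic coder driven by these probabilities achieves this
    length to within about 2 bits, so a sharper model (smaller KL
    divergence from the true distribution) directly shortens the code.
    """
    total = 0.0
    for x, p1 in zip(pixels, probs):
        p = p1 if x == 1 else 1.0 - p1  # probability assigned to the actual pixel
        total += -math.log2(p)
    return total
```

With the uniform model (every probability 0.5) this reduces to exactly one bit per pixel, so any model with genuine predictive power beats that baseline on average.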
Hence, our aim will be to develop models that accurately approximate $p$ for images. Note that the value of pixel $x_n$ is not *entirely determined* by the previous pixels, so we shouldn't expect *any* autoregressive model to achieve perfect accuracy. Nonetheless, the previous pixels carry enough mutual information with $x_n$ to achieve considerable compression.