{"id":16832384,"url":"https://github.com/yanndubs/lossyless","last_synced_at":"2025-03-17T04:32:33.563Z","repository":{"id":44649753,"uuid":"314359653","full_name":"YannDubs/lossyless","owner":"YannDubs","description":"Generic image compressor for machine learning. Pytorch code for our paper \"Lossy compression for lossless prediction\".","archived":false,"fork":false,"pushed_at":"2022-08-19T08:54:14.000Z","size":3426,"stargazers_count":117,"open_issues_count":1,"forks_count":9,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-02-27T18:01:03.135Z","etag":null,"topics":["compression","deep-learning","machine-learning","python","pytorch","self-supervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/YannDubs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-11-19T20:15:06.000Z","updated_at":"2025-01-03T07:06:33.000Z","dependencies_parsed_at":"2022-07-13T10:00:28.936Z","dependency_job_id":null,"html_url":"https://github.com/YannDubs/lossyless","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YannDubs%2Flossyless","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YannDubs%2Flossyless/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YannDubs%2Flossyless/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YannDubs%2Flossyless/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/YannDubs","download_url":"
https://codeload.github.com/YannDubs/lossyless/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243841223,"owners_count":20356446,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression","deep-learning","machine-learning","python","pytorch","self-supervised-learning"],"created_at":"2024-10-13T11:48:46.762Z","updated_at":"2025-03-17T04:32:33.079Z","avatar_url":"https://github.com/YannDubs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Lossy Compression for Lossless Prediction [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/YannDubs/lossyless/blob/main/LICENSE) [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/release/python-380/)\n\n**Using:** [![Using](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/YannDubs/lossyless/blob/main/notebooks/Hub.ipynb) \n\n**Training:** [![Training](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/YannDubs/lossyless/blob/main/notebooks/minimal_code.ipynb)\n\nThis repository contains our implementation of the paper: [**Lossy Compression for Lossless Prediction**](https://arxiv.org/abs/2106.10800). 
It formalizes and empirically investigates unsupervised training for task-specific compressors.\n\n\n## Using the compressor \n\n[![Using](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/YannDubs/lossyless/blob/main/notebooks/Hub.ipynb)\n\nIf you want to use our compressor directly, the easiest way is to load the model from torch hub, as shown in the Google Colab (or `notebooks/Hub.ipynb`) or the example below.\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eInstallation details\u003c/b\u003e\u003c/summary\u003e\n\n  ```bash\n  pip install torch torchvision tqdm numpy compressai scikit-learn git+https://github.com/openai/CLIP.git\n  ```\n\n  Using pytorch`\u003e1.7.1`: CLIP forces pytorch version `1.7.1` because it needs this version to use JIT. If you don't need JIT (no JIT by default), you can actually use more recent versions of torch and torchvision: `pip install -U torch torchvision`. Make sure to update after having installed CLIP.\n\n----------------------\n\u003c/details\u003e\n\n```python\nimport time\n\nimport torch\nfrom sklearn.svm import LinearSVC\nfrom torchvision.datasets import STL10\n\nDATA_DIR = \"data/\"\n\n# list available compressors. 
b01 compresses the most (b01 \u003e b005 \u003e b001)\ntorch.hub.list('YannDubs/lossyless:main') \n# ['clip_compressor_b001', 'clip_compressor_b005', 'clip_compressor_b01']\n\n# Load the desired compressor and transformation to apply to images (by default on GPU if available)\ncompressor, transform = torch.hub.load('YannDubs/lossyless:main','clip_compressor_b005')\n\n# Load some data to compress and apply transformation\nstl10_train = STL10(\n    DATA_DIR, download=True, split=\"train\", transform=transform\n)\nstl10_test = STL10(\n    DATA_DIR, download=True, split=\"test\", transform=transform\n)\n\n# Compress the datasets and save them to file (this requires a GPU)\n# Rate: 1506.50 bits/img | Encoding: 347.82 img/sec\ncompressor.compress_dataset(\n    stl10_train,\n    f\"{DATA_DIR}/stl10_train_Z.bin\",\n    label_file=f\"{DATA_DIR}/stl10_train_Y.npy\",\n)\ncompressor.compress_dataset(\n    stl10_test,\n    f\"{DATA_DIR}/stl10_test_Z.bin\",\n    label_file=f\"{DATA_DIR}/stl10_test_Y.npy\",\n)\n\n# Load and decompress the datasets from file (does not require a GPU)\n# Decoding: 1062.38 img/sec\nZ_train, Y_train = compressor.decompress_dataset(\n    f\"{DATA_DIR}/stl10_train_Z.bin\", label_file=f\"{DATA_DIR}/stl10_train_Y.npy\"\n)\nZ_test, Y_test = compressor.decompress_dataset(\n    f\"{DATA_DIR}/stl10_test_Z.bin\", label_file=f\"{DATA_DIR}/stl10_test_Y.npy\"\n)\n\n# Downstream STL10 evaluation. Accuracy: 98.65% | Training time: 0.5 sec\nclf = LinearSVC(C=7e-3)\nstart = time.time()\nclf.fit(Z_train, Y_train)\ndelta_time = time.time() - start\nacc = clf.score(Z_test, Y_test)\nprint(\n    f\"Downstream STL10 accuracy: {acc*100:.2f}%.  
\\t Training time: {delta_time:.1f} \"\n)\n```\n\n\n## Minimal training code\n\n[![Training](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/YannDubs/lossyless/blob/main/notebooks/minimal_code.ipynb)\n\nIf your goal is to look at a minimal version of the code to simply understand what is going on, I would highly recommend starting from `notebooks/minimal_code.ipynb` (or the Google Colab link above). This is a notebook version of the code provided in Appendix E.7 of the paper, to quickly train and evaluate our compressor. \n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eInstallation details\u003c/b\u003e\u003c/summary\u003e\n\n  1. `pip install git+https://github.com/openai/CLIP.git`\n  2. `pip uninstall -y torchtext` (probably not necessary, but it can cause issues if it was installed for the wrong pytorch version)\n  3. `pip install scikit-learn==0.24.2 lightning-bolts==0.3.4 compressai==1.1.5 pytorch-lightning==1.3.8`\n\n  Using pytorch`\u003e1.7.1`: CLIP forces pytorch version `1.7.1`, but you should be able to use more recent versions. E.g.:\n  1. `pip install git+https://github.com/openai/CLIP.git`\n  2. `pip install -U torch torchvision scikit-learn lightning-bolts compressai pytorch-lightning`\n\u003c/details\u003e\n\n## Results from the paper\n\nWe provide scripts to essentially replicate some results from the paper. The exact results will be a little different as we simplified and cleaned some of the code to help readability. All scripts can be found in `bin` and run using the command `bin/*/\u003cexperiment\u003e.sh`.\n\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eInstallation details\u003c/b\u003e\u003c/summary\u003e\n\n1. Clone repository\n2. Install [PyTorch](https://pytorch.org/) \u003e=  1.7\n3. 
`pip install -r requirements.txt`\n\n### Other installation\n- For the bare minimum packages: use `pip install -r requirements_mini.txt` instead.\n- For conda: use  `conda env update --file requirements/environment.yaml`.\n- For docker: we provide a dockerfile at `requirements/Dockerfile`.\n\n### Notes \n\n- CLIP forces pytorch version `1.7.1` because it needs this version to use JIT. We don't use JIT, so you can actually use more recent versions of torch and torchvision: `pip install -U torch torchvision`.\n- For better logging: `hydra` and `pytorch lightning` logging don't work great together. To have a better logging experience, you should comment out the following lines in `pytorch_lightning/__init__.py`:\n\n```python\nif not _root_logger.hasHandlers():\n     _logger.addHandler(logging.StreamHandler())\n     _logger.propagate = False\n```\n\n### Test installation\n\nTo test your installation and that everything works as desired you can run `bin/test.sh`, which will run an epoch of BICNE and VIC on MNIST.\n\n----------------------\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eScripts details\u003c/b\u003e\u003c/summary\u003e\n\nAll scripts can be found in `bin` and run using the command `bin/*/\u003cexperiment\u003e.sh`. This will save all results, checkpoints, logs... The most important results (including summary results and figures) will be saved at `results/exp_\u003cexperiment\u003e`. Most important are the summarized metrics `results/exp_\u003cexperiment\u003e*/summarized_metrics_merged.csv` and any figures `results/exp_\u003cexperiment\u003e*/*.png`.\n\nThe key experiments that do not require very large compute are:\n- VIC/VAE on rotation invariant Banana distribution: `bin/banana/banana_viz_VIC.sh`\n- VIC/VAE on augmentation invariant MNIST: `bin/mnist/augmnist_viz_VIC.sh`\n- CLIP experiments: `bin/clip/main_linear.sh`\n\nBy default all scripts will log results on [weights and biases](https://wandb.ai/site). 
If you have an account (or make one), you should set your username in `conf/user.yaml` after `wandb_entity:`; the password should be set directly in your environment variables. If you prefer not to log there, you can use the command `bin/*/\u003cexperiment\u003e.sh -a logger=csv`, which changes the default `wandb` logger to a `csv` logger (`-a` is for append).\n\nGenerally speaking you can change any of the parameters either directly in `conf/**/\u003cfile\u003e.yaml` or by adding `-a` to the script. We are using [Hydra](https://hydra.cc/) to manage our configurations; refer to its documentation if something is unclear.\n\nIf you are using [Slurm](https://slurm.schedmd.com/documentation.html) you can submit the script directly on servers by adding a config file under `conf/slurm/\u003cmyserver\u003e.yaml`, and then running the script as `bin/*/\u003cexperiment\u003e.sh -s \u003cmyserver\u003e`. For example configuration files for Slurm, see `conf/slurm/vector.yaml` or `conf/slurm/learnfair.yaml`. For more information check the documentation from [submitit's plugin](https://hydra.cc/docs/plugins/submitit_launcher) which we are using.\n\n\n----------------------\n\n\u003c/details\u003e\n\n\n### VIC/VAE on rotation invariant Banana\n\nCommand: \n```bash\nbin/banana/banana_viz_VIC.sh\n``` \n\nThe following figures are saved automatically at `results/exp_banana_viz_VIC/**/quantization.png`. On the left we see the quantization of the Banana distribution by a standard compressor (called `VAE` in code but VC in paper). 
On the right is the quantization by our (rotation) invariant compressor (`VIC`).\n\n\n\u003cp float=\"left\" align=\"middle\"\u003e\n  \u003cimg src=\"/results/exp_banana_viz_VIC/datafeat_banana_rot/feat_neural_feat/dist_VAE/enc_mlp_fancy/rate_H_factorized/optfeat_Adam_lr3.0e-04_w0.0e+00/schedfeat_expdecay1000/zdim_2/zs_1/beta_7.0e-02/seed_123/addfeat_None/quantization.png\" width=\"47%\" alt=\"Standard compression of Banana\" /\u003e\n  \u003cimg src=\"/results/exp_banana_viz_VIC/datafeat_banana_rot/feat_neural_feat/dist_VIC/enc_mlp_fancy/rate_H_factorized/optfeat_Adam_lr3.0e-04_w0.0e+00/schedfeat_expdecay1000/zdim_2/zs_1/beta_7.0e-02/seed_123/addfeat_None/quantization.png\" width=\"47%\"  alt=\"Invariant compression of Banana\" /\u003e \n\u003c/p\u003e\n\n### VIC/VAE on augmented MNIST\n\nCommand: \n```bash\nbin/mnist/augmnist_viz_VIC.sh\n``` \n\nThe following figure is saved automatically at `results/exp_augmnist_viz_VIC/**/rec_imgs.png`. It shows source augmented MNIST images as well as the reconstructions using our invariant compressor.\n\n![Invariant compression of augmented MNIST](/results/exp_augmnist_viz_VIC/datafeat_mnist_aug/feat_neural_rec/dist_VIC/enc_resnet18/rate_H_hyper/optfeat_AdamW_lr1.0e-03_w1.0e-05/schedfeat_expdecay100/zdim_128/zs_1/beta_1.0e-01/seed_123/addfeat_None/rec_imgs.png)\n\n\n### CLIP compressor\n\n\nCommand: \n```bash\nbin/clip/main_small.sh\n``` \n\nThe following table comes directly from the results which are automatically saved at `results/exp_clip_bottleneck_linear_eval/**/datapred_*/**/results_predictor.csv`. It shows the result of compression from our CLIP compressor on many datasets.\n\n|               | Cars196 | STL10 | Caltech101 | Food101 | PCam | Pets37 | CIFAR10 | CIFAR100 |\n|---------------|:-------:|:-----:|:----------:|:-------:|:----:|:------:|:-------:|:--------:|\n| Rate [bits]   |   1471  |  1342 |    1340    |   1266  | 1491 |  1209  |   1407  |   1413   |\n| Test Acc. 
[%] |   80.3  |  98.5 |    93.3    |   83.8  | 81.1 |  88.8  |   94.6  |   79.0   |\n\nNote: ImageNet is too large for training an SVM with scikit-learn. You need to run the MLP evaluation with `bin/clip/clip_bottleneck_mlp_eval` instead. You also have to download ImageNet manually.\n\n\n## Cite\n\nYou can read the full paper [here](https://arxiv.org/abs/2106.10800). Please cite our paper if you use our model:\n\n```bibtex\n@inproceedings{\n    dubois2021lossy,\n    title={Lossy Compression for Lossless Prediction},\n    author={Yann Dubois and Benjamin Bloem-Reddy and Karen Ullrich and Chris J. Maddison},\n    booktitle={Advances in Neural Information Processing Systems (NeurIPS)},\n    year={2021}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyanndubs%2Flossyless","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyanndubs%2Flossyless","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyanndubs%2Flossyless/lists"}