# SparseVQVAE: Sparse Dictionary based Vector Quantized Variational AutoEncoder

Experimental implementation of a sparse-dictionary based version of the VQ-VAE2 paper
(see: [Generating Diverse High-Fidelity Images with VQ-VAE-2](https://arxiv.org/abs/1906.00446)).

This repository builds on PyTorch.

## Authors

Yiftach Ginger ([iftachg](https://github.com/iftachg)), Or Perel ([orperel](https://github.com/orperel)), Roee Litman ([rlit](https://github.com/rlit))

## Introduction

VQ-VAE is a promising direction for image synthesis that is completely separate from the GAN line of work.
The main idea of this codebase is to create a generalized VQ-VAE,
by replacing the hard selection at the heart of the method with a softer selection via sparse coding.
This stems from the observation that hard selection is in essence the "sparsest code",
i.e. a single non-zero element (a one-hot vector).

In this generalized implementation, we allow the VAE to code each patch with a small set of sparse dictionary atoms,
rather than a single code as done in the original work.
We therefore build on the VQ-VAE2 paper:

1. We perform sparse dictionary learning to generate a set of atoms that best describe the data.
2. During training, new images are encoded per patch, where each patch is encoded by a small set of atoms.
3. We then decode the image back from sparse codes to pixel space using a learned decoder.

At inference time, images may be compressed by employing both the encoder & decoder.
Alternatively, new images can be synthesized by randomizing sparse codes and employing only the decoder.

We summarize the main contributions of this repository as follows:

1. Sparse dictionary over PyTorch:
    - The sparse dictionary is learned via [Task-Driven Dictionary Learning][1], implemented to be compatible with PyTorch's auto-differentiation.
    - Fast parallel implementations of the [FISTA][2] and [OMP][3] sparse-coding algorithms.
2. A complete sparse-dictionary empowered VQ-VAE2 implementation, including training & evaluation code.

[1]: https://arxiv.org/abs/1009.5358
[2]: https://people.rennes.inria.fr/Cedric.Herzet/Cedric.Herzet/Sparse_Seminar/Entrees/2012/11/12_A_Fast_Iterative_Shrinkage-Thresholding_Algorithmfor_Linear_Inverse_Problems_(A._Beck,_M._Teboulle)_files/Breck_2009.pdf
[3]: http://www.cs.technion.ac.il/~ronrubin/Publications/KSVD-OMP-v2.pdf

## Dictionary Learning

The sparse coding problem involves integer programming over a non-convex L0 norm, and is therefore NP-hard.
In practice, the solution is approximated using pursuit algorithms, where the atoms "compete" over which get to describe the input signal.
Generally speaking, there are two flavours of pursuit algorithms: greedy and convex-relaxation.
We provide one example from each family.

#### OMP

This method approximates the exact L0-norm solution in a greedy manner, at each iteration selecting the atom with the smallest (angular) residual w.r.t. the current code.
The benefit here is a guaranteed number of `K` non-zero elements after `K` iterations.
On the other hand, the greedy selection process makes the method less suitable for differentiable programming (a.k.a. back-prop).
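To make the greedy loop concrete, here is a minimal single-sample OMP sketch in PyTorch. It is a sketch under stated assumptions: the dictionary `D` has unit-norm atom columns, and the function name and signature are illustrative rather than the interface of this repository's `OMPFunction`.

```python
import torch

def omp(D: torch.Tensor, x: torch.Tensor, K: int) -> torch.Tensor:
    """Greedy OMP sketch. D: (dim, n_atoms) with unit-norm columns, x: (dim,).
    Returns a code z with exactly K non-zero entries (assumes K >= 1)."""
    residual = x.clone()
    support = []  # indices of selected atoms
    z = torch.zeros(D.shape[1], device=D.device, dtype=D.dtype)
    for _ in range(K):
        # Greedy step: pick the atom most correlated with the residual
        # (for unit-norm atoms, this is the smallest angular residual).
        correlations = (D.T @ residual).abs()
        if support:
            correlations[support] = -1.0  # never reselect a chosen atom
        support.append(int(correlations.argmax()))
        # Refit all coefficients on the current support by least squares,
        # then update the residual.
        D_s = D[:, support]
        coeffs = torch.linalg.lstsq(D_s, x.unsqueeze(-1)).solution.squeeze(-1)
        residual = x - D_s @ coeffs
    z[support] = coeffs
    return z
```

Because `argmax` is a hard, non-differentiable choice, gradients cannot flow through the selection itself, which is the drawback noted above.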
#### FISTA

Here, the L0 norm is relaxed to its nearest convex counterpart, the L1 norm, which is treated as an additive penalty.
The resulting LASSO problem is convex, and several efficient solvers exist for it.
The iterative nature of this method allows unrolling its structure and approximating it with a neural net (see [LISTA](http://yann.lecun.com/exdb/publis/pdf/gregor-icml-10.pdf)).
The drawback here is that the resulting code can have an arbitrary number of non-zero elements after a fixed number of iterations.
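For symmetry, here is a minimal FISTA sketch solving the LASSO relaxation, under the same illustrative assumptions (the names and signature are ours, not those of the repository's `FISTAFunction`):

```python
import torch

def soft_threshold(v: torch.Tensor, thr: float) -> torch.Tensor:
    # Proximal operator of the L1 penalty.
    return torch.sign(v) * torch.clamp(v.abs() - thr, min=0.0)

def fista(D: torch.Tensor, x: torch.Tensor, alpha: float, n_iter: int = 50) -> torch.Tensor:
    """Approximately solves min_z 0.5 * ||x - D z||^2 + alpha * ||z||_1.
    D: (dim, n_atoms), x: (dim,). Returns a (generally) sparse code z."""
    L = float(torch.linalg.matrix_norm(D, ord=2)) ** 2  # Lipschitz constant of the smooth part
    z = torch.zeros(D.shape[1], device=D.device, dtype=D.dtype)
    y, t = z.clone(), 1.0
    for _ in range(n_iter):
        grad = D.T @ (D @ y - x)  # gradient of 0.5 * ||x - D y||^2
        z_next = soft_threshold(y - grad / L, alpha / L)  # proximal step
        t_next = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0  # Nesterov momentum
        y = z_next + ((t - 1.0) / t_next) * (z_next - z)
        z, t = z_next, t_next
    return z
```

Every operation here is differentiable almost everywhere, which is what makes unrolled variants like LISTA possible; note that the number of non-zeros in the result is controlled only indirectly, through `alpha`.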
#### Task-Driven Dictionary Learning

Without going into too much detail, this paper proposes a way to calculate the derivative of the sparse coding problem with respect to the dictionary.
This opens the way for a bi-level optimization procedure, where we optimize the result of an optimization process.
Using this method we can create a dictionary optimized for any task, specifically the one our VQ-VAE is meant to solve.
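To illustrate how a differentiable sparse-coding step can replace the hard codebook lookup of a VQ-VAE quantizer, here is a hedged sketch built on the `fista()` function above. All names and shapes are assumptions rather than this repository's `quantizers` API, and for simplicity it differentiates through the unrolled FISTA iterations instead of using the paper's closed-form task-driven gradient.

```python
import torch

class SparseQuantizer(torch.nn.Module):
    """Illustrative quantizer: encodes each spatial position of the latent map
    as a sparse combination of learned dictionary atoms."""

    def __init__(self, dim: int, n_atoms: int, alpha: float = 0.1):
        super().__init__()
        # The dictionary is a plain learnable parameter, so task gradients
        # flowing through the sparse codes also update the atoms.
        self.dictionary = torch.nn.Parameter(torch.randn(dim, n_atoms))
        self.alpha = alpha

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        """latents: (batch, dim, h, w) -> quantized latents of the same shape."""
        b, c, h, w = latents.shape
        # Keep atoms unit-norm, a common dictionary-learning convention.
        D = self.dictionary / self.dictionary.norm(dim=0, keepdim=True)
        positions = latents.permute(0, 2, 3, 1).reshape(-1, c)
        # One sparse code per position; a Python loop for clarity only,
        # whereas this repository motivates fast batched implementations.
        codes = torch.stack([fista(D, x, self.alpha) for x in positions])
        recon = codes @ D.T  # reconstruct each position as D @ z
        return recon.reshape(b, h, w, c).permute(0, 3, 1, 2)
```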
## Applications

#### Compression

Images are compressed by encoding them into sparse codes and decoding them back to pixel space, employing both the encoder and decoder.

#### Synthesis

New images are synthesized by sampling sparse codes (e.g. with PixelSnail) and employing only the decoder.

## Limitations

This repository contains research materials of an unpublished work.
Training and inference code based on FISTA and OMP over PyTorch is fully functional for compression use cases.
PixelSnail synthesis functionality is partially supported.

## Installation

The code assumes a Linux environment (tested on Ubuntu 16).

### Prerequisites

* python >= 3.6
* pytorch >= 1.4
* cuda 10 or higher (recommended)

### Environment Setup

After cloning this repository:

> cd Sparse_VAE

> pip install -r requirements.txt

## Project Structure

FISTAFunction and OMPFunction are fast GPU implementations of both algorithms over PyTorch.
Practitioners are welcome to incorporate these functions into their repositories under the license terms of this repository.

The rest of the project structure can be briefly described as follows:

* checkpoint/
    * Models are saved here for both the vanilla and PixelSnail stages, along with the args used to generate them.
* models/
    * fista_pixelsnail - the implementation of the modified PixelSnail based on FISTA.
    * model_utils - functions for generating VQVAE objects and loading datasets (CIFAR, ImageNet, etc.). All files are downloaded relative to the project path.
    * pixelsnail - the original PixelSnail model. fista_pixelsnail overrides this model and adds additional heads. The vanilla model instantiates this model twice (top and bottom).
    * quantizers - contains the modules that generate quantized codes: FISTA, OMP, and vanilla VQVAE quantization.
    * vqvae - composed of Encoder / Decoder. Of interest here: the stride can be changed to achieve different effects (0 for decompression; 1 for vanilla VQVAE; 2 for compression). The stride should be changed for both Encoder and Decoder.
* scripts/
    * calculate_jpg_psnr - a standalone script; accepts a dataset (hardcoded to CIFAR) and runs JPEG compression at multiple quality levels, outputting the PSNR.
    * calculate_model_psnr - similar to the above, but receives a model as input and prints its compression PSNR. Note that PSNR is calculated manually here. FISTA converges for multiple images at the same time, so the slowest image in the batch determines the bottleneck speed; running with batch size 1 is faster and more accurate.
    * extract_dataset_unlearned_encodings - can be skipped (was used for experiments on alpha).
    * graph_psnr - takes the PSNR tables created above and generates plots.
    * hyperparameter_alpha_search - studies the convergence of alpha relative to the number of non-zeros, computed once for random data and once for the encodings from the script above. Most likely this script should not be touched.
    * visualize_encodings - a visualization script: takes a model and a dataset, runs the model over the dataset, and saves the output images to verify the model is still valid.
* utils/
    * pyfista - FISTA is implemented here, both dictionary learning and sparse coding.
    * pyfista_test - generates fake data to train sparse coding. Hyperparameter search is no longer performed, so this file has no additional uses.
    * pyomp - holds the forward implementation of OMP for a single sample at a time (TODO: implement batch OMP if needed).
    * util_funcs - many helper functions live here: argument parsers, seeding, and experiment setup (general things like assigning an experiment name).
* dataset - definitions for all supported datasets; there is nothing to configure here.
* extract_code - entry point for code extraction (the 2nd step in training).
* mt_sample - multi-threaded sampling. Currently broken.
* sample - receives a PixelSnail and starts generating images.
* scheduler - scheduling definitions: number of schedulers, when to save a checkpoint file, etc.
* train_fista_pixelsnail / train_pixelsnail / train_vqvae - all supported neural-net trainers.

## Usage

1. Training the Sparse-VQVAE encoder-decoder:

Currently this codebase supports CIFAR10, CIFAR100, and ImageNet.

* Train with the original VQVAE:
```
train_vqvae.py --experiment_name="experiment_vq" --selection_fn=vanilla
```

* Train with FISTA sparse-coding:
```
train_vqvae.py --experiment_name="experiment_fista" --selection_fn=fista
```

* Train with OMP sparse-coding:
```
train_vqvae.py --experiment_name="experiment_omp" --selection_fn=omp --num_strides=2
```

For synthesis, additional steps are required:

2. Extract codes for stage 2 training:

> python extract_code.py --ckpt checkpoint/[VQ-VAE CHECKPOINT] --name [LMDB NAME] [DATASET PATH]

3. Train stage 2 (PixelSNAIL):

> python train_pixelsnail.py [LMDB NAME]
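As a reminder of the decoder-only synthesis path mentioned in the introduction, here is a short hedged sketch of randomizing sparse codes and handing them to a trained decoder. The shapes, sparsity level, and the `decoder` module are all illustrative assumptions, not this repository's API.

```python
import torch

# Hypothetical latent-grid and dictionary sizes (illustrative only).
n_atoms, h, w, k = 512, 8, 8, 4

# Build a random sparse code: k random atoms with random coefficients
# at every latent position.
codes = torch.zeros(1, n_atoms, h, w)
idx = torch.randint(0, n_atoms, (1, k, h, w))
codes.scatter_(1, idx, torch.randn(1, k, h, w))

# A trained decoder would then map the codes back to pixel space, e.g.:
# image = decoder(codes)  # `decoder`: assumed trained Decoder module
```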