{"id":24098471,"url":"https://github.com/matlab-deep-learning/quantization-aware-training","last_synced_at":"2025-02-27T15:51:08.505Z","repository":{"id":118345455,"uuid":"607299334","full_name":"matlab-deep-learning/quantization-aware-training","owner":"matlab-deep-learning","description":"This example shows how to perform quantization aware training for transfer learned MobileNet-v2 network.","archived":false,"fork":false,"pushed_at":"2023-12-19T22:21:47.000Z","size":3881,"stargazers_count":10,"open_issues_count":0,"forks_count":1,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-01-10T14:45:48.698Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/matlab-deep-learning.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null}},"created_at":"2023-02-27T18:03:24.000Z","updated_at":"2024-11-10T11:15:01.000Z","dependencies_parsed_at":null,"dependency_job_id":"38bbac11-9da3-42d6-a818-2f5919a3103d","html_url":"https://github.com/matlab-deep-learning/quantization-aware-training","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matlab-deep-learning%2Fquantization-aware-training","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matlab-deep-learning%2Fquantization-aware-training/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matlab-deep-learning%2Fquantization-aware-training/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/matlab-deep-learning%2Fquantization-aware-training/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/matlab-deep-learning","download_url":"https://codeload.github.com/matlab-deep-learning/quantization-aware-training/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241031450,"owners_count":19897293,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-10T14:46:11.770Z","updated_at":"2025-02-27T15:51:08.470Z","avatar_url":"https://github.com/matlab-deep-learning.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Quantization Aware Training with MobileNet-v2\n\n[![Open in MATLAB Online](https://www.mathworks.com/images/responsive/global/open-in-matlab-online.svg)](https://matlab.mathworks.com/open/github/v1?repo=matlab-deep-learning/quantization-aware-training\u0026file=QuantizationAwareTrainingWithMobilenetv2.mlx)\n[![View Quantization Aware Training with MobileNet-v2 on File Exchange](https://www.mathworks.com/matlabcentral/images/matlab-file-exchange.svg)](https://www.mathworks.com/matlabcentral/fileexchange/125420-quantization-aware-training-with-mobilenet-v2)\n\nThis example shows how to perform quantization aware training as a way to prepare a network for quantization. Quantization aware training is a method that can help recover accuracy lost due to quantizing a network to use 8-bit scaled integer weights and biases. 
Networks like MobileNet-v2 are especially sensitive to quantization because the values in the weight tensors of their convolution and grouped convolution layers vary widely in range.\n\nThis example shows how preparing a network with quantization aware training can produce a quantized network whose accuracy is on par with that of the original unquantized network. The values in this table may differ slightly from the results you obtain when you run the example.\n\n| Network      | Accuracy |\n| ----------- | ----------- |\n| Original network      | **0.9101**       |\n| `int8` network via post-training quantization   | 0.2452        |\n| `int8` network via quantization aware training   | **0.8937**        |\n\n## **Running the Example**\n\nOpen and run the live script `QuantizationAwareTrainingWithMobilenetv2.mlx`.\n\nAdditional files:\n\n- `QuantizedConvolutionBatchNormTrainingLayer`: Custom layer that implements a quantization aware fused convolution and batch normalization layer.\n- `QuantizedConvolutionTrainingLayer`: Custom layer that is not used in this example but can be applied to networks whose convolution layers are not followed by batch normalization.\n- `IdentityTrainingLayer`: No-op layer that acts as a placeholder for batch normalization layers.\n- `quantizeToFloat`: Function that quantizes values and returns them in a floating-point representation.\n- `bypassdlgradients`: Function that performs straight-through estimation for a given operation. The source of this function is obfuscated because it uses internal packages.\n- `foldBatchNormalizationParameters`: Function that calculates the adjusted weights and bias for a `dlnetwork` that contains a convolution layer followed by a batch normalization layer. 
The source of this function is obfuscated because it uses internal packages.\n- `CustomStraightThroughEstimator`: Helper class used by `bypassdlgradients`; do not use it directly.\n\n### Requirements\n\n- [MATLAB\u0026reg;](https://www.mathworks.com/products/matlab.html) version R2022b or later\n- [Deep Learning Toolbox\u0026trade;](https://www.mathworks.com/products/deep-learning.html)\n- [Deep Learning Toolbox Model Quantization Library](https://www.mathworks.com/matlabcentral/fileexchange/74614-deep-learning-toolbox-model-quantization-library)\n\n## About Quantization Aware Training\n\nThis example follows these steps of a quantization workflow:\n\n- Replace the quantizable layers in a floating-point network with quantization aware training layers.\n- Train the network with the quantization aware training layers until it converges.\n- Replace the quantization aware training layers with the original layers, keeping the updated learnables, which are now more robust to quantization.\n- Perform post-training quantization on this network to produce a quantized `int8` network.\n\n![Quantization Aware Workflow Steps](./images/qat_workflow.png)\n\nDuring training, the quantization aware convolution layers quantize the weights and activations of the layer at each forward pass. The `quantizeToFloat` function quantizes the values and returns them in a floating-point representation of type `single`. This operation is akin to quantizing a value to an integer and then immediately rescaling it back to its real-world representation.\n\nFor example, for the input value `365.247`, `quantizeToFloat` calculates a scaling factor of `4` and scales the value to the integer representation `91` (`365.247 / 4 ≈ 91.31`, which rounds to `91`). 
The integer `91` is then multiplied by the same scaling factor, giving `364` and introducing an error of `-1.247` (`364 - 365.247`).\n\n$$\n\\begin{aligned}\n\\hat{x} \u0026= \\mathrm{quantizeToFloat}\\left(x\\right) \\\\\n\u0026= \\mathrm{unquantize}\\left(\\mathrm{quantize}\\left(x\\right)\\right) \\\\\n\u0026= \\mathrm{scale}\\cdot \\mathrm{saturate}\\left(\\mathrm{round}\\left(\\frac{x}{\\mathrm{scale}}\\right)\\right)\n\\end{aligned}\n$$\n\nThe quantization step uses the non-differentiable operation `round`. Its gradient is zero almost everywhere, so it would normally break the training workflow by zeroing out the gradients. During quantization aware training, the gradient calculation for non-differentiable operations is bypassed by treating them as an identity function in the backward pass. The diagram below \\[2\\] shows how the custom layer uses straight-through estimation to compute the gradients of non-differentiable operations with the identity function.\n\n![Straight Through Estimation](./images/ste.png)\n\nAfter training, the network returned by the `trainNetwork` function still contains the quantization aware training layers. Replace the quantization aware training operators with inference-specific operators. Whereas the training graph operates on pseudo-quantized 32-bit floating-point values, the inference graph applies the convolution to `int8` inputs and weights.\n\n| Convolution Operation Graph at Training   | Convolution Operation Graph at Inference |\n| ----------- | ----------- |\n| ![Quantized operators during training](./images/quantized_training.png)   | ![Quantized operators during inference](./images/quantized_inference.png)|\n\n## **References**\n\n1. The TensorFlow Team. Flowers. Retrieved from [http://download.tensorflow.org/example_images/flower_photos.tgz](http://download.tensorflow.org/example_images/flower_photos.tgz)\n2. Gholami, A., Kim, S., Dong, Z., Mahoney, M., \u0026 Keutzer, K. (2021). A Survey of Quantization Methods for Efficient Neural Network Inference. 
Retrieved from [https://arxiv.org/abs/2103.13630](https://arxiv.org/abs/2103.13630)\n3. Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H., \u0026 Kalenichenko, D. (2017). Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. Retrieved from [https://arxiv.org/abs/1712.05877](https://arxiv.org/abs/1712.05877)\n\nCopyright 2023 The MathWorks, Inc.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatlab-deep-learning%2Fquantization-aware-training","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmatlab-deep-learning%2Fquantization-aware-training","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatlab-deep-learning%2Fquantization-aware-training/lists"}