https://github.com/es7/quantization-in-ml
Quantizing LLMs to use their power efficiently
artificial-intelligence generative llms quantization
Last synced: 11 months ago
- Host: GitHub
- URL: https://github.com/es7/quantization-in-ml
- Owner: ES7
- License: mit
- Created: 2024-04-29T07:56:44.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-16T18:37:24.000Z (about 2 years ago)
- Last Synced: 2025-01-11T08:17:17.916Z (over 1 year ago)
- Topics: artificial-intelligence, generative, llms, quantization
- Language: Jupyter Notebook
- Homepage:
- Size: 4.51 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Quantization-in-ML
Large generative AI models such as LLMs are often too big to run on consumer-grade hardware. Quantization has emerged as a key tool for making this possible: it stores a model's parameters in lower-precision data types, shrinking the model enough that anyone can run it on their own computer with little to no loss in model performance. As models keep getting bigger, this has made quantization an exciting topic for the AI community. A practical question that follows is how to decide between data types, for example int8 versus float16, when compressing a model.
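To make the int8 versus float16 trade-off concrete, the weight memory of a model can be roughly estimated as parameter count times bytes per parameter. The sketch below is only an illustration: the helper name `weight_memory_gb` and the 7-billion-parameter example are assumptions, not taken from the notebooks, and the estimate ignores activations and any runtime overhead.

```python
# Bytes needed to store a single parameter in each data type.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Hypothetical 7-billion-parameter model, used purely as an example size.
for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gb(7_000_000_000, dtype):.1f} GB")
```

Halving the bytes per parameter halves the weight memory, which is why moving from float32 to float16 or int8 can bring a model within reach of consumer hardware.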
**1. `Data_Types_&_Sizes.ipynb`:** This notebook explains the significance of different data types and how to convert parameters from one data type to another.
**2. `Loading_Models_in_different_dtypes.ipynb`:** This notebook explains how to load models in different data types to save memory, and how doing so affects the model's performance.
**3. `Quantization_Theory.ipynb`:** This notebook explains how to apply linear quantization to any model and the maths behind it.
**4. `Linear_Quantization.ipynb`:** This notebook explains linear quantization in depth.
**5. `Scale_&_Zero_Point.ipynb`:** This notebook explains how to compute the scale and the zero point.
**6. `Symmetric_VS_Asymmetric.ipynb`:** This notebook explains the two modes of linear quantization: symmetric and asymmetric.
**7. `Per_Channel_&_Per_Group_Quantization.ipynb`:** This notebook explains how to perform per-channel and per-group quantization, and how to quantize both weights and activations.
**8. `Build_Custom_8-Bit_Quantizer.ipynb`:** This notebook explains how to build a custom quantizer pipeline that can quantize any model to 8-bit precision using per-channel quantization.
**9. `Quantize_HF_Models.ipynb`:** This notebook explains how to quantize any open-source PyTorch-based model.
**10. `Load_Quantized_Weights_from_HF_Hub.ipynb`:** This notebook explains how to load an already quantized model directly from the Hugging Face Hub.
**11. `Packing_Unpacking_Weights.ipynb`:** This notebook explains how to pack and unpack weights, which is useful when a parameter can be stored in 2-bit precision but PyTorch has no data type for it.
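As a rough illustration of the linear quantization covered in notebooks 3 to 6, the sketch below maps real values to int8 with a scale and zero point (asymmetric mode) or with a scale alone (symmetric mode). It is written in plain Python so it runs without PyTorch. The function names are hypothetical and the code is not taken from the notebooks, which may differ in detail.

```python
def get_scale_and_zero_point(values, q_min=-128, q_max=127):
    """Asymmetric mode: map [min(values), max(values)] onto [q_min, q_max]."""
    r_min, r_max = min(values), max(values)
    if r_max == r_min:
        raise ValueError("values must span a non-zero range")
    scale = (r_max - r_min) / (q_max - q_min)
    # The zero point is the integer that represents the real value 0.0.
    zero_point = int(round(q_min - r_min / scale))
    zero_point = max(q_min, min(q_max, zero_point))
    return scale, zero_point


def get_symmetric_scale(values, q_max=127):
    """Symmetric mode: the range is centred on 0, so the zero point is 0."""
    r_max = max(abs(v) for v in values)
    if r_max == 0:
        raise ValueError("values must not all be zero")
    return r_max / q_max


def quantize(values, scale, zero_point=0, q_min=-128, q_max=127):
    """q = clamp(round(r / scale) + zero_point, q_min, q_max)."""
    return [
        max(q_min, min(q_max, int(round(v / scale)) + zero_point))
        for v in values
    ]


def dequantize(q_values, scale, zero_point=0):
    """r is approximately scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


weights = [-1.0, 0.0, 0.5, 2.0]  # made-up example values

scale, zero_point = get_scale_and_zero_point(weights)
q = quantize(weights, scale, zero_point)
restored = dequantize(q, scale, zero_point)
print("asymmetric:", q, [round(r, 4) for r in restored])

sym_scale = get_symmetric_scale(weights)
q_sym = quantize(weights, sym_scale, q_min=-127, q_max=127)
print("symmetric: ", q_sym)
```

In asymmetric mode the full integer range is used even when the values are not centred on zero, at the cost of storing a zero point. Symmetric mode drops the zero point but can waste part of the range when the values are skewed.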
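The packing idea from notebook 11 can be sketched as follows: since no 2-bit data type is available, four 2-bit values are stored together in one 8-bit integer. This is a minimal plain-Python sketch under stated assumptions. The names `pack_2bit` and `unpack_2bit` are hypothetical, and placing the first value in the lowest bits is an arbitrary choice; the notebook may order the bits differently.

```python
def pack_2bit(values):
    """Pack integers in [0, 3] into bytes, four values per byte.

    Assumption: the first value of each group goes into the lowest two bits.
    """
    if len(values) % 4 != 0:
        raise ValueError("number of values must be a multiple of 4")
    packed = []
    for i in range(0, len(values), 4):
        byte = 0
        for j, v in enumerate(values[i:i + 4]):
            if not 0 <= v <= 3:
                raise ValueError(f"value {v} does not fit in 2 bits")
            byte |= v << (2 * j)  # shift each value into its 2-bit slot
        packed.append(byte)
    return packed


def unpack_2bit(packed):
    """Recover the original 2-bit values from the packed bytes."""
    values = []
    for byte in packed:
        for j in range(4):
            values.append((byte >> (2 * j)) & 0b11)  # mask out one slot
    return values


original = [1, 0, 3, 2, 3, 3, 0, 1]
packed = pack_2bit(original)
print(packed)               # two bytes hold eight 2-bit values
print(unpack_2bit(packed))  # round-trips back to the original list
```

Stored this way, the eight values take two bytes instead of eight, a fourfold saving over keeping each value in its own 8-bit integer.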