https://github.com/TimDettmers/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
- Host: GitHub
- URL: https://github.com/TimDettmers/bitsandbytes
- Owner: bitsandbytes-foundation
- License: mit
- Created: 2021-06-04T00:10:34.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2025-03-14T21:21:11.000Z (about 1 month ago)
- Last Synced: 2025-03-16T23:48:49.176Z (about 1 month ago)
- Topics: llm, machine-learning, pytorch, qlora, quantization
- Language: Python
- Homepage: https://huggingface.co/docs/bitsandbytes/main/en/index
- Size: 2.73 MB
- Stars: 6,806
- Watchers: 51
- Forks: 675
- Open Issues: 196
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- Awesome-LLM-Compression
README
# `bitsandbytes`
[Downloads](https://pepy.tech/project/bitsandbytes)
The `bitsandbytes` library is a lightweight Python wrapper around CUDA custom functions, in particular 8-bit optimizers, matrix multiplication (LLM.int8()), and 8-bit & 4-bit quantization functions.
The library includes quantization primitives for 8-bit & 4-bit operations through `bitsandbytes.nn.Linear8bitLt` and `bitsandbytes.nn.Linear4bit`, and 8-bit optimizers through the `bitsandbytes.optim` module.
There are ongoing efforts to support additional hardware backends, e.g. Intel CPU + GPU, AMD GPU, and Apple Silicon, and hopefully NPUs.
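To illustrate the core idea behind 8-bit quantization, here is a minimal NumPy sketch of absmax quantization (scale by `127 / max|x|`, round to int8). This is an educational toy, not the library's actual CUDA kernels, which add refinements such as block-wise scaling and outlier handling; the function names are our own.

```python
import numpy as np

def absmax_quantize(x: np.ndarray):
    """Quantize a float32 array to int8 via absmax scaling.

    Illustrative sketch only -- bitsandbytes' real kernels are
    block-wise CUDA implementations with outlier handling.
    """
    scale = 127.0 / np.max(np.abs(x))      # map largest magnitude to 127
    q = np.round(x * scale).astype(np.int8)
    return q, scale

def absmax_dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float32 values."""
    return q.astype(np.float32) / scale

if __name__ == "__main__":
    x = np.random.randn(4, 8).astype(np.float32)
    q, scale = absmax_quantize(x)
    x_hat = absmax_dequantize(q, scale)
    # Rounding error is bounded by half a quantization step:
    print(np.max(np.abs(x - x_hat)) <= 0.5 / scale)
```

Storing weights as int8 plus a single float scale is what cuts memory roughly 4x versus float32, at the cost of the small rounding error shown above.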
**Please head to the official documentation page:**
**[https://huggingface.co/docs/bitsandbytes/main](https://huggingface.co/docs/bitsandbytes/main)**
## License
`bitsandbytes` is MIT licensed.
We thank Fabio Cannizzo for his work on [FastBinarySearch](https://github.com/fabiocannizzo/FastBinarySearch) which we use for CPU quantization.