{"id":13789284,"url":"https://github.com/GreenBitAI/bitorch-engine","last_synced_at":"2025-05-12T05:32:09.038Z","repository":{"id":235254105,"uuid":"774980948","full_name":"GreenBitAI/bitorch-engine","owner":"GreenBitAI","description":"A toolkit enhances PyTorch with specialized functions for low-bit quantized neural networks.","archived":false,"fork":false,"pushed_at":"2024-06-25T16:50:22.000Z","size":4182,"stargazers_count":28,"open_issues_count":1,"forks_count":5,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-11-18T03:36:55.476Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://greenbitai.github.io/bitorch-engine/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GreenBitAI.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-20T14:51:16.000Z","updated_at":"2024-10-26T13:44:06.000Z","dependencies_parsed_at":"2024-04-22T20:57:28.878Z","dependency_job_id":"3fc942d8-c704-4802-844d-5a1f8ba23cb7","html_url":"https://github.com/GreenBitAI/bitorch-engine","commit_stats":null,"previous_names":["greenbitai/bitorch-engine"],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GreenBitAI%2Fbitorch-engine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GreenBitAI%2Fbitorch-engine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GreenBitAI%2Fbitorch-engine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GreenBitAI%2Fbitorch-engine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GreenBitAI","download_url":"https://codeload.github.com/GreenBitAI/bitorch-engine/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253682539,"owners_count":21946957,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T21:01:01.088Z","updated_at":"2025-05-12T05:32:07.735Z","avatar_url":"https://github.com/GreenBitAI.png","language":"Python","readme":"# BITorch Engine (BIE)\n\nBitorch Engine is a cutting-edge computation library for neural networks that enhances PyTorch by integrating specialized\nlayers and functions tailored for **Low-Bit** quantized neural network operations.\nIt harnesses the robust capabilities of high-performance computing platforms, including GPUs and CPUs,\nand is designed with future adaptability in mind to extend support to emerging NPU hardware technologies.\n\n## More about BIE\n\nBitorch Engine offers a suite of optimized neural network components that are designed to leverage the full power of 
modern GPUs.\nThis includes custom CUDA kernels, quantization-aware training mechanisms, and a variety of layer types\nthat are specifically crafted to reduce computational overhead while maintaining high precision and accuracy in deep learning models.\n\nBuilding on these foundational strengths, Bitorch Engine has been employed in pioneering projects that\npush the boundaries of neural network training and inference.\nFor instance:\n\n- [green-bit-llm-trainer](https://github.com/GreenBitAI/green-bit-llm/tree/main/green_bit_llm/sft): In this project, BIE represents a significant leap in the field of Large Language Model (LLM) fine-tuning. Unlike traditional approaches that either quantize a fully trained model or introduce a few additional trainable parameters for [LoRA](https://github.com/microsoft/LoRA)-style fine-tuning, this project innovates by directly fine-tuning the quantized parameters of LLMs. This paradigm shift allows for the full-scale quantization fine-tuning of LLMs, ensuring that the training process tightly integrates with the quantization schema from the outset.\n- [green-bit-llm-inference](https://github.com/GreenBitAI/green-bit-llm/tree/main/green_bit_llm/inference) showcases BIE's adeptness at supporting inference for models quantized from 4 to 2 bits without any significant loss in accuracy compared to the original 32- or 16-bit models. It stands as a testament to BIE's capability to maintain the delicate balance between model size, computational efficiency, and accuracy, addressing one of the key challenges in deploying sophisticated neural networks in resource-constrained environments.\n\nThese projects exemplify the practical applications of Bitorch Engine and underscore its flexibility and efficiency for modern AI research and development.\nHowever, keep in mind that BIE is still in an early beta stage; see our roadmap below. 
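\n\nTo make the quantization-aware training idea above more concrete, here is a minimal plain-PyTorch sketch (not bitorch-engine's actual kernels or API; the layer name, bit-width and scaling scheme are illustrative assumptions only): weights are rounded to a low-bit grid in the forward pass while gradients flow to full-precision shadow weights via a straight-through estimator.\n\n```python\nimport torch\nimport torch.nn as nn\n\n\nclass FakeQuantLinear(nn.Module):\n    # Illustrative low-bit quantization-aware linear layer.\n    # This is NOT the bitorch-engine implementation or API.\n    def __init__(self, in_features: int, out_features: int, bits: int = 2):\n        super().__init__()\n        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)\n        self.bits = bits\n\n    def forward(self, x: torch.Tensor) -> torch.Tensor:\n        # symmetric low-bit grid, e.g. 2 bits -> integer levels {-2, -1, 0, 1}\n        qmax = 2 ** (self.bits - 1) - 1\n        scale = self.weight.abs().mean().clamp(min=1e-8)\n        w_q = torch.clamp(torch.round(self.weight / scale), -qmax - 1, qmax) * scale\n        # straight-through estimator: quantized weights forward, full-precision gradients backward\n        w_ste = self.weight + (w_q - self.weight).detach()\n        return x @ w_ste.t()\n\n\nlayer = FakeQuantLinear(16, 8, bits=2)\nout = layer(torch.randn(4, 16))\nout.sum().backward()  # gradients flow to the full-precision shadow weights\n```\nIn BIE itself, such emulation is replaced by dedicated low-bit CUDA/CPU kernels and packed weight storage.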
\n\n## Roadmap\n\nOur future goals for BITorch Engine are (not necessarily in this order):\n\n- Add support for (Distributed) Data Parallel training strategies (for selected layers)\n- Provide better support for Metal kernels\n- Improve our existing code, so it becomes even faster, more memory-efficient and easier to use\n- Provide binary pip releases which include the built extensions\n\nWe are planning to release new features and improvements as they become available,\nbut this also means breaking changes can occur in the API during our beta stage.\n\n## Installation\n\nThe requirements are:\n\n- A compiler that fully supports C++17, such as clang or gcc (gcc 9.4.0 or newer is required, but gcc 12.x is not supported yet)\n- Python 3.9 or later\n- PyTorch 1.8 or later\n\nPlease check your operating system's options for the C++ compiler.\nFor more detailed information, you can check the [requirements to build PyTorch from source](https://github.com/pytorch/pytorch?tab=readme-ov-file#prerequisites).\nIn addition, to accelerate layers on specific hardware (such as CUDA devices or MacOS M1/2/3 chips), we recommend installing:\n\n- CUDA Toolkit 11.8 or 12.1 for CUDA-accelerated layers\n- **[MLX](https://github.com/ml-explore/mlx)** for mlx-based layers on MacOS\n- **[CUTLASS](https://github.com/NVIDIA/cutlass)** for cutlass-based layers\n\n### Binary Release\n\n**A first experimental binary release for Linux with CUDA 12.1 is ready.**\nIt only supports GPUs with CUDA compute capability 8.6 or higher ([check here](https://developer.nvidia.com/cuda-gpus)).\nFor MacOS or lower compute capability, build the package from source (additional binary release options are planned in the future).\nWe recommend creating a conda environment to manage the installed CUDA version and other packages:\n\n1. Create an environment for Python 3.10 and activate it:\n```bash\nconda create -y --name bitorch-engine python=3.10\nconda activate bitorch-engine\n```\n\nAs an alternative, you can also store the environment in a relative path.\n\n\u003cdetails\u003e\u003csummary\u003eClick here to expand the instructions for this.\u003c/summary\u003e\n\n```bash\nexport BITORCH_WORKSPACE=\"${HOME}/bitorch-workspace\"\nmkdir -p \"${BITORCH_WORKSPACE}\" \u0026\u0026 cd \"${BITORCH_WORKSPACE}\"\nconda create -y --prefix ./conda-env python=3.10\nconda activate ./conda-env\n```\n\n\u003c/details\u003e\n\n2. Install CUDA (if it is not installed already on the system):\n```bash\nconda install -y -c \"nvidia/label/cuda-12.1.0\" cuda-toolkit\n```\n3. Install our customized torch, which allows gradients on INT tensors, with pip (this URL is for CUDA 12.1\nand Python 3.10 - you can find other versions [here](https://packages.greenbit.ai/whl/)) together with bitorch engine:\n```bash\npip install \\\n  \"https://packages.greenbit.ai/whl/cu121/torch/torch-2.3.0-cp310-cp310-linux_x86_64.whl\" \\\n  \"https://packages.greenbit.ai/whl/cu121/bitorch-engine/bitorch_engine-0.2.6-cp310-cp310-linux_x86_64.whl\"\n```\n\n### Build From Source\n\nWe provide instructions for the following options:\n\n- [Conda + Linux](#conda-on-linux-with-cuda) (with CUDA and cutlass)\n- [Docker](#docker-with-cuda) (with CUDA and cutlass)\n- [Conda + MacOS](#conda-on-macos-with-mlx) (with MLX)\n\nWe recommend managing your BITorch Engine installation in a conda environment (otherwise you should adapt/remove certain variables, e.g. `CUDA_HOME`).\nYou may want to keep everything (environment, code, etc.) 
in one directory or use the default directory for conda environments.\nYou may wish to adapt the CUDA version to 12.1 where applicable.\n\n#### Conda on Linux (with CUDA)\n\nTo use these instructions, you need to have [conda](https://conda.io/projects/conda/en/latest/user-guide/getting-started.html) and a suitable C++ compiler installed.\n\n1. Create an environment for Python 3.9 and activate it:\n```bash\nconda create -y --name bitorch-engine python=3.9\nconda activate bitorch-engine\n```\n2. Install CUDA:\n```bash\nconda install -y -c \"nvidia/label/cuda-11.8.0\" cuda-toolkit\n```\n3. Install our customized torch, which allows gradients on INT tensors, with pip (this URL is for CUDA 11.8\nand Python 3.9 - you can find other versions [here](https://packages.greenbit.ai/whl/)):\n```bash\npip install \"https://packages.greenbit.ai/whl/cu118/torch/torch-2.1.0-cp39-cp39-linux_x86_64.whl\"\n```\n4. To use cutlass layers, you should also install CUTLASS 2.8.0 (from source) and adjust `CUTLASS_HOME` (this is where we clone and install cutlass)\n(if you have older or newer GPUs, you may need to add your [CUDA compute capability](https://developer.nvidia.com/cuda-gpus) in `CUTLASS_NVCC_ARCHS`):\n```bash\nexport CUTLASS_HOME=\"/some/path\"\nmkdir -p \"${CUTLASS_HOME}\"\ngit clone --depth 1 --branch \"v2.8.0\" \"https://github.com/NVIDIA/cutlass.git\" --recursive ${CUTLASS_HOME}/source\nmkdir -p \"${CUTLASS_HOME}/build\" \u0026\u0026 mkdir -p \"${CUTLASS_HOME}/install\"\ncd \"${CUTLASS_HOME}/build\"\ncmake ../source -DCMAKE_INSTALL_PREFIX=\"${CUTLASS_HOME}/install\" -DCUTLASS_ENABLE_TESTS=OFF -DCUTLASS_ENABLE_EXAMPLES=OFF -DCUTLASS_NVCC_ARCHS='75;80;86'\nmake -j 4\ncmake --install .\n```\nIf you have difficulties installing cutlass, you can check the [official documentation](https://github.com/NVIDIA/cutlass/tree/v2.8.0),\nuse the other layers without installing it, or try the docker installation.\n\nAs an alternative to the instructions above, you can also store the environment and clone all repositories within one \"root\" directory.\n\n\u003cdetails\u003e\u003csummary\u003eClick here to expand the instructions for this.\u003c/summary\u003e\n\n0. Set the workspace directory (use an absolute path!):\n```bash\nexport BITORCH_WORKSPACE=\"${HOME}/bitorch-workspace\"\nmkdir -p \"${BITORCH_WORKSPACE}\" \u0026\u0026 cd \"${BITORCH_WORKSPACE}\"\n```\n1. Create an environment for Python 3.9 and activate it:\n```bash\nconda create -y --prefix ./conda-env python=3.9\nconda activate ./conda-env\n```\n2. Install CUDA:\n```bash\nconda install -y -c \"nvidia/label/cuda-11.8.0\" cuda-toolkit\n```\n3. Install our customized torch, which allows gradients on INT tensors, with pip (this URL is for CUDA 11.8\nand Python 3.9 - you can find other versions [here](https://packages.greenbit.ai/whl/)):\n```bash\npip install \"https://packages.greenbit.ai/whl/cu118/torch/torch-2.1.0-cp39-cp39-linux_x86_64.whl\"\n```\n4. 
To use cutlass layers, you should also install CUTLASS 2.8.0\n(if you have older or newer GPUs, you may need to add your [CUDA compute capability](https://developer.nvidia.com/cuda-gpus) in `CUTLASS_NVCC_ARCHS`):\n```bash\nexport CUTLASS_HOME=\"${BITORCH_WORKSPACE}/cutlass\"\nmkdir -p \"${CUTLASS_HOME}\"\ngit clone --depth 1 --branch \"v2.8.0\" \"https://github.com/NVIDIA/cutlass.git\" --recursive ${CUTLASS_HOME}/source\nmkdir -p \"${CUTLASS_HOME}/build\" \u0026\u0026 mkdir -p \"${CUTLASS_HOME}/install\"\ncd \"${CUTLASS_HOME}/build\"\ncmake ../source -DCMAKE_INSTALL_PREFIX=\"${CUTLASS_HOME}/install\" -DCUTLASS_ENABLE_TESTS=OFF -DCUTLASS_ENABLE_EXAMPLES=OFF -DCUTLASS_NVCC_ARCHS='75;80;86'\nmake -j 4\ncmake --install .\ncd \"${BITORCH_WORKSPACE}\"\n```\nIf you have difficulties installing cutlass, you can check the [official documentation](https://github.com/NVIDIA/cutlass/tree/v2.8.0),\nuse the other layers without installing it, or try the docker installation.\n\u003c/details\u003e\n\nAfter setting up the environment, clone the code and build with pip (to hide the build output, remove `-v`):\n\n```bash\n# make sure you are in a suitable directory, e.g. your bitorch workspace\ngit clone --recursive https://github.com/GreenBitAI/bitorch-engine\ncd bitorch-engine\n# only gcc versions 9.x, 10.x, 11.x are supported\n# to select the correct gcc, use:\n# export CC=gcc-11 CPP=g++-11 CXX=g++-11\nCPATH=\"${CUTLASS_HOME}/install/include\" CUDA_HOME=\"${CONDA_PREFIX}\" pip install -e . -v\n```\n\n#### Docker (with CUDA)\n\nYou can also use our prepared Dockerfile to build a docker image (which includes building the engine under `/bitorch-engine`):\n\n```bash\ncd docker\ndocker build -t bitorch/engine .\ndocker run -it --rm --gpus all --volume \"/path/to/your/project\":\"/workspace\" bitorch/engine:latest\n```\n\nCheck the [docker readme](docker/README.md) for options and more details.\n\n#### Conda on MacOS (with MLX)\n\n1. We recommend creating a virtual environment and activating it. In the following example we use a conda environment for Python 3.9,\nbut virtualenv should work as well.\n```bash\nconda create -y --name bitorch-engine python=3.9\nconda activate bitorch-engine\n```\n2. Install our customized torch, which allows gradients on INT tensors, with pip (this URL is for macOS\nwith Python 3.9 - you can find other versions [here](https://packages.greenbit.ai/whl/)):\n```bash\npip install \"https://packages.greenbit.ai/whl/macosx/torch/torch-2.2.1-cp39-none-macosx_11_0_arm64.whl\"\n```\n3. To use OpenMP acceleration on MacOS, install OpenMP with Homebrew and configure the environment:\n```bash\nbrew install libomp\n# during libomp installation it should remind you that you need something like this:\nexport LDFLAGS=\"-L$(brew --prefix)/opt/libomp/lib\"\nexport CPPFLAGS=\"-I$(brew --prefix)/opt/libomp/include\"\n```\n4. To use the [mlx](https://github.com/ml-explore/mlx)-accelerated `MPQLinearLayer`, you need to install the Python library.\n```bash\n# use one of the following to install with either pip or conda:\npip install mlx==0.4.0\nconda install conda-forge::mlx=0.4.0\n```\n Currently, we have only tested version 0.4.0. However, newer versions might also work.\n To train the `MPQLinearLayer`, you need to install our custom PyTorch version (see steps above).\n Without it, you need to specify `requires_grad=False` when initializing `MPQLinearLayer`.\n5. 
You should now be able to build with:\n```bash\ngit clone --recursive https://github.com/GreenBitAI/bitorch-engine\ncd bitorch-engine\npip install -e . -v\n```\n\n## Build Options\n\n### Building Specific Extensions\n\nWhile developing, a specific cpp/cuda extension can be (re-)built by using the environment variable `BIE_BUILD_ONLY`,\nlike so:\n```bash\nBIE_BUILD_ONLY=\"bitorch_engine/layers/qlinear/binary/cpp\" pip install -e . -v\n```\nIt needs to be a relative path to one extension directory.\n\n### Building for a Specific CUDA Architecture\n\nTo build for a different CUDA architecture, use the environment variable `BIE_CUDA_ARCH` (e.g. use 'sm_75', 'sm_80', 'sm_86'):\n```bash\nBIE_CUDA_ARCH=\"sm_86\" pip install -e . -v\n```\n\n### Force Building CUDA Modules\n\nIf you have CUDA development libraries installed, but `torch.cuda.is_available()` is False, e.g. in HPC or docker environments,\nyou can still build the extensions that depend on CUDA by setting `BIE_FORCE_CUDA=\"true\"`:\n```bash\nBIE_FORCE_CUDA=\"true\" pip install -e . -v\n```\n\n### Skip Library File Building\n\nIf you just want to avoid rebuilding any files, you can set `BIE_SKIP_BUILD`:\n```bash\nBIE_SKIP_BUILD=\"true\" python3 -m build --no-isolation --wheel\n```\nThis creates a wheel and packages the existing `.so` files without trying to rebuild them.\n\n## Development\n\nTo adjust the build options or address build failures, modify the configurations in\n[cpp_extension.py](bitorch_engine/utils/cpp_extension.py)/\n[cuda_extension.py](bitorch_engine/utils/cuda_extension.py).\n\nYou may want to clean the build output before rebuilding (which may help to avoid errors) and/or install the development requirements:\n```bash\npython setup.py clean\n# now build as usual, use \".[dev]\" for development requirements, e.g.\nCUDA_HOME=\"${CONDA_PREFIX}\" pip install -e \".[dev]\" -v\n```\n\nYou can run our tests with pytest:\n```bash\npytest\n```\n\n### CUDA Device Selection\n\nTo select a certain CUDA device, set the environment variable `BIE_DEVICE`, e.g.:\n```bash\nexport BIE_DEVICE=1  # This selects the second CUDA device, as indexing starts from 0.\n```\n\n## Documentation\n\nCheck out the [Documentation](https://greenbitai.github.io/bitorch-engine) for the API reference.\n\n## Examples\n\n- Basic example scripts can be found directly in [examples](examples).\n- [green-bit-llm-trainer](https://github.com/GreenBitAI/green-bit-llm/tree/main/green_bit_llm/sft) showcases the fine-tuning of LLMs with quantized parameters.\n- [green-bit-llm-inference](https://github.com/GreenBitAI/green-bit-llm/tree/main/green_bit_llm/inference) showcases BIE's adeptness at supporting fast inference for 4- to 2-bit LLMs.\n\n## Contributors\n\nBIE is under active development and currently maintained by contributors: [Haojin Yang](https://github.com/yanghaojin), [Joseph Bethge](https://github.com/Jopyth), [Nianhui Guo](https://github.com/NicoNico6), [Maximilian Schulze](https://github.com/max-3l), Hong Guo, [Paul Mattes](https://github.com/Snagnar).\n\nCheck our [contributing guide](CONTRIBUTING.md) to learn how to contribute to the project.\n\n## License\n\nBitorch Engine is made available under the [Apache 2.0 License](LICENSE). 
See the LICENSE file for details.\n\n## Citation\nIf you use our approach in your research, please cite our work as follows:\n```\n@article{bitorch_engine,\n  title={Bitorch Engine: Streamlining AI with Open-Source Low-Bit Quantization},\n  author={Yang, Haojin and Bethge, Joseph and Guo, Nianhui and Schulze, Maximilian and Guo, Hong},\n  journal={https://github.com/GreenBitAI/bitorch-engine},\n  year={2024}\n}\n```\n\n## References and Acknowledgements\n\nThis project builds upon or uses concepts from the following open-source projects:\n\n- **[PyTorch](https://github.com/pytorch/pytorch)**\n- **[CUTLASS](https://github.com/NVIDIA/cutlass)**\n- **[MLX](https://github.com/ml-explore/mlx)**\n- **[ExLlamaV2](https://github.com/turboderp/exllamav2)**\n- **[TCBNN](https://github.com/pnnl/TCBNN)**\n- **[GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa)**\n\nWe extend our heartfelt gratitude to the developers of these projects for their invaluable contributions to the open-source community. Without their exceptional work, none of this would be possible.\nThe corresponding licenses of the reference projects can be found in the [licenses](licenses) directory of the source tree.\n\n### Open Source Software Acknowledgment\n\nThis project makes use of open source software (OSS) components. The original code of these components is kept under their respective licenses and copyrights. We are grateful to the open-source community for making these resources available. For specific information about each component's license, please refer to the corresponding sections within our project documentation or the direct references provided in the \"References\" section of this document.\n\nWe endeavor to comply with all open source licenses and their requirements, including proper acknowledgment and notice. If there are any concerns or questions regarding our license acknowledgments, please reach out to us for clarification.\n","funding_links":[],"categories":["A01_文本生成_文本对话","Tools"],"sub_categories":["大语言对话模型及数据","Other"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FGreenBitAI%2Fbitorch-engine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FGreenBitAI%2Fbitorch-engine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FGreenBitAI%2Fbitorch-engine/lists"}