{"id":24493020,"url":"https://github.com/fastmachinelearning/qonnx","last_synced_at":"2025-05-16T03:06:48.549Z","repository":{"id":38191348,"uuid":"383864114","full_name":"fastmachinelearning/qonnx","owner":"fastmachinelearning","description":"QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX","archived":false,"fork":false,"pushed_at":"2025-05-11T18:12:53.000Z","size":5645,"stargazers_count":148,"open_issues_count":48,"forks_count":45,"subscribers_count":23,"default_branch":"main","last_synced_at":"2025-05-11T18:19:07.305Z","etag":null,"topics":["deep-learning","fpga","inference","machine-learning","onnx","quantization","quantized-neural-networks"],"latest_commit_sha":null,"homepage":"https://qonnx.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fastmachinelearning.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-07-07T16:40:34.000Z","updated_at":"2025-05-11T11:01:14.000Z","dependencies_parsed_at":"2022-08-08T23:17:12.605Z","dependency_job_id":"56ffc0f9-2594-4478-855c-cacce37892ff","html_url":"https://github.com/fastmachinelearning/qonnx","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fastmachinelearning%2Fqonnx","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fastmachinelearning%2Fqonnx/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fastmachinelearning%2Fqonn
x/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fastmachinelearning%2Fqonnx/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fastmachinelearning","download_url":"https://codeload.github.com/fastmachinelearning/qonnx/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254459088,"owners_count":22074605,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","fpga","inference","machine-learning","onnx","quantization","quantized-neural-networks"],"created_at":"2025-01-21T19:18:54.280Z","updated_at":"2025-05-16T03:06:43.539Z","avatar_url":"https://github.com/fastmachinelearning.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX\n\n[![ReadTheDocs](https://readthedocs.org/projects/qonnx/badge/?version=latest\u0026style=plastic)](http://qonnx.readthedocs.io/)\n[![GitHub Discussions](https://img.shields.io/github/discussions/fastmachinelearning/qonnx)](https://github.com/fastmachinelearning/qonnx/discussions)\n![Tests](https://github.com/fastmachinelearning/qonnx/actions/workflows/test.yml/badge.svg)\n![Code style](https://img.shields.io/badge/code%20style-black-000000.svg)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7622236.svg)](https://doi.org/10.5281/zenodo.7622236)\n[![PyPI 
version](https://badge.fury.io/py/qonnx.svg)](https://badge.fury.io/py/qonnx)\n[![Downloads](https://static.pepy.tech/personalized-badge/qonnx?period=total\u0026units=international_system\u0026left_color=grey\u0026right_color=orange\u0026left_text=Downloads)](https://pepy.tech/project/qonnx)\n\n\u003cimg align=\"left\" src=\"https://xilinx.github.io/finn/img/TFC_1W2A.onnx.png\" alt=\"QONNX example\" style=\"margin-right: 20px\" width=\"200\"/\u003e\n\n\nQONNX (Quantized ONNX) introduces three new custom operators -- [`Quant`](docs/qonnx-custom-ops/quant_op.md), [`BipolarQuant`](docs/qonnx-custom-ops/bipolar_quant_op.md), and [`Trunc`](docs/qonnx-custom-ops/trunc_op.md) -- in order to represent arbitrary-precision uniform quantization in ONNX. This enables:\n* Representation of binary, ternary, 3-bit, 4-bit, 6-bit or any other quantization.\n* Quantization is an operator itself, and can be applied to any parameter or layer input.\n* Flexible choices for scaling factor and zero-point granularity.\n* Quantized values are carried using standard `float` datatypes to remain ONNX protobuf-compatible.\n\nThis repository contains a set of Python utilities to work with QONNX models, including but not limited to:\n* executing QONNX models for (slow) functional verification\n* shape inference, constant folding and other basic optimizations\n* summarizing the inference cost of a QONNX model in terms of mixed-precision MACs, parameter and activation volume\n* Python infrastructure for writing transformations and defining executable, shape-inferencable custom ops\n* (experimental) data layout conversion from standard ONNX NCHW to custom QONNX NHWC ops\n\n## Quickstart\n\n### Operator definitions\n\n* [Quant](docs/qonnx-custom-ops/quant_op.md) for 2-to-arbitrary-bit quantization, with scaling and zero-point\n* [BipolarQuant](docs/qonnx-custom-ops/bipolar_quant_op.md)  for 1-bit (bipolar) quantization, with scaling and zero-point\n* [Trunc](docs/qonnx-custom-ops/trunc_op.md) for 
truncating to a specified number of bits, with scaling and zero-point\n\n### Installation\n\n`pip install qonnx`\n\n### Export, Import and Model Zoo\n\nThe following quantization-aware training (QAT) frameworks support exporting to QONNX:\n\n* [Brevitas](https://github.com/Xilinx/brevitas)\n* [QKeras](https://github.com/google/qkeras) (beta, see [this PR](https://github.com/fastmachinelearning/qonnx/pull/7))\n* [HAWQ](https://github.com/Zhen-Dong/HAWQ/tree/main/utils/export)\n* [\u003cyour NN quantization framework here? please get in touch!\u003e](https://github.com/fastmachinelearning/qonnx/discussions)\n\nThe following NN inference frameworks support importing QONNX models for deployment:\n\n* [FINN](https://github.com/Xilinx/finn) (FPGA dataflow-style)\n* [hls4ml](https://github.com/fastmachinelearning/hls4ml) (FPGA dataflow-style)\n* [\u003cyour NN deployment framework here? please get in touch!\u003e](https://github.com/fastmachinelearning/qonnx/discussions)\n\nHead to the [QONNX model zoo](https://github.com/fastmachinelearning/QONNX_model_zoo) to download pre-trained QONNX models on various datasets.\n\n### Model Visualization\n\nWe recommend [Netron](https://netron.app/) for visualizing QONNX models.\n\n### Executing ONNX graph with QONNX custom nodes\n\nUsing the `qonnx-exec` command line utility, with top-level inputs supplied from `in0.npy` and `in1.npy`:\n\n`qonnx-exec my-qonnx-model.onnx in0.npy in1.npy`\n\nUsing the Python API:\n\n```\nimport numpy as np\nfrom qonnx.core.modelwrapper import ModelWrapper\nfrom qonnx.core.onnx_exec import execute_onnx\n\nmodel = ModelWrapper(\"my-qonnx-model.onnx\")\n# map top-level input tensor names to numpy arrays\nidict = {\"in0\" : np.load(\"in0.npy\"), \"in1\" : np.load(\"in1.npy\")}\n# returns a dict mapping output tensor names to numpy arrays\nodict = execute_onnx(model, idict)\n```\n\n### Calculate inference cost for QONNX model\n\nUsing the `qonnx-inference-cost` command line utility for the [CNV_2W2A example](https://github.com/fastmachinelearning/qonnx_model_zoo/tree/main/models/CIFAR10/Brevitas_FINN_CNV):\n\n`qonnx-inference-cost 
CNV_2W2A.onnx`\n\nThis prints an inference cost dictionary like the following:\n\n```\nInference cost for CNV_2W2A.onnx\n{\n  \"discount_sparsity\": true,    # discount MAC counts by layer sparsity (disregard zero-valued MACs and params)\n  # mem_o_X: number of layer outputs with datatype X\n  \"mem_o_INT32\": 142602.0,       # number of INT32 output elements\n  # mem_w_X: number of layer parameters (weights) with datatype X\n  \"mem_w_INT2\": 908033.0,      # number of INT2 parameters (weights)\n  # op_mac_X_Y: number of MAC operations, datatype X by datatype Y\n  # scaled integer datatypes have a tensor- or channelwise scale factor\n  \"op_mac_SCALEDINT\u003c8\u003e_INT2\": 1345500.0, # number of scaled int8 x int2 MACs\n  \"op_mac_INT2_INT2\": 35615771.0,   # number of int2 x int2 MACs\n  \"total_bops\": 163991084.0,        # total number of MACs normalized to bit-ops (BOPS)\n  \"total_mem_o_bits\": 4563264.0,    # total number of bits for layer outputs\n  \"total_mem_w_bits\": 1816066.0,    # total number of bits for layer parameters\n  \"unsupported\": \"set()\"\n}\n```\n\nYou can use the `--cost-breakdown` option to generate a more detailed report that covers per-node (by name) and per-op-type information.\nYou can read more about the BOPS metric in [this paper](https://www.frontiersin.org/articles/10.3389/frai.2021.676564/full), Section 4.2 Bit Operations.\n\n### Convert between different quantization representations\n\nUsing the `qonnx-convert` command line utility you can convert from QONNX to QCDQ-style quantization:\n\n`qonnx-convert CNV_2W2A.onnx`\n\nThis will convert `Quant` nodes to `QuantizeLinear -\u003e Clip -\u003e DequantizeLinear` nodes where possible.\nPlease see the documentation of the `QuantToQCDQ` transformation to learn more about the limitations.\n\n## Development\n\nInstall in editable mode in a Python virtual environment:\n\n```\ngit clone https://github.com/fastmachinelearning/qonnx\ncd qonnx\nvirtualenv -p python3.10 venv\nsource 
venv/bin/activate\npip install --upgrade pip\npip install -e .[qkeras,testing]\n```\n\n### Running tests\n\nRun entire test suite, parallelized across CPU cores:\n```\npytest -n auto --verbose\n```\n\nRun a particular test and fall into pdb if it fails:\n```\npytest --pdb -k \"test_extend_partition.py::test_extend_partition[extend_id1-2]\"\n```\n\n### Linting\n\nIf you plan to make pull requests to the qonnx repo, linting will be required.\nWe use a pre-commit hook to auto-format Python code and check for issues. See https://pre-commit.com/ for installation. Once you have `pre-commit`,\nyou can install the hooks into your local clone of the qonnx repo:\n\n```\ncd qonnx\nsource venv/bin/activate\npip install pre-commit\npre-commit install\n```\n\nEvery time you commit some code, the pre-commit hooks will first run, performing various checks and fixes. In some cases pre-commit won’t be able to\nfix the issues and you may have to fix it manually, then run git commit once again. The checks are configured in .pre-commit-config.yaml under the repo root.\n\n## Why QONNX?\n\nThe QONNX representation has several advantages compared to other alternatives, as summarized in the table below.\nThese include a compact but flexible, single-node quantization representation that avoids operator duplication\nand can support arbitrary precision up to the container datatype limit.\n\n\u003cimg align=\"left\" src=\"https://raw.githubusercontent.com/fastmachinelearning/qonnx/main/docs/qonnx-comparison.png\" alt=\"QONNX comparison table\" style=\"margin-right: 20px\" /\u003e\n\n## Community\n\nThe QONNX efforts were started by the FINN and hls4ml communities working together to create a common, arbitrary-precision representation that both frameworks could ingest. 
However, QONNX aims to build an open-source community for practitioners and researchers working with mixed-precision quantized neural networks by providing useful tools and a [discussion forum](https://github.com/fastmachinelearning/qonnx/discussions).\n\n\u003cdiv\u003e\n\u003cimg src=https://raw.githubusercontent.com/Xilinx/finn/github-pages/docs/img/finn-logo.png height=100/\u003e\n\u003cimg src=\"https://fastmachinelearning.github.io/hls4ml/img/logo.jpg\" alt=\"hls4ml\" height=\"128\"/\u003e\n\u003c/div\u003e\n\n## Resources\n\nYou can read more about QONNX in [this paper](https://arxiv.org/abs/2206.07527). If you find QONNX useful in your work, please consider citing:\n\n```bibtex\n@inproceedings{Pappalardo:2022nxk,\n    author = \"Pappalardo, Alessandro and Umuroglu, Yaman and Blott, Michaela and Mitrevski, Jovan and Hawks, Ben and Tran, Nhan and Loncar, Vladimir and Summers, Sioni and Borras, Hendrik and Muhizi, Jules and Trahms, Matthew and Hsu, Shih-Chieh and Hauck, Scott and Duarte, Javier\",\n    title = \"{QONNX: Representing Arbitrary-Precision Quantized Neural Networks}\",\n    booktitle = \"{4th Workshop on Accelerated Machine Learning (AccML) at HiPEAC 2022 Conference}\",\n    eprint = \"2206.07527\",\n    archivePrefix = \"arXiv\",\n    primaryClass = \"cs.LG\",\n    reportNumber = \"FERMILAB-CONF-22-471-SCD\",\n    month = \"6\",\n    year = \"2022\",\n    url = \"https://accml.dcs.gla.ac.uk/papers/2022/4thAccML_paper_1(12).pdf\"\n}\n\n@software{yaman_umuroglu_2023_7622236,\n  author       = \"Umuroglu, Yaman and Borras, Hendrik and Loncar, Vladimir and Summers, Sioni and Duarte, Javier\",\n  title        = \"fastmachinelearning/qonnx\",\n  month        = {06},\n  year         = 2022,\n  publisher    = {Zenodo},\n  doi          = {10.5281/zenodo.7622236},\n  url          = 
{https://github.com/fastmachinelearning/qonnx}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffastmachinelearning%2Fqonnx","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffastmachinelearning%2Fqonnx","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffastmachinelearning%2Fqonnx/lists"}