{"id":13443254,"url":"https://github.com/google/qkeras","last_synced_at":"2025-03-20T16:30:47.389Z","repository":{"id":39312302,"uuid":"200931125","full_name":"google/qkeras","owner":"google","description":"QKeras: a quantization deep learning library for Tensorflow Keras","archived":false,"fork":false,"pushed_at":"2025-02-13T22:58:36.000Z","size":1602,"stargazers_count":558,"open_issues_count":48,"forks_count":104,"subscribers_count":30,"default_branch":"master","last_synced_at":"2025-03-16T03:16:29.799Z","etag":null,"topics":["accelerator","asic-design","deep-learning","fpga","fpga-accelerator","hardware-acceleration","keras","machine-learning","quantization","quantized-networks","quantized-neural-networks","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-08-06T22:11:58.000Z","updated_at":"2025-03-16T02:43:16.000Z","dependencies_parsed_at":"2024-03-21T18:43:58.533Z","dependency_job_id":"20f3fa21-c529-4e1d-be63-6f2abe2e1244","html_url":"https://github.com/google/qkeras","commit_stats":{"total_commits":387,"total_committers":29,"mean_commits":"13.344827586206897","dds":0.7441860465116279,"last_synced_commit":"ca554222677a25500c70f363882210811490a358"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google%2Fqkeras","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/google%2Fqkeras/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google%2Fqkeras/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google%2Fqkeras/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/google","download_url":"https://codeload.github.com/google/qkeras/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244649727,"owners_count":20487478,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["accelerator","asic-design","deep-learning","fpga","fpga-accelerator","hardware-acceleration","keras","machine-learning","quantization","quantized-networks","quantized-neural-networks","tensorflow"],"created_at":"2024-07-31T03:01:58.171Z","updated_at":"2025-03-20T16:30:46.897Z","avatar_url":"https://github.com/google.png","language":"Python","readme":"# QKeras\n\n[github.com/google/qkeras](https://github.com/google/qkeras)\n\n## Introduction\n\nQKeras is a quantization extension to Keras that provides drop-in\nreplacements for some of the Keras layers, especially those that\ncreate parameters and activation layers and perform arithmetic\noperations, so that we can quickly create a deep quantized version of\na Keras network.\n\nAccording to the TensorFlow documentation, Keras is a high-level API to\nbuild and train deep learning models. 
It's used for fast prototyping,\nadvanced research, and production, with three key advantages:\n\n- User friendly\n\nKeras has a simple, consistent interface optimized for common use\ncases. It provides clear and actionable feedback for user errors.\n\n- Modular and composable\n\nKeras models are made by connecting configurable building blocks\ntogether, with few restrictions.\n\n- Easy to extend\n\nWrite custom building blocks to express new ideas for research. Create\nnew layers, loss functions, and develop state-of-the-art models.\n\nQKeras is designed to extend the functionality of Keras following\nKeras' design principles, i.e. being user friendly, modular, and\nextensible, while remaining \"minimally intrusive\" to Keras' native\nfunctionality.\n\nIn order to successfully quantize a model, users need to replace\nvariable-creating layers (Dense, Conv2D, etc.) with their counterparts\n(QDense, QConv2D, etc.), and any layers that perform math operations\nneed to be quantized afterwards.\n\n## Publications\n\n- Claudionor N. Coelho Jr, Aki Kuusela, Shan Li, Hao Zhuang, Jennifer Ngadiuba, Thea Klaeboe Aarrestad, Vladimir Loncar, Maurizio Pierini, Adrian Alan Pol, Sioni Summers, \"Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors\", Nature Machine Intelligence (2021), https://www.nature.com/articles/s42256-021-00356-5\n\n- Claudionor N. Coelho Jr., Aki Kuusela, Hao Zhuang, Thea Aarrestad, Vladimir Loncar, Jennifer Ngadiuba, Maurizio Pierini, Sioni Summers, \"Ultra Low-latency, Low-area Inference Accelerators using Heterogeneous Deep Quantization with QKeras and hls4ml\", http://arxiv.org/abs/2006.10159v1\n\n- Erwei Wang, James J. Davis, Daniele Moro, Piotr Zielinski, Claudionor Coelho, Satrajit Chatterjee, Peter Y. K. Cheung, George A. 
Constantinides, \"Enabling Binary Neural Network Training on the Edge\", https://arxiv.org/abs/2102.04270\n\n## Layers Implemented in QKeras\n\n- QDense\n\n- QConv1D\n\n- QConv2D\n\n- QDepthwiseConv2D\n\n- QSeparableConv1D (depthwise + pointwise convolution, without\nquantizing the activation values after the depthwise step)\n\n- QSeparableConv2D (depthwise + pointwise convolution, without\nquantizing the activation values after the depthwise step)\n\n- QMobileNetSeparableConv2D (extended from the MobileNet SeparableConv2D\nimplementation, quantizes the activation values after the depthwise step)\n\n- QConv2DTranspose\n\n- QActivation\n\n- QAdaptiveActivation\n\n- QAveragePooling2D (in fact, an AveragePooling2D stacked with a\nQActivation layer for quantization of the result)\n\n- QBatchNormalization (still in its experimental stage, as we\nhave not yet seen the need to use it, due to the normalization\nand regularization effects of stochastic activation functions)\n\n- QOctaveConv2D\n\n- QSimpleRNN, QSimpleRNNCell\n\n- QLSTM, QLSTMCell\n\n- QGRU, QGRUCell\n\n- QBidirectional\n\nIt is worth noting that not all functionality is currently safe to\nuse with other high-level operations, such as layer wrappers (for\nexample, the Bidirectional wrapper used with RNNs). 
If this is required, we encourage users to invoke quantization\nfunctions as strings instead of the actual function objects as a\nworkaround, but we may change that implementation in the future.\n\nA first attempt to create a safe mechanism in QKeras is the adoption\nof QActivation as a wrapper that encapsulates the activation\nfunctions, so that we can save and restore the network architecture\nand duplicate it using the Keras interface, but this interface has\nnot been fully tested yet.\n\n## Activation Layers Implemented in QKeras\n\n- smooth_sigmoid(x)\n\n- hard_sigmoid(x)\n\n- binary_sigmoid(x)\n\n- binary_tanh(x)\n\n- smooth_tanh(x)\n\n- hard_tanh(x)\n\n- quantized_bits(bits=8, integer=0, symmetric=0, keep_negative=1)(x)\n\n- bernoulli(alpha=1.0)(x)\n\n- stochastic_ternary(alpha=1.0, threshold=0.33)(x)\n\n- ternary(alpha=1.0, threshold=0.33)(x)\n\n- stochastic_binary(alpha=1.0)(x)\n\n- binary(alpha=1.0)(x)\n\n- quantized_relu(bits=8, integer=0, use_sigmoid=0, negative_slope=0.0)(x)\n\n- quantized_ulaw(bits=8, integer=0, symmetric=0, u=255.0)(x)\n\n- quantized_tanh(bits=8, integer=0, symmetric=0)(x)\n\n- quantized_po2(bits=8, max_value=-1)(x)\n\n- quantized_relu_po2(bits=8, max_value=-1)(x)\n\nThe stochastic_* functions and bernoulli, as well as quantized_relu and\nquantized_tanh, rely on stochastic versions of the activation\nfunctions. They draw a random number with uniform distribution from\nthe _hard_sigmoid of the input x, and the result is based on the expected\nvalue of the activation function. Please refer to the papers if you\nwant to understand the underlying theory, or the documentation in\nqkeras/qlayers.py.\n\nThe parameter \"bits\" specifies the number of bits for the quantization,\nand \"integer\" specifies how many of those bits are to the left of the\ndecimal point. 
Finally, in our experience training networks with\nQSeparableConv2D, both quantized_bits and quantized_tanh, which\ngenerate values in the range [-1, 1), required symmetric versions of\nthe range in order to converge properly and eliminate bias.\n\nEvery time we use a quantization for weights and biases that can\ngenerate numbers outside the range [-1.0, 1.0], we need to adjust the\n*_range accordingly. For example, if we have a\nquantized_bits(bits=6, integer=2) in a weight of a layer, we need to\nset the weight range to 2**2, which is equivalent to Catapult HLS\nac_fixed\u003c6, 3, true\u003e. Similarly, for quantization functions that accept an\nalpha parameter, we need to specify a range of alpha,\nand for po2 type quantizers, we need to specify the range of\nmax_value.\n\n\n### Example\n\nAn example of a very simple network is given below in Keras.\n\n\n```python\nfrom keras.layers import *\n\nx = x_in = Input(shape)\nx = Conv2D(18, (3, 3), name=\"first_conv2d\")(x)\nx = Activation(\"relu\")(x)\nx = SeparableConv2D(32, (3, 3))(x)\nx = Activation(\"relu\")(x)\nx = Flatten()(x)\nx = Dense(NB_CLASSES)(x)\nx = Activation(\"softmax\")(x)\n```\n\nYou can easily quantize this network as follows:\n\n```python\nfrom keras.layers import *\nfrom qkeras import *\n\nx = x_in = Input(shape)\nx = QConv2D(18, (3, 3),\n        kernel_quantizer=\"stochastic_ternary\",\n        bias_quantizer=\"ternary\", name=\"first_conv2d\")(x)\nx = QActivation(\"quantized_relu(3)\")(x)\nx = QSeparableConv2D(32, (3, 3),\n        depthwise_quantizer=quantized_bits(4, 0, 1),\n        pointwise_quantizer=quantized_bits(3, 0, 1),\n        bias_quantizer=quantized_bits(3),\n        depthwise_activation=quantized_tanh(6, 2, 1))(x)\nx = QActivation(\"quantized_relu(3)\")(x)\nx = Flatten()(x)\nx = QDense(NB_CLASSES,\n        kernel_quantizer=quantized_bits(3),\n        bias_quantizer=quantized_bits(3))(x)\nx = QActivation(\"quantized_bits(20, 5)\")(x)\nx = 
Activation(\"softmax\")(x)\n```\n\nThe last QActivation is advisable if you want to compare results later on.\nPlease find more examples under the examples directory.\n\n\n## QTools\nThe purpose of QTools is to assist with the hardware implementation of a\nquantized model and with estimation of its energy consumption. QTools has two\nfunctions: data type map generation and energy consumption estimation.\n\n- Data Type Map Generation:\nQTools automatically generates the data type map for the weights, bias,\nmultiplier, adder, etc. of each layer. The data type map includes operation type,\nvariable size, quantizer type and bits, etc. The inputs to QTools are:\n1) a given quantized model;\n2) a list of input quantizers\nfor the model. The output of QTools is a JSON file that lists the data type map\nof each layer (stored in qtools_instance._output_dict).\nOutput methods include qtools_stats_to_json, which writes the data type\nmap to a JSON file, and qtools_stats_print, which prints the data type map.\n\n- Energy Consumption Estimation:\nAnother function of QTools is to estimate the model energy consumption in\npicojoules (pJ). It provides a tool for QKeras users to quickly estimate the\nenergy consumption of memory accesses and MAC operations in a quantized model\nderived from QKeras, especially when comparing the power consumption of two\nmodels running on the same device.\n\nAs with any high-level model, it should be used with caution when attempting\nto estimate the absolute energy consumption of a model for a given technology,\nor when attempting to compare different technologies.\n\nThis tool also provides a metric for model tuning that considers\nboth accuracy and model energy consumption. 
The energy cost provided by this\ntool can be integrated into a total loss function that combines energy\ncost and accuracy.\n\n- Energy Model:\nThe most widely referenced work in the literature on energy consumption is\nHorowitz, M.: “1.1 Computing’s energy problem (and what we can do about\nit)”; IEEE International Solid-State Circuits Conference Digest of\nTechnical Papers (ISSCC), 2014.\n\nIn this work, the author estimated the energy\nconsumption of accelerators; for a 45 nm process, the data points he\npresented have since been used whenever someone wants to compare accelerator\nperformance. QTools energy consumption estimates for a 45 nm process are based\non the data published in this work.\n\n- Examples:\nAn example of how to generate a data type map can be found in\nqkeras/qtools/examples/example_generate_json.py. An example of how to estimate\nenergy consumption can be found in qkeras/qtools/examples/example_get_energy.py.\n\n\n## AutoQKeras\n\nAutoQKeras allows the automatic quantization and rebalancing of deep neural\nnetworks by treating the quantization and rebalancing of an existing deep neural\nnetwork as a hyperparameter search in Keras Tuner using random search,\nHyperband, or Gaussian processes.\n\nIn order to contain the explosion of hyperparameters, users can group tasks by\npatterns and perform distributed training using available resources.\n\nExtensive documentation is present in notebook/AutoQKeras.ipynb.\n\n\n## Related Work\n\nQKeras has been implemented based on the work of \"B. Moons et al. -\nMinimum Energy Quantized Neural Networks\", Asilomar Conference on\nSignals, Systems and Computers, 2017 and \"Zhou, S. et al. -\nDoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with\nLow Bitwidth Gradients,\" but the framework should be easily\nextensible. 
The original code from QNN can be found below.\n\nhttps://github.com/BertMoons/QuantizedNeuralNetworks-Keras-Tensorflow\n\nQKeras extends QNN by providing a richer set of layers (including\nSeparableConv2D, DepthwiseConv2D, and ternary and stochastic ternary\nquantizations), as well as functions to aid the estimation of\naccumulator sizes and the conversion from non-quantized to quantized\nnetworks. Finally, our main goal is ease of use, so we attempt to make\nQKeras layers a true drop-in replacement for Keras, so that users can\neasily replace non-quantized layers with quantized ones.\n\n### Acknowledgements\n\nPortions of QKeras were derived from QNN.\n\nhttps://github.com/BertMoons/QuantizedNeuralNetworks-Keras-Tensorflow\n\nCopyright (c) 2017, Bert Moons where it applies\n\n","funding_links":[],"categories":["Python","The Data Science Toolbox","Deep Learning","Tools","Tensor Flow"],"sub_categories":["Deep Learning Packages","TensorFlow","Approximations Frameworks","Automated Machine Learning"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle%2Fqkeras","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogle%2Fqkeras","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle%2Fqkeras/lists"}