{"id":13435763,"url":"https://github.com/pytorch/QNNPACK","last_synced_at":"2025-03-18T11:32:11.298Z","repository":{"id":43649964,"uuid":"154208543","full_name":"pytorch/QNNPACK","owner":"pytorch","description":"Quantized Neural Network PACKage - mobile-optimized implementation of quantized neural network operators","archived":true,"fork":false,"pushed_at":"2019-08-28T20:40:59.000Z","size":535,"stargazers_count":1537,"open_issues_count":9,"forks_count":220,"subscribers_count":99,"default_branch":"master","last_synced_at":"2025-03-17T11:54:04.091Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://code.fb.com/ml-applications/qnnpack/","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pytorch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-10-22T20:01:29.000Z","updated_at":"2025-03-17T08:41:27.000Z","dependencies_parsed_at":"2022-07-12T21:10:56.449Z","dependency_job_id":null,"html_url":"https://github.com/pytorch/QNNPACK","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2FQNNPACK","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2FQNNPACK/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2FQNNPACK/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2FQNNPACK/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pytorch","download_url":"https://codeload.github.com/pytorch/QNNPACK/tar.gz/refs/head
s/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244211156,"owners_count":20416589,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T03:00:38.853Z","updated_at":"2025-03-18T11:32:06.288Z","avatar_url":"https://github.com/pytorch.png","language":"C","readme":"# QNNPACK\nQNNPACK (Quantized Neural Networks PACKage) is a mobile-optimized library for low-precision, high-performance neural network inference. QNNPACK provides implementations of common neural network operators on quantized 8-bit tensors.\n\nQNNPACK is not intended to be used directly by machine learning researchers; instead, it provides low-level performance primitives for high-level deep learning frameworks. 
As of today, QNNPACK is integrated into [PyTorch 1.0](https://github.com/pytorch/pytorch) with Caffe2 graph representation.\n\n## Operator Coverage\n\nOperators that are currently implemented or planned for implementation are listed below:\n\n- [x] 2D Convolution\n- [x] 2D Deconvolution\n- [x] Channel Shuffle\n- [x] Fully Connected\n- [ ] Locally Connected\n- [x] 2D Max Pooling\n- [x] 2D Average Pooling\n- [x] Global Average Pooling\n- [x] Sigmoid\n- [x] Leaky ReLU\n- [x] Clamp (can be used for ReLU, ReLU6 if it is not fused in another operator)\n- [x] SoftArgMax (aka SoftMax)\n- [ ] Group Normalization\n\n## Building\n\nQNNPACK provides standard CMake-based build scripts.\n\n### Native compilation\n\nWe recommend using the `scripts/build-local.sh` script to build QNNPACK for the host machine.\n\n### Cross-compilation for Android\n\nTo cross-compile for Android, set the `$ANDROID_NDK` environment variable (where `$ANDROID_NDK` is the path to the Android NDK directory, e.g. `/opt/android-ndk-r15c`) and use one of the scripts from the table below:\n\n| ABI         | Build script                     | Restrictions               |\n| ----------- | ---------------------------------| -------------------------- |\n| armeabi-v7a | `scripts/build-android-armv7.sh` | Requires CPU with ARM NEON |\n| arm64-v8a   | `scripts/build-android-arm64.sh` |                            |\n| x86         | `scripts/build-android-x86.sh`   |                            |\n\nNotes:\n- On **armeabi-v7a** `qnnp_initialize` will fail with `qnnp_status_unsupported_hardware` if the mobile CPU does not support ARM NEON. 
Don't set `-DANDROID_ARM_NEON=1` for QNNPACK compilation, as it can make `qnnp_initialize` crash on CPUs without ARM NEON.\n\n### Cross-compilation for iOS\n\nTo cross-compile for iOS, clone [ios-cmake](https://github.com/leetal/ios-cmake), set the `$IOS_CMAKE_TOOLCHAIN_FILE` environment variable (where `$IOS_CMAKE_TOOLCHAIN_FILE` is the path to the `ios.toolchain.cmake` file in [ios-cmake](https://github.com/leetal/ios-cmake)), and use one of the scripts from the table below:\n\n| Architecture | Build script                  | Notes                     |\n| ------------ | ----------------------------- | ------------------------- |\n| armv7        | `scripts/build-ios-armv7.sh`  | iPhone 3GS/4/4S           |\n| armv7s       | `scripts/build-ios-armv7s.sh` | iPhone 5 and newer        |\n| arm64        | `scripts/build-ios-arm64.sh`  | iPhone 5S and newer       |\n| arm64e       | `scripts/build-ios-arm64e.sh` | iPhone XS/XR              |\n| i386         | `scripts/build-ios-i386.sh`   | iPhone Simulator (32-bit) |\n| x86_64       | `scripts/build-ios-x86_64.sh` | iPhone Simulator (64-bit) |\n\n## End-to-End Benchmarking\n\nThe Caffe2 backend of PyTorch 1.0 natively integrates QNNPACK and provides a [pre-trained quantized MobileNet v2 model](https://github.com/caffe2/models/tree/master/mobilenet_v2_quantized). 
Below are instructions for benchmarking this model end-to-end with QNNPACK.\n\n### Raspberry Pi 2 or 3\n\n```bash\n# Clone PyTorch 1.0 repo\ngit clone --recursive https://github.com/pytorch/pytorch.git\ncd pytorch\n\n# Optional: update QNNPACK submodule to latest revision\ngit submodule update --remote third_party/QNNPACK\n\n# Build Caffe2 (including binaries) for the host system\n# Use only 1 thread for build to avoid out-of-memory failures\nMAX_JOBS=1 scripts/build_local.sh -DBUILD_BINARY=ON -DBUILD_PYTHON=OFF \\\n\t-DUSE_OBSERVERS=OFF -DUSE_DISTRIBUTED=OFF\n\n# Download model weights\nwget https://s3.amazonaws.com/download.caffe2.ai/models/mobilenet_v2_1.0_224_quant/init_net.pb\n\n# Download model graph\nwget https://s3.amazonaws.com/download.caffe2.ai/models/mobilenet_v2_1.0_224_quant/predict_net.pb\n\n# Run speed benchmark with 50 warm-up iterations and 10 measurement iterations\nbuild/bin/speed_benchmark --net predict_net.pb --init_net init_net.pb \\\n\t--input data --input_dims 1,3,224,224 --input_type float \\\n\t--warmup 50 --iter 10\n```\n\n### ARMv7 (32-bit) Android\n\n```bash\n# Clone PyTorch 1.0 repo\ngit clone --recursive https://github.com/pytorch/pytorch.git\ncd pytorch\n\n# Optional: update QNNPACK submodule to latest revision\ngit submodule update --remote third_party/QNNPACK\n\n# Build Caffe2 (including binaries) for Android, and push to device\nscripts/build_android.sh -DANDROID_TOOLCHAIN=clang -DBUILD_BINARY=ON\nadb push build_android/bin/speed_benchmark /data/local/tmp/speed_benchmark\n\n# Download model weights and copy them to Android device\nwget https://s3.amazonaws.com/download.caffe2.ai/models/mobilenet_v2_1.0_224_quant/init_net.pb\nadb push init_net.pb /data/local/tmp/init_net.pb\n\n# Download model graph and copy it to Android device\nwget https://s3.amazonaws.com/download.caffe2.ai/models/mobilenet_v2_1.0_224_quant/predict_net.pb\nadb push predict_net.pb /data/local/tmp/predict_net.pb\n\n# Run speed benchmark with 50 warm-up 
iterations and 10 measurement iterations\nadb shell /data/local/tmp/speed_benchmark \\\n\t--net /data/local/tmp/predict_net.pb \\\n\t--init_net /data/local/tmp/init_net.pb \\\n\t--input data --input_dims 1,3,224,224 --input_type float \\\n\t--warmup 50 --iter 10\n```\n\n### ARM64 (64-bit) Android\n\n```bash\n# Clone PyTorch 1.0 repo\ngit clone --recursive https://github.com/pytorch/pytorch.git\ncd pytorch\n\n# Optional: update QNNPACK submodule to latest revision\ngit submodule update --remote third_party/QNNPACK\n\n# Build Caffe2 (including binaries) for Android, and push to device\nscripts/build_android.sh -DANDROID_ABI=arm64-v8a -DANDROID_TOOLCHAIN=clang -DBUILD_BINARY=ON\nadb push build_android/bin/speed_benchmark /data/local/tmp/speed_benchmark\n\n# Download model weights and copy them to Android device\nwget https://s3.amazonaws.com/download.caffe2.ai/models/mobilenet_v2_1.0_224_quant/init_net.pb\nadb push init_net.pb /data/local/tmp/init_net.pb\n\n# Download model graph and copy it to Android device\nwget https://s3.amazonaws.com/download.caffe2.ai/models/mobilenet_v2_1.0_224_quant/predict_net.pb\nadb push predict_net.pb /data/local/tmp/predict_net.pb\n\n# Run speed benchmark with 50 warm-up iterations and 10 measurement iterations\nadb shell /data/local/tmp/speed_benchmark \\\n\t--net /data/local/tmp/predict_net.pb \\\n\t--init_net /data/local/tmp/init_net.pb \\\n\t--input data --input_dims 1,3,224,224 --input_type float \\\n\t--warmup 50 --iter 10\n```\n\n### PEP (Performance Evaluation Platform) Method\n\n[Facebook AI Performance Evaluation Platform](https://github.com/facebook/FAI-PEP) is a framework- and backend-agnostic benchmarking platform for comparing machine learning inference runtime metrics on a set of models across a variety of backends.\n\nWe used PEP to produce the results in our [blog post](https://code.fb.com/ml-applications/qnnpack/).\n\nWith an ARMv7 device connected:\n\n```bash\n# Clone PyTorch 1.0 repo\nmkdir ~/Code \u0026\u0026 cd 
~/Code\ngit clone --recursive https://github.com/pytorch/pytorch.git\ncd pytorch\n\n# Optional: update QNNPACK submodule to latest revision\ngit submodule update --remote third_party/QNNPACK\n\n# Clone PEP repo\ncd ~/Code\ngit clone --recursive https://github.com/facebook/FAI-PEP.git aibench\ncd aibench\n\n# Run the PEP benchmark with the specification below. Try changing the command to use other specifications!\n# The first compilation may take 20+ minutes\n./benchmarking/run_bench.py \\\n  --platform android \\\n  -b ~/Code/aibench/specifications/models/caffe2/mobilenet_v2/mobilenet_v2_quant.json \\\n  --repo_dir ~/Code/pytorch \\\n  --frameworks_dir ~/Code/aibench/specifications/frameworks --framework caffe2\n```\n\n## Acknowledgements\n\nQNNPACK is developed by Marat Dukhan, Yiming Wu, Hao Lu, and Bert Maher. We thank Andrew Tulloch and Yangqing Jia for advice during the development of QNNPACK.\n\n## License\n\nQNNPACK is BSD licensed, as found in the [`LICENSE`](LICENSE) file.\n","funding_links":[],"categories":["Toolbox","C","Pytorch \u0026 related libraries｜Pytorch \u0026 相关库","Pytorch \u0026 related libraries"],"sub_categories":["Libraries","Other libraries｜其他库:","Other libraries:"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpytorch%2FQNNPACK","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpytorch%2FQNNPACK","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpytorch%2FQNNPACK/lists"}