{"id":44338516,"url":"https://github.com/intel/npu-nn-cost-model","last_synced_at":"2026-04-02T11:57:38.325Z","repository":{"id":143102265,"uuid":"599779167","full_name":"intel/npu-nn-cost-model","owner":"intel","description":"Library for modelling performance costs of different Neural Network workloads on NPU devices","archived":false,"fork":false,"pushed_at":"2026-03-24T07:56:21.000Z","size":23888,"stargazers_count":34,"open_issues_count":0,"forks_count":4,"subscribers_count":4,"default_branch":"main","last_synced_at":"2026-03-25T09:40:11.291Z","etag":null,"topics":["cost-model","intel","machine-learning","vpu"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/intel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"security.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-02-09T21:30:53.000Z","updated_at":"2026-03-24T07:53:21.000Z","dependencies_parsed_at":null,"dependency_job_id":"ae632aba-8471-4ab1-ae5e-3404a65ad71a","html_url":"https://github.com/intel/npu-nn-cost-model","commit_stats":null,"previous_names":["intel/npu-nn-cost-model","intel/vpu-nn-cost-model"],"tags_count":18,"template":false,"template_full_name":null,"purl":"pkg:github/intel/npu-nn-cost-model","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intel%2Fnpu-nn-cost-model","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intel%2Fnpu-nn-cost-model/tags","releases_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intel%2Fnpu-nn-cost-model/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intel%2Fnpu-nn-cost-model/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/intel","download_url":"https://codeload.github.com/intel/npu-nn-cost-model/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intel%2Fnpu-nn-cost-model/sbom","scorecard":{"id":490772,"data":{"date":"2025-08-11","repo":{"name":"github.com/intel/npu-nn-cost-model","commit":"338a471d7db8d71a9db1dd214d23f38e01a782f8"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":6.8,"checks":[{"name":"Maintained","score":10,"reason":"14 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 10","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Code-Review","score":7,"reason":"Found 8/11 approved changesets -- score normalized to 7","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge 
detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":10,"reason":"security policy file detected","details":["Info: security policy file detected: security.md:1","Info: Found linked content: security.md:1","Info: Found disclosure, vulnerability, and/or timelines in security policy: security.md:1","Info: Found text in security policy: security.md:1"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source 
repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Branch-Protection","score":5,"reason":"branch protection is not maximal on development and all release branches","details":["Info: 'allow deletion' disabled on branch 'main'","Info: 'force pushes' disabled on branch 'main'","Info: 'branch protection settings apply to administrators' is required to merge on branch 'main'","Warn: 'stale review dismissal' is disabled on branch 'main'","Warn: required approving review count is 1 on branch 'main'","Warn: codeowners review is not required on branch 'main'","Warn: 'last push approval' is disabled on branch 'main'","Warn: no status checks found to merge onto branch 'main'","Info: PRs are required in order to make changes on branch 'main'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection 
settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 27 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}}]},"last_synced_at":"2025-08-19T19:04:33.739Z","repository_id":143102265,"created_at":"2025-08-19T19:04:33.740Z","updated_at":"2025-08-19T19:04:33.740Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31305967,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-02T09:48:21.550Z","status":"ssl_error","status_checked_at":"2026-04-02T09:48:19.196Z","response_time":89,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cost-model","intel","machine-learning","vpu"],"created_at":"2026-02-11T12:13:57.700Z","updated_at":"2026-04-02T11:57:38.307Z","avatar_url":"https://github.com/intel.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# VPUNN cost model\n\nAn NN-based cost model for VPU devices. For additional information about model setup and training, please refer to [this paper](https://arxiv.org/abs/2205.04586).\n\nIf you find this work useful, please cite the following paper:\n\n```\n@article{DBLP:journals/corr/abs-2205-04586,\n  doi = {10.48550/ARXIV.2205.04586},\n  url = {https://arxiv.org/abs/2205.04586},\n  author = {Hunter, Ian Frederick Vigogne Goodbody and Palla, Alessandro and Nagy, Sebastian Eusebiu and Richmond, Richard and McAdoo, Kyle},\n  title = {Towards Optimal VPU Compiler Cost Modeling by using Neural Networks to Infer Hardware Performances},\n  publisher = {arXiv},\n  year = {2022},\n  copyright = {arXiv.org perpetual, non-exclusive license}\n}\n```\n\n## Setup\n\nThe GCC version should be \u003e 9. You can check your GCC version by running `gcc --version` and `g++ --version`.\n\nIf you do not set the CC and CXX environment variables, `which gcc` and `which g++` are used by default.\n\nCompile the library by typing `cmake -H. -Bbuild \u0026\u0026 cmake --build build`\n\nGeneral architecture diagram of target interactions, using command `cmake -DCBLAS_LIB=openblas -DVPUNN_BUILD_HTTP_CLIENT=ON -DVPUNN_BUILD_APPS=ON -DCMAKE_BUILD_TYPE=Coverage -H. 
-Bbuild` is presented here:\n![Architecture Diagram](docs/deps.png)\n\n@TODO: environment compatible with newer compiler versions (gcc\u003e=10, clang \u003e10 )  \n\n### Use Intel oneAPI MKL\n\nInstall the oneAPI Base Toolkit ([instructions](https://software.intel.com/content/www/us/en/develop/tools/oneapi/base-toolkit/download.html)). oneAPI is massive, so feel free to install only the Math Kernel Library.\n\nIf you have trouble with a proxy, please export `no_proxy=127.0.0.1` in order to bypass any no_proxy env vs `*.intel.com` URLs.\n\nTo enable MKL, you need to source the file `/opt/intel/oneapi/setvars.sh` to set the appropriate environment variables. Look [here](https://software.intel.com/content/www/us/en/develop/documentation/get-started-with-intel-oneapi-base-linux/top/run-a-sample-project-with-vscode.html) for how to get started with VS Code.\n\n### Select BLAS library\n\nYou can select which BLAS library to use (assuming you have MKL installed) and the threading mode by using the following cmake variables:\n\n- `-DCBLAS_LIB=\u003cvalue\u003e` (options: `mkl` for oneMKL and `openblas` for OpenBLAS)\n- `-DMKL_THREADING=\u003cvalue\u003e` (options: `tbb` for oneAPI Threading Building Blocks and `sequential` for no threading)\n\n## Using the cost model: C++\n\nUsing the VPUNN cost model in a cmake project is quite simple. 
An example of a CMakeLists.txt file is shown below:\n\n```cmake\ninclude_directories(${CMAKE_BINARY_DIR}/include)\ninclude_directories(${FLATBUFFERS_SRC_DIR}/include)\n\n...\n\ntarget_link_libraries(\u003cyour exe or lib\u003e inference)\n```\n\nThe following example code shows how to instantiate the cost model and how to run a simple query for a 3x3s1 convolution:\n\n```c++\n#include \"vpu_cost_model.h\"\n\nauto model = VPUNN::VPUCostModel(model_path);\n\nauto dpu_cycles = model.DPU({VPUNN::VPUDevice::VPU_2_7,\n                             VPUNN::Operation::CONVOLUTION,\n                             {VPUNN::VPUTensor(56, 56, 16, 1, VPUNN::DataType::UINT8)}, // input dimensions\n                             {VPUNN::VPUTensor(56, 56, 16, 1, VPUNN::DataType::UINT8)}, // output dimensions\n                             {3, 3}, //kernels\n                             {1, 1}, //strides\n                             {1, 1}, //padding\n                             VPUNN::ExecutionMode::CUBOID_16x16} // execution mode\n                            );\n```\n\nThe `example` folder contains a few examples of how to build and use the cost model in a C++ project. The following list of supported examples is a work in progress:\n\n- `workload_mode_selection`:\n  - Selecting the optimal MPE mode for a VPU_2_0 workload\n  - Choosing the optimal workload split strategy among multiple options\n\n## Install\n\nTo test the install option for `find_package(VPUNN)` functionality:\n\n```bash\ngit clone \u003crepo-url\u003e \u003crepo-name\u003e\ncd \u003crepo-name\u003e\n```\n```bash\ncmake -S . -B build\ncmake --build build -j\n```\n```bash\ncmake --install build --prefix \u003cinstall-path\u003e [--config \u003cconfig-type-for-multiconfig-systems (windows)\u003e]\n```\n```bash\ncd \u003cyour-custom-repo\u003e\ncmake -S . 
-B build -DCMAKE_PREFIX_PATH=\u003cinstall-path\u003e\ncmake --build build -j\n```\n\n## Using the cost model: Python\n\nTo install the Python package, run the following command from the root of the repository:\n\n```bash\npip install .\n```\n\nThis will use `scikit-build` to compile the C++ extension (bindings) and install the Python package.\n\nIt is recommended to do this in a Python virtual environment.\n\n## Developer guide\n\n### Git hooks\n\nAll developers should install the git hooks that are tracked in the `.githooks` directory. We use the pre-commit framework for hook management. The recommended way of installing it is using pip:\n\n```bash\npip install pre-commit\n```\n\nThe hooks can then be installed into your local clone using:\n\n```bash\npre-commit install --allow-missing-config\n```\n\n`--allow-missing-config` is an optional argument that keeps the hooks installed and functional even when you are on an older branch that does not have them tracked. A warning will be displayed in such cases when the hooks are run.\n\nIf you want to manually run all pre-commit hooks on a repository, run `pre-commit run --all-files`. To run individual hooks, use `pre-commit run \u003chook_id\u003e`.\n\nUninstalling the hooks can be done using:\n\n```bash\npre-commit uninstall\n```\n\n## Testing the library\n\n### Cost model test (C++)\n\nThe tests use the [Google test suite](https://github.com/google/googletest) for test automation.\nTo run the test suite: `ctest --test-dir build/tests/cpp/`\n\nExample: to run only the cost model integration test: `./tests/cpp/test_cost_model`\n\n### E2E Python test\n\n`pytest tests/python/test_e2e.py -v`\n\n### Code coverage\n\nTo generate a code coverage report, you need to enable it in CMake:\n\n```shell\ncmake -DCMAKE_BUILD_TYPE=Coverage  .. 
\u0026\u0026 make coverage -j\n```\n\nThis command generates a `coverage` folder inside the build folder with all the coverage information.\n\nDependencies:\n\n- Gcov-9 and Gcovr tools are needed in order to generate the report\n- Only GCC is supported (no WASM/Visual Studio)\n\n## Notice about configurations not covered by training, or with greater errors.\n### NPU2.0\nNot Available\n### NPU2.7\n- ISI=CLUSTERING + OWT=2: replaced at runtime with SOK. The runtime should be the same; no input halo is used.\n- Elementwise + ISI=SOK: replaced at runtime with clustering + owt=1. The time is slightly underestimated, but it's the best approximation available.\n- CM_CONV (compress convolution) + InputChannels=1\n- SOH (HALO) split with Kernel=1: replaced at runtime with Clustering. This case has probably not been part of training (it doesn't make sense to have kernel=1 and input halo), and the NN predictions are problematic.\n- SOH Halo split, at least when H is small and K is small, produces much bigger results than SOH Overlapped. This is not realistic and might be an NN limitation. See VPULayerCostModelTest.Unet_perf_SOH_SOK_after_SOHO\n- Output write tiles is limited to 2, e.g. also when used as a mock for NPU4.0, where more than 2 tiles are present and used for the split.\n\n- NPU2.7 splits by H with Halo were trained into the NN using the memory tensor instead of the general rule for the compute tensor (the memory tensor is, in general, smaller by half a kernel). Calling the NN with the compute tensor introduces errors by reporting smaller values. To get corrected values (closer to ground truth) when generating the descriptor for NNs with interface 11 and the SOH ISI strategy, we use not the input tensor but a computed memory input tensor that mimics the one used in training.\n\n### NPU4.0 (in development)\nReusing: when using the 2.7 trained version as a mock, please read the NPU2.7 section above.\n  - DW_CONV (depthwise convolution) with kernel 3x3 is optimized in NPU4.0, but not in NPU2.7. 
The NN-reported runtime is adjusted by a factor that depends on datatype, channels, and kernel size.\nTrained NN for 4.0: \n  - WIP\n\n### Known problems:\n- NPU2.7: the NN was not trained to discriminate the sporadic high runtime for swizzling. EISXW-98656 is not solved (elementwise add with big profiled CLUSTERING, but small SOH). Test: RuntimeELT_CONV_SOH_SOK_EISXW_98656. \nElementwise accepts (at NN run) swizzling ON or OFF, but it has to be the same for all in/out/wts: all 0 (OFF) or all 5 (ON). Other combinations were not trained. *To consider:* training the NN with swizzling combinations (profiling shows the runtime is different).\n\n\n\n## SHAVE operators available\n\nSHAVE interface version 1 (the old one) will be deleted in the near future; do not use it.\nThe SHAVE v2 interface is active. \n\nDetails of any operator can be obtained by calling the `ShaveOpExecutor::toString()` method. \n\nFor the most up-to-date list of operators and their details, see also the unit tests: TestSHAVE.SHAVE_v2_ListOfOperators, TestSHAVE.SHAVE_v2_ListOfOperatorsDetails_27,... .\n\nFor information about the profiled operators and extra parameters, you can consult this [document](src/shave/Readme.md#shave-current-operators).\n\n## Cost providers\n\nThe cost model is designed to be extensible. The cost providers are the classes that implement the cost model for a specific device. The cost providers are selected at runtime based on the device type. The following cost providers are available:\n- NN-based cost provider - a learned performance model.\n- Theoretical cost provider - a simple mathematical model.\n- \"Oracle\" cost provider - a LUT of measured performance for specific workloads.\n- Profiled cost provider - an HTTP service that can be queried to get the measured performance of a specific workload.\n    - Currently it supports only DPU costs and it can be configured using the following env. 
variables\n        - `ENABLE_VPUNN_PROFILING_SERVICE` -- `TRUE` to enable the profiling service\n        - `VPUNN_PROFILING_SERVICE_BACKEND` -- `silicon` to use the RVP for profiling, `vpuem` to use VPUEM as a cost provider.\n        - `VPUNN_PROFILING_SERVICE_HOST` -- address of the profiling service host, default is `irlccggpu04.ir.intel.com`\n        - `VPUNN_PROFILING_SERVICE_PORT` -- port of the profiling service, default is `5000`\n\nTo see a list of all queried workloads and which cost provider was used for each, set the environment variable `ENABLE_VPUNN_DATA_SERIALIZATION` to `TRUE`.\nThis will generate a couple of `csv` files in the directory where vpunn is used.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintel%2Fnpu-nn-cost-model","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fintel%2Fnpu-nn-cost-model","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintel%2Fnpu-nn-cost-model/lists"}