{"id":13514897,"url":"https://github.com/KomputeProject/kompute","last_synced_at":"2025-03-31T04:36:09.637Z","repository":{"id":37392677,"uuid":"283406173","full_name":"KomputeProject/kompute","owner":"KomputeProject","description":"General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA \u0026 friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing usecases. Backed by the Linux Foundation.","archived":false,"fork":false,"pushed_at":"2024-11-16T07:21:27.000Z","size":26533,"stargazers_count":2023,"open_issues_count":79,"forks_count":156,"subscribers_count":33,"default_branch":"master","last_synced_at":"2024-12-05T09:07:30.804Z","etag":null,"topics":["cpp","deep-learning","deep-learning-gpu","gpgpu","gpu-computing","machine-learning","machine-learning-gpu","python","vulkan","vulkan-compute","vulkan-compute-example","vulkan-compute-framework","vulkan-compute-tutorial","vulkan-demos","vulkan-example","vulkan-tutorial"],"latest_commit_sha":null,"homepage":"http://kompute.cc/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KomputeProject.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":"GOVERNANCE.md","roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-07-29T05:23:33.000Z","updated_at":"2024-12-04T14:59:17.000Z","dependencies_parsed_at":"2022-07-14T23:46:01.287Z","dependency_job_id":"b217fc85-e81b-4697-a5ae-1daa554bdf1d","html_url":"https://github.com/KomputeProject/kompute","commit_stats":null,"previous_na
mes":["axsaucedo/vulkan-kompute","ethicalml/vulkan-kompute"],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KomputeProject%2Fkompute","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KomputeProject%2Fkompute/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KomputeProject%2Fkompute/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KomputeProject%2Fkompute/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KomputeProject","download_url":"https://codeload.github.com/KomputeProject/kompute/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246418658,"owners_count":20773934,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","deep-learning","deep-learning-gpu","gpgpu","gpu-computing","machine-learning","machine-learning-gpu","python","vulkan","vulkan-compute","vulkan-compute-example","vulkan-compute-framework","vulkan-compute-tutorial","vulkan-demos","vulkan-example","vulkan-tutorial"],"created_at":"2024-08-01T05:01:03.356Z","updated_at":"2025-03-31T04:36:04.580Z","avatar_url":"https://github.com/KomputeProject.png","language":"C++","readme":"\n![GitHub](https://img.shields.io/badge/Version-0.7.0-green.svg)\n![GitHub](https://img.shields.io/badge/C++-14—20-purple.svg)\n![GitHub](https://img.shields.io/badge/Build-cmake-red.svg)\n![GitHub](https://img.shields.io/badge/Python-3.7—3.9-blue.svg)\n![GitHub](https://img.
shields.io/badge/License-Apache-black.svg)\n[![CII Best Practices](https://bestpractices.coreinfrastructure.org/projects/4834/badge)](https://bestpractices.coreinfrastructure.org/projects/4834)\n\n\u003ctable\u003e\n\u003ctr\u003e\n\n\u003ctd width=\"20%\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/KomputeProject/kompute/master/docs/images/kompute.jpg\"\u003e\n\u003c/td\u003e\n\n\u003ctd\u003e\n\n\u003ch1\u003eKompute\u003c/h1\u003e\n\u003ch3\u003eThe general purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA \u0026 friends)\u003c/h3\u003e\n\n\u003c/td\u003e\n\n\u003c/tr\u003e\n\u003c/table\u003e\n\n\u003ch4\u003eBlazing fast, mobile-enabled, asynchronous, and optimized for advanced GPU acceleration usecases.\u003c/h4\u003e\n\n💬 [Join the Discord \u0026 Community Calls](https://kompute.cc/overview/community.html) 🔋 [Documentation](https://kompute.cc) 💻 [Blog Post](https://medium.com/@AxSaucedo/machine-learning-and-data-processing-in-the-gpu-with-vulkan-kompute-c9350e5e5d3a) ⌨ [Examples](#more-examples) 💾\n\n\u003chr\u003e\n\n##### Kompute is backed by the Linux Foundation as a \u003ca href=\"https://lfaidata.foundation/blog/2021/08/26/kompute-joins-lf-ai-data-as-new-sandbox-project/\"\u003ehosted project\u003c/a\u003e by the LF AI \u0026 Data Foundation.\n\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd\u003e\n\u003ca href=\"https://www.linuxfoundation.org/projects/\"\u003e\n\u003cimg src=\"https://upload.wikimedia.org/wikipedia/commons/b/b5/Linux_Foundation_logo.png\"\u003e\n\u003c/a\u003e\n\u003c/td\u003e\n\u003ctd\u003e\n\u003ca href=\"https://lfaidata.foundation/projects/\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/lfai/artwork/main/lfaidata-assets/lfaidata/horizontal/color/lfaidata-horizontal-color.png\"\u003e\n\u003c/a\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n\n## Principles \u0026 Features\n\n* [Flexible Python module](#your-first-kompute-python) with [C++ 
SDK](#your-first-kompute-c) for optimizations\n* [Asynchronous \u0026 parallel processing](#asynchronous-and-parallel-operations) support through GPU family queues\n* [Mobile enabled](#mobile-enabled) with examples via Android NDK across several architectures\n* BYOV: [Bring-your-own-Vulkan design](#motivations) to play nice with existing Vulkan applications\n* Explicit relationships for GPU and host [memory ownership and memory management](https://kompute.cc/overview/memory-management.html)\n* Robust codebase with [90% unit test code coverage](https://kompute.cc/codecov/)\n* Advanced use-cases on [machine learning 🤖](https://towardsdatascience.com/machine-learning-and-data-processing-in-the-gpu-with-vulkan-kompute-c9350e5e5d3a), [mobile development 📱](https://towardsdatascience.com/gpu-accelerated-machine-learning-in-your-mobile-applications-using-the-android-ndk-vulkan-kompute-1e9da37b7617) and [game development 🎮](https://towardsdatascience.com/supercharging-game-development-with-gpu-accelerated-ml-using-vulkan-kompute-the-godot-game-engine-4e75a84ea9f0).\n* Active community with [monthly calls, discord chat and more](https://kompute.cc/overview/community.html)\n\n![](https://raw.githubusercontent.com/KomputeProject/kompute/master/docs/images/komputer-logos.gif)\n\n## Projects using Kompute ❤️  🤖\n\n* [GPT4ALL](https://github.com/nomic-ai/gpt4all) ![](https://img.shields.io/github/stars/nomic-ai/gpt4all.svg?style=social) - An ecosystem of open-source on-edge large language models that run locally on your CPU and nearly any GPU.\n* [llama.cpp](https://github.com/ggerganov/llama.cpp) ![](https://img.shields.io/github/stars/ggerganov/llama.cpp.svg?style=social) - Port of Facebook's LLaMA model in C/C++.\n* [tpoisonooo/how-to-optimize-gemm](https://github.com/tpoisonooo/how-to-optimize-gemm) ![](https://img.shields.io/github/stars/tpoisonooo/how-to-optimize-gemm.svg?style=social) - row-major matmul optimization.\n* [vkJAX](https://github.com/alexander-g/vkJAX) 
![](https://img.shields.io/github/stars/alexander-g/vkJAX.svg?style=social) - JAX interpreter for Vulkan.\n\n## Getting Started\n\nBelow you can find a GPU multiplication example using the C++ and Python Kompute interfaces.\n\nYou can [join the Discord](https://discord.gg/MaH5Jv5zwv) for questions / discussion, open a [GitHub issue](https://github.com/KomputeProject/kompute/issues/new), or read [the documentation](https://kompute.cc/).\n\n### Your First Kompute (C++)\n\nThe C++ interface provides low level access to the native components of Kompute, enabling [advanced optimizations](https://kompute.cc/overview/async-parallel.html) as well as [extension of components](https://kompute.cc/overview/reference.html).\n\n```c++\n\nvoid kompute(const std::string\u0026 shader) {\n\n    // 1. Create Kompute Manager with default settings (device 0, first queue and no extensions)\n    kp::Manager mgr; \n\n    // 2. Create and initialise Kompute Tensors through manager\n\n    // Default tensor constructor simplifies creation of float values\n    auto tensorInA = mgr.tensor({ 2., 2., 2. });\n    auto tensorInB = mgr.tensor({ 1., 2., 3. });\n    // Explicit type constructor supports uint32, int32, double, float and bool\n    auto tensorOutA = mgr.tensorT\u003cuint32_t\u003e({ 0, 0, 0 });\n    auto tensorOutB = mgr.tensorT\u003cuint32_t\u003e({ 0, 0, 0 });\n\n    std::vector\u003cstd::shared_ptr\u003ckp::Memory\u003e\u003e params = {tensorInA, tensorInB, tensorOutA, tensorOutB};\n\n    // 3. 
Create algorithm based on shader (supports buffers \u0026 push/spec constants)\n    kp::Workgroup workgroup({3, 1, 1});\n    std::vector\u003cfloat\u003e specConsts({ 2 });\n    std::vector\u003cfloat\u003e pushConstsA({ 2.0 });\n    std::vector\u003cfloat\u003e pushConstsB({ 3.0 });\n\n    auto algorithm = mgr.algorithm(params,\n                                   // See documentation shader section for compileSource\n                                   compileSource(shader),\n                                   workgroup,\n                                   specConsts,\n                                   pushConstsA);\n\n    // 4. Run operation synchronously using sequence\n    mgr.sequence()\n        -\u003erecord\u003ckp::OpSyncDevice\u003e(params)\n        -\u003erecord\u003ckp::OpAlgoDispatch\u003e(algorithm) // Binds default push consts\n        -\u003eeval() // Evaluates the two recorded operations\n        -\u003erecord\u003ckp::OpAlgoDispatch\u003e(algorithm, pushConstsB) // Overrides push consts\n        -\u003eeval(); // Evaluates only last recorded operation\n\n    // 5. Sync results from the GPU asynchronously\n    auto sq = mgr.sequence();\n    sq-\u003eevalAsync\u003ckp::OpSyncLocal\u003e(params);\n\n    // ... Do other work asynchronously whilst GPU finishes\n\n    sq-\u003eevalAwait();\n\n    // Prints the first output which is: { 4, 8, 12 }\n    for (const float\u0026 elem : tensorOutA-\u003evector()) std::cout \u003c\u003c elem \u003c\u003c \"  \";\n    // Prints the second output which is: { 10, 10, 10 }\n    for (const float\u0026 elem : tensorOutB-\u003evector()) std::cout \u003c\u003c elem \u003c\u003c \"  \";\n\n} // Manages / releases all CPU and GPU memory resources\n\nint main() {\n\n    // Define a raw string shader (or use the Kompute tools to compile to SPIRV / C++ header\n    // files). 
This shader shows some of the main components including constants, buffers, etc\n    std::string shader = (R\"(\n        #version 450\n\n        layout (local_size_x = 1) in;\n\n        // The input tensors bind index is relative to index in parameter passed\n        layout(set = 0, binding = 0) buffer buf_in_a { float in_a[]; };\n        layout(set = 0, binding = 1) buffer buf_in_b { float in_b[]; };\n        layout(set = 0, binding = 2) buffer buf_out_a { uint out_a[]; };\n        layout(set = 0, binding = 3) buffer buf_out_b { uint out_b[]; };\n\n        // Kompute supports push constants updated on dispatch\n        layout(push_constant) uniform PushConstants {\n            float val;\n        } push_const;\n\n        // Kompute also supports spec constants on initialization\n        layout(constant_id = 0) const float const_one = 0;\n\n        void main() {\n            uint index = gl_GlobalInvocationID.x;\n            out_a[index] += uint( in_a[index] * in_b[index] );\n            out_b[index] += uint( const_one * push_const.val );\n        }\n    )\");\n\n    // Run the function declared above with our raw string shader\n    kompute(shader);\n}\n\n```\n\n### Your First Kompute (Python)\n\nThe [Python package](https://kompute.cc/overview/python-package.html) provides a [high level interactive interface](https://kompute.cc/overview/python-reference.html) that enables experimentation whilst ensuring high performance and fast development workflows.\n\n```python\n\nimport kp\nimport numpy as np\n\nfrom .utils import compile_source # using util function from python/test/utils\n\ndef kompute(shader):\n    # 1. Create Kompute Manager with default settings (device 0, first queue and no extensions)\n    mgr = kp.Manager()\n\n    # 2. 
Create and initialise Kompute Tensors through manager\n\n    # Default tensor constructor simplifies creation of float values\n    tensor_in_a = mgr.tensor([2, 2, 2])\n    tensor_in_b = mgr.tensor([1, 2, 3])\n    # Explicit type constructor supports uint32, int32, double, float and bool\n    tensor_out_a = mgr.tensor_t(np.array([0, 0, 0], dtype=np.uint32))\n    tensor_out_b = mgr.tensor_t(np.array([0, 0, 0], dtype=np.uint32))\n\n    params = [tensor_in_a, tensor_in_b, tensor_out_a, tensor_out_b]\n\n    # 3. Create algorithm based on shader (supports buffers \u0026 push/spec constants)\n    workgroup = (3, 1, 1)\n    spec_consts = [2]\n    push_consts_a = [2]\n    push_consts_b = [3]\n\n    # See documentation shader section for compile_source\n    spirv = compile_source(shader)\n\n    algo = mgr.algorithm(params, spirv, workgroup, spec_consts, push_consts_a)\n\n    # 4. Run operation synchronously using sequence\n    (mgr.sequence()\n        .record(kp.OpTensorSyncDevice(params))\n        .record(kp.OpAlgoDispatch(algo)) # Binds default push consts provided\n        .eval() # evaluates the two recorded ops\n        .record(kp.OpAlgoDispatch(algo, push_consts_b)) # Overrides push consts\n        .eval()) # evaluates only the last recorded op\n\n    # 5. Sync results from the GPU asynchronously\n    sq = mgr.sequence()\n    sq.eval_async(kp.OpTensorSyncLocal(params))\n\n    # ... Do other work asynchronously whilst GPU finishes\n\n    sq.eval_await()\n\n    # Prints the first output which is: { 4, 8, 12 }\n    print(tensor_out_a)\n    # Prints the second output which is: { 10, 10, 10 }\n    print(tensor_out_b)\n\nif __name__ == \"__main__\":\n\n    # Define a raw string shader (or use the Kompute tools to compile to SPIRV / C++ header\n    # files). 
This shader shows some of the main components including constants, buffers, etc\n    shader = \"\"\"\n        #version 450\n\n        layout (local_size_x = 1) in;\n\n        // The input tensors bind index is relative to index in parameter passed\n        layout(set = 0, binding = 0) buffer buf_in_a { float in_a[]; };\n        layout(set = 0, binding = 1) buffer buf_in_b { float in_b[]; };\n        layout(set = 0, binding = 2) buffer buf_out_a { uint out_a[]; };\n        layout(set = 0, binding = 3) buffer buf_out_b { uint out_b[]; };\n\n        // Kompute supports push constants updated on dispatch\n        layout(push_constant) uniform PushConstants {\n            float val;\n        } push_const;\n\n        // Kompute also supports spec constants on initialization\n        layout(constant_id = 0) const float const_one = 0;\n\n        void main() {\n            uint index = gl_GlobalInvocationID.x;\n            out_a[index] += uint( in_a[index] * in_b[index] );\n            out_b[index] += uint( const_one * push_const.val );\n        }\n    \"\"\"\n\n    kompute(shader)\n\n```\n\n### Interactive Notebooks \u0026 Hands on Videos\n\nYou can try out the interactive Colab Notebooks, which allow you to use a free GPU. 
The available examples are the Python and C++ examples below:\n\n\u003ctable\u003e\n\u003ctr\u003e\n\n\u003ctd width=\"50%\"\u003e\n\u003ch5\u003eTry the interactive \u003ca href=\"https://colab.research.google.com/drive/1l3hNSq2AcJ5j2E3YIw__jKy5n6M615GP?usp=sharing\"\u003eC++ Colab\u003c/a\u003e from \u003ca href=\"https://towardsdatascience.com/machine-learning-and-data-processing-in-the-gpu-with-vulkan-kompute-c9350e5e5d3a\"\u003eBlog Post\u003c/a\u003e\u003c/h5\u003e\n\u003c/td\u003e\n\n\u003ctd\u003e\n\u003ch5\u003eTry the interactive \u003ca href=\"https://colab.research.google.com/drive/15uQ7qMZuOyk8JcXF-3SB2R5yNFW21I4P\"\u003ePython Colab\u003c/a\u003e from \u003ca href=\"https://towardsdatascience.com/beyond-cuda-gpu-accelerated-python-for-machine-learning-in-cross-vendor-graphics-cards-made-simple-6cc828a45cc3\"\u003eBlog Post\u003c/a\u003e\u003c/h5\u003e\n\u003c/td\u003e\n\n\u003c/tr\u003e\n\u003ctr\u003e\n\n\u003ctd width=\"50%\"\u003e\n\u003ca href=\"https://colab.research.google.com/drive/1l3hNSq2AcJ5j2E3YIw__jKy5n6M615GP?authuser=1#scrollTo=1BipBsO-fQRD\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/KomputeProject/kompute/master/docs/images/binder-cpp.jpg\"\u003e\n\u003c/a\u003e\n\u003c/td\u003e\n\n\u003ctd\u003e\n\u003ca href=\"https://colab.research.google.com/drive/15uQ7qMZuOyk8JcXF-3SB2R5yNFW21I4P\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/KomputeProject/kompute/master/docs/images/binder-python.jpg\"\u003e\n\u003c/a\u003e\n\u003c/td\u003e\n\n\u003c/tr\u003e\n\u003c/table\u003e\n\n\nYou can also check out the two following talks presented at the FOSDEM 2021 conference. 
\n\nBoth videos have timestamps which will allow you to skip to the most relevant section for you - the intro \u0026 motivations for both is almost the same so you can skip to the more specific content.\n\n\u003ctable\u003e\n\u003ctr\u003e\n\n\u003ctd width=\"50%\"\u003e\n\u003ch5\u003eWatch the video for \u003ca href=\"https://www.youtube.com/watch?v=Xz4fiQNmGSA\"\u003eC++ Enthusiasts\u003c/a\u003e \u003c/h5\u003e\n\u003c/td\u003e\n\n\u003ctd\u003e\n\u003ch5\u003eWatch the video for \u003ca href=\"https://www.youtube.com/watch?v=AJRyZ09IUdg\"\u003ePython \u0026 Machine Learning\u003c/a\u003e Enthusiasts\u003c/h5\u003e\n\u003c/td\u003e\n\n\u003c/tr\u003e\n\u003ctr\u003e\n\n\u003ctd width=\"50%\"\u003e\n\u003ca href=\"https://www.youtube.com/watch?v=Xz4fiQNmGSA\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/KomputeProject/kompute/master/docs/images/kompute-cpp-video.png\"\u003e\n\u003c/a\u003e\n\u003c/td\u003e\n\n\u003ctd\u003e\n\u003ca href=\"https://www.youtube.com/watch?v=AJRyZ09IUdg\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/KomputeProject/kompute/master/docs/images/kompute-python-video.png\"\u003e\n\u003c/a\u003e\n\u003c/td\u003e\n\n\u003c/tr\u003e\n\u003c/table\u003e\n\n\n## Architectural Overview\n\nThe core architecture of Kompute includes the following:\n* [Kompute Manager](https://kompute.cc/overview/reference.html#manager) - Base orchestrator which creates and manages device and child components\n* [Kompute Sequence](https://kompute.cc/overview/reference.html#sequence) - Container of operations that can be sent to GPU as batch\n* [Kompute Operation (Base)](https://kompute.cc/overview/reference.html#algorithm) - Base class from which all operations inherit\n* [Kompute Tensor](https://kompute.cc/overview/reference.html#tensor) - Tensor structured data used in GPU operations\n* [Kompute Algorithm](https://kompute.cc/overview/reference.html#algorithm) - Abstraction for (shader) logic executed in the GPU\n\nTo see a full breakdown 
you can read further in the [C++ Class Reference](https://kompute.cc/overview/reference.html).\n\n\u003ctable\u003e\n\u003cth\u003e\nFull Architecture\n\u003c/th\u003e\n\u003cth\u003e\nSimplified Kompute Components\n\u003c/th\u003e\n\u003ctr\u003e\n\u003ctd width=30%\u003e\n\n\n\u003cimg width=\"100%\" src=\"https://raw.githubusercontent.com/KomputeProject/kompute/master/docs/images/kompute-vulkan-architecture.jpg\"\u003e\n\n\u003cbr\u003e\n\u003cbr\u003e\n(very tiny, check the \u003ca href=\"https://ethicalml.github.io/vulkan-kompute/overview/reference.html\"\u003efull reference diagram in docs for details\u003c/a\u003e)\n\u003cbr\u003e\n\u003cbr\u003e\n\n\u003cimg width=\"100%\" src=\"https://raw.githubusercontent.com/KomputeProject/kompute/master/docs/images/suspicious.jfif\"\u003e\n\n\u003c/td\u003e\n\u003ctd\u003e\n\u003cimg width=\"100%\" src=\"https://raw.githubusercontent.com/KomputeProject/kompute/master/docs/images/kompute-architecture.jpg\"\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n\n## Asynchronous and Parallel Operations\n\nKompute provides the flexibility to run operations in an asynchronous way through vk::Fences. Furthermore, Kompute enables explicit allocation of queues, which allows for parallel execution of operations across queue families.\n\nThe image below provides an intuition on how Kompute Sequences can be allocated to different queues to enable parallel execution based on hardware. You can see the [hands on example](https://kompute.cc/overview/advanced-examples.html#parallel-operations), as well as the [detailed documentation page](https://kompute.cc/overview/async-parallel.html) describing how it would work using an NVIDIA 1650 as an example. \n\n![](https://raw.githubusercontent.com/KomputeProject/kompute/master/docs/images/queue-allocation.jpg)\n\n## Mobile Enabled\n\nKompute has been optimized to work in mobile environments. 
The [build system](#build-overview) enables for dynamic loading of the Vulkan shared library for Android environments, together with a working [Android NDK wrapper](https://github.com/KomputeProject/kompute/tree/master/vk_ndk_wrapper_include) for the CPP headers.\n\n\u003ctable\u003e\n\u003ctr\u003e\n\n\u003ctd width=\"70%\"\u003e\n\u003cp\u003e\nFor a full deep dive you can read the blog post \"\u003ca href=\"https://towardsdatascience.com/gpu-accelerated-machine-learning-in-your-mobile-applications-using-the-android-ndk-vulkan-kompute-1e9da37b7617\"\u003eSupercharging your Mobile Apps with On-Device GPU Accelerated Machine Learning\u003c/a\u003e\". \n\nYou can also access the \u003ca href=\"https://github.com/KomputeProject/kompute/tree/v0.4.0/examples/android/android-simple\"\u003eend-to-end example code\u003c/a\u003e in the repository, which can be run using android studio.\n\n\u003c/p\u003e\n\n\n\u003cimg src=\"https://raw.githubusercontent.com/KomputeProject/kompute/android-example/docs/images/android-editor.jpg\"\u003e\n\n\u003c/td\u003e\n\n\n\u003ctd width=\"30%\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/KomputeProject/kompute/android-example/docs/images/android-kompute.jpg\"\u003e\n\u003c/td\u003e\n\n\u003c/tr\u003e\n\u003c/table\u003e\n\n## More examples\n\n### Simple examples\n\n* [Simple multiplication example](https://kompute.cc/overview/advanced-examples.html#simple-shader-example)\n* [Record batch commands with a Kompute Sequence](https://kompute.cc/overview/advanced-examples.html#record-batch-commands)\n* [Run Asynchronous Operations](https://kompute.cc/overview/advanced-examples.html#asynchronous-operations)\n* [Run Parallel Operations Across Multiple GPU Queues](https://kompute.cc/overview/advanced-examples.html#parallel-operations)\n* [Create your custom Kompute Operations](https://kompute.cc/overview/advanced-examples.html#your-custom-kompute-operation)\n* [Implementing logistic regression from 
scratch](https://kompute.cc/overview/advanced-examples.html#logistic-regression-example)\n\n### End-to-end examples\n\n* [Machine Learning Logistic Regression Implementation](https://towardsdatascience.com/machine-learning-and-data-processing-in-the-gpu-with-vulkan-kompute-c9350e5e5d3a)\n* [Parallelizing GPU-intensive Workloads via Multi-Queue Operations](https://towardsdatascience.com/parallelizing-heavy-gpu-workloads-via-multi-queue-operations-50a38b15a1dc)\n* [Android NDK Mobile Kompute ML Application](https://towardsdatascience.com/gpu-accelerated-machine-learning-in-your-mobile-applications-using-the-android-ndk-vulkan-kompute-1e9da37b7617)\n* [Game Development Kompute ML in Godot Engine](https://towardsdatascience.com/supercharging-game-development-with-gpu-accelerated-ml-using-vulkan-kompute-the-godot-game-engine-4e75a84ea9f0)\n\n## Python Package\n\nBesides the C++ core SDK you can also use the Python package of Kompute, which exposes the same core functionality and supports interoperability with Python objects like lists, NumPy arrays, etc.\n\nThe only dependencies are Python 3.5+ and CMake 3.4.1+. 
You can install Kompute from the [Python PyPI package](https://pypi.org/project/kp/) using the following command.\n\n```\npip install kp\n```\n\nYou can also install from the master branch using:\n\n```\npip install git+git://github.com/KomputeProject/kompute.git@master\n```\n\nFor further details you can read the [Python Package documentation](https://kompute.cc/overview/python-package.html) or the [Python Class Reference documentation](https://kompute.cc/overview/python-reference.html).\n\n## C++ Build Overview\n\nThe build system provided uses `cmake`, which allows for cross-platform builds.\n\nThe top level `Makefile` provides a set of optimized configurations for development as well as the docker image build, but you can start a build with the following command:\n\n```\n   cmake -Bbuild\n```\n\nYou can also add Kompute to your repo with `add_subdirectory` - the [Android example CMakeLists.txt file](https://github.com/KomputeProject/kompute/blob/7c8c0eeba2cdc098349fcd999102bb2cca1bf711/examples/android/android-simple/app/src/main/cpp/CMakeLists.txt#L3) shows how this would be done.\n\nFor a more advanced overview of the build configuration check out the [Build System Deep Dive](https://kompute.cc/overview/build-system.html) documentation.\n\n## Kompute Development\n\nWe appreciate PRs and Issues. 
If you want to contribute, try checking the \"Good first issue\" tag, but even using Kompute and reporting issues is a great contribution!\n\n### Contributing\n\n#### Dev Dependencies\n\n* Testing\n    + GTest\n* Documentation\n    + Doxygen (with Dot)\n    + Sphinx\n\n#### Development\n\n* Follows Mozilla C++ Style Guide https://www-archive.mozilla.org/hacking/mozilla-style-guide.html\n    + Uses a post-commit hook to run the linter; you can set it up so it runs before commit\n    + All dependencies are defined in vcpkg.json \n* Uses cmake as the build system, and provides a top level makefile with recommended commands\n* Uses xxd (or the xxd.exe Windows 64-bit port) to convert shader spirv to header files\n* Uses doxygen and sphinx for documentation and autodocs\n* Uses vcpkg for finding the dependencies; it's the recommended setup to retrieve the libraries\n\nIf you want to run with debug layers you can add them with the `KOMPUTE_ENV_DEBUG_LAYERS` parameter as:\n\n```\nexport KOMPUTE_ENV_DEBUG_LAYERS=\"VK_LAYER_LUNARG_api_dump\"\n```\n\n##### Updating documentation\n\nTo update the documentation you will need to:\n* Run the gendoxygen target in the build system\n* Run the gensphynx target in the build system\n* Push to GitHub Pages with `make push_docs_to_ghpages`\n\n##### Running tests\n\nRunning the unit tests has been significantly simplified for contributors.\n\nThe tests run on CPU, and can be triggered using the ACT command line interface (https://github.com/nektos/act) - once you install the command line (and start the Docker daemon) you just have to type:\n\n```\n$ act\n\n[Python Tests/python-tests] 🚀  Start image=axsauze/kompute-builder:0.2\n[C++ Tests/cpp-tests      ] 🚀  Start image=axsauze/kompute-builder:0.2\n[C++ Tests/cpp-tests      ]   🐳  docker run image=axsauze/kompute-builder:0.2 entrypoint=[\"/usr/bin/tail\" \"-f\" \"/dev/null\"] cmd=[]\n[Python Tests/python-tests]   🐳  docker run image=axsauze/kompute-builder:0.2 entrypoint=[\"/usr/bin/tail\" 
\"-f\" \"/dev/null\"] cmd=[]\n...\n```\n\nThe repository contains unit tests for the C++ and Python code, and can be found under the `test/` and `python/test` folder.\n\nThe tests are currently run through the CI using Github Actions. It uses the images found in `docker-builders/`.\n\nIn order to minimise hardware requirements the tests can run without a GPU, directly in the CPU using [Swiftshader](https://github.com/google/swiftshader).\n\nFor more information on how the CI and tests are setup, you can go to the [CI, Docker and Tests Section](https://kompute.cc/overview/ci-tests.html) in the documentation.\n\n## Motivations\n\nThis project started after seeing that a lot of new and renowned ML \u0026 DL projects like Pytorch, Tensorflow, Alibaba DNN, Tencent NCNN - among others - have either integrated or are looking to integrate the Vulkan SDK to add mobile (and cross-vendor) GPU support.\n\nThe Vulkan SDK offers a great low level interface that enables for highly specialized optimizations - however it comes at a cost of highly verbose code which requires 500-2000 lines of code to even begin writing application code. This has resulted in each of these projects having to implement the same baseline to abstract the non-compute related features of the Vulkan SDK. This large amount of non-standardised boiler-plate can result in limited knowledge transfer, higher chance of unique framework implementation bugs being introduced, etc.\n\nWe are currently developing Kompute not to hide the Vulkan SDK interface (as it's incredibly well designed) but to augment it with a direct focus on the Vulkan SDK's GPU computing capabilities. 
[This article](https://towardsdatascience.com/machine-learning-and-data-processing-in-the-gpu-with-vulkan-kompute-c9350e5e5d3a) provides a high level overview of the motivations of Kompute, together with a set of hands on examples that introduce both GPU computing and the core Kompute architecture.\n","funding_links":[],"categories":["C++","Software","Computation and Communication Optimisation","GPU Utilities"],"sub_categories":["Trends"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FKomputeProject%2Fkompute","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FKomputeProject%2Fkompute","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FKomputeProject%2Fkompute/lists"}