{"id":13809652,"url":"https://github.com/charles-r-earp/krnl","last_synced_at":"2025-10-05T07:41:34.622Z","repository":{"id":152486606,"uuid":"524713091","full_name":"charles-r-earp/krnl","owner":"charles-r-earp","description":"Safe, portable, high performance compute (GPGPU) kernels.","archived":false,"fork":false,"pushed_at":"2025-09-23T20:47:47.000Z","size":18584,"stargazers_count":237,"open_issues_count":3,"forks_count":15,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-09-23T22:26:48.753Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/charles-r-earp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-08-14T15:45:16.000Z","updated_at":"2025-09-23T20:47:46.000Z","dependencies_parsed_at":"2024-02-27T04:33:00.475Z","dependency_job_id":"02291ed0-2ecf-43d9-a28b-357001221083","html_url":"https://github.com/charles-r-earp/krnl","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":"charles-r-earp/rust-template","purl":"pkg:github/charles-r-earp/krnl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/charles-r-earp%2Fkrnl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/charles-r-earp%2Fkrnl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/charles-r-earp%2Fkrnl/releases","manifests_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/charles-r-earp%2Fkrnl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/charles-r-earp","download_url":"https://codeload.github.com/charles-r-earp/krnl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/charles-r-earp%2Fkrnl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278425461,"owners_count":25984685,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-04T02:00:33.633Z","updated_at":"2025-10-05T07:41:34.605Z","avatar_url":"https://github.com/charles-r-earp.png","language":"Rust","funding_links":[],"categories":["Rust","Frameworks","GPU Programming","GPU Computing"],"sub_categories":[],"readme":"[![DocsBadge]][Docs]\n[![build](https://github.com/charles-r-earp/krnl/actions/workflows/ci.yaml/badge.svg)](https://github.com/charles-r-earp/krnl/actions/workflows/ci.yaml)\n\n[Docs]: https://docs.rs/krnl\n[DocsBadge]: https://docs.rs/krnl/badge.svg\n\n# krnl\n\nSafe, portable, high performance compute (GPGPU) kernels.\n\nDeveloped for [autograph](https://github.com/charles-r-earp/autograph).\n\n- Similar functionality to CUDA and OpenCL.\n- Supports GPUs and other Vulkan 1.2 capable devices.\n- macOS / iOS supported 
via [MoltenVK](https://github.com/KhronosGroup/MoltenVK).\n- Kernels are written inline, entirely in Rust.\n  - Simple iterator patterns can be implemented without unsafe.\n  - Supports inline [SPIR-V](https://www.khronos.org/spir) assembly.\n  - DebugPrintf integration, generates backtraces for panics.\n- Buffers on the host can be accessed natively as Vecs and slices.\n\n# krnlc\n\nKernel compiler for krnl.\n\n- Built on [spirv-builder](https://github.com/EmbarkStudios/rust-gpu/tree/main/crates/spirv-builder).\n- Supports dependencies defined in Cargo.toml.\n- Uses [spirv-tools](https://github.com/EmbarkStudios/spirv-tools-rs) to validate and optimize.\n- Compiles to \"krnl-cache.rs\", so the crate will build on stable Rust.\n\nSee the docs for installation and usage instructions.\n\n# Installing\n\nFor device functionality (kernels), install [Vulkan](https://www.vulkan.org) for your platform.\n\n- For development, it's recommended to install the [LunarG Vulkan SDK](https://www.lunarg.com/vulkan-sdk/), which includes additional tools:\n  - vulkaninfo\n  - Validation layers\n    - DebugPrintf\n  - spirv-tools\n    - This is used by krnlc for SPIR-V validation and optimization.\n      - krnlc builds by default without needing spirv-tools to be installed.\n\n## Test\n\n- Check that `vulkaninfo --summary` shows your devices.\n  - Instance version should be \u003e= 1.2.\n- Alternatively, check that `cargo test --test integration_tests -- --exact none` shows your devices.\n  - You can run all the tests with `cargo test --all-features`.\n\n# Getting Started\n\nSee the [docs](https://docs.rs/krnl) or build them locally with `cargo doc --all-features`.\n\n# Example\n\n```rust\nuse krnl::{\n    macros::module,\n    anyhow::Result,\n    device::Device,\n    buffer::{Buffer, Slice, SliceMut},\n};\n\n#[module]\nmod kernels {\n    #[cfg(not(target_arch = \"spirv\"))]\n    use krnl::krnl_core;\n    use krnl_core::macros::kernel;\n\n    pub fn saxpy_impl(alpha: f32, x: f32, y: 
\u0026mut f32) {\n        *y += alpha * x;\n    }\n\n    // Item kernels for iterator patterns.\n    #[kernel]\n    pub fn saxpy(alpha: f32, #[item] x: f32, #[item] y: \u0026mut f32) {\n        saxpy_impl(alpha, x, y);\n    }\n\n    // General purpose kernels like CUDA / OpenCL.\n    #[kernel]\n    pub fn saxpy_global(alpha: f32, #[global] x: Slice\u003cf32\u003e, #[global] y: UnsafeSlice\u003cf32\u003e) {\n        use krnl_core::buffer::UnsafeIndex;\n\n        let global_id = kernel.global_id();\n        if global_id \u003c x.len().min(y.len()) {\n            saxpy_impl(alpha, x[global_id], unsafe { y.unsafe_index_mut(global_id) });\n        }\n    }\n}\n\nfn saxpy(alpha: f32, x: Slice\u003cf32\u003e, mut y: SliceMut\u003cf32\u003e) -\u003e Result\u003c()\u003e {\n    if let Some((x, y)) = x.as_host_slice().zip(y.as_host_slice_mut()) {\n        x.iter()\n            .copied()\n            .zip(y.iter_mut())\n            .for_each(|(x, y)| kernels::saxpy_impl(alpha, x, y));\n        return Ok(());\n    }\n    if true {\n        kernels::saxpy::builder()?\n            .build(y.device())?\n            .dispatch(alpha, x, y)\n    } else {\n        // or\n        kernels::saxpy_global::builder()?\n            .build(y.device())?\n            .with_global_threads(y.len() as u32)\n            .dispatch(alpha, x, y)\n    }\n}\n\nfn main() -\u003e Result\u003c()\u003e {\n    let x = vec![1f32];\n    let alpha = 2f32;\n    let y = vec![0f32];\n    let device = Device::builder().build().ok().unwrap_or(Device::host());\n    let x = Buffer::from(x).into_device(device.clone())?;\n    let mut y = Buffer::from(y).into_device(device.clone())?;\n    saxpy(alpha, x.as_slice(), y.as_slice_mut())?;\n    let y = y.into_vec()?;\n    println!(\"{y:?}\");\n    Ok(())\n}\n```\n\n# Performance\n\n_NVIDIA GeForce GTX 1060 with Max-Q Design_\n\n[benches/compute-benches](benches/compute-benches)\n\n## alloc\n\n|                  | `krnl`                     | `cuda`                            
| `ocl`                           |\n| :--------------- | :------------------------- | :-------------------------------- | :------------------------------ |\n| **`1,000,000`**  | `316.90 ns` (✅ **1.00x**) | `112.84 us` (❌ _356.06x slower_) | `495.45 ns` (❌ _1.56x slower_) |\n| **`10,000,000`** | `318.15 ns` (✅ **1.00x**) | `1.10 ms` (❌ _3454.98x slower_)  | `506.82 ns` (❌ _1.59x slower_) |\n| **`64,000,000`** | `317.56 ns` (✅ **1.00x**) | `6.31 ms` (❌ _19854.77x slower_) | `506.15 ns` (❌ _1.59x slower_) |\n\n## upload\n\n|                  | `krnl`                     | `cuda`                            | `ocl`                           |\n| :--------------- | :------------------------- | :-------------------------------- | :------------------------------ |\n| **`1,000,000`**  | `332.66 us` (✅ **1.00x**) | `359.18 us` (✅ **1.08x slower**) | `773.51 us` (❌ _2.33x slower_) |\n| **`10,000,000`** | `4.83 ms` (✅ **1.00x**)   | `3.69 ms` (✅ **1.31x faster**)   | `8.76 ms` (❌ _1.81x slower_)   |\n| **`64,000,000`** | `25.24 ms` (✅ **1.00x**)  | `24.34 ms` (✅ **1.04x faster**)  | `57.02 ms` (❌ _2.26x slower_)  |\n\n## download\n\n|                  | `krnl`                     | `cuda`                            | `ocl`                           |\n| :--------------- | :------------------------- | :-------------------------------- | :------------------------------ |\n| **`1,000,000`**  | `584.39 us` (✅ **1.00x**) | `447.38 us` (✅ **1.31x faster**) | `20.17 ms` (❌ _34.52x slower_) |\n| **`10,000,000`** | `5.67 ms` (✅ **1.00x**)   | `4.03 ms` (✅ **1.41x faster**)   | `20.15 ms` (❌ _3.55x slower_)  |\n| **`64,000,000`** | `28.82 ms` (✅ **1.00x**)  | `25.57 ms` (✅ **1.13x faster**)  | `37.01 ms` (❌ _1.28x slower_)  |\n\n## zero\n\n|                  | `krnl`                     | `cuda`                            | `ocl`                             |\n| :--------------- | :------------------------- | :-------------------------------- | :-------------------------------- |\n| 
**`1,000,000`**  | `38.15 us` (✅ **1.00x**)  | `25.28 us` (✅ **1.51x faster**)  | `34.12 us` (✅ **1.12x faster**)  |\n| **`10,000,000`** | `250.90 us` (✅ **1.00x**) | `242.95 us` (✅ **1.03x faster**) | `251.86 us` (✅ **1.00x slower**) |\n| **`64,000,000`** | `1.53 ms` (✅ **1.00x**)   | `1.55 ms` (✅ **1.01x slower**)   | `1.56 ms` (✅ **1.02x slower**)   |\n\n## saxpy\n\n|                  | `krnl`                     | `cuda`                            | `ocl`                             |\n| :--------------- | :------------------------- | :-------------------------------- | :-------------------------------- |\n| **`1,000,000`**  | `90.76 us` (✅ **1.00x**)  | `81.16 us` (✅ **1.12x faster**)  | `88.94 us` (✅ **1.02x faster**)  |\n| **`10,000,000`** | `746.92 us` (✅ **1.00x**) | `770.03 us` (✅ **1.03x slower**) | `779.90 us` (✅ **1.04x slower**) |\n| **`64,000,000`** | `4.71 ms` (✅ **1.00x**)   | `4.90 ms` (✅ **1.04x slower**)   | `4.91 ms` (✅ **1.04x slower**)   |\n\n# License\n\nDual-licensed to be compatible with the Rust project.\n\nLicensed under the Apache License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0 or the MIT license http://opensource.org/licenses/MIT, at your option. This file may not be copied, modified, or distributed except according to those terms.\n\n# Contribution\n\nUnless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcharles-r-earp%2Fkrnl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcharles-r-earp%2Fkrnl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcharles-r-earp%2Fkrnl/lists"}