{"id":13809654,"url":"https://github.com/elftausend/custos","last_synced_at":"2025-05-07T13:07:01.482Z","repository":{"id":37542966,"uuid":"467426780","full_name":"elftausend/custos","owner":"elftausend","description":"A minimal OpenCL, CUDA, Vulkan and host CPU array manipulation engine / framework.","archived":false,"fork":false,"pushed_at":"2025-03-21T23:13:14.000Z","size":3093,"stargazers_count":73,"open_issues_count":3,"forks_count":9,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-05-07T13:06:55.210Z","etag":null,"topics":["array-manipulations","autograd","automatic-differentiation","cpu","cuda","cuda-support","custos","framework","gpu","lazy-evaluation","no-std","opencl","rust","vulkan","wgsl"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/elftausend.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-03-08T08:32:39.000Z","updated_at":"2025-03-21T17:26:20.000Z","dependencies_parsed_at":"2024-04-21T11:33:19.720Z","dependency_job_id":"bc634ffa-40fc-4924-85bd-c7ecd3b63538","html_url":"https://github.com/elftausend/custos","commit_stats":{"total_commits":1032,"total_committers":3,"mean_commits":344.0,"dds":0.3343023255813954,"last_synced_commit":"efa548a0d4e0338babf90cee83d33733d7270c03"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elftausend%2Fcustos","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elftausend%2Fcustos/tags","releases_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elftausend%2Fcustos/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elftausend%2Fcustos/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/elftausend","download_url":"https://codeload.github.com/elftausend/custos/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252883204,"owners_count":21819160,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["array-manipulations","autograd","automatic-differentiation","cpu","cuda","cuda-support","custos","framework","gpu","lazy-evaluation","no-std","opencl","rust","vulkan","wgsl"],"created_at":"2024-08-04T02:00:33.660Z","updated_at":"2025-05-07T13:07:01.456Z","avatar_url":"https://github.com/elftausend.png","language":"Rust","funding_links":[],"categories":["Frameworks","GPU Programming","Rust","Neural Networks"],"sub_categories":[],"readme":"![custos logo](assets/custos.png)\n\n\u003chr/\u003e\n\n[![Crates.io 
version](https://img.shields.io/crates/v/custos.svg)](https://crates.io/crates/custos)\n[![Docs](https://docs.rs/custos/badge.svg?version=0.7.0)](https://docs.rs/custos/0.7.0/custos/)\n[![Rust](https://github.com/elftausend/custos/actions/workflows/rust.yml/badge.svg)](https://github.com/elftausend/custos/actions/workflows/rust.yml)\n[![GPU](https://github.com/elftausend/custos/actions/workflows/gpu.yml/badge.svg)](https://github.com/elftausend/custos/actions/workflows/gpu.yml)\n[![rust-clippy](https://github.com/elftausend/custos/actions/workflows/rust-clippy.yml/badge.svg)](https://github.com/elftausend/custos/actions/workflows/rust-clippy.yml)\n[![Android NNAPI](https://github.com/elftausend/custos/actions/workflows/android.yml/badge.svg)](https://github.com/elftausend/custos/actions/workflows/android.yml)\n\nA minimal, extensible OpenCL, Vulkan (with WGSL), CUDA, NNAPI (Android) and host CPU array manipulation engine / framework written in Rust.\nThis crate provides tools for executing custom array and automatic differentiation operations.\u003cbr\u003e\n\n\n## Installation\n\nThe latest published version is `0.7.x` (April 14th, 2023). A lot has changed since then; the `0.7.x` code can be found in the `custos-0.7` branch.\n\nAdd `custos` as a dependency:\n```toml\n[dependencies]\ncustos = \"0.7.0\"\n\n# to disable the default features (cpu, cuda, opencl, static-api, blas, macro) and select your own set of features:\n#custos = {version = \"0.7.0\", default-features=false, features=[\"opencl\", \"blas\"]}\n```\n\n### Available features\n\nTo make specific devices usable, activate the corresponding features:\n\nFeature | Device | Notes\n--- | --- | ---\ncpu | `CPU` | Uses heap allocations.\nstack | `Stack` | Usable in `no-std` environments, as it uses stack-allocated `Buffer`s and requires neither `alloc` nor `std`. In practice, it only supports the `Base` module.\nopencl | `OpenCL` | Automatically maps unified memory. 
\ncuda | `CUDA` |\nvulkan | `Vulkan` | Shaders are written in WGSL. Uses unified memory.\nnnapi | `NnapiDevice` | The `Lazy` module is mandatory.\nuntyped | `Untyped` | Removes the need for `Buffer`'s generic parameters (CPU and CUDA only for now).\n\ncustos ships combinable modules. The selected modules determine how operations behave when they are executed.\nNew modules can also be added in user code.\n```rust\nuse custos::prelude::*;\n// `Autograd` and `Base` are modules\nlet device = CPU::\u003cAutograd\u003cBase\u003e\u003e::new();\n```\nTo make specific modules usable for building a device, activate the corresponding features:\n\nFeature | Module | Description\n--- | --- | ---\n*on by default* | `Base` | Default behaviour.\nautograd | `Autograd` | Enables automatic differentiation.\ncached | `Cached` | Reuses allocations on demand.\nfork | `Fork` | Decides whether the CPU or GPU is faster for an operation, then uses the faster device for subsequent computations (unified memory devices).\nlazy | `Lazy` | Lazy execution of operations and lazy intermediate allocations. 
Enables support for CUDA graphs.\ngraph | `Graph` | Adds a graph that allows memory usage optimization and fusing of unary operations, in combination with `Lazy`.\n\nFor usage of these modules when writing custom operations, see [`modules.md`](modules.md) and [`modules_usage.rs`](examples/modules_usage.rs).\n\nFor an operation to be affected by a module, it must call specific custos code.\n\nRemaining features:\n\nFeature | Description\n--- | ---\nstatic-api | Enables the creation of `Buffer`s without providing a device.\nstd | Adds standard library support.\nno-std | For `no-std` environments; activates the `stack` feature.\nmacro | Re-export of [custos-macro].\nblas | Adds GEMM functions from the system's (selected) BLAS library.\nhalf | Adds support for half-precision floats.\nserde | Adds serialization and deserialization support.\njson | Adds convenience functions for serializing to and deserializing from JSON.\n\n[custos-macro]: https://github.com/elftausend/custos-macro\n\n## [Examples]\n\n[examples]: https://github.com/elftausend/custos/tree/main/examples\n[unary]: https://github.com/elftausend/custos/blob/main/src/unary.rs\n\nImplement an operation for `CPU`:\u003cbr\u003e\n- To implement your own operations for all compute devices, take a look at [implement_operations.rs](examples/implement_operations.rs) or [`modules_usage.rs`](examples/modules_usage.rs).\u003cbr\u003e\nFor larger-scale examples, see [`custos-math`](https://github.com/elftausend/custos-math) (outdated, requires custos 0.7) or [`sliced`](https://github.com/elftausend/sliced) (for automatic differentiation examples).\n\nThis operation is only affected by the `Cached` module (and partially by `Autograd`).\n\n```rust\nuse custos::prelude::*;\nuse std::ops::{Deref, Mul};\n\npub trait MulBuf\u003cT: Unit, S: Shape = (), D: Device = Self\u003e: Sized + Device {\n    fn mul(\u0026self, lhs: \u0026Buffer\u003cT, D, S\u003e, rhs: \u0026Buffer\u003cT, D, S\u003e) 
-\u003e Buffer\u003cT, Self, S\u003e;\n}\n\nimpl\u003cMods, T, S, D\u003e MulBuf\u003cT, S, D\u003e for CPU\u003cMods\u003e\nwhere\n    Mods: Retrieve\u003cSelf, T, S\u003e,\n    T: Unit + Mul\u003cOutput = T\u003e + Copy + 'static,\n    S: Shape,\n    D: Device,\n    D::Base\u003cT, S\u003e: Deref\u003cTarget = [T]\u003e,\n{\n    fn mul(\u0026self, lhs: \u0026Buffer\u003cT, D, S\u003e, rhs: \u0026Buffer\u003cT, D, S\u003e) -\u003e Buffer\u003cT, Self, S\u003e {\n        // retrieve the output buffer; the active modules (e.g. `Cached`) decide whether it is freshly allocated or reused\n        // unwrap for brevity (a fallible trait signature could return the error instead)\n        let mut out = self.retrieve(lhs.len(), (lhs, rhs)).unwrap();\n\n        // element-wise multiplication\n        for ((lhs, rhs), out) in lhs.iter().zip(rhs.iter()).zip(\u0026mut out) {\n            *out = *lhs * *rhs;\n        }\n\n        out\n    }\n}\n```\n\nMany more usage examples can be found in the [tests] and [examples] folders\n(or in the [unary] operation file, [custos-math](https://github.com/elftausend/custos-math) and [`sliced`](https://github.com/elftausend/sliced)).\n\n[tests]: https://github.com/elftausend/custos/tree/main/tests","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felftausend%2Fcustos","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felftausend%2Fcustos","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felftausend%2Fcustos/lists"}