{"id":33315946,"url":"https://github.com/paiml/trueno","last_synced_at":"2026-04-01T19:24:11.744Z","repository":{"id":324421490,"uuid":"1097148443","full_name":"paiml/trueno","owner":"paiml","description":"Speed boost using:  Assembly, GPU and WASM","archived":false,"fork":false,"pushed_at":"2026-03-04T17:59:01.000Z","size":29887,"stargazers_count":23,"open_issues_count":16,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-05T05:13:30.592Z","etag":null,"topics":["apr","gpu","paiml","ptx","ruchy","rust","simd","wasm","wgpu"],"latest_commit_sha":null,"homepage":"https://paiml.github.io/trueno/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/paiml.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-15T16:17:25.000Z","updated_at":"2026-03-04T17:59:10.000Z","dependencies_parsed_at":"2026-01-11T02:02:29.345Z","dependency_job_id":null,"html_url":"https://github.com/paiml/trueno","commit_stats":null,"previous_names":["paiml/trueno"],"tags_count":29,"template":false,"template_full_name":null,"purl":"pkg:github/paiml/trueno","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paiml%2Ftrueno","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paiml%2Ftrueno/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paiml%2Ftrueno/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/paiml%2Ftrueno/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/paiml","download_url":"https://codeload.github.com/paiml/trueno/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paiml%2Ftrueno/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30326883,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-10T05:25:20.737Z","status":"ssl_error","status_checked_at":"2026-03-10T05:25:17.430Z","response_time":106,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apr","gpu","paiml","ptx","ruchy","rust","simd","wasm","wgpu"],"created_at":"2025-11-19T14:04:00.459Z","updated_at":"2026-04-01T19:24:11.736Z","avatar_url":"https://github.com/paiml.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\".github/trueno-hero.svg\" alt=\"trueno\" width=\"600\"\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003etrueno\u003c/h1\u003e\n\n**Multi-Target High-Performance Compute 
Library**\n\n[![CI](https://github.com/paiml/trueno/actions/workflows/ci.yml/badge.svg)](https://github.com/paiml/trueno/actions)\n[![Coverage](https://img.shields.io/badge/coverage-97%25-brightgreen)](https://github.com/paiml/trueno)\n[![Crates.io](https://img.shields.io/crates/v/trueno.svg)](https://crates.io/crates/trueno)\n[![Documentation](https://docs.rs/trueno/badge.svg)](https://docs.rs/trueno)\n\n\u003c/div\u003e\n\n---\n\n**trueno** (Spanish: \"thunder\") provides unified compute primitives across CPU SIMD, GPU, and WebAssembly.\n\n## Table of Contents\n\n- [Features](#features)\n- [Installation](#installation)\n- [Quick Start](#quick-start)\n- [Performance](#performance)\n- [trueno-gpu: Pure Rust CUDA](#trueno-gpu-pure-rust-cuda)\n- [Training (WGPU)](#training-wgpu)\n- [Operations](#operations)\n- [Development](#development)\n- [Contributing](#contributing)\n- [License](#license)\n\n## Features\n\n- **CPU SIMD**: x86 (SSE2/AVX/AVX2/AVX-512), ARM (NEON), WASM (SIMD128)\n- **GPU**: Pure Rust PTX generation via `trueno-gpu` (no nvcc required)\n- **Cross-platform GPU**: Vulkan/Metal/DX12/WebGPU via `wgpu`\n- **Auto-dispatch**: Runtime selection of optimal backend\n- **Zero unsafe in public API**: Safety via type system\n\n## Installation\n\n```toml\n[dependencies]\ntrueno = \"0.16\"\n\n# Optional: GPU support for large matrices\ntrueno = { version = \"0.16\", features = [\"gpu\"] }\n\n# Optional: Pure Rust CUDA PTX generation\ntrueno-gpu = \"0.4\"\n```\n\n## Quick Start\n\n```rust\nuse trueno::{Vector, Matrix, SymmetricEigen};\n\n// Vector operations - auto-selects best SIMD backend\nlet a = Vector::from_slice(\u0026[1.0, 2.0, 3.0, 4.0]);\nlet b = Vector::from_slice(\u0026[5.0, 6.0, 7.0, 8.0]);\n\nlet sum = a.add(\u0026b).unwrap();           // [6.0, 8.0, 10.0, 12.0]\nlet dot = a.dot(\u0026b).unwrap();           // 70.0\nlet activated = a.relu().unwrap();      // ReLU activation\n\n// Matrix operations\nlet m = Matrix::from_vec(2, 2, vec![1.0, 2.0, 3.0, 
4.0]).unwrap();\nlet product = m.matmul(\u0026m).unwrap();    // Matrix multiplication\nlet transposed = m.transpose();          // Transpose\n\n// Batched matmul for transformers (Q @ K^T pattern)\nlet batch = 2; let heads = 4; let seq = 8; let dim = 64;\nlet q: Vec\u003cf32\u003e = vec![0.1; batch * heads * seq * dim];\nlet kt: Vec\u003cf32\u003e = vec![0.1; batch * heads * dim * seq];\nlet attn = Matrix::batched_matmul_4d(\u0026q, \u0026kt, batch, heads, seq, dim, seq).unwrap();\n\n// Eigendecomposition (PCA, spectral analysis)\nlet cov = Matrix::from_vec(2, 2, vec![3.0, 1.0, 1.0, 3.0]).unwrap();\nlet eigen = SymmetricEigen::new(\u0026cov).unwrap();\nlet eigenvalues = eigen.eigenvalues();  // [4.0, 2.0]\n```\n\n## Performance\n\n| Operation | SIMD Speedup | Notes |\n|-----------|--------------|-------|\n| Dot product | 6-17x | AVX-512 for compute-bound |\n| Matrix multiply | 2-10x | GPU for 500x500+ |\n| Reductions (sum, max, min) | 3-12x | AVX-512 optimal |\n| Element-wise (add, mul) | 1-2x | Memory-bound |\n| Convolution 2D | 5-8x | AVX2/AVX-512 optimized |\n\n### Benchmark Results (AMD Ryzen 9 7950X)\n\n| Benchmark | Throughput |\n|-----------|------------|\n| Vector recip (AVX-512, 10K) | 10.0 Gelem/s |\n| Vector recip (AVX2, 10K) | 9.7 Gelem/s |\n| PTX module emit | 3.1 µs |\n| PTX kernel build | 81 ns |\n| Launch config | 1.7 ns |\n\n**GPU Note**: GPU acceleration benefits matrix multiply only.\nElement-wise operations use CPU SIMD\n(GPU transfer overhead exceeds compute time).\n\n## trueno-gpu: Pure Rust CUDA\n\nGenerate CUDA PTX kernels without nvcc, LLVM, or external toolchains:\n\n```rust\nuse trueno_gpu::kernels::{GemmKernel, Kernel, SoftmaxKernel};\n\n// Generate optimized GEMM kernel\nlet gemm = GemmKernel::tensor_core(1024, 1024, 1024);\nlet ptx = gemm.emit_ptx();  // Pure Rust PTX generation\n\n// Generate softmax with warp shuffle reduction\nlet softmax = SoftmaxKernel::new(4096);\nlet ptx = softmax.emit_ptx();\n\n// Available kernels: GEMM, 
Softmax, LayerNorm, Attention, Quantize (Q4K/Q5K/Q6K)\n```\n\n## Training (WGPU)\n\ntrueno now supports **backward pass computation** via WGSL compute shaders, enabling neural network training on AMD, Intel Arc, and Apple Silicon GPUs through Vulkan, Metal, DX12, and WebGPU -- no CUDA required.\n\n**7 backward ops implemented**:\n- `silu_backward` -- SiLU activation gradient\n- `gemm_backward_a` -- weight gradient (dL/dA)\n- `gemm_backward_b` -- input gradient (dL/dB)\n- `rmsnorm_backward` -- RMSNorm gradient\n- `rope_backward` -- rotary position embedding gradient\n- `adamw_step` -- AdamW optimizer parameter update\n- `nf4_dequant` -- NF4 4-bit dequantization for QLoRA\n\nAll 7 shaders verified on AMD Radeon Pro W5700X via Vulkan with 8 FALSIFY contract tests passing.\n\n```rust\nuse trueno::backends::gpu::GpuDevice;\n\nlet dev = GpuDevice::new()?;\n\n// Backward pass: compute SiLU gradient\ndev.silu_backward(\u0026input, \u0026grad_output, \u0026mut grad_input)?;\n\n// Optimizer step: AdamW update\ndev.adamw_step(\u0026mut params, \u0026grads, \u0026mut m, \u0026mut v, lr, beta1, beta2, eps, weight_decay, step)?;\n```\n\n## Operations\n\n**Vector**: add, sub, mul, div, dot, sum, min, max, argmin,\nargmax, norm_l1, norm_l2, normalize, recip, sqrt, abs, clamp\n\n**Activations**: relu, leaky_relu, elu, sigmoid, tanh, gelu, swish, softmax, log_softmax, silu\n\n**Matrix**: matmul, batched_matmul, batched_matmul_4d, transpose, matvec, convolve2d, pooling (max/avg), topk, gather, pad\n\n**Statistics**: mean, variance, stddev, covariance, correlation, zscore\n\n**Eigen**: symmetric eigendecomposition (Jacobi algorithm)\n\n**GPU Kernels**: GEMM (naive/tiled/tensor core), Softmax, LayerNorm, RMSNorm, Attention, GEMV, Quantization\n\n## Development\n\n```bash\ncargo test                  # Run tests\ncargo bench                 # Run benchmarks\nmake coverage              # Coverage report (requires cargo-llvm-cov)\ncargo run --example backend_detection  # Check available 
backends\n```\n\n## Ecosystem\n\nPart of the Pragmatic AI Labs stack:\n- [trueno-gpu](https://crates.io/crates/trueno-gpu) - Pure Rust PTX generation (no nvcc)\n- [trueno-db](https://crates.io/crates/trueno-db) - GPU-first analytics database\n- [trueno-graph](https://crates.io/crates/trueno-graph) - Graph algorithms\n- [trueno-rag](https://crates.io/crates/trueno-rag) - RAG pipeline\n- 🤖 [Coursera Hugging Face AI Development Specialization](https://www.coursera.org/specializations/hugging-face-ai-development) - Build Production AI systems with Hugging Face in Pure Rust\n\n## Usage\n\nAdd trueno to your `Cargo.toml`:\n\n```toml\n[dependencies]\ntrueno = \"0.16\"\n```\n\nThen use it in your code:\n\n```rust\nuse trueno::Vector;\n\nlet a = Vector::from_slice(\u0026[1.0, 2.0, 3.0]);\nlet b = Vector::from_slice(\u0026[4.0, 5.0, 6.0]);\nlet result = a.add(\u0026b).unwrap();\n```\n\nThe library auto-selects the best SIMD backend at runtime. No configuration needed.\n\n## Contributing\n\nContributions are welcome. Please ensure:\n\n1. All tests pass: `cargo test --all-features`\n2. Coverage stays above 90%: `make coverage`\n3. No clippy warnings: `cargo clippy --all-features -- -D warnings`\n4. Code is formatted: `cargo fmt`\n\n\n## MSRV\n\nMinimum Supported Rust Version: **1.89**\n\n## See Also\n\n- [Cookbook](examples/) — 34 runnable examples\n\n## License\n\nMIT - see [LICENSE](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaiml%2Ftrueno","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpaiml%2Ftrueno","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaiml%2Ftrueno/lists"}