An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/rocm/rocprim

ROCm Parallel Primitives

amd cuda gpu hip parallel primitive rocm

Last synced: 08 Apr 2025

https://github.com/librapid/librapid

A highly optimised C++ library for mathematical applications and neural networks.

array cpp cpp20 cpp23 cuda gpu high-performance-computing library matrix multidimensional-arrays multithreading parallel-programming pypy pypy3 python python3 simd

Last synced: 08 Apr 2025

https://github.com/vm6502q/qrack

Comprehensive, GPU accelerated framework for developing universal virtual quantum processors

cuda distributed-quantum-computing gpu hpc opencl physics physics-simulation quantum quantum-computer-simulator quantum-computing quantum-information quantum-simulator qubits

Last synced: 08 Feb 2025

https://github.com/qengineering/install-opencv-jetson-nano

OpenCV installation script with CUDA and cuDNN support

cuda cudnn jetson-nano jetson-xavier opencv opencv4

Last synced: 04 Apr 2025

https://github.com/LibRapid/librapid

A highly optimised C++ library for mathematical applications and neural networks.

array cpp cpp20 cpp23 cuda gpu high-performance-computing library matrix multidimensional-arrays multithreading parallel-programming pypy pypy3 python python3 simd

Last synced: 06 Dec 2024

https://github.com/rocm/gpufort

GPUFORT: S2S translation tool for CUDA Fortran and Fortran+X in the spirit of hipify

cuda cuda-fortran fortran gpgpu gpu hip interoperability openacc openmp rocm

Last synced: 19 Dec 2024

https://github.com/deftruth/ffpa-attn-mma

📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for large headdim (D > 256), ~2x↑🎉vs SDPA EA.

attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores

Last synced: 06 Apr 2025

https://github.com/ROCm/gpufort

GPUFORT: S2S translation tool for CUDA Fortran and Fortran+X in the spirit of hipify

cuda cuda-fortran fortran gpgpu gpu hip interoperability openacc openmp rocm

Last synced: 11 Mar 2025

https://github.com/proger/accelerated-scan

Accelerated First Order Parallel Associative Scan

cuda cumulative-sum recurrent-neural-networks state-space-model torch

Last synced: 03 Apr 2025

https://github.com/zju3dv/envgs

[CVPR 2025] EnvGS: Modeling View-Dependent Appearance with Environment Gaussian

2dgs 3dgs cuda optix path-tracing ray-tracing reflection

Last synced: 05 Apr 2025

https://github.com/dvlab-research/SparseTransformer

A fast and memory-efficient library for sparse transformers with varying token numbers (e.g., 3D point clouds).

3d-point-cloud cuda sparse-transformer transformer

Last synced: 20 Mar 2025

https://github.com/xlite-dev/ffpa-attn-mma

📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉vs SDPA EA.

attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores

Last synced: 30 Mar 2025

https://github.com/pythonlessons/tensorflow-object-detection-tutorial

A tutorial on installing and preparing the TensorFlow framework to train your own convolutional neural network object detection classifier for multiple objects, starting from scratch.

classifier cuda cudnn detection detection-api detection-classifier detection-tutorial gpu grabscreen labels object-detection pil python-mss tensorflow tensorflow-cpu tensorflow-gpu tensorflow-models tutorial

Last synced: 09 Feb 2025

https://github.com/kibae/onnxruntime-server

ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.

ai contributions-welcome cuda deep-learning inference-server machine-learning nueral-networks onnx onnxruntime

Last synced: 05 Apr 2025

https://github.com/eth-cscs/implicitglobalgrid.jl

Almost trivial distributed parallelization of stencil-based GPU and CPU applications on a regular staggered grid

cuda distributed gpu julia julia-mpi-wrapper mpi multi-gpu staggered-grids stencil-codes

Last synced: 04 Apr 2025

https://github.com/patwie/cuda-design-patterns

Some CUDA design patterns and a bit of template magic for CUDA

bazel cpp11 cuda cuda-development cuda-device cuda-kernels cuda-utils gpu template-metaprogramming

Last synced: 14 Apr 2025

https://github.com/hijkzzz/cuda-neural-network

Convolutional Neural Network with CUDA (MNIST 99.23%)

cnn cpp cuda mnist neural-network

Last synced: 01 May 2025

https://github.com/dr-noob/gpufetch

Simple yet fancy GPU architecture fetching tool

cuda gpu igpu intel nvidia

Last synced: 13 Apr 2025

https://github.com/dizcza/docker-hashcat

Latest hashcat docker for CUDA, OpenCL, and POCL. Deployed on Vast.ai

cuda docker hashcat nvidia opencl pocl vast-ai

Last synced: 01 Apr 2025

https://github.com/chenhunghan/ialacol

🪶 Lightweight OpenAI drop-in replacement for Kubernetes

ai cloudnative cuda ggml gptq gpu helm kubernetes langchain llamacpp llm llm-inference llm-serving openai python

Last synced: 20 Jan 2025

https://github.com/bobmcdear/attorch

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

cuda deep-learning machine-learning openai openai-triton pytorch triton

Last synced: 20 Dec 2024

https://github.com/BobMcDear/attorch

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

cuda deep-learning machine-learning openai openai-triton pytorch triton

Last synced: 21 Dec 2024

https://github.com/libmir/dcompute

DCompute: Native execution of D on GPUs and other Accelerators

cuda d fpga gpgpu gpu ldc opencl

Last synced: 19 Dec 2024

https://github.com/fangq/mcx

Monte Carlo eXtreme (MCX) - GPU-accelerated photon transport simulator

3d c cuda matlab monte-carlo optical-imaging pascal photon-transport physics-simulation ray-tracing volumetric-rendering voxel-based

Last synced: 07 Apr 2025

https://github.com/1461521844lijin/trt_yolo_video_pipeline

A multi-stream, multi-GPU, multi-instance parallel video analysis and processing example for TensorRT + YOLO-series models

cuda ffmpeg opencv video-processing yolo yolov8

Last synced: 26 Nov 2024

https://github.com/Dr-Noob/gpufetch

Simple yet fancy GPU architecture fetching tool

cuda gpu igpu intel nvidia

Last synced: 01 Apr 2025

https://github.com/chonspqx/modulated-deform-conv

deformable convolution 2D 3D DeformableConvolution DeformConv Modulated Pytorch CUDA

cuda cuda-extension deform-conv3d deformable-convolutional deformable-convolutional-networks python pytorch

Last synced: 18 Mar 2025

https://github.com/rocm/hipblas

ROCm BLAS marshalling library

blas cuda hip rocm

Last synced: 12 Apr 2025

https://github.com/mathiasbourgoin/spoc

Stream Processing with OCaml

cuda gpgpu ocaml opencl spoc

Last synced: 10 Apr 2025

https://github.com/fhamborg/newsmtsc

Target-dependent sentiment classification in news articles reporting on political events. Includes a high-quality data set of over 11k sentences and a state-of-the-art classification model.

cuda dataset deep-learning news-articles pytorch sentiment-analysis sentiment-classification text-classification tsc

Last synced: 07 Apr 2025

https://github.com/goofit/goofit

Code repository for the massively-parallel framework for maximum-likelihood fits, implemented in CUDA/OpenMP

cuda fitting gpu gpu-computing omp physics root-cern thrust

Last synced: 10 Apr 2025

https://github.com/roastduck/FreeTensor

A language and compiler for irregular tensor programs.

ast automatic-differentiation code-generation cuda gpu jit openmp tensor

Last synced: 11 Apr 2025

https://github.com/anicetngrt/jiro-nn

A Deep Learning and preprocessing framework in Rust with support for CPU and GPU.

adam classification cuda data-analysis deep-learning dropout gpu gpu-computing machine-learning ml nalgebra neural-networks nn opencl pipelines regression rust sgd

Last synced: 09 Apr 2025

https://github.com/JulianAssmann/opencv-cuda-docker

Dockerfiles for OpenCV compiled with CUDA, opencv_contrib modules and Python 3 bindings

cuda docker gpu nvidia opencv

Last synced: 06 Apr 2025

https://github.com/openmlsys/openmlsys-cuda

Tutorials for writing high-performance GPU operators in AI frameworks.

cuda gpu machine-learning

Last synced: 15 Apr 2025

https://github.com/rsnk96/Ubuntu-Setup-Scripts

Scripts to help you set up your Ubuntu quickly, especially if you're in any subfield of Data Science or AI!

anaconda cuda deep-learning deeplearning dl ffmpeg installers ml opencv python pytorch tensorflow tensorflow-setup ubuntu zsh

Last synced: 07 Apr 2025

https://github.com/AnicetNgrt/jiro-nn

A Deep Learning and preprocessing framework in Rust with support for CPU and GPU.

adam classification cuda data-analysis deep-learning dropout gpu gpu-computing machine-learning ml nalgebra neural-networks nn opencl pipelines regression rust sgd

Last synced: 16 Jan 2025

https://github.com/acdslab/mppi-generic

Templated C++/CUDA implementation of Model Predictive Path Integral Control (MPPI)

cpp cuda model-predictive-control model-predictive-path-integral robotics stochastic-optimization

Last synced: 05 Apr 2025

https://github.com/GooFit/GooFit

Code repository for the massively-parallel framework for maximum-likelihood fits, implemented in CUDA/OpenMP

cuda fitting gpu gpu-computing omp physics root-cern thrust

Last synced: 08 Apr 2025

https://github.com/jamjamjon/usls

A Rust library integrated with ONNXRuntime, providing a collection of Computer Vision and Vision-Language models.

clip cuda florence2 grounding-dino moondream ocr onnx onnxruntime rust rust-yolo sam sapiens smolvlm tensorrt yolo yolo-rs yolo-rust yolov10 yolov11 yolov8

Last synced: 12 Apr 2025

https://github.com/ROCm/hipBLAS

ROCm BLAS marshalling library

blas cuda hip rocm

Last synced: 30 Nov 2024

https://github.com/naeioi/pbf-cuda

Position Based Fluids CUDA implementation

cuda fluid-solver opengl real-time simulation

Last synced: 26 Apr 2025

https://github.com/qdLMF/LIO-SAM-GPU-ScanToMapOpt

A CUDA reimplementation of the line/plane odometry of LIO-SAM. A point cloud hash map (inspired by iVox of Faster-LIO) on GPU is used to accelerate 5-neighbour KNN search.

3d-mapping cuda faster-lio gpu ivox knn lidar lidar-inertial-odometry lidar-slam lio lio-sam loam slam

Last synced: 18 Mar 2025

https://github.com/inoryy/tensorflow-optimized-wheels

TensorFlow wheels built for the latest CUDA/cuDNN with performance flags enabled: SSE, AVX, FMA; XLA

avx2 cuda cudnn python sse tensorflow tensorflow-gpu tensorflow-wheels wheels xla

Last synced: 02 Apr 2025

https://github.com/ihhub/penguinv

Computer vision library with focus on heterogeneous systems

avx computer-vision cpp cuda gpu hacktoberfest heterogeneous-systems image-processing opencl python simd sse thread-pool

Last synced: 09 Apr 2025

https://github.com/MuGdxy/muda

μ-Cuda, COVER THE LAST MILE OF CUDA. With features: intellisense-friendly, structured launch, automatic cuda graph generation and updating.

cuda cuda-cpp cuda-programming

Last synced: 20 Nov 2024

https://github.com/gpmueller/eigen-cuda

MWE for using the Eigen library in CUDA kernels

cuda eigen eigen-cuda mwe

Last synced: 14 Apr 2025

https://github.com/sniklaus/pytorch-extension

An example of a CUDA extension for PyTorch using CuPy that computes the Hadamard product of two tensors

cuda cupy deep-learning python pytorch

Last synced: 14 Nov 2024

https://github.com/rocm/rocrand

RAND library for HIP programming language

cuda gpu hip random rng rocm

Last synced: 12 Apr 2025

https://github.com/src-d/minhashcuda

Weighted MinHash implementation on CUDA (multi-gpu).

cuda lsh machine-learning minhash

Last synced: 09 Apr 2025

https://github.com/glotzerlab/fresnel

Publication quality path tracing in real time.

cuda optix path-tracing python simulation soft-matter

Last synced: 05 Apr 2025

https://github.com/mseitzer/gpu-monitor

Script to remotely check GPU servers for free GPUs

cuda cudnn deep-learning gpu nvidia-smi remote ssh

Last synced: 12 Apr 2025

https://github.com/cgtuebingen/Flex-Convolution

Source code for: Flex-Convolution (Million-Scale Point-Cloud Learning Beyond Grid-Worlds), accepted at ACCV 2018

3d-point-clouds cuda deep-learning deep-neural-networks machine-learning pointcloud pointcloudprocessing research research-paper segmentation shapenet tensorflow

Last synced: 20 Mar 2025

https://github.com/arbor-sim/arbor

The Arbor multi-compartment neural network simulation library.

cuda gpu hip hpc modern-cpp mpi neuroscience

Last synced: 08 Apr 2025

https://github.com/pytorch/extension-script

Example repository for custom C++/CUDA operators for TorchScript

cpp cuda pytorch

Last synced: 19 Jan 2025

https://github.com/juliagpu/acceleratedkernels.jl

Cross-architecture parallel algorithms for Julia's CPU and GPU backends. Targets multithreaded CPUs, and GPUs via Intel oneAPI, AMD ROCm, Apple Metal, Nvidia CUDA.

amd apple cuda gpu intel julia metal nvidia oneapi parallel rocm standard-library

Last synced: 04 Apr 2025

https://github.com/pwhiddy/fat-clouds

GPU Fluid Simulation with Volumetric Rendering

blaze cuda fluid gpu navier-stokes raymarching smoke

Last synced: 13 Apr 2025

https://github.com/ROCm/rocRAND

RAND library for HIP programming language

cuda gpu hip random rng rocm

Last synced: 18 Dec 2024

https://github.com/windqaq/mpm

GPU simulation using the Material Point Method, with rendering.

cuda gvdb-voxels material-point-method raytracer simulation thrust

Last synced: 23 Mar 2025

https://github.com/pennylaneai/pennylane-lightning

The Lightning plugin ecosystem provides fast quantum state-vector and tensor network simulators written in C++ for use with PennyLane.

cuda distributed-computing gpu hpc mpi openmp parallel quantum-computing quantum-machine-learning rocm

Last synced: 15 Apr 2025

https://github.com/JuliaGPU/AcceleratedKernels.jl

Cross-architecture parallel algorithms for Julia's CPU and GPU backends. Targets multithreaded CPUs, and GPUs via Intel oneAPI, AMD ROCm, Apple Metal, Nvidia CUDA.

amd apple cuda gpu intel julia metal nvidia oneapi parallel rocm standard-library

Last synced: 17 Mar 2025

https://github.com/cryinkfly/rhinoceros-3d-for-linux

A project that gives you a way to use Rhino 3D on Linux!

archlinux cuda fedora international linux linuxmint manjaro nvidia opengl opensuse ubuntu wine

Last synced: 06 Apr 2025

https://github.com/cair/tmu

Implements the Tsetlin Machine, Coalesced Tsetlin Machine, Convolutional Tsetlin Machine, Regression Tsetlin Machine, and Weighted Tsetlin Machine, with support for continuous features, drop clause, Type III Feedback, focused negative sampling, multi-task classifier, autoencoder, literal budget, and one-vs-one multi-class classifier. TMU is written in Python with wrappers for C and CUDA-based clause evaluation and updating.

absorbing-states autoencoder convolution cuda gpu incremental incremental-computation multi-output pattern-recognition propositional-logic regression relational-logic sparse tsetlin-machine

Last synced: 09 Apr 2025

https://github.com/ANNetGPGPU/ANNetGPGPU

A GPU (CUDA) based Artificial Neural Network library

c-plus-plus-11 cuda propagation-network self-organizing-map

Last synced: 23 Apr 2025

https://github.com/leimao/CUDA-GEMM-Optimization

CUDA Matrix Multiplication Optimization

cuda

Last synced: 06 Dec 2024

https://github.com/cuihaoleo/gpg-fingerprint-filter-gpu

Generate OpenPGP keys with fingerprints that match a specific pattern (a.k.a. vanity keys)

cuda gnupg gpg gpu pgp vanity

Last synced: 15 Apr 2025

https://github.com/upul/aurora

A minimal deep learning library written in Python/Cython/C++ and NumPy/CUDA/cuDNN.

cplusplus cuda cudnn cython deep-learning python3 system-design

Last synced: 18 Nov 2024

https://github.com/nvidia-merlin/hierarchicalkv

HierarchicalKV is part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. Its key capability is storing key-value feature embeddings in the high-bandwidth memory (HBM) of GPUs and in host memory. It can also be used as generic key-value storage.

cuda dynamic-embedding embedding-storage gpu hashtable key-value-store recommender-system

Last synced: 10 Apr 2025

https://github.com/mitmul/pynvvl

A Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python

cuda cupy gpu numpy nvidia-video-loader nvvl python video video-processing

Last synced: 25 Jan 2025

https://github.com/harrism/mini-nbody

A simple gravitational N-body simulation in less than 100 lines of C code, with CUDA optimizations.

astrophysics benchmark cuda nbody

Last synced: 22 Mar 2025

https://github.com/coreylowman/llama-dfdx

LLaMa 7b with CUDA acceleration implemented in Rust. Minimal GPU memory needed!

cuda deep-learning inference language-model llama neural-network rust rust-lang

Last synced: 13 Apr 2025

https://github.com/osai-ai/dokai

Collection of Docker images for ML/DL and video processing projects

cuda deep-learning docker docker-image ffmpeg opencv python pytorch tensorrt video-processing

Last synced: 04 Apr 2025

https://github.com/meiqua/pose_refine

CUDA ICP for 6D pose estimation

cuda icp renderer

Last synced: 14 Apr 2025

https://github.com/nvidia/nvimagecodec

nvImageCodec: a library of GPU- and CPU-accelerated codecs featuring a unified interface

computer-vision cpp cuda dali data-processing deep-learning fast-data-pipeline gpu image-processing machine-learning nvidia python pytorch

Last synced: 12 Apr 2025

https://github.com/sonots/cumo

Cumo (pronounced like "koomo") is a CUDA-aware numerical library whose interface is highly compatible with Ruby Numo

cuda numo ruby scicentific-computing

Last synced: 10 Apr 2025

https://github.com/bdusell/singularity-tutorial

Tutorial for using Singularity containers

container cuda cudnn pytorch singularity

Last synced: 09 Apr 2025

https://github.com/1ytic/pytorch-edit-distance

Levenshtein edit-distance on PyTorch and CUDA

asr cuda edit-distance levenshtein nlp pytorch

Last synced: 09 Apr 2025

https://github.com/sasagawa888/deeppipe2

Deep Learning library using GPU(CUDA/cuBLAS)

cublas cuda deep-learning elixir gpu

Last synced: 15 Mar 2025

https://github.com/ashvardanian/parallelreductionsbenchmark

Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal - all it takes to sum a lot of numbers fast!

apple avx512 cuda glsl gpgpu gpu gpu-acceleration gpu-computing hpc intel metal nvidia opencl openmp parallel simd stl tbb thrust

Last synced: 06 Apr 2025

https://github.com/braintwister/docker-devel-env

Fast, reproducible, and portable software development environments

clang cmake conan cuda development docker eclipse gcc jenkins nsight portability reproducibility vscode

Last synced: 08 Feb 2025

https://github.com/mitsuba-renderer/drjit-core

Dr.Jit — A Just-In-Time-Compiler for Differentiable Rendering (core library)

cuda jit llvm

Last synced: 05 Apr 2025

https://github.com/eomii/rules_ll

An Upstream Clang/LLVM-based toolchain for contemporary C++ and heterogeneous programming

bazel bleeding-edge build-system clang clang-tidy cpp cuda gpu-programming hermetic hip llvm nix openmp remote-caching remote-execution reproducible sanitizers

Last synced: 06 Apr 2025

https://github.com/BrainTwister/docker-devel-env

Fast, reproducible, and portable software development environments

clang cmake conan cuda development docker eclipse gcc jenkins nsight portability reproducibility vscode

Last synced: 09 Dec 2024

https://github.com/PennyLaneAI/pennylane-lightning

The PennyLane-Lightning plugin provides a fast state-vector simulator written in C++ for use with PennyLane

cuda distributed-computing gpu hpc mpi openmp parallel quantum-computing quantum-machine-learning rocm

Last synced: 17 Nov 2024