An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/PINTO0309/Open3D-build

Provides Docker build sequences of Open3D for various environments.

cuda docker jetson jetson-nano open3d open3d-python pytorch tensorflow

Last synced: 20 Mar 2025

https://github.com/YdrMaster/cuda-driver

A CUDA runtime environment based on the CUDA Driver API

cuda nvidia

Last synced: 14 May 2025

https://github.com/Bruce-Lee-LY/matrix_multiply

Several common methods of matrix multiplication are implemented on CPU and Nvidia GPU using C++11 and CUDA.

coppersmith-winograd cpp11 cpu cublas cuda kahan matrix-multiply naive nvidia reordering shared-memory strassen tiling

Last synced: 14 May 2025

https://github.com/dark-art108/gpu-docker-deployment-text-summarization

Text Summarization using Transformer on GPU Docker Deployment

cuda docker fastapi gpu-acceleration huggingface

Last synced: 02 May 2025

https://github.com/lona-cn/vision-simple

A lightweight, cross-platform C++ vision inference library that supports YOLOv10, YOLOv11, PaddleOCR, and EasyOCR, using ONNXRuntime/TVM with multiple execution providers.

cuda directml easyocr ocr onnxruntime paddleocr tensorrt-inference tvm yolo

Last synced: 16 Oct 2025

https://github.com/lnstadrum/fastaugment

A handy data augmentation toolkit for image classification put in a single efficient TensorFlow/PyTorch op.

augmentation-transformations brightness-correction cuda cutout data-augmentation gamma-correction gpu mixup perspective-distortions tensorflow-op

Last synced: 23 Mar 2025

https://github.com/ivanrs297/cuda-spmv-csr

Parallel SpMV using CSR representation, built in CUDA

csr cuda parallel-computing spmv

Last synced: 20 Jun 2025

https://github.com/krassowski/gsea-api

Pandas API for multiple Gene Set Enrichment Analysis implementations in Python (GSEApy, cudaGSEA, GSEA)

bioinformatics cuda enrichment gene-set-enrichment gene-sets gsea pandas pathway-analysis python3 transcriptomics

Last synced: 13 Apr 2025

https://github.com/bruce-lee-ly/matrix_multiply

Several common methods of matrix multiplication are implemented on CPU and Nvidia GPU using C++11 and CUDA.

coppersmith-winograd cpp11 cpu cublas cuda kahan matrix-multiply naive nvidia reordering shared-memory strassen tiling

Last synced: 13 Apr 2025

https://github.com/pinto0309/open3d-build

Provides Docker build sequences of Open3D for various environments.

cuda docker jetson jetson-nano open3d open3d-python pytorch tensorflow

Last synced: 06 May 2025

https://github.com/lennyerik/cutransform

CUDA kernels in any language supported by LLVM

c cuda gpgpu gpu-compute llvm llvm-ir nvidia ptx rust zig

Last synced: 06 May 2025

https://github.com/owensgroup/mvgpubtree

GPU B-Tree with support for versioning (snapshots).

b-tree concurrent cuda gpu snapshot versioning

Last synced: 15 Jun 2025

https://github.com/shivaraj-bh/ollama-flake

Run ollama natively - powered by Nix

cuda flakes nix ollama open-webui rocm services

Last synced: 01 May 2025

https://github.com/shadyboukhary/gpu-research-fft-openacc-cuda

Case studies constitute a modern, interdisciplinary, and valuable teaching practice that plays a critical role in the development of new skills and the formation of new knowledge. This research studies the behavior and performance of two widely adopted scientific kernels, a Fast Fourier Transform and Matrix Multiplication. Both routines are implemented in the two currently most popular many-core programming models, CUDA and OpenACC. A Fast Fourier Transform (FFT) samples a signal over a period of time and divides it into its frequency components, computing the Discrete Fourier Transform (DFT) of a sequence. Unlike the traditional approach to computing a DFT, FFT algorithms reduce the complexity of the problem from O(n²) to O(n log₂ n). Matrix multiplication is a cornerstone routine in Mathematics, Artificial Intelligence, and Machine Learning. This research also shows that the nature of the problem plays a crucial role in determining which many-core model will provide the highest performance benefit.

acceleration cuda fast-fourier-transform fft gpu-acceleration gpu-computing gpu-programming nvcc openacc parallel-computing pgi pgi-compiler radix-2

Last synced: 07 Aug 2025
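
The complexity reduction mentioned in the gpu-research-fft-openacc-cuda description above can be illustrated with a small CPU-only sketch. This is not code from that repository: the function names `naive_dft` and `fft_radix2` and the sample signal are assumptions chosen for illustration, and the FFT shown is a plain recursive radix-2 Cooley-Tukey variant rather than a CUDA or OpenACC kernel.

```python
import cmath


def naive_dft(x):
    """Evaluate the DFT definition directly: n outputs, n terms each, so O(n^2)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT, O(n log n); len(x) must be a power of two."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    # Split into even- and odd-indexed halves and transform each recursively.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the two half-size transforms.
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


# Hypothetical sample input, zero-padded to a power-of-two length.
signal = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
slow = naive_dft(signal)
fast = fft_radix2(signal)

# Both routines compute the same transform; only the operation count differs.
max_diff = max(abs(a - b) for a, b in zip(slow, fast))
print(f"largest difference between the two results: {max_diff:.2e}")
```

Both functions return the same spectrum up to floating-point rounding. The direct version does n multiplications for each of n outputs, while the recursive version halves the problem at each of log₂ n levels.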

https://github.com/jishanshaikh4/cuda-programs

CUDA Programs for Hadoop/CUDA Lab at MANIT, Bhopal

c cuda hadoop

Last synced: 25 Apr 2025

https://github.com/romnn/microgpusim

Cycle-level, trace-driven, parallel GPU simulator for NVIDIA Pascal.

cuda cycle-level design-space-exploration gpgpu gpu nvbit nvidia performance-engineering rust simulation trace-driven

Last synced: 27 Jul 2025

https://github.com/cea-hpc/HARP

Small tool for profiling the performance of hardware-accelerated Rust code using OpenCL and CUDA

cuda gpgpu-computing hpc opencl rust

Last synced: 14 May 2025

https://github.com/johnaparker/pybind_examples

Examples of pybind11 based projects (using cmake)

cmake cpp cuda mpi openmp pybind11 python

Last synced: 19 Apr 2025

https://github.com/thecomputekid/premake5-cuda

Premake5 module that enables CUDA development in Visual Studio using the native CUDA Toolkit integration.

cuda premake-module premake5 visual-studio

Last synced: 14 Apr 2025

https://github.com/pinto0309/jetson-tensorflow-pytorch-build

Provides an environment for compiling TensorFlow or PyTorch with CUDA for aarch64 on an x86 machine. This is for Jetson. If you build using an EC2 m6g.16xlarge (aarch64) instance, TensorFlow can be fully built in about 30 minutes. It can be used as a cross-compilation environment not only for TensorFlow and PyTorch, but also for various other packages and libraries.

cross-compile cuda docker jetson jetson-nano l4t pytorch tensorflow

Last synced: 07 May 2025

https://github.com/krk/cuda-webcam

Webcam Image Processing with CUDA using OpenCV

c-plus-plus cuda opencv

Last synced: 04 Apr 2025

https://github.com/prg-titech/ikra-ruby

A Rubygem for array-based scientific computations using GPGPU

arrays compiler cuda gpgpu ruby

Last synced: 25 Apr 2025

https://github.com/miladfa7/install-tensorflow-gpu-2.1.0-on-linux-ubuntu-18.04

Easily install TensorFlow-GPU 2.1.0 on Ubuntu 18.04 Linux with CUDA 10 and cuDNN 7.6.5 | Download package dependencies via direct links

cuda cudnn install-tensorflow linux python tensoflow tensorflow-gpu ubuntu1804

Last synced: 15 Aug 2025

https://github.com/marshallward/optiflop

Optiflop measures the optimally achievable FLOPs for mathematical operations on various platforms.

avx avx2 avx512 cuda roofline vectorization x86

Last synced: 02 Apr 2026

https://github.com/ai-dock/pytorch

PyTorch docker images for use in GPU cloud and local environments. Includes AI-Dock base for authentication and improved user experience.

ai cuda docker jupyter machine-learning python pytorch rocm runpod syncthing vast

Last synced: 09 May 2025

https://github.com/nolmoonen/cuda-lbvh

CUDA implementation of a linear bounding volume hierarchy (LBVH).

cuda lbvh path-tracer

Last synced: 29 Jul 2025

https://github.com/AnyDSL/traversal

AnyDSL traversal code

amdgpu bvh cuda gpu nvvm raytracing traversal

Last synced: 29 Jul 2025

https://github.com/ogrecave/ogre-gpgpu

GPGPU compute with Ogre using CUDA or OpenCL

cuda gpgpu-computing ogre3d opencl

Last synced: 25 Aug 2025

https://github.com/elinliu0/studentbehaviordetection

Shenyang University: student behavior detection code repository (based on YOLOv8 + CV-CUDA + TensorRT)

cuda cv-cuda python tensorrt yolov8

Last synced: 13 Jun 2025

https://github.com/egecetin/libkaleidoscope

A library for creating a kaleidoscope effect on images with CUDA. It can be built on all platforms using CMake.

c cpp cuda image-filter image-filtering image-manipulation image-processing kaleidoscope python real-time real-time-processing video-filter video-filtering video-processing

Last synced: 14 Apr 2025

https://github.com/superlinear-ai/python-gpu

🐳 Python GPU adds a minimal install of CUDA and cuDNN on top of the official python:3.x-slim base image

cuda cudnn docker docker-image python

Last synced: 27 Apr 2025

https://github.com/gianlucapaolocci/background-subtraction-on-gpu-with-cuda-and-opencv

A simple, efficient, and fast method for computing motion and background dynamically using the power of NVIDIA GPUs

background-subtraction cuda image-processing nvidia opencv parallel-computing

Last synced: 07 May 2025

https://github.com/cms-patatrack/cluestering

Density-based clustering algorithm developed at CERN

alpaka cern clustering cpp cuda pybind11 python tbb

Last synced: 10 Apr 2025

https://github.com/gjbex/python-on-gpus

Repository for the training on using GPUs from Python.

cuda cupy gpu numba pycuda python training

Last synced: 12 Sep 2025

https://github.com/l4nos/php-cuda

An extension for PHP that allows it to access GPU operations on CUDA graphics cards (NVIDIA)

cuda cuda-kernels cuda-php php php-dll php-ext php-extension

Last synced: 26 Aug 2025

https://github.com/theochem/cugbasis

High performance CUDA/Python library for computing quantum chemistry density-based descriptors for larger systems using GPUs.

atoms-in-molecules computational-chemistry conceptual-dft cuda electron-density gpu python qtaim quantum quantum-chemistry theoretical-chemistry

Last synced: 17 Jan 2026

https://github.com/brownbiomechanics/autoscoper

Autoscoper is a 2D-3D image registration software package.

autoscoper biomechanics cuda hpc-server medical-imaging radiography tracking

Last synced: 01 Apr 2025

https://github.com/bigsk1/podcast-ai

AI podcast summary of a YouTube video using Anthropic or xAI and ElevenLabs voices

ai-podcast anthropic-claude claude-ai claude-api cuda cudnn elevenlabs elevenlabs-api faster-whisper ffpmeg podcast review-tools xai xai-api youtube yt-dlp

Last synced: 18 Sep 2025

https://github.com/3p3r/pf-localization

Localization using a Particle Filter (and random walk model)

cuda localization matlab particle-filter slam

Last synced: 02 Apr 2025

https://github.com/z3lx/waifu2x-tensorrt

TensorRT implementation of the waifu2x super-resolution model for faster image and video upscaling.

anime cpp cuda cudnn image-upscaling machine-learning neural-network nvidia super-resolution tensorrt upscaling video-upscaling waifu2x

Last synced: 17 Jan 2026

https://github.com/mvisat/mcc-cuda

Implementation of Minutia Cylinder-Code with CUDA for Fingerprint Matching

cuda fingerprint mcc minutia

Last synced: 19 Apr 2025

https://github.com/aperim/docker-nvidia-cuda-ffmpeg

A Docker container with ffmpeg that supports scale_cuda, among other things

cuda ffmpeg gpu hacktoberfest nvidia

Last synced: 25 Dec 2025

https://github.com/mortvest/hastl

HaSTL: A fast GPU implementation of STL decomposition with missing values and support for both CUDA and OpenCL

cuda forecasting gpu opencl time-series time-series-analysis

Last synced: 05 Mar 2025

https://github.com/cea-hpc/harp

Small tool for profiling the performance of hardware-accelerated Rust code using OpenCL and CUDA

cuda gpgpu-computing hpc opencl rust

Last synced: 13 May 2025

https://github.com/acfr/gpu-ray-surface-intersection-in-cuda

A GPU-based ray-surface intersection test implemented in CUDA

cuda gpu

Last synced: 18 Feb 2026

https://github.com/yalue/cudabrot

A CUDA renderer for the Buddhabrot fractal

amd buddhabrot buddhabrot-fractal cuda gpu hip mandelbrot mandelbrot-fractal rocm

Last synced: 07 May 2025

https://github.com/bruce-lee-ly/cuda_back2back_hgemm

Use tensor core to calculate back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instruction.

back2back-gemm back2back-hgemm cublas cuda fused-gemm fused-hgemm gemm gpu hgemm matrix-multiply nvidia tensor-core

Last synced: 13 Apr 2025

https://github.com/minnukota381/cuda-parallel-c-programming

This repository contains various CUDA C programs demonstrating parallel computing techniques using NVIDIA's CUDA platform.

cuda cuda-programming hpc nvcc nvidia

Last synced: 30 Jun 2025

https://github.com/boyan-soubachov/excelerator

A Microsoft Excel add-in that speeds up calculation.

calculation-speed cuda excel formulae gpgpu microsoft

Last synced: 11 Apr 2025

https://github.com/tomaszrewak/rotatingvoxels

In this project I use C#, Alea GPU and OpenGL.Net to create a simple, hardware-accelerated, 3d animation of rotating cubes.

alea-gpu-library csharp cuda gpu opengl voxel

Last synced: 21 Apr 2025

https://github.com/feifeibear/pstensor

PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.

cuda deeplearning machinelearning pytorch tensorflow2

Last synced: 08 May 2026

https://github.com/balos1/shi_tomasi_feature_detection

CUDA, OpenMP, and regular serial C implementations of Shi Tomasi feature detection

cuda image-processing openmp shi-tomasi-detection

Last synced: 30 Aug 2025

https://github.com/qureshizawar/cuda-quartic-solver

A general cubic equation solver and quartic equation minimisation solver written for CPU and Nvidia GPUs. For more details and results, see https://arxiv.org/abs/1903.10041. The library is available for C++/CUDA as well as Python using Pybind11.

cmake cubic-equations cuda cuda-quartic-solver gpu minimisation numpy nvidia-gpus openmp optimization pip pybind11 python quartic quartic-equations quartic-functions quartic-minimisation solver

Last synced: 18 Jul 2025

https://github.com/okerew/neural-web

This repository presents a neural network structure that is an alternative to modern ones, inspired by the brain, its creativity, and its workings.

alternative architecture biology c cpu cuda gpu innovative kernel machine-learning markdown metal neural neural-network neuron objc shader structure

Last synced: 27 Jul 2025

https://github.com/fluiddyn/fluidfft

Common API (C++ and Python) for Fast Fourier Transform HPC libraries (publish-only mirror)

cuda cython fft fftw3-binding mpi pythran spectral-methods

Last synced: 27 Aug 2025

https://github.com/denzp/rust-ptx-support

Experiments with achieving better ergonomics in Rust CUDA workflow

cuda rust

Last synced: 25 Oct 2025

https://github.com/nikhilmukraj/spiking-neural-networks

Implementations of various simulations for integrate and fire models, as well as conductance based models with synaptic neurotransmission

biological-neural-networks biological-neurons computational-biology cuda hodgkin-huxley-neuron izhikevich-neurons neuroscience python rust

Last synced: 15 Jun 2025

https://github.com/wang-xinyu/cudcnv2

A pure CUDA implementation of the DCNv2 (deformable convolution) forward pass, with no dependency on cuTorch (THC).

cuda dcnv2

Last synced: 25 Mar 2025

https://github.com/pvgupta24/graph-betweenness-centrality

Parallelizing Graph Betweenness Centrality with CUDA

betweenness-centrality cuda graphs

Last synced: 12 Apr 2025

https://github.com/abelcarreras/cuda_functions

Python functions to calculate the FFT and autocorrelation function using the GPU (CUDA)

autocorrelation-functions complex cuda cuda-functions fft gpu power-spectrum pypi python-api

Last synced: 12 Apr 2025

https://github.com/stellar-group/blaze

Fork of the Blaze library for compatibility with Blaze CUDA · https://bitbucket.org/blaze-lib/blaze · https://github.com/STEllAR-GROUP/blaze_cuda

cpp cpp14 cuda hpc linear-algebra metaprogramming

Last synced: 30 Apr 2025

https://github.com/daschr/cuda_firewall

Implementing a Firewall using dpdk and CUDA

cuda dpdk firewall

Last synced: 10 Apr 2025

https://github.com/noahgift/nuclear_powered_command_line_tools

Nuclear Powered Command-Line Tools

cuda jit machine-learning numba python

Last synced: 28 Oct 2025

https://github.com/enp1s0/cumpsgemm

Fast SGEMM emulation on Tensor Cores

cuda fp32 gemm gpu half-precision mixed-precision tensorcore tensorcores

Last synced: 09 Apr 2025

https://github.com/dusanerdeljan/tensor-math-library

Header-only, lazy-evaluation tensor math library with multi-backend parallel eager execution support (TBB, OpenMP, Parallel STL, and in the future CUDA and OpenCL)

cuda eager-execution lazy-evaluation matrix-library opencl openmp parallel-computing tbb tensor-library

Last synced: 28 Oct 2025

https://github.com/sukunis/cunfft

Nonequispaced FFTs on GPUs (based on NFFT: http://www.nfft.org)

cuda cunfft gpu nfft

Last synced: 21 Aug 2025

https://github.com/kekeblom/mpm

A simple CUDA accelerated material point method simulation.

computer-graphics cpp cuda docker mpm opengl physically-based-simulation physics-simulation simulations

Last synced: 12 Apr 2025

https://github.com/shapelets/shapelets-compute

Shapelets Compute is an accelerated platform for time series analysis

cuda matrixprofile opencl time-series

Last synced: 06 May 2025

https://github.com/tgautam03/xfilters

GPU (CUDA) accelerated filters using 2D convolution for high resolution images.

2d-convolution c cpp cuda cuda-programming gpu-acceleration gpu-computing gpu-programming image-filters image-processing

Last synced: 10 Oct 2025

https://github.com/dkobylianskii/torch-lap-cuda

A fast CUDA implementation of the Linear Assignment Problem (LAP) solver for PyTorch.

cuda python pytorch

Last synced: 05 May 2026

https://github.com/finmath/finmath-lib-cuda-extensions

Classes enabling finmath-lib to run its Monte-Carlo models on CUDA GPUs

cuda finmath-lib gpu

Last synced: 05 May 2025

https://github.com/marklysze/llamaindex-rag-linux-cuda

Examples of RAG using LlamaIndex with local LLMs on Linux - Gemma, Mixtral 8x7B, Llama 2, Mistral 7B, Orca 2, Phi-2, Neural 7B

cuda gemma gemma-2b gemma-7b linux llama-2 llamaindex microsoft-phi-2 mistral-7b mixtral mixtral-8x7b neural-7b neural-chat-7b orca-2 phi-2 retrieval-augemented-generation ubuntu yi-34b

Last synced: 23 Jun 2025

https://github.com/phrb/gpu-autotuning

Autotuning NVCC Compiler Parameters, published @ CCPE Journal

autotuning cuda nvcc opentuner

Last synced: 06 Jul 2025

https://github.com/jatinx/pyhip

Python Interface to HIP and hiprtc Library

bindings cuda gpu hip hiprtc python rocm

Last synced: 21 Sep 2025

https://github.com/pennylaneai/lightning-on-hpc

"Hybrid quantum programming with PennyLane Lightning on HPC platforms" accompanying data and workloads

cpp20 cuda gpu hpc mpi openmp python quantum quantum-computing rocm supercomputing

Last synced: 10 Jun 2025

https://github.com/veriblock/nodecore-pow-cuda-miner

VeriBlock CUDA PoW Miner

cuda ptx vblake veriblock

Last synced: 03 Mar 2026

https://github.com/yhmtsai/ci_windows_cuda

This repo creates the Dockerfiles for using CUDA in Windows Docker and provides the GitLab/GitHub Windows shared VM runner config.

continuous-integration cuda docker github-actions gitlab windows

Last synced: 14 Apr 2025

https://github.com/tank3-tk3/procesamiento-imagenes-cuda-opencv

Image processing with CUDA and OpenCV

cuda image-processing opencv

Last synced: 11 Jul 2025

https://github.com/dexter2206/ising

Ising: a Python package for exactly solving arbitrary Ising model instances using exhaustive search.

cuda ising optimization

Last synced: 02 Jul 2025