An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/maliknaik16/parallel-computing

CUDA programming in C++ for high-performance computing using NVIDIA GPUs, optimized for tasks like machine learning or image processing

cores cpp cuda gpu makefile matrix nvcc optimization

Last synced: 10 Jun 2025

https://github.com/xiongsp/pytorch-docker

Pure PyTorch Docker images. Supports almost all combinations of PyTorch, Python, Ubuntu, CentOS, and CUDA versions.

centos cuda docker docker-image python3 pytorch ubuntu

Last synced: 17 Apr 2026

https://github.com/stdogpkg/cukuramoto

A Python/CUDA package that numerically solves the Kuramoto model using Heun's method

complex-networks cuda kuramoto-model

Last synced: 28 Jan 2026

https://github.com/babak2/optimizedsum

Optimized Parallel Sum program demonstrating CPU vs GPU performance

cuda cuda-programming gpu-acceleration gpu-computing gpu-parallelism visual-studio

Last synced: 27 Mar 2025

https://github.com/droduit/multiprocessor-architecture

Introduction to Multiprocessor Architecture @ EPFL

cuda multiprocessor multithreading openmp-parallelization

Last synced: 17 Apr 2026

https://github.com/l1cacheDell/CUDA_Code

Code for learning CUDA. Implementations of multiple kernels.

cuda cuda-programming

Last synced: 10 Mar 2025

https://github.com/cfries/javagpuexperiments

Repository used to demo OpenCL, JOCL, JCuda.

cuda

Last synced: 25 Apr 2026

https://github.com/alexjmercer/fractal-art

Generating Fractals in C++ using SFML. For the ultimate visual stimulation and in-depth code!

cmake cmakelists cpp20 cuda cuda-programming fractal-rendering graphics mandelbrot multithreading sfml2

Last synced: 05 Mar 2026

https://github.com/mulx10/firefly

Enhancing object detection using thermal imaging for hard-to-identify objects with thin cross-sections (e.g., cyclists, pedestrians).

autonomous-cars autonomous-navigation autonomous-vehicles c cuda object-detection thermal-camera yolov3

Last synced: 03 Sep 2025

https://github.com/programmer-rd-ai/detectx

A Pythonic approach to object detection using Detectron2, a clean, modular framework for training and deploying computer vision models. DetectX simplifies the complexity of object detection while maintaining high performance and extensibility.

coco-dataset computer-vision computer-vision-library cuda deep-learning detectron2 faster-rcnn gpu-accelerated machine-learning ml-framework object-detection object-recognition python3 pytorch retinanet

Last synced: 10 Jun 2025

https://github.com/headless-start/data-augmentation-impact

This repository demonstrates the effect of augmenting the training set during model training.

augmented-images cuda data gpu keras matplotlib mnist opencv-python python3 tensorflow training-data

Last synced: 05 Apr 2026

https://github.com/digimortl/libguess

Patches that give Bitcoin Core the ability to mine with CUDA

bitcoin c-plus-plus cryptocurrency cuda

Last synced: 16 Apr 2026

https://github.com/juntyr/necsim-rust

Spatially explicit biodiversity simulations using a parallel library written in Rust

biodiversity cuda mpi necsim rust simulation

Last synced: 22 Mar 2025

https://github.com/zeloe/juce_cuda_convolution

GPU acceleration for efficient, high-quality audio processing.

audio audio-processing convolution cuda dsp juce

Last synced: 03 Mar 2026

https://github.com/tank3-tk3/pi-calculation-cpu-gpu

PI calculation with CPU and GPU

c cpp cuda parallel-computing pi

Last synced: 13 Apr 2026

https://github.com/alpinebuster/arkime-docker-compose

Deploy Arkime with GPU-accelerated Rust/Python parsers and custom plugins using Docker Compose.

arkime c cuda deep-neural-networks docker docker-compose llm machine-learning networking pcap pcapng python rust traffic-analysis

Last synced: 16 Apr 2026

https://github.com/andreabak/whispersubs

Generate subtitles for your video or audio files using the power of AI

ai cuda deep-learning gpu-acceleration machine-learning srt subtitles transcribe transcription translate whisper

Last synced: 15 Feb 2026

https://github.com/demoriarty/doksparse

Sparse DOK tensors on GPU, PyTorch

cuda pytorch sparse

Last synced: 28 Jun 2026

https://github.com/orlandopalmeira/trabalho-cp-2023-2024

Repository for the practical assignment of the Parallel Computing (CP) course unit, Master's in Informatics Engineering (MEI/MIEI), University of Minho (UMinho)

computacao-paralela cp cuda cuda-programming mei miei nvidia nvidia-cuda openmp optimization optimization-problem parallelism performance uminho uminho-mei uminho-miei

Last synced: 18 May 2026

https://github.com/mazharuddin-mohammed/semidgfem

High-performance TCAD Simulator Using Discontinuous Galerkin FEM

cuda discontinuous-galerkin-method tcad tcad-device-simulator

Last synced: 15 Jun 2025

https://github.com/tthebc01/cudaconda3

Lightweight container environment with CUDA, Miniconda3, and JupyterLab.

cuda docker gpu jupyterlab marimo-notebook miniconda3 reverse-proxy-application

Last synced: 11 Feb 2026

https://github.com/andreasholt/cusmc

A CUDA-accelerated Statistical Model Checker for Stochastic Timed Automata

cuda smc

Last synced: 11 Feb 2026

https://github.com/betarixm/cuecc

POSTECH: Heterogeneous Parallel Computing (Fall 2023)

cryptography ctypes cuda ecc postech secp256k1

Last synced: 12 May 2025

https://github.com/hanzhi713/bitonic-sort

In-place GPU sort with bitonic sort

bitonic-sort cuda gpu in-place sorting

Last synced: 09 Feb 2026

https://github.com/fattorib/thunderkittens-simple-gemm

Simple Tensorcore GEMM in ThunderKittens

cuda gemm gpu thunderkittens

Last synced: 09 Feb 2026

https://github.com/lukasboettcher/msc-code

This is the repo for my master's thesis on GPU-accelerated Andersen analysis.

andersen-analysis clang cuda llvm static-analysis

Last synced: 16 Jan 2026

https://github.com/wendylabsinc/tensorrt-swift

TensorRT Swift 6.2 Bindings for Linux

cuda nvidia swift tensor tensorrt

Last synced: 01 Feb 2026

https://github.com/dpbm/qml-course

Mini-course on quantum machine learning

cuda cuda-q cuquantum docker ml python qml quantum quantum-computing tensorflow

Last synced: 31 Jan 2026

https://github.com/gjbex/gpu-programming

Material for a training on portable GPU programming

cuda gpu kokkos openmp openmp-off stl thrust

Last synced: 08 Feb 2026

https://github.com/elftausend/sliced

Array operations with automatic differentiation on CPU and GPU

autograd automatic-differentiation cuda custos matrix opencl

Last synced: 31 Jan 2026

https://github.com/seungjaelim/cuda.tutorial

References content from the OLCF CUDA Training Series. (https://github.com/olcf/cuda-training-series)

cuda gpu-programming nsight-compute nsight-systems

Last synced: 07 Feb 2026

https://github.com/copperfr/blendervxkex

Windows 7 CUDA & OptiX support for Blender 4.x

blender cuda cycles-renderer optix vxkex windows-7

Last synced: 20 Jan 2026

https://github.com/infotrend-inc/ctpo-demo_projects

Jupyter Notebook examples using CTPO as their source container.

cuda opencv pytroch tensorflow2

Last synced: 14 Apr 2026

https://github.com/trahay/mpi-wattmeter

MPI-Wattmeter measures the power consumption of MPI programs

carbon-emissions cuda energy-consumption energy-monitor gpu hpc mpi

Last synced: 17 May 2026

https://github.com/tawssie/zmpy3d_pt

Python implementation of 3D Zernike moments with PyTorch

3d-zernike cuda gpu protein-structure python pytorch structural-bioinformatics superposition zernike-moments

Last synced: 24 Oct 2025

https://github.com/openspeedshop/cbtf-argonavis-gui

Baseline for the next-generation Open|SpeedShop Graphical User Interface (GUI). The primary focus of this GUI will be the processing and display of CUDA collector performance data. However, there will be refactoring phases to adapt the GUI to support the processing and display of any collector performance data.

cuda performance profiler profiling

Last synced: 18 Apr 2026

https://github.com/tortillazhawaii/rr_sort

Various sorting implementations using distributed and parallel methods

bazel cpp cuda java openmp spark threads

Last synced: 14 Apr 2026

https://github.com/bonj4/wiki

This repository contains documentation and installation scripts for various tools and libraries.

cuda pangolin pybind11 sfm tensorrt

Last synced: 17 Jan 2026

https://github.com/tyler-romero/aegae

Learning Triton / CUDA

cuda triton

Last synced: 11 Apr 2026

https://github.com/hadv/vaneth

GPU-accelerated CREATE2 vanity address miner for Ethereum

create2-contract-deployment cuda ethereum gpu gpu-acceleration gpu-programming open-cl vanity-address

Last synced: 21 Jan 2026

https://github.com/kpetridis24/four-russians-algorithm

Boolean matrix multiplication accelerated by the four-Russians algorithm

c cuda gpu high-performance matrix-multiplication preprocess

Last synced: 29 May 2026

https://github.com/bdwhst/fluora

A CUDA PBR path tracer

cpp cuda pathtracing pbr rendering

Last synced: 13 Feb 2026

https://github.com/matthewfeickert/cuda-tf-torch

An Ubuntu 18.04 NVIDIA Docker image with CUDA 10.1, cuDNN 7, TensorFlow, and PyTorch

cuda cuda-101 cudnn cudnn-v7 docker docker-image gpu nvidia-docker nvidia-gpu pytorch tensorflow torch

Last synced: 07 Jan 2026

https://github.com/toxy4ny/artaxerxes

Artaxerxes - Adaptive High-Performance Stress Tester v.1.0. A rebuild of the old Xerxes DDoS tool. Supports GPU+io_uring, DPDK, and eBPF/XDP with intelligent fallbacks. Educational tool for advanced cybersecurity labs.

cuda cuda-programming cybersecurity cybersecurity-education cybersecurity-tools dpdk ebpf educational high-performance network-security network-security-tool penetration-testing penetration-testing-framework penetration-testing-tools security-tools stress-testing

Last synced: 08 Oct 2025

https://github.com/alpha74/cuda_basics

NVIDIA NVCC CUDA programs for beginners.

c cpp cuda cuda-programs nvcc nvidia parallel-computing parallel-programming

Last synced: 08 May 2026

https://github.com/tvanfossen/entropic

Local-first agentic inference engine in C/C++. Multi-tier model routing, grammar-constrained output, MCP tool servers. Embeddable via C ABI.

agentic-ai agentic-framework cpp cpp20 cuda edge-ai embedded-ai gbnf gguf grammar-constrained-decoding inference-engine llama-cpp llm local-llm mcp on-device-ai privacy-first tool-calling

Last synced: 30 May 2026

https://github.com/boltzmannentropy/vllm-5090

vLLM-5090: Docker Container for RTX 5090 on WSL2/Windows

5090 cuda docker vllm

Last synced: 08 Oct 2025

https://github.com/dujonwalker/nixos-config-x86_64-cuda

This repository contains my NixOS configuration optimized for 64-bit x86 systems with NVIDIA CUDA support, featuring a Plasma 6 desktop environment and a variety of essential applications for development, multimedia, and productivity. It serves as a backup for easy restoration and setup on new installations.

cuda flatpak nix nixos nixos-configuration ollama

Last synced: 17 Jan 2026

https://github.com/szymon423/tsp-cpu-vs-gpu

Simple brute force approach to solve travelling salesman problem with CPU and GPU

cuda tsp

Last synced: 11 Mar 2025

https://github.com/kar-dim/watermarking-gpu

Code for my diploma thesis at Information and Communication Systems Engineering (University of the Aegean, School of Engineering), titled "Efficient implementation of watermark and watermark detection algorithms for image and video using the graphics processing unit". Part 2 / GPU

arrayfire cpp cuda ffmpeg gpu image-processing opencl parallel-computing video-processing watermark-image watermarking

Last synced: 09 Apr 2025

https://github.com/yosh-matsuda/gpu-ptr

Cross-platform GPU smart pointer with C++20 range support

cpp cpp20 cuda gpu header-only hip

Last synced: 17 Jan 2026

https://github.com/brocbyte/realtime-deformations

Snow simulation (Material Point Method)

cuda glm material-point-method opengl

Last synced: 10 Aug 2025

https://github.com/kagof/julia-image-processing

Image processing programs written in Julia

cuda image-processing julia

Last synced: 18 May 2026

https://github.com/muhac/jupyter-pytorch-docker

JupyterLab for AI in Docker! Anaconda and PyTorch GPU supported.

conda-environment cuda docker jupyterlab pytorch

Last synced: 01 Oct 2025

https://github.com/eshibusawa/cupy-cuda

Learn CUDA programming essentials with CuPy, from basic kernels to advanced memory patterns

cooperative-thread-array cub cuda cupy gpu parallel-computing python

Last synced: 15 Jun 2025

https://github.com/ehsanmok/cs-521

UBC CS 521: Parallel Computing and Architectures

cuda erlang parallel-algorithm parallel-computing

Last synced: 16 May 2026

https://github.com/kchristin22/ising_model

Implementation of a cellular automaton on GPU using different features of CUDA

cellular-automaton cuda gpu-programming hpc ising-model parallel-computing

Last synced: 15 Mar 2025

https://github.com/emmanuelmess/firstcollisiontimesteprarefiedgassimulator

This simulator computes all possible intersections for a very small timestep for a particle model

cpp20 cuda simulator

Last synced: 17 Apr 2026

https://github.com/emilienmendes/gpgpu

Parallelization and optimization of point recognition in an image

cuda gpgpu parallel-programming

Last synced: 28 Oct 2025

https://github.com/yooodleee/hello-cuda

👽Nice to meet you, CUDA!👽

c cc cuda gpgpu multiprocessing

Last synced: 09 Apr 2026

https://github.com/amirbroker/cudadtw

Use CUDA with Numba for Dynamic Time Warping

cuda dtw dynamic-time-warping gpu numba

Last synced: 16 Apr 2026

https://github.com/umer-farooq-cs/canny-edge-detector

High-performance Canny edge detector with CPU and CUDA implementations. Loads PGM images, performs Gaussian smoothing, gradients, non-max suppression, and hysteresis. Benchmarks both paths, outputs edge maps, and reports speedup. Simple Makefile, sample images included.

c canny-edge-detection computer-vision cpp cuda gpu high-performance-computing image-processing nvcc pgm

Last synced: 18 Apr 2026

https://github.com/mre/talks

...mostly Computer Science related.

computer-science cuda talks tech-talks

Last synced: 28 Apr 2026

https://github.com/sd7campeon/yelp-sentiment-analysis-with-python-bs4-and-llm

A scalable pipeline for automated extraction, preprocessing, and sentiment analysis of Yelp reviews. Uses advanced HTTP requests, HTML parsing, and text normalization (tokenization, stopword removal, lemmatization) to enable precise polarity and subjectivity analysis for consumer insights and business analytics.

beautifulsoup beautifulsoup4 business-analytics cuda data-analysis nlp-machine-learning nltk opinion-mining pandas python python3 requests-library-python sentiment-analysis text-preprocessing textblob torch web-scraping yelp-reviews

Last synced: 06 May 2026

https://github.com/hyunjinno/multicore_computing

A repository of multicore programming in Java and C.

c cpp cuda java multithreading openmp thread thrust

Last synced: 18 Apr 2026

https://github.com/wallneradam/docker-ccminer

CCMiner (tpruvot version) Docker Builder

ccminer cuda docker gpu litecoin miner monero nvidia nvidia-docker

Last synced: 18 Apr 2026

https://github.com/jtompuri/weighted-voronoi-stippling

High-performance weighted Voronoi stippling implementation. Exports PNG and TSP files. Visualizes TSP tours as continuous line drawings.

computer-graphics cuda gpu-acceleration lloyd-relaxation numba python stippling traveling-salesman tsp voronoi

Last synced: 18 May 2026

https://github.com/microo8/micronn

Simple neural network library with backpropagation using CUDA

c cuda neural-network

Last synced: 19 May 2026

https://github.com/steleman/openai-triton

Fork of OpenAI's Triton compiler v3.4.0 using LLVM 21.1.0 / 21.1.1 on Fedora 41+

cuda fedora linux llvm mlir mlir-dialect openai rocm triton

Last synced: 08 Apr 2026

https://github.com/senli1073/docker-gpu-monitor

A lightweight GPU monitor designed for real-time web-based viewing of GPU server status.

container cuda docker flask gpu gpu-monitoring linux memory-usage nvidia-smi web

Last synced: 05 Apr 2026

https://github.com/mayukhdeb/patrick

Tiny neural net library written from scratch with CuPy (under construction)

cuda deep-learning gpu-computing machine-learning neural-network regression

Last synced: 08 May 2026

https://github.com/greg-tarr/fastsimplex

CUDA/MPS accelerated 2D & 3D simplex noise generation.

cuda mps noise-generator python simplex-noise

Last synced: 20 Apr 2026

https://github.com/shivendrra/axgrad

Lightweight tensor library that contains its own auto-diff engine, like PyTorch

autograd cuda pytorch scratch-implementation tinygrad

Last synced: 08 May 2026

https://github.com/rajarsheya/real-time-audio-feature-extraction-with-cuda-for-speech-recognition

This project accelerates MFCC extraction using CUDA for real-time speech recognition. Offloading the process to the GPU reduces latency and speeds up processing, enabling fast, local speech-to-text transcription for applications like virtual assistants, without cloud reliance.

audio-processing cpp cuda fourier-transform python

Last synced: 10 May 2026

https://github.com/5had3z/torch-discounted-cumsum-nd

PyTorch Discounted Cumsum with Autograd (CPU + CUDA)

cuda machine-learning pytorch

Last synced: 18 Apr 2026

https://github.com/xihuai18/image-processing-in-cuda

Implementation of image processing methods

cuda imageprocessing

Last synced: 04 Oct 2025

https://github.com/sohhamseal/scalable-systems-programs

A little less effort to learn parallel programming...

cuda mpi openmp

Last synced: 18 Apr 2026

https://github.com/rkv0id/automata-vtk

Multi-dimensional cellular automata visualization using Python's VTK bindings on top of CUDA-parallel grid updates.

cellular-automata cuda game-of-life python vtk

Last synced: 19 Apr 2026

https://github.com/bl33h/pythagoreantheorem

A program that applies the Pythagorean theorem to a large number of elements using GPU parallel processing.

arrays cuda kernel parallel-programming pythagoras pythagorean-theorem

Last synced: 19 May 2026

https://github.com/m-torhan/cuda-stl-renderer

CUDA C++ implementation of an STL file renderer using ray tracing

cuda

Last synced: 25 Feb 2026

https://github.com/makischristou/mandelbrot

Mandelbrot set visualizer using CUDA.

cpp cuda gpu mandelbrot nvidia renderer rust

Last synced: 09 Apr 2026

https://github.com/franciscoda/psvm

R package and C++ library that allows training SVM models in a GPU using CUDA and predicting out-of-sample data. A support vector machine (SVM) is a type of machine learning model that is trained using supervised data to classify samples.

cpp cpp17 cuda machine-learning r svm-classifier svm-training

Last synced: 18 Apr 2026

https://github.com/bokutotu/cudnn_graph_api_example

cuDNN graph API example

cuda cudnn cudnn-v8

Last synced: 04 May 2026

https://github.com/hatamiarash7/cuda-python

GPU programming using CUDA & Python

cuda gpu gpu-computing gpu-programming python

Last synced: 29 Apr 2026

https://github.com/rajarsheya/real-time-traffic-analysis-with-cuda-object-detection

Implemented CUDA-accelerated object detection (YOLO) to analyze a sample image dataset. Performed vehicle counting and simulated speed estimation to demonstrate real-time traffic analysis capabilities.

cpp cuda opencv python yolo

Last synced: 12 Apr 2026

https://github.com/bl33h/productoftwovectors

This code uses CUDA for parallel vector multiplication on a GPU, demonstrating the GPU's acceleration capabilities.

cuda gpu kernel paralelism parallel-programming product vector

Last synced: 16 May 2026

https://github.com/subatomicplanets/simplebitcoinminer

A simple Bitcoin C++ and CUDA solo miner

bitcoin cpp cryptocurrency cuda miner

Last synced: 19 Apr 2026

https://github.com/manishklach/thermal-observatory

A generic thermal observability framework for CPU, GPU, board, and platform telemetry across vendor APIs, kernel interfaces, and runtime correlation layers.

amd arm64 cuda linux nvidia nvml observability rocm telemetry thermal-framework thermal-monitoring x86-64

Last synced: 09 Jun 2026