CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: NVIDIA
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-23 00:07:15 UTC
https://github.com/zelosleone/audiobook-generator
A GPU-accelerated Python application that converts PDF and TXT documents into high-quality MP4 audio files using WhisperSpeech technology.
ai-audio audiobook cuda gpu-acceleration machine-learning pdf-converter python pytorch speech-synthesis text-processing text-to-speech
Last synced: 05 May 2026
https://github.com/hurbalurba/quick-llama.cpp-server
A framework for publishing a more modern CUDA image for llama.cpp, built with cuda13 for newer cards only and with RPC support. Started as an exercise in compiling a custom llama.cpp build.
cuda cuda13 devops docker dockerbuild gguf llamacpp llm rpc
Last synced: 05 May 2026
https://github.com/j89103138/yolov11-traffic-sign
This repository contains a YOLOv11 project for training, detection, and benchmarking of traffic signs. The project utilizes CUDA acceleration to enhance performance and efficiency in real-time traffic sign detection and evaluation.
cuda opencv python pytorch traffic traffic-sign yolov11
Last synced: 05 May 2026
https://github.com/barrrry1/claymore-s-dual-miner
Claymore's Dual Miner is a powerful GPU mining software designed for Ethereum (ETH) and simultaneous dual mining of coins like Decred, Siacoin, Pascal, and Lbry. It supports AMD and NVIDIA GPUs, leveraging OpenCL and CUDA optimization for maximum hashrate. Features include automatic GPU tuning, detailed statistics, and stability watchdog.
blockchain crypto-mining cryptocurrency cuda eth ethereum gpu-mining mining mining-pool opencl
Last synced: 05 May 2026
https://github.com/pauloruszel/yolo11_face_detection
cuda nvcc nvidia-gpu pip python3 pytorch widerface-dataset yolo11
Last synced: 05 May 2026
https://github.com/jakubfr4czek/concurrent-gauss-elimination
Concurrent Gaussian elimination algorithm implemented using trace theory. Parallelism is achieved using CUDA cores.
agh agh-ust agh-wi conda cuda cuda-kernels cuda-toolkit diekert-graph graphviz java python python3 traces-theory
Last synced: 05 May 2026
https://github.com/abdelrahman-amen/active_learning_with_different_query_strategies
This project explores the implementation of active learning techniques, focusing on various query strategies to optimize the selection of informative data points for model training. It aims to reduce the amount of labeled data required while improving model performance, especially in scenarios with limited labeled data.
activelearning cuda entropy kldivergence margin numpy python pyto uncertainty
Last synced: 06 May 2026
https://github.com/hritiksauw199/human-face-to-cartoon-conversion-using-optimized-cyclegan
Transform real human faces into cartoon-style images using a reduced CycleGAN architecture optimized for efficiency and quality.
cuda cyclegan data-science deep-learning deep-neural-networks gan human-cartoon matplotlib neural-network python pytorch torchvision
Last synced: 06 May 2026
https://github.com/iglee/jax-cuda-eicl-exp-docker
Docker for getting jax to work with cuda, for reproducing ml experiments like eicl. Sure, let's NOT make a compatibility matrix and let people fight for their lives on cuda
cuda docker jax jaxline ml-engineering ml-experiments tensorflow
Last synced: 06 May 2026
https://github.com/raiszo/cs334
Journey through Intro to Parallel Programming
Last synced: 06 May 2026
https://github.com/r00tens/text-classifier
Naive Bayes classifier for text classification with CPU and GPU (CUDA)
classification classifier cpp cuda machine-learning naive-bayes
Last synced: 06 May 2026
https://github.com/iamfaham/model-inference-profiler
A PyTorch-based tool for profiling deep learning model inference performance, analyzing computational bottlenecks, and visualizing resource utilization.
cuda memory pytorch visualizations
Last synced: 06 May 2026
https://github.com/rosnavigator/parallelkmeansimagecompressor
Parallel KMeans-based image quantization compressor that reduces the number of colors in an image while preserving visual quality. It uses KMeans clustering for color quantization and supports sequential, OpenMP, MPI, and CUDA implementations for performance and scalability. PoliMi - Advanced Methods for Scientific Computing (2023-2024)
boost clustering colors compression cuda image-quantization kmeans kmeans-clustering lossy-compression mpi odette opencv openmp parallel-computing parallel-programming performance polimi scalability sl-train
Last synced: 06 May 2026
https://github.com/sarodyatawatta/flagpol
Energy and polarization based interference mitigation
cuda energy mixed-precision polarization radio-frequency-interference radio-interferometry reinforcement-learning statistical-inference
Last synced: 06 May 2026
https://github.com/jamesnulliu/learning-programming-massively-parallel-processors
Learning notes for Programming Massively Parallel Processors, 4th edition.
Last synced: 06 May 2026
https://github.com/mka-codelake/wispy
Minimalist push-to-talk dictation tool for Windows. Faster Whisper, local, offline.
cuda dictation faster-whisper local offline portable push-to-talk python speech-to-text stt transcription voice-input whisper windows
Last synced: 06 May 2026
https://github.com/sebp/vscode-sycl-dpcpp-cuda
Sample project to use the VS Code Remote - Containers extension to develop SYCL applications for NVIDIA GPUs using the oneAPI DPC++ compiler.
cuda dpcpp fedora gpu-computing podman sycl vscode
Last synced: 06 May 2026
https://github.com/jpuigcerver/prob-phoc
Probabilistic relevance scores from PHOC embeddings
cuda keyword-spotting kws phoc pytorch
Last synced: 07 May 2026
https://github.com/drilonaliu/parallel-sierpinski-triangle
GPU-accelerated Sierpinski Triangle generation with CUDA and OpenGL interoperability.
cuda fractals gpu parallel-programming sierpinski-triangle
Last synced: 07 May 2026
https://github.com/yuuuuurei/yolo-sibi
Real-time SIBI hand gesture detection using YOLOv8 and deep learning classifiers.
bahasa-indonesia bahasa-isyarat cuda deep-learning hand-gesture hand-gesture-recognition pytorch real-time sibi sign-language yolo yolov8
Last synced: 07 May 2026
https://github.com/drilonaliu/parallel-koch-snowflake
GPU-accelerated Koch Snowflake generation with CUDA and OpenGL interoperability.
cuda fractals gpu koch-snowflake parallel-programming
Last synced: 07 May 2026
https://github.com/muhamadajiw/parallel-matrix-inversion
A parallel program for matrix inversion using MPI, OpenMP, and CUDA
Last synced: 07 May 2026
https://github.com/shreya888/learning-cuda-with-cpp-and-pytorch
My notes, code, & insights will be recorded here while learning CUDA with C++ and PyTorch
Last synced: 07 May 2026
https://github.com/stevenchang5/canny_edge
Implementation of Canny edge detection, with an option to use CUDA to improve performance
Last synced: 07 May 2026
https://github.com/rssr25/cuda
Following the book CUDA by Example.
cpp cuda cuda-programming hpc shaders
Last synced: 07 May 2026
https://github.com/wpjunior/cuda-numba-playground
Some uses of CUDA with the Numba framework
Last synced: 07 May 2026
https://github.com/pankajarm/ethereum-mining-cuda
cuda ethereum ethereum-mining ethminer ubuntu1604
Last synced: 08 May 2026
https://github.com/not-ml/ml-3
A PyTorch-based Convolutional Neural Network (CNN) for image classification using the CIFAR-10 dataset, featuring advanced architecture, data augmentation, GPU support, and dynamic learning rate scheduling.
ai cifar10 cnn cuda gpu image-classification machine-learning modeltraining python pytorch torchvision
Last synced: 08 May 2026
https://github.com/jimmygizmo/tensorpup
Machine-learning model training using parallelization strategies on multiple serverless GPU instances.
ai cuda cudnn distributed gpu serverless tensorflow
Last synced: 08 May 2026
https://github.com/popke523/rybki
A 3D shoal of fish animation using the boids algorithm, OpenGL for rendering and CUDA for parallel processing.
Last synced: 08 May 2026
https://github.com/leo27945875/parallel_pso
cpp cuda openmp parallel-programming particle-swarm-optimization pthread pybind11 python
Last synced: 08 May 2026
https://github.com/sydney-informatics-hub/computer-vision-fine-tuning
Fine-tune a computer vision model to solve your task locally, on HPC, in a container, or in the cloud!
computer-vision cuda deep-learning python
Last synced: 09 May 2026
https://github.com/sugarcane-mk/finetuning_wav2vec2
This repo provides a step-by-step process, starting from scratch, for fine-tuning Facebook's wav2vec2-large model using Transformers
asr asr-model cuda facebook fairseq fine-tuning finetuning huggingface librosa python torch transformers wav2vec2 wav2vec2-large-960h
Last synced: 09 May 2026
https://github.com/ginkobalboa/parfis
Particles and field simulator. Written in C++ with Python bindings. The algorithm is based on the particle-in-cell (PIC) method used for interacting many-particle systems.
cpp cuda physics-simulation python
Last synced: 09 May 2026
https://github.com/dbklim/optimized_tensorflow_wheels
Optimized versions of TensorFlow and TensorFlow-GPU for specific CPUs and GPUs (both old and new).
cuda nvidia-cuda nvidia-gpu tensorflow tensorflow-community-wheels tensorflow-gpu tensorflow-packages tensorflow-whells wheels
Last synced: 09 May 2026
https://github.com/lfrati/subpair
Fast pairwise cosine distance calculation and numba accelerated evolutionary matrix subset extraction 🍐🚀
Last synced: 09 May 2026
https://github.com/nick8592/ubuntu-20.04-cuda-cudnn-pytorch
cuda cuda-toolkit cudnn python3 pytorch ubuntu2004
Last synced: 09 May 2026
https://github.com/xorengine/marvin4000
Real-time audio translation using Whisper + SeamlessM4T / NLLB-200
ai asr audio-processing consumer-hardware cuda gpu-accelerated machine-learning multilingual nllb nmt pytorch real-time seamlessm4t speech-recognition transcription translation whisper
Last synced: 09 May 2026
https://github.com/starlitdreams/lunar-landing
This project implements a DQN agent using PyTorch to solve the LunarLander-v2 environment from OpenAI Gym. The agent learns to control the lunar lander using experience replay and a target network, aiming to maximize rewards by landing smoothly. Uses CUDA for computation.
artificial-intelligence cuda deep-learning gymnasium neural-network neural-networks numpy nvidia-gpu python python3 torch
Last synced: 09 May 2026
https://github.com/donaurelio/ansible-playbooks
A bunch of Ansible playbooks that automate computer infrastructure provisioning
ansible-playbooks cuda docker gromacs openmpi
Last synced: 09 May 2026
https://github.com/michaelfranzl/image_fah-client
Dockerfile for Folding@home client with AMD and Nvidia GPGPU support
container cuda debian docker foldingathome gpu-computing opencl
Last synced: 09 May 2026
https://github.com/edumucelli/build-tensorflow
Build Tensorflow from source using a Dockerfile
Last synced: 10 May 2026
https://github.com/chris-official/pytorchgaf
PyTorch accelerated GAF transform
cuda gpu gramian-angular-fields image-analysis python pytorch time-series
Last synced: 10 May 2026
https://github.com/neuraladitya/neural_network_c
Neural Network C is an advanced neural network implementation in pure C, optimized for high performance on CPUs and NVIDIA GPUs.
artificial-intelligence bayesian-optimization c-programming convolutional-neural-networks cuda deep-learning encryption gpu-computing high-performance-computing machine-learning mpi multi-gpu neural-network openmp parallel-computing quantization real-time-monitoring secure-computing tensor-cores transformers
Last synced: 10 May 2026
https://github.com/sebftw/interp2gpu
GPU-accelerated 2D spline interpolation, à la interp2(..., "spline"), in MATLAB.
cuda gpu gpu-acceleration matlab spline spline-interpolation
Last synced: 10 May 2026
https://github.com/cashcon57/open-supersampling
OpenSuperSampling (OSS) — vendor-agnostic open-source RT denoising, upscaling, and frame extrapolation
cuda deep-learning dlss frame-generation fsr game-engine gaussian-splatting open-source real-time-rendering super-resolution upscaling
Last synced: 10 Jun 2026
https://github.com/dlr-amr/t8gpu
Header-only finite volume library targeting GPUs using t8code as meshing backend.
adaptive-mesh-refinement cuda finite-volume gpgpu-computing hpc mesh mpi parallel-computing simulation
Last synced: 10 May 2026