Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
CUDA
![](https://explore-feed.github.com/topics/cuda/cuda.png)
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: NVIDIA
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2025-02-15 00:06:58 UTC
https://github.com/jonastoth/cuda_raytracer
University project to implement a basic Raytracer in CUDA
Last synced: 02 Feb 2025
https://github.com/aaditya29/parallel-computing-and-cuda
Learning about Parallel Computing and GPU programming using CUDA.
c cpp cuda cuda-kernels cuda-programming nvidia-cuda openmp openmpi parallel-computing parallel-programming
Last synced: 07 Feb 2025
https://github.com/sebftw/interp2gpu
GPU-accelerated 2D spline interpolation, à la interp2(..., "spline"), in MATLAB.
cuda gpu gpu-acceleration matlab spline spline-interpolation
Last synced: 14 Dec 2024
https://github.com/dwain-barnes/llm-gguf-auto-converter
Automated Jupyter notebook solution for batch converting Large Language Models to GGUF format with multiple quantization options. Built on llama.cpp with HuggingFace integration.
auto-converter batch-processing cuda gguf huggingface jupyter-notebook llama-cpp llm model-quantization
Last synced: 31 Jan 2025
https://github.com/yangfengzzz/tardis
Travel space and time by using autodiff and codegen
Last synced: 09 Feb 2025
https://github.com/phrutis/minikeys_for_sale
GPU program for brute-forcing Casascius Series 1 MiniKeys (22 characters)
bitcoin brute-force btc casascius cuda gpu minikeys program uncompressed
Last synced: 24 Jan 2025
https://github.com/islamshahil/live-video-analysis
Live Video Analysis using PyTorch
cuda deeplearning neural-network opencv-python python pytorch video-processing webcam
Last synced: 26 Jan 2025
https://github.com/llm-db/understanding-gpu-architecture-implications-on-llm-serving-workloads
Understanding GPU Architecture Implications on LLM Serving Workloads (Master Thesis, ETH Zürich, 2024)
cuda inference pytorch rocm transformer
Last synced: 14 Dec 2024
https://github.com/brainlesslabs/jalebi
C++ String algorithms for maximum performance
c-plus-plus cplusplus cpp cpp-library cpu cuda library parallel performance simd sse string string-matching vectorization
Last synced: 26 Jan 2025
https://github.com/abdelrahman-amen/active_learning_with_different_query_strategies
This project explores the implementation of active learning techniques, focusing on various query strategies to optimize the selection of informative data points for model training. It aims to reduce the amount of labeled data required while improving model performance, especially in scenarios with limited labeled data.
activelearning cuda entropy kldivergence margin numpy python pyto uncertainty
Last synced: 24 Jan 2025
https://github.com/sugarcane-mk/finetuning_wav2vec2
This repo provides a step-by-step process, from scratch, for fine-tuning Facebook's wav2vec2-large model using transformers
asr asr-model cuda facebook fairseq fine-tuning finetuning huggingface librosa python torch transformers wav2vec2 wav2vec2-large-960h
Last synced: 24 Jan 2025
https://github.com/usman619/pdc
Parallel and Distributed Computing
cuda distributed-computing distributed-systems nextcloud
Last synced: 13 Jan 2025
https://github.com/marius311/cudadistributedtools.jl
A set of utility tools for multi-GPU + multi-process workflows
Last synced: 07 Feb 2025
https://github.com/lord-turmoil/cudacmakedemo
A demo for building a CUDA program with CMake
Last synced: 23 Jan 2025
https://github.com/bjornmelin/ml-production-engineering
⚙️ End-to-end ML deployment solutions. Focused on model serving, multi-GPU optimization, and production-grade system implementation. 🎯
cuda deployment docker fastapi gpu-computing kubernetes mlops production
Last synced: 25 Jan 2025
https://github.com/bjornmelin/ml-vision-lab
👁️ Production-grade computer vision implementations. Real-world applications in image processing, object detection, and video analytics with GPU acceleration. 📸
computer-vision cuda deep-learning image-processing object-detection opencv pytorch video-analytics
Last synced: 25 Jan 2025
https://github.com/bjornmelin/nlp-engineering-hub
📚 Enterprise NLP systems and LLM applications. Features custom language model implementations, distributed training pipelines, and efficient inference systems. 🔤
cuda gpu-optimization huggingface huggingface-transformers langchain language-models large-language-models nlp openai python transformers
Last synced: 25 Jan 2025
https://github.com/corazon-code/pyloo
Python package for approximate leave-one-out cross-validation (LOO-CV) and Pareto smoothed importance sampling (PSIS) for Bayesian Modeling
bayes bayesian-data-analysis cross-validation cuda dump fuzzy-matching looker loot-table machine-learning minecraft model-comparison python spreadsheet tensorflow
Last synced: 09 Feb 2025
https://github.com/bjornmelin/ml-algorithm-playground
🧪 Core ML algorithm implementations with GPU acceleration. Featuring optimized implementations across various libraries with comprehensive analysis. 📈
algorithms cuda gpu-computing lightgbm machine-learning python scikit-learn xgboost
Last synced: 25 Jan 2025
https://github.com/parxd/fasterdl
cuBLAS/CUDA tensor library with auto-diff support
cublas cuda cudnn deep-learning machine-learning
Last synced: 06 Jan 2025
https://github.com/muneeb706/cuda
Sample programs implemented using CUDA (GPU)
cplusplus cuda gpu-programming
Last synced: 31 Jan 2025
https://github.com/kenwuqianghao/c4ai-cuda-birds
Homework assignments for C4AI Beginners in Research-Driven Studies
Last synced: 27 Dec 2024
https://github.com/xza85hrf/flag_prediction_project
This application predicts the name of a country (or countries) based on an input flag image. It uses advanced image processing techniques and deep learning models built with PyTorch to classify flags accurately.
cross-validation cuda data-augmentation docker efficientnetb0 flag-recognition image-classification machine-learning mixed-precision-training mobilenetv2 python pytorch resnet resnet-50 transfer-learning
Last synced: 31 Jan 2025
https://github.com/imanghd/parallelprocessing
CE Algorithms Lab @ SUT
cuda openmp parallel-algorithm parallel-processing systolic
Last synced: 02 Feb 2025
https://github.com/rajshrestha86/kmeans-clusterize-cuda
Implementation of K-Means algorithm from scratch using CUDA.
Last synced: 07 Feb 2025
https://github.com/kanchishimono/python-images
Ubuntu based Python container images, including CUDA images
container-image cuda docker dockerfile machine-learning python python3
Last synced: 26 Jan 2025
https://github.com/mattjesc/federated-learning-simulation-1gpu-mi-is
Federated Learning Simulation on a Single GPU with Model Interpretability and Interactive Visualization
ai cuda deep-learning distributed-systems federated-learning gpu hpc keras machine-learning ml model-interpretability python pytorch simulation streamlit tensorflow
Last synced: 12 Oct 2024
https://github.com/jaidevd/ipec-fdp
cuda hpc keras mapreduce numba spark tensorflow
Last synced: 01 Feb 2025
https://github.com/djenriquez/ccminer
Dockerized ccminer
cuda docker ethereum mining nvidia nvidia-docker
Last synced: 01 Feb 2025
https://github.com/trentonom0r3/raft-analysis
Simple analysis script 'demotest.py' using RAFT optical flow to get flow vectors, occlusion masks, and information on keyframes with significant motion changes
cuda flow-maps occlusion-masks opticalflow python pytorch raft
Last synced: 08 Feb 2025
https://github.com/popke523/rybki
A 3D shoal of fish animation using the boids algorithm, OpenGL for rendering and CUDA for parallel processing.
Last synced: 08 Feb 2025
https://github.com/sferez/sspp_sparse_matrix_cuda
Small Scale Parallel Programming, Sparse Matrix multiplication with CUDA
cpp cuda omp omp-parallel parallel-computing small-scale-parallel-programming sparse-matrix
Last synced: 13 Jan 2025
https://github.com/vectorworksreal/sd-forge-docker
Docker image for the SD Forge WebUI.
ai-art artificial-intelligence containerization cuda docker docker-image forge image-to-image machine-learning sd-forge stable-diffusion stable-diffusion-webui text-to-image ubuntu webui
Last synced: 10 Feb 2025
https://github.com/phantom7knight/cuda-fusion
A project for learning CUDA to better understand how GPUs work.
cuda cuda-programming gpgpu gpu
Last synced: 08 Feb 2025
https://github.com/sebp/vscode-sycl-dpcpp-cuda
Sample project to use the VS Code Remote - Containers extension to develop SYCL applications for NVIDIA GPUs using the oneAPI DPC++ compiler.
cuda dpcpp fedora gpu-computing podman sycl vscode
Last synced: 08 Feb 2025
https://github.com/thalesmg/haskell-accelerate-parconc
Example and benchmark of Accelerate-HS from Parallel and Concurrent Programming in Haskell
accelerate cuda gpu-computing haskell parallel-computing
Last synced: 08 Feb 2025
https://github.com/neel-dandiwala/cuda-programs
Miscellaneous programs for grasping the concepts of parallel computing
cuda gpu-programming parallel-programming
Last synced: 26 Dec 2024
https://github.com/0xhilsa/vector-cuda
vector calculation with GPU acceleration using CUDA
c cpp11 cuda cuda-kernels cuda-programming nvcc
Last synced: 08 Feb 2025
https://github.com/amitkumarj441/deep-learning-on-your-finger
A rich collection of Dockerfiles for installing deep learning dependencies on your way 🚀
Last synced: 26 Jan 2025
https://github.com/dhruvsrikanth/fastconv
Distributed and serial implementations of the 2D convolution operation in C++ and CUDA.
convolution-filters cpp cuda gpu-programming high-performance-computing hpc image-editor image-processing nvidia parallel-programming
Last synced: 25 Dec 2024
https://github.com/larygwil/cuda-samples-old
Old NVIDIA CUDA samples (5.0 - 7.5)
Last synced: 02 Feb 2025
https://github.com/miferreiro/cdap-cuda
CUDA exercises for the subject "Computación Distribuída e de Altas Prestacións" in the Master's Degree in Computer Engineering at the University of Vigo, 2020
Last synced: 27 Dec 2024
https://github.com/rssr25/cuda
Following the book CUDA by Example.
cpp cuda cuda-programming hpc shaders
Last synced: 24 Dec 2024
https://github.com/xstupi00/N-Body-CUDA
PCG - Parallel Computations on GPU - Project - N-Body-CUDA
cuda gpu-acceleration gpu-computing nbody-simulation optimization parallel-computing pcg vut vut-fit
Last synced: 23 Oct 2024
https://github.com/storterald/neural-network
Simple neural network implementation in C++ and CUDA
asm asmx86 c-plus-plus cmake cpp cuda machine-learning neural-network
Last synced: 02 Feb 2025
https://github.com/ludgerpaehler/lulesh-enzyme
AD with Enzyme through Lulesh.
automatic-differentiation cuda cuda-programming gpu-computing high-performance-computing llvm-enzyme scientific-computing
Last synced: 05 Jan 2025
https://github.com/sangioai/sph
CUDA and OpenMP versions of a serial SPH (Smoothed Particle Hydrodynamics) algorithm.
Last synced: 12 Feb 2025
https://github.com/sahil-rajwar-2004/vector-cuda
vector calculation with GPU acceleration using CUDA
c cpp11 cuda cuda-kernels cuda-programming nvcc
Last synced: 19 Nov 2024
https://github.com/dirmeier/cuda-etudes
🎶 A collection of CUDA recipes
Last synced: 17 Jan 2025
https://github.com/aeyage/intraday_prices
GPU-accelerated portfolio optimisation
Last synced: 10 Feb 2025
https://github.com/lordofhyphens/gpu-path-delay-coverage
CUDA-based Path Delay Fault Coverage
Last synced: 28 Jan 2025
https://github.com/jackrekirby/raytracing-cuda
Raytracing using CUDA
cpp cuda raytracing raytracing-in-one-weekend
Last synced: 08 Feb 2025
https://github.com/fabulani/360ip-with-cuda
360° Image Processing with CUDA and OpenCV.
360-image 360-video cpp cuda image-processing opencv
Last synced: 08 Feb 2025
https://github.com/ydkn/htw-progko-cuda
Parallel processing of image transformations. Part of the "Programmierkonzepte und Algorithmen" course at HTW-Berlin.
cuda image-transformations opencv
Last synced: 11 Jan 2025
https://github.com/vladd12/libexecstd
Modern C++ library for using an execution context of computer devices
cpp cpp17 cuda gpu-acceleration gpu-computing
Last synced: 28 Jan 2025
https://github.com/lfrati/subpair
Fast pairwise cosine distance calculation and numba accelerated evolutionary matrix subset extraction 🍐🚀
Last synced: 16 Jan 2025
https://github.com/jpuigcerver/prob-phoc
Probabilistic relevance scores from PHOC embeddings
cuda keyword-spotting kws phoc pytorch
Last synced: 16 Jan 2025
https://github.com/bardiparsi/threadpoolmanager
ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.
cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe
Last synced: 19 Nov 2024
https://github.com/cscfi/csc-env-julia
Julia language environment including MPI.jl, CUDA.jl and AMDGPU.jl preferences for HPC clusters at CSC.
amdgpu ansible cuda hpc julia julia-language mpi
Last synced: 22 Jan 2025
https://github.com/darshanakgr/meanfiltergpu
A GPU implementation of a mean filter in CUDA
Last synced: 28 Jan 2025
https://github.com/programmergnome/cuda-codes
Snippet repository for learning parallel GPU programming with CUDA.
c cpp-programming cuda cuda-kernel gpu-programming learning-materials parallel-programming parallelization
Last synced: 22 Jan 2025
https://github.com/snandasena/courseera_gpu_specilization
Example of CUDA streaming
Last synced: 14 Jan 2025
https://github.com/ramyacp14/document-based-question-and-answers
Developed a document question answering system that utilizes Llama and LangChain for contextual and accurate answers. The system supports .txt documents, intelligent text splitting, and context-aware querying through an easy-to-use Streamlit interface.
chroma cuda hugging-face langchain llama python recursivecharactertextsplitter streamlit
Last synced: 12 Oct 2024
https://github.com/ronaldsg20/compu-paralela
Example code for parallel and distributed computing
cuda opencv openmp posix-threads
Last synced: 05 Jan 2025
https://github.com/kentakoong/mtnlog
A simple multinode performance logger for Python
cuda lanta nvitop python slurm-cluster
Last synced: 22 Jan 2025
https://github.com/flavienbwk/tensorflow2-cuda-10.2-docker
TensorFlow 2.3, CUDA 10.2, Docker-compatible image
cuda docker python3 tensorflow ubuntu1804
Last synced: 28 Jan 2025
https://github.com/flavienbwk/nvidia-cuda-mirror-docker
An all-in-one mirror for installing NVIDIA Docker.
cuda docker linux-mirror mirror nvidia nvidia-docker nvidia-docker2 offline offline-capable
Last synced: 28 Jan 2025
https://github.com/boostibot/bachelors
My bachelor's thesis at CTU in Prague, Faculty of Nuclear Sciences and Physical Engineering, supervised by Ing. Pavel Strachota, Ph.D.
crystal-growth cuda finite-volume-method parallel-programming phase-field-method
Last synced: 18 Jan 2025
https://github.com/sustia-llc/gpu_logger_poc
GPU execution verification system with immutable Kafka logging. Monitors CUDA operations, validates GPU performance, and maintains auditable operation history. Built with Rust and Candle for reliable ML model execution tracking.
candle-core cuda docker gpu gpu-computing kafka logging machine-learning mlops monitoring nvidia performance-testing rust
Last synced: 12 Feb 2025
https://github.com/grindelfp/cuda-n-body-simulation
Simulation of N-Body movement using CUDA.
Last synced: 12 Feb 2025
https://github.com/baonguyen6742/uv-install-torch
Tutorial on installing torch/PyTorch with CUDA using uv
cuda install installation package python pytorch resolver torch torchaudio torchvision tutorial uv
Last synced: 12 Feb 2025
https://github.com/wpjunior/cuda-numba-playground
Some uses of CUDA with the Numba framework
Last synced: 13 Jan 2025
https://github.com/jonyandunh/stanforddogsresnet
A classifier for the 120 dog breeds in the Stanford Dogs Dataset, built with the PyTorch framework and a custom ResNet
cuda deep-learning python pytorch resnet resnet-18 standford-dog stanford
Last synced: 14 Jan 2025
https://github.com/sydney-informatics-hub/computer-vision-fine-tuning
Fine-tune a computer vision model to solve your task locally, on HPC, in a container, or in the cloud!
computer-vision cuda deep-learning python
Last synced: 22 Jan 2025
https://github.com/azdavis/parallel-portrait-mode
Parallel Portrait Mode
cuda image-processing ispc openmp
Last synced: 28 Jan 2025
https://github.com/bhavinpatel4199/image-processing-with-opencv-and-cuda-on-google-colab
This repository demonstrates image processing using OpenCV with CUDA for GPU acceleration on Google Colab. It includes basics like displaying and manipulating images, alongside advanced techniques using CUDA to enhance performance. Ideal for learning GPU-accelerated image processing in Python.
computer-vision cuda google-colab gpu-acceleration high-performance-computing image-processing opencv pixel-manupulation
Last synced: 12 Feb 2025
https://github.com/dragonscypher/prompty
Tool for generating smart and secure prompts for language models!
autotokenizer bert-model cuda google-t5 llm python3 tensorflow threading
Last synced: 22 Jan 2025
https://github.com/raiszo/cs334
Journey through Intro to Parallel Programming
Last synced: 25 Jan 2025
https://github.com/jpodivin/gputomata
Cellular automata running on CUDA capable GPUs
cellular-automata cellular-automaton cuda
Last synced: 27 Dec 2024
https://github.com/9prady9/archdock
Arch Linux Docker image for app development
arch-linux arrayfire cuda docker-image forge opencl
Last synced: 09 Feb 2025
https://github.com/adesoji1/visis_backend_assessment_submission-adesoji
Create a backend API to handle book information requests and summary generation.
bart cache cuda data-extraction fastapi flask hugging-face hugging-face-hub llama postman-api python3 pytorch spacy sqlite3-database swagger-api tensorboard-visualizations transformer ubuntu2304
Last synced: 14 Feb 2025
https://github.com/bd2720/accesspatterns
Comparing chunked vs. striped memory access patterns for CPU and GPU code using the CUDA toolkit in C.
c cache cuda cuda-toolkit performance-analysis performance-testing profiling
Last synced: 31 Jan 2025
https://github.com/branebb/nn-framework
Framework for creating neural networks using C++ and the CUDA platform. This project is part of my final university assignment for my bachelor's degree.
cmake cpp cuda cuda-programming
Last synced: 19 Nov 2024
https://github.com/mmz33/practice-cuda
c cpp cuda cuda-programming gpu-programming parallel-programming
Last synced: 22 Jan 2025
https://github.com/daelsepara/hipnewton
GPU Implementation of Newton Fractal Generator with Benchmarking
amd cuda fractal gpu gpu-compute gpu-computing hip newton parallel-computing rocm sdk
Last synced: 05 Feb 2025
https://github.com/aaaastark/nvidia-cuda-google-colab
Deployment of NVIDIA CUDA on Google Colab, with example code (vector addition and matrix multiplication).
c cpp cuda googlecolab googlecolaboratory matrix-multiplication nvidia python vector-addition
Last synced: 15 Jan 2025
https://github.com/parlaynu/inference-tvm
Export ONNX models to Apache TVM and run inference in containerized environments.
apache-tvm cuda docker jetson-nano onnx raspberrypi4 x86-64
Last synced: 28 Jan 2025
https://github.com/fikri-rouzan/cuda-c-program-part-3
CUDA C program from NVIDIA course.
Last synced: 05 Feb 2025
https://github.com/fikri-rouzan/cuda-c-program-part-1
CUDA C program from NVIDIA course.
Last synced: 05 Feb 2025
https://github.com/fikri-rouzan/cuda-c-program-part-2
CUDA C program from NVIDIA course.
Last synced: 05 Feb 2025