Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.
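As a rough illustration of the programming model: a CUDA kernel launch runs many threads, and each thread typically computes its global index from the built-in variables `blockIdx.x`, `blockDim.x` and `threadIdx.x`. The sketch below is a plain-Python, CPU-only emulation of that mapping for a vector addition with a grid-stride loop. The function names are hypothetical, and nothing here runs on a GPU.

```python
def vector_add_kernel(block_idx, block_dim, thread_idx, grid_dim, a, b, out):
    """Body of one emulated CUDA thread (hypothetical name)."""
    # Global thread index, as in: blockIdx.x * blockDim.x + threadIdx.x
    i = block_idx * block_dim + thread_idx
    stride = grid_dim * block_dim  # total number of threads in the grid
    # Grid-stride loop: each thread handles elements i, i + stride, ...
    while i < len(out):
        out[i] = a[i] + b[i]
        i += stride


def launch(grid_dim, block_dim, kernel, *args):
    """Sequentially emulate kernel<<<grid_dim, block_dim>>>(args...)."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, grid_dim, *args)


n = 1000
a = [float(x) for x in range(n)]
b = [2.0 * x for x in range(n)]
out = [0.0] * n
# 4 blocks of 128 threads = 512 threads; the stride loop covers all 1000 elements
launch(4, 128, vector_add_kernel, a, b, out)
print(out[:3], out[-1])  # -> [0.0, 3.0, 6.0] 2997.0
```

On real hardware the threads run concurrently rather than in this nested loop; the grid-stride loop is what lets a fixed-size grid process arrays larger than the number of launched threads.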

https://github.com/kobinarth-panchalingam/parallel-and-concurrent-programming

Semester - 7 | CS4533 - Parallel and Concurrent Programming | Labs

c concurrent-programming cuda java openmp pthreads

Last synced: 08 Jan 2025

https://github.com/starlitdreams/lunar-landing

This project implements a DQN agent using PyTorch to solve the LunarLander-v2 environment from OpenAI Gym. The agent learns to control the lunar lander using experience replay and a target network, aiming to maximize rewards by landing smoothly. Uses CUDA for computation.

artificial-intelligence cuda deep-learning gymnasium neural-network neural-networks numpy nvidia-gpu python python3 torch

Last synced: 05 Feb 2025

https://github.com/seieric/pytorch-mpi-singularity

Singularity container including PyTorch with CUDA and an MPI backend for DistributedDataParallel

cuda hpc nvidia openmpi pytorch singularity utokyo

Last synced: 05 Feb 2025

https://github.com/cooliron2311/cumd5bf

CUDA-based MD5 password brute-forcer

cuda md5 python

Last synced: 05 Feb 2025

https://github.com/rdma-from-gpu/.github

Public code release for our paper "Toward GPU-centric Networking on Commodity Hardware"

cuda gpu linux network rdma research

Last synced: 05 Feb 2025

https://github.com/iglee/jax-cuda-eicl-exp-docker

Docker setup for getting JAX to work with CUDA, for reproducing ML experiments like EICL. Sure, let's NOT make a compatibility matrix and let people fight for their lives on CUDA.

cuda docker jax jaxline ml-engineering ml-experiments tensorflow

Last synced: 05 Feb 2025

https://github.com/cuda8/brainwords2

GPU brainflayer for sale $250

brain brainflayer brainwords cuda gpu key pass passphrase private

Last synced: 23 Oct 2024

https://github.com/alexkranias/triton_vs_cuda

Building Triton and CUDA kernels side-by-side to create a cuBLAS-performant GEMM kernel.

cuda cuda-kernels gpu gpu-programming parallel-programming python triton

Last synced: 05 Feb 2025

https://github.com/aartintelligent/ops-cuda

OPS CUDA Stacks

cuda docker

Last synced: 05 Feb 2025

https://github.com/rajshrestha86/kmeans-clusterize-cuda

Implementation of K-Means algorithm from scratch using CUDA.

c cuda kmeans-clustering

Last synced: 07 Feb 2025
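K-means maps naturally onto a GPU because the assignment step is independent per point, so a CUDA version would typically give each point its own thread. The following is a hedged, CPU-only Python sketch of one Lloyd iteration (assign, then recompute centroids) to show which part parallelizes; it is not taken from the repository, and the function name is hypothetical.

```python
def kmeans_step(points, centroids):
    """One Lloyd iteration; returns (assignments, new_centroids)."""
    k = len(centroids)
    dim = len(points[0])

    # Assignment step: each point is handled independently, which is the
    # part a CUDA kernel would run with one thread per point.
    assignments = []
    for p in points:
        dists = [sum((p[d] - c[d]) ** 2 for d in range(dim)) for c in centroids]
        assignments.append(dists.index(min(dists)))

    # Update step: a per-cluster reduction (on a GPU this could use atomics
    # or a separate reduction kernel).
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, j in zip(points, assignments):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    # Empty clusters keep their previous centroid.
    new_centroids = [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(k)
    ]
    return assignments, new_centroids


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
assignments, centroids = kmeans_step(points, [[0.0, 0.0], [10.0, 10.0]])
print(assignments, centroids)  # -> [0, 0, 1, 1] [[0.0, 0.5], [10.0, 10.5]]
```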

https://github.com/deepschneider/tinygrad-universal

Universal version of Tinygrad with CUDA and OpenCL support

autograd automatic-differentiation cuda pycuda pyopencl tinygrad tinygrad-cuda

Last synced: 16 Jan 2025

https://github.com/jamesnulliu/learning-programming-massively-parallel-processors

Learning notes on Programming Massively Parallel Processors, 4th edition.

cuda notes pytorch

Last synced: 02 Feb 2025

https://github.com/neugence/acehub

AI Champions for Excellence: Fresh, informative courses and content designed to help developers, researchers, and leaders advance in the field of AI.

ai cuda cv ml mlops nlp pytorch rl rlhf tensorflow

Last synced: 13 Oct 2024

https://github.com/sangioai/torchpace

PyTorch CUDA/C++ extension of PACE: Transformer non-linearity accelerator engine.

cuda pytorch transformer

Last synced: 02 Feb 2025

https://github.com/bhavinpatel4199/image-processing-with-opencv-and-cuda-on-google-colab

This repository demonstrates image processing using OpenCV with CUDA for GPU acceleration on Google Colab. It includes basics like displaying and manipulating images, alongside advanced techniques using CUDA to enhance performance. Ideal for learning GPU-accelerated image processing in Python.

computer-vision cuda google-colab gpu-acceleration high-performance-computing image-processing opencv pixel-manupulation

Last synced: 12 Feb 2025

https://github.com/bjornmelin/edge-ai-engineering

📱 Optimized ML for edge devices. Showcasing efficient model deployment, GPU-CPU memory transfer optimization, and real-world edge AI applications. 🤖

cuda edge-computing embedded-systems gpu-optimization iot mobile-ml model-optimization python tflite

Last synced: 02 Feb 2025

https://github.com/phrutis/brainwords2

GPU brainflayer for sale $250

brain brainflayer brainwords cuda gpu key pass passphrase private

Last synced: 05 Feb 2025

https://github.com/ne0nwinds/gpupuzzles

My solutions to srush/GPU-Puzzles using CUDA

cuda

Last synced: 02 Feb 2025

https://github.com/atelierarith/julia_gpu_playground

For those who want to use Julia with a GPU

cuda docker docker-compose julia

Last synced: 06 Feb 2025

https://github.com/ysl1016/cudadigitfilter

CUDA-based parallel image filtering system for MNIST dataset

computer-vision cuda deep-learning gpu-acceleration image-processing mnist parallel-computing

Last synced: 02 Feb 2025

https://github.com/sephiroth7712/k-nearest-neigbours

Implementation of K-Nearest Neighbors algorithm using multiple parallel computing approaches: CUDA (GPU), Hadoop, Spark, MPI, OpenMP, and PThreads. Demonstrates scalable machine learning across different parallel computing paradigms from GPU to distributed frameworks.

cuda cuda-programming hadoop-mapreduce java mpi multiprocessing multithreading openmp pthreads scala spark

Last synced: 06 Feb 2025

https://github.com/ypatel2022/gpu-accelerated-game-of-life

Accelerating Game of Life Compute with CUDA.

cpp cuda gpu

Last synced: 28 Dec 2024
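Game of Life is a common CUDA exercise because each cell's next state depends only on its eight neighbours in the previous generation, so one thread per cell writing into a separate output buffer is the usual mapping. The CPU-only Python sketch below shows the per-cell update rule such a kernel would evaluate; the function name is hypothetical, cells outside the border are assumed dead, and it is not taken from the repository.

```python
def life_step(grid):
    """Compute one Game of Life generation on a finite grid (dead border)."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    # Every (r, c) cell is computed independently from the *old* grid,
    # mirroring a CUDA kernel that assigns one thread per cell and writes
    # to a separate output buffer.
    for r in range(rows):
        for c in range(cols):
            live = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                        live += grid[r + dr][c + dc]
            alive = grid[r][c] == 1
            # Birth with exactly 3 neighbours; survival with 2 or 3.
            new[r][c] = 1 if live == 3 or (alive and live == 2) else 0
    return new


# A "blinker" oscillates between a horizontal and a vertical bar.
blinker = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
print(life_step(blinker))  # -> [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```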

https://github.com/bjornmelin/ai-system-design

🎨 Large-scale AI system architectures and implementations. Features distributed training systems, multi-GPU pipelines, and efficient resource management. 🏗️

architecture cuda distributed-systems engineering gpu-computing production scalability system-design

Last synced: 02 Feb 2025

https://github.com/belrbez/ship-graphic-qt-qml-cuda-c

Client-server application for driving a rocket, with QML graphics

c client-server cpp cuda qml qt5 rocket

Last synced: 06 Feb 2025

https://github.com/srivanijayanthi/pytorch-onnx-tensorrt-conversion

This repository provides a step-by-step guide to converting a PyTorch model to the ONNX format and subsequently to TensorRT for optimized inference.

cuda onnx pytorch tensorrt

Last synced: 24 Jan 2025

https://github.com/jiriklepl/bits-knn-jpdc2024

Replication package for the paper Towards Optimal GPU-accelerated K-Nearest Neighbors Search

bitonic-sort cuda gpu k-nearest-neighbors knn-search top-k

Last synced: 26 Jan 2025

https://github.com/0xhilsa/pycu

PyCu

cpp cuda nvcc python3

Last synced: 22 Dec 2024

https://github.com/bjornmelin/tensorflow-evolution

🧠 Progressive journey through TensorFlow, from basics to advanced architectures. Featuring custom training pipelines, optimized GPU implementations, and production-ready models. Includes CUDA optimizations for large-scale training. 🚀

cuda deep-learning gpu-optimization machine-learning ml-engineering neural-networks python tensorflow

Last synced: 24 Jan 2025

https://github.com/wiktor2718/matrix_flow

Matrix Flow is a simple machine learning library written in Rust and CUDA. It was created as a portfolio project to deepen my understanding of machine learning, GPU programming, and Rust. It provides an API for matrix manipulation and includes specially optimized neural networks.

adam-optimizer benchmarking cuda deep-learning gpu-computing machine-learning matrix-operations neural-networks portfolio-project rust

Last synced: 26 Jan 2025

https://github.com/sbstndb/neural_k

A simple neural network library using Kokkos, enabling a CUDA or OpenMP backend

ai cuda kokkos library neural-network openmp

Last synced: 05 Feb 2025

https://github.com/timdev-r/cv-ground-truth-extraction

(Dump) Helper for ground truth extraction, movement analytics and silhouette visual demonstration

computer-vision cuda ground-truth intel-realsense pandas python

Last synced: 21 Jan 2025

https://github.com/bjornmelin/cuda-core-projects

🎯 Essential CUDA programming patterns and optimizations. Showcasing parallel computing expertise through matrix operations, memory management, and advanced kernel implementations. 💻

cpp cuda cuda-kernels gpu-computing high-performance-computing nvidia optimization parallel-computing

Last synced: 24 Jan 2025

https://github.com/abhiram-kandiyana/cuda-blast-2024

Reimplementation of NCBI BLAST with CUDA backend for faster retrieval

blast cuda gpu-acceleration parallel-processing

Last synced: 21 Jan 2025

https://github.com/spatialgraphics/tardis

Travel space and time by using autodiff and codegen

autodiff codegen cuda

Last synced: 05 Feb 2025

https://github.com/angchen0325/cuda-learn

Ang's CUDA-learn project

cuda gpu-computing

Last synced: 08 Jan 2025

https://github.com/smilu97/system-hyu

Repository for submitting Hanyang University system programming assignments

c cuda linux matrix

Last synced: 24 Jan 2025

https://github.com/tthebc01/kawpow

Containerized KAWPOW miner.

cuda docker kawpow ravencoin

Last synced: 03 Jan 2025

https://github.com/chibby0ne/cuda_by_example

Old notes (and new ones) on the CUDA by Example book

cuda cuda-programming gpgpu gpu-computing gpu-programming

Last synced: 31 Dec 2024

https://github.com/zelosleone/audiobook-generator

A GPU-accelerated Python application that converts PDF and TXT documents into high-quality MP4 audio files using WhisperSpeech technology.

ai-audio audiobook cuda gpu-acceleration machine-learning pdf-converter python pytorch speech-synthesis text-processing text-to-speech

Last synced: 03 Feb 2025

https://github.com/nourmorsy/convolution-neural-network-cuda

Code for optimizing a CNN using CUDA

c cnn cuda

Last synced: 13 Jan 2025

https://github.com/shreya888/learning-cuda-with-cpp-and-pytorch

My notes, code, & insights will be recorded here while learning CUDA with C++ and PyTorch

cpp cuda pytorch

Last synced: 30 Dec 2024

https://github.com/h1me01/cuda_neural_network

CUDA version of my previous AVX-512-based neural network.

chess cuda cuda-programming neural-network

Last synced: 07 Jan 2025

https://github.com/sedflix/cuda_pattern_matching

Computing word frequencies using pattern-matching concepts in CUDA

cuda word-frequency

Last synced: 31 Dec 2024

https://github.com/roryclear/warp-shuffle-demo

warp reduce example

cuda warp

Last synced: 05 Feb 2025
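For context on the technique: a warp-level sum reduction typically calls `__shfl_down_sync(0xffffffff, val, offset)` with offsets 16, 8, 4, 2, 1, so that after five steps lane 0 holds the sum of all 32 lanes without going through shared memory. Below is a plain-Python emulation of that data movement, assuming a full 32-lane warp with all lanes active; it illustrates the pattern and is not the repository's code.

```python
WARP_SIZE = 32


def shfl_down(vals, offset):
    """Emulate __shfl_down_sync(0xffffffff, val, offset) across a full warp.

    Lane i receives the value from lane i + offset; lanes whose source lane
    is out of range keep their own value.
    """
    return [vals[i + offset] if i + offset < WARP_SIZE else vals[i]
            for i in range(WARP_SIZE)]


def warp_reduce_sum(vals):
    """Tree reduction over one warp; lane 0 ends up holding the total."""
    vals = list(vals)
    assert len(vals) == WARP_SIZE
    offset = WARP_SIZE // 2
    while offset > 0:
        # All lanes step together: read the shuffled values, then add.
        shuffled = shfl_down(vals, offset)
        vals = [v + s for v, s in zip(vals, shuffled)]
        offset //= 2
    return vals[0]


print(warp_reduce_sum(range(32)))  # -> 496
```

Only lane 0's result is meaningful after the loop; the other lanes hold partial sums, which is why CUDA code usually lets lane 0 alone write the warp's result.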

https://github.com/lord-turmoil/cudacmakedemo

A demo for building CUDA program with CMake

cuda tutorial

Last synced: 23 Jan 2025

https://github.com/michaelfranzl/image_fah-client

Dockerfile for Folding@home client with AMD and Nvidia GPGPU support

container cuda debian docker foldingathome gpu-computing opencl

Last synced: 21 Jan 2025

https://github.com/jamezchard/s1mple_c0mpute

Some compute (GPGPU) code

c cpp cuda gpgpu

Last synced: 05 Feb 2025

https://github.com/k-hengzhou/hphoto

An AI-based smart photo management tool that supports face recognition, automatic clustering of similar faces, and NSFW detection

cuda insightface nsfw nsfw-detection nudenet photos

Last synced: 09 Jan 2025

https://github.com/jmuwrobotics/libbicos

GPU-Accelerated Binary Correspondence Search for Multishot Stereo Vision

computer-vision cuda depth-map stereo-camera stereo-matching stereo-vision

Last synced: 30 Dec 2024

https://github.com/timvgl/cuxrft

Performs FFTs on xarray data using CUDA

cuda cupy fft python xarray

Last synced: 09 Jan 2025

https://github.com/scar17off/ai-2048

A Python implementation of 2048 with a self-learning AI agent powered by TensorFlow. Features reinforcement learning, GPU acceleration, and real-time gameplay visualization.

2048 2048-ai 2048-game artificial-intelligence cuda deep-learning game-ai gpu-computing machine-learning neural-networks pygame python reinforcement-learning self-learning tensorflow

Last synced: 30 Dec 2024

https://github.com/bardifarsi/threadpoolmanager

ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.

cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe

Last synced: 29 Dec 2024

https://github.com/danieljvickers/fluid_simulation

An educational example for learning the Navier-Stokes equations. Also included is a C++ and CUDA shared object library, buildable with CMake, for use in your personal projects.

cpp cuda differential-equations navier-stokes numpy physics python simulation

Last synced: 30 Dec 2024

https://github.com/skyguy126/cuda-learnings

Collection of personal CUDA learnings.

cuda

Last synced: 05 Feb 2025

https://github.com/occisor2/fluidsimulation

Second project of my parallel algorithms course

cuda high-performance-computing

Last synced: 11 Jan 2025

https://github.com/f-koehler/itesol

WIP: Iterative eigensolvers for C++20, Python and CUDA

cpp20 cuda eigenvalues linear-algebra python

Last synced: 28 Dec 2024

https://github.com/cs550-epfl/review

Review of the paper A Formal Analysis of the NVIDIA PTX Memory Consistency Model

cuda formal-verification gpu memory-consistency ptx simt

Last synced: 05 Feb 2025

https://github.com/ionmich/cs149-local-dev

Provides `conda` installation instructions for Stanford's CS149 (Parallel Computing) programming assignments

conda cs149 cuda ispc parallel-computing

Last synced: 06 Feb 2025

https://github.com/hrolive/fundamentals-of-accelerated-computing-with-cuda-python

Explore how to use Numba—the just-in-time, type-specializing Python function compiler—to create and launch CUDA kernels to accelerate Python programs on massively parallel NVIDIA GPUs.

accelerated-computing cuda cuda-programming jit numba nvidia python

Last synced: 06 Feb 2025

https://github.com/lruizap/testcuda

Guide to installing and using CUDA for programming

cuda cudnn nvidia pytorch

Last synced: 02 Feb 2025

https://github.com/sid911/neuralnetworkcpp

A small experiment to learn about neural networks and their runtimes in cpp

cpp cuda machine-learning neural-network

Last synced: 14 Jan 2025

https://github.com/roryclear/cuda-ml

Simple CUDA-optimized MNIST classifier

colab-notebook cuda mnist-classification pycuda

Last synced: 21 Jan 2025

https://github.com/shineiarakawa/particle-stabilizer

A C++ and CUDA-based program for simulating the motion of particles.

cpp cuda n-body particles

Last synced: 13 Jan 2025

https://github.com/mathiasotnes/gemm

General Matrix Multiplication (GEMM) optimization in CUDA.

cuda gpu

Last synced: 31 Jan 2025

https://github.com/trentonom0r3/raft-analysis

Simple analysis script 'demotest.py' using RAFT optical flow to get flow vectors, occlusion masks, and information on keyframes with significant motion changes

cuda flow-maps occlusion-masks opticalflow python pytorch raft

Last synced: 08 Feb 2025

https://github.com/xza85hrf/flux_pipeline

FluxPipeline is a prototype experimental project that provides a framework for working with the FLUX.1-schnell image generation model. This project is intended for educational and experimental purposes only.

ai cuda docker educational experimental flux1 flux1-schnell flux1ai gradio image-generation model non-commercial python pytorch research transformer-model

Last synced: 22 Dec 2024

https://github.com/nvaranki/cmmx

CUDA matrix multiplication (official guide, modified)

cuda cuda-kernels

Last synced: 10 Dec 2024
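The shared-memory matrix multiplication in the CUDA C++ Programming Guide splits the output into square tiles: each thread block repeatedly loads one tile of A and one tile of B into shared memory and accumulates partial dot products. The plain-Python sketch below reproduces only that loop structure on the CPU. The tile size and function name are illustrative assumptions, and for simplicity it assumes square matrices whose size is a multiple of the tile size; it is not the repository's code.

```python
TILE = 2  # stands in for the thread-block size (e.g. 16 on a real GPU)


def tiled_matmul(a, b):
    """C = A @ B computed tile by tile, mirroring a shared-memory CUDA kernel."""
    n = len(a)
    assert n % TILE == 0, "sketch assumes n is a multiple of TILE"
    c = [[0.0] * n for _ in range(n)]
    # Each (block_row, block_col) pair corresponds to one thread block.
    for block_row in range(0, n, TILE):
        for block_col in range(0, n, TILE):
            # March along the shared dimension one tile at a time.
            for t in range(0, n, TILE):
                # On a GPU these two tiles would be loaded into shared memory,
                # followed by __syncthreads().
                a_tile = [row[t:t + TILE] for row in a[block_row:block_row + TILE]]
                b_tile = [row[block_col:block_col + TILE] for row in b[t:t + TILE]]
                # Each (i, j) pair corresponds to one thread in the block.
                for i in range(TILE):
                    for j in range(TILE):
                        c[block_row + i][block_col + j] += sum(
                            a_tile[i][k] * b_tile[k][j] for k in range(TILE)
                        )
    return c


a = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0], [13.0, 14.0, 15.0, 16.0]]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(tiled_matmul(a, identity) == a)  # -> True
```

The point of tiling on a GPU is data reuse: each element loaded into shared memory is read by a whole row or column of threads in the block instead of being fetched repeatedly from global memory.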

https://github.com/demetriantitus/machine-vision---yolov8

This project provides a comprehensive guide to object detection in cluttered environments using YOLOv8. It demonstrates how to identify and classify objects in both still images and video streams.

computer-vision cuda dataset image-classification machine-learning nvidia-gpu object-detection surveillance traffic-monitoring video-analysis yolov8

Last synced: 05 Feb 2025

https://github.com/rkarahul/person-detector-faceverifier

Person-Detector-FaceVerifier is a sophisticated system for detecting and verifying faces in images. Ideal for applications like passport control and security, it combines advanced face detection with precise verification techniques.

bootstrap5 css3 cuda django html5 javascipt opencv-python os python pytorch yolov8

Last synced: 05 Feb 2025

https://github.com/dasbd72/nthu-ipc-2022

National Tsing Hua University - Introduction to Parallel Computing - 2022

cuda cuda-programming hpc mpi openmp pthreads

Last synced: 05 Feb 2025

https://github.com/popke523/rybki

A 3D shoal of fish animation using the boids algorithm, OpenGL for rendering and CUDA for parallel processing.

boids cuda opengl

Last synced: 08 Feb 2025

https://github.com/juntyr/necsim-rust-analysis

Analysis of the spatially explicit biodiversity simulation `necsim-rust`

analysis biodiversity cuda mpi necsim rust simulation

Last synced: 25 Jan 2025

https://github.com/sid911/scions_old

A small, fast and easy-to-use machine learning framework for edge devices

cpp cuda library machine-learning

Last synced: 14 Jan 2025

https://github.com/evstigneevnm/slurm_gpu_mpi_docker

A sample showing how to write a Dockerfile and compile an MPI program for Slurm using NVIDIA's enroot and pyxis.

cuda docker enroot mpi nvidia pyxis slurm

Last synced: 05 Feb 2025

https://github.com/thanduriel/cuda_hip_comparison

performance study of atomics on GPUs

atomics cuda hip

Last synced: 05 Feb 2025

https://github.com/apostolis1/parallel-processing-systems

Project of the undergrad course "Parallel Processing Systems" - NTUA

benchmark c cuda mpi openmp parallel-computing

Last synced: 05 Feb 2025

https://github.com/anne-andresen/autoencoder_3d_c_cuda

3D Autoencoder training in raw C/CUDA

3d autoencoder c cuda nifti

Last synced: 05 Feb 2025

https://github.com/iebeid/cuda-particles

A simple visualization of particles calculated using CUDA

cuda opengl

Last synced: 12 Jan 2025

https://github.com/yinguobing/opencv-docker

Dockerfiles for OpenCV build.

cuda docker ffmpeg opencv

Last synced: 13 Jan 2025

https://github.com/prateekshukla1108/thunderkittens-docs

Documentation for ThunderKittens framework

cuda deep-le

Last synced: 24 Jan 2025

https://github.com/shineiarakawa/cuda-cmake-minimal-template

A minimal CUDA C++ project template with CMake

cmake cuda dear-imgui opengl project-template stb-image

Last synced: 21 Jan 2025

https://github.com/patriciobcs/mini-aevol

Parallel implementation of a reduced version of the Aevol simulator

aevol cuda simulation

Last synced: 20 Jan 2025

https://github.com/versi379/optimized-matrix-multiplication

This project utilizes CUDA and cuBLAS to optimize matrix multiplication, achieving up to a 5x speedup on large matrices by leveraging GPU acceleration. It also improves memory efficiency and reduces data transfer times between CPU and GPU.

cublas cuda cuda-programming hpc matrix-multiplication parallel-computing parallel-programming

Last synced: 21 Jan 2025

https://github.com/usman619/pdc

Parallel and Distributed Computing

cuda distributed-computing distributed-systems nextcloud

Last synced: 13 Jan 2025