Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/sangioai/torchpace

PyTorch CUDA/C++ extension of PACE: Transformer non-linearity accelerator engine.

cuda pytorch transformer

Last synced: 02 Feb 2025

https://github.com/jmuwrobotics/libbicos

GPU-Accelerated Binary Correspondence Search for Multishot Stereo Vision

computer-vision cuda depth-map stereo-camera stereo-matching stereo-vision

Last synced: 30 Dec 2024

https://github.com/bjornmelin/edge-ai-engineering

📱 Optimized ML for edge devices. Showcasing efficient model deployment, GPU-CPU memory transfer optimization, and real-world edge AI applications. 🤖

cuda edge-computing embedded-systems gpu-optimization iot mobile-ml model-optimization python tflite

Last synced: 02 Feb 2025

https://github.com/iglee/jax-cuda-eicl-exp-docker

Docker setup for getting JAX to work with CUDA, for reproducing ML experiments like EICL. Sure, let's NOT make a compatibility matrix and let people fight for their lives on CUDA.

cuda docker jax jaxline ml-engineering ml-experiments tensorflow

Last synced: 05 Feb 2025

https://github.com/alexkranias/triton_vs_cuda

Building Triton and CUDA kernels side-by-side to create a cuBLAS-performant GEMM kernel.

cuda cuda-kernels gpu gpu-programming parallel-programming python triton

Last synced: 05 Feb 2025

https://github.com/ne0nwinds/gpupuzzles

My solutions to srush/GPU-Puzzles using CUDA

cuda

Last synced: 02 Feb 2025

https://github.com/atelierarith/julia_gpu_playground

For those who want to use Julia with a GPU

cuda docker docker-compose julia

Last synced: 06 Feb 2025

https://github.com/ysl1016/cudadigitfilter

CUDA-based parallel image filtering system for MNIST dataset

computer-vision cuda deep-learning gpu-acceleration image-processing mnist parallel-computing

Last synced: 02 Feb 2025

https://github.com/bjornmelin/ai-system-design

🎨 Large-scale AI system architectures and implementations. Features distributed training systems, multi-GPU pipelines, and efficient resource management. 🏗️

architecture cuda distributed-systems engineering gpu-computing production scalability system-design

Last synced: 02 Feb 2025

https://github.com/sephiroth7712/k-nearest-neigbours

Implementation of K-Nearest Neighbors algorithm using multiple parallel computing approaches: CUDA (GPU), Hadoop, Spark, MPI, OpenMP, and PThreads. Demonstrates scalable machine learning across different parallel computing paradigms from GPU to distributed frameworks.

cuda cuda-programming hadoop-mapreduce java mpi multiprocessing multithreading openmp pthreads scala spark

Last synced: 06 Feb 2025

https://github.com/phrutis/brainwords2

GPU brainflayer for sale $250

brain brainflayer brainwords cuda gpu key pass passphrase private

Last synced: 05 Feb 2025

https://github.com/sbstndb/neural_k

A simple Neural Network library using Kokkos enabling CUDA or OpenMP backend

ai cuda kokkos library neural-network openmp

Last synced: 05 Feb 2025

https://github.com/spatialgraphics/tardis

Travel space and time by using autodiff and codegen

autodiff codegen cuda

Last synced: 05 Feb 2025

https://github.com/belrbez/ship-graphic-qt-qml-cuda-c

Client-Server application for Rocket driving in QML graphics

c client-server cpp cuda qml qt5 rocket

Last synced: 06 Feb 2025

https://github.com/jiriklepl/bits-knn-jpdc2024

Replication package for the paper Towards Optimal GPU-accelerated K-Nearest Neighbors Search

bitonic-sort cuda gpu k-nearest-neighbors knn-search top-k

Last synced: 26 Jan 2025

https://github.com/wiktor2718/matrix_flow

Matrix Flow is a simple machine learning library written in Rust and CUDA. It was created as a portfolio project to deepen my understanding of machine learning, GPU programming, and Rust. It provides an API for matrix manipulation and includes specially optimized neural networks.

adam-optimizer benchmarking cuda deep-learning gpu-computing machine-learning matrix-operations neural-networks portfolio-project rust

Last synced: 26 Jan 2025

https://github.com/0xhilsa/pycu

PyCu

cpp cuda nvcc python3

Last synced: 22 Dec 2024

https://github.com/scar17off/ai-2048

A Python implementation of 2048 with a self-learning AI agent powered by TensorFlow. Features reinforcement learning, GPU acceleration, and real-time gameplay visualization.

2048 2048-ai 2048-game artificial-intelligence cuda deep-learning game-ai gpu-computing machine-learning neural-networks pygame python reinforcement-learning self-learning tensorflow

Last synced: 30 Dec 2024

https://github.com/roryclear/warp-shuffle-demo

warp reduce example

cuda warp

Last synced: 05 Feb 2025

https://github.com/jamezchard/s1mple_c0mpute

some compute (gpgpu) codes

c cpp cuda gpgpu

Last synced: 05 Feb 2025

https://github.com/skyguy126/cuda-learnings

Collection of personal CUDA learnings.

cuda

Last synced: 05 Feb 2025

https://github.com/cs550-epfl/review

Review of the paper A Formal Analysis of the NVIDIA PTX Memory Consistency Model

cuda formal-verification gpu memory-consistency ptx simt

Last synced: 05 Feb 2025

https://github.com/amitkumarj441/deep-learning-on-your-finger

A rich collection of Dockerfiles for installing deep learning dependencies on your way :rocket:

cuda cudnn gcp

Last synced: 26 Jan 2025

https://github.com/xza85hrf/flux_pipeline

FluxPipeline is a prototype experimental project that provides a framework for working with the FLUX.1-schnell image generation model. This project is intended for educational and experimental purposes only.

ai cuda docker educational experimental flux1 flux1-schnell flux1ai gradio image-generation model non-commercial python pytorch research transformer-model

Last synced: 22 Dec 2024

https://github.com/macaycz/nn

A lightweight, GPU-accelerated machine learning library built with CUDA.

cuda deep-learning gpu machine-learning neural-network

Last synced: 13 Feb 2025

https://github.com/jeremywildsmith/shadowhash

Elixir distributed Shadow File password cracker with GPU accelerated cracking for md5crypt hashing algorithm.

cracking-hashes cuda distributed-systems elixir hashing nx security

Last synced: 13 Feb 2025

https://github.com/h1me01/cuda_neural_network

CUDA version of my previous AVX-512 based neural network.

chess cuda cuda-programming neural-network

Last synced: 07 Jan 2025

https://github.com/danieljvickers/fluid_simulation

An educational example for learning the Navier-Stokes equations. Also included is a C++ and CUDA shared object library, buildable with CMake, for use in your personal projects.

cpp cuda differential-equations navier-stokes numpy physics python simulation

Last synced: 30 Dec 2024

https://github.com/smilu97/system-hyu

Repository for submitting Hanyang University system programming assignments

c cuda linux matrix

Last synced: 24 Jan 2025

https://github.com/chibby0ne/cuda_by_example

Old notes (and new ones) on the CUDA by Example book

cuda cuda-programming gpgpu gpu-computing gpu-programming

Last synced: 31 Dec 2024

https://github.com/trentonom0r3/raft-analysis

Simple analysis script 'demotest.py' using RAFT optical flow to get flow vectors, occlusion masks, and information on keyframes with significant motion changes

cuda flow-maps occlusion-masks opticalflow python pytorch raft

Last synced: 08 Feb 2025

https://github.com/versi379/optimized-matrix-multiplication

This project utilizes CUDA and cuBLAS to optimize matrix multiplication, achieving up to a 5x speedup on large matrices by leveraging GPU acceleration. It also improves memory efficiency and reduces data transfer times between CPU and GPU.

cublas cuda cuda-programming hpc matrix-multiplication parallel-computing parallel-programming

Last synced: 21 Jan 2025

https://github.com/popke523/rybki

A 3D shoal of fish animation using the boids algorithm, OpenGL for rendering and CUDA for parallel processing.

boids cuda opengl

Last synced: 08 Feb 2025

https://github.com/zelosleone/audiobook-generator

A GPU-accelerated Python application that converts PDF and TXT documents into high-quality MP4 audio files using WhisperSpeech technology.

ai-audio audiobook cuda gpu-acceleration machine-learning pdf-converter python pytorch speech-synthesis text-processing text-to-speech

Last synced: 03 Feb 2025

https://github.com/iebeid/cuda-particles

A simple visualization of particles calculated using CUDA

cuda opengl

Last synced: 12 Jan 2025

https://github.com/toshikinakamura0412/dotfiles_for_docker

My dotfiles for Docker containers of some Linux distributions

cuda docker docker-compose dotfiles git neovim ros-noetic tmux zsh

Last synced: 20 Nov 2024

https://github.com/isquicha/cuda-parallel-studies

Learning CUDA programming here =D

cuda cuda-programming cuda-toolkit

Last synced: 22 Jan 2025

https://github.com/gladap/heterogeneous_computing_project

Heterogeneous parallel programming exercise using OpenMP and CUDA to parallelize image filters

cuda heterogeneous-parallel-programming

Last synced: 05 Feb 2025

https://github.com/ribin-baby/cuda_cudnn_installation_on_ubuntu20.04

Installation of CUDA 11.8 with cuDNN 8.7 on an Ubuntu 20.04 server with an A30 GPU, plus an ONNX GPU installation guide

cuda gpu linux onnxruntime server

Last synced: 16 Jan 2025

https://github.com/flolu/hardware-praktikum

SoSe 2021 Hardware Praktikum

college cuda hardware

Last synced: 09 Jan 2025

https://github.com/sedflix/cuda_pattern_matching

Computing word frequencies using the concepts of pattern matching in CUDA

cuda word-frequency

Last synced: 31 Dec 2024

https://github.com/patriciobcs/mini-aevol

Parallel implementation of a reduced version of the Aevol simulator

aevol cuda simulation

Last synced: 20 Jan 2025

https://github.com/sferez/sspp_sparse_matrix_cuda

Small Scale Parallel Programming, Sparse Matrix multiplication with CUDA

cpp cuda omp omp-parallel parallel-computing small-scale-parallel-programming sparse-matrix

Last synced: 13 Jan 2025

https://github.com/pintamonas4575/rlgan-project-maadm-upm

Neuroevolution to learn the Lunar Lander from Gymnasium and a GAN to learn to color images. Coursework from the ML and BD master's degree at UPM.

cuda deep-learning gan genetic-algorithm lunar-lander machine-learning mlp python3 pytorch reinforcement-learning tensorflow

Last synced: 05 Feb 2025

https://github.com/k-hengzhou/hphoto

An AI-based smart photo management tool that supports face recognition, automatic clustering of similar faces, and NSFW detection

cuda insightface nsfw nsfw-detection nudenet photos

Last synced: 09 Jan 2025

https://github.com/f14-bertolotti/torchess

CUDA Torch extension for a chess engine

chess cuda torch

Last synced: 05 Feb 2025

https://github.com/timvgl/cuxrft

Performs FFTs on xarray data using CUDA

cuda cupy fft python xarray

Last synced: 09 Jan 2025

https://github.com/kts-o7/n-body-parallel-implementation

A simple study comparing the speed-up obtained with different parallelization approaches such as MPI, OpenMP, and CUDA for an FFT implementation of n-body simulation

cuda mpi openmp parallel-computing pthreads

Last synced: 05 Feb 2025

https://github.com/thomasvonwu/interview-note

Share Interview Questions and Summarize Answers

cuda interview llm

Last synced: 05 Feb 2025

https://github.com/fikri-rouzan/cuda-c-program-part-2

CUDA C program from NVIDIA course.

c cuda

Last synced: 05 Feb 2025

https://github.com/fikri-rouzan/cuda-c-program-part-1

CUDA C program from NVIDIA course.

c cuda

Last synced: 05 Feb 2025

https://github.com/f-koehler/itesol

WIP: Iterative eigensolvers for C++20, Python and CUDA

cpp20 cuda eigenvalues linear-algebra python

Last synced: 28 Dec 2024

https://github.com/roryclear/cuda-ml

Simple CUDA-optimized MNIST classifier

colab-notebook cuda mnist-classification pycuda

Last synced: 21 Jan 2025

https://github.com/fikri-rouzan/cuda-c-program-part-3

CUDA C program from NVIDIA course.

c cuda

Last synced: 05 Feb 2025

https://github.com/parlaynu/inference-tvm

Export ONNX to ApacheTVM and run inference in containerized environments.

apache-tvm cuda docker jetson-nano onnx raspberrypi4 x86-64

Last synced: 28 Jan 2025

https://github.com/lruizap/testcuda

Guide to installing and using CUDA for programming

cuda cudnn nvidia pytorch

Last synced: 02 Feb 2025

https://github.com/ionmich/cs149-local-dev

Provides `conda` installation instructions for Stanford's CS149 (Parallel Computing) programming assignments

conda cs149 cuda ispc parallel-computing

Last synced: 06 Feb 2025

https://github.com/hrolive/fundamentals-of-accelerated-computing-with-cuda-python

Explore how to use Numba—the just-in-time, type-specializing Python function compiler—to create and launch CUDA kernels to accelerate Python programs on massively parallel NVIDIA GPUs.

accelerated-computing cuda cuda-programming jit numba nvidia python

Last synced: 06 Feb 2025

https://github.com/yinguobing/opencv-docker

Dockerfiles for OpenCV build.

cuda docker ffmpeg opencv

Last synced: 13 Jan 2025

https://github.com/branebb/nn-framework

Framework for creating neural networks using C++ and CUDA platform. This project is part of my final university assignment for bachelor's degree.

cmake cpp cuda cuda-programming

Last synced: 19 Nov 2024

https://github.com/bd2720/accesspatterns

Comparing chunked vs. striped memory access patterns for CPU and GPU code using the CUDA toolkit in C.

c cache cuda cuda-toolkit performance-analysis performance-testing profiling

Last synced: 31 Jan 2025

https://github.com/mattjesc/federated-learning-simulation-1gpu-mi-is

Federated Learning Simulation on a Single GPU with Model Interpretability and Interactive Visualization

ai cuda deep-learning distributed-systems federated-learning gpu hpc keras machine-learning ml model-interpretability python pytorch simulation streamlit tensorflow

Last synced: 12 Oct 2024

https://github.com/raiszo/cs334

Journey through Intro to Parallel Programming

cmake cs334 cuda msbuild

Last synced: 25 Jan 2025

https://github.com/dragonscypher/prompty

Tool for generating smart and secure prompts for language models!

autotokenizer bert-model cuda google-t5 llm python3 tensorflow threading

Last synced: 22 Jan 2025

https://github.com/phantom7knight/cuda-fusion

This project is for learning CUDA to better understand how the GPU works.

cuda cuda-programming gpgpu gpu

Last synced: 08 Feb 2025

https://github.com/prateekshukla1108/thunderkittens-docs

Documentation for ThunderKittens framework

cuda deep-le

Last synced: 24 Jan 2025

https://github.com/kanchishimono/python-images

Ubuntu based Python container images, including CUDA images

container-image cuda docker dockerfile machine-learning python python3

Last synced: 26 Jan 2025

https://github.com/nvaranki/cmmx

CUDA matrix multiplication (official guide, modified)

cuda cuda-kernels

Last synced: 10 Dec 2024

https://github.com/demetriantitus/machine-vision---yolov8

This project provides a comprehensive guide to object detection in cluttered environments using YOLOv8. It demonstrates how to identify and classify objects in both still images and video streams.

computer-vision cuda dataset image-classification machine-learning nvidia-gpu object-detection surveillance traffic-monitoring video-analysis yolov8

Last synced: 05 Feb 2025

https://github.com/rkarahul/person-detector-faceverifier

Person-Detector-FaceVerifier is a sophisticated system for detecting and verifying faces in images. Ideal for applications like passport control and security, it combines advanced face detection with precise verification techniques.

bootstrap5 css3 cuda django html5 javascipt opencv-python os python pytorch yolov8

Last synced: 05 Feb 2025

https://github.com/dasbd72/nthu-ipc-2022

National Tsing Hua University - Introduction to Parallel Computing - 2022

cuda cuda-programming hpc mpi openmp pthreads

Last synced: 05 Feb 2025

https://github.com/sebp/vscode-sycl-dpcpp-cuda

Sample project to use the VS Code Remote - Containers extension to develop SYCL applications for NVIDIA GPUs using the oneAPI DPC++ compiler.

cuda dpcpp fedora gpu-computing podman sycl vscode

Last synced: 08 Feb 2025

https://github.com/sydney-informatics-hub/computer-vision-fine-tuning

Fine-tune a computer vision model to solve your task locally, on HPC, in a container, or in the cloud!

computer-vision cuda deep-learning python

Last synced: 22 Jan 2025

https://github.com/thalesmg/haskell-accelerate-parconc

Example and benchmark of Accelerate-HS from Parallel and Concurrent Programming in Haskell

accelerate cuda gpu-computing haskell parallel-computing

Last synced: 08 Feb 2025

https://github.com/evstigneevnm/slurm_gpu_mpi_docker

A sample repository showing how to write a Dockerfile and compile an MPI program for use under Slurm with enroot and pyxis from NVIDIA.

cuda docker enroot mpi nvidia pyxis slurm

Last synced: 05 Feb 2025

https://github.com/thanduriel/cuda_hip_comparison

performance study of atomics on GPUs

atomics cuda hip

Last synced: 05 Feb 2025

https://github.com/apostolis1/parallel-processing-systems

Project of the undergrad course "Parallel Processing Systems" - NTUA

benchmark c cuda mpi openmp parallel-computing

Last synced: 05 Feb 2025

https://github.com/jonyandunh/stanforddogsresnet

A classifier for the 120 dog breeds in the Stanford Dogs Dataset, built with the PyTorch framework and a custom ResNet

cuda deep-learning python pytorch resnet resnet-18 standford-dog stanford

Last synced: 14 Jan 2025

https://github.com/anne-andresen/autoencoder_3d_c_cuda

3D Autoencoder training in raw C/CUDA

3d autoencoder c cuda nifti

Last synced: 05 Feb 2025

https://github.com/shineiarakawa/cuda-cmake-minimal-template

A minimal CUDA C++ project template with CMake

cmake cuda dear-imgui opengl project-template stb-image

Last synced: 21 Jan 2025

https://github.com/grindelfp/cuda-n-body-simulation

Simulation of N-Body movement using CUDA.

cuda n-body-simulation

Last synced: 12 Feb 2025

https://github.com/sustia-llc/gpu_logger_poc

GPU execution verification system with immutable Kafka logging. Monitors CUDA operations, validates GPU performance, and maintains auditable operation history. Built with Rust and Candle for reliable ML model execution tracking.

candle-core cuda docker gpu gpu-computing kafka logging machine-learning mlops monitoring nvidia performance-testing rust

Last synced: 12 Feb 2025

https://github.com/brocbyte/cuball

CUDA-based implementation of "Real-Time Rigid Body Simulation on GPUs" [from GPU Gems 3]

cpp cuda

Last synced: 05 Jan 2025

https://github.com/boostibot/bachelors

My bachelor's thesis at CTU in Prague, Faculty of Nuclear Sciences and Physical Engineering, supervised by Ing. Pavel Strachota, Ph.D.

crystal-growth cuda finite-volume-method parallel-programming phase-field-method

Last synced: 18 Jan 2025

https://github.com/sid911/neuralnetworkcpp

A small experiment to learn about neural networks and their runtimes in cpp

cpp cuda machine-learning neural-network

Last synced: 14 Jan 2025