Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/ashwanirathee/imagesgpu.jl

Image Processing on GPU in Julia

cuda gpu image image-processing julia

Last synced: 08 Jan 2025

https://github.com/uefi-code/msra_thepracticespaceproject_pytorchcuda

My repo for attending the MSRA Practice Space Project 2022: CUDA implementation and optimization

ann cuda pytorch

Last synced: 18 Feb 2025

https://github.com/lightshade12/kittlespt

A hobby CUDA pathtracing renderer.

3d-graphics computer-graphics cuda gpu path-tracing ray-tracing

Last synced: 24 Jan 2025

https://github.com/tensorbfs/cutropicalgemm.jl

The fastest Tropical number matrix multiplication on GPU

cuda gemm tropical-algebra

Last synced: 13 Feb 2025

https://github.com/han-minhee/sgemm_hip

SGEMM implementations in HIP for NVIDIA / AMD GPUs

cuda gpgpu gpu hip rocm

Last synced: 05 Feb 2025

https://github.com/speedcell4/torchdevice

Set up CUDA_VISIBLE_DEVICES

cuda deep-learning gpu machine-learning pytorch

Last synced: 08 Feb 2025

https://github.com/luis-kr/depthmap

Depth map estimation tool using Depth-Anything-V2. Generate accurate depth maps from images with support for both relative and metric depth measurements.

cuda depth-anything depth-estimation depth-map image-processing python pytorch

Last synced: 14 Jan 2025

https://github.com/kobinarth-panchalingam/parallel-and-concurrent-programming

Semester - 7 | CS4533 - Parallel and Concurrent Programming | Labs

c concurrent-programming cuda java openmp pthreads

Last synced: 08 Jan 2025

https://github.com/nvaranki/cmmx

CUDA matrix multiplication (official guide, modified)

cuda cuda-kernels

Last synced: 10 Dec 2024

https://github.com/demetriantitus/machine-vision---yolov8

This project provides a comprehensive guide to object detection in cluttered environments using YOLOv8. It demonstrates how to identify and classify objects in both still images and video streams.

computer-vision cuda dataset image-classification machine-learning nvidia-gpu object-detection surveillance traffic-monitoring video-analysis yolov8

Last synced: 05 Feb 2025

https://github.com/rkarahul/person-detector-faceverifier

Person-Detector-FaceVerifier is a sophisticated system for detecting and verifying faces in images. Ideal for applications like passport control and security, it combines advanced face detection with precise verification techniques.

bootstrap5 css3 cuda django html5 javascipt opencv-python os python pytorch yolov8

Last synced: 05 Feb 2025

https://github.com/dasbd72/nthu-ipc-2022

National Tsing Hua University - Introduction to Parallel Computing - 2022

cuda cuda-programming hpc mpi openmp pthreads

Last synced: 05 Feb 2025

https://github.com/jpodivin/gputomata

Cellular automata running on CUDA capable GPUs

cellular-automata cellular-automaton cuda

Last synced: 18 Feb 2025

https://github.com/evstigneevnm/slurm_gpu_mpi_docker

This repository contains a sample showing how to write a Dockerfile and compile an MPI program to run under Slurm with NVIDIA's enroot and pyxis.

cuda docker enroot mpi nvidia pyxis slurm

Last synced: 05 Feb 2025

https://github.com/thanduriel/cuda_hip_comparison

Performance study of atomics on GPUs

atomics cuda hip

Last synced: 05 Feb 2025

https://github.com/apostolis1/parallel-processing-systems

Project of the undergrad course "Parallel Processing Systems" - NTUA

benchmark c cuda mpi openmp parallel-computing

Last synced: 05 Feb 2025

https://github.com/anne-andresen/autoencoder_3d_c_cuda

3D Autoencoder training in raw C/CUDA

3d autoencoder c cuda nifti

Last synced: 05 Feb 2025

https://github.com/srivanijayanthi/pytorch-onnx-tensorrt-conversion

This repository provides a step-by-step guide to converting a PyTorch model to the ONNX format and subsequently to TensorRT for optimized inference.

cuda onnx pytorch tensorrt

Last synced: 24 Jan 2025

https://github.com/bjornmelin/tensorflow-evolution

🧠 Progressive journey through TensorFlow, from basics to advanced architectures. Featuring custom training pipelines, optimized GPU implementations, and production-ready models. Includes CUDA optimizations for large-scale training. 🚀

cuda deep-learning gpu-optimization machine-learning ml-engineering neural-networks python tensorflow

Last synced: 24 Jan 2025

https://github.com/bjornmelin/cuda-core-projects

🎯 Essential CUDA programming patterns and optimizations. Showcasing parallel computing expertise through matrix operations, memory management, and advanced kernel implementations. 💻

cpp cuda cuda-kernels gpu-computing high-performance-computing nvidia optimization parallel-computing

Last synced: 24 Jan 2025

https://github.com/nourmorsy/convolution-neural-network-cuda

Code for optimizing a CNN using CUDA

c cnn cuda

Last synced: 13 Jan 2025

https://github.com/miferreiro/cdap-cuda

CUDA exercises for the course "Computación Distribuída e de Altas Prestacións" (Distributed and High-Performance Computing) in the Master's Degree in Computer Engineering at the University of Vigo, 2020

c cuda scan

Last synced: 18 Feb 2025

https://github.com/drilonaliu/bachelor-thesis

Parallel Programming Fractals

cuda fractals gpu parallel-programming

Last synced: 26 Jan 2025

https://github.com/moshidev/acap

Lab assignments for the course "Arquitectura y Computación de Altas Prestaciones" (High-Performance Architecture and Computing)

cuda homework-assignments mpi pthreads

Last synced: 05 Feb 2025

https://github.com/nwpu66/cookiekiss-engine

CookieKiss Engine includes a renderer and other small tools related to computer graphics.

compute-graphics cpp cuda opengl vulkan

Last synced: 14 Feb 2025

https://github.com/roryclear/cuda-ml

Simple CUDA-optimized MNIST classifier

colab-notebook cuda mnist-classification pycuda

Last synced: 21 Jan 2025

https://github.com/yinguobing/opencv-docker

Dockerfiles for building OpenCV.

cuda docker ffmpeg opencv

Last synced: 13 Jan 2025

https://github.com/senli1073/docker-gpu-monitor

A lightweight GPU monitor designed for real-time web-based viewing of GPU server status.

container cuda docker flask gpu gpu-monitoring linux memory-usage nvidia-smi web

Last synced: 06 Feb 2025

https://github.com/prateekshukla1108/thunderkittens-docs

Documentation for ThunderKittens framework

cuda deep-le

Last synced: 24 Jan 2025

https://github.com/alan-cooney/python-cuda-starter-template

Python CUDA Starter Template

cuda deep-learning

Last synced: 06 Feb 2025

https://github.com/shineiarakawa/cuda-cmake-minimal-template

A minimal CUDA C++ project template with CMake

cmake cuda dear-imgui opengl project-template stb-image

Last synced: 21 Jan 2025

https://github.com/parxd/ml-cuda-kernels

Various CUDA kernels optimized for specific ML algorithms

cuda machine-learning

Last synced: 30 Dec 2024

https://github.com/sid911/neuralnetworkcpp

A small experiment to learn about neural networks and their runtimes in C++

cpp cuda machine-learning neural-network

Last synced: 14 Jan 2025

https://github.com/sid911/scions_old

A small, fast, and easy-to-use machine learning framework for edge devices

cpp cuda library machine-learning

Last synced: 14 Jan 2025

https://github.com/drilonaliu/parallel-mandelbrot-set

GPU-accelerated Mandelbrot Set generation with CUDA and OpenGL interoperability.

cuda fractals gpu mandelbrot-fractal parallel-programming

Last synced: 26 Jan 2025

https://github.com/shambac/shamboflow

Fierce TensorFlow competitor

cuda cupy machine-learning numpy pypi-package

Last synced: 02 Nov 2024

https://github.com/donaurelio/ansible-playbooks

A collection of Ansible playbooks that automate computer infrastructure provisioning

ansible-playbooks cuda docker gromacs openmpi

Last synced: 16 Feb 2025

https://github.com/malolm/football-player-detection-with-yolov8

Football player detection YOLOv8 fine-tuning

cuda jupyterlab python3 yolov8-detection

Last synced: 06 Feb 2025

https://github.com/daviddavo/19gpu

Short GPU exercises from the Complutense University of Madrid. Mirrored from GitLab.

accelerator cuda gpu-programming

Last synced: 23 Jan 2025

https://github.com/blazekill/hello-cuda

C++ + vcpkg + CUDA + VS Code starter project.

cpp cuda vcpkg vscode

Last synced: 01 Jan 2025

https://github.com/h1me01/cuda_neural_network

CUDA version of my previous AVX-512-based neural network.

chess cuda cuda-programming neural-network

Last synced: 07 Jan 2025

https://github.com/starlitdreams/pacman-convolutional-q-learning

This project implements a Deep Q-Network (DQN) using PyTorch to train an agent to play Atari's Ms. Pac-Man. It utilizes reinforcement learning with a convolutional neural network (CNN) for image processing. Features include experience replay, frame preprocessing, and CUDA support, with trained model saving and video rendering of gameplay.

artificial-intelligence artificial-neural-networks atari cuda deep-learning deep-learning-algorithms deep-q-learning deeplearning gymnasium gymnasium-environment python pytorch

Last synced: 07 Feb 2025

https://github.com/brendanm12345/simple_renderer_cs149

Simple CUDA renderer implementation. Ranked 19th in efficiency out of 150+ submissions.

cpp cuda

Last synced: 07 Jan 2025

https://github.com/muhamadajiw/parallel-matrix-inversion

A parallel program for matrix inversion using MPI, OpenMP, and CUDA

cpp cuda mpi openmp

Last synced: 17 Jan 2025

https://github.com/ahmed5827/image_generation

This application provides a graphical user interface (GUI) for generating images using the Stable Diffusion model. The GUI allows users to input a text prompt, and the application generates an image based on the prompt.

ai cuda generative-ai image-generation

Last synced: 07 Jan 2025

https://github.com/dlr-amr/t8gpu

Header-only finite volume library targeting GPUs, using t8code as the meshing backend.

adaptive-mesh-refinement cuda finite-volume gpgpu-computing hpc mesh mpi parallel-computing simulation

Last synced: 06 Feb 2025

https://github.com/kenwuqianghao/c4ai-cuda-birds

Homework assignments for C4AI Beginners in Research-Driven Studies

cuda machine-learning pytorch

Last synced: 18 Feb 2025

https://github.com/saadarazzaq/cuda-device-info

Check whether CUDA is correctly configured on Windows 🖥️

cuda pytorch setup windows

Last synced: 23 Jan 2025

https://github.com/materight/pyav-cuda

Extension of PyAV with hardware encoding and decoding support. Compatible with PyTorch and Nvidia codecs.

cuda cuvid ffmpeg libav pytorch

Last synced: 18 Feb 2025

https://github.com/m-torhan/cuda-fractals

CUDA C++ implementation of fractal visualization

cuda

Last synced: 20 Feb 2025

https://github.com/emanuelemessina/cuda-benchmark

Compare matrix computation times on the CPU and GPU (CUDA)

benchmark cuda matrix-calculations

Last synced: 10 Feb 2025

https://github.com/uefi-code/bachelorgraduationdesign

I developed a PyTorch_For_PoorGuys framework and used it to train an LLM on an NVIDIA GeForce RTX 2080 Ti GPU as my bachelor's graduation design project.

chatbot cuda gpu hacking large-language-models pytorch

Last synced: 18 Feb 2025

https://github.com/stevenchang5/canny_edge

Implementation of Canny edge detection, with an option to use CUDA to improve performance

cuda edge-detection opencv

Last synced: 20 Feb 2025

https://github.com/anselm67/cuda_mnist

A CUDA implementation of MNIST - for CUDA beginners.

cuda gpu gpu-computing gpu-programming mnist mnist-classification

Last synced: 20 Feb 2025

https://github.com/hrshl212/custom-cuda-kernels-with-neural-network-implementation

The repository contains custom CUDA kernels for a linear layer, softmax, and ReLU, which are integrated with Python to build a neural network.

cuda neural-network python pytorch

Last synced: 20 Feb 2025

https://github.com/aarid/cuda_operations

This project compares CPU and GPU performance using CUDA operations. Two simple cases are used: matrix multiplication and 2D convolution.

conv2d cuda cuda-programming gpu gpu-computing matrix-multiplication

Last synced: 20 Feb 2025

https://github.com/tthebc01/kawpow

Containerized KAWPOW miner.

cuda docker kawpow ravencoin

Last synced: 03 Jan 2025

https://github.com/uwuwuwu363/tts-local

🎤 Natural TTS App: A Python-based text-to-speech GUI with multi-language support, playback controls, and audio export. Built with Tkinter, gTTS, and Pygame. 🚀

chatbot cuda deep-learning multilingual ollama pinokio raspberry-pi speech-recognition speech-to-text text-to-speech tts voice voices wav

Last synced: 10 Feb 2025

https://github.com/jakubfr4czek/concurrent-gauss-elimination

Concurrent Gaussian elimination algorithm implemented using trace theory. Parallelism is achieved using CUDA cores.

agh agh-ust agh-wi conda cuda cuda-kernels cuda-toolkit diekert-graph graphviz java python python3 traces-theory

Last synced: 20 Feb 2025

https://github.com/elprofesoriqo/ml-optimizer

Python library designed to revolutionize machine learning workflows by automating data preprocessing, tensor optimization, and model selection.

api-rest cuda imagesearch machine-learning machine-learning-algorithms numpy-arrays python pytorch tensor

Last synced: 20 Feb 2025

https://github.com/aeyage/intraday_prices

GPU-accelerated portfolio optimisation

cuda cupy nvidia-gpu

Last synced: 10 Feb 2025

https://github.com/marcellodesales/nvidea-cuda-ubuntu-docker

NVIDIA CUDA base image on Ubuntu Linux, used to run machine learning workloads

ai cuda docker docker-compose machine-learning ml nvidia-docker ubuntu

Last synced: 03 Feb 2025

https://github.com/maxenceleguery/3d-render-engine

3D Render engine accelerated with CUDA

3d cuda engine raytracing

Last synced: 18 Feb 2025

https://github.com/juntyr/necsim-rust-docs

Documentation of the spatially explicit biodiversity simulation necsim-rust

biodiversity cuda docs mpi necsim rust simulation

Last synced: 03 Feb 2025

https://github.com/ivanbgd/cuda_quad_c

Calculates a definite integral using three different quadrature rules and compares sequential and parallel implementations.

cuda integrals parallel-implementations

Last synced: 03 Feb 2025

https://github.com/awikramanayake/optimized-matrix-mult

Optimizing matrix multiplication using parallelism and SIMD (AVX2, CUDA)

avx2 cuda matrix-multiplication

Last synced: 21 Jan 2025

https://github.com/juntyr/necsim-rust-analysis

Analysis of the spatially explicit biodiversity simulation `necsim-rust`

analysis biodiversity cuda mpi necsim rust simulation

Last synced: 25 Jan 2025

https://github.com/timxor/c_code

Some of my C code

c cuda m4 parallel-programming

Last synced: 17 Feb 2025

https://github.com/cerit-sc/scipion-docker

Scipion (a cryo-EM image processing framework, https://scipion.i2pc.es/) adapted to run in Kubernetes.

cryo-em cryoem cuda desktop kubernetes scipion vnc

Last synced: 06 Dec 2024

https://github.com/mxm-tr/docker-darknet-opencv

Accelerated object detection on streams and files, using a Docker darknet YOLO container

cuda docker docker-compose object-recognition opencv-python python3 yolo

Last synced: 17 Jan 2025

https://github.com/tfogal/gemm-db

For creating a cacheable GEMM cost model.

cuda rust

Last synced: 21 Jan 2025

https://github.com/psteinb/gtc2017

Slides for my presentation at GTC 2017 from May 8-11 in Silicon Valley

compression cuda ffmpeg gpu gpu-computing h264 h265 microscopes spim

Last synced: 06 Jan 2025

https://github.com/pvgupta24/parallel-programming

Basic algorithms for parallel programming in CUDA C++, Java and OpenMP

cuda openmp parallel-programming

Last synced: 06 Jan 2025

https://github.com/deepschneider/tinygrad-universal

Universal version of Tinygrad with CUDA and OpenCL support

autograd automatic-differentiation cuda pycuda pyopencl tinygrad tinygrad-cuda

Last synced: 16 Jan 2025

https://github.com/neel-dandiwala/npp_cudaatscale_project

For the enterprise course project, I created a model that performs histogram equalisation on a given input image file.

cuda npp

Last synced: 17 Feb 2025

https://github.com/neel-dandiwala/cuda-programs

Miscellaneous programs that illustrate the concepts of parallel computing

cuda gpu-programming parallel-programming

Last synced: 17 Feb 2025

https://github.com/shineiarakawa/particle-stabilizer

A C++ and CUDA-based program for simulating the motion of particles.

cpp cuda n-body particles

Last synced: 13 Jan 2025

https://github.com/drilonaliu/parallel-sierpinski-triangle

GPU-accelerated Sierpinski Triangle generation with CUDA and OpenGL interoperability.

cuda fractals gpu parallel-programming sierpinski-triangle

Last synced: 26 Jan 2025

https://github.com/bdwhst/fluora

A CUDA PBR path tracer

cpp cuda pathtracing pbr rendering

Last synced: 12 Feb 2025

https://github.com/cs550-epfl/report

EPFL CS-550 project report

cuda formal-verification gpu memory-consistency ptx simt

Last synced: 10 Jan 2025

https://github.com/naetherm/derelictcurand

Dynamic bindings to the CuRAND library for the D Programming Language.

cuda curand d derelict dlang

Last synced: 01 Feb 2025

https://github.com/i-m-iron-man/abmax

Abmax is an agent-based modelling framework in Jax, focused on dynamic population size

abm agent agent-based agent-based-modeling agent-based-simulation agents cuda jax python

Last synced: 13 Jan 2025

https://github.com/dbklim/optimized_tensorflow_wheels

Optimized builds of TensorFlow and TensorFlow-GPU for specific CPUs and GPUs (both old and new).

cuda nvidia-cuda nvidia-gpu tensorflow tensorflow-community-wheels tensorflow-gpu tensorflow-packages tensorflow-whells wheels

Last synced: 10 Jan 2025

https://github.com/naetherm/derelictcublas

Dynamic bindings to the CuBLAS library for the D Programming Language.

cublas cuda d derelict dlang

Last synced: 01 Feb 2025

https://github.com/ojeda-e/fokker-planck

Numerical solution of the Fokker-Planck equation at large times using CUDA/C.

cuda fokker-planck-equations

Last synced: 17 Feb 2025

https://github.com/drtey/cuda-zero

CUDA Programming

c cpp cuda makefile

Last synced: 13 Jan 2025

https://github.com/dongskie43/nlp-engineering-hub

📚 Enterprise NLP systems and LLM applications. Features custom language model implementations, distributed training pipelines, and efficient inference systems. 🔤

cuda gpu-optimization huggingface huggingface-transformers langchain language-models large-language-models nlp openai python transformers

Last synced: 03 Feb 2025

https://github.com/dreamjet31/licence_plate_detection

Automated license plate recognition system

cuda opencv python pytorch ultralytics yolov8

Last synced: 10 Feb 2025