Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/jtschwar/tomo_tv

C++ library for Regularized 2D and 3D Tomography Reconstructions.

3d-reconstruction cuda inverse-problems regularization tomography

Last synced: 10 Nov 2024

https://github.com/abhishekyana/cyclegans-pytorch

CycleGANs-PyTorch applied on Young to Old image converter.

cuda cyclegan faceapp gan python pytorch resnet tutorial-code young2old

Last synced: 18 Nov 2024

https://github.com/yashassamaga/convolutionbuildingblocks

GEMM and Winograd based convolutions using CUTLASS

convolution cuda cutlass deep-learning

Last synced: 03 Dec 2024

https://github.com/PINTO0309/Open3D-build

Provide Docker build sequences of Open3D for various environments.

cuda docker jetson jetson-nano open3d open3d-python pytorch tensorflow

Last synced: 27 Oct 2024

https://github.com/Bruce-Lee-LY/matrix_multiply

Several common methods of matrix multiplication are implemented on CPU and Nvidia GPU using C++11 and CUDA.

coppersmith-winograd cpp11 cpu cublas cuda kahan matrix-multiply naive nvidia reordering shared-memory strassen tiling

Last synced: 19 Nov 2024

https://github.com/sparselinearalgebra/spbla

Sparse Boolean linear algebra for Nvidia Cuda, OpenCL and CPU computations

boolean-algebra cplusplus cuda graph-algorithms graphblas opencl python sparse-matrix suitesparse

Last synced: 12 Oct 2024

https://github.com/cartersusi/pacman_cuda

[AUR][Pacman] Current Cuda compatibility with Tensorflow and Torch on Arch Linux

arch arch-linux archlinux aur compatibility cuda guide installer linux pacman script tensorflow torch

Last synced: 10 Nov 2024

https://github.com/lnstadrum/fastaugment

A handy data augmentation toolkit for image classification put in a single efficient TensorFlow/PyTorch op.

augmentation-transformations brightness-correction cuda cutout data-augmentation gamma-correction gpu mixup perspective-distortions tensorflow-op

Last synced: 28 Oct 2024

https://github.com/dark-art108/gpu-docker-deployment-text-summarization

Text Summarization using Transformer on GPU Docker Deployment

cuda docker fastapi gpu-acceleration huggingface

Last synced: 12 Nov 2024

https://github.com/bruce-lee-ly/decoding_attention

Decoding Attention is specially optimized for multi head attention (MHA) using CUDA core for the decoding stage of LLM inference.

cuda cuda-core decoding-attention flash-attention flashinfer gpu inference large-language-model llm mha multi-head-attention nvidia

Last synced: 23 Oct 2024

https://github.com/cggos/hpc

High-Performance Computing: CPU Instructions, GPU OpenCL & CUDA, etc. ☀️

cuda heterogeneous-parallel-programming multi-threading neon opencl openmp simd sse

Last synced: 28 Oct 2024

https://github.com/bfrg/vim-cuda-syntax

CUDA syntax highlighting for Vim

cuda highlighting syntax vim vim-syntax

Last synced: 30 Oct 2024

https://github.com/pinto0309/open3d-build

Provide Docker build sequences of Open3D for various environments.

cuda docker jetson jetson-nano open3d open3d-python pytorch tensorflow

Last synced: 23 Oct 2024

https://github.com/jishanshaikh4/cuda-programs

CUDA Programs for Hadoop/CUDA Lab at MANIT, Bhopal

c cuda hadoop

Last synced: 10 Nov 2024

https://github.com/mberr/torch-max-mem

Decorators for maximizing memory utilization with PyTorch & CUDA

cuda python pytorch torch

Last synced: 27 Oct 2024

https://github.com/ivangabriele/docker-cuda-desktop

Ubuntu PyTorch CUDA Docker image with KDE Plasma Desktop & VNC. Ideal for LLM & Deep Learning remote work.

cuda d-bus dbus deep-learning desktop docker gpu large-language-models llm nvidia python pytorch remote-desktop server ubuntu ubuntu-desktop vnc vnc-server x11

Last synced: 23 Oct 2024

https://github.com/lennyerik/cutransform

CUDA kernels in any language supported by LLVM

c cuda gpgpu gpu-compute llvm llvm-ir nvidia ptx rust zig

Last synced: 13 Nov 2024

https://github.com/pinto0309/jetson-tensorflow-pytorch-build

Provides an environment for compiling TensorFlow or PyTorch with CUDA for aarch64 on an x86 machine. This is for Jetson. If you build using an EC2 m6g.16xlarge (aarch64) instance, TensorFlow can be fully built in about 30 minutes. It can be used as a cross-compilation environment not only for TensorFlow and PyTorch, but also for various other packages and libraries.

cross-compile cuda docker jetson jetson-nano l4t pytorch tensorflow

Last synced: 23 Oct 2024

https://github.com/bruce-lee-ly/matrix_multiply

Several common methods of matrix multiplication are implemented on CPU and Nvidia GPU using C++11 and CUDA.

coppersmith-winograd cpp11 cpu cublas cuda kahan matrix-multiply naive nvidia reordering shared-memory strassen tiling

Last synced: 15 Nov 2024

https://github.com/shivaraj-bh/ollama-flake

Run ollama natively - powered by Nix

cuda flakes nix ollama open-webui rocm services

Last synced: 12 Nov 2024

https://github.com/krk/cuda-webcam

Webcam Image Processing with CUDA using OpenCV

c-plus-plus cuda opencv

Last synced: 05 Nov 2024

https://github.com/nvidia/numbast

Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.

cuda numba

Last synced: 29 Oct 2024

https://github.com/pinto0309/pytorch-build

Provide Docker build sequences of PyTorch for various environments.

cuda cudnn docker pytorch

Last synced: 23 Oct 2024

https://github.com/kerneltuner/kernel_launcher

Using C++ magic to launch/capture CUDA kernels and tune them with Kernel Tuner

cpp cuda gpu kernel-tuner

Last synced: 15 Nov 2024

https://github.com/tinybiggames/infero

An easy-to-use, high-performance, CUDA-powered LLM inference library.

cuda llamacpp llm-inference win64 windows-10 windows-11

Last synced: 10 Oct 2024

https://github.com/prg-titech/ikra-ruby

A Rubygem for array-based scientific computations using GPGPU

arrays compiler cuda gpgpu ruby

Last synced: 10 Nov 2024

https://github.com/gianlucapaolocci/background-subtraction-on-gpu-with-cuda-and-opencv

A simple, efficient, and fast method to compute motion and background dynamically using the power of NVIDIA GPUs.

background-subtraction cuda image-processing nvidia opencv parallel-computing

Last synced: 23 Oct 2024

https://github.com/thecomputekid/premake5-cuda

Premake5 module that enables CUDA development in Visual Studio using the native CUDA Toolkit integration.

cuda premake-module premake5 visual-studio

Last synced: 07 Jan 2025

https://github.com/egecetin/libkaleidoscope

A library to create a kaleidoscope effect on images with CUDA. You can build it on all platforms using CMake.

c cpp cuda image-filter image-filtering image-manipulation image-processing kaleidoscope python real-time real-time-processing video-filter video-filtering video-processing

Last synced: 15 Oct 2024

https://github.com/boyan-soubachov/excelerator

A Microsoft Excel calculation speed-up add-in.

calculation-speed cuda excel formulae gpgpu microsoft

Last synced: 13 Oct 2024

https://github.com/r-barnes/barnes2019-landscape

Landscape evolution models and graph processing on the GPU

algorithm cuda gpu

Last synced: 28 Nov 2024

https://github.com/aperim/docker-nvidia-cuda-ffmpeg

A docker container, with ffmpeg that supports scale_cuda among other things

cuda ffmpeg gpu hacktoberfest nvidia

Last synced: 08 Nov 2024

https://github.com/shapelets/shapelets-compute

Shapelets Compute is an accelerated platform for time series analysis

cuda matrixprofile opencl time-series

Last synced: 13 Nov 2024

https://github.com/dusanerdeljan/tensor-math-library

Header only lazy evaluation tensor math library with multi-backend parallel eager execution support (TBB, OpenMP, Parallel STL and in the future CUDA and OpenCL)

cuda eager-execution lazy-evaluation matrix-library opencl openmp parallel-computing tbb tensor-library

Last synced: 11 Oct 2024

https://github.com/nolmoonen/cuda-lbvh

CUDA implementation of a linear bounding volume hierarchy (LBVH).

cuda lbvh path-tracer

Last synced: 10 Dec 2024

https://github.com/fluiddyn/fluidfft

📈 Common API (C++ and Python) for Fast Fourier Transform HPC libraries (publish-only mirror)

cuda cython fft fftw3-binding mpi pythran spectral-methods

Last synced: 01 Dec 2024

https://github.com/noahgift/nuclear_powered_command_line_tools

Nuclear Powered Command-Line Tools

cuda jit machine-learning numba python

Last synced: 11 Oct 2024

https://github.com/veriblock/nodecore-pow-cuda-miner

VeriBlock CUDA PoW Miner

cuda ptx vblake veriblock

Last synced: 23 Jan 2025

https://github.com/denzp/rust-ptx-support

Experiments with achieving better ergonomics in Rust CUDA workflow

cuda rust

Last synced: 10 Oct 2024

https://github.com/stellar-group/blaze

Fork of the Blaze library for compatibility with Blaze CUDA · https://bitbucket.org/blaze-lib/blaze · https://github.com/STEllAR-GROUP/blaze_cuda

cpp cpp14 cuda hpc linear-algebra metaprogramming

Last synced: 12 Nov 2024

https://github.com/abelcarreras/cuda_functions

Python functions to calculate the FFT and autocorrelation function using GPU (Cuda)

autocorrelation-functions complex cuda cuda-functions fft gpu power-spectrum pypi python-api

Last synced: 07 Nov 2024

https://github.com/feifeibear/pstensor

PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.

cuda deeplearning machinelearning pytorch tensorflow2

Last synced: 23 Jan 2025

https://github.com/tomaszrewak/rotatingvoxels

In this project I use C#, Alea GPU and OpenGL.Net to create a simple, hardware-accelerated, 3d animation of rotating cubes.

alea-gpu-library csharp cuda gpu opengl voxel

Last synced: 09 Nov 2024

https://github.com/minnukota381/cuda-parallel-c-programming

This repository contains various CUDA C programs demonstrating parallel computing techniques using NVIDIA's CUDA platform.

cuda cuda-programming hpc nvcc nvidia

Last synced: 21 Nov 2024

https://github.com/3p3r/pf-localization

Localization using a Particle Filter (and random walk model)

cuda localization matlab particle-filter slam

Last synced: 03 Nov 2024

https://github.com/shadyboukhary/gpu-research-fft-openacc-cuda

Case studies constitute a modern, interdisciplinary, and valuable teaching practice that plays a critical role in the development of new skills and the formation of new knowledge. This research studies the behavior and performance of two widely adopted scientific kernels, a Fast Fourier Transform and Matrix Multiplication. Both routines are implemented in the two currently most popular many-core programming models, CUDA and OpenACC. A Fast Fourier Transform (FFT) takes a signal sampled over a period of time and decomposes it into its frequency components by computing the Discrete Fourier Transform (DFT) of the sequence. Unlike the direct approach to computing a DFT, FFT algorithms reduce the complexity of the problem from O(n^2) to O(n log n). Matrix multiplication is a cornerstone routine in Mathematics, Artificial Intelligence, and Machine Learning. This research also shows that the nature of the problem plays a crucial role in determining which many-core model will provide the highest benefit in performance.

acceleration cuda fast-fourier-transform fft gpu-acceleration gpu-computing gpu-programming nvcc openacc parallel-computing pgi pgi-compiler radix-2

Last synced: 09 Nov 2024
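
The complexity claim in the entry above, O(n^2) for a direct DFT versus O(n log n) for an FFT, can be illustrated with a small CPU-only sketch. This is not code from the listed repository, which targets CUDA and OpenACC; the function names `naive_dft` and `fft_radix2` and the sample signal are assumptions made for illustration. The second function is a textbook recursive radix-2 Cooley-Tukey scheme, chosen because the entry is tagged radix-2.

```python
import cmath


def naive_dft(x):
    """Direct DFT: n outputs, each a sum of n terms, so O(n^2) work."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    # Split into even- and odd-indexed halves and transform each recursively.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the two half-size transforms.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# Hypothetical 8-sample signal, used only to compare the two routines.
signal = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
slow = naive_dft(signal)
fast = fft_radix2(signal)
max_diff = max(abs(a - b) for a, b in zip(slow, fast))
print(f"largest difference between the two results: {max_diff:.2e}")
```

Both routines compute the same transform. The recursion halves the problem at each of its log2(n) levels and does O(n) work per level, which is where the O(n log n) bound comes from.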

https://github.com/johnaparker/pybind_examples

Examples of pybind11 based projects (using cmake)

cmake cpp cuda mpi openmp pybind11 python

Last synced: 28 Nov 2024

https://github.com/cea-hpc/HARP

Small tool for profiling the performance of hardware-accelerated Rust code using OpenCL and CUDA

cuda gpgpu-computing hpc opencl rust

Last synced: 19 Nov 2024

https://github.com/pvgupta24/graph-betweenness-centrality

Parallelizing Graph Betweenness Centrality with CUDA

betweenness-centrality cuda graphs

Last synced: 06 Jan 2025

https://github.com/yyaadet/aigc

A Web UI with intelligent prompts for AIGC, for example Stable Diffusion with Core ML on Apple Silicon M1/M2, CUDA, or CPU.

bootstrap5 cuda django django-project image-generation jquery llm m1-mac python stable-diffusion stable-diffusion-webui text2image webapp webui

Last synced: 12 Jan 2025

https://github.com/ogrecave/ogre-gpgpu

GPGPU compute with Ogre using CUDA or OpenCL

cuda gpgpu-computing ogre3d opencl

Last synced: 05 Nov 2024

https://github.com/cea-hpc/harp

Small tool for profiling the performance of hardware-accelerated Rust code using OpenCL and CUDA

cuda gpgpu-computing hpc opencl rust

Last synced: 14 Dec 2024

https://github.com/qureshizawar/cuda-quartic-solver

A general cubic equation solver and quartic equation minimisation solver written for CPUs and Nvidia GPUs. For more details and results, see https://arxiv.org/abs/1903.10041. The library is available for C++/CUDA as well as for Python via Pybind11.

cmake cubic-equations cuda cuda-quartic-solver gpu minimisation numpy nvidia-gpus openmp optimization pip pybind11 python quartic quartic-equations quartic-functions quartic-minimisation solver

Last synced: 11 Oct 2024

https://github.com/sukunis/cunfft

Nonequispaced FFTs on GPUs (based on NFFT: http://www.nfft.org)

cuda cunfft gpu nfft

Last synced: 03 Dec 2024

https://github.com/achirkin/real-salient

A GPU-only implementation of DenseCut for a RealSense camera

crf cuda gmm grabcut realsense

Last synced: 29 Oct 2024

https://github.com/phrb/gpu-autotuning

Autotuning NVCC Compiler Parameters, published @ CCPE Journal

autotuning cuda nvcc opentuner

Last synced: 19 Oct 2024

https://github.com/tgymnich/shallowwater.jl

🌊 Simple Finite Volumes models that solve the shallow water equations

cuda hpc julia shallow-water-equations simulation tsunami

Last synced: 25 Oct 2024

https://github.com/pkestene/incremental-fluids-kokkos

Simple, single-file fluid solvers for learning purposes, revisited with parallel programming (Kokkos: OpenMP / Cuda)

cfd cuda kokkos openmp parallel-programming

Last synced: 18 Dec 2024

https://github.com/yottaawesome/cuda-by-example

Source code contained in CUDA By Example: An Introduction to General Purpose GPU Programming

c cpp cuda

Last synced: 13 Nov 2024

https://github.com/prg-titech/kani-cuda

A program synthesizer for a CUDA-like GPGPU language

cuda racket

Last synced: 18 Nov 2024

https://github.com/bobbui/tensorflow-serving-cuda-docker

Docker image for tensorflow serving with Nvidia CUDA, CuDNN

cuda cudnn docker docker-image tensorflow tensorflow-serving ubuntu1604

Last synced: 25 Jan 2025

https://github.com/vc-bonn/charonload

Develop C++/CUDA extensions with PyTorch like Python scripts

cmake cpp cuda jit python pytorch torch

Last synced: 16 Dec 2024

https://github.com/finmath/finmath-lib-cuda-extensions

Classes enabling finmath-lib to run its Monte-Carlo models on Cuda GPUs

cuda finmath-lib gpu

Last synced: 23 Oct 2024

https://github.com/ema2159/equirectangular-cubemaptransform

OpenCV with CUDA and OpenMP implementations for transforming equirectangular images to cube maps and vice versa

cubemap-to-equirectangular cuda equirectangular-to-cubemap opencv openmp

Last synced: 16 Nov 2024

https://github.com/evilfreelancer/docker-whisper-server

whisper.cpp HTTP transcription server with OpenAI-like API in Docker

api api-server asr cuda docker docker-compose dockerfile nvidia openai openai-api whisper whisper-cpp

Last synced: 09 Oct 2024

https://github.com/dexter2206/ising

Ising: a Python package for exactly solving arbitrary Ising model instances using exhaustive search.

cuda ising optimization

Last synced: 23 Oct 2024

https://github.com/previsionio/damavand

Damavand is a quantum circuit simulator. It can run on laptops or High Performance Computing architectures, such as CPU distributed architectures or multi-GPU distributed architectures.

cuda distributed-computing hpc multi-gpu multithreading quantum-computing rust simulator

Last synced: 28 Nov 2024

https://github.com/balos1/shi_tomasi_feature_detection

CUDA, OpenMP, and regular serial C implementations of Shi Tomasi feature detection

cuda image-processing openmp shi-tomasi-detection

Last synced: 29 Oct 2024

https://github.com/NAGAGroup/Scalix

Scalix is a data parallel compute library that automatically scales to the available compute resources.

cuda hpc scientific-computing

Last synced: 02 Nov 2024

https://github.com/ktaletsk/gpu_dsm

🔗Accessible quantitative polymer rheology predictions with slip-links on GPU

c-plus-plus cuda gpu polymer rheology

Last synced: 31 Dec 2024

https://github.com/comcast/rapid-ip-checker

A GPU accelerated tool to compare large lists of IPv4/IPv6 addresses.

cuda ipv4 ipv6 numba parallel

Last synced: 14 Nov 2024

https://github.com/yalue/cudabrot

A CUDA renderer for the Buddhabrot fractal

amd buddhabrot buddhabrot-fractal cuda gpu hip mandelbrot mandelbrot-fractal rocm

Last synced: 23 Oct 2024

https://github.com/joaomlneto/cpds-heat

Heat Equation using different solvers (Jacobi, Red-Black, Gaussian) in C using different paradigms (sequential, OpenMP, MPI, CUDA) - Assignments for the Concurrent, Parallel and Distributed Systems course @ UPC 2013

cuda cuda-support gauss-seidel gaussian heat-equation jacobi mpi mpi-applications openmp openmp-applications openmp-parallelization openmp-support openmpi paradigms performance red-black solvers

Last synced: 09 Nov 2024

https://github.com/yhmtsai/ci_windows_cuda

This repo creates the Dockerfiles for using CUDA in Windows Docker and provides the GitLab/GitHub Windows shared VM runner config.

continuous-integration cuda docker github-actions gitlab windows

Last synced: 27 Nov 2024

https://github.com/pkestene/cuda-proj-tmpl

A minimal CMake-based project skeleton for developing a CUDA application

cea cmake cuda gpu gpu-computing parallel-computing parallel-programming template

Last synced: 18 Dec 2024

https://github.com/pkestene/kokkos-proj-tmpl

A minimal CMake-based project skeleton for developing a Kokkos application

cea cuda gpu kokkos openmp parallel-computing parallelization performance-portability

Last synced: 18 Dec 2024

https://github.com/ghost---shadow/near-duplicate-image-detector

CUDA implementation of some perceptual hashing algorithms

cuda image-hashing thrust

Last synced: 11 Oct 2024

https://github.com/101001000/tfg-pathtracer

CUDA Path tracing render engine, with MIS and the Disney BRDF

cuda pathtracing raytracing renderer

Last synced: 14 Nov 2024

https://github.com/aknvictor/culingam

CULiNGAM accelerates LiNGAM analysis on GPUs.

causal-discovery cuda lingam

Last synced: 01 Nov 2024