Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2025-01-30 00:06:35 UTC
https://github.com/fangq/mcx
Monte Carlo eXtreme (MCX) - GPU-accelerated photon transport simulator
3d c cuda matlab monte-carlo optical-imaging pascal photon-transport physics-simulation ray-tracing volumetric-rendering voxel-based
Last synced: 25 Jan 2025
https://github.com/fhamborg/newsmtsc
Target-dependent sentiment classification in news articles reporting on political events. Includes a high-quality data set of over 11k sentences and a state-of-the-art classification model.
cuda dataset deep-learning news-articles pytorch sentiment-analysis sentiment-classification text-classification tsc
Last synced: 29 Jan 2025
https://github.com/roastduck/FreeTensor
A language and compiler for irregular tensor programs.
ast automatic-differentiation code-generation cuda gpu jit openmp tensor
Last synced: 07 Nov 2024
https://github.com/qengineering/jetson-nano-image
Jetson Nano image with deep learning frameworks
cuda deep-learning jetson-nano mnn ncnn opencv pytorch sd-card-image team-viewer tegra tensorflow torch torchvision
Last synced: 28 Jan 2025
https://github.com/AnicetNgrt/jiro-nn
A Deep Learning and preprocessing framework in Rust with support for CPU and GPU.
adam classification cuda data-analysis deep-learning dropout gpu gpu-computing machine-learning ml nalgebra neural-networks nn opencl pipelines regression rust sgd
Last synced: 16 Jan 2025
https://github.com/electronic-structure/SIRIUS
Domain specific library for electronic structure calculations
cuda density-functional-theory electronic-structure-calculations full-potential gpu lapw mpi planewave pseudopotential rocm
Last synced: 20 Nov 2024
https://github.com/rsnk96/Ubuntu-Setup-Scripts
Scripts to help you set up your Ubuntu quickly, especially if you're in any subfield of Data Science or AI!
anaconda cuda deep-learning deeplearning dl ffmpeg installers ml opencv python pytorch tensorflow tensorflow-setup ubuntu zsh
Last synced: 06 Nov 2024
https://github.com/GooFit/GooFit
Code repository for the massively-parallel framework for maximum-likelihood fits, implemented in CUDA/OpenMP
cuda fitting gpu gpu-computing omp physics root-cern thrust
Last synced: 06 Nov 2024
https://github.com/naeioi/pbf-cuda
Position Based Fluids CUDA implementation
cuda fluid-solver opengl real-time simulation
Last synced: 11 Nov 2024
https://github.com/inoryy/tensorflow-optimized-wheels
TensorFlow wheels built for the latest CUDA/cuDNN with performance flags enabled: SSE, AVX, FMA, and XLA
avx2 cuda cudnn python sse tensorflow tensorflow-gpu tensorflow-wheels wheels xla
Last synced: 03 Nov 2024
https://github.com/ihhub/penguinv
Computer vision library with focus on heterogeneous systems
avx computer-vision cpp cuda gpu hacktoberfest heterogeneous-systems image-processing opencl python simd sse thread-pool
Last synced: 24 Jan 2025
https://github.com/jdermody/brightwire
Bright Wire is an open source machine learning library for .NET with GPU support (via CUDA)
convolutional-neural-networks csharp cuda cuda-support gpu gpu-support machine-learning machine-learning-library machinelearning neural-network recurrent-neural-networks
Last synced: 27 Jan 2025
https://github.com/MuGdxy/muda
μ-Cuda: cover the last mile of CUDA. Features: IntelliSense-friendly, structured launch, and automatic CUDA graph generation and updating.
cuda cuda-cpp cuda-programming
Last synced: 20 Nov 2024
https://github.com/sniklaus/pytorch-extension
An example of a CUDA extension for PyTorch using CuPy that computes the Hadamard product of two tensors
cuda cupy deep-learning python pytorch
Last synced: 14 Nov 2024
https://github.com/helyim/helyim
SeaweedFS implemented in pure Rust
cuda dpdk erasure-coding hdfs iouring kernel-bypass object-storage rdma s3 spdk webdav
Last synced: 30 Jan 2025
https://github.com/glotzerlab/fresnel
Publication quality path tracing in real time.
cuda optix path-tracing python simulation soft-matter
Last synced: 27 Jan 2025
https://github.com/src-d/minhashcuda
Weighted MinHash implementation on CUDA (multi-gpu).
cuda lsh machine-learning minhash
Last synced: 29 Jan 2025
https://github.com/gpmueller/eigen-cuda
MWE for using the Eigen library in CUDA kernels
Last synced: 01 Nov 2024
https://github.com/cgtuebingen/Flex-Convolution
Source code for: Flex-Convolution (Million-Scale Point-Cloud Learning Beyond Grid-Worlds), accepted at ACCV 2018
3d-point-clouds cuda deep-learning deep-neural-networks machine-learning pointcloud pointcloudprocessing research research-paper segmentation shapenet tensorflow
Last synced: 27 Oct 2024
https://github.com/sandialabs/omega_h
Simplex mesh adaptivity for HPC
cmake cpp cpp14 cuda geometry gpu hpc mesh mesh-generation meshing mpi openmp parallel parallel-computing parallelism sandia-national-laboratories scr-2203 snl-science-libs triangulation
Last synced: 30 Jan 2025
https://github.com/pytorch/extension-script
Example repository for custom C++/CUDA operators for TorchScript
Last synced: 19 Jan 2025
https://github.com/qdLMF/LIO-SAM-GPU-ScanToMapOpt
A CUDA reimplementation of the line/plane odometry of LIO-SAM. A point cloud hash map (inspired by iVox of Faster-LIO) on GPU is used to accelerate 5-neighbour KNN search.
3d-mapping cuda faster-lio gpu ivox knn lidar lidar-inertial-odometry lidar-slam lio lio-sam loam slam
Last synced: 27 Oct 2024
https://github.com/arbor-sim/arbor
The Arbor multi-compartment neural network simulation library.
cuda gpu hip hpc modern-cpp mpi neuroscience
Last synced: 25 Jan 2025
https://github.com/cair/tmu
Implements the Tsetlin Machine, Coalesced Tsetlin Machine, Convolutional Tsetlin Machine, Regression Tsetlin Machine, and Weighted Tsetlin Machine, with support for continuous features, drop clause, Type III Feedback, focused negative sampling, multi-task classifier, autoencoder, literal budget, and one-vs-one multi-class classifier. TMU is written in Python with wrappers for C and CUDA-based clause evaluation and updating.
absorbing-states autoencoder convolution cuda gpu incremental incremental-computation multi-output pattern-recognition propositional-logic regression relational-logic sparse tsetlin-machine
Last synced: 30 Jan 2025
https://github.com/acdslab/mppi-generic
Templated C++/CUDA implementation of Model Predictive Path Integral Control (MPPI)
cpp cuda model-predictive-control model-predictive-path-integral robotics stochastic-optimization
Last synced: 29 Jan 2025
https://github.com/leimao/CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
Last synced: 06 Dec 2024
https://github.com/rocm/hip-cpu
An implementation of HIP that works on CPUs, across OSes.
cpp17 cuda cuda-programming hip hip-kernel-language hip-portability hip-runtime parallel-algorithms spmd stl-algorithms
Last synced: 07 Nov 2024
https://github.com/upul/aurora
A minimal deep learning library written in Python/Cython/C++ and NumPy/CUDA/cuDNN.
cplusplus cuda cudnn cython deep-learning python3 system-design
Last synced: 18 Nov 2024
https://github.com/ANNetGPGPU/ANNetGPGPU
A GPU (CUDA) based Artificial Neural Network library
c-plus-plus-11 cuda propagation-network self-organizing-map
Last synced: 10 Nov 2024
https://github.com/nvidia-merlin/hierarchicalkv
HierarchicalKV is part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. Its key capability is storing key-value feature embeddings in the high-bandwidth memory (HBM) of GPUs and in host memory. It can also be used as generic key-value storage.
cuda dynamic-embedding embedding-storage gpu hashtable key-value-store recommender-system
Last synced: 24 Nov 2024
https://github.com/seeed-projects/recomputer-jetson-for-beginners
Beginner's Guide to reComputer Jetson
ai application beginner-friendly cuda cv deepstream examples generative-ai guide jetson llm ml nlp nvidia recomputer robotics tao tensorrt
Last synced: 28 Jan 2025
https://github.com/mitmul/pynvvl
A Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python
cuda cupy gpu numpy nvidia-video-loader nvvl python video video-processing
Last synced: 25 Jan 2025
https://github.com/pennylaneai/pennylane-lightning
The Lightning plugin ecosystem provides fast quantum state-vector and tensor network simulators written in C++ for use with PennyLane.
cuda distributed-computing gpu hpc mpi openmp parallel quantum-computing quantum-machine-learning rocm
Last synced: 25 Jan 2025
https://github.com/coreylowman/llama-dfdx
LLaMA 7B with CUDA acceleration, implemented in Rust. Minimal GPU memory needed!
cuda deep-learning inference language-model llama neural-network rust rust-lang
Last synced: 07 Nov 2024
https://github.com/Heteroflow/Heteroflow
Concurrent CPU-GPU Programming using Task Models
cpu-gpu-scheduling cuda gpu gpu-acceleration gpu-computing gpu-programming heterogeneous-computing heterogeneous-parallel-programming heterogeneous-systems multithreaded multithreading task-parallelism
Last synced: 02 Nov 2024
https://github.com/osai-ai/dokai
Collection of Docker images for ML/DL and video processing projects
cuda deep-learning docker docker-image ffmpeg opencv python pytorch tensorrt video-processing
Last synced: 05 Nov 2024
https://github.com/harrism/mini-nbody
A simple gravitational N-body simulation in less than 100 lines of C code, with CUDA optimizations.
astrophysics benchmark cuda nbody
Last synced: 28 Oct 2024
https://github.com/juliagpu/acceleratedkernels.jl
Cross-architecture parallel algorithms for Julia's CPU and GPU backends. Targets multithreaded CPUs, and GPUs via Intel oneAPI, AMD ROCm, Apple Metal, Nvidia CUDA.
amd apple cuda gpu intel julia metal nvidia oneapi parallel rocm standard-library
Last synced: 29 Jan 2025
https://github.com/mseitzer/gpu-monitor
Script to remotely check GPU servers for free GPUs
cuda cudnn deep-learning gpu nvidia-smi remote ssh
Last synced: 14 Oct 2024
https://github.com/sonots/cumo
Cumo (pronounced like "koomo") is a CUDA-aware numerical library whose interface is highly compatible with Ruby Numo
cuda numo ruby scicentific-computing
Last synced: 24 Jan 2025
https://github.com/1ytic/pytorch-edit-distance
Levenshtein edit-distance on PyTorch and CUDA
asr cuda edit-distance levenshtein nlp pytorch
Last synced: 30 Jan 2025
https://github.com/mratsim/Arch-Data-Science
Archlinux PKGBUILDs for Data Science, Machine Learning, Deep Learning, NLP and Computer Vision
archlinux cuda cudnn data-science deep-learning lightgbm machine-learning mkl mxnet natural-language-processing natural-language-understanding nervana opencv package pandas pytorch scikit-learn spacy tensorflow xgboost
Last synced: 27 Nov 2024
https://github.com/sasagawa888/deeppipe2
Deep learning library using the GPU (CUDA/cuBLAS)
cublas cuda deep-learning elixir gpu
Last synced: 26 Oct 2024
https://github.com/windqaq/mpm
GPU simulation using the Material Point Method, with rendering.
cuda gvdb-voxels material-point-method raytracer simulation thrust
Last synced: 28 Oct 2024
https://github.com/braintwister/docker-devel-env
Fast, reproducible, and portable software development environments
clang cmake conan cuda development docker eclipse gcc jenkins nsight portability reproducibility vscode
Last synced: 09 Oct 2024
https://github.com/nvidia/nvimagecodec
nvImageCodec: a library of GPU- and CPU-accelerated codecs featuring a unified interface
computer-vision cpp cuda dali data-processing deep-learning fast-data-pipeline gpu image-processing machine-learning nvidia python pytorch
Last synced: 29 Jan 2025
https://github.com/PennyLaneAI/pennylane-lightning
The PennyLane-Lightning plugin provides a fast state-vector simulator written in C++ for use with PennyLane
cuda distributed-computing gpu hpc mpi openmp parallel quantum-computing quantum-machine-learning rocm
Last synced: 17 Nov 2024
https://github.com/pwhiddy/fat-clouds
GPU Fluid Simulation with Volumetric Rendering
blaze cuda fluid gpu navier-stokes raymarching smoke
Last synced: 26 Nov 2024
https://github.com/eomii/rules_ll
An Upstream Clang/LLVM-based toolchain for contemporary C++ and heterogeneous programming
bazel bleeding-edge build-system clang clang-tidy cpp cuda gpu-programming hermetic hip llvm nix openmp remote-caching remote-execution reproducible sanitizers
Last synced: 28 Jan 2025
https://github.com/sunsetquest/cudapad
CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.
cuda cuda-programming gpu nvidia ptx ptx-utils windows
Last synced: 01 Dec 2024
https://github.com/mitsuba-renderer/drjit-core
Dr.Jit — A Just-In-Time-Compiler for Differentiable Rendering (core library)
Last synced: 29 Jan 2025
https://github.com/shinmorino/sqaod
Solvers/annealers for simulated quantum annealing on CPU and CUDA (NVIDIA GPU).
accelearated cplusplus-11 cuda gpu linux monte-carlo-simulation nvidia-gpu python quantum-annealing quantum-computing windows
Last synced: 20 Dec 2024
https://github.com/spcl/daceml
A Data-Centric Compiler for Machine Learning
compiler cuda deep-learning fpga high-performance-computing machine-learning pytorch
Last synced: 06 Nov 2024
https://github.com/amypad/cuvec
Unifying Python/C++/CUDA memory: Python buffered array ↔️ `std::vector` ↔️ CUDA managed memory
array buffer c cpp cpu cpython cpython-api cpython-extensions cuda cxx gpu hacktoberfest pybind11 python swig vector
Last synced: 25 Jan 2025
https://github.com/matifali/dockerdl
Deep Learning Docker Image
cuda deep-learning docker jupyter numpy python pytorch tensorflow
Last synced: 29 Jan 2025
https://github.com/ashvardanian/parallelreductionsbenchmark
Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal - all it takes to sum a lot of numbers fast!
apple avx512 cuda glsl gpgpu gpu gpu-acceleration gpu-computing hpc intel metal nvidia opencl openmp parallel simd stl tbb thrust
Last synced: 30 Jan 2025
https://github.com/openmm/NNPOps
High-performance operations for neural network potentials
cuda gpu machine-learning molecular-dynamics molecular-modeling
Last synced: 13 Nov 2024
https://github.com/juliagpu/gemmkernels.jl
Flexible and performant GEMM kernels in Julia
Last synced: 28 Jan 2025
https://github.com/zenustech/zpc
zenus parallel computing library for zenus physics-based simulations
cuda gpu hpc math physics simulation
Last synced: 30 Jan 2025
https://github.com/shi-yan/FreeWill
A deep learning library in C++/CUDA
cnn cuda deep-learning dnn machine-learning neural-network qt5
Last synced: 20 Nov 2024
https://github.com/devxt-llc/ezlocalai
ezlocalai is an easy-to-set-up local artificial intelligence server with OpenAI-style endpoints.
ai artificial-intelligence cuda llamacpp local
Last synced: 24 Jan 2025
https://github.com/ttsiodras/mandelbrotsse
Real-time Mandelbrot zoom via SSE, AVX, OpenMP, CUDA, XaoS...
Last synced: 21 Dec 2024
https://github.com/kaslanarian/pydynet
A PyTorch-like dynamic computation graph and neural network framework implemented in NumPy (MLP, CNN, RNN, Transformer)
autograd cnn cuda cupy deep-learning-framework numpy python pytorch pytorch-implementation rnn transformer
Last synced: 29 Dec 2024
https://github.com/projectchrono/dem-engine
A dual-GPU DEM solver with complex grain geometry support
chrono cuda discrete-element-method gpu multi-gpu simulation
Last synced: 30 Jan 2025
https://github.com/qengineering/pytorch-jetson-nano
PyTorch installation wheels for Jetson Nano
aarch64 cuda cuda-10 cudnn cudnn8 jetson-nano jetson-tx2 jetson-xavier libtorch python-wheel python3 pytorch pytorch-installation torch torchvision wheel
Last synced: 27 Nov 2024
https://github.com/dbraun/pytorchtop
GPU PyTorch TOP in TouchDesigner with CUDA-enabled OpenCV
cuda libtorch opencv pytorch touchdesigner
Last synced: 08 Nov 2024
https://github.com/anicusan/acceleratedkernels.jl
Cross-architecture parallel algorithms for Julia's GPU backends, from a unified KernelAbstractions.jl codebase. Targets Intel oneAPI, AMD ROCm, Apple Metal, Nvidia CUDA.
amd apple cuda gpu intel julia metal nvidia oneapi parallel rocm standard-library
Last synced: 27 Oct 2024
https://github.com/nolanzzz/mtmct
Design and Implementation of a Multi-Target Multi-Camera Tracking Solution
cuda deep-learning detection machine-learning opencv python pytorch reidentification research-project resnet tracking
Last synced: 28 Oct 2024
https://github.com/una-dinosauria/local-search-quantization
State-of-the-art method for large-scale ANN search as of Oct 2016. Presented at ECCV 16.
computer-vision cuda eccv-16 gpu julia multi-codebook quantization
Last synced: 29 Oct 2024
https://github.com/heethesh/computer-vision-and-deep-learning-setup
Tutorial on how to set up your system with an NVIDIA GPU and install deep learning frameworks like TensorFlow, Darknet for YOLO, Theano, and Keras; OpenCV; and NVIDIA drivers, CUDA, and cuDNN libraries on Ubuntu 16.04, 17.10 and 18.04.
cuda cudnn gpu install keras nvidia opencv python tensorflow ubuntu ubuntu1710
Last synced: 25 Dec 2024
https://github.com/jamjamjon/usls
A Rust library integrated with ONNXRuntime, providing a collection of Computer Vision and Vision-Language models.
clip cuda dinov2 florence2 grounding-dino ml ocr onnx onnxruntime rust rust-yolo sam sapiens tensorrt yolo yolo-rs yolo-rust yolov11 yolov8
Last synced: 26 Jan 2025
https://github.com/yalue/cuda_scheduling_examiner_mirror
A tool for examining GPU scheduling behavior.
benchmark cuda cuda-kernels gpu gpu-scheduling mandelbrot
Last synced: 29 Jan 2025
https://github.com/celeritas-project/celeritas
Celeritas is a new Monte Carlo transport code designed to accelerate scientific discovery in high energy physics by improving detector simulation throughput and energy efficiency using GPUs.
computational-physics cuda detector-simulation gpu hep high-energy-physics hip monte-carlo particle-transport
Last synced: 27 Jan 2025
https://github.com/BobMcDear/neural-network-cuda
Neural network from scratch in CUDA/C++
cplusplus cuda deep-learning machine-learning neural-network
Last synced: 21 Dec 2024
https://github.com/pypr/compyle
Execute a subset of Python on HPC platforms
cuda cython high-performance-computing opencl openmp python transpile
Last synced: 25 Jan 2025
https://github.com/harrism/sublimetext-cuda-cpp
CUDA C++ package for Sublime Text 2 & 3
cuda snippets sublime-text tmlanguage
Last synced: 28 Oct 2024
https://github.com/openmlsys/openmlsys-cuda
Tutorials for writing high-performance GPU operators in AI frameworks.
Last synced: 16 Nov 2024