Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2025-01-30 00:06:35 UTC
https://github.com/NVIDIA-Genomics-Research/GenomeWorks
SDK for GPU accelerated genome assembly and analysis
alignment cuda genomics gpu mapping nvidia partial-order-alignment poa python-api
Last synced: 15 Nov 2024
https://github.com/GoodAI/BrainSimulator
Brain Simulator is a platform for visual prototyping of artificial intelligence architectures.
ai brain-simulator cuda machine-learning
Last synced: 20 Nov 2024
https://github.com/gezp/docker-ubuntu-desktop
Docker image for Ubuntu Desktop that supports hardware GPU-accelerated GUI apps. You can access the container with SSH or remote desktop, just like a cloud VM.
cuda docker kasmvnc nomachine nvidia-gpu opengl remote-desktop ubuntu virtualgl
Last synced: 07 Nov 2024
https://github.com/JuliaGPU/CuArrays.jl
A Curious Cumulation of CUDA Cuisine
Last synced: 29 Nov 2024
https://github.com/nvidia/cuda-checkpoint
CUDA checkpoint and restore utility
Last synced: 30 Jan 2025
https://github.com/rapidsai/cuvs
cuVS - a library for vector search and clustering on the GPU
anns clustering cuda distance gpu information-retrieval llm machine-learning nearest-neighbors neighborhood-methods similarity-search sparse statistics vector-search vector-similarity vector-store
Last synced: 24 Jan 2025
https://github.com/bytedance/flux
A fast communication-overlapping library for tensor parallelism on GPUs.
Last synced: 25 Jan 2025
https://github.com/rentainhe/pytorch-distributed-training
Simple tutorials on PyTorch DDP training
apex cuda ddp-training deep-learning pytorch
Last synced: 30 Jan 2025
https://github.com/ashvardanian/less_slow.cpp
Learning how to write "Less Slow" code in C++ 20, C 99, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking and user-space IO
assembly assembly-language avx512 benchmark coroutines cpp cpp-programming cpp17 cpp20 cuda gcc google-benchmark hpc io-uring linux-kernel llvm ranges tutorial tutorials
Last synced: 27 Jan 2025
https://github.com/zjhellofss/kuiperllama
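A good project for campus, autumn, and spring recruitment and for internships: build, hands-on and from scratch, a large-model inference framework that supports LLama2/3 and Qwen2.5.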
cpp cuda inference-engine llama2 llama3 llm llm-inference qwen qwen2
Last synced: 27 Jan 2025
https://github.com/llnl/blt
A streamlined CMake build system foundation for developing HPC software
blt build-system build-tools cmake cpp cuda hpc radiuss testing
Last synced: 25 Jan 2025
https://github.com/pcb9382/FaceAlgorithm
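Face detection and face recognition, including face detection (retinaface, yolov5face, yolov7face, yolov8face), face detection tracking (ByteTracker), face angle calculation (Face_Angle), face correction (Face_Aligner), face recognition (Arcface), mask detection (MaskRecognitiion), age and gender detection (Gender_age), silent liveness detection (Silent_Face_Anti_Spoofing), and FaceAlignment (106keypoints)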
cuda face-alignment face-detection face-recognition tensorrt yolov5face yolov7face yolov8face
Last synced: 27 Oct 2024
https://github.com/bwohlberg/sporco
Sparse Optimisation Research Code
admm convolutional-dictionary-learning convolutional-sparse-coding cuda dictionary-learning fista optimization optimization-algorithms plug-and-play-priors python robust-pca sparse-coding sparse-representations sparsity total-variation total-variation-minimization
Last synced: 27 Jan 2025
https://github.com/LLNL/blt
A streamlined CMake build system foundation for developing HPC software
blt build-system build-tools cmake cpp cuda hpc radiuss testing
Last synced: 09 Nov 2024
https://github.com/trinkle23897/fast-poisson-image-editing
A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.
cpp cuda high-performance-computing image-processing jacobi-iteration jacobi-method mpi numpy openmp parallel-computing poisson-image-editing pybind11 python
Last synced: 26 Jan 2025
https://github.com/marian-nmt/marian-dev
Fast Neural Machine Translation in C++ - development repository
cpp11 cuda fast gpu-acceleration neural-machine-translation
Last synced: 25 Jan 2025
https://github.com/zjhellofss/KuiperLLama
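A good project for campus, autumn, and spring recruitment and for internships: build, hands-on and from scratch, a large-model inference framework that supports LLama2/3 and Qwen2.5.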
cpp cuda inference-engine llama2 llama3 llm llm-inference qwen qwen2
Last synced: 03 Jan 2025
https://github.com/Trinkle23897/Fast-Poisson-Image-Editing
A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.
cpp cuda high-performance-computing image-processing jacobi-iteration jacobi-method mpi numpy openmp parallel-computing poisson-image-editing pybind11 python
Last synced: 03 Nov 2024
https://github.com/asmirnou/watsor
Object detection for video surveillance
camera coral cuda detection ffmpeg gpu hardware-acceleration homeassistant ip mpegts mqtt person-detector python realtime stream surveillance tensorrt tensrflow video zones
Last synced: 27 Jan 2025
https://github.com/AmusementClub/vs-mlrt
Efficient CPU/GPU/Vulkan ML Runtimes for VapourSynth (with built-in support for waifu2x, DPIR, RealESRGANv2/v3, Real-CUGAN, RIFE, SCUNet and more!)
artificial-intelligence cuda deep-learning directml dpir gpu migraphx ncnn neural-network onnx onnxruntime openvino real-cugan real-esrgan rife tensorrt vapoursynth vulkan waifu2x
Last synced: 29 Oct 2024
https://github.com/koide3/gtsam_points
A collection of GTSAM factors and optimizers for point cloud SLAM
bundle-adjustment continuous-time cuda factor-graph gpu gtsam kdtree localization mapping point-cloud registration slam voxelmap
Last synced: 25 Jan 2025
https://github.com/ritchieng/dlami
A Deep Learning Amazon Web Service (AWS) AMI that is open, free and works. Run in less than 5 minutes. TensorFlow, Keras, PyTorch, Theano, MXNet, CNTK, Caffe and all dependencies.
ami aws cuda cudnn5 keras python tensorflow ubuntu
Last synced: 26 Jan 2025
https://github.com/shapelets/khiva
An open-source library of algorithms to analyse time series on GPU and CPU.
clustering cpp cuda data-series discords distances gpu khiva kshape matrix-profile motifs multicore opencl shapelets snippets time-series timeseries
Last synced: 27 Dec 2024
https://github.com/pmeier/light-the-torch
Install PyTorch distributions with computation backend auto-detection
Last synced: 25 Jan 2025
https://github.com/opendilab/di-hpc
OpenDILab RL HPC OP Lib, including CUDA and Triton kernels
cuda hpc lstm pytorch reinforcement-learning triton
Last synced: 21 Jan 2025
https://github.com/marnovo/macos-egpu-cuda-guide
Set up CUDA for machine learning (and gaming) on macOS using an NVIDIA eGPU
apple cuda deep-learning egpu gaming gpu guide hacktoberfest mac machine-learning macos nvidia
Last synced: 19 Dec 2024
https://github.com/marnovo/macOS-eGPU-CUDA-guide
Set up CUDA for machine learning (and gaming) on macOS using an NVIDIA eGPU
apple cuda deep-learning egpu gaming gpu guide hacktoberfest mac machine-learning macos nvidia
Last synced: 22 Nov 2024
https://github.com/Hellisotherpeople/CX_DB8
A contextual, biasable, word-or-sentence-or-paragraph extractive summarizer powered by the latest in text embeddings (BERT, Universal Sentence Encoder, Flair)
contextual-summarization cuda debate-evidence embeddings extractive-summarization flair python semantic-search semantic-summarization summarization summarizer token-level-summarization universal-sentence-encoder
Last synced: 22 Nov 2024
https://github.com/bh107/bohrium
Automatic parallelization of Python/NumPy, C, and C++ codes on Linux and MacOSX
cuda gpu gpu-acceleration multi-core numpy opencl parallel-computing
Last synced: 12 Nov 2024
https://github.com/bytedance/abq-llm
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
cuda llm-inference mlsys quantized-networks research
Last synced: 28 Jan 2025
https://github.com/llnl/hiop
HPC solver for nonlinear optimization problems
acopf bfgs constrained-optimization cuda gpu-support hpc interior-point-method interior-point-optimizer math-physics mpi nonlinear-optimization nonlinear-programming nonlinear-programming-algorithms nonsmooth-optimization optimization parallel-programming quasi-newton radiuss rocm solver
Last synced: 26 Jan 2025
https://github.com/DeMoriarty/TorchPQ
Approximate nearest neighbor search with product quantization on GPU in PyTorch and CUDA
cuda nearest-neighbor-search pytorch
Last synced: 02 Nov 2024
https://github.com/openucx/ucc
Unified Collective Communication Library
collectives cuda deep-learning hpc infiniband mpi openshmem pgas pytorch roce sharp
Last synced: 24 Jan 2025
https://github.com/1ytic/warp-rnnt
CUDA-Warp RNN-Transducer
cuda forward-backward pytorch rnn-transducer tensorflow warp
Last synced: 27 Jan 2025
https://github.com/modelscope/dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.
cpu cuda guided-decoding llm llm-inference native-engine
Last synced: 26 Jan 2025
https://github.com/andrewkchan/yalm
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
cpp cuda inference-engine llama llamacpp llm llm-inference machine-learning mistral
Last synced: 25 Jan 2025
https://github.com/demoriarty/torchpq
Approximate nearest neighbor search with product quantization on GPU in PyTorch and CUDA
cuda nearest-neighbor-search pytorch
Last synced: 26 Jan 2025
https://github.com/helmut-hoffer-von-ankershoffen/jetson
Helmut Hoffer von Ankershoffen experimenting with arm64 based NVIDIA Jetson (Nano and AGX Xavier) edge devices running Kubernetes (K8s) for machine learning (ML) including Jupyter Notebooks, TensorFlow Training and TensorFlow Serving using CUDA for smart IoT.
ansible archiconda cuda docker edge-devices hoffer-von-ankershoffen jupyter k8s kubeflow kubernetes kustomize machine-learning ml nvidia-jetson-nano nvidia-jetson-xavier skaffold smart-iot software-engineering tensorflow-serving virtualbox
Last synced: 07 Jan 2025
https://github.com/ceed/libceed
CEED Library: Code for Efficient Extensible Discretizations
api ceed cuda ecp exascale-computing gpu high-order high-performance-computing hpc julia linear-algebra
Last synced: 24 Jan 2025
https://github.com/CEED/libCEED
CEED Library: Code for Efficient Extensible Discretizations
api ceed cuda ecp exascale-computing gpu high-order high-performance-computing hpc julia linear-algebra
Last synced: 14 Nov 2024
https://github.com/dividiti/ck-caffe
Collective Knowledge workflow for Caffe to automate installation across diverse platforms and to collaboratively evaluate and optimize Caffe-based workloads across diverse hardware, software and data sets (compilers, libraries, tools, models, inputs).
accuracy android caffe collaborative-optimization collective-knowledge costs cuda customizable-workflows dnn-as-a-service dnn-optimization json-api linux opencl performance-portability portable-package-manager reproducible-experiments resources windows
Last synced: 13 Nov 2024
https://github.com/nobuyuki83/delfem2
Research prototyping framework for physics simulation written in C++
cuda fem-simulation finite-element-methods geometry-processing opengl physics-simulation simulation
Last synced: 26 Jan 2025
https://github.com/mkeeter/mpr
Reference implementation for "Massively Parallel Rendering of Complex Closed-Form Implicit Surfaces" (SIGGRAPH 2020)
cad cuda gpu implicit-surfaces rendering
Last synced: 27 Oct 2024
https://github.com/rapidsai/node
GPU-accelerated data science and visualization in Node.js
cuda data-science data-visualization gpgpu gpu nodejs
Last synced: 27 Jan 2025
https://github.com/LambdaLabsML/distributed-training-guide
Best practices & guides on how to write distributed PyTorch training code
cluster cuda deepspeed distributed-training fsdp gpu gpu-cluster kuberentes lambdalabs mpi nccl pytorch sharding slurm
Last synced: 21 Oct 2024
https://github.com/wangzyon/NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
Last synced: 05 Nov 2024
https://github.com/msminhas93/nviwatch
NviWatch: A blazingly fast Rust-based TUI for managing and monitoring NVIDIA GPU processes
bash command-line-tool cuda deeplearning gpu gpu-monitoring linux monitoring nvidia nvidia-smi nvml performant process-monitoring ratatui resource-monitoring rust terminal top tui ubuntu
Last synced: 27 Jan 2025
https://github.com/zhongkaifu/seq2seqsharp
Seq2SeqSharp is a fast and flexible tensor-based deep neural network framework written in .NET (C#). Its features include automatic differentiation, several network types (Transformer, LSTM, BiLSTM and so on), multi-GPU support, cross-platform operation (Windows, Linux, x86, x64, ARM), and multimodal models for text and images.
attention-model cuda deep-learning encoder-decoder gpu image lstm machine-translation neural-network seq2seq sequence-to-sequence tensor text transformer transformer-architecture transformer-encoder translation vision-transformer
Last synced: 25 Jan 2025
https://github.com/toruniina/lbvh
An implementation of parallel linear BVH (LBVH) on GPU
bvh cuda gpu nearest-neighbor-search parallel thrust
Last synced: 20 Dec 2024
https://github.com/lxxue/frnn
Fixed Radius Nearest Neighbor Search on GPU
cuda nearest-neighbor-search pytorch
Last synced: 24 Jan 2025
https://github.com/nvidia/gmat
A toolkit showing the GPU's all-round capability in video processing
codec cpp cuda deep-learning ffmpeg gpu image-processing nvidia video video-codec
Last synced: 03 Jan 2025
https://github.com/uncomplicate/clojurecuda
Clojure library for CUDA development
clojure clojure-library cuda cuda-development gpu-acceleration gpu-computing high-performance java
Last synced: 25 Jan 2025
https://github.com/zjin-lcf/HeCBench
benchmark cuda gpu-computing hip hpc-applications openmp scientific-computing sycl test-driven-development
Last synced: 05 Nov 2024
https://github.com/NVIDIA/GMAT
A toolkit showing the GPU's all-round capability in video processing
codec cpp cuda deep-learning ffmpeg gpu image-processing nvidia video video-codec
Last synced: 05 Nov 2024
https://github.com/pykeio/diffusers
A modular Rust library for super fast Stable Diffusion inference - 45% faster than PyTorch 🔮
cuda diffusion-models onnx onnxruntime onnxruntime-gpu rust stable-diffusion stable-diffusion-v2
Last synced: 31 Oct 2024
https://github.com/primitiv/primitiv
A Neural Network Toolkit.
cmake cpp cuda deep-learning framework gpu neural-network opencl
Last synced: 14 Nov 2024
https://github.com/hmunachi/cuda-repo
From zero to hero CUDA for accelerating maths and machine learning on GPU.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 26 Jan 2025
https://github.com/supranational/sppark
Zero-knowledge template library
bls12-377 bls12-381 cuda ntt pasta-curves zero-knowledge zero-knowledge-proofs zk-snarks zk-starks
Last synced: 30 Jan 2025
https://github.com/zpzim/scamp
The fastest way to compute matrix profiles on CPU and GPU!
cuda gpu matrix-profile python time-series time-series-analysis
Last synced: 27 Jan 2025
https://github.com/yilingqiao/dmrf
Dynamic Mesh-Aware Radiance Fields (ICCV 2023): ray-traced rendering and interactive simulation of meshes with NeRF
cuda nerf raytracing simulation
Last synced: 23 Jan 2025
https://github.com/cuMF/cumf_als
CUDA Matrix Factorization Library with Alternating Least Squares (ALS)
als cuda gpu machine machine-learning matrix-factorization
Last synced: 13 Nov 2024
https://github.com/cnugteren/cltune
CLTune: An automatic OpenCL & CUDA kernel tuner
Last synced: 19 Dec 2024
https://github.com/HMUNACHI/cuda-repo
From zero to hero CUDA for accelerating maths and machine learning on GPU.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 12 Nov 2024
https://github.com/unitaryfund/qrack
Comprehensive, GPU accelerated framework for developing universal virtual quantum processors
cuda distributed-quantum-computing gpu hpc opencl physics physics-simulation quantum quantum-computer-simulator quantum-computing quantum-information quantum-simulator qubits
Last synced: 03 Nov 2024
https://github.com/librapid/librapid
A highly optimised C++ library for mathematical applications and neural networks.
array cpp cpp20 cpp23 cuda gpu high-performance-computing library matrix multidimensional-arrays multithreading parallel-programming pypy pypy3 python python3 simd
Last synced: 24 Jan 2025
https://github.com/LibRapid/librapid
A highly optimised C++ library for mathematical applications and neural networks.
array cpp cpp20 cpp23 cuda gpu high-performance-computing library matrix multidimensional-arrays multithreading parallel-programming pypy pypy3 python python3 simd
Last synced: 06 Dec 2024
https://github.com/acceleratehs/accelerate-llvm
LLVM backend for Accelerate
accelerate compiler cuda gpu gpu-computing hacktoberfest haskell llvm parallel-computing
Last synced: 26 Jan 2025
https://github.com/nvidia/dl4agx
Deep Learning tools and applications for NVIDIA AGX platforms.
autonomous-driving computer-vision cuda deep-learning drive-agx embedded
Last synced: 25 Jan 2025
https://github.com/rocm/gpufort
GPUFORT: S2S translation tool for CUDA Fortran and Fortran+X in the spirit of hipify
cuda cuda-fortran fortran gpgpu gpu hip interoperability openacc openmp rocm
Last synced: 19 Dec 2024
https://github.com/ROCm/gpufort
GPUFORT: S2S translation tool for CUDA Fortran and Fortran+X in the spirit of hipify
cuda cuda-fortran fortran gpgpu gpu hip interoperability openacc openmp rocm
Last synced: 23 Oct 2024
https://github.com/jimver/cuda-toolkit
GitHub Action to install CUDA
action cuda cuda-toolkit github-actions nvidia nvidia-cuda
Last synced: 24 Jan 2025
https://github.com/qengineering/install-opencv-jetson-nano
OpenCV installation script with CUDA and cuDNN support
cuda cudnn jetson-nano jetson-xavier opencv opencv4
Last synced: 28 Jan 2025
https://github.com/p-ranav/PhotoLab
AI-Powered Photo Editor (Python, PyQt6, PyTorch)
animegan colorization cuda human-segmentation interactive nuitka numpy opencv photo-editor pillow portrait-mode pyqt6 pyqt6-desktop-application python python3 pytorch scikit-image spot-removal stacking
Last synced: 06 Nov 2024
https://github.com/pythonlessons/tensorflow-object-detection-tutorial
The purpose of this tutorial is to learn how to install and prepare TensorFlow framework to train your own convolutional neural network object detection classifier for multiple objects, starting from scratch
classifier cuda cudnn detection detection-api detection-classifier detection-tutorial gpu grabscreen labels object-detection pil python-mss tensorflow tensorflow-cpu tensorflow-gpu tensorflow-models tutorial
Last synced: 09 Oct 2024
https://github.com/dvlab-research/SparseTransformer
A fast and memory-efficient library for sparse transformers with varying token numbers (e.g., 3D point clouds).
3d-point-cloud cuda sparse-transformer transformer
Last synced: 28 Oct 2024
https://github.com/proger/accelerated-scan
Accelerated First Order Parallel Associative Scan
cuda cumulative-sum recurrent-neural-networks state-space-model torch
Last synced: 04 Nov 2024
https://github.com/hijkzzz/cuda-neural-network
Convolutional Neural Network with CUDA (MNIST 99.23%)
cnn cpp cuda mnist neural-network
Last synced: 12 Nov 2024
https://github.com/sjtu-ipads/phoenixos
Fast OS-level support for GPU checkpoint and restore
checkpoint-restore criu cuda gpu
Last synced: 24 Jan 2025
https://github.com/coderonion/awesome-cuda-and-hpc
🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, TensorRT and High Performance Computing (HPC) projects.
awesome blas cublas cuda cudnn fortran gemm gpu hpc lapack llama llm mojo numpy openblas parallel-computing pytorch scipy tensorrt yolo
Last synced: 05 Oct 2024
https://github.com/arborx/arborx
Performance-portable geometric search library
bounding-volume-hierarchy c-plus-plus clustering cpp cuda dbscan distributed gpu hdbscan high-performance-computing hpc knn-search kokkos mpi nearest-neighbors parallel
Last synced: 21 Jan 2025
https://github.com/patwie/cuda-design-patterns
Some CUDA design patterns and a bit of template magic for CUDA
bazel cpp11 cuda cuda-development cuda-device cuda-kernels cuda-utils gpu template-metaprogramming
Last synced: 01 Nov 2024
https://github.com/chenhunghan/ialacol
🪶 Lightweight OpenAI drop-in replacement for Kubernetes
ai cloudnative cuda ggml gptq gpu helm kubernetes langchain llamacpp llm llm-inference llm-serving openai python
Last synced: 20 Jan 2025
https://github.com/BobMcDear/attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
cuda deep-learning machine-learning openai openai-triton pytorch triton
Last synced: 21 Dec 2024
https://github.com/merzlab/QUICK
QUICK: A GPU-enabled ab initio quantum chemistry software package
chemistry computational-chemistry cuda density-functional-theory electronic-structure-calculations gpu gpu-acceleration hartree-fock parallel-computing quantum-chemistry
Last synced: 20 Nov 2024
https://github.com/bobmcdear/attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
cuda deep-learning machine-learning openai openai-triton pytorch triton
Last synced: 20 Dec 2024
https://github.com/kibae/onnxruntime-server
ONNX Runtime Server: The ONNX Runtime Server is a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
ai contributions-welcome cuda deep-learning inference-server machine-learning nueral-networks onnx onnxruntime
Last synced: 27 Jan 2025
https://github.com/rust-nvml/nvml-wrapper
Safe Rust wrapper for the NVIDIA Management Library
cuda ffi ffi-bindings ffi-wrapper gpu hardware-management hardware-monitoring library monitoring nvidia nvml opencl
Last synced: 27 Jan 2025
https://github.com/cldfire/nvml-wrapper
Safe Rust wrapper for the NVIDIA Management Library
cuda ffi ffi-bindings ffi-wrapper gpu hardware-management hardware-monitoring library monitoring nvidia nvml opencl
Last synced: 13 Jan 2025