Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/dr-noob/gpufetch

Simple yet fancy GPU architecture fetching tool

cuda gpu igpu intel nvidia

Last synced: 31 Oct 2024

https://github.com/fangq/mcx

Monte Carlo eXtreme (MCX) - GPU-accelerated photon transport simulator

3d c cuda matlab monte-carlo optical-imaging pascal photon-transport physics-simulation ray-tracing volumetric-rendering voxel-based

Last synced: 25 Jan 2025

https://github.com/dizcza/docker-hashcat

Latest hashcat docker for CUDA, OpenCL, and POCL. Deployed on Vast.ai

cuda docker hashcat nvidia opencl pocl vast-ai

Last synced: 02 Nov 2024

https://github.com/fhamborg/newsmtsc

Target-dependent sentiment classification in news articles reporting on political events. Includes a high-quality data set of over 11k sentences and a state-of-the-art classification model.

cuda dataset deep-learning news-articles pytorch sentiment-analysis sentiment-classification text-classification tsc

Last synced: 29 Jan 2025

https://github.com/roastduck/FreeTensor

A language and compiler for irregular tensor programs.

ast automatic-differentiation code-generation cuda gpu jit openmp tensor

Last synced: 07 Nov 2024

https://github.com/AnicetNgrt/jiro-nn

A Deep Learning and preprocessing framework in Rust with support for CPU and GPU.

adam classification cuda data-analysis deep-learning dropout gpu gpu-computing machine-learning ml nalgebra neural-networks nn opencl pipelines regression rust sgd

Last synced: 16 Jan 2025

https://github.com/rocm/hipblas

ROCm BLAS marshalling library

blas cuda hip rocm

Last synced: 29 Jan 2025

https://github.com/anicetngrt/jiro-nn

A Deep Learning and preprocessing framework in Rust with support for CPU and GPU.

adam classification cuda data-analysis deep-learning dropout gpu gpu-computing machine-learning ml nalgebra neural-networks nn opencl pipelines regression rust sgd

Last synced: 30 Jan 2025

https://github.com/rsnk96/Ubuntu-Setup-Scripts

Scripts to help you set up your Ubuntu quickly, especially if you're in any subfield of Data Science or AI!

anaconda cuda deep-learning deeplearning dl ffmpeg installers ml opencv python pytorch tensorflow tensorflow-setup ubuntu zsh

Last synced: 06 Nov 2024

https://github.com/GooFit/GooFit

Code repository for the massively-parallel framework for maximum-likelihood fits, implemented in CUDA/OpenMP

cuda fitting gpu gpu-computing omp physics root-cern thrust

Last synced: 06 Nov 2024

https://github.com/JulianAssmann/opencv-cuda-docker

Dockerfiles for OpenCV compiled with CUDA, opencv_contrib modules and Python 3 bindings

cuda docker gpu nvidia opencv

Last synced: 05 Nov 2024

https://github.com/naeioi/pbf-cuda

Position Based Fluids CUDA implementation

cuda fluid-solver opengl real-time simulation

Last synced: 11 Nov 2024

https://github.com/ROCm/hipBLAS

ROCm BLAS marshalling library

blas cuda hip rocm

Last synced: 30 Nov 2024

https://github.com/inoryy/tensorflow-optimized-wheels

TensorFlow wheels built for the latest CUDA/cuDNN with performance flags enabled: SSE, AVX, FMA; XLA

avx2 cuda cudnn python sse tensorflow tensorflow-gpu tensorflow-wheels wheels xla

Last synced: 03 Nov 2024

https://github.com/ihhub/penguinv

Computer vision library with focus on heterogeneous systems

avx computer-vision cpp cuda gpu hacktoberfest heterogeneous-systems image-processing opencl python simd sse thread-pool

Last synced: 24 Jan 2025

https://github.com/MuGdxy/muda

μ-Cuda: cover the last mile of CUDA. Features: IntelliSense-friendly, structured launch, automatic CUDA graph generation and updating.

cuda cuda-cpp cuda-programming

Last synced: 20 Nov 2024

https://github.com/sniklaus/pytorch-extension

An example of a CUDA extension for PyTorch using CuPy that computes the Hadamard product of two tensors

cuda cupy deep-learning python pytorch

Last synced: 14 Nov 2024

https://github.com/glotzerlab/fresnel

Publication quality path tracing in real time.

cuda optix path-tracing python simulation soft-matter

Last synced: 27 Jan 2025

https://github.com/src-d/minhashcuda

Weighted MinHash implementation on CUDA (multi-gpu).

cuda lsh machine-learning minhash

Last synced: 29 Jan 2025

https://github.com/gpmueller/eigen-cuda

MWE for using the Eigen library in CUDA kernels

cuda eigen eigen-cuda mwe

Last synced: 01 Nov 2024

https://github.com/cgtuebingen/Flex-Convolution

Source code for: Flex-Convolution (Million-Scale Point-Cloud Learning Beyond Grid-Worlds), accepted at ACCV 2018

3d-point-clouds cuda deep-learning deep-neural-networks machine-learning pointcloud pointcloudprocessing research research-paper segmentation shapenet tensorflow

Last synced: 27 Oct 2024

https://github.com/pytorch/extension-script

Example repository for custom C++/CUDA operators for TorchScript

cpp cuda pytorch

Last synced: 19 Jan 2025

https://github.com/qdLMF/LIO-SAM-GPU-ScanToMapOpt

A CUDA reimplementation of the line/plane odometry of LIO-SAM. A point cloud hash map (inspired by iVox of Faster-LIO) on GPU is used to accelerate 5-neighbour KNN search.

3d-mapping cuda faster-lio gpu ivox knn lidar lidar-inertial-odometry lidar-slam lio lio-sam loam slam

Last synced: 27 Oct 2024

https://github.com/rocm/rocrand

RAND library for HIP programming language

cuda gpu hip random rng rocm

Last synced: 27 Jan 2025

https://github.com/ROCm/rocRAND

RAND library for HIP programming language

cuda gpu hip random rng rocm

Last synced: 18 Dec 2024

https://github.com/arbor-sim/arbor

The Arbor multi-compartment neural network simulation library.

cuda gpu hip hpc modern-cpp mpi neuroscience

Last synced: 25 Jan 2025

https://github.com/cair/tmu

Implements the Tsetlin Machine, Coalesced Tsetlin Machine, Convolutional Tsetlin Machine, Regression Tsetlin Machine, and Weighted Tsetlin Machine, with support for continuous features, drop clause, Type III Feedback, focused negative sampling, multi-task classifier, autoencoder, literal budget, and one-vs-one multi-class classifier. TMU is written in Python with wrappers for C and CUDA-based clause evaluation and updating.

absorbing-states autoencoder convolution cuda gpu incremental incremental-computation multi-output pattern-recognition propositional-logic regression relational-logic sparse tsetlin-machine

Last synced: 30 Jan 2025

https://github.com/acdslab/mppi-generic

Templated C++/CUDA implementation of Model Predictive Path Integral Control (MPPI)

cpp cuda model-predictive-control model-predictive-path-integral robotics stochastic-optimization

Last synced: 29 Jan 2025

https://github.com/leimao/CUDA-GEMM-Optimization

CUDA Matrix Multiplication Optimization

cuda

Last synced: 06 Dec 2024

https://github.com/upul/aurora

A minimal deep learning library written in Python/Cython/C++ and NumPy/CUDA/cuDNN.

cplusplus cuda cudnn cython deep-learning python3 system-design

Last synced: 18 Nov 2024

https://github.com/cryinkfly/rhinoceros-3d-for-linux

A project that gives you a way to use Rhino 3D on Linux!

archlinux cuda fedora international linux linuxmint manjaro nvidia opengl opensuse ubuntu wine

Last synced: 29 Jan 2025

https://github.com/ANNetGPGPU/ANNetGPGPU

A GPU (CUDA) based Artificial Neural Network library

c-plus-plus-11 cuda propagation-network self-organizing-map

Last synced: 10 Nov 2024

https://github.com/nvidia-merlin/hierarchicalkv

HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on high-bandwidth memory (HBM) of GPUs and in host memory. It also can be used as a generic key-value storage.

cuda dynamic-embedding embedding-storage gpu hashtable key-value-store recommender-system

Last synced: 24 Nov 2024

https://github.com/mitmul/pynvvl

A Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python

cuda cupy gpu numpy nvidia-video-loader nvvl python video video-processing

Last synced: 25 Jan 2025

https://github.com/pennylaneai/pennylane-lightning

The Lightning plugin ecosystem provides fast quantum state-vector and tensor network simulators written in C++ for use with PennyLane.

cuda distributed-computing gpu hpc mpi openmp parallel quantum-computing quantum-machine-learning rocm

Last synced: 25 Jan 2025

https://github.com/coreylowman/llama-dfdx

LLaMa 7b with CUDA acceleration, implemented in Rust. Minimal GPU memory needed!

cuda deep-learning inference language-model llama neural-network rust rust-lang

Last synced: 07 Nov 2024

https://github.com/osai-ai/dokai

Collection of Docker images for ML/DL and video processing projects

cuda deep-learning docker docker-image ffmpeg opencv python pytorch tensorrt video-processing

Last synced: 05 Nov 2024

https://github.com/harrism/mini-nbody

A simple gravitational N-body simulation in less than 100 lines of C code, with CUDA optimizations.

astrophysics benchmark cuda nbody

Last synced: 28 Oct 2024

https://github.com/juliagpu/acceleratedkernels.jl

Cross-architecture parallel algorithms for Julia's CPU and GPU backends. Targets multithreaded CPUs, and GPUs via Intel oneAPI, AMD ROCm, Apple Metal, Nvidia CUDA.

amd apple cuda gpu intel julia metal nvidia oneapi parallel rocm standard-library

Last synced: 29 Jan 2025

https://github.com/mseitzer/gpu-monitor

Script to remotely check GPU servers for free GPUs

cuda cudnn deep-learning gpu nvidia-smi remote ssh

Last synced: 14 Oct 2024

https://github.com/sonots/cumo

Cumo (pronounced like "koomo") is a CUDA-aware numerical library whose interface is highly compatible with Ruby Numo

cuda numo ruby scicentific-computing

Last synced: 24 Jan 2025

https://github.com/1ytic/pytorch-edit-distance

Levenshtein edit-distance on PyTorch and CUDA

asr cuda edit-distance levenshtein nlp pytorch

Last synced: 30 Jan 2025

https://github.com/sasagawa888/deeppipe2

Deep Learning library using GPU (CUDA/cuBLAS)

cublas cuda deep-learning elixir gpu

Last synced: 26 Oct 2024

https://github.com/windqaq/mpm

Simulating on GPU using Material Point Method and rendering.

cuda gvdb-voxels material-point-method raytracer simulation thrust

Last synced: 28 Oct 2024

https://github.com/braintwister/docker-devel-env

Fast, reproducible, and portable software development environments

clang cmake conan cuda development docker eclipse gcc jenkins nsight portability reproducibility vscode

Last synced: 09 Oct 2024

https://github.com/nvidia/nvimagecodec

The nvImageCodec library of GPU- and CPU-accelerated codecs featuring a unified interface

computer-vision cpp cuda dali data-processing deep-learning fast-data-pipeline gpu image-processing machine-learning nvidia python pytorch

Last synced: 29 Jan 2025

https://github.com/PennyLaneAI/pennylane-lightning

The PennyLane-Lightning plugin provides a fast state-vector simulator written in C++ for use with PennyLane

cuda distributed-computing gpu hpc mpi openmp parallel quantum-computing quantum-machine-learning rocm

Last synced: 17 Nov 2024

https://github.com/BrainTwister/docker-devel-env

Fast, reproducible, and portable software development environments

clang cmake conan cuda development docker eclipse gcc jenkins nsight portability reproducibility vscode

Last synced: 09 Dec 2024

https://github.com/pwhiddy/fat-clouds

GPU Fluid Simulation with Volumetric Rendering

blaze cuda fluid gpu navier-stokes raymarching smoke

Last synced: 26 Nov 2024

https://github.com/eomii/rules_ll

An Upstream Clang/LLVM-based toolchain for contemporary C++ and heterogeneous programming

bazel bleeding-edge build-system clang clang-tidy cpp cuda gpu-programming hermetic hip llvm nix openmp remote-caching remote-execution reproducible sanitizers

Last synced: 28 Jan 2025

https://github.com/sunsetquest/cudapad

CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.

cuda cuda-programming gpu nvidia ptx ptx-utils windows

Last synced: 01 Dec 2024

https://github.com/mitsuba-renderer/drjit-core

Dr.Jit — A Just-In-Time-Compiler for Differentiable Rendering (core library)

cuda jit llvm

Last synced: 29 Jan 2025

https://github.com/shinmorino/sqaod

Solvers/annealers for simulated quantum annealing on CPU and CUDA (NVIDIA GPU).

accelearated cplusplus-11 cuda gpu linux monte-carlo-simulation nvidia-gpu python quantum-annealing quantum-computing windows

Last synced: 20 Dec 2024

https://github.com/kozyilmaz/nheqminer-macos

nheqminer for macOS with AVX and CUDA

apple cuda gpu-miner macos nheqminer osx zcash

Last synced: 28 Oct 2024

https://github.com/meiqua/pose_refine

CUDA ICP for 6D pose estimation

cuda icp renderer

Last synced: 15 Oct 2024

https://github.com/llnl/aluminum

High-performance, GPU-aware communication library

cpp cuda gpu hpc mpi

Last synced: 26 Jan 2025

https://github.com/spcl/daceml

A Data-Centric Compiler for Machine Learning

compiler cuda deep-learning fpga high-performance-computing machine-learning pytorch

Last synced: 06 Nov 2024

https://github.com/amypad/cuvec

Unifying Python/C++/CUDA memory: Python buffered array ↔️ `std::vector` ↔️ CUDA managed memory

array buffer c cpp cpu cpython cpython-api cpython-extensions cuda cxx gpu hacktoberfest pybind11 python swig vector

Last synced: 25 Jan 2025

https://github.com/ashvardanian/parallelreductionsbenchmark

Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal - all it takes to sum a lot of numbers fast!

apple avx512 cuda glsl gpgpu gpu gpu-acceleration gpu-computing hpc intel metal nvidia opencl openmp parallel simd stl tbb thrust

Last synced: 30 Jan 2025

https://github.com/openmm/NNPOps

High-performance operations for neural network potentials

cuda gpu machine-learning molecular-dynamics molecular-modeling

Last synced: 13 Nov 2024

https://github.com/juliagpu/gemmkernels.jl

Flexible and performant GEMM kernels in Julia

cuda gpu julia

Last synced: 28 Jan 2025

https://github.com/zenustech/zpc

zenus parallel computing library for zenus physics-based simulations

cuda gpu hpc math physics simulation

Last synced: 30 Jan 2025

https://github.com/rocm/rocalution

Next generation library for iterative sparse solvers for ROCm platform

cplusplus cuda fortran mpi opencl openmp solver sparse

Last synced: 25 Jan 2025

https://github.com/JuliaAttic/CUDArt.jl

Julia wrapper for CUDA runtime API

cuda gpu julia

Last synced: 29 Nov 2024

https://github.com/JuliaGPU/GemmKernels.jl

Flexible and performant GEMM kernels in Julia

cuda gpu julia

Last synced: 13 Nov 2024

https://github.com/rxwei/cuda-swift

Parallel Computing Library for Linux and macOS & NVIDIA CUDA Wrapper

cublas cuda gpu parallel swift

Last synced: 11 Nov 2024

https://github.com/shi-yan/FreeWill

A deep learning library in C++/CUDA

cnn cuda deep-learning dnn machine-learning neural-network qt5

Last synced: 20 Nov 2024

https://github.com/devxt-llc/ezlocalai

ezlocalai is an easy to set up local artificial intelligence server with OpenAI Style Endpoints.

ai artificial-intelligence cuda llamacpp local

Last synced: 24 Jan 2025

https://github.com/ttsiodras/mandelbrotsse

Real-time Mandelbrot zoom via SSE, AVX, OpenMP, CUDA, XaoS...

avx cuda openmp sse

Last synced: 21 Dec 2024

https://github.com/kaslanarian/pydynet

A PyTorch-like dynamic computation graph and neural network framework implemented in NumPy (MLP, CNN, RNN, Transformer)

autograd cnn cuda cupy deep-learning-framework numpy python pytorch pytorch-implementation rnn transformer

Last synced: 29 Dec 2024

https://github.com/projectchrono/dem-engine

A dual-GPU DEM solver with complex grain geometry support

chrono cuda discrete-element-method gpu multi-gpu simulation

Last synced: 30 Jan 2025

https://github.com/dbraun/pytorchtop

GPU PyTorch TOP in TouchDesigner with CUDA-enabled OpenCV

cuda libtorch opencv pytorch touchdesigner

Last synced: 08 Nov 2024

https://github.com/anicusan/acceleratedkernels.jl

Cross-architecture parallel algorithms for Julia's GPU backends, from a unified KernelAbstractions.jl codebase. Targets Intel oneAPI, AMD ROCm, Apple Metal, Nvidia CUDA.

amd apple cuda gpu intel julia metal nvidia oneapi parallel rocm standard-library

Last synced: 27 Oct 2024

https://github.com/tmcdonell/cuda

Haskell FFI bindings to CUDA

cuda ffi-bindings haskell

Last synced: 28 Jan 2025

https://github.com/nolanzzz/mtmct

Design and Implementation of a Multi-Target Multi-Camera Tracking Solution

cuda deep-learning detection machine-learning opencv python pytorch reidentification research-project resnet tracking

Last synced: 28 Oct 2024

https://github.com/DevXT-LLC/ezlocalai

ezlocalai is an easy to set up local artificial intelligence server with OpenAI Style Endpoints.

ai artificial-intelligence cuda llamacpp local

Last synced: 13 Oct 2024

https://github.com/una-dinosauria/local-search-quantization

State-of-the-art method for large-scale ANN search as of Oct 2016. Presented at ECCV 16.

computer-vision cuda eccv-16 gpu julia multi-codebook quantization

Last synced: 29 Oct 2024

https://github.com/rocm/hipfort

Fortran interfaces for ROCm libraries

blas cuda fft fortran gpgpu gpu hip interoperability random rocm solver sparse

Last synced: 26 Jan 2025

https://github.com/heethesh/computer-vision-and-deep-learning-setup

Tutorial on how to set up your system with an NVIDIA GPU and install deep learning frameworks like TensorFlow, Darknet for YOLO, Theano, and Keras; OpenCV; and NVIDIA drivers, CUDA, and cuDNN libraries on Ubuntu 16.04, 17.10 and 18.04.

cuda cudnn gpu install keras nvidia opencv python tensorflow ubuntu ubuntu1710

Last synced: 25 Dec 2024

https://github.com/jamjamjon/usls

A Rust library integrated with ONNXRuntime, providing a collection of Computer Vision and Vision-Language models.

clip cuda dinov2 florence2 grounding-dino ml ocr onnx onnxruntime rust rust-yolo sam sapiens tensorrt yolo yolo-rs yolo-rust yolov11 yolov8

Last synced: 26 Jan 2025

https://github.com/yalue/cuda_scheduling_examiner_mirror

A tool for examining GPU scheduling behavior.

benchmark cuda cuda-kernels gpu gpu-scheduling mandelbrot

Last synced: 29 Jan 2025

https://github.com/cuihaoleo/gpg-fingerprint-filter-gpu

Generate OpenPGP keys with fingerprints that match a specific pattern (a.k.a. vanity keys)

cuda gnupg gpg gpu pgp vanity

Last synced: 28 Nov 2024

https://github.com/celeritas-project/celeritas

Celeritas is a new Monte Carlo transport code designed to accelerate scientific discovery in high energy physics by improving detector simulation throughput and energy efficiency using GPUs.

computational-physics cuda detector-simulation gpu hep high-energy-physics hip monte-carlo particle-transport

Last synced: 27 Jan 2025

https://github.com/BobMcDear/neural-network-cuda

Neural network from scratch in CUDA/C++

cplusplus cuda deep-learning machine-learning neural-network

Last synced: 21 Dec 2024

https://github.com/pypr/compyle

Execute a subset of Python on HPC platforms

cuda cython high-performance-computing opencl openmp python transpile

Last synced: 25 Jan 2025

https://github.com/ar-ray-code/darknet_ros_fp16

darknet + ROS2 Humble + OpenCV4 + CUDA 11 (cuDNN, Jetson Orin)

cuda cudnn darknet object-detection opencv4 ros ros2-foxy yolo yolo-tiny yolov3 yolov7

Last synced: 03 Dec 2024

https://github.com/harrism/sublimetext-cuda-cpp

CUDA C++ package for Sublime Text 2 & 3

cuda snippets sublime-text tmlanguage

Last synced: 28 Oct 2024

https://github.com/openmlsys/openmlsys-cuda

Tutorials for writing high-performance GPU operators in AI frameworks.

cuda gpu machine-learning

Last synced: 16 Nov 2024