Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/lablup/backend.ai-accelerator-cuda

The Backend.AI CUDA Accelerator Plugin

backendai cuda

Last synced: 03 Jan 2025

https://github.com/lord-turmoil/cudacmakedemo

A demo for building CUDA program with CMake

cuda tutorial

Last synced: 23 Jan 2025

https://github.com/dongskie43/nlp-engineering-hub

📚 Enterprise NLP systems and LLM applications. Features custom language model implementations, distributed training pipelines, and efficient inference systems. 🔤

cuda gpu-optimization huggingface huggingface-transformers langchain language-models large-language-models nlp openai python transformers

Last synced: 03 Feb 2025

https://github.com/matiasvlevi/cuno

Provides cuda bindings, kernel maps and device memory managment for Dannjs computations. [Experimental and not complete]

addon cuda dann dannjs machine-learning nodejs

Last synced: 11 Jan 2025

https://github.com/dreamjet31/licence_plate_detection

Automated License Plate recognition system

cuda opencv python pytorch ultralytics yolov8

Last synced: 10 Feb 2025

https://github.com/adesoji1/youtubesummaryai

Python script for YouTube summary. The service should summarize an YouTube video by url. It should works for long video and for different languages.

cuda googleapi python3 speech-recognition transformers youtube-api-v3 youtube-dl

Last synced: 10 Feb 2025

https://github.com/materight/pyav-cuda

Extension of PyAV (ffmpeg bindings) with hardware decoding support. Compatible with PyTorch and Nvidia codecs.

cuda cuvid ffmpeg libav pytorch

Last synced: 13 Oct 2024

https://github.com/hrolive/data-analytics-in-the-era-of-large-scale-machine-learning

Slides and other material for the Cyprus NCC training event about "Data analytics in the era of large-scale machine learning".

cuda deep-learning gpu-acceleration gradient-boosting large-language-models machine-learning preprocessing python pytorch

Last synced: 04 Jan 2025

https://github.com/rgryta/jetsonnano-pytorch

Repository containing built PyTorch wheels for Jetson Nano

cuda jetson python pytorch tegra wheel

Last synced: 11 Jan 2025

https://github.com/jonastoth/cuda_raytracer

University project to implement a basic Raytracer in CUDA

cpp14 cuda raytracer

Last synced: 02 Feb 2025

https://github.com/kenwuqianghao/c4ai-cuda-birds

Homework assignments for C4AI Beginners in Research-Driven Studies

cuda machine-learning pytorch

Last synced: 27 Dec 2024

https://github.com/d-krylov/cuda_to_opengl

Simple examples for CUDA OpenGL interoperability

cuda cuda-opengl opengl

Last synced: 11 Jan 2025

https://github.com/sebftw/interp2gpu

GPU-accelerated 2D spline interpolation, à la interp2(..., "spline"), in MATLAB.

cuda gpu gpu-acceleration matlab spline spline-interpolation

Last synced: 14 Dec 2024

https://github.com/ray-chew/modified_ch

Density functional theory (DFT) and self-consistent field theory (SCFT) simulation of diblock copolymers

cuda density-functional-theory diblock-copolymer numerical-analysis numerical-methods self-consistent-field-theory

Last synced: 11 Jan 2025

https://github.com/ivanbuccella/sf2bio

Deep reinforcement learning for de novo drug design: a ReLeaSe method execution on a Docker Environment

cuda deep-learning deep-reinforcement-learning docker docker-compose machine-learning nvidia-cuda nvidia-docker reinforcement-learning release release-method

Last synced: 11 Jan 2025

https://github.com/sandialabs/tenzing

Core library for optimizing CUDA+MPI programs as sequential decision problems.

cuda mpi scr-2759 sequential-decision-problem

Last synced: 11 Jan 2025

https://github.com/llm-db/understanding-gpu-architecture-implications-on-llm-serving-workloads

Understanding GPU Architecture Implications on LLM Serving Workloads (Master Thesis, ETH Zürich, 2024)

cuda inference pytorch rocm transformer

Last synced: 14 Dec 2024

https://github.com/proafxin/cuda-docker

High performance computing Images with pycuda and tensorrt preinstalled

cuda docker dockerfile libcudnn nvidia-tensorrt pycuda python tensorrt

Last synced: 12 Jan 2025

https://github.com/usman619/pdc

Parallel and Distributed Computing

cuda distributed-computing distributed-systems nextcloud

Last synced: 13 Jan 2025

https://github.com/neugence/acehub

AI Champions for Excellence: Fresh, informative courses and content designed to help developers, researchers, and leaders advance in the field of AI.

ai cuda cv ml mlops nlp pytorch rl rlhf tensorflow

Last synced: 13 Oct 2024

https://github.com/parxd/fasterdl

cuBLAS/CUDA tensor library with auto-diff support

cublas cuda cudnn deep-learning machine-learning

Last synced: 06 Jan 2025

https://github.com/hubenchang0515/fft-benchmark

一些 FFT 库的性能测试

cuda fft

Last synced: 18 Jan 2025

https://github.com/daskol/gpgpu

cuda gpgpu

Last synced: 12 Jan 2025

https://github.com/hr-fahim/transformer-model-optimization

Sample GPT Transformer Model from Scratch.

cuda few-shot-learning transfomers

Last synced: 24 Jan 2025

https://github.com/saiccoumar/cuda-programming-exercises

Brief collection of GPU exercises (my reimplementation). Comes with relevant resources.

cuda cuda-programming nvcc nvidia

Last synced: 18 Jan 2025

https://github.com/deep-1704/coa-lab-repo

Computer Organization and Architecture lab assignments.

cuda

Last synced: 18 Jan 2025

https://github.com/deep-1704/coa_lab_repo_grp01

COA Lab assignments

cuda gpgpu-sim

Last synced: 18 Jan 2025

https://github.com/kanchishimono/python-images

Ubuntu based Python container images, including CUDA images

container-image cuda docker dockerfile machine-learning python python3

Last synced: 26 Jan 2025

https://github.com/parxd/cuda-optim

optimizing CUDA kernels

cuda machine-learning

Last synced: 30 Jan 2025

https://github.com/mattjesc/federated-learning-simulation-1gpu-mi-is

Federated Learning Simulation on a Single GPU with Model Interpretability and Interactive Visualization

ai cuda deep-learning distributed-systems federated-learning gpu hpc keras machine-learning ml model-interpretability python pytorch simulation streamlit tensorflow

Last synced: 12 Oct 2024

https://github.com/xaionaro/cufft-grpc

Export cuFFT through gRPC

cmake cuda cufft fft fourier go golang gpu grpc transformation

Last synced: 18 Jan 2025

https://github.com/sferez/sspp_sparse_matrix_cuda

Small Scale Parallel Programming, Sparse Matrix multiplication with CUDA

cpp cuda omp omp-parallel parallel-computing small-scale-parallel-programming sparse-matrix

Last synced: 13 Jan 2025

https://github.com/betarixm/csed490c

POSTECH: Heterogeneous Parallel Computing (Fall 2023)

cuda gpu parallel-computing postech

Last synced: 19 Jan 2025

https://github.com/amitkumarj441/deep-learning-on-your-finger

A rich collection of dockerfiles for installing deep learning dependecies on your way :rocket:

cuda cudnn gcp

Last synced: 26 Jan 2025

https://github.com/dhruvsrikanth/fastconv

Distributed and serial implementations of the 2D Convolution operation in c++ and CUDA.

convolution-filters cpp cuda gpu-programming high-performance-computing hpc image-editor image-processing nvidia parallel-programming

Last synced: 25 Dec 2024

https://github.com/illagrenan/cuda-80-cudnn6-runtime-1604-py36

Ubuntu 16.04 with Python 3.6 and CUDA Dockerfile

cuda dockerfile ubuntu

Last synced: 19 Jan 2025

https://github.com/illagrenan/cuda-90-cudnn7-runtime-1604-py36

Ubuntu 16.04 with Python 3.6 and CUDA9 Dockerfile

cuda dockerfile python ubuntu

Last synced: 19 Jan 2025

https://github.com/larygwil/cuda-samples-old

nvidia cuda samples old (5.0 - 7.5)

cuda nvidia

Last synced: 02 Feb 2025

https://github.com/dwain-barnes/llm-gguf-auto-converter

Automated Jupyter notebook solution for batch converting Large Language Models to GGUF format with multiple quantization options. Built on llama.cpp with HuggingFace integration.

auto-converter batch-processing cuda gguf huggingface jupyter-notebook llama-cpp llm model-quantization

Last synced: 31 Jan 2025

https://github.com/rssr25/cuda

Following Cuda By Example book.

cpp cuda cuda-programming hpc shaders

Last synced: 24 Dec 2024

https://github.com/phrutis/minikeys_for_sale

GPU program for brute MiniKeys Casascius Serie1 (22 characters)

bitcoin brute-force btc casascius cuda gpu minikeys program uncompressed

Last synced: 24 Jan 2025

https://github.com/storterald/neural-network

Simple neural network implementation in C++ and CUDA

asm asmx86 c-plus-plus cmake cpp cuda machine-learning neural-network

Last synced: 02 Feb 2025

https://github.com/lixk28/knn-cuda

cuda knn

Last synced: 07 Feb 2025

https://github.com/abdelrahman-amen/active_learning_with_different_query_strategies

This project explores the implementation of active learning techniques, focusing on various query strategies to optimize the selection of informative data points for model training. It aims to reduce the amount of labeled data required while improving model performance, especially in scenarios with limited labeled data.

activelearning cuda entropy kldivergence margin numpy python pyto uncertainty

Last synced: 24 Jan 2025

https://github.com/sugarcane-mk/finetuning_wav2vec2

This repo provides step by step process from sctatch to fine tune facebook's wav2vec2-large model using transformers

asr asr-model cuda facebook fairseq fine-tuning finetuning huggingface librosa python torch transformers wav2vec2 wav2vec2-large-960h

Last synced: 24 Jan 2025

https://github.com/marius311/cudadistributedtools.jl

A set of utility tools for multi-GPU + multi-process workflows

cuda distributed julia

Last synced: 07 Feb 2025

https://github.com/ramyacp14/document-based-question-and-answers

Developed a document question answering system that utilizes Llama and LangChain for contextual and accurate answers. The system supports .txt documents, intelligent text splitting, and context-aware querying through an easy-to-use Streamlit interface.

chroma cuda hugging-face langchain llama python recursivecharactertextsplitter streamlit

Last synced: 12 Oct 2024

https://github.com/bjornmelin/ml-production-engineering

⚙️ End-to-end ML deployment solutions. Focused on model serving, multi-GPU optimization, and production-grade system implementation. 🎯

cuda deployment docker fastapi gpu-computing kubernetes mlops production

Last synced: 25 Jan 2025

https://github.com/amypad/miutil

Basic functionality needed for AMYPAD

cuda matlab medical-imaging python

Last synced: 31 Oct 2024

https://github.com/bjornmelin/ml-vision-lab

👁️ Production-grade computer vision implementations. Real-world applications in image processing, object detection, and video analytics with GPU acceleration. 📸

computer-vision cuda deep-learning image-processing object-detection opencv pytorch video-analytics

Last synced: 25 Jan 2025

https://github.com/bjornmelin/nlp-engineering-hub

📚 Enterprise NLP systems and LLM applications. Features custom language model implementations, distributed training pipelines, and efficient inference systems. 🔤

cuda gpu-optimization huggingface huggingface-transformers langchain language-models large-language-models nlp openai python transformers

Last synced: 25 Jan 2025

https://github.com/mateuszk098/parallel-programming-examples

Simple parallel programming examples with CUDA, MPI and OpenMP.

cpp cuda mpi openmp parallel-programming

Last synced: 28 Dec 2024

https://github.com/neel-dandiwala/cuda-programs

Miscellaneous programs that grasp the concept of Parallel Computing

cuda gpu-programming parallel-programming

Last synced: 26 Dec 2024

https://github.com/corazon-code/pyloo

Python package for approximate leave-one-out cross-validation (LOO-CV) and Pareto smoothed importance sampling (PSIS) for Bayesian Modeling

bayes bayesian-data-analysis cross-validation cuda dump fuzzy-matching looker loot-table machine-learning minecraft model-comparison python spreadsheet tensorflow

Last synced: 09 Feb 2025

https://github.com/voltr0x/raytracing-cuda

Raytracing in a weekend using CUDA

cpp11 cuda raytracing sdl2

Last synced: 20 Jan 2025

https://github.com/jpodivin/gputomata

Cellular automata running on CUDA capable GPUs

cellular-automata cellular-automaton cuda

Last synced: 27 Dec 2024

https://github.com/bjornmelin/ml-algorithm-playground

🧪 Core ML algorithm implementations with GPU acceleration. Featuring optimized implementations across various libraries with comprehensive analysis. 📈

algorithms cuda gpu-computing lightgbm machine-learning python scikit-learn xgboost

Last synced: 25 Jan 2025

https://github.com/xstupi00/N-Body-CUDA

PCG - Parallel Computations on GPU - Project - N-Body-CUDA

cuda gpu-acceleration gpu-computing nbody-simulation optimization parallel-computing pcg vut vut-fit

Last synced: 23 Oct 2024

https://github.com/miferreiro/cdap-cuda

CUDA exercises for the subject of "Computación Distribuída e de Altas Prestacións" in the Master Degree of Computer Engineering of the University of Vigo in 2020

c cuda scan

Last synced: 27 Dec 2024

https://github.com/muneeb706/cuda

sample programs implemented using cuda (gpu)

cplusplus cuda gpu-programming

Last synced: 31 Jan 2025

https://github.com/jamesnulliu/learning-programming-massively-parallel-processors

Leaning notes of Programming Massively Parallel Processors, 4-th edition.

cuda notes pytorch

Last synced: 02 Feb 2025

https://github.com/xza85hrf/flag_prediction_project

This application predicts the name of a country (or countries) based on an input flag image. It uses advanced image processing techniques and deep learning models built with PyTorch to classify flags accurately.

cross-validation cuda data-augmentation docker efficientnetb0 flag-recognition image-classification machine-learning mixed-precision-training mobilenetv2 python pytorch resnet resnet-50 transfer-learning

Last synced: 31 Jan 2025

https://github.com/sangioai/torchpace

PyTorch CUDA/C++ extension of PACE: Transformer non-linearlity accelerator engine.

cuda pytorch transformer

Last synced: 02 Feb 2025

https://github.com/rajshrestha86/kmeans-clusterize-cuda

Implementation of K-Means algorithm from scratch using CUDA.

c cuda kmeans-clustering

Last synced: 07 Feb 2025

https://github.com/bjornmelin/edge-ai-engineering

📱 Optimized ML for edge devices. Showcasing efficient model deployment, GPU-CPU memory transfer optimization, and real-world edge AI applications. 🤖

cuda edge-computing embedded-systems gpu-optimization iot mobile-ml model-optimization python tflite

Last synced: 02 Feb 2025

https://github.com/ne0nwinds/gpupuzzles

My solutions to srush/GPU-Puzzles using CUDA

cuda

Last synced: 02 Feb 2025

https://github.com/ysl1016/cudadigitfilter

CUDA-based parallel image filtering system for MNIST dataset

computer-vision cuda deep-learning gpu-acceleration image-processing mnist parallel-computing

Last synced: 02 Feb 2025

https://github.com/bjornmelin/ai-system-design

🎨 Large-scale AI system architectures and implementations. Features distributed training systems, multi-GPU pipelines, and efficient resource management. 🏗️

architecture cuda distributed-systems engineering gpu-computing production scalability system-design

Last synced: 02 Feb 2025

https://github.com/jiriklepl/bits-knn-jpdc2024

Replication package for the paper Towards Optimal GPU-accelerated K-Nearest Neighbors Search

bitonic-sort cuda gpu k-nearest-neighbors knn-search top-k

Last synced: 26 Jan 2025

https://github.com/wiktor2718/matrix_flow

Matrix Flow is a simple machine learning library written in Rust and CUDA. It was created as a portfolio project to deepen my understanding of machine learning, GPU programming, and Rust. It provides an API for matrix manipulation and includes specially optimized neural networks.

adam-optimizer benchmarking cuda deep-learning gpu-computing machine-learning matrix-operations neural-networks portfolio-project rust

Last synced: 26 Jan 2025

https://github.com/lruizap/testcuda

Guide to install and use cuda for programming

cuda cudnn nvidia pytorch

Last synced: 02 Feb 2025

https://github.com/marcorentap/kokkos-docker-cluster

Deploy Docker containers with Kokkos, OpenMP, OpenMPI and CUDA as a Docker swarm.

cuda docker hpc kokkos

Last synced: 23 Oct 2024

https://github.com/amruthapatil/nyu-cudamatrixoperations

Optimizing CUDA programs for vector addition and matrix multiplication

cuda high-performance-computing

Last synced: 20 Jan 2025

https://github.com/amruthapatil/nyu-cudaconvolution

Implementing convolution operations on an image using CUDA, exploiting different methodologies - basic, tiled, and cuDNN

cuda high-performance

Last synced: 20 Jan 2025

https://github.com/0xhilsa/pycu

PyCu

cpp cuda nvcc python3

Last synced: 15 Feb 2025

https://github.com/trentonom0r3/raft-analysis

Simple analysis script 'demotest.py' using RAFT optical flow to get flow vectors, occlusion masks, and Information on keyframes with significant motion changes

cuda flow-maps occlusion-masks opticalflow python pytorch raft

Last synced: 08 Feb 2025

https://github.com/cppshizoids/cuda

This is my basic lessons of CUDA

cuda cuda-demo cuda-programming

Last synced: 14 Feb 2025

https://github.com/popke523/rybki

A 3D shoal of fish animation using the boids algorithm, OpenGL for rendering and CUDA for parallel processing.

boids cuda opengl

Last synced: 08 Feb 2025

https://github.com/voschezang/holographic-projector-simulations

Optimizations of Simulations of Holographic Projectors using CUDA

cuda gpu holography parallel-computing photonics

Last synced: 05 Jan 2025

https://github.com/camille-004/cusprec

🏁 Sparse signal recovery library written in PyCUDA.

cuda ml python signal-processing sparse-recovery

Last synced: 19 Dec 2024

https://github.com/phantom7knight/cuda-fusion

This project is for learning CUDA to understand the GPU work better.

cuda cuda-programming gpgpu gpu

Last synced: 08 Feb 2025