Projects in Awesome Lists tagged with data-parallelism
A curated list of projects in awesome lists tagged with data-parallelism.
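For orientation, here is a minimal sketch of the pattern shared by most projects below: every worker holds a full model replica, sees a different shard of each batch, and gradients are averaged with an all-reduce before each optimizer step. This uses raw torch.distributed; the process-group setup (e.g. via torchrun) and the `model`, `loss_fn`, and `optimizer` objects are assumed.

```python
# Minimal sketch of the core data-parallel pattern (not any one project's API).
# Assumes torch.distributed is already initialized, e.g. by torchrun.
import torch.distributed as dist

def data_parallel_step(model, batch, loss_fn, optimizer):
    loss = loss_fn(model(batch["x"]), batch["y"])
    loss.backward()
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            # Sum gradients from all replicas, then divide to get the mean;
            # this is what DDP, Horovod, DeepSpeed, etc. automate and optimize.
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
    optimizer.step()
    optimizer.zero_grad()
```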
https://github.com/hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
ai big-model data-parallelism deep-learning distributed-computing foundation-models heterogeneous-training hpc inference large-scale model-parallelism pipeline-parallelism
Last synced: 09 Sep 2025
https://github.com/deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
billion-parameters compression data-parallelism deep-learning gpu inference machine-learning mixture-of-experts model-parallelism pipeline-parallelism pytorch trillion-parameters zero
Last synced: 15 Jan 2026
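A hedged sketch of the basic DeepSpeed training loop, assuming `model` (a torch.nn.Module that returns a loss) and `dataloader` are already defined. The config is a trimmed illustration, not a recommended production setup.

```python
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "zero_optimization": {"stage": 2},  # ZeRO stage 2: shard optimizer state + grads
    "fp16": {"enabled": True},
}

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

for batch in dataloader:
    loss = model_engine(batch)       # forward on this rank's shard of the data
    model_engine.backward(loss)      # engine handles loss scaling + grad all-reduce
    model_engine.step()              # optimizer step + ZeRO bookkeeping
```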
https://github.com/cerndb/dist-keras
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
apache-spark data-parallelism data-science deep-learning distributed-optimizers hadoop keras machine-learning optimization-algorithms tensorflow
Last synced: 03 Oct 2025
https://github.com/mratsim/weave
A state-of-the-art multithreading runtime: message-passing based, fast, scalable, ultra-low overhead
data-parallelism fork-join message-passing multithreading openmp parallelism runtime scheduler task-parallelism task-scheduler threadpool work-stealing
Last synced: 05 Apr 2025
https://github.com/paddlepaddle/paddlefleetx
PaddlePaddle's large-model development suite, providing an end-to-end development toolchain for large language models, cross-modal large models, biocomputing large models, and other domains.
benchmark cloud data-parallelism distributed-algorithm elastic fleet-api large-scale lightning model-parallelism paddlecloud paddlepaddle pipeline-parallelism pretraining self-supervised-learning unsupervised-learning
Last synced: 13 Apr 2025
https://github.com/Oneflow-Inc/libai
LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training
data-parallelism deep-learning distributed-training large-scale model-parallelism nlp oneflow pipeline-parallelism self-supervised-learning transformer vision-transformer
Last synced: 09 May 2025
https://github.com/alibaba/EasyParallelLibrary
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
data-parallelism deep-learning distributed-training gpu memory-efficient model-parallelism pipeline-parallelism
Last synced: 14 Oct 2025
https://github.com/dkeras-project/dkeras
Distributed Keras Engine: make Keras faster with only one line of code.
data-parallelism deep-learning deep-neural-networks distributed distributed-deep-learning distributed-keras-engine distributed-systems keras keras-classification-models keras-models keras-neural-networks keras-tensorflow machine-learning neural-network parallel-computing plaidml python ray tensorflow tensorflow-models
Last synced: 02 May 2025
https://github.com/vertexclique/orkhon
Orkhon: ML Inference Framework and Server Runtime
async data-parallelism inference-server machine-learning multiprocessing python3 tensorflow
Last synced: 07 Apr 2025
https://github.com/xrsrke/pipegoose
Large-scale 4D parallelism pre-training for 🤗 transformers with Mixture of Experts (still a work in progress)
3d-parallelism data-parallelism distributed-optimizers huggingface-transformers large-scale-language-modeling megatron megatron-lm mixture-of-experts model-parallelism moe pipeline-parallelism sequence-parallelism tensor-parallelism transformers zero-1
Last synced: 10 Jul 2025
https://github.com/hkproj/pytorch-transformer-distributed
Distributed training (multi-node) of a Transformer model
collective-communication data-parallelism deep-learning distributed-data-parallel distributed-training gradient-accumulation machine-learning model-parallelism pytorch tutorial
Last synced: 06 May 2025
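A hedged sketch of the multi-node DistributedDataParallel pattern this tutorial walks through. Assumes launch via `torchrun`, which sets RANK/WORLD_SIZE/LOCAL_RANK in the environment; `MyTransformer` and `train_dataset` are placeholders.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = DDP(MyTransformer().cuda(local_rank), device_ids=[local_rank])
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = torch.nn.CrossEntropyLoss()

# DistributedSampler hands each rank a disjoint shard of the dataset.
sampler = DistributedSampler(train_dataset)
loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)

for epoch in range(3):
    sampler.set_epoch(epoch)              # reshuffle the shards every epoch
    for x, y in loader:
        loss = loss_fn(model(x.cuda()), y.cuda())
        loss.backward()                   # DDP overlaps the gradient all-reduce here
        optimizer.step()
        optimizer.zero_grad()
```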
https://github.com/kuixu/keras_multi_gpu
Multi-GPU training for Keras
data-parallelism keras multi-gpu
Last synced: 29 Oct 2025
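For context: in current tf.keras, single-host multi-GPU data parallelism is built in via MirroredStrategy. A hedged sketch of that modern equivalent (not this repo's code; `train_ds` is a placeholder tf.data.Dataset of (features, labels)):

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()   # one replica per visible GPU
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Each global batch is split across replicas; gradients are all-reduced
# before every update, so training matches a single large-batch run.
model.fit(train_ds, epochs=3)
```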
https://github.com/ryantd/veloce
WIP. Veloce is a low-code, Ray-based parallelization library for efficient, heterogeneous machine-learning computation.
data-parallelism deep-learning distributed distributed-computing heterogeneity model-parallelism parameter-server pytorch ray sparsity
Last synced: 10 Apr 2025
https://github.com/tcoppex/cpu-gbfilter
:hotsprings: Optimized Gaussian blur filter on CPU.
blur bmp-image cache-efficiency data-parallelism gaussian-blur image-processing multithreaded openmp
Last synced: 11 Aug 2025
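A hedged Python sketch of the underlying data-parallel decomposition (not the repo's optimized CPU code): a separable Gaussian blur where independent rows, then columns, are filtered in parallel across worker processes.

```python
import numpy as np
from multiprocessing import Pool

def gaussian_kernel(sigma, radius):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x * x) / (2 * sigma * sigma))
    return k / k.sum()

KERNEL = gaussian_kernel(sigma=2.0, radius=4)

def blur_rows(chunk):
    # Each row is independent for the horizontal pass: pure data parallelism.
    return np.stack([np.convolve(row, KERNEL, mode="same") for row in chunk])

def gaussian_blur(image, workers=4):
    with Pool(workers) as pool:
        # Horizontal pass over row chunks, then the same on the transpose
        # for the vertical pass (the Gaussian is separable).
        h = np.vstack(pool.map(blur_rows, np.array_split(image, workers)))
        v = np.vstack(pool.map(blur_rows, np.array_split(h.T, workers)))
    return v.T

if __name__ == "__main__":
    img = np.random.rand(512, 512)
    print(gaussian_blur(img).shape)  # (512, 512)
```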
https://github.com/daekeun-ml/sm-distributed-training-step-by-step
This repository provides hands-on labs for PyTorch-based distributed training and SageMaker distributed training. It is written so that beginners can get started easily, guiding you step by step through code modifications based on the most basic BERT use cases.
data-parallelism distributed-training pytorch-ddp sagemaker
Last synced: 12 Oct 2025
https://github.com/ler0ever/hpgo
Development of Project HPGO | Hybrid Parallelism Global Orchestration
data-parallelism distributed-training gpipe machine-learning model-parallelism pipedream pipeline-parallelism pytorch rust tensorflow
Last synced: 15 Jul 2025
https://github.com/murrellgroup/conflux.jl
Single-node data parallelism in Julia with CUDA
cuda data-parallelism flux julia nccl
Last synced: 15 Mar 2025
https://github.com/anvesham/enhancing-performance-of-big-data-machine-learning-models-on-google-cloud-platform
This project focuses on parallelising pre-processing, measurement, and machine learning in the cloud, as well as evaluating and analysing cloud performance.
cache data-parallelism dataproc-clusters google-cloud-ai-platform google-cloud-platform google-colaboratory keras-tensorflow ml model-parallelism pyspark rdd
Last synced: 05 Mar 2025
https://github.com/nicholaswmin/multi-leven
The Levenshtein edit-distance algorithm, in JavaScript, parallelised across workers [WIP]
concurrency data-parallelism fuzzy-search levenshtein-distance parallelism
Last synced: 06 Apr 2025
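The repo is JavaScript; here is a hedged Python sketch of the same idea, sharding a candidate list across worker processes and computing edit distances in parallel. All names are illustrative.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def levenshtein(a: str, b: str) -> int:
    # Classic two-row dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

def rank(query, candidates, workers=4):
    # Shard the candidates across processes; each worker scores its slice.
    with ProcessPoolExecutor(max_workers=workers) as ex:
        dists = list(ex.map(partial(levenshtein, query), candidates, chunksize=64))
    return sorted(zip(dists, candidates))

if __name__ == "__main__":
    print(rank("kitten", ["sitting", "mitten", "kitchen"]))
```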
https://github.com/thomas-bouvier/distributed-continual-learning
Towards Rehearsal-based Continual Learning at Scale: distributed CL using Horovod + PyTorch on up to 128 GPUs
continual-learning data-parallelism deep-learning experience-replay hpc ptychography rehearsal
Last synced: 06 Oct 2025
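A hedged sketch of the Horovod + PyTorch pattern this project scales up to many GPUs; `MyModel` is a placeholder, and the script would be launched with `horovodrun -np N ...`.

```python
import horovod.torch as hvd
import torch

hvd.init()
torch.cuda.set_device(hvd.local_rank())

model = MyModel().cuda()
# Common practice: scale the learning rate by the number of workers.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
# The wrapped optimizer all-reduces (averages) gradients on step().
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters()
)
# Make sure every replica starts from identical weights.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
```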
https://github.com/TeamBipartite/bipartite-gemm
High-throughput data-parallel GEMM implementations in CUDA using CUDA cores and Tensor cores
Last synced: 07 Feb 2026
https://github.com/dudeperf3ct/llm-parallelism-pytorch
Implementing various LLM training parallelism strategies for fun!
data-parallelism llm-parallelism pytorch
Last synced: 18 Jan 2026
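As a taste of what such strategies look like beyond plain data parallelism, here is a hedged sketch of a column-sharded (tensor-parallel) linear layer in raw torch.distributed; it is illustrative, not code from the repo. The plain all_gather here is forward-only; a trainable version would use the differentiable variant in torch.distributed.nn.

```python
import torch
import torch.distributed as dist

class ColumnParallelLinear(torch.nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        world = dist.get_world_size()
        assert out_features % world == 0, "out_features must divide evenly"
        # This rank holds only its slice of the output columns.
        self.shard = torch.nn.Linear(in_features, out_features // world)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local = self.shard(x)                                  # (batch, out/world)
        parts = [torch.empty_like(local) for _ in range(dist.get_world_size())]
        dist.all_gather(parts, local)                          # collect all shards
        return torch.cat(parts, dim=-1)                        # (batch, out)
```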
https://github.com/nikhilr612/safire
A small library for simulated annealing using arrayfire.
annealing arrayfire data-parallelism optimization-algorithms rust
Last synced: 20 Mar 2025
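The crate is Rust + ArrayFire; as a hedged NumPy illustration of the same batched idea, this anneals many candidate chains at once, so each sweep is a data-parallel array operation. `energy` maps (batch, dim) arrays to (batch,) energies.

```python
import numpy as np

def anneal(energy, init, steps=10_000, t0=1.0, cooling=0.999, seed=0):
    rng = np.random.default_rng(seed)
    state, t = init.copy(), t0
    e = energy(state)
    for _ in range(steps):
        cand = state + rng.normal(scale=0.1, size=state.shape)
        e_cand = energy(cand)
        # Metropolis rule, elementwise across the whole batch of chains:
        # always accept improvements, sometimes accept uphill moves.
        accept = rng.random(e.shape) < np.exp(np.minimum(e - e_cand, 0.0) / t)
        state = np.where(accept[:, None], cand, state)
        e = np.where(accept, e_cand, e)
        t *= cooling
    best = np.argmin(e)
    return state[best], e[best]

if __name__ == "__main__":
    sphere = lambda s: (s ** 2).sum(axis=1)   # toy objective, minimum at 0
    best, best_e = anneal(sphere, np.random.default_rng(1).normal(size=(64, 8)))
    print(best_e)
```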