Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with model-parallelism
A curated list of projects in awesome lists tagged with model-parallelism.
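As a minimal illustration of what "model parallelism" means for the projects below: a single layer's weight matrix is split across devices, each device computes a partial result, and the partials are combined. This is a pure-Python sketch of the idea only; it is not taken from any listed project.

```python
# Model parallelism in miniature: split one layer's weights across
# "devices" (here, plain Python lists), compute partial outputs on each,
# then combine them. Illustrative only.

def matvec(rows, x):
    """Dense matrix-vector product over nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]

def row_parallel_matvec(W, x, num_devices):
    """Shard W's rows across devices; concatenate the partial outputs."""
    per = (len(W) + num_devices - 1) // num_devices
    shards = [W[i * per:(i + 1) * per] for i in range(num_devices)]
    partials = [matvec(shard, x) for shard in shards]  # one per device
    return [y for part in partials for y in part]

W = [[1, 0], [0, 1], [2, 2]]
x = [3, 4]
# Sharded computation matches the single-device result: [3, 4, 14]
assert row_parallel_matvec(W, x, num_devices=2) == matvec(W, x)
```

Real frameworks in this list (Colossal-AI, DeepSpeed, Megatron-style tensor parallelism) do the same splitting on GPU tensors and add the communication needed to combine partials.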
https://github.com/hpcaitech/colossalai
Making large AI models cheaper, faster and more accessible
ai big-model data-parallelism deep-learning distributed-computing foundation-models heterogeneous-training hpc inference large-scale model-parallelism pipeline-parallelism
Last synced: 09 Nov 2024
https://github.com/microsoft/deepspeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
billion-parameters compression data-parallelism deep-learning gpu inference machine-learning mixture-of-experts model-parallelism pipeline-parallelism pytorch trillion-parameters zero
Last synced: 17 Dec 2024
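DeepSpeed's tag list mentions ZeRO, whose core idea is that each data-parallel rank keeps optimizer state for only its own 1/N shard of the parameters. The sketch below is a conceptual illustration of that partitioning in pure Python, not DeepSpeed's actual implementation or API.

```python
# ZeRO-style sharding in concept: with N data-parallel ranks, each rank
# owns optimizer state for roughly 1/N of the parameters, so per-rank
# optimizer memory shrinks about N-fold. Illustration only.

def shard_parameters(num_params: int, world_size: int) -> list[range]:
    """Split parameter indices into one contiguous shard per rank."""
    base, rem = divmod(num_params, world_size)
    shards, start = [], 0
    for rank in range(world_size):
        size = base + (1 if rank < rem else 0)
        shards.append(range(start, start + size))
        start += size
    return shards

shards = shard_parameters(num_params=10, world_size=4)
# The four shards are disjoint and together cover all 10 parameters.
```

DeepSpeed applies the same idea to optimizer states (stage 1), gradients (stage 2), and the parameters themselves (stage 3), with collective communication to reassemble what each rank needs.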
https://github.com/kakaobrain/torchgpipe
A GPipe implementation in PyTorch
checkpointing deep-learning gpipe model-parallelism parallelism pipeline-parallelism pytorch
Last synced: 15 Dec 2024
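GPipe, which torchgpipe implements for PyTorch, splits a batch into M micro-batches that flow through S sequential stages, so the forward pass takes S + M - 1 clock steps instead of S * M. The sketch below simulates that schedule in pure Python; torchgpipe itself does this inside PyTorch with real modules on real devices.

```python
# GPipe-style micro-batch pipeline schedule: at clock step t, stage s is
# processing micro-batch t - s (if that micro-batch exists). Illustrative
# scheduling logic only, not torchgpipe's implementation.

def gpipe_schedule(num_stages: int, num_microbatches: int):
    """Return, per clock step, the (stage, microbatch) pairs active then."""
    steps = []
    for clock in range(num_stages + num_microbatches - 1):
        active = [(s, clock - s) for s in range(num_stages)
                  if 0 <= clock - s < num_microbatches]
        steps.append(active)
    return steps

sched = gpipe_schedule(num_stages=3, num_microbatches=4)
# 3 + 4 - 1 = 6 clock steps; by clock 2 all three stages are busy at once.
```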
https://github.com/paddlepaddle/paddlefleetx
PaddlePaddle's large-model development suite, providing an end-to-end toolchain for large language models, cross-modal large models, biocomputing large models, and other domains.
benchmark cloud data-parallelism distributed-algorithm elastic fleet-api large-scale lightning model-parallelism paddlecloud paddlepaddle pipeline-parallelism pretraining self-supervised-learning unsupervised-learning
Last synced: 18 Dec 2024
https://github.com/oneflow-inc/libai
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
data-parallelism deep-learning distributed-training large-scale model-parallelism nlp oneflow pipeline-parallelism self-supervised-learning transformer vision-transformer
Last synced: 15 Dec 2024
https://github.com/alibaba/EasyParallelLibrary
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
data-parallelism deep-learning distributed-training gpu memory-efficient model-parallelism pipeline-parallelism
Last synced: 05 Nov 2024
https://github.com/xrsrke/pipegoose
Large-scale 4D-parallelism pre-training for 🤗 Transformers with Mixture of Experts *(still a work in progress)*
3d-parallelism data-parallelism distributed-optimizers huggingface-transformers large-scale-language-modeling megatron megatron-lm mixture-of-experts model-parallelism moe pipeline-parallelism sequence-parallelism tensor-parallelism transformers zero-1
Last synced: 20 Nov 2024
https://github.com/tanyuqian/redco
NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
differential-privacy diffusion-models distributed-training fedavg federated-learning flan-t5-xxl gemma image-captioning jax large-language-models llama maml meta-learning mixed-precision mlsys model-parallelism ppo reinforcement-learning seq2seq stable-diffusion
Last synced: 15 Dec 2024
https://github.com/hkproj/pytorch-transformer-distributed
Distributed training (multi-node) of a Transformer model
collective-communication data-parallelism deep-learning distributed-data-parallel distributed-training gradient-accumulation machine-learning model-parallelism pytorch tutorial
Last synced: 17 Nov 2024
https://github.com/ryantd/veloce
WIP. Veloce is a low-code, Ray-based parallelization library for efficient and heterogeneous machine learning computation.
data-parallelism deep-learning distributed distributed-computing heterogeneity model-parallelism parameter-server pytorch ray sparsity
Last synced: 14 Nov 2024
https://github.com/shenggan/atp
Adaptive Tensor Parallelism for Foundation Models
attention distributed-training gpt large-model model-parallelism pytorch transformer
Last synced: 18 Dec 2024
https://github.com/ler0ever/hpgo
Development of Project HPGO | Hybrid Parallelism Global Orchestration
data-parallelism distributed-training gpipe machine-learning model-parallelism pipedream pipeline-parallelism pytorch rust tensorflow
Last synced: 29 Oct 2024
https://github.com/anvesham/enhancing-performance-of-big-data-machine-learning-models-on-google-cloud-platform
The project focuses on parallelising pre-processing, measurement, and machine learning in the cloud, along with evaluating and analysing cloud performance.
cache data-parallelism dataproc-clusters google-cloud-ai-platform google-cloud-platform google-colaboratory keras-tensorflow ml model-parallelism pyspark rdd
Last synced: 15 Nov 2024
https://github.com/d4l3k/axe
A simple graph partitioning algorithm written in Go. Designed for partitioning neural networks across multiple devices, where crossing a device boundary incurs an added cost.
graph-partitioning machine-learning model-parallelism
Last synced: 15 Dec 2024
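The problem axe addresses can be sketched as follows: place graph nodes onto devices so that few edges cross device boundaries. The toy greedy partitioner below illustrates the idea in Python; it is a generic sketch, not axe's actual algorithm (axe is written in Go).

```python
# Toy greedy graph partitioner: assign each node to the device that cuts
# the fewest edges to already-placed neighbors, subject to a per-device
# capacity. Generic illustration of the partitioning problem only.

def greedy_partition(nodes, edges, num_devices, capacity):
    """edges: list of (u, v) pairs; returns {node: device_index}."""
    neighbors = {n: set() for n in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    assignment, load = {}, [0] * num_devices
    for n in nodes:
        best, best_cut = None, None
        for d in range(num_devices):
            if load[d] >= capacity:
                continue  # device is full
            cut = sum(1 for m in neighbors[n]
                      if m in assignment and assignment[m] != d)
            if best_cut is None or cut < best_cut:
                best, best_cut = d, cut
        assignment[n] = best
        load[best] += 1
    return assignment

layers = ["conv1", "conv2", "fc1", "fc2"]
chain = [("conv1", "conv2"), ("conv2", "fc1"), ("fc1", "fc2")]
placement = greedy_partition(layers, chain, num_devices=2, capacity=2)
# The chain splits into two contiguous halves, cutting only the
# single conv2 -> fc1 edge.
```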
https://github.com/olk/mnist-performance
Performance test of MNIST handwriting classification using MXNet + TensorFlow
classification gluon horovod keras mirrored-strategy mnist model-parallelism multi-gpu multi-gpu-training mxnet python tensorflow
Last synced: 04 Dec 2024