Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Awesome-Distributed-Deep-Learning
A curated list of awesome Distributed Deep Learning resources.
https://github.com/bharathgs/Awesome-Distributed-Deep-Learning
Last synced: 5 days ago
Frameworks
- go-mxnet-predictor - Go binding for the MXNet c_predict_api, for running inference with a pre-trained model.
- deeplearning4j - Distributed Deep Learning Platform for Java, Clojure, and Scala.
- Elephas - Extension of Keras that lets you run distributed deep learning models at scale with Spark.
- MXNet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with a Dynamic, Mutation-aware Dataflow Dependency Scheduler; for Python, R, Julia, Go, JavaScript and more.
- Horovod - Distributed training framework for TensorFlow, Keras, and PyTorch (see the sketch after this list).
- Distributed Machine Learning Toolkit (DMTK) - A distributed machine learning (parameter server) framework by Microsoft for training models on large data sets across multiple machines. Tools currently bundled with it include LightLDA and Distributed (Multisense) Word Embedding.
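As a concrete reference point for the frameworks above, here is a minimal sketch of data-parallel training with Horovod on TensorFlow 2. The model, data, and learning-rate scaling are placeholder assumptions, not taken from the list; a real job would be launched with something like `horovodrun -np 4 python train.py`, shard the dataset by `hvd.rank()`, and checkpoint only on rank 0.

```python
# Minimal Horovod + TensorFlow 2 sketch (placeholder model, data, and hyperparameters).
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()  # one process per GPU/worker

# Pin each process to its own GPU.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Common heuristic: scale the learning rate by the number of workers.
opt = tf.keras.optimizers.SGD(0.01 * hvd.size())

@tf.function
def train_step(x, y, first_batch):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    # Ring-allreduce the gradients across all workers before applying them.
    tape = hvd.DistributedGradientTape(tape)
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))
    if first_batch:
        # Start every replica from identical weights after the variables exist.
        hvd.broadcast_variables(model.variables, root_rank=0)
    return loss

# Synthetic batch so the sketch runs end to end.
x = tf.random.normal([32, 4])
y = tf.random.uniform([32], maxval=10, dtype=tf.int64)
for step in range(100):
    loss = train_step(x, y, step == 0)
```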
Blogs
- Accelerating Deep Learning Using Distributed SGD — An Overview (a toy walk-through of the synchronous update follows this list).
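The overview above centers on data-parallel SGD: each worker computes a gradient on its own data shard, the gradients are averaged (the allreduce step), and every replica applies the identical update. The toy single-process simulation below illustrates exactly that update rule; the linear-regression objective, shard count, and learning rate are made-up assumptions, not taken from the post.

```python
# Toy simulation of synchronous data-parallel SGD in one process:
# each "worker" owns a shard, computes a local gradient, and the averaged
# gradient (the allreduce step on a real cluster) updates every replica.
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -3.0])
X = rng.normal(size=(1024, 2))
y = X @ w_true + 0.1 * rng.normal(size=1024)

n_workers, lr = 4, 0.1
shards = np.array_split(np.arange(len(X)), n_workers)
w = np.zeros(2)  # all replicas start from (and stay at) the same weights

for step in range(100):
    grads = []
    for idx in shards:
        Xi, yi = X[idx], y[idx]
        # Gradient of 0.5 * ||Xw - y||^2 / n on this worker's shard.
        grads.append(Xi.T @ (Xi @ w - yi) / len(idx))
    g = np.mean(grads, axis=0)   # "allreduce": average the workers' gradients
    w -= lr * g                  # identical update applied on every replica

print(w)  # close to [2.0, -3.0]
```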
Books
General:
- Distributed Machine Learning Patterns - Patterns for scaling machine learning to production, taught through real-world scenarios and hands-on projects.
Papers
Model Consistency:
Synchronization:
- Model Accuracy and Runtime Tradeoff in Distributed Deep Learning
- Deep learning with COTS HPC systems - Trains deep networks on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI.
- SparkNet - Training Deep Networks in Spark.
- 1-Bit SGD - 1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs.
- Multi-GPU Training of ConvNets.
- A Fast Learning Algorithm for Deep Belief Nets.
- Heterogeneity-aware Distributed Parameter Servers - In Proc. 2017 ACM International Conference on Management of Data (SIGMOD ’17). 463–478.
- Staleness-Aware Async-SGD for Distributed Deep Learning - In Proc. Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI’16). 2350–2356.
- A Unified Analysis of HOGWILD!-style Algorithms - In Proc. 28th Int’l Conf. on NIPS - Volume 2. 2674–2682.
- Asynchronous Parallel Stochastic Gradient Descent
- GPU Asynchronous Stochastic Gradient Descent to Speed Up Neural Network Training.
- HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent - In Advances in Neural Information Processing Systems 24. 693–701. (A toy lock-free sketch follows this list.)
- Asynchronous stochastic gradient descent for DNN training
- GossipGraD - Scalable Deep Learning using Gossip Communication based Asynchronous Gradient Descent.
- How to scale distributed deep learning
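Several of the papers above (HOGWILD!, asynchronous parallel SGD, staleness-aware async-SGD) study workers that update shared parameters without coordination. The sketch below imitates that lock-free pattern on a toy least-squares problem, with Python processes writing to an unlocked shared buffer; the problem, learning rate, and process count are illustrative assumptions rather than the setup of any cited paper.

```python
# Toy HOGWILD!-style run: workers apply SGD steps to a shared weight vector
# with no locking; the data races are simply tolerated.
import numpy as np
from multiprocessing import Process, Array

def worker(shared_w, X, y, lr, steps, seed):
    rng = np.random.default_rng(seed)
    w = np.frombuffer(shared_w, dtype=np.float64)  # writable view, no lock
    for _ in range(steps):
        i = rng.integers(len(X))
        grad = (X[i] @ w - y[i]) * X[i]   # gradient of 0.5 * (x_i.w - y_i)^2
        w -= lr * grad                    # racy, unsynchronized update

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = np.array([1.0, -2.0, 3.0])
    X = rng.normal(size=(4096, 3))
    y = X @ w_true                        # noiseless targets for the toy

    shared_w = Array("d", 3, lock=False)  # raw shared memory, deliberately unlocked
    workers = [Process(target=worker, args=(shared_w, X, y, 0.01, 5000, s))
               for s in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print(np.frombuffer(shared_w, dtype=np.float64))  # close to w_true despite the races
```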
Parameter Distribution and Communication:
- Building High-level Features Using Large Scale Unsupervised Learning - In Proc. 29th Int’l Conf. on Machine Learning (ICML’12). 507–514.
- Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data - In Proc. Int’l Conf. for High Performance Computing, Networking, Storage and Analysis (SC ’17). 7:1–7:11.
- Gaia - Geo-Distributed Machine Learning Approaching LAN Speeds. In Proc. 14th USENIX Conf. on NSDI. 629–647.
- Petuum - A New Platform for Distributed Machine Learning on Big Data.
- FireCaffe - Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- DeepSpark - Spark-Based Deep Learning Supporting Asynchronous Updates and Caffe Compatibility. (2016).
- Scaling Distributed Machine Learning with the Parameter Server (a toy push/pull sketch follows this list).
- Large Scale Distributed Deep Networks - In Proc. 25th Int’l Conf. on NIPS - Volume 1 (NIPS’12). 1223–1231.
- Poseidon - A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines. (2015). arXiv:1512.06216
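Most of the systems above (the Parameter Server, Large Scale Distributed Deep Networks, Petuum, Poseidon) are organized around workers that pull the current parameters from a server and push gradients back. The sketch below compresses that push/pull protocol into a single process to show the data flow; the interface, workload, and hyperparameters are illustrative assumptions, not the API of any cited system.

```python
# Single-process sketch of the parameter-server push/pull pattern.
import numpy as np

class ParameterServer:
    """Holds the global model; workers pull weights and push gradients."""
    def __init__(self, dim, lr):
        self.w = np.zeros(dim)
        self.lr = lr

    def pull(self):
        return self.w.copy()          # worker fetches the current parameters

    def push(self, grad):
        self.w -= self.lr * grad      # server applies the update as it arrives

def local_gradient(w, X_shard, y_shard):
    # Gradient of 0.5 * ||Xw - y||^2 / n on one worker's shard.
    return X_shard.T @ (X_shard @ w - y_shard) / len(y_shard)

rng = np.random.default_rng(0)
w_true = np.array([0.5, -1.5, 2.5])
X = rng.normal(size=(2048, 3))
y = X @ w_true

server = ParameterServer(dim=3, lr=0.1)
shards = np.array_split(np.arange(len(X)), 4)   # 4 logical workers

for step in range(200):
    for idx in shards:                # round-robin stands in for real parallelism
        w_local = server.pull()       # pull: copy of the global weights
        g = local_gradient(w_local, X[idx], y[idx])
        server.push(g)                # push: send the gradient back
print(server.w)                        # converges to roughly [0.5, -1.5, 2.5]
```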
Keywords: deep-learning (3), spark (3), machine-learning (3), mxnet (3), keras (2), tensorflow (2), python (2), deeplearning (2), distributed-computing (1), scala (1), neural-nets (1), matrix-library (1), linear-algebra (1), java (1), intellij (1), hadoop (1), gpu (1), dl4j (1), deeplearning4j (1), clojure (1), artificial-intelligence (1), inference (1), golang (1), cgo (1), uber (1), ray (1), pytorch (1), mpi (1), machinelearning (1), baidu (1), mlops (1), manning-publications (1), machine-learning-pipelines (1), large-scale-machine-learning (1), kubernetes (1), kubeflow (1), distributed-systems (1), distributed-machine-learning (1), devops (1), data-science (1), cloud-native (1), cloud-computing (1), book (1), argo-workflows (1), argo (1), neural-networks (1)