# Speeding up DNN training in distributed environments

[![C++](https://img.shields.io/badge/C++-17-red?logo=c%2B%2B&logoColor=white)](https://en.wikipedia.org/wiki/C%2B%2B17)
[![Eigen](https://img.shields.io/badge/Eigen-3.0-brown?logo=Eigen&logoColor=white)](http://eigen.tuxfamily.org/)
[![OpenMPI](https://img.shields.io/badge/OpenMPI-2.1.1-blue)](https://www.open-mpi.org/)

\[[Report](./report.pdf)\]




## Summary

* [Introduction](#introduction)
* [Running](#running)
* [References](#references)

## Introduction

This repository contains a minimal framework to quickly prototype deep architectures and to facilitate weight and gradient sharing among processing nodes.

We introduce and implement parallel DNN optimization algorithms and conduct a (hopefully complete) benchmark of the different methods on the MNIST and Fashion-MNIST datasets.
For a full description of the project, you can check out the [project report](./report.pdf)!
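To make the setup concrete, here is a minimal, hypothetical sketch (not the repository's actual code) of how gradients can be shared across nodes with OpenMPI: each rank computes a gradient on its local batch, the gradients are summed with `MPI_Allreduce`, and every rank applies the averaged update to its local copy of the weights. The model size and learning rate below are placeholders.

```cpp
// Illustrative gradient-averaging step with OpenMPI (not the repository's code).
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int world_size = 0;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    const int n_params = 1000;                    // illustrative model size
    std::vector<double> weights(n_params, 0.0);   // local copy of the weights
    std::vector<double> grad(n_params, 0.0);      // gradient on the local batch
    std::vector<double> grad_sum(n_params, 0.0);  // sum of gradients over ranks
    const double lr = 0.01;                       // illustrative learning rate

    // ... compute `grad` from the local mini-batch here ...

    // Sum the gradients of all ranks, then apply the averaged update locally.
    MPI_Allreduce(grad.data(), grad_sum.data(), n_params,
                  MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    for (int i = 0; i < n_params; ++i)
        weights[i] -= lr * (grad_sum[i] / world_size);

    MPI_Finalize();
    return 0;
}
```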

## Running

This project uses the following dependencies:
- Eigen3, for basic matrix operations (also included in the repository)
- OpenMPI 2.1.1

The following experiments are available (a minimal sketch of the weight-averaging step follows the list):
- `param_avg`: Weight averaging algorithm described in the report.
- `parallel_sgd`: Gradient averaging algorithm described in the report.
- `w_param_avg`: Weighted parameter averaging algorithm described in the report.
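
For intuition, here is a minimal sketch of the periodic weight-averaging idea behind `param_avg`: each rank trains independently and, every few epochs, replaces its weights with the element-wise average over all ranks. This is illustrative only, not the repository's implementation; `train_one_epoch`, the epoch counts, and the model size are stand-ins.

```cpp
// Illustrative periodic weight averaging with OpenMPI (not the repository's code).
#include <mpi.h>
#include <vector>
#include <cstddef>

// Stand-in for one epoch of local SGD; the real training step would go here.
void train_one_epoch(std::vector<double>& weights) {
    for (double& w : weights) w += 0.001;  // dummy update so the sketch runs
}

// Replace each rank's weights with the element-wise average over all ranks.
void average_weights(std::vector<double>& weights, MPI_Comm comm) {
    int world_size = 0;
    MPI_Comm_size(comm, &world_size);
    std::vector<double> sum(weights.size(), 0.0);
    MPI_Allreduce(weights.data(), sum.data(),
                  static_cast<int>(weights.size()),
                  MPI_DOUBLE, MPI_SUM, comm);
    for (std::size_t i = 0; i < weights.size(); ++i)
        weights[i] = sum[i] / world_size;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    const int n_epochs = 10, avg_freq = 2;    // illustrative values
    std::vector<double> weights(1000, 0.0);   // local copy of the model
    for (int epoch = 1; epoch <= n_epochs; ++epoch) {
        train_one_epoch(weights);             // independent training per rank
        if (epoch % avg_freq == 0)            // every `avg_freq` epochs...
            average_weights(weights, MPI_COMM_WORLD);  // ...synchronize
    }
    MPI_Finalize();
    return 0;
}
```

The weighted variant (`w_param_avg`) presumably replaces the uniform average with a lambda-weighted combination; see the report for the exact formulation.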

To compile and run an experiment, from the root directory:

```bash
make experiment_name
```

Then, to run an MPI experiment:

```bash
mpiexec -n n_cores runmpi -options
```

For all experiments, the following arguments are available (a full example invocation is given at the end of this section):
- `-batch_size`: Size of each batch.
- `-eval_acc`: Set to `1` to evaluate validation accuracies, or to `0` to log epoch durations instead.
- `-n_epochs`: Total number of epochs.

Additional parameters are available for the following methods:

**param_avg, w_param_avg**
- `-avg_freq`: Weight averaging frequency (in epochs).

**w_param_avg**
- `-lambda`: Value of the lambda parameter, given as an integer that is divided by 100 (e.g. `-lambda 50` corresponds to lambda = 0.5).
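
For example, a weight-averaging run on 4 cores could look as follows (the executable name `runmpi` and the parameter values are placeholders, not a verified invocation):

```bash
make param_avg
mpiexec -n 4 runmpi -batch_size 64 -n_epochs 20 -eval_acc 1 -avg_freq 2
```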

## References
1. [Ben-Nun et al., 2018] Demystifying parallel and distributed deep learning: an in-depth concurrency analysis.
2. [Ericson et al., 2017] On the performance of network parallel training in artificial neural networks.
3. [Han Xiao et al., 2017] Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms.