# Sakura

Sakura is the ML library of the Zakuro framework. It provides asynchronous distributed training for PyTorch.

https://github.com/zakuro-ai/sakura

- [Modules](#modules)
- [Code structure](#code-structure)
- [Code design](#code-design)
- [Installing the application](#installing-the-application)
- [Makefile commands](#makefile-commands)
- [Environments](#environments)
- [Running the application](#running-the-application)
--------------------------------------------------------------------------------

Sakura is a simple but powerful tool that reduces training time by running training and evaluation asynchronously. It provides two features:
- A simple ML framework for asynchronous training.
- An integration with PyTorch.

You can keep using your favorite Python framework, such as PyTorch, TensorFlow, or PaddlePaddle.
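
To see what "asynchronous" buys you, here is a minimal sketch of the idea in plain Python: evaluation of each epoch's snapshot runs in a separate process while training moves on immediately. This only illustrates the concept; it is not Sakura's actual implementation.

```python
import multiprocessing as mp


def evaluate(epoch, snapshot):
    # Stand-in for a test/validation pass over a saved model snapshot.
    print(f"[async eval] scoring snapshot from epoch {epoch}: {snapshot}")


if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    workers = []
    for epoch in range(3):
        # ... one epoch of training happens here ...
        snapshot = {"epoch": epoch}  # e.g. a copy of the model weights
        # Hand the snapshot to a background process and keep training;
        # the main loop never blocks on evaluation.
        p = ctx.Process(target=evaluate, args=(epoch, snapshot))
        p.start()
        workers.append(p)
        print(f"[train] epoch {epoch} finished, starting the next one")
    for p in workers:
        p.join()
```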

# Modules

At a granular level, Sakura is a library that consists of the following components:

| Component | Description |
| ---- | --- |
| **sakura** | The core sakura modules. |
| **sakura.lightning** | The PyTorch Lightning integration (provides the `SakuraTrainer` used below). |
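
After installation, a quick sanity check is to import these components directly (a minimal check; both `__version__` and `SakuraTrainer` appear in the code below):

```python
import sakura
from sakura.lightning import SakuraTrainer  # the trainer used in the demo below

print(sakura.__version__)
```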

# Code structure
```python
from setuptools import setup
from sakura import __version__

setup(
    name="sakura-ml",
    version=__version__,
    short_description="Sakura provides asynchronous training for DNN.",
    long_description="Sakura provides asynchronous training for DNN.",
    url='https://zakuro.ai',
    packages=[
        "sakura",
        "sakura.lightning",
    ],
    include_package_data=True,
    package_data={"": ["*.yml"]},
    install_requires=[r.rsplit()[0] for r in open("requirements.txt")],
    license='MIT',
    author='ZakuroAI',
    python_requires='>=3.6',
    author_email='git@zakuro.ai',
    description='Sakura provides asynchronous training for DNN.',
    platforms="linux_debian_10_x86_64",
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
    ],
)
```
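
One detail worth noting: the `install_requires` line keeps only the first whitespace-separated token of each line in `requirements.txt`, which drops trailing comments (but would raise an `IndexError` on a blank line). A small illustration, with made-up package names:

```python
lines = ["torch==2.1.0  # pinned build", "lightning>=2.0"]
print([r.rsplit()[0] for r in lines])
# -> ['torch==2.1.0', 'lightning>=2.0']
```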
# Code design
If you have worked with PyTorch before, this structure will look familiar.
Simply adapt the `test` and `train` steps in your trainer, as shown in the `mnist_demo` below.
```python
import os

import lightning as L
import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST

from sakura.lightning import SakuraTrainer


class MNISTModel(L.LightningModule):
    def __init__(self):
        super(MNISTModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output

    def training_step(self, batch, batch_nb):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        return loss

    def validation_step(self, batch, batch_nb):
        with torch.no_grad():
            x, y = batch
            loss = F.cross_entropy(self(x), y)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)


if __name__ == "__main__":
    PATH_DATASETS = os.environ.get("PATH_DATASETS", ".")
    BATCH_SIZE = 2000 if torch.cuda.is_available() else 64

    # Init our model
    mnist_model = MNISTModel()

    # Init DataLoader from the MNIST training set
    train_ds = MNIST(
        PATH_DATASETS, train=True, download=True, transform=transforms.ToTensor()
    )
    train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE)

    # Init DataLoader from the MNIST test set
    val_ds = MNIST(
        PATH_DATASETS, train=False, download=True, transform=transforms.ToTensor()
    )
    val_loader = DataLoader(val_ds, batch_size=BATCH_SIZE)

    trainer = SakuraTrainer(
        accelerator="auto",
        max_epochs=10,
    )

    trainer.run(
        mnist_model, train_loader, val_loader, model_path="models/best_model.pth"
    )
```
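
For reference, the only Sakura-specific pieces above are the `SakuraTrainer` import and its `run(...)` call. With the stock Lightning `Trainer`, the same model would train synchronously, blocking on validation at each epoch (standard Lightning API, reusing the objects defined above):

```python
import lightning as L

trainer = L.Trainer(accelerator="auto", max_epochs=10)
# fit() interleaves validation into the training loop and blocks on it;
# SakuraTrainer.run() runs the evaluation side asynchronously instead.
trainer.fit(mnist_model, train_loader, val_loader)
```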

# Installing the application
To clone and run this application, you'll need the following installed on your computer:
- [Git](https://git-scm.com)
- Docker Desktop
  - [Install Docker Desktop on Mac](https://docs.docker.com/docker-for-mac/install/)
  - [Install Docker Desktop on Windows](https://docs.docker.com/desktop/install/windows-install/)
  - [Install Docker Desktop on Linux](https://docs.docker.com/desktop/install/linux-install/)
- [Python](https://www.python.org/downloads/)

### Clone the code and install the binary
```bash
# Clone this repository
git clone https://github.com/zakuro-ai/sakura

# Go into the repository
cd sakura

# Load the environment variables
source .env

# Install sakura
curl https://get.zakuro.ai/sakura/install | sh
```

### Check that the binary has been installed
```bash
which sakura
```

# Running the application

```bash
sakura main.py
```
You should see output like the following, with no delay between epochs (testing runs asynchronously).
```
_____ _ __ __ _
/ ____| | | | \/ | | |
| (___ __ _ | | __ _ _ _ __ __ _ | \ / | | |
\___ \ / _` | | |/ / | | | | | '__| / _` | | |\/| | | |
____) | | (_| | | < | |_| | | | | (_| | | | | | | |____
|_____/ \__,_| |_|\_\ \__,_| |_| \__,_| |_| |_| |______|

(0) MNIST | Epoch: 1/10 | Acc: 0.0000 / (0.0000) | Loss:0.0000 / (0.0000): 100%|██████████| 18/18 [00:06<00:00, 2.69it/s]
(1) MNIST | Epoch: 2/10 | Acc: 0.0000 / (0.0000) | Loss:0.0000 / (0.0000): 100%|██████████| 18/18 [00:05<00:00, 3.36it/s]
(2) MNIST | Epoch: 3/10 | Acc: 90.4600 / (90.4600) | Loss:0.4034 / (0.4034): 100%|██████████| 18/18 [00:05<00:00, 3.42it/s]
(3) MNIST | Epoch: 4/10 | Acc: 95.3246 / (95.3246) | Loss:0.1907 / (0.1907): 100%|██████████| 18/18 [00:05<00:00, 3.43it/s]
(4) MNIST | Epoch: 5/10 | Acc: 96.9332 / (96.9332) | Loss:0.1379 / (0.1379): 100%|██████████| 18/18 [00:05<00:00, 3.38it/s]
(5) MNIST | Epoch: 6/10 | Acc: 97.3693 / (97.3693) | Loss:0.1167 / (0.1167): 100%|██████████| 18/18 [00:05<00:00, 3.42it/s]
(6) MNIST | Epoch: 7/10 | Acc: 97.7237 / (97.7237) | Loss:0.1040 / (0.1040): 100%|██████████| 18/18 [00:05<00:00, 3.41it/s]
(7) MNIST | Epoch: 8/10 | Acc: 98.0172 / (98.0172) | Loss:0.0938 / (0.0938): 100%|██████████| 18/18 [00:05<00:00, 3.31it/s]
(8) MNIST | Epoch: 9/10 | Acc: 98.2402 / (98.2402) | Loss:0.0886 / (0.0886): 100%|██████████| 18/18 [00:05<00:00, 3.41it/s]
```

For reference, the fields in each progress line are:
```
([best_epoch]) [name_exp] | Epoch: [current]/[total] | Acc: [current_test_acc] / ([best_test_acc]) | Loss:[current_test_loss] / ([best_test_loss]): 100%|███| [batch_k]/[batch_n] [[time_train]<[time_left], [it/s]]
```