{"id":29565538,"url":"https://github.com/botkop/akkordeon","last_synced_at":"2025-07-18T21:34:41.935Z","repository":{"id":303881231,"uuid":"151669722","full_name":"botkop/akkordeon","owner":"botkop","description":"training neural networks with akka","archived":false,"fork":false,"pushed_at":"2020-01-01T16:23:30.000Z","size":245,"stargazers_count":56,"open_issues_count":0,"forks_count":5,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-07-10T08:36:19.063Z","etag":null,"topics":["actor-model","akka","deep-learning","neural-network","scala"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/botkop.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-10-05T04:18:12.000Z","updated_at":"2025-07-02T14:14:56.000Z","dependencies_parsed_at":"2025-07-10T08:36:21.329Z","dependency_job_id":"92ab1af3-06a2-4331-b1c0-8b49901a730a","html_url":"https://github.com/botkop/akkordeon","commit_stats":null,"previous_names":["botkop/akkordeon"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/botkop/akkordeon","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/botkop%2Fakkordeon","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/botkop%2Fakkordeon/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/botkop%2Fakkordeon/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/botkop%2Fakkordeon/manifests","owner_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/owners/botkop","download_url":"https://codeload.github.com/botkop/akkordeon/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/botkop%2Fakkordeon/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265837696,"owners_count":23836558,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["actor-model","akka","deep-learning","neural-network","scala"],"created_at":"2025-07-18T21:34:41.291Z","updated_at":"2025-07-18T21:34:41.901Z","avatar_url":"https://github.com/botkop.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"\"What I cannot create, I do not understand.\" - Richard Feynman.\n\n\nThis project shows how to train an artificial neural network in an actor framework. Traditional neural networks are monolithic blobs trained on static hardware infrastructure. Here I propose an approach that distributes the components of a neural net (layers, data providers...) over multiple processes that run independently (asynchronously and concurrently), possibly on different machines. It also allows training, validation and test modules to be added or removed dynamically, and thus provides the infrastructure for online learning.\n\n\n# Akkordeon: Training a neural net with Akka\n\nThe world is asynchronous. \n\nThis project shows how to train a neural net with Akka.\n\nThe mechanics are as follows:\n\nA layer is embedded in a [gate](#gate). A gate is an actor. 
\nThe results of the forward and backward pass are passed as messages from one gate to the next.\nCalculations inside a layer are performed asynchronously from other layers.\nThus, a layer does not have to wait for the backward pass in order to perform the forward pass of the next batch.\n\nEvery gate has its own optimizer.\nOptimization on a gate runs asynchronously from other gates. \nTo alleviate the 'delayed gradient' problem, I use an implementation of the ['Asynchronous Stochastic Gradient Descent with Delay Compensation'](https://arxiv.org/abs/1609.08326) optimizer.\n\nData providers are embedded in [sentinels](#sentinel) and implemented as actors. You can have multiple sentinels running at the same time, each with a subset of the training data, for example.\nThis also allows me to run the training and validation phases concurrently.\n\nAll actors can be deployed on a single machine or in a cluster of machines, leveraging both horizontal and vertical computing power.\n\n## Components\n\n![components](doc/training.png \"Logo Title Text 1\")\n\n\n### Gate\nA gate is similar to a layer. \nEvery gate is an actor. \nWhereas in a traditional network there is only one optimizer for the complete network, here every gate has its own optimizer. \nThere is however no difference in functionality, since optimizers do not share data between layers. \n\nA gate can consist of an arbitrarily complex network in itself. \nYou can put multiple convolutional, pooling, batchnorm, dropout, ... layers in one gate. \nOr you can assign them to different gates, thus distributing the work over multiple actors.\n\n### Network\nA network is a sequence of gates.\nThe sequence is open. 
\nYou can attach multiple sentinels, each with its own data provider, to the network.\n\n### Sentinel\nThe sentinel is an actor, and does several things:\n- provides data, through the data provider, for training, validation and test\n- calculates and reports loss and accuracy during training and validation\n- triggers the forward pass for each batch during training, validation and test\n- triggers the backward pass for each batch when training\n\nYou can attach multiple sentinels to a network. \nTypically, one or more sentinels are provided for training, and one for validation. \nThe latter runs every 20 seconds, for example, whereas the training sentinels run continuously.\n\n## Prepare\n\nAfter cloning or downloading the source code of this project, get the MNIST dataset by executing the script `scripts/download_mnist.sh`\nor by manually downloading the files from the URLs in the script, and putting them in a folder `data/mnist`.\n\nYou will need [sbt](https://www.scala-sbt.org/download.html) to build the project.\n\n## Build and run\n\n### Single JVM\n```\nsbt 'runMain botkop.akkordeon.SimpleAkkordeon'\n```\n\nThis will produce output similar to this:\n\n```\n[info] tdp        epoch:     1 loss:  2.939994 duration: 7105.075212ms scores: (0.22618558114035087)\n[info] tdp        epoch:     2 loss:  1.848889 duration: 2339.476822ms scores: (0.4044360040590681)\n[info] tdp        epoch:     3 loss:  1.463448 duration: 2278.748975ms scores: (0.5158070709745762)\n[info] tdp        epoch:     4 loss:  1.136699 duration: 2245.955278ms scores: (0.6229231711161338)\n[info] tdp        epoch:     5 loss:  0.968350 duration: 2309.301106ms scores: (0.6776098002821712)\n[info] tdp        epoch:     6 loss:  0.880695 duration: 2259.42184ms scores: (0.7060564301781735)\n[info] tdp        epoch:     7 loss:  0.892328 duration: 2856.552759ms scores: (0.7027704402551813)\n[info] vdp        epoch:     1 loss:  0.866831 duration: 1768.835725ms scores: (0.7107204861111112)\n```\n\n### 
Multiple JVMs\nIn this scenario, I show how to deploy the neural net on one JVM, and the sentinels on other JVMs.\nThe JVMs can be deployed on the same machine, or on different machines.\nNote that when deploying the sentinels on separate machines, you will need to make the data accessible on those machines.\n\nAnother scenario that comes to mind is to split the network itself into separate entities, and deploy those on different JVMs.\nFor now, I leave this as an exercise for the reader.\n\nObtain the IP address of the machine on which you want to run the neural net. \nIf you run all JVMs on the same machine, then you can use `127.0.0.1`.\nAppend a free port number, separated by a colon:\n```\nexport NNADDR=192.168.1.23:25520\n```\nStart the neural net in a terminal window:\n```\nsbt \"runMain botkop.akkordeon.examples.NetworkApp $NNADDR\"\n```\nObtain the IP address of a machine on which you want to run a sentinel.\nIf you run all JVMs on the same machine, then you can use `127.0.0.1`.\nIn the command below, the parameter `60000` is the number of samples from the data set you want to use. \nStart a training sentinel in another terminal:\n```\nMY_IP=192.168.0.158\nsbt \"runMain botkop.akkordeon.examples.SentinelApp $MY_IP train 60000 $NNADDR\"\n```\nAnd another one:\n```\nMY_IP=192.168.0.159\nsbt \"runMain botkop.akkordeon.examples.SentinelApp $MY_IP train 3000 $NNADDR\"\n```\nAlso start a validation sentinel. 
\n```\nMY_IP=192.168.0.160\nsbt \"runMain botkop.akkordeon.examples.SentinelApp $MY_IP validate 10000 $NNADDR\"\n```\n\n\n# References\n\n- [Asynchronous Stochastic Gradient Descent with Delay Compensation](https://arxiv.org/abs/1609.08326)\n- [An analysis of the Delayed Gradients Problem in asynchronous SGD](https://pdfs.semanticscholar.org/716b/a3d174006c19220c985acf132ffdfc6fc37b.pdf)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbotkop%2Fakkordeon","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbotkop%2Fakkordeon","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbotkop%2Fakkordeon/lists"}