{"id":13717117,"url":"https://github.com/koz4k/dni-pytorch","last_synced_at":"2025-05-07T06:31:56.635Z","repository":{"id":80025465,"uuid":"107472288","full_name":"koz4k/dni-pytorch","owner":"koz4k","description":"Decoupled Neural Interfaces using Synthetic Gradients for PyTorch","archived":false,"fork":false,"pushed_at":"2019-01-12T20:24:41.000Z","size":2004,"stargazers_count":236,"open_issues_count":4,"forks_count":38,"subscribers_count":10,"default_branch":"master","last_synced_at":"2024-11-14T05:33:52.526Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/koz4k.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-10-18T22:59:51.000Z","updated_at":"2024-08-11T00:31:26.000Z","dependencies_parsed_at":null,"dependency_job_id":"2c865fd2-5499-4f36-b9e5-c3c29590dfc9","html_url":"https://github.com/koz4k/dni-pytorch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/koz4k%2Fdni-pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/koz4k%2Fdni-pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/koz4k%2Fdni-pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/koz4k%2Fdni-pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/koz4k","download_url":"https://codeload.github.com/koz4k/dni-pytorch/tar.gz/re
fs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252826899,"owners_count":21810201,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T00:01:18.155Z","updated_at":"2025-05-07T06:31:56.628Z","avatar_url":"https://github.com/koz4k.png","language":"Python","funding_links":[],"categories":["Pytorch \u0026 related libraries","Python","Pytorch \u0026 related libraries｜Pytorch \u0026 相关库"],"sub_categories":["Other libraries:","Other libraries｜其他库:"],"readme":"Decoupled Neural Interfaces for PyTorch\n=======================================\n\nThis tiny library is an implementation of\n`Decoupled Neural Interfaces using Synthetic Gradients \u003chttps://arxiv.org/abs/1608.05343\u003e`_\nfor `PyTorch \u003chttp://pytorch.org/\u003e`_.\nIt's very simple to use, as it was designed to let researchers integrate\nDNI into existing models with a minimal amount of code.\n\nTo install, run::\n\n    $ python setup.py install\n\nA description of the library and of how to use it in some typical cases is\nprovided below. For more information, please read the code.\n\nTerminology\n-----------\n\nThis library uses a message passing abstraction introduced in the paper. Some\nterms used in the API (matching those used in the paper wherever possible):\n\n- ``Interface`` - A Decoupled Neural Interface that decouples two parts (let's\n  call them part A and part B) of the network and lets them communicate via\n  ``message`` passing. 
It may be ``Forward``, ``Backward`` or\n  ``Bidirectional``.\n- ``BackwardInterface`` - A type of ``Interface`` that the paper focuses on.\n  It can be used to prevent update locking by predicting the gradient for part A\n  of the decoupled network based on the activation of its last layer.\n- ``ForwardInterface`` - A type of ``Interface`` that can be used to prevent\n  forward locking by predicting the input for part B of the network based on\n  some information known to both parts - in the paper it's the input of the\n  whole network.\n- ``BidirectionalInterface`` - A combination of ``ForwardInterface`` and\n  ``BackwardInterface`` that can be used to achieve a complete unlock.\n- ``message`` - Information that is passed through an ``Interface``: the\n  activation of the last layer for ``ForwardInterface`` or the gradient w.r.t.\n  that activation for ``BackwardInterface``. Note that the original ``message``\n  never passes through directly. A ``message`` is consumed by one end of the\n  ``Interface`` and used to update a ``Synthesizer``, which can then be used to\n  produce a synthetic ``message`` at the other end of the ``Interface``.\n- ``trigger`` - Information from which the ``message`` is synthesized. It needs\n  to be accessible by both parts of the network. For ``BackwardInterface``, it's\n  the activation of the layer w.r.t. which the gradient is to be synthesized.\n  For ``ForwardInterface``, it can be anything - in the paper it's the input of\n  the whole network.\n- ``context`` - Additional information that is normally not shown to the\n  network during the forward pass and that can condition an ``Interface`` to\n  provide a better estimate of the ``message``. 
The paper uses labels for this purpose and calls\n  DNI with context cDNI.\n- ``send`` - A method of an ``Interface`` that takes a ``message`` and the\n  ``trigger`` from which that ``message`` should be generated, and updates the\n  ``Synthesizer`` to improve its estimate.\n- ``receive`` - A method of an ``Interface`` that takes a ``trigger`` and\n  returns a ``message`` generated by a ``Synthesizer``.\n- ``Synthesizer`` - A regression model that estimates the ``message`` based on\n  the ``trigger`` and ``context``.\n\nTypical use cases\n-----------------\n\nSynthetic Gradient for Feed-Forward Networks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn this case we want to decouple two parts A and B of a neural network to\nachieve an update unlock, so that there is a normal forward pass from part A to\nB, but part A learns using synthetic gradients generated by the DNI.\n\n.. image:: images/feedforward-update-unlock.png\n\nFollowing the paper's convention, solid black arrows are update-locked forward\nconnections, dashed black arrows are update-unlocked forward connections, green\narrows are real error gradients and blue arrows are synthetic error gradients.\nFull circles denote the synthetic gradient loss computation and the\n``Synthesizer`` update.\n\nWe can use a ``BackwardInterface`` to do that:\n\n.. code-block:: python\n\n    import torch\n    import dni\n\n    class Network(torch.nn.Module):\n\n        def __init__(self):\n            super(Network, self).__init__()\n            # ...\n\n            # 1. create a BackwardInterface, assuming that the dimensionality of\n            #    the activation for which we want to synthesize gradients is\n            #    activation_dim\n            self.backward_interface = dni.BackwardInterface(\n                dni.BasicSynthesizer(output_dim=activation_dim, n_hidden=1)\n            )\n\n            # ...\n\n        def forward(self, x):\n            # ...\n\n            # 2. 
call the BackwardInterface at the point where we want to\n            #    decouple the network\n            x = self.backward_interface(x)\n\n            # ...\n\n            return x\n\nThat's it! During the forward pass, ``BackwardInterface`` will use a\n``Synthesizer`` to generate a synthetic gradient w.r.t. the activation,\nbackpropagate it, and add to the computation graph a node that will intercept\nthe real gradient during the backward pass and use it to update the\n``Synthesizer``'s estimate.\n\nThe ``Synthesizer`` used here is ``BasicSynthesizer`` - a multi-layer\nperceptron with ReLU activations. Writing a custom ``Synthesizer`` is\ndescribed in `Writing custom Synthesizers`_.\n\nYou can specify a ``context`` by passing ``context_dim`` (the dimensionality of\nthe context vector) to the ``BasicSynthesizer`` constructor and wrapping all DNI\ncalls in the ``dni.synthesizer_context`` context manager:\n\n.. code-block:: python\n\n    class Network(torch.nn.Module):\n\n        def __init__(self):\n            super(Network, self).__init__()\n            # ...\n\n            self.backward_interface = dni.BackwardInterface(\n                dni.BasicSynthesizer(\n                    output_dim=activation_dim, n_hidden=1,\n                    context_dim=context_dim\n                )\n            )\n\n            # ...\n\n        def forward(self, x, y):\n            # ...\n\n            # assuming that the context is the labels, given in variable y\n            with dni.synthesizer_context(y):\n                x = self.backward_interface(x)\n\n            # ...\n\n            return x\n\nExample code for digit classification on MNIST is at\n`examples/mnist-mlp \u003cexamples/mnist-mlp\u003e`_.\n\nComplete Unlock for Feed-Forward Networks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn this case we want to decouple two parts A and B of a neural network to\nachieve both a forward and an update unlock, so that part B receives synthetic\ninput and part A learns using synthetic gradients generated by the DNI.\n\n.. 
image:: images/feedforward-complete-unlock.png\n\nRed arrows are synthetic inputs.\n\nWe can use a ``BidirectionalInterface`` to do that:\n\n.. code-block:: python\n\n    class Network(torch.nn.Module):\n\n        def __init__(self):\n            super(Network, self).__init__()\n            # ...\n\n            # 1. create a BidirectionalInterface, assuming that the dimensionality\n            #    of the activation for which we want to synthesize gradients is\n            #    activation_dim and the dimensionality of the input of the whole\n            #    network is input_dim\n            self.bidirectional_interface = dni.BidirectionalInterface(\n                # Synthesizer generating synthetic inputs for part B; the\n                # trigger here is the input of the network\n                dni.BasicSynthesizer(\n                    output_dim=activation_dim, n_hidden=1,\n                    trigger_dim=input_dim\n                ),\n                # Synthesizer generating synthetic gradients for part A; the\n                # trigger here is the last activation of part A (no need to\n                # specify its dimensionality)\n                dni.BasicSynthesizer(\n                    output_dim=activation_dim, n_hidden=1\n                )\n            )\n\n            # ...\n\n        def forward(self, input):\n            x = input\n\n            # ...\n\n            # 2. call the BidirectionalInterface at the point where we want to\n            #    decouple the network, passing both the last activation and\n            #    the trigger, which in this case is the input of the whole\n            #    network\n            x = self.bidirectional_interface(x, input)\n\n            # ...\n\n            return x\n\nDuring the forward pass, ``BidirectionalInterface`` will receive the real\nactivation, use it to update the input ``Synthesizer``, generate a synthetic\ngradient w.r.t. 
that activation using the gradient ``Synthesizer``,\nbackpropagate it, generate a synthetic input using the input ``Synthesizer``,\nand attach to it a computation graph node that will intercept the real gradient\nw.r.t. the synthetic input and use it to update the gradient ``Synthesizer``.\n\nExample code for digit classification on MNIST is at\n`examples/mnist-full-unlock \u003cexamples/mnist-full-unlock\u003e`_.\n\nWriting custom Synthesizers\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis library includes only ``BasicSynthesizer`` - a very simple ``Synthesizer``\nbased on a multi-layer perceptron with ReLU activations. It may not be\nsufficient in all cases; for example, for classifying MNIST digits with a CNN,\nthe paper uses a ``Synthesizer`` that is itself a CNN.\n\nYou can easily write a custom ``Synthesizer`` by subclassing\n``torch.nn.Module`` and defining a ``forward`` method that takes ``trigger``\nand ``context`` as arguments and returns a synthetic ``message``:\n\n.. code-block:: python\n\n    import torch\n\n    class CustomSynthesizer(torch.nn.Module):\n\n        def forward(self, trigger, context):\n            # compute the synthetic message from trigger (and from context,\n            # if one is provided), e.g. using a small CNN\n            message = ...  # placeholder for your model's output\n            return message\n\n``trigger`` will be a ``torch.autograd.Variable`` and ``context`` will be\nwhatever is passed to the ``dni.synthesizer_context`` context manager, or\n``None`` if ``dni.synthesizer_context`` is not used.\n\nExample code for digit classification on MNIST using a CNN is at\n`examples/mnist-cnn \u003cexamples/mnist-cnn\u003e`_.\n\nSynthetic Gradient for Recurrent Networks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn this case we want to use DNI to approximate the gradient from an\ninfinitely unrolled recurrent neural network and feed it to the last step of\nthe RNN unrolled by truncated BPTT.\n\n.. image:: images/rnn-update-unlock.png\n\nWe can use the ``make_trigger`` and ``backward`` methods of\n``BackwardInterface`` to do that:\n\n.. 
code-block:: python\n\n    import torch\n    import dni\n\n    class Network(torch.nn.Module):\n\n        def __init__(self):\n            super(Network, self).__init__()\n            # ...\n\n            # 1. create a BackwardInterface, assuming that the dimensionality of\n            #    the RNN hidden state is hidden_dim\n            self.backward_interface = dni.BackwardInterface(\n                dni.BasicSynthesizer(output_dim=hidden_dim, n_hidden=1)\n            )\n\n            # ...\n\n        def forward(self, input, hidden):\n            # ...\n\n            # 2. call make_trigger on the first state of the unrolled RNN\n            hidden = self.backward_interface.make_trigger(hidden)\n            # run the RNN\n            (output, hidden) = self.rnn(input, hidden)\n            # 3. call backward on the last state of the unrolled RNN\n            self.backward_interface.backward(hidden)\n\n            # ...\n\n            return (output, hidden)\n\n    # in the training loop:\n    with dni.defer_backward():\n        (output, hidden) = model(input, hidden)\n        loss = criterion(output, target)\n        dni.backward(loss)\n\n``BackwardInterface.make_trigger`` marks the first hidden state as a\n``trigger`` used to update the gradient estimate. During the backward pass,\nthe gradient passing through the ``trigger`` will be compared to the synthetic\ngradient generated from the same ``trigger``, and the ``Synthesizer`` will be\nupdated. ``BackwardInterface.backward`` computes a synthetic gradient based on\nthe last hidden state and backpropagates it.\n\nBecause we are passing both real and synthetic gradients through the same nodes\nin the computation graph, we need to use ``dni.defer_backward`` and\n``dni.backward``. ``dni.defer_backward`` is a context manager that accumulates\nall gradients passed to ``dni.backward`` (including those generated by\n``Interfaces``) and backpropagates them all at once at the end. 
If we don't do\nthat, PyTorch will raise an error about backpropagating through the same\ncomputation graph twice.\n\nExample code for word-level language modeling on Penn Treebank is at\n`examples/rnn \u003cexamples/rnn\u003e`_.\n\nDistributed training with a Complete Unlock\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe paper describes distributed training of complex neural architectures as one\nof the potential uses of DNI. In this case we have a network split into parts\nA and B that are trained independently, perhaps on different machines, and\ncommunicate via DNI. We can use the ``send`` and ``receive`` methods of\n``BidirectionalInterface`` to do that:\n\n.. code-block:: python\n\n    class PartA(torch.nn.Module):\n\n        def forward(self, input):\n            x = input\n\n            # ...\n\n            # send the intermediate results computed by part A via DNI\n            self.bidirectional_interface.send(x, input)\n\n    class PartB(torch.nn.Module):\n\n        def forward(self, input):\n            # receive the intermediate results computed by part A via DNI\n            x = self.bidirectional_interface.receive(input)\n\n            # ...\n\n            return x\n\n``PartA`` and ``PartB`` have their own copies of the\n``BidirectionalInterface``. ``BidirectionalInterface.send`` will compute a\nsynthetic gradient w.r.t. ``x`` (the intermediate results computed by ``PartA``)\nbased on ``x`` and ``input`` (the input of the whole network), backpropagate it,\nand update the estimate of ``x``. ``BidirectionalInterface.receive`` will\ncompute a synthetic ``x`` based on ``input`` and, in the backward pass, update\nthe estimate of the gradient w.r.t. ``x``. This should work as long as the\n``BidirectionalInterface`` parameters are synchronized between ``PartA`` and\n``PartB`` periodically.\n\nThere is no example code for this use case yet. 
Contributions welcome!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkoz4k%2Fdni-pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkoz4k%2Fdni-pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkoz4k%2Fdni-pytorch/lists"}