{"id":13935578,"url":"https://github.com/Bluefog-Lib/bluefog","last_synced_at":"2025-07-19T20:33:18.585Z","repository":{"id":40432713,"uuid":"225537951","full_name":"Bluefog-Lib/bluefog","owner":"Bluefog-Lib","description":"Distributed and decentralized training framework for PyTorch over graph","archived":false,"fork":false,"pushed_at":"2024-07-25T10:59:34.000Z","size":158921,"stargazers_count":292,"open_issues_count":32,"forks_count":71,"subscribers_count":28,"default_branch":"master","last_synced_at":"2024-11-16T03:24:49.105Z","etag":null,"topics":["asynchronous","decentralized","deeplearning","distributed-computing","machine-learning","mpi","nccl","one-sided","pytorch"],"latest_commit_sha":null,"homepage":"https://bluefog-lib.github.io/bluefog/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Bluefog-Lib.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-12-03T05:27:21.000Z","updated_at":"2024-11-13T13:00:25.000Z","dependencies_parsed_at":"2022-08-09T20:20:20.133Z","dependency_job_id":"a3f62d41-ab59-4d21-85c5-fdbec33c4ed5","html_url":"https://github.com/Bluefog-Lib/bluefog","commit_stats":{"total_commits":997,"total_committers":10,"mean_commits":99.7,"dds":"0.22066198595787367","last_synced_commit":"9ee376a7f7399f18d864f75b399dcfa6962fa4fb"},"previous_names":["ybc1991/bluefog"],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bluefog-Lib%2Fbluefog","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/Bluefog-Lib%2Fbluefog/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bluefog-Lib%2Fbluefog/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bluefog-Lib%2Fbluefog/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Bluefog-Lib","download_url":"https://codeload.github.com/Bluefog-Lib/bluefog/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226677118,"owners_count":17666009,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asynchronous","decentralized","deeplearning","distributed-computing","machine-learning","mpi","nccl","one-sided","pytorch"],"created_at":"2024-08-07T23:01:54.041Z","updated_at":"2024-11-27T03:30:51.876Z","avatar_url":"https://github.com/Bluefog-Lib.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"BlueFog\n=======\n\n.. image:: https://github.com/Bluefog-Lib/bluefog/actions/workflows/ci.yml/badge.svg\n    :target: https://github.com/Bluefog-Lib/bluefog/actions/workflows/ci.yml/badge.svg\n\n.. image:: https://img.shields.io/badge/License-Apache%202.0-blue.svg\n    :target: https://img.shields.io/badge/License-Apache%202.0-blue.svg\n    :alt: License\n\n.. image:: https://zenodo.org/badge/225537951.svg\n   :target: https://zenodo.org/badge/latestdoi/225537951\n\n.. image:: https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat\n    :target: https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat\n    \n.. 
raw:: html\n\n    \u003cp align=\"center\"\u003e\u003cimg src=\"https://user-images.githubusercontent.com/65107588/82258821-62d66b80-990f-11ea-9393-bf5456af67e6.png\" alt=\"Logo\" width=\"450\"/\u003e\u003c/p\u003e\n    \nBlueFog is a high-performance distributed training framework built with **decentralized optimization** algorithms. The goal of BlueFog is to make decentralized algorithms easy to use, fault-tolerant, friendly to heterogeneous environments, and even faster than training frameworks built on a parameter server or ring-allreduce.\n\nPerformance\n-----------\n\nThe charts below show the performance of BlueFog on the ResNet50 benchmark. Each machine has 8 V100 GPUs (64GB memory) with NVLink enabled, and the interconnect communication speed is 25Gbps. This is the same hardware setup you can get on AWS_. We test the scaling efficiency with a batch size of 64 for a computation-intensive scenario, and a batch size of 32 for a communication-intensive scenario.\n\n\n.. raw:: html\n\n    \u003cp align=\"center\"\u003e\u003cimg src=\"https://user-images.githubusercontent.com/16711681/98315290-bce5ee80-1f8c-11eb-931f-297a99d958ed.png\" alt=\"Benchmark 1\" width=\"400\"/\u003e\u003cimg src=\"https://user-images.githubusercontent.com/16711681/98315305-c2433900-1f8c-11eb-91b8-1b17f31dce68.png\" alt=\"Benchmark 2\" width=\"400\"/\u003e\u003c/p\u003e\n\n\nIn the figures, the black box represents the ideal linear scaling. BlueFog achieves over 95% scaling efficiency while Horovod reaches around 66% scaling efficiency with batch size 64 on 128 GPUs. For the communication-intensive scenario with batch size 32, the scaling efficiency gap between BlueFog and Horovod becomes even larger. 
To \nlearn more about the BlueFog benchmark, check out our `performance page \u003chttps://bluefog-lib.github.io/bluefog/performance.html\u003e`_.\n\nOverview\n--------\nBlueFog is built with decentralized optimization algorithms. This is fundamentally different from other popular distributed training frameworks, such as DistributedDataParallel provided by PyTorch, Horovod, BytePS, etc. \n\nIn each communication stage, neither the typical star-shaped parameter-server topology nor the pipelined ring-allreduce topology is used. Instead, BlueFog exploits a virtual and possibly dynamic network topology (which can take any shape) to maximize communication efficiency.\n\n\n..\n    \n    Main Idea: Replace expensive allreduce averaging over gradients by cheap neighbor averaging over parameters\n\nFor each training iteration, one process (or agent) updates its model with information received from its **direct** neighbors defined by the virtual topology. All communication occurs only over the predefined virtual topology, and no global communication is required. This is why the algorithms are called *decentralized*. \nDecentralized training algorithms have been proven in the literature to converge to the same solution as their standard centralized counterparts. \n\nThe topology determines the communication efficiency. BlueFog supports both **static** and **dynamic** topologies. After extensive trials, the dynamic Exponential-2 graph was observed to achieve the best performance\nif the number of agents is a power of 2, such as 4, 32, or 128 agents. In an Exponential-2 graph, each agent \ncommunicates with the neighbors that are  2 :sup:`0`, 2 :sup:`1`, ..., 2 :sup:`t` hops away. **Dynamic** topology means each agent selects\nonly one neighbor in one iteration and selects the next neighbor in the next iteration, as illustrated in the following figure:\n\n.. 
raw:: html\n\n    \u003cp align=\"center\"\u003e\u003cimg src=\"https://user-images.githubusercontent.com/16711681/97928035-04654400-1d1b-11eb-91d2-2da890b4522e.png\" alt=\"one-peer-exp2\" width=\"650\"/\u003e\u003c/p\u003e\n\nIn this scenario, the communication cost for each iteration is only one unit of delay and one standard parameter size to transmit, with no communication conflicts, which is better than what parameter server or ring-allreduce promises. For loss and accuracy guarantees, please check out our theoretical paper and our `slides \u003chttps://github.com/Bluefog-Lib/bluefog/blob/master/resources/Faster_Learning_over_Networks_and_BlueFog.pdf\u003e`_ presented at MLA'20. [A full tutorial will be added in the future].\n\n\nQuick Start\n-----------\n\nFirst, make sure your environment has ``python\u003e=3.7`` and ``openmpi \u003e= 4.0``.\nThen, install BlueFog with: ``pip install --no-cache-dir bluefog`` or\n``BLUEFOG_WITH_NCCL=1 pip install bluefog`` if NCCL is supported (``NCCL\u003e=2.7``). Check\nthe `install_bluefog \u003chttps://bluefog-lib.github.io/bluefog/install.html\u003e`_ page if you need more information or other install options.\n\nUsing BlueFog With Jupyter Notebook\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nBlueFog can run interactively with Jupyter Notebook. Please check out our `hello world notebook \u003chttps://github.com/Bluefog-Lib/bluefog/blob/master/examples/interactive_bluefog_helloworld.ipynb\u003e`_ or other notebooks in the example folder to get started.\nInteractive BlueFog is great for research and algorithm experiments. For large-scale machine learning problems, we recommend\nusing BlueFog with a script.\n\nUsing BlueFog With Script\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWe provide a high-level wrapper for torch optimizers. All you need to do to turn\nan existing script into a distributed implementation is wrap the optimizer\nwith our ``DistributedNeighborAllreduceOptimizer``,\nthen run it through ``bfrun``. That is it!\n\n.. 
code-block:: python\n\n   # Execute Python functions in parallel through\n   # bfrun -np 4 python file.py\n\n   import torch\n   import torch.optim as optim\n   import bluefog.torch as bf\n   ...\n   bf.init()\n   # Scale the learning rate by the number of processes\n   optimizer = optim.SGD(model.parameters(), lr=lr * bf.size())\n   optimizer = bf.DistributedNeighborAllreduceOptimizer(\n      optimizer, model=model\n   )\n   ...\n\nThe previous example uses a static topology. For the dynamic topology case, you need a little\nmore code:\n\n.. code-block:: python\n   \n  from bluefog.common import topology_util\n  ...\n  # Same setup code as previous snippets\n  dynamic_neighbors_gen = topology_util.GetInnerOuterExpo2DynamicSendRecvRanks(\n            bf.size(), local_size=bf.local_size(), self_rank=bf.rank())\n\n  def dynamic_topology_update(epoch, batch_idx):\n    # Pick the send/receive neighbors for this iteration\n    send_neighbors, recv_neighbors = next(dynamic_neighbors_gen)\n    avg_weight = 1/(len(recv_neighbors) + 1)\n    optimizer.send_neighbors = send_neighbors\n    optimizer.neighbor_weights = {r: avg_weight for r in recv_neighbors}\n    optimizer.self_weight = avg_weight\n\n  # Torch training code\n  for epoch in range(epochs):\n    for batch_idx, (data, target) in enumerate(train_loader):\n        dynamic_topology_update(epoch, batch_idx)\n        ...\n        loss.backward()\n        optimizer.step()\n\nCheck our BlueFog `dynamic topology neighbor averaging \u003chttps://bluefog-lib.github.io/bluefog/neighbor_average.html\u003e`_\npage to see more on how to control and use topology. See the BlueFog `examples`_ folder for full code.\n\n\nWe also provide many low-level functions, which you can use as building\nblocks to construct your own distributed training algorithm. The following example\nillustrates how to run a simple consensus algorithm with BlueFog.\n\n.. 
code-block:: python\n\n   import torch\n   import bluefog.torch as bf\n\n   bf.init()\n   x = torch.Tensor([bf.rank()])\n   for _ in range(100):\n      x = bf.neighbor_allreduce(x)\n   print(f\"{bf.rank()}: Average value of all ranks is {x}\")\n\nCheck out our `API explanation page \u003chttps://bluefog-lib.github.io/bluefog/bluefog_ops.html\u003e`_ to see all supported *synchronous* and *asynchronous* features.\n\nThe BlueFog source code is based on the `Horovod \u003chttps://github.com/horovod/horovod\u003e`_ repository. Hence, BlueFog shares many features with Horovod, such as `timeline \u003chttps://bluefog-lib.github.io/bluefog/timeline.html\u003e`_, tensor-fusion, etc. Here, we want to express our gratitude to the Horovod team. \n\nMaterials\n---------\n*Bluefog: Make decentralized algorithms practical for optimization and deep learning*. B. Ying, K. Yuan, H. Hu, Y. Chen, and W. Yin.  arXiv preprint arXiv:2111.04287, 2021. `[link] \u003chttps://arxiv.org/abs/2111.04287\u003e`_\n\n*Faster Learning over Networks and BlueFog*, BlueFog Team, invited talk at MLA, 2020 `[slides] \u003chttps://github.com/Bluefog-Lib/bluefog/blob/master/resources/Faster_Learning_over_Networks_and_BlueFog.pdf\u003e`_\n\n\nCite\n---------\nBlueFog is uploaded to Zenodo. An equivalent BibTeX reference for all versions is below:\n  \n.. code-block:: bibtex\n\n     % System paper\n     @article{bluefog,\n       author       = {Ying, Bicheng and Yuan, Kun and Hu, Hanbin and Chen, Yiming and Yin, Wotao},\n       title        = {BlueFog: Make Decentralized Algorithms Practical for Optimization and Deep Learning},\n       journal      
= {arXiv preprint arXiv:2111.04287},\n       year         = {2021},\n     }\n\n     % Theoretical Papers\n     @article{ying2021exponential,\n       title={Exponential Graph is Provably Efficient for Decentralized Deep Training},\n       author={Ying, Bicheng and Yuan, Kun and Chen, Yiming and Hu, Hanbin and Pan, Pan and Yin, Wotao},\n       journal={Advances in Neural Information Processing Systems (NeurIPS), 34.\n                Also available at arXiv:2110.13363},\n       year={2021}\n     }\n\n     @inproceedings{yuan2021decentlam,\n        title={DecentLaM: Decentralized Momentum SGD for Large-Batch Deep Training},\n        author={Yuan, Kun and Chen, Yiming and Huang, Xinmeng and Zhang, Yingya and Pan, Pan and Xu, Yinghui and Yin, Wotao},\n        booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},\n        pages={3029--3039},\n        year={2021}\n     }\n\n     @article{yuan2020influence,\n        title={On the influence of bias-correction on distributed stochastic optimization},\n        author={Yuan, Kun and Alghunaim, Sulaiman A and Ying, Bicheng and Sayed, Ali H},\n        journal={IEEE Transactions on Signal Processing},\n        volume={68},\n        pages={4352--4367},\n        year={2020},\n        publisher={IEEE}\n     }\n\nTroubleshooting\n---------------\nImport bluefog.torch failed\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf you see the error message below, it means that BlueFog is not installed properly. Please install BlueFog from the GitHub source and recompile it (e.g. ``make clean \u0026\u0026 make -j $(nproc) \u0026\u0026 BLUEFOG_WITH_NCCL=1 pip install .``).\n\n.. 
code-block:: python\n\n    import bluefog.torch as bf\n    Traceback (most recent call last):\n    File \"\u003cstdin\u003e\", line 1, in \u003cmodule\u003e\n    File \"/usr/local/lib/python3.7/dist-packages/bluefog/torch/__init__.py\", line 34, in \u003cmodule\u003e\n        from bluefog.torch.mpi_ops import init, shutdown\n    File \"/usr/local/lib/python3.7/dist-packages/bluefog/torch/mpi_ops.py\", line 23, in \u003cmodule\u003e\n        from bluefog.torch import mpi_lib  # C library\n    ImportError: /usr/local/lib/python3.7/dist-packages/bluefog/torch/mpi_lib.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN7bluefog6common14NCCLController9AllreduceERNS0_16TensorTableEntryE\n\n\n.. _AWS: https://aws.amazon.com/about-aws/whats-new/2018/12/introducing-amazon-ec2-p3dn-instances-our-most-powerful-gpu-instance-yet/\n.. _examples: https://github.com/Bluefog-Lib/bluefog/tree/master/examples\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBluefog-Lib%2Fbluefog","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FBluefog-Lib%2Fbluefog","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBluefog-Lib%2Fbluefog/lists"}