{"id":27913185,"url":"https://github.com/pytorch/gloo","last_synced_at":"2025-12-11T22:45:51.980Z","repository":{"id":39717941,"uuid":"80786957","full_name":"pytorch/gloo","owner":"pytorch","description":"Collective communications library with various primitives for multi-machine training.","archived":false,"fork":false,"pushed_at":"2025-12-02T06:30:58.000Z","size":1616,"stargazers_count":1374,"open_issues_count":105,"forks_count":338,"subscribers_count":61,"default_branch":"main","last_synced_at":"2025-12-04T16:11:40.388Z","etag":null,"topics":["collectives","distributed-training","pytorch"],"latest_commit_sha":null,"homepage":"https://pytorch.org/docs/stable/distributed.html","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pytorch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2017-02-03T01:37:01.000Z","updated_at":"2025-12-04T15:48:52.000Z","dependencies_parsed_at":"2025-12-03T09:03:24.142Z","dependency_job_id":null,"html_url":"https://github.com/pytorch/gloo","commit_stats":{"total_commits":481,"total_committers":95,"mean_commits":5.063157894736842,"dds":0.5550935550935551,"last_synced_commit":"dc507d1eb822c4396aaca284efff498aba33c7dc"},"previous_names":["pytorch/gloo","facebookincubator/gloo"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/pytorch/gloo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Fglo
o","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Fgloo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Fgloo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Fgloo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pytorch","download_url":"https://codeload.github.com/pytorch/gloo/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Fgloo/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":27671993,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-11T02:00:11.302Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["collectives","distributed-training","pytorch"],"created_at":"2025-05-06T13:55:15.978Z","updated_at":"2025-12-11T22:45:51.918Z","avatar_url":"https://github.com/pytorch.png","language":"C++","funding_links":[],"categories":["C++"],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"./media/gloo_100k_dark.svg\"\u003e\n    \u003cimg width=\"55%\" src=\"./media/gloo_100k_light.svg\" alt=\"Gloo\"\u003e\n  \u003c/picture\u003e\n\u003c/p\u003e\n\n\u003ch3 align=\"center\"\u003e\nCollective communications 
library with various primitives for multi-machine training.\n\u003c/h3\u003e\n\n\u003cp align=\"center\"\u003e\n  | \u003ca href=\"https://github.com/facebookincubator/gloo/tree/main/docs\"\u003e\u003cb\u003eGloo Documentation\u003c/b\u003e\u003c/a\u003e\n  | \u003ca href=\"https://pytorch.org/docs/stable/distributed.html\"\u003e\u003cb\u003ePyTorch Distributed Documentation\u003c/b\u003e\u003c/a\u003e\n  | \u003ca href=\"https://docs.google.com/presentation/d/1BX4o0ggV0-1MLwlLYcFkZHgNThyZX1-IJM3qbm0cmaA/edit?usp=sharing\"\u003e\u003cb\u003eIntroduction to Gloo Presentation\u003c/b\u003e\u003c/a\u003e\n  |\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://opensource.fb.com/support-ukraine\"\u003e\u003cimg alt=\"Support Ukraine\" src=\"https://img.shields.io/badge/Support-Ukraine-FFD500?style=flat\u0026labelColor=005BBB\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/facebookincubator/gloo/actions/workflows/build-linux.yml\"\u003e\u003cimg src=\"https://github.com/facebookincubator/gloo/actions/workflows/build-linux.yml/badge.svg\" alt=\"CI-Linux\"\u003e\u003c/a\u003e\n\n\u003c/p\u003e\n\n---\n\n\nGloo is a collective communications library. It comes with a number of\ncollective algorithms useful for machine learning applications. These\ninclude a barrier, broadcast, and allreduce.\n\nTransport of data between participating machines is abstracted so that\nIP can be used at all times, or InfiniBand (or RoCE) when available.\nIn the latter case, if the InfiniBand transport is used, [GPUDirect][gpudirect]\ncan be used to accelerate cross-machine GPU-to-GPU memory transfers.\n\n[gpudirect]: https://developer.nvidia.com/gpudirect\n\nWhere applicable, algorithms have an implementation that works with\nsystem memory buffers, and one that works with NVIDIA GPU memory\nbuffers. 
In the latter case, it is not necessary to copy memory between\nhost and device; this is taken care of by the algorithm implementations.\n\n## Requirements\n\nGloo is built to run on Linux and has no hard dependencies other than libstdc++.\nThat said, it will generally only be useful when used in combination with a few\nof the optional dependencies below.\n\nOptional dependencies are:\n* [CUDA][cuda] and [NCCL][nccl] -- for CUDA-aware algorithms, tests, and benchmark\n* [Google Test][gtest] -- to build and run tests\n* [Hiredis][hiredis] -- for coordinating machine rendezvous through Redis\n* [MPI][mpi] -- for coordinating machine rendezvous through MPI\n\n[cuda]: http://www.nvidia.com/object/cuda_home_new.html\n[nccl]: https://github.com/nvidia/nccl\n[gtest]: https://github.com/google/googletest\n[hiredis]: https://github.com/redis/hiredis\n[mpi]: https://www.open-mpi.org/\n\n## Documentation\n\nPlease refer to [docs/](docs/) for detailed documentation.\n\n## Building\n\nYou can build Gloo using CMake.\n\nSince it is a library, it is most convenient to vendor it in your own\nproject and include the project root in your own CMake configuration.\n\n### Test\n\nBuilding the tests requires Google Test version 1.8 or higher. On\nUbuntu, this version ships with release 17.10 and up. If you run an\nolder release, you'll have to install Google Test yourself and set\nthe `GTEST_ROOT` CMake variable.\n\nYou can install Google Test using conda with:\n``` shell\nconda install -c anaconda gmock gtest\n```\nBe careful: you may need to hunt for a package that works with your glibc.\n\n\nTo build the tests, run:\n\n``` shell\nmkdir -p build\ncd build\ncmake ../ -DBUILD_TEST=1 -DGTEST_ROOT=/some/path  # pass -DGTEST_ROOT only if using a custom install\nmake\nls -l gloo/test/gloo_test*\n```\n\nTo test the CUDA algorithms, specify `USE_CUDA=ON` as well; the\nCUDA tests are then built at `gloo/test/gloo_test_cuda`.\n\n### Benchmark\n\n\nFirst install the dependencies required by the benchmark tool. 
On\nUbuntu, you can do so by running:\n\n``` shell\nsudo apt-get install -y libhiredis-dev\n```\n\nThen, to build the benchmark, run:\n\n``` shell\nmkdir build\ncd build\ncmake ../ -DBUILD_BENCHMARK=1\nmake\nls -l gloo/benchmark/benchmark\n```\n\n## Benchmarking\n\nThe benchmark tool depends on Redis/Hiredis for rendezvous.\nThe benchmark tool for CUDA algorithms\nalso depends on both CUDA and NCCL.\n\nTo run a benchmark:\n\n1. Copy the benchmark tool to all participating machines.\n\n2. Start a Redis server on any host (either a client machine or one of\n   the machines participating in the test). Note that Redis Cluster is **not** supported.\n\n3. Determine some unique ID for the benchmark run (e.g. the `uuid`\n   tool or some number).\n\n4. On each machine, run (or pass `--help` for more options):\n\n    ```\n    ./benchmark \\\n      --size \u003cnumber of machines\u003e \\\n      --rank \u003cindex of this machine, starting at 0\u003e \\\n      --redis-host \u003cRedis host\u003e \\\n      --redis-port \u003cRedis port\u003e \\\n      --prefix \u003cunique identifier for this run\u003e \\\n      --transport tcp \\\n      --elements \u003cnumber of elements; -1 for a sweep\u003e \\\n      --iteration-time 1s \\\n      allreduce_ring_chunked\n    ```\n\nExample output (running on 4 machines with a 40GbE network):\n\n``` text\n   elements   min (us)   p50 (us)   p99 (us)   max (us)    samples\n          1        195        263        342        437       3921\n          2        195        261        346        462       4039\n          5        197        261        339        402       3963\n         10        197        263        338        398       3749\n         20        199        268        343        395       4146\n         50        200        265        344        401       3889\n        100        205        265        351        414       3645\n        200        197        264        328        387       3960\n        500        201        
264        329        394       4274\n       1000        200        267        330        380       3344\n       2000        205        263        323        395       3682\n       5000        240        335        424        460       3277\n      10000        271        346        402        457       2721\n      20000        283        358        392        428       2719\n      50000        342        438        495        649       1654\n     100000        413        487        669        799       1687\n     200000       1113       1450       1837       2801        669\n     500000       1099       1294       1665       1959        560\n    1000000       1858       2286       2779       6100        320\n    2000000       3546       3993       4364       4886        252\n    5000000      10030      10608      11106      11628         92\n```\n\n## License\n\nGloo is BSD-licensed.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpytorch%2Fgloo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpytorch%2Fgloo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpytorch%2Fgloo/lists"}