{"id":25511245,"url":"https://github.com/yottalabsai/bloombee","last_synced_at":"2025-04-07T16:17:48.072Z","repository":{"id":278105538,"uuid":"921399920","full_name":"ai-decentralized/BloomBee","owner":"ai-decentralized","description":"Decentralized LLMs fine-tuning and inference with offloading","archived":false,"fork":false,"pushed_at":"2025-03-14T03:04:18.000Z","size":38405,"stargazers_count":87,"open_issues_count":0,"forks_count":13,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-04-07T13:06:48.395Z","etag":null,"topics":["deep-learning","distributed-systems","llama","machine-learning","pipeline-parallelism","pytorch","tensor-parallelism"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ai-decentralized.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-23T21:57:49.000Z","updated_at":"2025-03-21T16:12:33.000Z","dependencies_parsed_at":"2025-02-22T08:31:26.295Z","dependency_job_id":null,"html_url":"https://github.com/ai-decentralized/BloomBee","commit_stats":null,"previous_names":["yottalabsai/bloombee","ai-decentralized/bloombee"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ai-decentralized%2FBloomBee","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ai-decentralized%2FBloomBee/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ai-decentralized%2FBloomBee/releases","manifests_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/ai-decentralized%2FBloomBee/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ai-decentralized","download_url":"https://codeload.github.com/ai-decentralized/BloomBee/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247685628,"owners_count":20979085,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","distributed-systems","llama","machine-learning","pipeline-parallelism","pytorch","tensor-parallelism"],"created_at":"2025-02-19T10:30:16.310Z","updated_at":"2025-04-07T16:17:48.053Z","avatar_url":"https://github.com/ai-decentralized.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e  \n    \u003cimg src=\"figures/bloombee.jpg\" alt=\"Bloombee Logo\" /\u003e\u003cbr\u003e  \n    Run large language models in a heterogeneous decentralized environment with offloading.\u003cbr\u003e\n    \u003cbr\u003e\n    \u003ca href=\"https://pypi.org/project/bloombee/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/bloombee.svg?label=PyPI\u0026color=green\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/ai-decentralized/bloombee/actions\"\u003e\u003cimg src=\"https://img.shields.io/github/actions/workflow/status/ai-decentralized/bloombee/pylint.yml?branch=main\u0026label=Build\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://discord.gg/Ypexx2rxt9\"\u003e\u003cimg 
src=\"https://img.shields.io/discord/1267714065166241813?label=Discord\u0026logo=discord\u0026logoColor=white\"\u003e\u003c/a\u003e\n\u003c/p\u003e  \n\nThe rapid rise of generative AI has boosted demand for large language model (LLM) inference and fine-tuning services. While proprietary models are still favored, advancements in open-source LLMs have made them competitive. However, high costs and limited GPU resources hinder deployment. This work introduces BloomBee, a decentralized offline serving system that leverages idle GPU resources to provide cost-effective access to LLMs.\n\nWe rely on global GPU sharing, which includes more consumer-grade GPUs. If your GPU can only manage a small portion of a large language model, like the Llama3.1 (405B) model, you can connect to a network of servers that load different parts of the model. In this network, you can request inference or fine-tuning services.\n\n\u003cp align=\"center\"\u003e\n    🚀 \u0026nbsp;\u003cb\u003e\u003ca href=\"https://colab.research.google.com/drive/1BZn0KrEGaNA2dlzmCTtTIjJKx3bNzOMs#scrollTo=1Qhi4I2PSGgg\"\u003eTry now in Colab\u003c/a\u003e\u003c/b\u003e\n\u003c/p\u003e\n\n## Installation\n\n#### From PyPI\n```\npip install bloombee\n```\n#### From Source\n```bash  \ngit clone https://github.com/ai-decentralized/BloomBee.git  \ncd BloomBee  \npip install .\n```\n## How to use BloomBee (\u003ca href=\"https://colab.research.google.com/drive/1pENMOEoEV01DqBImZzuX_4jTV3fNwNga#scrollTo=oyCFDemCZsRs\"\u003eTry now in Colab\u003c/a\u003e)\n#### 1. Start the main server \n```\npython -m bloombee.cli.run_dht --host_maddrs /ip4/0.0.0.0/tcp/31340 --identity_path bootstrapp1.id \n\n```\nThe command prints the address of BloomBee's main server: \n```\nMon 00 01:23:45.678 [INFO] Running a DHT instance. 
To connect other peers to this one, use --initial_peers /ip4/YOUR_IP_ADDRESS/tcp/31340/p2p/QmefxzDL1DaJ7TcrZjLuz7Xs9sUVKpufyg7f5276ZHFjbQ\n```  \nYou can provide this address as --initial_peers to workers or other backbone servers.\n\nIf you want your swarm to be accessible outside of your local network, ensure that you have a **public IP address** or set up **port forwarding** correctly, so that your peer is reachable from the outside.\n\n#### 2. Connect the workers to the main BloomBee server  \nSet the BloomBee server address, using the address printed in step 1:\n```\nexport BBSERVER=/ip4/10.52.2.249/tcp/31340/p2p/QmefxzDL1DaJ7TcrZjLuz7Xs9sUVKpufyg7f5276ZHFjbQ  \n\n```\nStart one worker to hold 16 blocks (16 transformer layers):\n```\npython -m bloombee.cli.run_server huggyllama/llama-7b --initial_peers $BBSERVER --num_blocks 16  --identity_path bootstrap_1.id\n```\nStart a second worker to hold another 16 blocks (16 transformer layers):\n```\npython -m bloombee.cli.run_server huggyllama/llama-7b --initial_peers $BBSERVER --num_blocks 16  --identity_path bootstrap_1.id\n```\n\n#### 3. Run inference or fine-tuning jobs\n\n#### Inference   \n```\ncd BloomBee/\npython benchmarks/benchmark_inference.py --model huggyllama/llama-7b  --initial_peers $BBSERVER --torch_dtype float32 --seq_len 128\n```\n\n#### Finetune \n\n```\ncd BloomBee/\npython benchmarks/benchmark_training.py --model huggyllama/llama-7b  --initial_peers $BBSERVER --torch_dtype float32  --n_steps 20 --batch_size 32 --seq_len 128\n```\n\n\n## Acknowledgements  \n\nBloomBee is built upon a few popular libraries: \n\n  - [Hivemind](https://github.com/learning-at-home/hivemind) - A PyTorch library for decentralized deep learning across the Internet.  \n  - [FlexLLMGen](https://github.com/FMInference/FlexLLMGen) - An offloading-based system running on weak GPUs.  
\n  - [Petals](https://github.com/bigscience-workshop/petals) - A library for decentralized LLMs fine-tuning and inference without offloading.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyottalabsai%2Fbloombee","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyottalabsai%2Fbloombee","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyottalabsai%2Fbloombee/lists"}