{"id":13451205,"url":"https://github.com/bigscience-workshop/petals","last_synced_at":"2025-05-13T16:08:50.109Z","repository":{"id":45737284,"uuid":"502482803","full_name":"bigscience-workshop/petals","owner":"bigscience-workshop","description":"🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading","archived":false,"fork":false,"pushed_at":"2024-09-07T11:54:28.000Z","size":4256,"stargazers_count":9570,"open_issues_count":109,"forks_count":552,"subscribers_count":99,"default_branch":"main","last_synced_at":"2025-04-15T08:44:47.702Z","etag":null,"topics":["bloom","chatbot","deep-learning","distributed-systems","falcon","gpt","guanaco","language-models","large-language-models","llama","machine-learning","mixtral","neural-networks","nlp","pipeline-parallelism","pretrained-models","pytorch","tensor-parallelism","transformer","volunteer-computing"],"latest_commit_sha":null,"homepage":"https://petals.dev","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bigscience-workshop.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-06-12T00:10:27.000Z","updated_at":"2025-04-13T23:20:20.000Z","dependencies_parsed_at":"2022-08-12T12:10:21.546Z","dependency_job_id":"25874513-70b7-46bc-9190-3b9119eb1deb","html_url":"https://github.com/bigscience-workshop/petals","commit_stats":{"total_commits":513,"total_committers":21,"mean_commits":"24.428571428571427","dds":0.5730994152046784,"last_synced_commit":"22afba627a7eb4fcfe9418c49472c6a51334b8ac"},"previous_names":[],"tags_count":15,"template
":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigscience-workshop%2Fpetals","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigscience-workshop%2Fpetals/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigscience-workshop%2Fpetals/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigscience-workshop%2Fpetals/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bigscience-workshop","download_url":"https://codeload.github.com/bigscience-workshop/petals/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250514756,"owners_count":21443208,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bloom","chatbot","deep-learning","distributed-systems","falcon","gpt","guanaco","language-models","large-language-models","llama","machine-learning","mixtral","neural-networks","nlp","pipeline-parallelism","pretrained-models","pytorch","tensor-parallelism","transformer","volunteer-computing"],"created_at":"2024-07-31T07:00:49.772Z","updated_at":"2025-04-23T20:51:35.757Z","avatar_url":"https://github.com/bigscience-workshop.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://i.imgur.com/7eR7Pan.png\" width=\"400\"\u003e\u003cbr\u003e\n    Run large language models at home, BitTorrent-style.\u003cbr\u003e\n    Fine-tuning and inference \u003ca href=\"https://github.com/bigscience-workshop/petals#benchmarks\"\u003eup to 
10x faster\u003c/a\u003e than offloading\n    \u003cbr\u003e\u003cbr\u003e\n    \u003ca href=\"https://pypi.org/project/petals/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/petals.svg?color=green\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://discord.gg/tfHfe8B34k\"\u003e\u003cimg src=\"https://img.shields.io/discord/865254854262652969?label=discord\u0026logo=discord\u0026logoColor=white\"\u003e\u003c/a\u003e\n    \u003cbr\u003e\n\u003c/p\u003e\n\nGenerate text with distributed **Llama 3.1** (up to 405B), **Mixtral** (8x22B), **Falcon** (40B+) or **BLOOM** (176B) and fine‑tune them for your own tasks \u0026mdash; right from your desktop computer or Google Colab:\n\n```python\nfrom transformers import AutoTokenizer\nfrom petals import AutoDistributedModelForCausalLM\n\n# Choose any model available at https://health.petals.dev\nmodel_name = \"meta-llama/Meta-Llama-3.1-405B-Instruct\"\n\n# Connect to a distributed network hosting model layers\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoDistributedModelForCausalLM.from_pretrained(model_name)\n\n# Run the model as if it were on your computer\ninputs = tokenizer(\"A cat sat\", return_tensors=\"pt\")[\"input_ids\"]\noutputs = model.generate(inputs, max_new_tokens=5)\nprint(tokenizer.decode(outputs[0]))  # A cat sat on a mat...\n```\n\n\u003cp align=\"center\"\u003e\n    🚀 \u0026nbsp;\u003cb\u003e\u003ca href=\"https://colab.research.google.com/drive/1uCphNY7gfAUkdDrTx21dZZwCOUDCMPw8?usp=sharing\"\u003eTry now in Colab\u003c/a\u003e\u003c/b\u003e\n\u003c/p\u003e\n\n🦙 **Want to run Llama?** [Request access](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) to its weights, then run `huggingface-cli login` in the terminal before loading the model. Or just try it in our [chatbot app](https://chat.petals.dev).\n\n🔏 **Privacy.** Your data will be processed with the help of other people in the public swarm. 
Learn more about privacy [here](https://github.com/bigscience-workshop/petals/wiki/Security,-privacy,-and-AI-safety). For sensitive data, you can set up a [private swarm](https://github.com/bigscience-workshop/petals/wiki/Launch-your-own-swarm) among people you trust.\n\n💬 **Any questions?** Ping us in [our Discord](https://discord.gg/KdThf2bWVU)!\n\n## Connect your GPU and increase Petals capacity\n\nPetals is a community-run system \u0026mdash; we rely on people sharing their GPUs. You can help serve one of the [available models](https://health.petals.dev) or host a new model from 🤗 [Model Hub](https://huggingface.co/models)!\n\nAs an example, here is how to host a part of [Llama 3.1 (405B) Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) on your GPU:\n\n🦙 **Want to host Llama?** [Request access](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) to its weights, then run `huggingface-cli login` in the terminal before loading the model.\n\n🐧 **Linux + Anaconda.** Run these commands for NVIDIA GPUs (or follow [this](https://github.com/bigscience-workshop/petals/wiki/Running-on-AMD-GPU) for AMD):\n\n```bash\nconda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia\npip install git+https://github.com/bigscience-workshop/petals\npython -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct\n```\n\n🪟 **Windows + WSL.** Follow [this guide](https://github.com/bigscience-workshop/petals/wiki/Run-Petals-server-on-Windows) on our Wiki.\n\n🐋 **Docker.** Run our [Docker](https://www.docker.com) image for NVIDIA GPUs (or follow [this](https://github.com/bigscience-workshop/petals/wiki/Running-on-AMD-GPU) for AMD):\n\n```bash\nsudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:/cache --rm \\\n    learningathome/petals:main \\\n    python -m petals.cli.run_server --port 31330 meta-llama/Meta-Llama-3.1-405B-Instruct\n```\n\n🍏 **macOS + Apple M1/M2 GPU.** Install [Homebrew](https://brew.sh/), then run 
these commands:\n\n```bash\nbrew install python\npython3 -m pip install git+https://github.com/bigscience-workshop/petals\npython3 -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct\n```\n\n\u003cp align=\"center\"\u003e\n    📚 \u0026nbsp;\u003cb\u003e\u003ca href=\"https://github.com/bigscience-workshop/petals/wiki/FAQ:-Frequently-asked-questions#running-a-server\"\u003eLearn more\u003c/a\u003e\u003c/b\u003e (how to use multiple GPUs, start the server on boot, etc.)\n\u003c/p\u003e\n\n🔒 **Security.** Hosting a server does not allow others to run custom code on your computer. Learn more [here](https://github.com/bigscience-workshop/petals/wiki/Security,-privacy,-and-AI-safety).\n\n💬 **Any questions?** Ping us in [our Discord](https://discord.gg/X7DgtxgMhc)!\n\n🏆 **Thank you!** Once you load and host 10+ blocks, we can show your name or link on the [swarm monitor](https://health.petals.dev) as a way to say thanks. You can specify them with `--public_name YOUR_NAME`.\n\n## How does it work?\n\n- You load a small part of the model, then join a [network](https://health.petals.dev) of people serving the other parts. Single‑batch inference runs at up to **6 tokens/sec** for **Llama 2** (70B) and up to **4 tokens/sec** for **Falcon** (180B) — enough for [chatbots](https://chat.petals.dev) and interactive apps.\n- You can employ any fine-tuning and sampling methods, execute custom paths through the model, or see its hidden states. 
You get the comforts of an API with the flexibility of **PyTorch** and **🤗 Transformers**.\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://i.imgur.com/RTYF3yW.png\" width=\"800\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n    📜 \u0026nbsp;\u003cb\u003e\u003ca href=\"https://arxiv.org/pdf/2209.01188.pdf\"\u003eRead paper\u003c/a\u003e\u003c/b\u003e\n    \u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\n    📚 \u0026nbsp;\u003cb\u003e\u003ca href=\"https://github.com/bigscience-workshop/petals/wiki/FAQ:-Frequently-asked-questions\"\u003eSee FAQ\u003c/a\u003e\u003c/b\u003e\n\u003c/p\u003e\n\n## 📚 Tutorials, examples, and more\n\nBasic tutorials:\n\n- Getting started: [tutorial](https://colab.research.google.com/drive/1uCphNY7gfAUkdDrTx21dZZwCOUDCMPw8?usp=sharing)\n- Prompt-tune Llama-65B for text semantic classification: [tutorial](https://colab.research.google.com/github/bigscience-workshop/petals/blob/main/examples/prompt-tuning-sst2.ipynb)\n- Prompt-tune BLOOM to create a personified chatbot: [tutorial](https://colab.research.google.com/github/bigscience-workshop/petals/blob/main/examples/prompt-tuning-personachat.ipynb)\n\nUseful tools:\n\n- [Chatbot web app](https://chat.petals.dev) (connects to Petals via an HTTP/WebSocket endpoint): [source code](https://github.com/petals-infra/chat.petals.dev)\n- [Monitor](https://health.petals.dev) for the public swarm: [source code](https://github.com/petals-infra/health.petals.dev)\n\nAdvanced guides:\n\n- Launch a private swarm: [guide](https://github.com/bigscience-workshop/petals/wiki/Launch-your-own-swarm)\n- Run a custom model: [guide](https://github.com/bigscience-workshop/petals/wiki/Run-a-custom-model-with-Petals)\n\n### Benchmarks\n\nPlease see **Section 3.3** of our [paper](https://arxiv.org/pdf/2209.01188.pdf).\n\n### 🛠️ Contributing\n\nPlease see our 
[FAQ](https://github.com/bigscience-workshop/petals/wiki/FAQ:-Frequently-asked-questions#contributing) on contributing.\n\n### 📜 Citations\n\nAlexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, and Colin Raffel.\n[Petals: Collaborative Inference and Fine-tuning of Large Models.](https://arxiv.org/abs/2209.01188)\n_Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)._ 2023.\n\n```bibtex\n@inproceedings{borzunov2023petals,\n  title = {Petals: Collaborative Inference and Fine-tuning of Large Models},\n  author = {Borzunov, Alexander and Baranchuk, Dmitry and Dettmers, Tim and Riabinin, Maksim and Belkada, Younes and Chumachenko, Artem and Samygin, Pavel and Raffel, Colin},\n  booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},\n  pages = {558--568},\n  year = {2023},\n  url = {https://arxiv.org/abs/2209.01188}\n}\n```\n\nAlexander Borzunov, Max Ryabinin, Artem Chumachenko, Dmitry Baranchuk, Tim Dettmers, Younes Belkada, Pavel Samygin, and Colin Raffel.\n[Distributed inference and fine-tuning of large language models over the Internet.](https://arxiv.org/abs/2312.08361)\n_Advances in Neural Information Processing Systems_ 36 (2023).\n\n```bibtex\n@inproceedings{borzunov2023distributed,\n  title = {Distributed inference and fine-tuning of large language models over the {I}nternet},\n  author = {Borzunov, Alexander and Ryabinin, Max and Chumachenko, Artem and Baranchuk, Dmitry and Dettmers, Tim and Belkada, Younes and Samygin, Pavel and Raffel, Colin},\n  booktitle = {Advances in Neural Information Processing Systems},\n  volume = {36},\n  pages = {12312--12331},\n  year = {2023},\n  url = {https://arxiv.org/abs/2312.08361}\n}\n```\n\n--------------------------------------------------------------------------------\n\n\u003cp align=\"center\"\u003e\n    
This project is a part of the \u003ca href=\"https://bigscience.huggingface.co/\"\u003eBigScience\u003c/a\u003e research workshop.\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://petals.dev/bigscience.png\" width=\"150\"\u003e\n\u003c/p\u003e\n","funding_links":[],"categories":["Python","A01_Text Generation_Text Dialogue","Other","Repos","[:robot: machine-learning](\u003chttps://github.com/stars/ketsapiwiq/lists/robot-machine-learning\u003e)","chatbot","NLP","Inference UI","4. Fine-Tuning","GitHub projects","Inference"],"sub_categories":["Large Language Dialogue Models and Data","Music","Frameworks","Inference Engine"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbigscience-workshop%2Fpetals","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbigscience-workshop%2Fpetals","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbigscience-workshop%2Fpetals/lists"}