{"id":14964477,"url":"https://github.com/meta-llama/llama-stack","last_synced_at":"2025-05-12T05:28:13.061Z","repository":{"id":249841766,"uuid":"820145382","full_name":"meta-llama/llama-stack","owner":"meta-llama","description":"Composable building blocks to build Llama Apps","archived":false,"fork":false,"pushed_at":"2025-05-11T01:32:44.000Z","size":21780,"stargazers_count":7770,"open_issues_count":195,"forks_count":1022,"subscribers_count":131,"default_branch":"main","last_synced_at":"2025-05-12T02:43:09.112Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://llama-stack.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/meta-llama.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-06-25T22:32:26.000Z","updated_at":"2025-05-11T11:41:58.000Z","dependencies_parsed_at":"2024-08-09T23:28:28.192Z","dependency_job_id":"34bdab3f-3393-43d1-8e92-321a7c548253","html_url":"https://github.com/meta-llama/llama-stack","commit_stats":{"total_commits":730,"total_committers":65,"mean_commits":11.23076923076923,"dds":0.6356164383561644,"last_synced_commit":"c2f7905fa4f9515ce87573add6002a7cc5c4203f"},"previous_names":["meta-llama/llama-toolchain","meta-llama/llama-stack"],"tags_count":57,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meta-llama%2Fllama-stack","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meta-llama%2Fllam
a-stack/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meta-llama%2Fllama-stack/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meta-llama%2Fllama-stack/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/meta-llama","download_url":"https://codeload.github.com/meta-llama/llama-stack/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253672701,"owners_count":21945480,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-09-24T13:33:14.291Z","updated_at":"2025-05-12T05:28:13.037Z","avatar_url":"https://github.com/meta-llama.png","language":"Python","funding_links":[],"categories":["Python","A01_文本生成_文本对话","others","Repos","LLM Application / RAG","Infrastructure / Deployment of LLMs on Device"],"sub_categories":["大语言对话模型及数据","Deployment Frameworks"],"readme":"# Llama Stack\n\n[![PyPI version](https://img.shields.io/pypi/v/llama_stack.svg)](https://pypi.org/project/llama_stack/)\n[![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-stack)](https://pypi.org/project/llama-stack/)\n[![License](https://img.shields.io/pypi/l/llama_stack.svg)](https://github.com/meta-llama/llama-stack/blob/main/LICENSE)\n[![Discord](https://img.shields.io/discord/1257833999603335178?color=6A7EC2\u0026logo=discord\u0026logoColor=ffffff)](https://discord.gg/llama-stack)\n[![Unit 
Tests](https://github.com/meta-llama/llama-stack/actions/workflows/unit-tests.yml/badge.svg?branch=main)](https://github.com/meta-llama/llama-stack/actions/workflows/unit-tests.yml?query=branch%3Amain)\n[![Integration Tests](https://github.com/meta-llama/llama-stack/actions/workflows/integration-tests.yml/badge.svg?branch=main)](https://github.com/meta-llama/llama-stack/actions/workflows/integration-tests.yml?query=branch%3Amain)\n\n[**Quick Start**](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) | [**Documentation**](https://llama-stack.readthedocs.io/en/latest/index.html) | [**Colab Notebook**](./docs/getting_started.ipynb) | [**Discord**](https://discord.gg/llama-stack)\n\n### ✨🎉 Llama 4 Support  🎉✨\nWe released [Version 0.2.0](https://github.com/meta-llama/llama-stack/releases/tag/v0.2.0) with support for the Llama 4 herd of models released by Meta.\n\n\u003cdetails\u003e\n\n\u003csummary\u003e👋 Click here to see how to run Llama 4 models on Llama Stack \u003c/summary\u003e\n\n\\\n*Note you need 8xH100 GPU-host to run these models*\n\n```bash\npip install -U llama_stack\n\nMODEL=\"Llama-4-Scout-17B-16E-Instruct\"\n# get meta url from llama.com\nllama model download --source meta --model-id $MODEL --meta-url \u003cMETA_URL\u003e\n\n# start a llama stack server\nINFERENCE_MODEL=meta-llama/$MODEL llama stack build --run --template meta-reference-gpu\n\n# install client to interact with the server\npip install llama-stack-client\n```\n### CLI\n```bash\n# Run a chat completion\nllama-stack-client --endpoint http://localhost:8321 \\\ninference chat-completion \\\n--model-id meta-llama/$MODEL \\\n--message \"write a haiku for meta's llama 4 models\"\n\nChatCompletionResponse(\n    completion_message=CompletionMessage(content=\"Whispers in code born\\nLlama's gentle, wise heartbeat\\nFuture's soft unfold\", role='assistant', stop_reason='end_of_turn', tool_calls=[]),\n    logprobs=None,\n    metrics=[Metric(metric='prompt_tokens', value=21.0, 
unit=None), Metric(metric='completion_tokens', value=28.0, unit=None), Metric(metric='total_tokens', value=49.0, unit=None)]\n)\n```\n### Python SDK\n```python\nfrom llama_stack_client import LlamaStackClient\n\nclient = LlamaStackClient(base_url=\"http://localhost:8321\")\n\nmodel_id = \"meta-llama/Llama-4-Scout-17B-16E-Instruct\"\nprompt = \"Write a haiku about coding\"\n\nprint(f\"User\u003e {prompt}\")\nresponse = client.inference.chat_completion(\n    model_id=model_id,\n    messages=[\n        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n        {\"role\": \"user\", \"content\": prompt},\n    ],\n)\nprint(f\"Assistant\u003e {response.completion_message.content}\")\n```\nAs more providers start supporting Llama 4, you can use them in Llama Stack as well. We are adding to the list. Stay tuned!\n\n\n\u003c/details\u003e\n\n### 🚀 One-Line Installer 🚀\n\nTo try Llama Stack locally, run:\n\n```bash\ncurl -LsSf https://github.com/meta-llama/llama-stack/raw/main/install.sh | sh\n```\n\n### Overview\n\nLlama Stack standardizes the core building blocks that simplify AI application development. It codifies best practices across the Llama ecosystem. 
More specifically, it provides\n\n- **Unified API layer** for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry.\n- **Plugin architecture** to support the rich ecosystem of different API implementations in various environments, including local development, on-premises, cloud, and mobile.\n- **Prepackaged verified distributions** which offer a one-stop solution for developers to get started quickly and reliably in any environment.\n- **Multiple developer interfaces** like CLI and SDKs for Python, Typescript, iOS, and Android.\n- **Standalone applications** as examples for how to build production-grade AI applications with Llama Stack.\n\n\u003cdiv style=\"text-align: center;\"\u003e\n  \u003cimg\n    src=\"https://github.com/user-attachments/assets/33d9576d-95ea-468d-95e2-8fa233205a50\"\n    width=\"480\"\n    title=\"Llama Stack\"\n    alt=\"Llama Stack\"\n  /\u003e\n\u003c/div\u003e\n\n### Llama Stack Benefits\n- **Flexible Options**: Developers can choose their preferred infrastructure without changing APIs and enjoy flexible deployment choices.\n- **Consistent Experience**: With its unified APIs, Llama Stack makes it easier to build, test, and deploy AI applications with consistent application behavior.\n- **Robust Ecosystem**: Llama Stack is already integrated with distribution partners (cloud providers, hardware vendors, and AI-focused companies) that offer tailored infrastructure, software, and services for deploying Llama models.\n\nBy reducing friction and complexity, Llama Stack empowers developers to focus on what they do best: building transformative generative AI applications.\n\n### API Providers\nHere is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack.\n\n| **API Provider Builder** |    **Environments**    | **Agents** | **Inference** | **Memory** | **Safety** | **Telemetry** 
|\n|:------------------------:|:----------------------:|:----------:|:-------------:|:----------:|:----------:|:-------------:|\n|      Meta Reference      |      Single Node       |     ✅      |       ✅       |     ✅      |     ✅      |       ✅       |\n|        SambaNova         |         Hosted         |            |       ✅       |            |            |               |\n|         Cerebras         |         Hosted         |            |       ✅       |            |            |               |\n|        Fireworks         |         Hosted         |     ✅      |       ✅       |     ✅      |            |               |\n|       AWS Bedrock        |         Hosted         |            |       ✅       |            |     ✅      |               |\n|         Together         |         Hosted         |     ✅      |       ✅       |            |     ✅      |               |\n|           Groq           |         Hosted         |            |       ✅       |            |            |               |\n|          Ollama          |      Single Node       |            |       ✅       |            |            |               |\n|           TGI            | Hosted and Single Node |            |       ✅       |            |            |               |\n|        NVIDIA NIM        | Hosted and Single Node |            |       ✅       |            |            |               |\n|          Chroma          |      Single Node       |            |               |     ✅      |            |               |\n|        PG Vector         |      Single Node       |            |               |     ✅      |            |               |\n|    PyTorch ExecuTorch    |     On-device iOS      |     ✅      |       ✅       |            |            |               |\n|           vLLM           | Hosted and Single Node |            |       ✅       |            |            |               |\n|          OpenAI          |         Hosted         |            |       ✅       |            |            
|               |\n|        Anthropic         |         Hosted         |            |       ✅       |            |            |               |\n|          Gemini          |         Hosted         |            |       ✅       |            |            |               |\n|          watsonx         |         Hosted         |            |       ✅       |            |            |               |\n\n\n### Distributions\n\nA Llama Stack Distribution (or \"distro\") is a pre-configured bundle of provider implementations for each API component. Distributions make it easy to get started with a specific deployment scenario - you can begin with a local development setup (eg. ollama) and seamlessly transition to production (eg. Fireworks) without changing your application code. Here are some of the distributions we support:\n\n|               **Distribution**                |                                                                    **Llama Stack Docker**                                                                     |                                                 Start This Distribution                                                  |\n|:---------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------:|\n|                Meta Reference                 |           [llamastack/distribution-meta-reference-gpu](https://hub.docker.com/repository/docker/llamastack/distribution-meta-reference-gpu/general)           |      [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/meta-reference-gpu.html)      |\n|                   SambaNova                   |                     
[llamastack/distribution-sambanova](https://hub.docker.com/repository/docker/llamastack/distribution-sambanova/general)                     |   [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/sambanova.html)   |\n|                   Cerebras                    |                     [llamastack/distribution-cerebras](https://hub.docker.com/repository/docker/llamastack/distribution-cerebras/general)                     |   [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/cerebras.html)   |\n|                    Ollama                     |                       [llamastack/distribution-ollama](https://hub.docker.com/repository/docker/llamastack/distribution-ollama/general)                       |            [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/ollama.html)            |\n|                      TGI                      |                          [llamastack/distribution-tgi](https://hub.docker.com/repository/docker/llamastack/distribution-tgi/general)                          |             [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/tgi.html)              |\n|                   Together                    |                     [llamastack/distribution-together](https://hub.docker.com/repository/docker/llamastack/distribution-together/general)                     |           [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/together.html)           |\n|                   Fireworks                   |                    [llamastack/distribution-fireworks](https://hub.docker.com/repository/docker/llamastack/distribution-fireworks/general)                    |          [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/fireworks.html)           |\n| vLLM |                  
[llamastack/distribution-remote-vllm](https://hub.docker.com/repository/docker/llamastack/distribution-remote-vllm/general)                  |         [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/remote-vllm.html)          |\n\n\n### Documentation\n\nPlease check out our [Documentation](https://llama-stack.readthedocs.io/en/latest/index.html) page for more details.\n\n* CLI references\n    * [llama (server-side) CLI Reference](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/index.html): Guide for using the `llama` CLI to work with Llama models (download, study prompts) and to build and start a Llama Stack distribution.\n    * [llama (client-side) CLI Reference](https://llama-stack.readthedocs.io/en/latest/references/llama_stack_client_cli_reference.html): Guide for using the `llama-stack-client` CLI, which allows you to query information about the distribution.\n* Getting Started\n    * [Quick guide to start a Llama Stack server](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html).\n    * [Jupyter notebook](./docs/getting_started.ipynb) that walks through how to use the simple text and vision inference `llama_stack_client` APIs.\n    * The complete Llama Stack lesson [Colab notebook](https://colab.research.google.com/drive/1dtVmxotBsI4cGZQNsJRYPrLiDeT0Wnwt) of the new [Llama 3.2 course on Deeplearning.ai](https://learn.deeplearning.ai/courses/introducing-multimodal-llama-3-2/lesson/8/llama-stack).\n    * A [Zero-to-Hero Guide](https://github.com/meta-llama/llama-stack/tree/main/docs/zero_to_hero_guide) that guides you through all the key components of Llama Stack with code samples.\n* [Contributing](CONTRIBUTING.md)\n    * [Adding a new API Provider](https://llama-stack.readthedocs.io/en/latest/contributing/new_api_provider.html): a walkthrough of how to add a new API provider.\n\n### Llama Stack Client SDKs\n\n|  **Language** |  **Client SDK** | **Package** |\n| :----: | :----: | :----: 
|\n| Python |  [llama-stack-client-python](https://github.com/meta-llama/llama-stack-client-python) | [![PyPI version](https://img.shields.io/pypi/v/llama_stack_client.svg)](https://pypi.org/project/llama_stack_client/)\n| Swift  | [llama-stack-client-swift](https://github.com/meta-llama/llama-stack-client-swift) | [![Swift Package Index](https://img.shields.io/endpoint?url=https%3A%2F%2Fswiftpackageindex.com%2Fapi%2Fpackages%2Fmeta-llama%2Fllama-stack-client-swift%2Fbadge%3Ftype%3Dswift-versions)](https://swiftpackageindex.com/meta-llama/llama-stack-client-swift)\n| TypeScript   | [llama-stack-client-typescript](https://github.com/meta-llama/llama-stack-client-typescript) | [![NPM version](https://img.shields.io/npm/v/llama-stack-client.svg)](https://npmjs.org/package/llama-stack-client)\n| Kotlin | [llama-stack-client-kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) | [![Maven version](https://img.shields.io/maven-central/v/com.llama.llamastack/llama-stack-client-kotlin)](https://central.sonatype.com/artifact/com.llama.llamastack/llama-stack-client-kotlin)\n\nCheck out our client SDKs for connecting to a Llama Stack server in your preferred language. You can choose from [Python](https://github.com/meta-llama/llama-stack-client-python), [TypeScript](https://github.com/meta-llama/llama-stack-client-typescript), [Swift](https://github.com/meta-llama/llama-stack-client-swift), and [Kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) to quickly build your applications.\n\nYou can find more example scripts that use the client SDKs to talk to the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) 
repo.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmeta-llama%2Fllama-stack","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmeta-llama%2Fllama-stack","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmeta-llama%2Fllama-stack/lists"}