{"id":13561087,"url":"https://github.com/pytorch/serve","last_synced_at":"2025-05-13T15:03:10.530Z","repository":{"id":37018537,"uuid":"212488700","full_name":"pytorch/serve","owner":"pytorch","description":"Serve, optimize and scale PyTorch models in production","archived":false,"fork":false,"pushed_at":"2025-05-01T14:28:01.000Z","size":126140,"stargazers_count":4316,"open_issues_count":443,"forks_count":882,"subscribers_count":54,"default_branch":"master","last_synced_at":"2025-05-05T22:36:55.856Z","etag":null,"topics":["cpu","deep-learning","docker","gpu","kubernetes","machine-learning","metrics","mlops","optimization","pytorch","serving"],"latest_commit_sha":null,"homepage":"https://pytorch.org/serve/","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pytorch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-10-03T03:17:43.000Z","updated_at":"2025-05-05T16:00:57.000Z","dependencies_parsed_at":"2024-05-20T22:30:47.936Z","dependency_job_id":"38a78790-e719-4545-9330-d2e0f565ca75","html_url":"https://github.com/pytorch/serve","commit_stats":{"total_commits":2861,"total_committers":224,"mean_commits":"12.772321428571429","dds":0.835721775602936,"last_synced_commit":"d6ea6e7f27b1c127cd5acc261c4e6b56ddfa5d80"},"previous_names":[],"tags_count":27,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Fserve","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%
2Fserve/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Fserve/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Fserve/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pytorch","download_url":"https://codeload.github.com/pytorch/serve/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253968259,"owners_count":21992253,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpu","deep-learning","docker","gpu","kubernetes","machine-learning","metrics","mlops","optimization","pytorch","serving"],"created_at":"2024-08-01T13:00:52.386Z","updated_at":"2025-05-13T15:03:05.513Z","avatar_url":"https://github.com/pytorch.png","language":"Java","readme":"\u003cfont size=\"5\"\u003e ⚠️ Notice: Limited Maintenance \u003c/font\u003e\n\nThis project is no longer actively maintained. While existing releases remain available, there are no planned updates, bug fixes, new features, or security patches. Users should be aware that vulnerabilities may not be addressed.\n\n# ❗ANNOUNCEMENT: Security Changes❗\nTorchServe now enforces token authorization enabled and model API control disabled by default. These security features are intended to address the concern of unauthorized API calls and to prevent potential malicious code from being introduced to the model server. 
Refer to the following documentation for more information: [Token Authorization](https://github.com/pytorch/serve/blob/master/docs/token_authorization_api.md), [Model API control](https://github.com/pytorch/serve/blob/master/docs/model_api_control.md)\n\n# TorchServe\n\n\n![Nightly build](https://github.com/pytorch/serve/actions/workflows/torchserve-nightly-build.yml/badge.svg)\n![Docker Nightly build](https://github.com/pytorch/serve/actions/workflows/docker-nightly-build.yml/badge.svg)\n![Benchmark Nightly](https://github.com/pytorch/serve/actions/workflows/benchmark_nightly.yml/badge.svg)\n![Docker Regression Nightly](https://github.com/pytorch/serve/actions/workflows/regression_tests_docker.yml/badge.svg)\n![KServe Regression Nightly](https://github.com/pytorch/serve/actions/workflows/kserve_cpu_tests.yml/badge.svg)\n![Kubernetes Regression Nightly](https://github.com/pytorch/serve/actions/workflows/kubernetes_tests.yml/badge.svg)\n\nTorchServe is a flexible and easy-to-use tool for serving and scaling PyTorch models in production.\n\nRequires python \u003e= 3.8\n\n```bash\ncurl http://127.0.0.1:8080/predictions/bert -T input.txt\n```\n### 🚀 Quick start with TorchServe\n\n```bash\n# Install dependencies\npython ./ts_scripts/install_dependencies.py\n\n# Include dependencies for accelerator support with the relevant optional flags\npython ./ts_scripts/install_dependencies.py --rocm=rocm61\npython ./ts_scripts/install_dependencies.py --cuda=cu121\n\n# Latest release\npip install torchserve torch-model-archiver torch-workflow-archiver\n\n# Nightly build\npip install torchserve-nightly torch-model-archiver-nightly torch-workflow-archiver-nightly\n```\n\n### 🚀 Quick start with TorchServe (conda)\n\n```bash\n# Install dependencies\npython ./ts_scripts/install_dependencies.py\n\n# Include dependencies for accelerator support with the relevant optional flags\npython ./ts_scripts/install_dependencies.py --rocm=rocm61\npython ./ts_scripts/install_dependencies.py 
--cuda=cu121\n\n# Latest release\nconda install -c pytorch torchserve torch-model-archiver torch-workflow-archiver\n\n# Nightly build\nconda install -c pytorch-nightly torchserve torch-model-archiver torch-workflow-archiver\n```\n\n[Getting started guide](docs/getting_started.md)\n\n### 🐳 Quick Start with Docker\n\n```bash\n# Latest release\ndocker pull pytorch/torchserve\n\n# Nightly build\ndocker pull pytorch/torchserve-nightly\n```\n\nRefer to [torchserve docker](docker/README.md) for details.\n\n### 🤖 Quick Start LLM Deployment\n\n#### VLLM Engine\n```bash\n# Make sure to install torchserve with pip or conda as described above and login with `huggingface-cli login`\npython -m ts.llm_launcher --model_id meta-llama/Llama-3.2-3B-Instruct --disable_token_auth\n\n# Try it out\ncurl -X POST -d '{\"model\":\"meta-llama/Llama-3.2-3B-Instruct\", \"prompt\":\"Hello, my name is\", \"max_tokens\": 200}' --header \"Content-Type: application/json\" \"http://localhost:8080/predictions/model/1.0/v1/completions\"\n```\n\n#### TRT-LLM Engine\n```bash\n# Make sure to install torchserve with python venv as described above and login with `huggingface-cli login`\n# pip install -U --use-deprecated=legacy-resolver -r requirements/trt_llm.txt\npython -m ts.llm_launcher --model_id meta-llama/Meta-Llama-3.1-8B-Instruct --engine trt_llm --disable_token_auth\n\n# Try it out\ncurl -X POST -d '{\"prompt\":\"count from 1 to 9 in french \", \"max_tokens\": 100}' --header \"Content-Type: application/json\" \"http://localhost:8080/predictions/model\"\n```\n\n### 🚢 Quick Start LLM Deployment with Docker\n\n```bash\n#export token=\u003cHUGGINGFACE_HUB_TOKEN\u003e\ndocker build --pull . 
-f docker/Dockerfile.vllm -t ts/vllm\n\ndocker run --rm -ti --shm-size 10g --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/vllm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token_auth\n\n# Try it out\ncurl -X POST -d '{\"model\":\"meta-llama/Meta-Llama-3-8B-Instruct\", \"prompt\":\"Hello, my name is\", \"max_tokens\": 200}' --header \"Content-Type: application/json\" \"http://localhost:8080/predictions/model/1.0/v1/completions\"\n```\n\nRefer to [LLM deployment](docs/llm_deployment.md) for details and other methods.\n\n## ⚡ Why TorchServe\n* Write once, run anywhere, on-prem, on-cloud, supports inference on CPUs, GPUs, AWS Inf1/Inf2/Trn1, Google Cloud TPUs, [Nvidia MPS](docs/nvidia_mps.md)\n* [Model Management API](docs/management_api.md): multi model management with optimized worker to model allocation\n* [Inference API](docs/inference_api.md): REST and gRPC support for batched inference\n* [TorchServe Workflows](examples/Workflows/README.md): deploy complex DAGs with multiple interdependent models\n* Default way to serve PyTorch models in\n  * [Sagemaker](https://aws.amazon.com/blogs/machine-learning/serving-pytorch-models-in-production-with-the-amazon-sagemaker-native-torchserve-integration/)\n  * [Vertex AI](https://cloud.google.com/blog/topics/developers-practitioners/pytorch-google-cloud-how-deploy-pytorch-models-vertex-ai)\n  * [Kubernetes](kubernetes) with support for [autoscaling](kubernetes#session-affinity-with-multiple-torchserve-pods), session-affinity, monitoring using Grafana works on-prem, AWS EKS, Google GKE, Azure AKS\n  * [Kserve](https://kserve.github.io/website/0.8/modelserving/v1beta1/torchserve/): Supports both v1 and v2 API, [autoscaling and canary deployments](kubernetes/kserve/README.md#autoscaling) for A/B testing\n  * [Kubeflow](https://v0-5.kubeflow.org/docs/components/pytorchserving/)\n  * [MLflow](https://github.com/mlflow/mlflow-torchserve)\n* Export your model for optimized inference. 
Torchscript out of the box, [PyTorch Compiler](examples/pt2/README.md) preview, [ORT and ONNX](https://github.com/pytorch/serve/blob/master/docs/performance_guide.md), [IPEX](https://github.com/pytorch/serve/tree/master/examples/intel_extension_for_pytorch), [TensorRT](https://github.com/pytorch/serve/blob/master/docs/performance_guide.md), [FasterTransformer](https://github.com/pytorch/serve/tree/master/examples/FasterTransformer_HuggingFace_Bert), FlashAttention (Better Transformers)\n* [Performance Guide](docs/performance_guide.md): builtin support to optimize, benchmark, and profile PyTorch and TorchServe performance\n* [Expressive handlers](CONTRIBUTING.md): An expressive handler architecture that makes it trivial to support inferencing for your use case with [many supported out of the box](https://github.com/pytorch/serve/tree/master/ts/torch_handler)\n* [Metrics API](docs/metrics.md): out-of-the-box support for system-level metrics with [Prometheus exports](https://github.com/pytorch/serve/tree/master/examples/custom_metrics), custom metrics,\n* [Large Model Inference Guide](docs/large_model_inference.md): With support for GenAI, LLMs including\n  * [SOTA GenAI performance](https://github.com/pytorch/serve/tree/master/examples/pt2#torchcompile-genai-examples) using `torch.compile`\n  * Fast Kernels with FlashAttention v2, continuous batching and streaming response\n  * PyTorch [Tensor Parallel](examples/large_models/tp_llama) preview, [Pipeline Parallel](examples/large_models/Huggingface_pippy)\n  * Microsoft [DeepSpeed](examples/large_models/deepspeed), [DeepSpeed-Mii](examples/large_models/deepspeed_mii)\n  * Hugging Face [Accelerate](examples/large_models/Huggingface_accelerate), [Diffusers](examples/diffusers)\n  * Running large models on AWS [Sagemaker](https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-tutorials-torchserve.html) and [Inferentia2](https://pytorch.org/blog/high-performance-llama/)\n  * Running [Meta Llama Chatbot 
locally on Mac](examples/LLM/llama)\n* Monitoring using Grafana and [Datadog](https://www.datadoghq.com/blog/ai-integrations/#model-serving-and-deployment-vertex-ai-amazon-sagemaker-torchserve)\n\n\n## 🤔 How does TorchServe work\n* [Model Server for PyTorch Documentation](docs/README.md): Full documentation\n* [TorchServe internals](docs/internals.md): How TorchServe was built\n* [Contributing guide](CONTRIBUTING.md): How to contribute to TorchServe\n\n\n## 🏆 Highlighted Examples\n* [Serving Meta Llama with TorchServe](examples/LLM/llama/README.md)\n* [Chatbot with Meta Llama on Mac 🦙💬](examples/LLM/llama/chat_app)\n* [🤗 HuggingFace Transformers](examples/Huggingface_Transformers) with a [Better Transformer Integration/ Flash Attention \u0026 Xformer Memory Efficient ](examples/Huggingface_Transformers#Speed-up-inference-with-Better-Transformer)\n* [Stable Diffusion](examples/diffusers)\n* [Model parallel inference](examples/Huggingface_Transformers#model-parallelism)\n* [MultiModal models with MMF](https://github.com/pytorch/serve/tree/master/examples/MMF-activity-recognition) combining text, audio and video\n* [Dual Neural Machine Translation](examples/Workflows/nmt_transformers_pipeline) for a complex workflow DAG\n* [TorchServe Integrations](examples/README.md#torchserve-integrations)\n* [TorchServe Internals](examples/README.md#torchserve-internals)\n* [TorchServe UseCases](examples/README.md#usecases)\n\nFor [more examples](examples/README.md)\n\n## 🛡️ TorchServe Security Policy\n[SECURITY.md](SECURITY.md)\n\n## 🤓 Learn More\nhttps://pytorch.org/serve\n\n\n## 🫂 Contributing\n\nWe welcome all contributions!\n\nTo learn more about how to contribute, see the contributor guide [here](https://github.com/pytorch/serve/blob/master/CONTRIBUTING.md).\n\n## 📰 News\n* [High performance Llama 2 deployments with AWS Inferentia2 using TorchServe](https://pytorch.org/blog/high-performance-llama/)\n* [Naver Case Study: Transition From High-Cost GPUs to Intel CPUs and oneAPI 
powered Software with performance](https://pytorch.org/blog/ml-model-server-resource-saving/)\n* [Run multiple generative AI models on GPU using Amazon SageMaker multi-model endpoints with TorchServe and save up to 75% in inference costs](https://pytorch.org/blog/amazon-sagemaker-w-torchserve/)\n* [Deploying your Generative AI model in only four steps with Vertex AI and PyTorch](https://cloud.google.com/blog/products/ai-machine-learning/get-your-genai-model-going-in-four-easy-steps)\n* [PyTorch Model Serving on Google Cloud TPU v5](https://cloud.google.com/tpu/docs/v5e-inference#pytorch-model-inference-and-serving)\n* [Monitoring using Datadog](https://www.datadoghq.com/blog/ai-integrations/#model-serving-and-deployment-vertex-ai-amazon-sagemaker-torchserve)\n* [Torchserve Performance Tuning, Animated Drawings Case-Study](https://pytorch.org/blog/torchserve-performance-tuning/)\n* [Walmart Search: Serving Models at a Scale on TorchServe](https://medium.com/walmartglobaltech/search-model-serving-using-pytorch-and-torchserve-6caf9d1c5f4d)\n* [🎥 Scaling inference on CPU with TorchServe](https://www.youtube.com/watch?v=066_Jd6cwZg)\n* [🎥 TorchServe C++ backend](https://www.youtube.com/watch?v=OSmGGDpaesc)\n* [Grokking Intel CPU PyTorch performance from first principles: a TorchServe case study](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex.html)\n* [Grokking Intel CPU PyTorch performance from first principles( Part 2): a TorchServe case study](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex_2.html)\n* [Case Study: Amazon Ads Uses PyTorch and AWS Inferentia to Scale Models for Ads Processing](https://pytorch.org/blog/amazon-ads-case-study/)\n* [Optimize your inference jobs using dynamic batch inference with TorchServe on Amazon SageMaker](https://aws.amazon.com/blogs/machine-learning/optimize-your-inference-jobs-using-dynamic-batch-inference-with-torchserve-on-amazon-sagemaker/)\n* [Using AI to bring children's drawings to 
life](https://ai.meta.com/blog/using-ai-to-bring-childrens-drawings-to-life/)\n* [🎥 Model Serving in PyTorch](https://www.youtube.com/watch?v=2A17ZtycsPw)\n* [Evolution of Cresta's machine learning architecture: Migration to AWS and PyTorch](https://aws.amazon.com/blogs/machine-learning/evolution-of-crestas-machine-learning-architecture-migration-to-aws-and-pytorch/)\n* [🎥 Explain Like I’m 5: TorchServe](https://www.youtube.com/watch?v=NEdZbkfHQCk)\n* [🎥 How to Serve PyTorch Models with TorchServe](https://www.youtube.com/watch?v=XlO7iQMV3Ik)\n* [How to deploy PyTorch models on Vertex AI](https://cloud.google.com/blog/topics/developers-practitioners/pytorch-google-cloud-how-deploy-pytorch-models-vertex-ai)\n* [Quantitative Comparison of Serving Platforms](https://biano-ai.github.io/research/2021/08/16/quantitative-comparison-of-serving-platforms-for-neural-networks.html)\n* [Efficient Serverless deployment of PyTorch models on Azure](https://medium.com/pytorch/efficient-serverless-deployment-of-pytorch-models-on-azure-dc9c2b6bfee7)\n* [Deploy PyTorch models with TorchServe in Azure Machine Learning online endpoints](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/deploy-pytorch-models-with-torchserve-in-azure-machine-learning/ba-p/2466459)\n* [Dynaboard moving beyond accuracy to holistic model evaluation in NLP](https://ai.facebook.com/blog/dynaboard-moving-beyond-accuracy-to-holistic-model-evaluation-in-nlp/)\n* [A MLOps Tale about operationalising MLFlow and PyTorch](https://medium.com/mlops-community/engineering-lab-1-team-1-a-mlops-tale-about-operationalising-mlflow-and-pytorch-62193b55dc19)\n* [Operationalize, Scale and Infuse Trust in AI Models using KFServing](https://blog.kubeflow.org/release/official/2021/03/08/kfserving-0.5.html)\n* [How Wadhwani AI Uses PyTorch To Empower Cotton Farmers](https://medium.com/pytorch/how-wadhwani-ai-uses-pytorch-to-empower-cotton-farmers-14397f4c9f2b)\n* [TorchServe Streamlit 
Integration](https://cceyda.github.io/blog/huggingface/torchserve/streamlit/ner/2020/10/09/huggingface_streamlit_serve.html)\n* [Dynabench aims to make AI models more robust through distributed human workers](https://venturebeat.com/2020/09/24/facebooks-dynabench-aims-to-make-ai-models-more-robust-through-distributed-human-workers/)\n* [Announcing TorchServe](https://aws.amazon.com/blogs/aws/announcing-torchserve-an-open-source-model-server-for-pytorch/)\n\n## 💖 All Contributors\n\n\u003ca href=\"https://github.com/pytorch/serve/graphs/contributors\"\u003e\n  \u003cimg src=\"https://contrib.rocks/image?repo=pytorch/serve\" /\u003e\n\u003c/a\u003e\n\nMade with [contrib.rocks](https://contrib.rocks).\n## ⚖️ Disclaimer\nThis repository is jointly operated and maintained by Amazon, Meta and a number of individual contributors listed in the [CONTRIBUTORS](https://github.com/pytorch/serve/graphs/contributors) file. For questions directed at Meta, please send an email to opensource@fb.com. For questions directed at Amazon, please send an email to torchserve@amazon.com. For all other questions, please open up an issue in this repository [here](https://github.com/pytorch/serve/issues).\n\n*TorchServe acknowledges the [Multi Model Server (MMS)](https://github.com/awslabs/multi-model-server) project from which it was derived*\n","funding_links":[],"categories":["Java","🎯 Tool Categories","Serving","Data Science","Frameworks/Servers for Serving","Deep Learning Framework","Deployment and Serving","A01_机器学习教程","General","模型序列化和转换","人工智能","Inference \u0026 Serving","Model Serving"],"sub_categories":["🏆 Top Serving Platforms","Frameworks/Servers for Serving","ML Ops","High-Level DL APIs","Model Serving Frameworks"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpytorch%2Fserve","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpytorch%2Fserve","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpytorch%2Fserve/lists"}