{"id":20058693,"url":"https://github.com/roboflow/deploy-models-with-grpc-pytorch-asyncio","last_synced_at":"2025-07-20T16:34:40.173Z","repository":{"id":87995436,"uuid":"567697157","full_name":"roboflow/deploy-models-with-grpc-pytorch-asyncio","owner":"roboflow","description":"Article about deploying machine learning models using grpc, pytorch and asyncio","archived":false,"fork":false,"pushed_at":"2022-11-18T16:12:01.000Z","size":209,"stargazers_count":29,"open_issues_count":0,"forks_count":5,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-07-10T06:10:06.955Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/roboflow.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-18T11:12:23.000Z","updated_at":"2025-07-04T18:05:53.000Z","dependencies_parsed_at":"2023-05-22T07:15:13.665Z","dependency_job_id":null,"html_url":"https://github.com/roboflow/deploy-models-with-grpc-pytorch-asyncio","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/roboflow/deploy-models-with-grpc-pytorch-asyncio","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roboflow%2Fdeploy-models-with-grpc-pytorch-asyncio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roboflow%2Fdeploy-models-with-grpc-pytorch-asyncio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roboflow%2Fdeploy-models-with-grpc-pytorch-asyncio/relea
ses","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roboflow%2Fdeploy-models-with-grpc-pytorch-asyncio/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/roboflow","download_url":"https://codeload.github.com/roboflow/deploy-models-with-grpc-pytorch-asyncio/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roboflow%2Fdeploy-models-with-grpc-pytorch-asyncio/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266160866,"owners_count":23885886,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-13T13:03:13.363Z","updated_at":"2025-07-20T16:34:40.118Z","avatar_url":"https://github.com/roboflow.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Deploying Machine Learning Models with PyTorch, gRPC and asyncio\n\n![alt](header.png)\n\nToday we're going to see how to deploy a machine-learning model behind a gRPC service running on asyncio. gRPC promises to be faster, more scalable and better optimized than HTTP/1.1. [This is a good article](https://www.altexsoft.com/blog/what-is-grpc/) about gRPC's pros and cons; feel free to have a look before continuing. 
gRPC is supported in all major programming languages and will generate type hints, client and server code for you, making it easier to incorporate a new service in your stack.\n\nWe will use [PyTorch](https://pytorch.org/get-started/locally/) to create an image classifier and perform inference using gRPC calls.\n\nThis article is also hosted on [GitHub](https://github.com/FrancescoSaverioZuppichini/deploy-models-with-grpc-pytorch-asyncio)\n\n## What's gRPC\n\nWhat's [gRPC](https://grpc.io/)? gRPC is a framework for implementing Remote Procedure Calls (RPC) over HTTP/2 that runs on any device. It's developed and maintained mainly by Google and it's widely used in the industry. It allows two machines to communicate, much like a plain HTTP API, but with a typed interface and better performance. It's often used to define microservices that may be written in different programming languages.\n\nIt works by defining, with a special syntax in a `.proto` file, the fields of the messages the client and server will exchange and the signature of the function we will expose. gRPC then generates both client and server code, and you can call the function directly from the client.\n\ngRPC services send and receive data as Protocol Buffer (Protobuf) messages. These are binary and more compact than human-readable formats (like JSON or XML), hence the better performance.\n\n## Getting Started\n\nLet's start by setting up a virtual environment\n\n**Tested with python 3.9**\n\n```\npython -m venv .venv\n```\n\nThen, let's install all the required packages, `grpcio`, `grpcio-tools`, `torch`, `torchvision` and `Pillow`\n\n```\npip install grpcio grpcio-tools torch torchvision Pillow==9.3.0\n```\n\nAll set!\n\nWe will work on 4 files,\n\n```\n.\n└── src\n    ├── client.py\n    ├── inference.proto\n    ├── inference.py\n    └── server.py\n```\n\n- `client.py` holds the client code we will use to send inference requests\n- `server.py` holds the server code responsible for receiving the inference request and sending a reply\n- 
`inference.py` holds the actual model and inference logic\n- `inference.proto` holds the protocol buffer message definitions\n\nLet's start by coding our model inside `inference.py`\n\n\n## Inference\n\nWe will use `resnet34` from `torchvision`. First, we define our preprocessing transformation\n\n```python\n# inference.py\nimport torchvision.transforms as T\n\npreprocess = T.Compose(\n    [\n        T.Resize((224, 224)),\n        T.ToTensor(),\n        T.Normalize(\n            mean=[0.485, 0.456, 0.406],\n            std=[0.229, 0.224, 0.225],\n        ),\n    ]\n)\n\n\nif __name__ == \"__main__\":\n    from PIL import Image\n    image = Image.open('./examples/cat.jpg')\n    tensor = preprocess(image)\n    print(tensor.shape)\n```\n\nSweet, now the model\n\n```python\n# inference.py\nfrom typing import List\n\nimport torch\nimport torchvision.transforms as T\nfrom PIL import Image\nfrom torchvision.models import ResNet34_Weights, resnet34\n\npreprocess = ...\nmodel = resnet34(weights=ResNet34_Weights.IMAGENET1K_V1).eval()\n\n\n@torch.no_grad()\ndef inference(images: List[Image.Image]) -\u003e List[int]:\n    batch = torch.stack([preprocess(image) for image in images])\n    logits = model(batch)\n    preds = logits.argmax(dim=1).tolist()\n    return preds\n\n\nif __name__ == \"__main__\":\n    image = Image.open(\"./examples/cat.jpg\")\n    print(inference([image]))\n\n```\n\nThe script prints `[282]`; `282` is the ImageNet class id for `tiger cat`, which is the right class for our `cat`. Our `inference` function takes a list of `PIL` images, stacks them into a batch, runs the model and converts the highest-scoring class of each image into a list of class ids.\n\nNice, we have our model set up.\n\n## Server\n\nThe next step is to create the actual gRPC server. First, we describe the messages and the service in the `.proto` file. 
\n\nA list of all message field types can be found [here](https://learn.microsoft.com/en-us/dotnet/architecture/grpc-for-wcf-developers/protocol-buffers) and the official Python tutorial for gRPC [here](https://grpc.io/docs/languages/python/basics/)\n\n### Proto\n\nWe will start by defining our `InferenceServer` service\n\n```proto\n// inference.proto\n\nsyntax = \"proto3\";\n\n// The inference service definition.\nservice InferenceServer {\n  // Sends an inference reply\n  rpc inference (InferenceRequest) returns (InferenceReply) {}\n}\n\n```\n\nThis tells gRPC we have an `InferenceServer` service with an `inference` function. Notice that we need to specify the types of the request and reply messages: `InferenceRequest` and `InferenceReply`\n\n```proto\n// inference.proto\n...\n// The request message containing the images.\nmessage InferenceRequest {\n    repeated bytes image = 1;\n}\n\n// The response message containing the class ids\nmessage InferenceReply {\n    repeated uint32 pred = 1;\n}\n```\n\nOur request carries a list of images, each encoded as `bytes` (the `repeated` keyword is used to define lists), and the reply carries a list of predicted class ids\n\n### Build the server and client\n\nNow, we need to generate the client and server code using `grpcio-tools` (we installed it at the beginning). \n\n```bash\ncd src \u0026\u0026 python -m grpc_tools.protoc -I . --python_out=. --pyi_out=. --grpc_python_out=. 
inference.proto \n```\n\nThis will generate the following files\n\n```\n└── src\n    ├── inference_pb2_grpc.py\n    ├── inference_pb2.py\n    ├── inference_pb2.pyi\n    ...\n```\n\n- `inference_pb2_grpc.py` contains our gRPC server and client (stub) definitions\n- `inference_pb2.py` contains our gRPC message definitions\n- `inference_pb2.pyi` contains the type stubs for our gRPC messages\n\nWe now have to code our service, \n\n```python\n# server.py\n# we will use asyncio to run our service\nimport asyncio\n...\n# from the generated grpc server definition, import the required stuff\nfrom inference_pb2_grpc import InferenceServer, add_InferenceServerServicer_to_server\n# import the request and reply types\nfrom inference_pb2 import InferenceRequest, InferenceReply\n...\n```\n\nTo create the gRPC server we need to import `InferenceServer` and `add_InferenceServerServicer_to_server` from the generated `inference_pb2_grpc`. Our logic will go inside a subclass of `InferenceServer`, in the `inference` method, the one we declared in the `.proto` file.\n\n```python\n# server.py\nclass InferenceService(InferenceServer):\n    def open_image(self, image: bytes) -\u003e Image.Image:\n        # decode the raw bytes sent by the client into a PIL image\n        image = Image.open(BytesIO(image))\n        return image\n\n    async def inference(self, request: InferenceRequest, context) -\u003e InferenceReply:\n        logging.info(\"[🦾] Received request\")\n        start = perf_counter()\n        images = list(map(self.open_image, request.image))\n        preds = inference(images)\n        logging.info(f\"[✅] Done in {(perf_counter() - start) * 1000:.2f}ms\")\n        return InferenceReply(pred=preds)\n```\n\nNotice that we subclass `InferenceServer`, add our logic inside `inference` and declare it as an `async` function. This is because we will launch our service using [asyncio](https://docs.python.org/3/library/asyncio.html). 
\n\nWe now need to tell gRPC how to start our service.\n\n```python\n# server.py\n...\nfrom inference_pb2_grpc import InferenceServer, add_InferenceServerServicer_to_server\nimport logging\n\nlogging.basicConfig(level=logging.INFO)\n\nasync def serve():\n    server = grpc.aio.server()\n    add_InferenceServerServicer_to_server(InferenceService(), server)\n    # listen on all interfaces ([::] is the IPv6 wildcard address)\n    address = \"[::]:50052\"\n    server.add_insecure_port(address)\n    logging.info(f\"[📡] Starting server on {address}\")\n    await server.start()\n    await server.wait_for_termination()\n```\n\nLine by line: we create a gRPC asyncio server using `grpc.aio.server()`, we add our service by passing it to `add_InferenceServerServicer_to_server`, then we listen on a custom port using IPv6 by calling the `.add_insecure_port` method. Finally, we await the server's `.start` method and then `.wait_for_termination`, which keeps the server running\n\nFinally, \n\n```python\n# server.py\nif __name__ == \"__main__\":\n    asyncio.run(serve())\n```\n\nIf you now run the file\n\n```bash\npython src/server.py\n```\n\nYou'll see\n\n```\nINFO:root:[📡] Starting server on [::]:50052\n```\n\nThe full server looks like\n\n```python\nimport asyncio\nfrom time import perf_counter\n\nimport grpc\nfrom PIL import Image\nfrom io import BytesIO\nfrom inference import inference\nimport logging\nfrom inference_pb2_grpc import InferenceServer, add_InferenceServerServicer_to_server\nfrom inference_pb2 import InferenceRequest, InferenceReply\n\nlogging.basicConfig(level=logging.INFO)\n\n\nclass InferenceService(InferenceServer):\n    def open_image(self, image: bytes) -\u003e Image.Image:\n        image = Image.open(BytesIO(image))\n        return image\n\n    async def inference(self, request: InferenceRequest, context) -\u003e InferenceReply:\n        logging.info(\"[🦾] Received request\")\n        start = perf_counter()\n        images = list(map(self.open_image, request.image))\n        preds = inference(images)\n        logging.info(f\"[✅] Done in {(perf_counter() - start) * 
1000:.2f}ms\")\n        return InferenceReply(pred=preds)\n\n\nasync def serve():\n    server = grpc.aio.server()\n    add_InferenceServerServicer_to_server(InferenceService(), server)\n    # listen on all interfaces ([::] is the IPv6 wildcard address)\n    address = \"[::]:50052\"\n    server.add_insecure_port(address)\n    logging.info(f\"[📡] Starting server on {address}\")\n    await server.start()\n    await server.wait_for_termination()\n\n\nif __name__ == \"__main__\":\n    asyncio.run(serve())\n```\n\n\nSweet 🎉! We have our gRPC server running with asyncio. We now need to define our **client**.\n\n## Client\n\nCreating a client is straightforward. As before, we need the definitions that were generated in the previous step.\n\n```python\n# client.py\n\nimport asyncio\n\nimport grpc\n\nfrom inference_pb2 import InferenceRequest, InferenceReply\nfrom inference_pb2_grpc import InferenceServerStub\n```\n\n`InferenceServerStub` is the client-side entry point for calling the service. Let's create our `async` function to send an `InferenceRequest` and collect the `InferenceReply`\n\n```python\n...\nimport logging\nfrom pprint import pformat\nfrom time import perf_counter\n\nlogging.basicConfig(level=logging.INFO)\n\nasync def main():\n    async with grpc.aio.insecure_channel(\"[::]:50052\") as channel:\n        stub = InferenceServerStub(channel)\n        start = perf_counter()\n\n        res: InferenceReply = await stub.inference(\n            InferenceRequest(image=[image_bytes])\n        )\n        logging.info(\n            f\"[✅] pred = {pformat(res.pred)} in {(perf_counter() - start) * 1000:.2f}ms\"\n        )\n```\n\nWe open our channel using the `grpc.aio.insecure_channel` context manager, we create an instance of `InferenceServerStub` and we `await` the `.inference` method. The `.inference` method takes an `InferenceRequest` instance containing our images as `bytes`. 
We receive back an `InferenceReply` instance and we print the predictions.\n\nTo get the bytes from an image, we can use `Pillow` and `BytesIO`\n\n```python\n# client.py\nfrom io import BytesIO\nfrom PIL import Image\n\nimage = Image.open(\"./examples/cat.jpg\")\nbuffered = BytesIO()\nimage.save(buffered, format=\"JPEG\")\nimage_bytes = buffered.getvalue()\n```\n\nThe full client code looks like\n\n```python\nimport asyncio\nfrom io import BytesIO\n\nimport grpc\nfrom PIL import Image\n\nfrom inference_pb2 import InferenceRequest, InferenceReply\nfrom inference_pb2_grpc import InferenceServerStub\nimport logging\nfrom pprint import pformat\nfrom time import perf_counter\n\nimage = Image.open(\"./examples/cat.jpg\")\nbuffered = BytesIO()\nimage.save(buffered, format=\"JPEG\")\nimage_bytes = buffered.getvalue()\n\nlogging.basicConfig(level=logging.INFO)\n\n\nasync def main():\n    async with grpc.aio.insecure_channel(\"[::]:50052\") as channel:\n        stub = InferenceServerStub(channel)\n        start = perf_counter()\n\n        res: InferenceReply = await stub.inference(\n            InferenceRequest(image=[image_bytes])\n        )\n        logging.info(\n            f\"[✅] pred = {pformat(res.pred)} in {(perf_counter() - start) * 1000:.2f}ms\"\n        )\n\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\nLet's run it!\n\n```bash\npython src/client.py\n```\n\nThis produces the following output on the client\n\n```\n// client\nINFO:root:[✅] pred = [282] in 86.39ms\n```\n\nand on the server\n\n```\n// server\nINFO:root:[🦾] Received request\nINFO:root:[✅] Done in 84.03ms\n```\n\nNice!!! 
We can also pass multiple images, \n\n```python\n# client.py\n...\n        res: InferenceReply = await stub.inference(\n            InferenceRequest(image=[image_bytes, image_bytes, image_bytes])\n        )\n```\n\nWe just repeated `image_bytes` three times, `[image_bytes, image_bytes, image_bytes]`, to send 3 copies of the same image\n\nIf we run it,\n\n```bash\npython src/client.py\n```\n\nWe get\n\n```\nINFO:root:[✅] pred = [282, 282, 282] in 208.39ms\n```\n\nYes, 3 predictions in a single gRPC call! 🚀🚀🚀\n\n## Conclusion\n\nToday we have seen how to deploy a machine learning model using PyTorch, gRPC and asyncio: a scalable, effective and performant way to make your model accessible. There are many gRPC features we didn't touch, like [streaming](https://grpc.io/docs/what-is-grpc/core-concepts/#server-streaming-rpc). \n\nI hope it helps!\n\nSee you in the next one,\n\nFrancesco","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froboflow%2Fdeploy-models-with-grpc-pytorch-asyncio","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Froboflow%2Fdeploy-models-with-grpc-pytorch-asyncio","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froboflow%2Fdeploy-models-with-grpc-pytorch-asyncio/lists"}