{"id":13633244,"url":"https://github.com/stochasticai/x-stable-diffusion","last_synced_at":"2025-04-04T18:09:59.435Z","repository":{"id":61584173,"uuid":"548876576","full_name":"stochasticai/x-stable-diffusion","owner":"stochasticai","description":"Real-time inference for Stable Diffusion - 0.88s latency. Covers AITemplate, nvFuser, TensorRT, FlashAttention. Join our Discord community: https://discord.com/invite/TgHXuSJEk6","archived":false,"fork":false,"pushed_at":"2023-12-04T17:42:17.000Z","size":27190,"stargazers_count":556,"open_issues_count":22,"forks_count":35,"subscribers_count":13,"default_branch":"main","last_synced_at":"2025-03-28T17:11:16.627Z","etag":null,"topics":["aitemplate","automl","cuda","docker","inference","notebook","nvfuser","onnx","onnxruntime","pytorch","stable-diffusion","tensorrt"],"latest_commit_sha":null,"homepage":"https://stochastic.ai","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stochasticai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-10-10T10:20:32.000Z","updated_at":"2025-03-15T17:31:27.000Z","dependencies_parsed_at":"2023-12-04T18:53:08.964Z","dependency_job_id":null,"html_url":"https://github.com/stochasticai/x-stable-diffusion","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stochasticai%2Fx-stable-diffusion","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stochasticai%2Fx-stable-diffusion/tags","releases_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stochasticai%2Fx-stable-diffusion/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stochasticai%2Fx-stable-diffusion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stochasticai","download_url":"https://codeload.github.com/stochasticai/x-stable-diffusion/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247226215,"owners_count":20904465,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aitemplate","automl","cuda","docker","inference","notebook","nvfuser","onnx","onnxruntime","pytorch","stable-diffusion","tensorrt"],"created_at":"2024-08-01T23:00:31.639Z","updated_at":"2025-04-04T18:09:59.416Z","avatar_url":"https://github.com/stochasticai.png","language":"Jupyter Notebook","readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\".github/stochastic_logo_light.svg#gh-light-mode-only\" width=\"250\" alt=\"Stochastic.ai\"/\u003e\n  \u003cimg src=\".github/stochastic_logo_dark.svg#gh-dark-mode-only\" width=\"250\" alt=\"Stochastic.ai\"/\u003e\n\u003c/p\u003e\n\n\u003cbr\u003e\n\n\u003c!-- # ⚡️ Real-time inference for Stable Diffusion --\u003e\n\u003c!-- ![stochasticai_demo](.github/stochasticai_demo.gif) --\u003e\n\n\u003c!-- \u003cp align=\"center\"\u003e\n \u003cimg src=\".github/stochasticai_demo.gif\" width=\"600\" alt=\"Stochastic.ai Demo\"/\u003e\n\u003c/p\u003e --\u003e\n\nWelcome to `x-stable-diffusion` by Stochastic!\n\nThis project is a compilation of acceleration techniques for the 
Stable Diffusion model to help you generate images faster and more efficiently, saving you both time and money.\n\nWith example images and a comprehensive benchmark, you can easily choose the best technique for your needs. When you're ready to deploy, our CLI called `stochasticx` makes it easy to get started on your local machine. Try `x-stable-diffusion` and see the difference it can make for your image generation performance and cost savings.\n\n\u003c!-- TOC --\u003e\n\u003c!-- Table of contents:\n- [Installation](#-installation)\n    - [Quickstart](#quickstart)\n    - [How to get less than 1s latency?](#how-to-get-less-than-1s-latency)\n    - [Manual](#manual-deployment)\n- [Optimizations](#-optimizations)\n- [Benchmarks](#benchmarks)\n  - [Setup](#setup)\n  - [Online results](#online-results)\n      - [A100 GPU](#a100-gpu)\n      - [T4 GPU](#t4-gpu)\n  - [Batched results](#batched-results)\n  - [Sample images generated](#sample-images-generated)\n- [Colab Notebooks](#how-to-run-with-google-colab)\n- [References](#references) --\u003e\n\u003c!-- /TOC --\u003e\n\n## 🚀 Installation\n\n### Quickstart\n\nMake sure you have [Python](https://www.python.org/downloads/) and [Docker](https://docs.docker.com/engine/install/) installed on your system.\n\n1. Install the latest version of the `stochasticx` library:\n```\npip install stochasticx\n```\n\n2. Deploy the Stable Diffusion model:\n```\nstochasticx stable-diffusion deploy --type aitemplate\n```\n\n\u003c!-- If you don't have a Stochastic account, then the CLI will prompt you to quickly create one. It is free and just takes 1 minute [Sign up →](https://app.stochastic.ai/signup) --\u003e\n\n\u003e Alternatively, you can deploy Stable Diffusion without our CLI by following the steps [here](#manual-deployment).\n\n\n3. 
To perform inference with this deployed model:\n```\nstochasticx stable-diffusion inference --prompt \"Riding a horse\"\n```\nCheck all the options of the `inference` command:\n```\nstochasticx stable-diffusion inference --help\n```\n\n\n4. Get the logs of the deployment by executing the following command:\n```\nstochasticx stable-diffusion logs\n```\n\n5. Stop and remove the deployment with this command:\n```\nstochasticx stable-diffusion stop\n```\n\n### How to get less than 1s latency?\n\nChange `num_inference_steps` to `30`. With this, you can get an image generated in 0.88 seconds.\n\n```python\n{\n  'max_seq_length': 64,\n  'num_inference_steps': 30,\n  'image_size': (512, 512)\n}\n```\n\nYou can also experiment with reducing the `image_size`.\n\n## How to run on Google Colab?\n\n- [Try PyTorch - FP16 in Colab -\u003e](https://colab.research.google.com/drive/1m3n2n5bfNpRgWJ8K-xwTvhzrTQETWduq?usp=sharing)\n- [Try TensorRT in Colab -\u003e](https://colab.research.google.com/drive/1WQ98YBHTG355vL5wKbmNj9xeBmHRZGJb?usp=sharing)\n\nIn each folder, we provide a Google Colab notebook with which you can test the full flow and run inference on a T4 GPU.\n\n### Manual deployment\n\nCheck the `README.md` of the following directories:\n- [AITemplate](./AITemplate/README.md)\n- [FlashAttention](./FlashAttention/README.md)\n- [nvFuser](./nvFuser/README.md)\n- [PyTorch](./PyTorch/README.md)\n- [TensorRT](./TensorRT/README.md)\n\n## 🔥 Optimizations\n\n- AITemplate: [Meta's latest optimization framework](https://github.com/facebookincubator/AITemplate)\n- TensorRT: [NVIDIA TensorRT framework](https://github.com/NVIDIA/TensorRT)\n- nvFuser: [nvFuser with PyTorch](https://pytorch.org/blog/introducing-nvfuser-a-deep-learning-compiler-for-pytorch/)\n- FlashAttention: [FlashAttention integration in xFormers](https://github.com/facebookresearch/xformers)\n\n## Benchmarks\n\n### Setup\n\nFor hardware, we used one 40GB A100 GPU with CUDA 11.6, and the results are 
reported by averaging 50 runs.\n\nThe following arguments were used for image generation in all benchmarks:\n\n```python\n{\n  'max_seq_length': 64,\n  'num_inference_steps': 50,\n  'image_size': (512, 512)\n}\n```\n\n### Online results\n\nFor `batch_size` 1, these are the latency results:\n\n#### A100 GPU\n\n![A100_GPU_graph](./graphs/A100_GPU_latency.png)\n\n| project                | Latency (s) | GPU VRAM (GB) |\n| :--------------------- | :---------- | :------------ |\n| PyTorch           fp16 |  5.77       |  10.3         |\n| nvFuser           fp16 |  3.15       |  ---          |\n| FlashAttention    fp16 |  2.80       |  7.5          |\n| TensorRT          fp16 |  1.68       |  8.1          |\n| AITemplate        fp16 |  1.38       |  4.83         |\n| ONNX (CUDA)            |  7.26       |  13.3         |\n\n\n#### T4 GPU\n\n\u003e Note: AITemplate might not support the T4 GPU yet. [Check support here](https://github.com/facebookincubator/AITemplate#installation)\n\n![T4_GPU_graph](./graphs/T4_GPU_latency.png)\n\n| project                | Latency (s) |\n| :--------------------- | :---------- |\n| PyTorch           fp16 |  16.2       |\n| nvFuser           fp16 |  19.3       |\n| FlashAttention    fp16 |  13.7       |\n| TensorRT          fp16 |  9.3        |\n\n### Batched results - A100 GPU\n\nThe following results were obtained by varying `batch_size` from 1 to 24.\n\n![A100_GPU_batch_size](./graphs/A100_GPU_batch.png)\n\n| project           \\ bs |      1        |     4         |    8          |    16             |   24              |\n| :--------------------- | :------------ | :------------ | :------------ | :---------------- | :---------------- |\n| PyTorch           fp16 | 5.77s/10.3GB  | 19.2s/18.5GB  | 36s/26.7GB    |  OOM              |                   |\n| FlashAttention    fp16 | 2.80s/7.5GB   |  9.1s/17GB    | 17.7s/29.5GB  |  OOM              |                   |\n| TensorRT          fp16 | 1.68s/8.1GB   |  OOM          |               
|                   |                   |\n| AITemplate        fp16 | 1.38s/4.83GB  | 4.25s/8.5GB   | 7.4s/14.5GB   |  15.7s/25GB       |  23.4s/36GB       |\n| ONNX (CUDA)            | 7.26s/13.3GB  | OOM           | OOM           |  OOM              |  OOM              |\n\n\u003e Note: TensorRT fails to convert the UNet model from ONNX to TensorRT due to memory issues.\n\n### Sample images generated\n\n[Click here to view the complete list of generated images](./generated_images/README.md)\n\n| Optimization \\ Prompt | Super Mario learning to fly in an airport, Painting by Leonardo Da Vinci | The Easter bunny riding a motorcycle in New York City | Drone flythrough of a tropical jungle covered in snow |\n| --- | --- | --- | --- |\n| PyTorch           fp16 |  ![pytorch_stable-diffusion_mario](./generated_images/PyTorch/0.png)      |  ![pytorch_stable-diffusion_bunny](./generated_images/PyTorch/1.png)         | ![pytorch_stable-diffusion_drone](./generated_images/PyTorch/9.png) |\n| nvFuser           fp16 | ![nvFuser_stable-diffusion_mario](./generated_images/nvFuser/0.png)      |  ![nvFuser_stable-diffusion_bunny](./generated_images/nvFuser/1.png)         | ![nvFuser_stable-diffusion_drone](./generated_images/nvFuser/9.png) |\n| FlashAttention    fp16 |  ![FlashAttention_stable-diffusion_mario](./generated_images/FlashAttention/0.png)      |  ![FlashAttention_stable-diffusion_bunny](./generated_images/FlashAttention/1.png)         | ![FlashAttention_stable-diffusion_drone](./generated_images/FlashAttention/9.png) |\n| TensorRT          fp16 |  ![TensorRT_stable-diffusion_mario](./generated_images/TensorRT/0.png)      |  ![TensorRT_stable-diffusion_bunny](./generated_images/TensorRT/1.png)         | ![TensorRT_stable-diffusion_drone](./generated_images/TensorRT/9.png) |\n| AITemplate        fp16 |  ![AITemplate_stable-diffusion_mario](./generated_images/AITemplate/0.png)      |  ![AITemplate_stable-diffusion_bunny](./generated_images/AITemplate/1.png)         | 
![AITemplate_stable-diffusion_drone](./generated_images/AITemplate/9.png) |\n\n## References\n\n- [HuggingFace Diffusers](https://github.com/huggingface/diffusers)\n- [AITemplate](https://github.com/facebookincubator/AITemplate)\n\n## 🌎 Join our community\n\n- Discord - https://discord.gg/TgHXuSJEk6\n\n## 🌎 Contributing\n\nAs an open-source project in a rapidly evolving field, we welcome contributions of all kinds, including new features and better documentation. Please read our [contributing guide](CONTRIBUTING.md) to learn how you can get involved.\n\n\u003c!-- ## Team and contributors\n\n`x-stable-diffusion` is a community-driven project with several AI systems engineers and researchers contributing to it. \n\nIt is currently maintained by: [Toan Do](https://github.com/Toan-Do), [Marcos Rivera](https://github.com/MarcosRiveraMartinez), [Sarthak Langde](https://github.com/sarthaklangde), [Subhash GN](https://github.com/subhash-stc), [Riccardo Romagnoli](https://github.com/RiccardoRomagnoli), [Roman Ageev](https://github.com/StochasticRomanAgeev) and [Glenn Ko](https://github.com/glennko) --\u003e\n\n\n\u003c!-- ## ✅ Stochastic\n\nStochastic was founded with a vision to make deep learning optimization and deployment effortless. With our cloud platform, you can easily optimize and deploy your deep learning models with confidence, knowing that you are getting the best performance possible. Our platform automatically optimizes your models, benchmarking them on various evaluation metrics to ensure they are running at their peak.\n\nAnd when it comes time to deploy, Stochastic has you covered with auto-scaling accelerated inference for models like BLOOM 176B, Stable Diffusion, and GPT-J. 
Plus, our platform is cloud agnostic, supporting AWS, GCP, Azure, and Kubernetes clusters\n\n\u003cp align=\"center\"\u003e\n \u003cimg src=\".github/stochastic_x_dashboard.jpeg\" width=\"600\" alt=\"Stochastic X Dashboard\"/\u003e\n\u003c/p\u003e --\u003e\n\n\u003c!-- For fully-managed solution hosted on Stochastic [Sign up →](https://app.stochastic.ai/signup) --\u003e\n\u003cbr\u003e\nFor managed hosting on our cloud or on your private cloud, [Contact us →](https://stochastic.ai/contact)\n","funding_links":[],"categories":["Serving","Large Model Serving"],"sub_categories":["Large Model Serving"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstochasticai%2Fx-stable-diffusion","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstochasticai%2Fx-stable-diffusion","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstochasticai%2Fx-stable-diffusion/lists"}