{"id":23742886,"url":"https://github.com/alibaba/bladedisc","last_synced_at":"2025-10-08T17:20:49.703Z","repository":{"id":36956212,"uuid":"440031051","full_name":"alibaba/BladeDISC","owner":"alibaba","description":"BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.","archived":false,"fork":false,"pushed_at":"2024-12-30T16:51:44.000Z","size":22239,"stargazers_count":856,"open_issues_count":87,"forks_count":165,"subscribers_count":34,"default_branch":"main","last_synced_at":"2025-04-09T06:00:38.624Z","etag":null,"topics":["compiler","deep-learning","inference-optimization","machine-learning","mlir","neural-network","pytorch","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alibaba.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-12-20T03:25:19.000Z","updated_at":"2025-04-04T02:07:46.000Z","dependencies_parsed_at":"2023-01-17T08:15:42.879Z","dependency_job_id":"7089da53-7d49-4fef-ad28-cfe880f4968b","html_url":"https://github.com/alibaba/BladeDISC","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2FBladeDISC","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2FBladeDISC/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2FBladeDISC/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alib
aba%2FBladeDISC/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alibaba","download_url":"https://codeload.github.com/alibaba/BladeDISC/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254119472,"owners_count":22017951,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compiler","deep-learning","inference-optimization","machine-learning","mlir","neural-network","pytorch","tensorflow"],"created_at":"2024-12-31T11:48:45.101Z","updated_at":"2025-10-08T17:20:44.642Z","avatar_url":"https://github.com/alibaba.png","language":"C++","readme":"# BladeDISC Introduction \u003c!-- omit in toc --\u003e\n\n## We're hiring!🔥🔥🔥\nWe're always looking for candidates to join the dev team. 
You're the one we're looking for:\n* 🥷 If you are a compiler or AI enthusiast.\n* ⭐️ or if you are experienced in optimization on CPUs and GPUs.\n* ⚙️ or if you wanna build a unified and automated compiler to optimize both inference and training workloads.\n* 🤿 or if you are using BladeDISC in production or research projects, and wanna take a deeper dive into it.\n* ✄ or if you wanna build cutting-edge infrastructure in the AIGC era.\n\nPlease contact us via email or DingTalk at the bottom of the page.⬇️⬇️⬇️\n\n- [What's New](#whats-new)\n- [Overview](#overview)\n  - [Features and Roadmap](#features-and-roadmap)\n    - [Frontend Framework Support Matrix](#frontend-framework-support-matrix)\n    - [Backend Support Matrix](#backend-support-matrix)\n    - [Deployment Solutions](#deployment-solutions)\n  - [Numbers of Typical Workloads](#numbers-of-typical-workloads)\n    - [Advantage in Dynamic Shape Workloads](#advantage-in-dynamic-shape-workloads)\n- [API QuickView](#api-quickview)\n  - [For TensorFlow Users](#for-tensorflow-users)\n  - [For PyTorch Users](#for-pytorch-users)\n- [Setup and Examples](#setup-and-examples)\n- [Publications](#publications)\n- [Tutorials and Documents for Developers](#tutorials-and-documents-for-developers)\n- [Presentations and Talks](#presentations-and-talks)\n- [How to Contribute](#how-to-contribute)\n- [Building Status](#building-status)\n- [FAQ](#faq)\n  - [Roadmap with mlir-hlo Project](#roadmap-with-mlir-hlo-project)\n  - [Roadmap with Torch-MLIR Project](#roadmap-with-torch-mlir-project)\n- [Contact Us](#contact-us)\n\n## What's New\n\n+ [🔥 2023.03.17] BladeDISC v0.4.0: [Massive performance and feature updates](https://github.com/alibaba/BladeDISC/releases/tag/v0.4.0)\n+ [2022.12.08] BladeDISC v0.3.0:\n [Announce PyTorch 2.0 Compilation Support](https://github.com/alibaba/BladeDISC/releases/tag/v0.3.0)\n\n## Overview\n\nBladeDISC is an end-to-end **DynamIc Shape Compiler** project for machine\nlearning workloads, and one of the key components of Alibaba's\n[PAI-Blade](https://www.aliyun.com/activity/bigdata/blade). BladeDISC provides\ngeneral, transparent, and easy-to-use performance optimization for\nTensorFlow/PyTorch workloads on GPGPU and CPU backends. The architecture\nnatively supports dynamic shape workloads, with careful consideration for the\nperformance of both static and dynamic shape scenarios. It also supports\nmultiple flexible deployment solutions, including both Plugin Mode inside the\nTensorFlow/PyTorch runtime, and Standalone Mode for AOT standalone execution.\nThe project is based on [MLIR](https://mlir.llvm.org/) and closely related to the\n[mlir-hlo](https://github.com/tensorflow/mlir-hlo) project.\n\nRefer to [our website](https://alibaba.github.io/BladeDISC/) for more\ninformation, including the setup tutorial, developer guide, demo examples, and\ndocuments for developers.\n\n### Features and Roadmap\n\n#### Frontend Framework Support Matrix\n\n|           | TensorFlow [1] | PyTorch [2]  |\n|---------- | -------------- | ------------ |\n| Inference |    Yes         |    Yes       |\n|  Training |    Yes [3]     |  Ongoing     |\n\n[1] TensorFlow 1.12, 1.15, 2.4 \u0026 2.5 are supported and fully verified. For other\nversions, some slight adaptation work might be needed.\n\n[2] PyTorch version \u003e= 1.6.0 has been fully verified.\n\n[3] Although supported, there is much room for improvement in Op coverage for\ntraining workloads.\n\n#### Backend Support Matrix\n\n|            |   Status      |\n|----------- | ------------- |\n| Nvidia GPU |    Yes [1]    |\n| AMD GPU    |    Yes        |\n| Hygon DCU  |    Yes        |\n|  X86       |    Yes        |\n| AArch64    |    Yes        |\n\n[1] Support for CUDA below 11.0 has been officially deprecated since Aug 2022.\n\n#### Deployment Solutions\n\n* Plugin Mode - BladeDISC works as a plugin of TensorFlow or PyTorch. 
Only the\n  supported Ops are clustered and compiled, and the unsupported ones will be\n  executed by the original TensorFlow or PyTorch runtime. We recommend this mode\n  to most users for its transparency and ease of use.\n\n* Standalone Mode - In Standalone Mode, the input workload will be compiled into\n  a binary that can be executed by itself, i.e., it does not rely on a TensorFlow\n  or PyTorch runtime. In this mode, all Ops must be supported.\n\n### Numbers of Typical Workloads\n\nEvaluated on a set of typical machine learning workloads for\nproduction purposes, BladeDISC shows up to a 6.95x speedup compared with\nPyTorch. Moreover, compared to static optimizing compilers (i.e.,\nXLA and TensorRT), BladeDISC shows comparable or even better performance.\n\n\u003cfigure align=\"center\"\u003e\n\u003cimg src=\"./docs/pics/numbers.png\" style=\"width:80%\"\u003e\n\u003cfigcaption align = \"center\"\u003e\n\u003cb\u003e\nFig.1 End-to-end Performance of BladeDISC and baselines.\nNote that some baselines fail to optimize the ViT model.\n\u003c/b\u003e\n\u003c/figcaption\u003e\n\u003c/figure\u003e\n\n#### Advantage in Dynamic Shape Workloads\n\nSpecifically, for the BERT-large inference on a T4 GPU that we provide in the\n[examples](./docs/tutorials/tensorflow_inference_and_training.md), static compiler\noptimization (XLA) shows severe performance degradation due to its compilation\noverhead, while BladeDISC shows a 1.75x speedup.\n\n| TensorFlow  |    XLA    |  BladeDISC  |\n|-------------|-----------|-------------|\n|   1.78 s    |  41.69 s  |   1.02 s    |\n|   1X        |           |    1.75X    |\n\n## API QuickView\n\n### For TensorFlow Users\n\nOnly two lines of code are needed in a native TensorFlow program, as follows:\n\n``` python\nimport numpy as np\nimport tensorflow as tf\n\n# enable BladeDISC on the TensorFlow program\nimport blade_disc_tf as disc\ndisc.enable()\n\n# construct the TensorFlow Graph and run it\ng = tf.Graph()\nwith g.as_default():\n    ...\n    with tf.Session() as sess:\n        sess.run(...)\n```\n\nFor more information, please refer to [QuickStart for TensorFlow\nUsers](./docs/quickstart.md#quickstart-for-tensorflow-users).\n\n### For PyTorch Users\n\nPyTorch users only need the following few lines of code to enable\nBladeDISC:\n\n``` python\nimport torch\nimport torch.nn as nn\n\nimport torch_blade\n\n# construct a PyTorch Module\nclass MyModule(nn.Module):\n    ...\n\nmodule = MyModule().eval()\n\n# x, y: sample inputs of the module, used for tracing\nwith torch.no_grad():\n    # blade_module is the module optimized by BladeDISC\n    blade_module = torch_blade.optimize(module, allow_tracing=True, model_inputs=(x, y))\n\n# run the optimized module\nblade_module(x, y)\n```\n\n`torch_blade.optimize` accepts an `nn.Module` object and outputs the\noptimized module. For more information, please refer to [Quickstart\nfor PyTorch Users](./docs/quickstart.md#quickstart-for-pytorch-users).\n\n## Setup and Examples\n\n* [How to Setup and Build from Source](./docs/build_from_source.md)\n* [Use Case of TensorFlow Inference and Training](./docs/tutorials/tensorflow_inference_and_training.md)\n* [Use Case of PyTorch Inference](./docs/tutorials/torch_bert_inference.md)\n\n## Publications\n\n* Zhen Zheng, Zaifeng Pan, Dalin Wang, Kai Zhu, Wenyi Zhao, Tianyou Guo, Xiafei Qiu, Minmin Sun, Junjie Bai, Feng Zhang, Xiaoyong Du, Jidong Zhai, Wei Lin.\nBladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach. (SIGMOD'24)\n\n* Zhen Zheng, Xuanda Yang, Pengzhan Zhao, Guoping Long, Kai Zhu, Feiwen Zhu, Wenyi Zhao, Xiaoyong Liu, Jun Yang, Jidong Zhai, Shuaiwen Leon Song, Wei Lin.\n[AStitch: Enabling a New Multi-dimensional Optimization Space for Memory-Intensive ML Training and Inference on Modern SIMT Architectures](./docs/papers/asplos-22-zhenzheng.pdf). 
(ASPLOS'22)\n\n\n## Tutorials and Documents for Developers\n\n* [Tutorial: A Walkthrough of the BladeDISC Pass Pipeline](./docs/developers/pass_pipeline.md)\n* [Introduction to Runtime Abstraction Layer](./docs/developers/runtime_abstraction_layer.md)\n* [TorchBlade Overview](./docs/developers/bladedisc_torch_overview.md)\n* [Tutorial: How to Add a New Torch Operator](./docs/developers/torch_add_a_new_operator.md)\n\n## Presentations and Talks\n* [Performance optimization practice for dynamic shape AI workloads via a compiler-based approach](https://bladedisc.oss-cn-hangzhou.aliyuncs.com/docs/performance-optimization-practice.pdf)\n* [2022/07/31 BladeDISC: A Practice of Dynamic Shape Deep Learning Compiler(Chinese)](https://bladedisc.oss-cn-hangzhou.aliyuncs.com/docs/BladeDISC%EF%BC%9A%E5%8A%A8%E6%80%81Shape%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%E7%BC%96%E8%AF%91%E5%99%A8%E5%AE%9E%E8%B7%B5%E7%9A%84.pdf)\n* [2022/07/07 BladeDISC and Torch-MLIR Roadmap Talk on Torch-MLIR Community](https://bladedisc.oss-cn-hangzhou.aliyuncs.com/docs/BladeDISC-and-TorchMLIR-Roadmap-tts.pptx)\n* [GTC22-S41073, Generalized and Transparent AI Optimization Solutions with AI Compilers from Cloud Service](https://bladedisc.oss-cn-hangzhou.aliyuncs.com/docs/GTC22%20S41073%2C%20Generalized%20and%20Transparent%20AI%20Optimization%20Solutions%20with%20AI%20Compilers%20from%20Cloud%20Service.pdf)\n* [GTC22-S41395, Easier-to-use and More Robust TensorRT via PAI-Blade](https://bladedisc.oss-cn-hangzhou.aliyuncs.com/docs/GTC22-S41395%2C%20Easier-to-use%20and%20More%20Robust%20TensorRT%20via%20PAI-Blade.pdf)\n* [2023/2/17 bladedisc intro. 
(cpu vendor oriented)](https://bladedisc.oss-cn-hangzhou.aliyuncs.com/docs/bladedisc-intro-for-intel.pdf)\n* [2023/3/10 transform dialect based codegen in bladedisc](https://bladedisc.oss-cn-hangzhou.aliyuncs.com/docs/transform-dialect-based-codegen-in-bladedisc.pdf)\n\n## How to Contribute\n\n* [Contribute to BladeDISC](./docs/contribution.md)\n\n## Building Status\n\n| Framework | Device| Status |\n| -- | -- | -- |\n| PyTorch Pre | GPU | [![pytorch_pre_gpu](https://github.com/alibaba/BladeDISC/actions/workflows/pytorch_pre_gpu.yml/badge.svg?branch=main)](https://github.com/alibaba/BladeDISC/actions/workflows/pytorch_pre_gpu.yml) |\n| PyTorch Pre | CPU | [![pytorch_pre_cpu](https://github.com/alibaba/BladeDISC/actions/workflows/pytorch_pre_cpu.yml/badge.svg?branch=main)](https://github.com/alibaba/BladeDISC/actions/workflows/pytorch_pre_cpu.yml) |\n| PyTorch2.0.0 | GPU | [![pytorch200_gpu](https://github.com/alibaba/BladeDISC/actions/workflows/pytorch200_gpu.yml/badge.svg?branch=main)](https://github.com/alibaba/BladeDISC/actions/workflows/pytorch200_gpu.yml) |\n| PyTorch2.0.0 | CPU | [![pytorch200_cpu](https://github.com/alibaba/BladeDISC/actions/workflows/pytorch200_cpu.yml/badge.svg?branch=main)](https://github.com/alibaba/BladeDISC/actions/workflows/pytorch200_cpu.yml) |\n| PyTorch2.0.0 | Yitian | [![pytorch200_yitian](https://github.com/alibaba/BladeDISC/actions/workflows/pytorch200_yitian.yml/badge.svg?branch=main)](https://github.com/alibaba/BladeDISC/actions/workflows/pytorch200_yitian.yml) |\n| PyTorch1.13.0 | GPU | [![pytorch113_gpu](https://github.com/alibaba/BladeDISC/actions/workflows/pytorch113_gpu.yml/badge.svg?branch=main)](https://github.com/alibaba/BladeDISC/actions/workflows/pytorch113_gpu.yml) |\n| PyTorch1.13.0 | CPU | [![pytorch113_cpu](https://github.com/alibaba/BladeDISC/actions/workflows/pytorch113_cpu.yml/badge.svg?branch=main)](https://github.com/alibaba/BladeDISC/actions/workflows/pytorch113_cpu.yml) |\n| PyTorch1.13.0 | Yitian | 
[![pytorch113_yitian](https://github.com/alibaba/BladeDISC/actions/workflows/pytorch113_yitian.yml/badge.svg?branch=main)](https://github.com/alibaba/BladeDISC/actions/workflows/pytorch113_yitian.yml) |\n| TensorFlow2.5 | GPU | [![tf250_gpu](https://github.com/alibaba/BladeDISC/actions/workflows/tf250_gpu.yml/badge.svg?branch=main)](https://github.com/alibaba/BladeDISC/actions/workflows/tf250_gpu.yml) |\n| TensorFlow2.5 | CPU | [![tf250_cpu](https://github.com/alibaba/BladeDISC/actions/workflows/tf250_cpu.yml/badge.svg?branch=main)](https://github.com/alibaba/BladeDISC/actions/workflows/tf250_cpu.yml) |\n| TensorFlow2.8 | Yitian | [![tf280_yitian](https://github.com/alibaba/BladeDISC/actions/workflows/tf280_yitian.yml/badge.svg?branch=main)](https://github.com/alibaba/BladeDISC/actions/workflows/tf280_yitian.yml) |\n\n## FAQ\n\n### Roadmap with mlir-hlo Project\n\nBladeDISC has a close relationship with the\n[mlir-hlo](https://github.com/tensorflow/mlir-hlo) project. Several of its building\nblocks, including the MHLO Op definitions, TF-to-MHLO conversions, and some\ngeneral-purpose passes, have been upstreamed to the mlir-hlo repository. We'll\ncontinue to cooperate closely with the mlir-hlo project over\nthe longer term.\n\n### Roadmap with Torch-MLIR Project\n\nBladeDISC compiles PyTorch workloads based on [Torch-MLIR](https://github.com/llvm/torch-mlir/).\nThe BladeDISC Dev Team is cooperating with the community to add Torch-to-MHLO conversion\nto Torch-MLIR, with a particular focus on fully dynamic shape support.\nSee the RFC: https://github.com/llvm/torch-mlir/issues/999.\nCommunity developers interested in this effort are welcome to join us.\n\n## Contact Us\n\n* Mailgroup: bladedisc-dev@list.alibaba-inc.com\n\n* DingTalk group for support and discussion:\n\n![DingTalk](./docs/pics/dingtalk_support.png)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falibaba%2Fbladedisc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falibaba%2Fbladedisc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falibaba%2Fbladedisc/lists"}