awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
https://github.com/merrymercy/awesome-tensor-compilers
Papers
Sparse
- The Sparse Abstract Machine
- SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning
- Looplets: A Language For Structured Coiteration
- Code Synthesis for Sparse Tensor Format Conversion and Optimization
- Stardust: Compiling Sparse Tensor Algebra to a Reconfigurable Dataflow Architecture
- SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute
- Compiler Support for Sparse Tensor Computations in MLIR
- A High Performance Sparse Tensor Algebra Compiler in MLIR - HPC 2021
- Dynamic Sparse Tensor Algebra Compilation
- TIRAMISU: A Polyhedral Compiler for Dense and Sparse Deep Learning
- The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code
- ParSy: Inspection and Transformation of Sparse Matrix Computations for Parallelism
- A Framework for Sparse Matrix Code Synthesis from High-level Specifications
- Automatic Nonzero Structure Analysis
- SIPR: A New Framework for Generating Efficient Code for Sparse Matrix Computations
- Automatic Data Structure Selection and Transformation for Sparse Matrix Computations
- The Tensor Algebra Compiler
- Unified Compilation for Lossless Compression and Sparse Computing
- Taichi: A Language for High-Performance Computation on Spatially Sparse Data Structures
- Sympiler: Transforming Sparse Matrix Codes by Decoupling Symbolic Analysis
- WACO: Learning Workload-Aware Co-optimization of the Format and Schedule of a Sparse Tensor Program
- Format Abstraction for Sparse Tensor Algebra Compilers
- Compilation of Sparse Array Programming Models
- SparseLNR: Accelerating Sparse Tensor Computations Using Loop Nest Restructuring
- Next-generation Generic Programming and its Application to Sparse Matrix Computations
Cost Model
- Expedited Tensor Program Compilation Based on LightGBM
- TLP: A Deep Learning-based Cost Model for Tensor Program Tuning
- An Asymptotic Cost Model for Autoscheduling Sparse Tensor Programs
- TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers
- A Deep Learning Based Cost Model for Automatic Code Optimization
- A Learned Performance Model for the Tensor Processing Unit
- DYNATUNE: Dynamic Tensor Program Optimization in Deep Neural Network Compilation
- MetaTune: Meta-Learning Based Cost Model for Fast and Efficient Auto-tuning Frameworks
Compiler and IR Design
- Composable and Modular Code Generation in MLIR: A Structured and Retargetable Approach to Tensor Compiler Construction
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
- MLIR: Scaling Compiler Infrastructure for Domain Specific Computation
- A Tensor Compiler for Unified Machine Learning Prediction Serving
- Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks
- Stateful Dataflow Multigraphs: A Data-Centric Model for Performance Portability on Heterogeneous Architectures - Ben-Nun et al., SC 2019
- Tiramisu: A polyhedral compiler for expressing fast and portable code
- Relay: A High-Level Compiler for Deep Learning
- TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
- Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
- Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning
- Glow: Graph Lowering Compiler Techniques for Neural Networks
- DLVM: A modern compiler infrastructure for deep learning systems
- Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs
- TensorIR: An Abstraction for Automatic Tensorized Program Optimization
- DaCeML: A Data-Centric Compiler for Machine Learning
- Roller: Fast and Efficient Tensor Compilation for Deep Learning
- BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach
- Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines - Ragan-Kelley et al., PLDI 2013
- Exocompilation for Productive Programming of Hardware Accelerators
- TASO: The Tensor Algebra SuperOptimizer for Deep Learning
Auto-tuning and Auto-scheduling
- Tensor Program Optimization with Probabilistic Programs
- Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance
- Value Learning for Throughput Optimization of Deep Neural Networks
- A Flexible Approach to Autotuning Multi-Pass Machine Learning Compilers
- Ansor: Generating High-Performance Tensor Programs for Deep Learning
- ProTuner: Tuning Programs with Monte Carlo Tree Search - Haj-Ali et al., arXiv 2020
- AdaTune: Adaptive tensor program compilation made efficient
- Optimizing the Memory Hierarchy by Compositing Automatic Transformations on Computations and Data
- Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation
- A Sparse Iteration Space Transformation Framework for Sparse Tensor Algebra
- Learning to Optimize Halide with Tree Search and Random Programs
- Learning to Optimize Tensor Programs
- Automatically Scheduling Halide Image Processing Pipelines
- Autoscheduling for sparse tensor algebra with an asymptotic cost model
- One-shot tuner for deep learning compilers
- Enabling Tensor Language Model to Assist in Generating High-Performance Tensor Programs for Deep Learning
CPU and GPU Optimization
- DeepCuts: A deep learning optimization framework for versatile GPU workloads
- UNIT: Unifying Tensorized Instruction Compilation
- PolyDL: Polyhedral Optimizations for Creation of High-Performance DL Primitives
- Automatic Kernel Generation for Volta Tensor Cores
- Optimizing CNN Model Inference on CPUs
- Swizzle Inventor: Data Movement Synthesis for GPU Kernels
- Analytical cache modeling and tilesize optimization for tensor contractions
NPU Optimization
Graph-level Optimization
- POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging
- Collage: Seamless Integration of Deep Learning Backends with Automatic Placement
- Apollo: Automatic Partition-based Operator Fusion through Layer by Layer Optimization
- IOS: An Inter-Operator Scheduler for CNN Acceleration
- Transferable Graph Optimizers for ML Compilers
- FusionStitching: Boosting Memory Intensive Computations for Deep Learning Workloads
- Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning
- Optimizing DNN Computation Graph using Graph Substitutions
Dynamic Model
- Axon: A Language for Dynamic Shapes in Deep Learning Graphs
- DietCode: Automatic Optimization for Dynamic Tensor Programs
- The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding
- Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference
- DISC: A Dynamic Shape Compiler for Machine Learning Workloads
- Cortex: A Compiler for Recursive Deep Learning Models
Graph Neural Networks
Distributed Computing
- SpDISTAL: Compiling Distributed Sparse Tensor Computations
- Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
- Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization
- Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning
- DISTAL: The Distributed Tensor Algebra Compiler
- GSPMD: General and Scalable Parallelization for ML Computation Graphs
- Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads
- OneFlow: Redesign the Distributed Deep Learning Framework from Scratch
- Beyond Data and Model Parallelism for Deep Neural Networks
Quantization
Program Rewriting
Verification and Testing
Survey
Open Source Projects
- Glow: Compiler for Neural Network Hardware Accelerators
- nnfusion: A Flexible and Efficient Deep Neural Network Compiler
- Hummingbird: Compiling Trained ML Models into Tensor Computation
- AITemplate: A Python framework which renders neural network into high performance CUDA/HIP C++ code
- Hidet: A Compilation-based Deep Learning Framework
- TensorComprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
- PlaidML: A Platform for Making Deep Learning Work Everywhere
- BladeDISC: An End-to-End DynamIc Shape Compiler for Machine Learning Workloads
- Nebulgym: Easy-to-use Library to Accelerate AI Training
- DaCeML: A Data-Centric Compiler for Machine Learning
- Mirage: A Multi-level Superoptimizer for Tensor Algebra
- TVM: An End to End Machine Learning Compiler Framework
- MLIR: Multi-Level Intermediate Representation
- XLA: Optimizing Compiler for Machine Learning
- Halide: A Language for Fast, Portable Computation on Images and Tensors
- Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code
- TACO: The Tensor Algebra Compiler
- Speedster: Automatically apply SOTA optimization techniques to achieve the maximum inference speed-up on your hardware
- NN-512: A Compiler That Generates C99 Code for Neural Net Inference
- Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations
Tutorials