https://github.com/microsoft/moonlit
This is a collection of our research on efficient AI, covering hardware-aware NAS and model compression.
inference-efficiency model-compression neural-architecture-search token-pruning
- Host: GitHub
- URL: https://github.com/microsoft/moonlit
- Owner: microsoft
- License: mit
- Created: 2023-05-26T03:49:08.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-25T23:47:44.000Z (8 months ago)
- Last Synced: 2025-04-07T05:13:21.273Z (2 months ago)
- Topics: inference-efficiency, model-compression, neural-architecture-search, token-pruning
- Language: Python
- Homepage:
- Size: 12 MB
- Stars: 81
- Watchers: 5
- Forks: 7
- Open Issues: 7
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
- Support: SUPPORT.md
README
# Moonlit: Research for enhancing AI models' efficiency and performance.
**Moonlit** is a collection of our model compression work for efficient AI.
> [**ToP**](./ToP) (```@KDD'23```): [**Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference**](https://arxiv.org/abs/2306.14393)
>>**ToP** is a constraint-aware and ranking-distilled token pruning method that selectively removes unnecessary tokens as the input sequence passes through layers, allowing the model to improve online inference speed while preserving accuracy. A toy sketch of layer-wise token pruning appears at the end of this README.
>
> [**SpaceEvo**](./SpaceEvo) (```@ICCV'23```): [**SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference**](https://arxiv.org/abs/2303.08308)
>>**SpaceEvo** is an automatic method for designing a dedicated, quantization-friendly search space for target hardware; a toy sketch of the search-space evolution idea appears at the end of this README. This work is featured on the Microsoft Research blog: [Efficient and hardware-friendly neural architecture search with SpaceEvo](https://www.microsoft.com/en-us/research/blog/efficient-and-hardware-friendly-neural-architecture-search-with-spaceevo/)
>
> [**ElasticViT**](./ElasticViT) (```@ICCV'23```): [**ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices**](https://arxiv.org/abs/2303.09730)
>>**ElasticViT** is a two-stage NAS approach that trains a high-quality ViT supernet over a very large search space to cover a wide range of mobile devices, and then searches for an optimal sub-network (subnet) for direct deployment; a toy sketch of this supernet-then-subnet workflow appears at the end of this README.
>
> [**LitePred**](./LitePred/) (```@NSDI'24```): [**LitePred: Transferable and Scalable Latency Prediction for Hardware-Aware Neural Architecture Search**]()
>>**LitePred** is a lightweight, transferable approach for accurately predicting DNN inference latency. Instead of training a latency predictor from scratch, LitePred is the first to transfer pre-existing latency predictors and achieve accurate prediction on new edge platforms with a profiling cost of less than 1 hour; a toy sketch of this predictor-transfer idea appears at the end of this README.
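
The sketches below are illustrative only; they are not the code in this repository. First, a minimal sketch of layer-wise token pruning in the spirit of **ToP**: the attention-derived importance score and the per-layer `keep_ratio` schedule are assumptions made for the example, not ToP's ranking-distilled criterion.

```python
# Illustrative layer-wise token pruning (not the actual ToP implementation).
import torch

def prune_tokens(hidden, attn, keep_ratio):
    """Keep the most important tokens, measured here by attention received."""
    # hidden: (batch, seq_len, dim); attn: (batch, heads, seq_len, seq_len)
    importance = attn.mean(dim=1).mean(dim=1)                 # (batch, seq_len)
    k = max(1, int(importance.size(1) * keep_ratio))
    keep_idx = importance.topk(k, dim=1).indices.sort(dim=1).values  # preserve token order
    batch_idx = torch.arange(hidden.size(0)).unsqueeze(1)
    return hidden[batch_idx, keep_idx]                        # (batch, k, dim)

# Toy usage: progressively shrink the sequence as it passes through layers.
hidden = torch.randn(2, 128, 64)
for keep_ratio in (1.0, 0.75, 0.5):                           # hypothetical per-layer schedule
    attn = torch.softmax(torch.randn(2, 8, hidden.size(1), hidden.size(1)), dim=-1)
    hidden = prune_tokens(hidden, attn, keep_ratio)
print(hidden.shape)                                           # torch.Size([2, 48, 64])
```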
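
Next, a hedged toy of the search-space evolution idea behind **SpaceEvo**, not its actual algorithm: a "space" here is just a set of allowed channel widths, and the fitness function (rewarding widths a hypothetical INT8 accelerator prefers) is invented for illustration.

```python
# Toy evolution of a search-space definition (not SpaceEvo's real scoring).
import random

WIDTH_CHOICES = list(range(16, 257, 8))          # hypothetical channel-width candidates

def fitness(space):
    """Made-up proxy: reward widths divisible by 32 (assumed INT8-friendly on the
    target hardware) while keeping the space from collapsing to a few options."""
    hw_friendly = sum(1 for w in space if w % 32 == 0) / len(space)
    capacity = min(len(space), 8) / 8
    return hw_friendly + 0.5 * capacity

def mutate(space):
    space = set(space)
    if random.random() < 0.5 and len(space) > 2:
        space.remove(random.choice(sorted(space)))   # shrink the space
    else:
        space.add(random.choice(WIDTH_CHOICES))      # grow the space
    return tuple(sorted(space))

# Evolve the search-space definition itself (not a single architecture).
population = [tuple(sorted(random.sample(WIDTH_CHOICES, 6))) for _ in range(16)]
for _ in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:8]
    population = parents + [mutate(random.choice(parents)) for _ in range(8)]

population.sort(key=fitness, reverse=True)
print("evolved width choices:", population[0])
```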
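
A toy two-stage weight-sharing NAS loop in the spirit of **ElasticViT**'s supernet-then-subnet workflow, using an elastic-width MLP rather than a ViT supernet; the width choices and the latency proxy are assumptions for the example.

```python
# Toy supernet training + subnet search (not the ElasticViT supernet).
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class ElasticMLP(nn.Module):
    """Weight-sharing 'supernet': a subnet of width w reuses the first w hidden units."""
    def __init__(self, in_dim=16, max_hidden=64, out_dim=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, max_hidden)
        self.fc2 = nn.Linear(max_hidden, out_dim)

    def forward(self, x, width):
        h = F.relu(F.linear(x, self.fc1.weight[:width], self.fc1.bias[:width]))
        return F.linear(h, self.fc2.weight[:, :width], self.fc2.bias)

widths = (16, 32, 48, 64)                        # hypothetical elastic search space
net = ElasticMLP()
opt = torch.optim.SGD(net.parameters(), lr=0.1)

# Stage 1: supernet training -- sample a random subnet at every step.
for _ in range(200):
    x, y = torch.randn(32, 16), torch.randint(0, 10, (32,))
    loss = F.cross_entropy(net(x, random.choice(widths)), y)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: search -- among subnets meeting a (made-up) latency budget, keep the
# one with the best validation accuracy, then deploy it directly.
def proxy_latency(width):
    return width / 64.0                          # stand-in for on-device measurement

val_x, val_y = torch.randn(256, 16), torch.randint(0, 10, (256,))
with torch.no_grad():
    feasible = [w for w in widths if proxy_latency(w) <= 0.6]
    best = max(feasible, key=lambda w: (net(val_x, w).argmax(1) == val_y).float().mean().item())
print("selected subnet width:", best)
```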
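
Finally, a toy sketch of latency-predictor transfer in the spirit of **LitePred**: pretrain a small regressor on abundant profiles from a source device, then fine-tune it on a small profiled sample set from a new platform. The 8-dimensional model-config encoding and the synthetic data are assumptions for the example.

```python
# Toy latency-predictor transfer via fine-tuning (not the LitePred pipeline).
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_predictor():
    # Maps an assumed 8-dim model-config encoding to a latency estimate.
    return nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))

def fit(model, feats, latency, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = F.mse_loss(model(feats).squeeze(-1), latency)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

# "Pre-existing" predictor: trained on abundant profiles from a source device.
src_feats, src_lat = torch.rand(5000, 8), torch.rand(5000)
predictor = fit(make_predictor(), src_feats, src_lat, epochs=200, lr=1e-3)

# New edge platform: only a small, cheap-to-collect profiling set is available,
# so transfer the predictor by fine-tuning instead of training from scratch.
new_feats, new_lat = torch.rand(100, 8), torch.rand(100)
predictor = fit(predictor, new_feats, new_lat, epochs=50, lr=1e-4)
print("predicted latency:", predictor(torch.rand(1, 8)).item())
```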