{"id":13441331,"url":"https://github.com/mit-han-lab/efficientvit","last_synced_at":"2025-04-25T14:40:25.976Z","repository":{"id":152147623,"uuid":"624184221","full_name":"mit-han-lab/efficientvit","owner":"mit-han-lab","description":"Efficient vision foundation models for high-resolution generation and perception.","archived":false,"fork":false,"pushed_at":"2025-04-24T00:37:28.000Z","size":217253,"stargazers_count":2818,"open_issues_count":103,"forks_count":218,"subscribers_count":40,"default_branch":"master","last_synced_at":"2025-04-24T01:28:12.441Z","etag":null,"topics":["deep-compression-autoencoder","efficient-diffusion-model","efficientvit","high-resolution","imagenet","segment-anything","segmentation","vision-transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mit-han-lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-05T23:30:46.000Z","updated_at":"2025-04-24T00:37:31.000Z","dependencies_parsed_at":"2023-10-03T11:32:40.766Z","dependency_job_id":"681ac583-fc3c-441e-8005-f337572fa51b","html_url":"https://github.com/mit-han-lab/efficientvit","commit_stats":{"total_commits":101,"total_committers":13,"mean_commits":7.769230769230769,"dds":0.4356435643564357,"last_synced_commit":"9569a322b617668f35fc3c1eb0bb6b029e4ab934"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-han-lab%2Fefficientvit","tags_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/mit-han-lab%2Fefficientvit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-han-lab%2Fefficientvit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-han-lab%2Fefficientvit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mit-han-lab","download_url":"https://codeload.github.com/mit-han-lab/efficientvit/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250544982,"owners_count":21448147,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-compression-autoencoder","efficient-diffusion-model","efficientvit","high-resolution","imagenet","segment-anything","segmentation","vision-transformer"],"created_at":"2024-07-31T03:01:32.666Z","updated_at":"2025-04-24T04:01:15.378Z","avatar_url":"https://github.com/mit-han-lab.png","language":"Python","funding_links":[],"categories":["Python","⚡ Efficient Mobile Models","Paper List"],"sub_categories":["🚀 Backbone Networks","Follow-up Papers"],"readme":"# Efficient Vision Foundation Models for High-Resolution Generation and Perception\n\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/deep-compression-autoencoder-for-efficient/image-generation-on-imagenet-512x512)](https://paperswithcode.com/sota/image-generation-on-imagenet-512x512?p=deep-compression-autoencoder-for-efficient)\n\n## News\n- (🔥 New) [2025/01/24] We released DC-AE-SANA-1.1: 
[doc](https://github.com/mit-han-lab/efficientvit/blob/master/assets/docs/dc_ae_sana_1.1.md). \n- (🔥 New) [2025/01/23] DC-AE and SANA are accepted by ICLR 2025.\n- (🔥 New) [2025/01/14] We released **DC-AE+USiT models**: [model](https://huggingface.co/collections/mit-han-lab/dc-ae-diffusion-670dbb8d6b6914cf24c1a49d), [training](https://github.com/mit-han-lab/efficientvit/blob/master/applications/dc_ae/README.md#dc-ae--usit). Using the default training settings and sampling strategy, DC-AE+USiT-2B achieves 1.72 FID on ImageNet 512x512, surpassing the SOTA diffusion model EDM2-XXL and SOTA auto-regressive image generative models (MAGVIT-v2 and MAR-L).\n\n______________________________________________________________________\n\n- (🔥 New) [2024/12/24] **diffusers** supports DC-AE models. All [DC-AE models in diffusers safetensors](https://huggingface.co/collections/mit-han-lab/dc-ae-670085b9400ad7197bb1009b) are released. [Usage](https://github.com/mit-han-lab/efficientvit/tree/master/applications/dc_ae#deep-compression-autoencoder-diffusers).\n- [2024/10/21] DC-AE and EfficientViT block are used in our latest text-to-image diffusion model SANA! 
Check the [project page](https://nvlabs.github.io/Sana/) for more details.\n- [2024/10/15] We released **Deep Compression Autoencoder (DC-AE)**: [link](#deep-compression-autoencoder-for-efficient-high-resolution-diffusion-models-paper-readme)!\n- [2024/07/10] EfficientViT is used as the backbone in [Grounding DINO 1.5 Edge](https://arxiv.org/pdf/2405.10300) for efficient open-set object detection.\n- [2024/07/10] EfficientViT-SAM is used in [MedficientSAM](https://github.com/hieplpvip/medficientsam), the 1st place model in [CVPR 2024 Segment Anything In Medical Images On Laptop Challenge](https://www.codabench.org/competitions/1847/).\n- [2024/04/06] EfficientViT-SAM is accepted by [eLVM@CVPR'24](https://sites.google.com/view/elvm/home?authuser=0).\n- [2024/03/19] Online demo of EfficientViT-SAM is available: [https://evitsam.hanlab.ai/](https://evitsam.hanlab.ai/). \n- [2024/02/07] We released [EfficientViT-SAM](https://arxiv.org/abs/2402.05008), the first accelerated SAM model that matches/outperforms SAM-ViT-H's zero-shot performance, delivering the SOTA performance-efficiency trade-off.\n- [2023/11/20] EfficientViT is available in the [NVIDIA Jetson Generative AI Lab](https://www.jetson-ai-lab.com/tutorial_efficientvit.html).\n- [2023/09/12] EfficientViT is highlighted by [MIT home page](https://www.mit.edu/archive/spotlight/efficient-computer-vision/) and [MIT News](https://news.mit.edu/2023/ai-model-high-resolution-computer-vision-0912).\n- [2023/07/18] EfficientViT is accepted by ICCV 2023.\n\n## Content\n\n### [ICLR 2025] Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models [[paper](https://arxiv.org/abs/2410.10733)] [[readme](applications/dc_ae/README.md)] [[poster](assets/dc_ae_poster.pdf)]\n\n**Deep Compression Autoencoder (DC-AE) is a new family of high-spatial compression autoencoders with a spatial compression ratio of up to 128 while maintaining reconstruction quality. 
It accelerates all latent diffusion models regardless of the diffusion model architecture.**\n\n#### Demo\n\n![demo](https://huggingface.co/mit-han-lab/dc-ae-f64c128-in-1.0/resolve/main/assets/dc_ae_demo.gif)\n\u003cp align=\"center\"\u003e\n\u003cb\u003e Figure 1: We address the reconstruction accuracy drop of high spatial-compression autoencoders.\u003c/b\u003e\n\u003c/p\u003e\n\n![demo](assets/dc_ae_diffusion_demo.gif)\n\u003cp align=\"center\"\u003e\n\u003cb\u003e Figure 2: DC-AE speeds up latent diffusion models.\u003c/b\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://huggingface.co/mit-han-lab/dc-ae-f64c128-in-1.0/resolve/main/assets/dc_ae_sana.jpg\"  width=\"1200\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cb\u003e Figure 3: DC-AE enables efficient text-to-image generation on a laptop: \u003ca href=\"https://nvlabs.github.io/Sana/\"\u003eSANA\u003c/a\u003e.\u003c/b\u003e\n\u003c/p\u003e\n\n- [Usage of Deep Compression Autoencoder](applications/dc_ae/README.md#deep-compression-autoencoder)\n- [Usage of DC-AE-Diffusion](applications/dc_ae/README.md#efficient-diffusion-models-with-dc-ae)\n- [Evaluate Deep Compression Autoencoder](applications/dc_ae/README.md#evaluate-deep-compression-autoencoder)\n- [Demo DC-AE-Diffusion Models](applications/dc_ae/README.md#demo-dc-ae-diffusion-models)\n- [Evaluate DC-AE-Diffusion Models](applications/dc_ae/README.md#evaluate-dc-ae-diffusion-models)\n- [Train DC-AE-Diffusion Models](applications/dc_ae/README.md#train-dc-ae-diffusion-models)\n- [Reference](applications/dc_ae/README.md#reference)\n\n### [CVPR 2024 eLVM Workshop] EfficientViT-SAM: Accelerated Segment Anything Model Without Accuracy Loss [[paper](https://arxiv.org/abs/2402.05008)] [[online demo](https://evitsam.hanlab.ai/)] [[readme](applications/efficientvit_sam/README.md)]\n\n**EfficientViT-SAM is a new family of accelerated segment anything models obtained by replacing SAM's heavy image encoder with EfficientViT. 
It delivers a 48.9x measured TensorRT speedup on A100 GPU over SAM-ViT-H without sacrificing accuracy.**\n\n\u003cp align=\"left\"\u003e\n\u003cimg src=\"https://huggingface.co/mit-han-lab/efficientvit-sam/resolve/main/sam_zero_shot_coco_mAP.png\"  width=\"500\"\u003e\n\u003c/p\u003e\n\n- [Pretrained EfficientViT-SAM Models](applications/efficientvit_sam/README.md#pretrained-efficientvit-sam-models)\n- [Usage of EfficientViT-SAM](applications/efficientvit_sam/README.md#usage)\n- [Evaluate EfficientViT-SAM](applications/efficientvit_sam/README.md#evaluation)\n- [Visualize EfficientViT-SAM](applications/efficientvit_sam/README.md#visualization)\n- [Deploy EfficientViT-SAM](applications/efficientvit_sam/README.md#deployment)\n- [Train EfficientViT-SAM](applications/efficientvit_sam/README.md#training)\n- [Reference](applications/efficientvit_sam/README.md#reference)\n\n### [ICCV 2023] EfficientViT-Classification [[paper](https://arxiv.org/abs/2205.14756)] [[readme](applications/efficientvit_cls/README.md)]\n\n**Efficient image classification models with EfficientViT backbones.**\n\n\u003cp align=\"left\"\u003e\n\u003cimg src=\"https://huggingface.co/han-cai/efficientvit-cls/resolve/main/efficientvit_cls_results.png\"  width=\"600\"\u003e\n\u003c/p\u003e\n\n- [Pretrained EfficientViT Classification Models](applications/efficientvit_cls/README.md#pretrained-efficientvit-classification-models)\n- [Usage of EfficientViT Classification Models](applications/efficientvit_cls/README.md#usage)\n- [Evaluate EfficientViT Classification Models](applications/efficientvit_cls/README.md#evaluation)\n- [Export EfficientViT Classification Models](applications/efficientvit_cls/README.md#export)\n- [Train EfficientViT Classification Models](applications/efficientvit_cls/README.md#training)\n- [Reference](applications/efficientvit_cls/README.md#reference)\n\n### [ICCV 2023] EfficientViT-Segmentation [[paper](https://arxiv.org/abs/2205.14756)] 
[[readme](applications/efficientvit_seg/README.md)]\n\n**Efficient semantic segmentation models with EfficientViT backbones.**\n\n![demo](assets/cityscapes_l1.gif)\n\n- [Pretrained EfficientViT Segmentation Models](applications/efficientvit_seg/README.md#pretrained-efficientvit-segmentation-models)\n- [Usage of EfficientViT Segmentation Models](applications/efficientvit_seg/README.md#usage)\n- [Evaluate EfficientViT Segmentation Models](applications/efficientvit_seg/README.md#evaluation)\n- [Visualize EfficientViT Segmentation Models](applications/efficientvit_seg/README.md#visualization)\n- [Export EfficientViT Segmentation Models](applications/efficientvit_seg/README.md#export)\n- [Reference](applications/efficientvit_seg/README.md#reference)\n\n### EfficientViT-GazeSAM [[readme](applications/efficientvit_gazesam/README.md)]\n\n**Gaze-prompted image segmentation models capable of running in real time with TensorRT on an NVIDIA RTX 4070.**\n\n![GazeSAM demo](https://huggingface.co/mit-han-lab/efficientvit-sam/resolve/main/gazesam/efficientvit_gazesam_demo.gif)\n\n## Getting Started\n\n```bash\nconda create -n efficientvit python=3.10\nconda activate efficientvit\npip install -U -r requirements.txt\n```\n\n## Third-Party Implementation/Integration\n\n- [NVIDIA Jetson Generative AI Lab](https://www.jetson-ai-lab.com/tutorial_efficientvit.html)\n- [timm](https://github.com/huggingface/pytorch-image-models): [link](https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/efficientvit_mit.py)\n- [X-AnyLabeling](https://github.com/CVHub520/X-AnyLabeling): [link](https://github.com/CVHub520/X-AnyLabeling/blob/main/anylabeling/services/auto_labeling/efficientvit_sam.py)\n- [Grounding DINO 1.5 Edge](https://github.com/IDEA-Research/Grounding-DINO-1.5-API): [link](https://arxiv.org/pdf/2405.10300)\n\n## Contact\n\n[Han Cai](http://hancai.ai/)\n\n## Reference\n\nIf EfficientViT or EfficientViT-SAM or DC-AE is useful or relevant to your research, please 
kindly recognize our contributions by citing our paper:\n\n```bibtex\n@inproceedings{cai2023efficientvit,\n  title={Efficientvit: Lightweight multi-scale attention for high-resolution dense prediction},\n  author={Cai, Han and Li, Junyan and Hu, Muyan and Gan, Chuang and Han, Song},\n  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},\n  pages={17302--17313},\n  year={2023}\n}\n```\n\n```bibtex\n@article{zhang2024efficientvit,\n  title={EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss},\n  author={Zhang, Zhuoyang and Cai, Han and Han, Song},\n  journal={arXiv preprint arXiv:2402.05008},\n  year={2024}\n}\n```\n\n```bibtex\n@article{chen2024deep,\n  title={Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models},\n  author={Chen, Junyu and Cai, Han and Chen, Junsong and Xie, Enze and Yang, Shang and Tang, Haotian and Li, Muyang and Lu, Yao and Han, Song},\n  journal={arXiv preprint arXiv:2410.10733},\n  year={2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmit-han-lab%2Fefficientvit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmit-han-lab%2Fefficientvit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmit-han-lab%2Fefficientvit/lists"}