{"id":13442579,"url":"https://github.com/mit-han-lab/mcunet","last_synced_at":"2025-05-13T18:38:11.841Z","repository":{"id":37332505,"uuid":"503454474","full_name":"mit-han-lab/mcunet","owner":"mit-han-lab","description":"[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning","archived":false,"fork":false,"pushed_at":"2024-03-29T18:17:37.000Z","size":12581,"stargazers_count":556,"open_issues_count":26,"forks_count":94,"subscribers_count":22,"default_branch":"master","last_synced_at":"2025-04-20T23:47:04.099Z","etag":null,"topics":["deep-learning","microncontroller","neural-architecture-search","pytorch","tinyml"],"latest_commit_sha":null,"homepage":"https://mcunet.mit.edu","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mit-han-lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-06-14T17:19:15.000Z","updated_at":"2025-04-18T01:56:44.000Z","dependencies_parsed_at":"2024-03-29T19:30:47.682Z","dependency_job_id":"dfac4cd4-336a-4a90-b835-0cdc68147f36","html_url":"https://github.com/mit-han-lab/mcunet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-han-lab%2Fmcunet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-han-lab%2Fmcunet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-han-lab%2Fmcunet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/mit-han-lab%2Fmcunet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mit-han-lab","download_url":"https://codeload.github.com/mit-han-lab/mcunet/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254004732,"owners_count":21998115,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","microncontroller","neural-architecture-search","pytorch","tinyml"],"created_at":"2024-07-31T03:01:47.530Z","updated_at":"2025-05-13T18:38:11.802Z","avatar_url":"https://github.com/mit-han-lab.png","language":"Python","funding_links":[],"categories":["Python","其他_机器学习与深度学习","🎯 **TinyML \u0026 MCU-specific Advances**"],"sub_categories":["🧠 **MCUNet Series** - MIT HAN Lab"],"readme":"# MCUNet: Tiny Deep Learning on IoT Devices \n\nThis is the official implementation of the MCUNet series.\n\n###  [TinyML Project Website](https://hanlab.mit.edu/projects/tinyml) | [MCUNetV1](https://hanlab.mit.edu/projects/mcunet) | [MCUNetV2](https://hanlab.mit.edu/projects/mcunetv2) | [MCUNetV3](https://hanlab.mit.edu/projects/mcunetv3)\n\n![demo](assets/figures/mcunet_demo.gif)\n\n## News\n\n**If you are interested in getting updates, please sign up [here](https://forms.gle/UW1uUmnfk1k6UJPPA) to get notified!**\n\n- **(2024/03)** We release a [new demo video](https://www.youtube.com/watch?v=0pUFZYdoMY8) of [On-Device Training Under 256KB Memory](https://arxiv.org/abs/2206.15472).\n- **(2023/10)** [Tiny Machine Learning: Progress and Futures 
\\[Feature\\]](https://hanlab.mit.edu/projects/tinyml-magazine) appears at IEEE CAS Magazine.\n- **(2022/12)** We simplified the `net_id` of models (new version: `mcunet-in0`, `mcunet-vww1`, etc.) for an upcoming review paper (stay tuned!).\n- **(2022/10)** Our new work [On-Device Training Under 256KB Memory](https://arxiv.org/abs/2206.15472) is highlighted on the [MIT homepage](http://web.mit.edu/spotlight/learning-edge/)!\n- **(2022/09)** Our new work [On-Device Training Under 256KB Memory](https://arxiv.org/abs/2206.15472) is accepted to NeurIPS 2022! It enables tiny on-device training for IoT devices. \n- **(2022/08)** We release the source code of **TinyEngine** in [this repo](https://github.com/mit-han-lab/tinyengine). Please take a look!\n- **(2022/08)** Our new course on **TinyML and Efficient Deep Learning** will be released soon in September 2022: [efficientml.ai](https://efficientml.ai/).\n- **(2022/07)** We also include the person detection model used in the video demo above. We will also include the deployment code in TinyEngine release. 
\n- **(2022/06)** We refactor the MCUNet repo as a standalone repo (previous repo: https://github.com/mit-han-lab/tinyml)\n- **(2021/10)** **MCUNetV2** is accepted to NeurIPS 2021: https://arxiv.org/abs/2110.15352 !\n- **(2020/10)** **MCUNet** is accepted to NeurIPS 2020 as **spotlight**: https://arxiv.org/abs/2007.10319 !\n- Our projects are covered by: [MIT News](https://news.mit.edu/2020/iot-deep-learning-1113), [MIT News (v2)](https://news.mit.edu/2021/tiny-machine-learning-design-alleviates-bottleneck-memory-usage-iot-devices-1208), [WIRED](https://www.wired.com/story/ai-algorithms-slimming-fit-fridge/), [Morning Brew](https://www.morningbrew.com/emerging-tech/stories/2020/12/07/researchers-figured-fit-ai-ever-onto-internet-things-microchips), [Stacey on IoT](https://staceyoniot.com/researchers-take-a-3-pronged-approach-to-edge-ai/), [Analytics Insight](https://www.analyticsinsight.net/amalgamating-ml-and-iot-in-smart-home-devices/), [Techable](https://techable.jp/archives/142462), etc.\n\n\n## Overview\n\nMicrocontrollers are low-cost, low-power hardware. They are widely deployed and have wide applications.\n\n![teaser](assets/figures/applications.png)\n\nBut the tight memory budget (50,000x smaller than GPUs) makes deep learning deployment difficult.\n\n![teaser](assets/figures/memory_size.png)\n\nMCUNet is a **system-algorithm co-design** framework for tiny deep learning on microcontrollers. It consists of **TinyNAS** and **TinyEngine**. They are co-designed to fit the tight memory budgets.\n\nWith system-algorithm co-design, we can significantly improve the deep learning performance on the same tiny memory budget.\n\n![teaser](assets/figures/overview.png)\n\nOur **TinyEngine** inference engine could be a useful infrastructure for MCU-based AI applications. 
It significantly **improves the inference speed and reduces the memory usage** compared to existing libraries like [TF-Lite Micro](https://www.tensorflow.org/lite/microcontrollers), [CMSIS-NN](https://arxiv.org/abs/1801.06601), [MicroTVM](https://tvm.apache.org/2020/06/04/tinyml-how-tvm-is-taming-tiny), etc. It improves the inference speed by **1.5-3x**, and reduces the peak memory by **2.7-4.8x**.\n\n![teaser](assets/figures/latency_mem.png)\n\n\n\n## Model Zoo\n\n### Usage\n\nYou can build the pre-trained PyTorch `fp32` model or the `int8` quantized model in TF-Lite format.\n\n```python\nfrom mcunet.model_zoo import net_id_list, build_model, download_tflite\nprint(net_id_list)  # the list of models in the model zoo\n\n# pytorch fp32 model\nmodel, image_size, description = build_model(net_id=\"mcunet-in3\", pretrained=True)  # you can replace net_id with any other option from net_id_list\n\n# download tflite file to tflite_path\ntflite_path = download_tflite(net_id=\"mcunet-in3\")\n```\n\n\n### Evaluate\n\nTo evaluate the accuracy of PyTorch `fp32` models, run:\n\n```bash\npython eval_torch.py --net_id mcunet-in2 --dataset {imagenet/vww} --data-dir PATH/TO/DATA/val\n```\n\nTo evaluate the accuracy of TF-Lite `int8` models, run:\n\n```bash\npython eval_tflite.py --net_id mcunet-in2 --dataset {imagenet/vww} --data-dir PATH/TO/DATA/val\n```\n\n### Model List\n\n- Note that all the **latency**, **SRAM**, and **Flash** usage are profiled with **TinyEngine** on STM32F746.\n- Here we only provide the `int8` quantized models. `int4` quantized models (as shown in the paper) can further push the accuracy-memory trade-off, but lack general format support.\n- For accuracy (top-1, top-5), we report the accuracy of `fp32`/`int8` models respectively.\n\nThe **ImageNet** model list:\n\n| net_id              | MACs   | #Params | SRAM  | Flash  | Res. 
| Top-1\u003cbr /\u003e(fp32/int8) | Top-5\u003cbr /\u003e(fp32/int8) |\n| ------------------- | ------ | ------- | ----- | ------ | ---- | ---------------------- | ---------------------- |\n| *# baseline models* |        |         |       |        |      |                        |                        |\n| mbv2-w0.35          | 23.5M  | 0.75M   | 308kB | 862kB  | 144  | 49.7%/49.0%            | 74.6%/73.8%            |\n| proxyless-w0.3      | 38.3M  | 0.75M   | 292kB | 892kB  | 176  | 57.0%/56.2%            | 80.2%/79.7%            |\n| *# mcunet models*   |        |         |       |        |      |                        |                        |\n| mcunet-in0          | 6.4M   | 0.75M   | 266kB | 889kB  | 48   | 41.5%/40.4%            | 66.3%/65.2%            |\n| mcunet-in1          | 12.8M  | 0.64M   | 307kB | 992kB  | 96   | 51.5%/49.9%            | 75.5%/74.1%            |\n| mcunet-in2          | 67.3M  | 0.73M   | 242kB | 878kB  | 160  | 60.9%/60.3%            | 83.3%/82.6%            |\n| mcunet-in3          | 81.8M  | 0.74M   | 293kB | 897kB  | 176  | 62.2%/61.8%            | 84.5%/84.2%            |\n| mcunet-in4          | 125.9M | 1.73M   | 456kB | 1876kB | 160  | 68.4%/68.0%            | 88.4%/88.1%            |\n\nThe **VWW** model list:\n\n*Note that the VWW dataset might be hard to prepare. 
You can download our pre-built `minival` set from [here](https://www.dropbox.com/s/bc7qi89ezra9711/vww-minival.tar?dl=0), around 380MB.*\n\n| net_id      | MACs  | #Params | SRAM  | Flash | Resolution | Top-1\u003cbr /\u003e(fp32/int8) |\n| ----------- | ----- | ------- | ----- | ----- | ---------- | ---------------------- |\n| mcunet-vww0 | 6.0M  | 0.37M   | 146kB | 617kB | 64         | 87.4%/87.3%            |\n| mcunet-vww1 | 11.6M | 0.43M   | 162kB | 689kB | 80         | 88.9%/88.9%            |\n| mcunet-vww2 | 55.8M | 0.64M   | 311kB | 897kB | 144        | 91.7%/91.8%            |\n\nFor TF-Lite `int8` models, we do not use quantization-aware training (QAT), so some results are slightly lower than the paper numbers. \n\n### Detection Model\n\nWe also share the person detection model used in the [demo](https://www.youtube.com/watch?v=F4XKn0iDfxg). To visualize the model's prediction on a sample image, please run the following command:\n\n```bash\npython eval_det.py\n```\n\nIt will visualize the prediction here: `assets/sample_images/person_det_vis.jpg`.\n\nThe model takes in a small input resolution of 128x160 to reduce memory usage. It does not achieve state-of-the-art performance due to the limited image and model size but should provide decent performance for tinyML applications (please check the demo for a video recording). We will also release the deployment code in the upcoming TinyEngine release. 
\n\n## Requirement\n\n- Python 3.6+\n\n- PyTorch 1.4.0+\n\n- Tensorflow 1.15 (if you want to test TF-Lite models; CPU support only)\n\n## Acknowledgement\n\nWe thank MIT-IBM Watson AI Lab, Intel, Amazon, SONY, Qualcomm, NSF for supporting this research.\n\n\n## Citation\nIf you find the project helpful, please consider citing our paper:\n\n```\n@article{lin2020mcunet,\n  title={Mcunet: Tiny deep learning on iot devices},\n  author={Lin, Ji and Chen, Wei-Ming and Lin, Yujun and Gan, Chuang and Han, Song},\n  journal={Advances in Neural Information Processing Systems},\n  volume={33},\n  year={2020}\n}\n\n@inproceedings{\n  lin2021mcunetv2,\n  title={MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning},\n  author={Lin, Ji and Chen, Wei-Ming and Cai, Han and Gan, Chuang and Han, Song},\n  booktitle={Annual Conference on Neural Information Processing Systems (NeurIPS)},\n  year={2021}\n} \n\n@article{\n  lin2022ondevice, \n  title = {On-Device Training Under 256KB Memory},\n  author = {Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song}, \n  journal = {arXiv:2206.15472 [cs]},\n  url = {https://arxiv.org/abs/2206.15472},\n  year = {2022}\n}\n```\n\n\n## Related Projects\n\n[On-Device Training Under 256KB Memory](https://hanlab.mit.edu/projects/mcunetv3) (NeurIPS'22)\n\n[TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning](https://arxiv.org/abs/2007.11622) (NeurIPS'20)\n\n[Once for All: Train One Network and Specialize it for Efficient Deployment](https://arxiv.org/abs/1908.09791) (ICLR'20)\n\n[ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) (ICLR'19)\n\n[AutoML for Architecting Efficient and Specialized Neural Networks](https://ieeexplore.ieee.org/abstract/document/8897011) (IEEE Micro)\n\n[AMC: AutoML for Model Compression and Acceleration on Mobile Devices](https://arxiv.org/pdf/1802.03494.pdf) (ECCV'18)\n\n[HAQ: 
Hardware-Aware Automated Quantization](https://arxiv.org/pdf/1811.08886.pdf)  (CVPR'19, oral)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmit-han-lab%2Fmcunet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmit-han-lab%2Fmcunet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmit-han-lab%2Fmcunet/lists"}