{"id":13439893,"url":"https://github.com/chongzhou96/EdgeSAM","last_synced_at":"2025-03-20T09:31:03.305Z","repository":{"id":211965039,"uuid":"728068800","full_name":"chongzhou96/EdgeSAM","owner":"chongzhou96","description":"Official PyTorch implementation of \"EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM\"","archived":false,"fork":false,"pushed_at":"2024-08-12T13:50:43.000Z","size":25658,"stargazers_count":928,"open_issues_count":16,"forks_count":42,"subscribers_count":17,"default_branch":"master","last_synced_at":"2024-10-28T02:17:56.269Z","etag":null,"topics":["coreml","on-device-ai","segment-anything"],"latest_commit_sha":null,"homepage":"https://mmlab-ntu.com/project/edgesam/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chongzhou96.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-06T06:51:43.000Z","updated_at":"2024-10-27T00:49:11.000Z","dependencies_parsed_at":"2024-01-16T02:38:38.677Z","dependency_job_id":"7b4bb39e-1262-477c-b02a-9df19d309b82","html_url":"https://github.com/chongzhou96/EdgeSAM","commit_stats":null,"previous_names":["chongzhou96/edgesam"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chongzhou96%2FEdgeSAM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chongzhou96%2FEdgeSAM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chongzhou96%2FEdgeSAM/releases","manifests_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/chongzhou96%2FEdgeSAM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chongzhou96","download_url":"https://codeload.github.com/chongzhou96/EdgeSAM/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244585693,"owners_count":20476781,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["coreml","on-device-ai","segment-anything"],"created_at":"2024-07-31T03:01:17.969Z","updated_at":"2025-03-20T09:31:01.044Z","avatar_url":"https://github.com/chongzhou96.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook","Paper List","⚡ Optimization \u0026 Mobile Deployment"],"sub_categories":["Follow-up Papers","🛠️ Optimization Resources"],"readme":"# EdgeSAM\n**Prompt-In-the-Loop Distillation for On-Device Deployment of SAM**\n\n\n[Chong Zhou\u003csup\u003e1\u003c/sup\u003e](https://chongzhou96.github.io/),\n[Xiangtai Li\u003csup\u003e1\u003c/sup\u003e](https://lxtgh.github.io/),\n[Chen Change Loy\u003csup\u003e1*\u003c/sup\u003e](https://www.mmlab-ntu.com/person/ccloy/),\n[Bo Dai\u003csup\u003e2\u003c/sup\u003e](https://daibo.info/)\n\n(*corresponding author)\n\n[\u003csup\u003e1\u003c/sup\u003eS-Lab, Nanyang Technological University](https://www.mmlab-ntu.com/),\n[\u003csup\u003e2\u003c/sup\u003eShanghai Artificial Intelligence Laboratory](https://www.shlab.org.cn/)\n\n[[`Paper`](https://arxiv.org/abs/2312.06660)]\n[[`Project Page`](https://www.mmlab-ntu.com/project/edgesam/)]\n[[`Hugging Face 
Demo`](https://huggingface.co/spaces/chongzhou/EdgeSAM)]\n[[`iOS App`](https://apps.apple.com/us/app/cutcha-photo/id6478521132)]\n\nhttps://github.com/chongzhou96/EdgeSAM/assets/15973859/fe1cd104-88dc-4690-a5ea-ff48ae013db3\n\n**Watch the full live demo video: [[YouTube](https://www.youtube.com/watch?v=YYsEQ2vleiE)] [[Bilibili](https://www.bilibili.com/video/BV1294y1P7TC/)]**\n\n## Updates\n\n* **2024/07/23**: We released our training and evaluation code; check out [README_TRAIN.md](README_TRAIN.md).\n* **2024/06/05**: Check out our iOS App [CutCha](https://apps.apple.com/us/app/cutcha-photo/id6478521132) powered by EdgeSAM.\n* **2024/01/01**: EdgeSAM is integrated into [X-AnyLabeling](https://github.com/CVHub520/X-AnyLabeling).\n* **2023/12/19**: EdgeSAM is now supported in [ISAT](https://github.com/yatengLG/ISAT_with_segment_anything), a segmentation labeling tool.\n* **2023/12/16**: EdgeSAM is now supported in [Grounded-Segment-Anything](https://github.com/IDEA-Research/Grounded-Segment-Anything). Check out the [grounded-edge-sam demo](https://github.com/IDEA-Research/Grounded-Segment-Anything/blob/main/EfficientSAM/grounded_edge_sam.py). Thanks to the IDEA Research team!\n* **2023/12/14**: [autodistill-grounded-edgesam](https://github.com/autodistill/autodistill-grounded-edgesam) combines Grounding DINO and EdgeSAM to create Grounded EdgeSAM [[blog](https://blog.roboflow.com/how-to-use-grounded-edgesam/)]. 
Thanks to the Roboflow team!\n* **2023/12/13**: Add ONNX export and speed up the web demo with ONNX as the backend.\n\n## Overview\n\n**EdgeSAM** is an accelerated variant of the Segment Anything Model (SAM), optimized for efficient execution on edge devices with minimal compromise in performance.\nIt achieves a **40-fold speed increase** compared to the original SAM, and outperforms MobileSAM, being **14 times as fast** when deployed on edge devices while enhancing the mIoUs on COCO and LVIS by 2.3 and 3.2 respectively.\nEdgeSAM is also the first SAM variant that can run at **over 30 FPS** on an iPhone 14.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"900\" alt=\"compare\" src=\"https://github.com/chongzhou96/EdgeSAM/assets/15973859/95a6f308-7300-4cb4-8b1b-b711cdea3f64\"\u003e\n\u003c/p\u003e\n\n*In this figure, we show the encoder throughput of EdgeSAM compared with SAM and MobileSAM as well as the mIoU performance on the SA-1K dataset (sampled from SA-1B) with box and point prompts.*\n\n\u003cdetails\u003e\n\n\u003csummary\u003e \u003cstrong\u003eApproach\u003c/strong\u003e \u003c/summary\u003e\n\nOur approach involves distilling the original ViT-based SAM image encoder into a purely CNN-based architecture, better suited for edge devices. We carefully benchmark various distillation strategies and demonstrate that task-agnostic encoder distillation fails to capture the full knowledge embodied in SAM. 
To overcome this bottleneck, we include both the prompt encoder and mask decoder in the distillation process, with box and point prompts in the loop, so that the distilled model can accurately capture the intricate dynamics between user input and mask generation.\n\n  \u003cp align=\"center\"\u003e\n    \u003cimg width=\"612\" alt=\"arch\" src=\"https://github.com/chongzhou96/EdgeSAM/assets/15973859/e706101a-c3d5-4d99-bea5-c6735ce25237\"\u003e\n  \u003c/p\u003e\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\n\u003csummary\u003e \u003cstrong\u003ePerformance\u003c/strong\u003e \u003c/summary\u003e\n\n| Method      | Train Set | COCO AP | COCO AP\u003csub\u003es\u003c/sub\u003e | COCO AP\u003csub\u003em\u003c/sub\u003e | COCO AP\u003csub\u003el\u003c/sub\u003e | GFLops | MParam. | FPS iPhone 14 | FPS 2080 Ti | FPS 3090 |\n|-------------|-----------|---------|---------------------|---------------------|---------------------|--------|---------|---------------|-------------|----------|\n| SAM         | SA-1B     | 46.1    | 33.6                | 51.9                | 57.7                | 2734.8 | 641.1   | -             | 4.3         | -        |\n| FastSAM     | 2% SA-1B  | 37.9    | 23.9                | 43.4                | 50.0                | 887.6  | 68.2    | -             | -           | 25.0*    |\n| MobileSAM   | 1% SA-1B  | 39.4    | 26.9                | 44.4                | 52.2                | 38.2   | 9.8     | 4.9           | 103.5       | 100.0*   |\n| EdgeSAM     | 1% SA-1B  | 42.2    | 29.6                | 47.6                | 53.9                | 22.1   | 9.6     | 38.7          | 164.3       | -        |\n| EdgeSAM-3x  | 3% SA-1B  | 42.7    | 30.0                | 48.6                | 54.5                | 22.1   | 9.6     | 38.7          | 164.3       | -        |\n| EdgeSAM-10x | 10% SA-1B | 43.0    | 30.3                | 48.9                | 55.1                | 22.1   | 9.6     | 38.7          | 164.3       | -        |\n\n*In this 
table, we report the mask mAP on the COCO dataset. ViTDet-H is used as the detector, whose box mAP is 58.7, to provide box prompts. For speed benchmarking, we infer both the encoder and decoder (with a single prompt). FLOPs are calculated based on the 1024x1024 input resolution. Numbers denoted by * are copied from MobileSAM. 3x and 10x represent training with more data. Here, we do not apply an additional mask refinement iteration per the setting of the original SAM paper.*\n\n\u003c/details\u003e\n\n## Table of Contents\n\n- [Installation](#installation)\n- [Usage](#usage)\n- [Train and Eval](#train)\n- [Web Demo](#demo)\n- [CoreML / ONNX Export](#export)\n- [Checkpoints](#checkpoints)\n- [iOS App](#ios)\n- [Acknowledgements](#acknowledgement)\n- [Citation](#cite)\n- [License](#license)\n\n## Installation \u003ca name=\"installation\"\u003e\u003c/a\u003e\n\nThe code requires `python\u003e=3.8` and we use `torch==2.0.0` and `torchvision==0.15.1`. Please refer to the\n[official PyTorch installation instructions](https://pytorch.org/get-started/locally/).\n\n1. Clone the repository locally:\n\n```\ngit clone https://github.com/chongzhou96/EdgeSAM.git \u0026\u0026 cd EdgeSAM\n```\n\n2. Install additional dependencies:\n\n```\npip install -r requirements.txt\n```\n\n3. Install EdgeSAM:\n\n```\npip install -e .\n```\n\n## Usage \u003ca name=\"usage\"\u003e\u003c/a\u003e\n\n1. Download checkpoints (please refer to [Checkpoints](#checkpoints) for more details about the PyTorch and CoreML checkpoints):\n\n```\nmkdir weights\nwget -P weights/ https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam.pth\nwget -P weights/ https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam_3x.pth\n```\n\n2. 
You can easily incorporate EdgeSAM into your Python code with the following lines:\n\n```\nfrom edge_sam import SamPredictor, sam_model_registry\nsam = sam_model_registry[\"edge_sam\"](checkpoint=\"\u003cpath/to/checkpoint\u003e\")\npredictor = SamPredictor(sam)\npredictor.set_image(\u003cyour_image\u003e)\nmasks, _, _ = predictor.predict(\u003cinput_prompts\u003e)\n```\n\nSince EdgeSAM follows the same encoder-decoder architecture as SAM, their usage is very similar. One minor difference is that EdgeSAM allows outputting 1, 3, and 4 mask candidates for each prompt, while SAM yields either 1 or 3 masks. For more details, please refer to the [example Jupyter Notebook](https://github.com/chongzhou96/EdgeSAM/blob/master/notebooks/predictor_example.ipynb).\n\n## Train and Eval \u003ca name=\"train\"\u003e\u003c/a\u003e\nPlease refer to [README_TRAIN.md](README_TRAIN.md) for more details.\n\n## Web Demo \u003ca name=\"demo\"\u003e\u003c/a\u003e\nAfter installing EdgeSAM and downloading the checkpoints, you can start an interactive web demo with the following command:\n\n```\npython web_demo/gradio_app.py\n```\n\nBy default, the demo is hosted on `http://0.0.0.0:8080/` and expects `edge_sam_3x.pth` to be stored in the `weights/` folder. You can change the default behavior with:\n\n```\npython web_demo/gradio_app.py --checkpoint [CHECKPOINT] --server-name [SERVER_NAME] --port [PORT]\n```\n\nSince EdgeSAM can run smoothly on a mobile phone, it's fine if you don't have a GPU.\n\nWe've deployed the same web demo in the Hugging Face Space [[link](https://huggingface.co/spaces/chongzhou/EdgeSAM)]. \u003cdel\u003e However, since it uses the CPU as the backend and is shared by all users, the experience might not be as good as a local deployment. \u003c/del\u003e  We really appreciate the Hugging Face team for supporting us with the GPU!\n\n**Speed up the web demo with ONNX backend**\n\n1. 
Install the onnxruntime with `pip install onnxruntime` if your machine doesn't have a GPU or `pip install onnxruntime-gpu` if it does (but don't install both of them). Our implementation is tested under version `1.16.3`.\n\n2. Download the ONNX models to the `weights/` folder:\n\n```\nwget -P weights/ https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam_3x_encoder.onnx\nwget -P weights/ https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam_3x_decoder.onnx\n```\n\n3. Start the demo:\n\n```\npython web_demo/gradio_app.py --enable-onnx\n```\n\n4. Navigate to http://0.0.0.0:8080 in your browser.\n\n## CoreML / ONNX Export \u003ca name=\"export\"\u003e\u003c/a\u003e\n\n**CoreML**\n\nWe provide a script that can export a trained EdgeSAM PyTorch model to two CoreML model packages, one for the encoder and another for the decoder. You can also download the exported CoreML models at [Checkpoints](#checkpoints).\n\nFor encoder:\n\n```\npython scripts/export_coreml_model.py [CHECKPOINT]\n```\n\nFor decoder:\n\n```\npython scripts/export_coreml_model.py [CHECKPOINT] --decoder --use-stability-score\n```\n\nSince EdgeSAM doesn't perform knowledge distillation on the IoU token of the original SAM, its IoU predictions might not be reliable. Therefore, we use the stability score for mask selection instead. 
You can stick to the IoU predictions by removing `--use-stability-score`.\n\nThe following shows the performance reports of the EdgeSAM CoreML models measured by Xcode on an iPhone 14 (left: encoder, right: decoder):\n\n\u003cp align=\"center\"\u003e\n\n  ![xcode](https://github.com/chongzhou96/EdgeSAM/assets/15973859/8df54f76-24c9-4ad2-af6d-086b971d073b)\n\n\u003c/p\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e \u003cstrong\u003e Known issues and model descriptions \u003c/strong\u003e \u003c/summary\u003e\n\n  As of `coremltools==7.1`, you may encounter the assertion error during the export, e.g., `assert len(inputs) \u003c= 3 or inputs[3] is None`. One workaround is to comment out this assertion following the traceback path, e.g., `/opt/anaconda3/envs/EdgeSAM/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/ops.py line 1573`.\n\n  Since CoreML doesn't support interpolation with dynamic target sizes, the converted CoreML models do not contain the pre-processing, i.e., resize-norm-pad, and the post-processing, i.e., resize back to the original size.\n\n  The encoder takes a `1x3x1024x1024` image as the input and outputs a `1x256x64x64` image embedding. The decoder then takes the image embedding together with point coordinates and point labels as the input. The point coordinates follow the `(height, width)` format with the top-left corner as the `(0, 0)`. 
The choices of point labels are `0: negative point`, `1: positive point`, `2: top-left corner of box`, and `3: bottom-right corner of box`.\n\n\u003c/details\u003e\n\n**ONNX**\n\nSimilar to the CoreML export, you can use the following commands to export the encoder and the decoder to ONNX models respectively:\n\nFor encoder:\n\n```\npython scripts/export_onnx_model.py [CHECKPOINT]\n```\n\nFor decoder:\n\n```\npython scripts/export_onnx_model.py [CHECKPOINT] --decoder --use-stability-score\n```\n\n## Checkpoints \u003ca name=\"checkpoints\"\u003e\u003c/a\u003e\n\nPlease download the checkpoints of EdgeSAM from its Hugging Face Space (all the EdgeSAM variants only differ in the number of training images):\n\n| Model               | COCO mAP | PyTorch | CoreML         | ONNX           |\n| ------------------- | -------- | ------- | -------------- | -------------- |\n| SAM                 | 46.1     | -       | -              | -              |\n| EdgeSAM             | 42.1     | [Download](https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam.pth) | [[Encoder](https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam_encoder.mlpackage.zip)] [[Decoder](https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam_decoder.mlpackage.zip)] | [[Encoder](https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam_encoder.onnx)] [[Decoder](https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam_decoder.onnx)] |\n| EdgeSAM-3x          | 42.7     | [Download](https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam_3x.pth) | [[Encoder](https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam_3x_encoder.mlpackage.zip)] [[Decoder](https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam_3x_decoder.mlpackage.zip)] | 
[[Encoder](https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam_3x_encoder.onnx)] [[Decoder](https://huggingface.co/spaces/chongzhou/EdgeSAM/resolve/main/weights/edge_sam_3x_decoder.onnx)] |\n| EdgeSAM-10x         | 43       | TBA     | TBA            | TBA |\n\nNote: You need to unzip the CoreML model packages before usage.\n\n## iOS App \u003ca name=\"ios\"\u003e\u003c/a\u003e\nWe are planning to release the iOS app that we used in the live demo to the App Store. Please stay tuned!\n\n## Acknowledgements \u003ca name=\"acknowledgement\"\u003e\u003c/a\u003e\nThis study is supported under the RIE2020 Industry Alignment Fund Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s). We are grateful to [Han Soong Chong](https://www.linkedin.com/in/hansoong-choong-0493a5155/) for his effort in the demonstration application.\n\nWe appreciate the following projects, which enable EdgeSAM: [SAM](https://github.com/facebookresearch/segment-anything), [MobileSAM](https://github.com/ChaoningZhang/MobileSAM), [FastSAM](https://github.com/CASIA-IVA-Lab/FastSAM), [TinyViT](https://github.com/microsoft/Cream), and [RepViT](https://github.com/THU-MIG/RepViT).\n\n## Citation \u003ca name=\"cite\"\u003e\u003c/a\u003e\n```bibtex\n@article{zhou2023edgesam,\n  title={EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM},\n  author={Zhou, Chong and Li, Xiangtai and Loy, Chen Change and Dai, Bo},\n  journal={arXiv preprint arXiv:2312.06660},\n  year={2023}\n}\n```\n\n## License \u003ca name=\"license\"\u003e\u003c/a\u003e\n\nThis project is licensed under \u003ca rel=\"license\" href=\"https://github.com/chongzhou96/EdgeSAM/blob/master/LICENSE\"\u003eNTU S-Lab License 1.0\u003c/a\u003e. 
Redistribution and use should follow this license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchongzhou96%2FEdgeSAM","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchongzhou96%2FEdgeSAM","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchongzhou96%2FEdgeSAM/lists"}