{"id":13443385,"url":"https://github.com/dbolya/yolact","last_synced_at":"2025-05-14T12:09:56.910Z","repository":{"id":38275058,"uuid":"138796699","full_name":"dbolya/yolact","owner":"dbolya","description":"A simple, fully convolutional model for real-time instance segmentation.","archived":false,"fork":false,"pushed_at":"2023-11-06T07:33:12.000Z","size":21708,"stargazers_count":5098,"open_issues_count":421,"forks_count":1324,"subscribers_count":105,"default_branch":"master","last_synced_at":"2025-04-04T14:32:36.036Z","etag":null,"topics":["instance-segmentation","pytorch","real-time","realtime","yolact"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dbolya.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-06-26T21:41:31.000Z","updated_at":"2025-04-03T08:22:00.000Z","dependencies_parsed_at":"2022-07-14T06:40:36.281Z","dependency_job_id":"179ec78b-8963-4849-a211-8e55a0964164","html_url":"https://github.com/dbolya/yolact","commit_stats":{"total_commits":553,"total_committers":8,"mean_commits":69.125,"dds":"0.019891500904159143","last_synced_commit":"57b8f2d95e62e2e649b382f516ab41f949b57239"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbolya%2Fyolact","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbolya%2Fyolact/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbolya%2Fyolact/releases","manifests
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dbolya%2Fyolact/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dbolya","download_url":"https://codeload.github.com/dbolya/yolact/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248469140,"owners_count":21108963,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["instance-segmentation","pytorch","real-time","realtime","yolact"],"created_at":"2024-07-31T03:01:59.975Z","updated_at":"2025-04-11T19:46:03.545Z","avatar_url":"https://github.com/dbolya.png","language":"Python","funding_links":[],"categories":["Python","🔍 Instance Segmentation","Sensor Processing","Summary","对象检测_分割","Appendix: Object Detection for Natural Scene","🤖 AI \u0026 Machine Learning"],"sub_categories":["🌟 State-of-the-Art Models (2024-2025)","Image Processing","资源传输下载","Papers"],"readme":"# **Y**ou **O**nly **L**ook **A**t **C**oefficien**T**s\n```\n    ██╗   ██╗ ██████╗ ██╗      █████╗  ██████╗████████╗\n    ╚██╗ ██╔╝██╔═══██╗██║     ██╔══██╗██╔════╝╚══██╔══╝\n     ╚████╔╝ ██║   ██║██║     ███████║██║        ██║   \n      ╚██╔╝  ██║   ██║██║     ██╔══██║██║        ██║   \n       ██║   ╚██████╔╝███████╗██║  ██║╚██████╗   ██║   \n       ╚═╝    ╚═════╝ ╚══════╝╚═╝  ╚═╝ ╚═════╝   ╚═╝ \n```\n\nA simple, fully convolutional model for real-time instance segmentation. 
This is the code for our papers:\n - [YOLACT: Real-time Instance Segmentation](https://arxiv.org/abs/1904.02689)\n - [YOLACT++: Better Real-time Instance Segmentation](https://arxiv.org/abs/1912.06218)\n\n#### YOLACT++ (v1.2) released! ([Changelog](CHANGELOG.md))\nYOLACT++'s resnet50 model runs at 33.5 fps on a Titan Xp and achieves 34.1 mAP on COCO's `test-dev` (check out our journal paper [here](https://arxiv.org/abs/1912.06218)).\n\nIn order to use YOLACT++, make sure you compile the DCNv2 code. (See [Installation](https://github.com/dbolya/yolact#installation))\n\n#### For a real-time demo, check out our ICCV video:\n[![IMAGE ALT TEXT HERE](https://img.youtube.com/vi/0pMfmo8qfpQ/0.jpg)](https://www.youtube.com/watch?v=0pMfmo8qfpQ)\n\nSome examples from our YOLACT base model (33.5 fps on a Titan Xp and 29.8 mAP on COCO's `test-dev`):\n\n![Example 0](data/yolact_example_0.png)\n\n![Example 1](data/yolact_example_1.png)\n\n![Example 2](data/yolact_example_2.png)\n\n# Installation\n - Clone this repository and enter it:\n   ```Shell\n   git clone https://github.com/dbolya/yolact.git\n   cd yolact\n   ```\n - Set up the environment using one of the following methods:\n   - Using [Anaconda](https://www.anaconda.com/distribution/)\n     - Run `conda env create -f environment.yml`\n   - Manually with pip\n     - Set up a Python 3 environment (e.g., using virtualenv).\n     - Install [PyTorch](http://pytorch.org/) 1.0.1 (or higher) and TorchVision.\n     - Install some other packages:\n       ```Shell\n       # Cython needs to be installed before pycocotools\n       pip install cython\n       pip install opencv-python pillow pycocotools matplotlib \n       ```\n - If you'd like to train YOLACT, download the COCO dataset and the 2014/2017 annotations. 
Note that this script will take a while and dump 21gb of files into `./data/coco`.\n   ```Shell\n   sh data/scripts/COCO.sh\n   ```\n - If you'd like to evaluate YOLACT on `test-dev`, download `test-dev` with this script.\n   ```Shell\n   sh data/scripts/COCO_test.sh\n   ```\n - If you want to use YOLACT++, compile deformable convolutional layers (from [DCNv2](https://github.com/CharlesShang/DCNv2/tree/pytorch_1.0)).\n   Make sure you have the latest CUDA toolkit installed from [NVidia's Website](https://developer.nvidia.com/cuda-toolkit).\n   ```Shell\n   cd external/DCNv2\n   python setup.py build develop\n   ```\n\n\n# Evaluation\nHere are our YOLACT models (released on April 5th, 2019) along with their FPS on a Titan Xp and mAP on `test-dev`:\n\n| Image Size | Backbone      | FPS  | mAP  | Weights                                                                                                              |  |\n|:----------:|:-------------:|:----:|:----:|----------------------------------------------------------------------------------------------------------------------|--------|\n| 550        | Resnet50-FPN  | 42.5 | 28.2 | [yolact_resnet50_54_800000.pth](https://drive.google.com/file/d/1yp7ZbbDwvMiFJEq4ptVKTYTI2VeRDXl0/view?usp=sharing)  | [Mirror](https://ucdavis365-my.sharepoint.com/:u:/g/personal/yongjaelee_ucdavis_edu/EUVpxoSXaqNIlssoLKOEoCcB1m0RpzGq_Khp5n1VX3zcUw) |\n| 550        | Darknet53-FPN | 40.0 | 28.7 | [yolact_darknet53_54_800000.pth](https://drive.google.com/file/d/1dukLrTzZQEuhzitGkHaGjphlmRJOjVnP/view?usp=sharing) | [Mirror](https://ucdavis365-my.sharepoint.com/:u:/g/personal/yongjaelee_ucdavis_edu/ERrao26c8llJn25dIyZPhwMBxUp2GdZTKIMUQA3t0djHLw)\n| 550        | Resnet101-FPN | 33.5 | 29.8 | [yolact_base_54_800000.pth](https://drive.google.com/file/d/1UYy3dMapbH1BnmtZU4WH1zbYgOzzHHf_/view?usp=sharing)      | [Mirror](https://ucdavis365-my.sharepoint.com/:u:/g/personal/yongjaelee_ucdavis_edu/EYRWxBEoKU9DiblrWx2M89MBGFkVVB_drlRd_v5sdT3Hgg)\n| 
700        | Resnet101-FPN | 23.6 | 31.2 | [yolact_im700_54_800000.pth](https://drive.google.com/file/d/1lE4Lz5p25teiXV-6HdTiOJSnS7u7GBzg/view?usp=sharing)     | [Mirror](https://ucdavis365-my.sharepoint.com/:u:/g/personal/yongjaelee_ucdavis_edu/Eagg5RSc5hFEhp7sPtvLNyoBjhlf2feog7t8OQzHKKphjw)\n\nYOLACT++ models (released on December 16th, 2019):\n\n| Image Size | Backbone      | FPS  | mAP  | Weights                                                                                                              |  |\n|:----------:|:-------------:|:----:|:----:|----------------------------------------------------------------------------------------------------------------------|--------|\n| 550        | Resnet50-FPN  | 33.5 | 34.1 | [yolact_plus_resnet50_54_800000.pth](https://drive.google.com/file/d/1ZPu1YR2UzGHQD0o1rEqy-j5bmEm3lbyP/view?usp=sharing)  | [Mirror](https://ucdavis365-my.sharepoint.com/:u:/g/personal/yongjaelee_ucdavis_edu/EcJAtMiEFlhAnVsDf00yWRIBUC4m8iE9NEEiV05XwtEoGw) |\n| 550        | Resnet101-FPN | 27.3 | 34.6 | [yolact_plus_base_54_800000.pth](https://drive.google.com/file/d/15id0Qq5eqRbkD-N3ZjDZXdCvRyIaHpFB/view?usp=sharing) | [Mirror](https://ucdavis365-my.sharepoint.com/:u:/g/personal/yongjaelee_ucdavis_edu/EVQ62sF0SrJPrl_68onyHF8BpG7c05A8PavV4a849sZgEA)\n\nTo evaluate the model, put the corresponding weights file in the `./weights` directory and run one of the following commands. The name of each config is everything before the numbers in the file name (e.g., `yolact_base` for `yolact_base_54_800000.pth`).\n## Quantitative Results on COCO\n```Shell\n# Quantitatively evaluate a trained model on the entire validation set. 
Make sure you have COCO downloaded as above.\n# This should get 29.92 validation mask mAP last time I checked.\npython eval.py --trained_model=weights/yolact_base_54_800000.pth\n\n# Output a COCOEval json to submit to the website or to use the run_coco_eval.py script.\n# This command will create './results/bbox_detections.json' and './results/mask_detections.json' for detection and instance segmentation respectively.\npython eval.py --trained_model=weights/yolact_base_54_800000.pth --output_coco_json\n\n# You can run COCOEval on the files created in the previous command. The performance should match my implementation in eval.py.\npython run_coco_eval.py\n\n# To output a coco json file for test-dev, make sure you have test-dev downloaded from above and go\npython eval.py --trained_model=weights/yolact_base_54_800000.pth --output_coco_json --dataset=coco2017_testdev_dataset\n```\n## Qualitative Results on COCO\n```Shell\n# Display qualitative results on COCO. From here on I'll use a confidence threshold of 0.15.\npython eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --display\n```\n## Benchmarking on COCO\n```Shell\n# Run just the raw model on the first 1k images of the validation set\npython eval.py --trained_model=weights/yolact_base_54_800000.pth --benchmark --max_images=1000\n```\n## Images\n```Shell\n# Display qualitative results on the specified image.\npython eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=my_image.png\n\n# Process an image and save it to another file.\npython eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=input_image.png:output_image.png\n\n# Process a whole folder of images.\npython eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --images=path/to/input/folder:path/to/output/folder\n```\n## Video\n```Shell\n# Display a video in real-time. 
\"--video_multiframe\" will process that many frames at once for improved performance.\n# If you want, use \"--display_fps\" to draw the FPS directly on the frame.\npython eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=my_video.mp4\n\n# Display a webcam feed in real-time. If you have multiple webcams pass the index of the webcam you want instead of 0.\npython eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=0\n\n# Process a video and save it to another file. This uses the same pipeline as the ones above now, so it's fast!\npython eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=input_video.mp4:output_video.mp4\n```\nAs you can tell, `eval.py` can do a ton of stuff. Run the `--help` command to see everything it can do.\n```Shell\npython eval.py --help\n```\n\n\n# Training\nBy default, we train on COCO. 
Make sure to download the entire dataset using the commands above.\n - To train, grab an imagenet-pretrained model and put it in `./weights`.\n   - For Resnet101, download `resnet101_reducedfc.pth` from [here](https://drive.google.com/file/d/1tvqFPd4bJtakOlmn-uIA492g2qurRChj/view?usp=sharing).\n   - For Resnet50, download `resnet50-19c8e357.pth` from [here](https://drive.google.com/file/d/1Jy3yCdbatgXa5YYIdTCRrSV0S9V5g1rn/view?usp=sharing).\n   - For Darknet53, download `darknet53.pth` from [here](https://drive.google.com/file/d/17Y431j4sagFpSReuPNoFcj9h7azDTZFf/view?usp=sharing).\n - Run one of the training commands below.\n   - Note that you can press ctrl+c while training and it will save an `*_interrupt.pth` file at the current iteration.\n   - All weights are saved in the `./weights` directory by default with the file name `\u003cconfig\u003e_\u003cepoch\u003e_\u003citer\u003e.pth`.\n```Shell\n# Trains using the base config with a batch size of 8 (the default).\npython train.py --config=yolact_base_config\n\n# Trains yolact_base_config with a batch_size of 5. 
For the 550px models, 1 batch takes up around 1.5 gigs of VRAM, so specify accordingly.\npython train.py --config=yolact_base_config --batch_size=5\n\n# Resume training yolact_base with a specific weight file and start from the iteration specified in the weight file's name.\npython train.py --config=yolact_base_config --resume=weights/yolact_base_10_32100.pth --start_iter=-1\n\n# Use the help option to see a description of all available command line arguments\npython train.py --help\n```\n\n## Multi-GPU Support\nYOLACT now supports multiple GPUs seamlessly during training:\n\n - Before running any of the scripts, run: `export CUDA_VISIBLE_DEVICES=[gpus]`\n   - Where you should replace [gpus] with a comma-separated list of the index of each GPU you want to use (e.g., 0,1,2,3).\n   - You should still do this if only using 1 GPU.\n   - You can check the indices of your GPUs with `nvidia-smi`.\n - Then, simply set the batch size to `8*num_gpus` with the training commands above. The training script will automatically scale the hyperparameters to the right values.\n   - If you have memory to spare you can increase the batch size further, but keep it a multiple of the number of GPUs you're using.\n   - If you want to allocate a different number of images to each GPU, you can use `--batch_alloc=[alloc]` where [alloc] is a comma-separated list containing the number of images on each GPU. This must sum to `batch_size`.\n\n## Logging\nYOLACT now logs training and validation information by default. You can disable this with `--no_log`. A guide on how to visualize these logs is coming soon, but for now you can look at `LogVizualizer` in `utils/logger.py` for help.\n\n## Pascal SBD\nWe also include a config for training on Pascal SBD annotations (for rapid experimentation or comparing with other methods). To train on Pascal SBD, proceed with the following steps:\n 1. Download the dataset from [here](http://home.bharathh.info/pubs/codes/SBD/download.html). 
It's the first link in the top \"Overview\" section (and the file is called `benchmark.tgz`).\n 2. Extract the dataset somewhere. In the dataset there should be a folder called `dataset/img`. Create the directory `./data/sbd` (where `.` is YOLACT's root) and copy `dataset/img` to `./data/sbd/img`.\n 3. Download the COCO-style annotations from [here](https://drive.google.com/open?id=1ExrRSPVctHW8Nxrn0SofU1lVhK5Wn0_S).\n 4. Extract the annotations into `./data/sbd/`.\n 5. Now you can train using `--config=yolact_resnet50_pascal_config`. Check that config to see how to extend it to other models.\n\nI will automate this all with a script soon, don't worry. Also, if you want the script I used to convert the annotations, I put it in `./scripts/convert_sbd.py`, but you'll have to check how it works to be able to use it because I don't actually remember at this point.\n\nIf you want to verify our results, you can download our `yolact_resnet50_pascal_config` weights from [here](https://drive.google.com/open?id=1yLVwtkRtNxyl0kxeMCtPXJsXFFyc_FHe). This model should get 72.3 mask AP_50 and 56.2 mask AP_70. Note that the \"all\" AP isn't the same as the \"vol\" AP reported in other papers for Pascal (they use an average of the thresholds from `0.1 - 0.9` in increments of `0.1` instead of what COCO uses).\n\n## Custom Datasets\nYou can also train on your own dataset by following these steps:\n - Create a COCO-style Object Detection JSON annotation file for your dataset. The specification for this can be found [here](http://cocodataset.org/#format-data). 
Note that we don't use some fields, so the following may be omitted:\n   - `info`\n   - `licenses`\n   - Under `image`: `license, flickr_url, coco_url, date_captured`\n   - `categories` (we use our own format for categories, see below)\n - Create a definition for your dataset under `dataset_base` in `data/config.py` (see the comments in `dataset_base` for an explanation of each field):\n```Python\nmy_custom_dataset = dataset_base.copy({\n    'name': 'My Dataset',\n\n    'train_images': 'path_to_training_images',\n    'train_info':   'path_to_training_annotation',\n\n    'valid_images': 'path_to_validation_images',\n    'valid_info':   'path_to_validation_annotation',\n\n    'has_gt': True,\n    'class_names': ('my_class_id_1', 'my_class_id_2', 'my_class_id_3', ...)\n})\n```\n - A couple of things to note:\n   - Class IDs in the annotation file should start at 1 and increase sequentially in the order of `class_names`. If this isn't the case for your annotation file (like in COCO), see the field `label_map` in `dataset_base`.\n   - If you do not want to create a validation split, use the same image path and annotations file for validation. By default (see `python train.py --help`), `train.py` will output validation mAP for the first 5000 images in the dataset every 2 epochs.\n - Finally, in `yolact_base_config` in the same file, change the value for `'dataset'` to `'my_custom_dataset'` or whatever you named the config object above. 
Then you can use any of the training commands in the previous section.\n\n#### Creating a Custom Dataset from Scratch\nSee [this nice post by @Amit12690](https://github.com/dbolya/yolact/issues/70#issuecomment-504283008) for tips on how to annotate a custom dataset and prepare it for use with YOLACT.\n\n\n\n\n# Citation\nIf you use YOLACT or this code base in your work, please cite\n```\n@inproceedings{yolact-iccv2019,\n  author    = {Daniel Bolya and Chong Zhou and Fanyi Xiao and Yong Jae Lee},\n  title     = {YOLACT: {Real-time} Instance Segmentation},\n  booktitle = {ICCV},\n  year      = {2019},\n}\n```\n\nFor YOLACT++, please cite\n```\n@article{yolact-plus-tpami2020,\n  author  = {Daniel Bolya and Chong Zhou and Fanyi Xiao and Yong Jae Lee},\n  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence}, \n  title   = {YOLACT++: Better Real-time Instance Segmentation}, \n  year    = {2020},\n}\n```\n\n\n\n# Contact\nFor questions about our paper or code, please contact [Daniel Bolya](mailto:dbolya@ucdavis.edu).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdbolya%2Fyolact","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdbolya%2Fyolact","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdbolya%2Fyolact/lists"}