{"id":13442526,"url":"https://github.com/apple/ml-cvnets","last_synced_at":"2025-05-15T17:05:24.836Z","repository":{"id":37714455,"uuid":"419903001","full_name":"apple/ml-cvnets","owner":"apple","description":"CVNets: A library for training computer vision networks","archived":false,"fork":false,"pushed_at":"2023-10-30T17:05:10.000Z","size":6036,"stargazers_count":1861,"open_issues_count":41,"forks_count":240,"subscribers_count":32,"default_branch":"main","last_synced_at":"2025-05-03T20:02:42.766Z","etag":null,"topics":["ade20k","classification","computer-vision","deep-learning","detection","imagenet","machine-learning","mscoco","pascal-voc","pytorch","segmentation"],"latest_commit_sha":null,"homepage":"https://apple.github.io/ml-cvnets","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apple.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-10-21T23:12:39.000Z","updated_at":"2025-05-03T07:28:00.000Z","dependencies_parsed_at":"2024-01-16T22:21:51.598Z","dependency_job_id":"64be7e55-c461-483c-aaf1-274d4c84f173","html_url":"https://github.com/apple/ml-cvnets","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-cvnets","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-cvnets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-cvnets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2F
ml-cvnets/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apple","download_url":"https://codeload.github.com/apple/ml-cvnets/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254384988,"owners_count":22062422,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ade20k","classification","computer-vision","deep-learning","detection","imagenet","machine-learning","mscoco","pascal-voc","pytorch","segmentation"],"created_at":"2024-07-31T03:01:46.826Z","updated_at":"2025-05-15T17:05:19.824Z","avatar_url":"https://github.com/apple.png","language":"Python","readme":"# CVNets: A library for training computer vision networks\n\nCVNets is a computer vision toolkit that allows researchers and engineers to train standard and novel mobile-\nand non-mobile computer vision models for a variety of tasks, including object classification, object detection,\nsemantic segmentation, and foundation models (e.g., CLIP).\n\n## Table of contents\n\n   * [What's new?](#whats-new)\n   * [Installation](#installation)\n   * [Getting started](#getting-started)\n   * [Supported models and tasks](#supported-models-and-tasks)\n   * [Maintainers](#maintainers)\n   * [Research effort at Apple using CVNets](#research-effort-at-apple-using-cvnets)\n   * [Contributing to CVNets](#contributing-to-cvnets)\n   * [License](#license)\n   * [Citation](#citation)\n\n## What's new?\n\n   * ***July 2023***: Version 0.4 of the CVNets library includes\n      *  [Bytes Are All You Need: Transformers Operating Directly On File 
Bytes](https://arxiv.org/abs/2306.00238)\n      * [RangeAugment: Efficient online augmentation with Range Learning](https://arxiv.org/abs/2212.10553)\n      * Training and evaluating foundation models (CLIP)\n      * Mask R-CNN\n      * EfficientNet, Swin Transformer, and ViT\n      * Enhanced distillation support\n\n## Installation\n\nWe recommend using Python 3.10+ and [PyTorch](https://pytorch.org) (version \u003e= 1.12.0).\n\nThe instructions below use Conda. If you don't have Conda installed, see [How to Install Conda](https://docs.conda.io/en/latest/miniconda.html#latest-miniconda-installer-links).\n\n```bash\n# Clone the repo\ngit clone git@github.com:apple/ml-cvnets.git\ncd ml-cvnets\n\n# Create a virtual env. We use Conda\nconda create -n cvnets python=3.10.8\nconda activate cvnets\n\n# Install requirements and the CVNets package\npip install -r requirements.txt -c constraints.txt\npip install --editable .\n```\n\n## Getting started\n\n   * General instructions for working with CVNets are given [here](docs/source/en/general).\n   * Examples for training and evaluating models are provided [here](docs/source/en/models) and [here](examples).
\n   * Examples for converting a PyTorch model to CoreML are provided [here](docs/source/en/general/README-pytorch-to-coreml.md).\n\n## Supported models and tasks\n\nTo see a list of available models and benchmarks, please refer to the [Model Zoo](docs/source/en/general/README-model-zoo.md) and the [examples](examples) folder.\n\n\u003cdetails\u003e\n\u003csummary\u003e\nImageNet classification models\n\u003c/summary\u003e\n\n   * CNNs\n     * [MobileNetv1](https://arxiv.org/abs/1704.04861)\n     * [MobileNetv2](https://arxiv.org/abs/1801.04381)\n     * [MobileNetv3](https://arxiv.org/abs/1905.02244)\n     * [EfficientNet](https://arxiv.org/abs/1905.11946)\n     * [ResNet](https://arxiv.org/abs/1512.03385)\n     * [RegNet](https://arxiv.org/abs/2003.13678)\n   * Transformers\n     * [Vision Transformer](https://arxiv.org/abs/2010.11929)\n     * [MobileViTv1](https://arxiv.org/abs/2110.02178)\n     * [MobileViTv2](https://arxiv.org/abs/2206.02680)\n     * [SwinTransformer](https://arxiv.org/abs/2103.14030)\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nMultimodal classification\n\u003c/summary\u003e\n\n  * [ByteFormer](https://arxiv.org/abs/2306.00238)\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nObject detection\n\u003c/summary\u003e\n\n   * [SSD](https://arxiv.org/abs/1512.02325)\n   * [Mask R-CNN](https://arxiv.org/abs/1703.06870)\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nSemantic segmentation\n\u003c/summary\u003e\n\n   * [DeepLabv3](https://arxiv.org/abs/1706.05587)\n   * [PSPNet](https://arxiv.org/abs/1612.01105)\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nFoundation models\n\u003c/summary\u003e\n\n   * [CLIP](https://arxiv.org/abs/2103.00020)\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nAutomatic data augmentation\n\u003c/summary\u003e\n\n   * [RangeAugment](https://arxiv.org/abs/2212.10553)\n   * [AutoAugment](https://arxiv.org/abs/1805.09501)\n   * [RandAugment](https://arxiv.org/abs/1909.13719)\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nDistillation\n\u003c/summary\u003e\n\n   * Soft distillation\n   * Hard distillation\n\n\u003c/details\u003e\n\n## Maintainers\n\nThis code was originally developed by \u003ca href=\"https://sacmehta.github.io\" target=\"_blank\"\u003eSachin\u003c/a\u003e and is now maintained by Sachin, \u003ca href=\"https://mchorton.com\" target=\"_blank\"\u003eMaxwell Horton\u003c/a\u003e, \u003ca href=\"https://www.mohammad.pro\" target=\"_blank\"\u003eMohammad Sekhavat\u003c/a\u003e, and Yanzi Jin.\n\n### Previous Maintainers\n* \u003ca href=\"https://farzadab.github.io\" target=\"_blank\"\u003eFarzad\u003c/a\u003e\n\n## Research effort at Apple using CVNets\n\nBelow is a list of publications from Apple that use CVNets:\n\n   * [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer, ICLR'22](https://arxiv.org/abs/2110.02178)\n   * [CVNets: High performance library for Computer Vision, ACM MM'22](https://arxiv.org/abs/2206.02002)\n   * [Separable Self-attention for Mobile Vision Transformers (MobileViTv2)](https://arxiv.org/abs/2206.02680)\n   * [RangeAugment: Efficient Online Augmentation with Range Learning](https://arxiv.org/abs/2212.10553)\n   * [Bytes Are All You Need: Transformers Operating Directly on File Bytes](https://arxiv.org/abs/2306.00238)\n\n## Contributing to CVNets\n\nWe welcome PRs from the community! You can find information about contributing to CVNets in our [contributing](CONTRIBUTING.md) document.\n\nPlease remember to follow our [Code of Conduct](CODE_OF_CONDUCT.md).\n\n## License\n\nFor license details, see [LICENSE](LICENSE).
\n\n## Citation\n\nIf you find our work useful, please cite the following papers:\n\n```bibtex\n@inproceedings{mehta2022mobilevit,\n     title={MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer},\n     author={Sachin Mehta and Mohammad Rastegari},\n     booktitle={International Conference on Learning Representations},\n     year={2022}\n}\n\n@inproceedings{mehta2022cvnets,\n     author = {Mehta, Sachin and Abdolhosseini, Farzad and Rastegari, Mohammad},\n     title = {CVNets: High Performance Library for Computer Vision},\n     year = {2022},\n     booktitle = {Proceedings of the 30th ACM International Conference on Multimedia},\n     series = {MM '22}\n}\n```\n","funding_links":[],"categories":["Python","⚡ Efficient Mobile Models"],"sub_categories":["🚀 Backbone Networks"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapple%2Fml-cvnets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapple%2Fml-cvnets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapple%2Fml-cvnets/lists"}