Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
CVNets: A library for training computer vision networks
- Host: GitHub
- URL: https://github.com/apple/ml-cvnets
- Owner: apple
- License: other
- Created: 2021-10-21T23:12:39.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-10-30T17:05:10.000Z (about 1 year ago)
- Last Synced: 2025-01-11T15:04:24.067Z (10 days ago)
- Topics: ade20k, classification, computer-vision, deep-learning, detection, imagenet, machine-learning, mscoco, pascal-voc, pytorch, segmentation
- Language: Python
- Homepage: https://apple.github.io/ml-cvnets
- Size: 5.76 MB
- Stars: 1,820
- Watchers: 33
- Forks: 234
- Open Issues: 41
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# CVNets: A library for training computer vision networks
CVNets is a computer vision toolkit that allows researchers and engineers to train standard and novel mobile and non-mobile computer vision models for a variety of tasks, including object classification, object detection, semantic segmentation, and foundation models (e.g., CLIP).

## Table of contents
* [What's new?](#whats-new)
* [Installation](#installation)
* [Getting started](#getting-started)
* [Supported models and tasks](#supported-models-and-tasks)
* [Maintainers](#maintainers)
* [Research effort at Apple using CVNets](#research-effort-at-apple-using-cvnets)
* [Contributing to CVNets](#contributing-to-cvnets)
* [License](#license)
* [Citation](#citation)

## What's new?
* ***July 2023***: Version 0.4 of the CVNets library includes
  * [Bytes Are All You Need: Transformers Operating Directly On File Bytes](https://arxiv.org/abs/2306.00238)
* [RangeAugment: Efficient online augmentation with Range Learning](https://arxiv.org/abs/2212.10553)
* Training and evaluating foundation models (CLIP)
* Mask R-CNN
* EfficientNet, Swin Transformer, and ViT
* Enhanced distillation support

## Installation
We recommend using Python 3.10+ and [PyTorch](https://pytorch.org) (version >= 1.12.0).
The instructions below use Conda. If you don't have Conda installed, see [How to Install Conda](https://docs.conda.io/en/latest/miniconda.html#latest-miniconda-installer-links).
```bash
# Clone the repo
git clone [email protected]:apple/ml-cvnets.git
cd ml-cvnets

# Create a virtual env. We use Conda
conda create -n cvnets python=3.10.8
conda activate cvnets

# Install requirements and the CVNets package
pip install -r requirements.txt -c constraints.txt
pip install --editable .
```

## Getting started
* General instructions for working with CVNets are given [here](docs/source/en/general).
* Examples for training and evaluating models are provided [here](docs/source/en/models) and [here](examples).
* Examples for converting a PyTorch model to CoreML are provided [here](docs/source/en/general/README-pytorch-to-coreml.md).

## Supported models and tasks
To see a list of available models and benchmarks, please refer to the [Model Zoo](docs/source/en/general/README-model-zoo.md) and the [examples](examples) folder.
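Model Zoo entries are typically compared by ImageNet top-1 (and top-5) accuracy. As a reference for how such metrics are computed, here is a small pure-Python sketch; it is illustrative only, not CVNets code, and the toy logits below are made up:

```python
def top_k_accuracy(logits, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    correct = 0
    for scores, label in zip(logits, labels):
        # Indices of the k largest scores, in descending order.
        top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        if label in top_k:
            correct += 1
    return correct / len(labels)

# Toy batch: 3 samples, 4 classes.
logits = [
    [0.1, 0.6, 0.2, 0.1],    # predicted class 1
    [0.8, 0.05, 0.1, 0.05],  # predicted class 0; class 2 is second
    [0.3, 0.3, 0.2, 0.2],    # predicted class 0
]
labels = [1, 2, 0]
print(top_k_accuracy(logits, labels, k=1))  # 2 of 3 correct
print(top_k_accuracy(logits, labels, k=2))  # 1.0
```

In real evaluation the logits come from a model's forward pass over a validation set; the metric itself is the same.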
ImageNet classification models
* CNNs
* [MobileNetv1](https://arxiv.org/abs/1704.04861)
* [MobileNetv2](https://arxiv.org/abs/1801.04381)
* [MobileNetv3](https://arxiv.org/abs/1905.02244)
* [EfficientNet](https://arxiv.org/abs/1905.11946)
* [ResNet](https://arxiv.org/abs/1512.03385)
* [RegNet](https://arxiv.org/abs/2003.13678)
* Transformers
* [Vision Transformer](https://arxiv.org/abs/2010.11929)
* [MobileViTv1](https://arxiv.org/abs/2110.02178)
* [MobileViTv2](https://arxiv.org/abs/2206.02680)
* [SwinTransformer](https://arxiv.org/abs/2103.14030)

Multimodal classification
* [ByteFormer](https://arxiv.org/abs/2306.00238)
Object detection
* [SSD](https://arxiv.org/abs/1512.02325)
* [Mask R-CNN](https://arxiv.org/abs/1703.06870)

Semantic segmentation
* [DeepLabv3](https://arxiv.org/abs/1706.05587)
* [PSPNet](https://arxiv.org/abs/1612.01105)

Foundation models
* [CLIP](https://arxiv.org/abs/2103.00020)
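CLIP trains an image encoder and a text encoder so that matching image–text pairs have high cosine similarity; zero-shot classification then picks the text prompt whose embedding is closest to the image embedding. A toy pure-Python illustration of that final step follows; the 3-d embeddings are made up stand-ins for encoder outputs, not CVNets or CLIP API calls:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_embedding, text_embeddings):
    """Return the prompt whose text embedding is most similar to the image embedding."""
    return max(
        text_embeddings,
        key=lambda label: cosine_similarity(image_embedding, text_embeddings[label]),
    )

# Made-up embeddings standing in for encoder outputs.
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.3],
}
print(zero_shot_classify(image_embedding, text_embeddings))  # a photo of a dog
```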
Automatic data augmentation
* [RangeAugment](https://arxiv.org/abs/2212.10553)
* [AutoAugment](https://arxiv.org/abs/1805.09501)
* [RandAugment](https://arxiv.org/abs/1909.13719)

Distillation
* Soft distillation
* Hard distillation

## Maintainers
This code was developed by Sachin and is now maintained by Sachin, Maxwell Horton, Mohammad Sekhavat, and Yanzi Jin.
### Previous Maintainers
* Farzad

## Research effort at Apple using CVNets
Below is a list of publications from Apple that use CVNets:
* [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer, ICLR'22](https://arxiv.org/abs/2110.02178)
* [CVNets: High performance library for Computer Vision, ACM MM'22](https://arxiv.org/abs/2206.02002)
* [Separable Self-attention for Mobile Vision Transformers (MobileViTv2)](https://arxiv.org/abs/2206.02680)
* [RangeAugment: Efficient Online Augmentation with Range Learning](https://arxiv.org/abs/2212.10553)
* [Bytes Are All You Need: Transformers Operating Directly on File Bytes](https://arxiv.org/abs/2306.00238)

## Contributing to CVNets
We welcome PRs from the community! You can find information about contributing to CVNets in our [contributing](CONTRIBUTING.md) document.
Please remember to follow our [Code of Conduct](CODE_OF_CONDUCT.md).
## License
For license details, see [LICENSE](LICENSE).
## Citation
If you find our work useful, please cite the following papers:
```bibtex
@inproceedings{mehta2022mobilevit,
  title = {MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer},
  author = {Sachin Mehta and Mohammad Rastegari},
  booktitle = {International Conference on Learning Representations},
  year = {2022}
}

@inproceedings{mehta2022cvnets,
  author = {Mehta, Sachin and Abdolhosseini, Farzad and Rastegari, Mohammad},
  title = {CVNets: High Performance Library for Computer Vision},
  year = {2022},
  booktitle = {Proceedings of the 30th ACM International Conference on Multimedia},
  series = {MM '22}
}
```