# CVNets: A library for training computer vision networks

CVNets is a computer vision toolkit that allows researchers and engineers to train standard and novel mobile-
and non-mobile computer vision models for a variety of tasks, including object classification, object detection,
semantic segmentation, and foundation models (e.g., CLIP).

## Table of contents

* [What's new?](#whats-new)
* [Installation](#installation)
* [Getting started](#getting-started)
* [Supported models and tasks](#supported-models-and-tasks)
* [Maintainers](#maintainers)
* [Research effort at Apple using CVNets](#research-effort-at-apple-using-cvnets)
* [Contributing to CVNets](#contributing-to-cvnets)
* [License](#license)
* [Citation](#citation)

## What's new?

* ***July 2023***: Version 0.4 of the CVNets library includes
  * [Bytes Are All You Need: Transformers Operating Directly On File Bytes](https://arxiv.org/abs/2306.00238)
  * [RangeAugment: Efficient online augmentation with Range Learning](https://arxiv.org/abs/2212.10553)
  * Training and evaluating foundation models (CLIP)
  * Mask R-CNN
  * EfficientNet, Swin Transformer, and ViT
  * Enhanced distillation support

## Installation

We recommend using Python 3.10+ and [PyTorch](https://pytorch.org) (version >= v1.12.0).

The instructions below use Conda. If you don't have Conda installed, see [How to Install Conda](https://docs.conda.io/en/latest/miniconda.html#latest-miniconda-installer-links).

```bash
# Clone the repo
git clone git@github.com:apple/ml-cvnets.git
cd ml-cvnets

# Create a virtual env. We use Conda
conda create -n cvnets python=3.10.8
conda activate cvnets

# install requirements and CVNets package
pip install -r requirements.txt -c constraints.txt
pip install --editable .
```
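After the editable install, a quick sanity check (just an import test; this assumes the package's importable name is `cvnets`, which matches the repository layout) is:

```bash
# Verify the editable install from the activated cvnets environment.
python -c "import cvnets; print('CVNets imported from', cvnets.__file__)"
```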

## Getting started

* General instructions for working with CVNets are given [here](docs/source/en/general).
* Examples for training and evaluating models are provided [here](docs/source/en/models) and [here](examples).
* Examples for converting a PyTorch model to CoreML are provided [here](docs/source/en/general/README-pytorch-to-coreml.md).
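Independent of the CVNets conversion scripts linked above, a minimal, generic sketch of a PyTorch-to-CoreML export with [coremltools](https://github.com/apple/coremltools) could look like the following; the model and input shape here are placeholders, not CVNets defaults:

```python
# Generic PyTorch -> Core ML export sketch using coremltools.
# This is not CVNets' own conversion pipeline; see the linked docs for that.
import torch
import coremltools as ct
from torchvision.models import mobilenet_v2

model = mobilenet_v2(weights=None).eval()        # placeholder model
example_input = torch.rand(1, 3, 224, 224)       # assumed input resolution
traced = torch.jit.trace(model, example_input)   # TorchScript trace for conversion

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="input", shape=example_input.shape)],
)
mlmodel.save("mobilenet_v2.mlpackage")
```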

## Supported models and tasks

To see a list of available models and benchmarks, please refer to the [Model Zoo](docs/source/en/general/README-model-zoo.md) and the [examples](examples) folder.

ImageNet classification models

* CNNs
  * [MobileNetv1](https://arxiv.org/abs/1704.04861)
  * [MobileNetv2](https://arxiv.org/abs/1801.04381)
  * [MobileNetv3](https://arxiv.org/abs/1905.02244)
  * [EfficientNet](https://arxiv.org/abs/1905.11946)
  * [ResNet](https://arxiv.org/abs/1512.03385)
  * [RegNet](https://arxiv.org/abs/2003.13678)
* Transformers
  * [Vision Transformer](https://arxiv.org/abs/2010.11929)
  * [MobileViTv1](https://arxiv.org/abs/2110.02178)
  * [MobileViTv2](https://arxiv.org/abs/2206.02680)
  * [SwinTransformer](https://arxiv.org/abs/2103.14030)

Multimodal Classification

* [ByteFormer](https://arxiv.org/abs/2306.00238)

Object detection

* [SSD](https://arxiv.org/abs/1512.02325)
* [Mask R-CNN](https://arxiv.org/abs/1703.06870)

Semantic segmentation

* [DeepLabv3](https://arxiv.org/abs/1706.05587)
* [PSPNet](https://arxiv.org/abs/1612.01105)

Foundation models

* [CLIP](https://arxiv.org/abs/2103.00020)

Automatic Data Augmentation

* [RangeAugment](https://arxiv.org/abs/2212.10553)
* [AutoAugment](https://arxiv.org/abs/1805.09501)
* [RandAugment](https://arxiv.org/abs/1909.13719)
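For illustration only, the sketch below shows AutoAugment and RandAugment as exposed by torchvision, not CVNets' own augmentation configuration; RangeAugment ships as part of CVNets itself and is not available in torchvision. The crop size and hyper-parameters are assumptions, not CVNets defaults.

```python
# Illustrative torchvision pipelines for AutoAugment / RandAugment
# (not CVNets' augmentation configuration).
from torchvision import transforms

rand_augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandAugment(num_ops=2, magnitude=9),  # assumed settings
    transforms.ToTensor(),
])

auto_augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.AutoAugment(policy=transforms.AutoAugmentPolicy.IMAGENET),
    transforms.ToTensor(),
])
```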

Distillation

* Soft distillation
* Hard distillation
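As a rough sketch of how the two flavours differ (generic knowledge-distillation losses, not CVNets' exact implementation; the temperature value is an assumption):

```python
# Generic soft vs. hard distillation losses (illustrative, not CVNets' code).
import torch
import torch.nn.functional as F

def soft_distillation_loss(student_logits, teacher_logits, temperature=4.0):
    # KL divergence between temperature-softened teacher and student distributions.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

def hard_distillation_loss(student_logits, teacher_logits):
    # Cross-entropy against the teacher's argmax predictions used as hard pseudo-labels.
    pseudo_labels = teacher_logits.argmax(dim=-1)
    return F.cross_entropy(student_logits, pseudo_labels)
```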

## Maintainers

This code was developed by Sachin and is now maintained by Sachin, Maxwell Horton, Mohammad Sekhavat, and Yanzi Jin.

### Previous Maintainers
* Farzad

## Research effort at Apple using CVNets

Below is a list of publications from Apple that use CVNets:

* [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer, ICLR'22](https://arxiv.org/abs/2110.02178)
* [CVNets: High performance library for Computer Vision, ACM MM'22](https://arxiv.org/abs/2206.02002)
* [Separable Self-attention for Mobile Vision Transformers (MobileViTv2)](https://arxiv.org/abs/2206.02680)
* [RangeAugment: Efficient Online Augmentation with Range Learning](https://arxiv.org/abs/2212.10553)
* [Bytes Are All You Need: Transformers Operating Directly on File Bytes](https://arxiv.org/abs/2306.00238)

## Contributing to CVNets

We welcome PRs from the community! You can find information about contributing to CVNets in our [contributing](CONTRIBUTING.md) document.

Please remember to follow our [Code of Conduct](CODE_OF_CONDUCT.md).

## License

For license details, see [LICENSE](LICENSE).

## Citation

If you find our work useful, please cite the following papers:

```bibtex
@inproceedings{mehta2022mobilevit,
  title={MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer},
  author={Sachin Mehta and Mohammad Rastegari},
  booktitle={International Conference on Learning Representations},
  year={2022}
}

@inproceedings{mehta2022cvnets,
  author={Mehta, Sachin and Abdolhosseini, Farzad and Rastegari, Mohammad},
  title={CVNets: High Performance Library for Computer Vision},
  year={2022},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  series={MM '22}
}
```