Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

awesome-vision-architecture

An up-to-date list of progress made in deep learning vision architectures
https://github.com/chenyaofo/awesome-vision-architecture

  • `PDF` - cvnets/blob/main/examples/README-mobilevit.md) ***TL;DR**: This paper presents MobileViT, a light-weight and general-purpose vision transformer for mobile devices. MobileViT offers a different perspective on the global processing of information with transformers.*
  • `PDF` - research/vision_transformer) ***TL;DR**: While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In this context, the authors seek to directly apply a pure transformer to sequences of image patches (called Vision Transformer), which performs very well on image classification tasks.*
  • `PDF` - Transformer) ***TL;DR**: This paper presents a new vision Transformer, called Swin Transformer, whose representation is computed with Shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection.*
  • `PDF` - PretrainedModels) ***TL;DR**: The authors propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. The Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer.*
  • `PDF` - branch topology during training and a single-branch topology (VGG-like style) at inference. Such decoupling of the training-time and inference-time architectures is realized by a structural re-parameterization technique.*
  • `PDF` - research/noisystudent) ***TL;DR**: The authors present a simple self-training method that achieves 88.4% top-1 accuracy on ImageNet. To achieve this result, they first train an EfficientNet model on labeled ImageNet images and use it as a teacher to generate pseudo labels on 300M unlabeled images. They then train a larger EfficientNet as a student model on the combination of labeled and pseudo-labeled images, and iterate this process by putting the student back as the teacher.*
  • `PDF` - party Code (Stars 11.7k)`](https://github.com/pytorch/vision/blob/main/torchvision/transforms/autoaugment.py) ***TL;DR**: The authors propose a simplified search space for data augmentation that vastly reduces the computational expense of automated augmentation and permits the removal of a separate proxy task. Despite the simplifications, their method achieves equal or better performance than previous automated augmentation strategies.*
  • `PDF` - party Code (Stars 11.7k)`](https://github.com/pytorch/vision/blob/main/torchvision/models/mobilenetv3.py) ***TL;DR**: This paper presents the next generation of MobileNets (MobileNetV3) based on a combination of complementary architecture search techniques as well as a novel architecture design.*
  • `PDF` - PyTorch) ***TL;DR**: Prior works have proved effective at guiding the model to attend to less discriminative parts of objects (e.g. the leg as opposed to the head of a person). The authors therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images, and the ground-truth labels are mixed proportionally to the area of the patches (a minimal sketch appears after this list).*
  • `PDF` - party Code (Stars 11.7k)`](https://github.com/pytorch/vision/blob/main/torchvision/transforms/autoaugment.py) ***TL;DR**: Data augmentation is an effective technique for improving the accuracy of modern image classifiers. However, current data augmentation implementations are manually designed. In this paper, the authors describe a simple procedure called AutoAugment to automatically search for improved data augmentation policies.*
  • `PDF` - frank/SENet) ***TL;DR**: Based on the benefit of enhancing spatial encoding in prior works, the authors propose a novel architectural unit, which they term the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels (a minimal sketch appears after this list).*
  • `PDF` - party Code (Stars 11.7k)`](https://github.com/pytorch/vision/blob/main/torchvision/models/mobilenetv2.py) ***TL;DR**: Based on MobileNetV1, the authors devise a new mobile architecture, MobileNetV2, which is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers.*
  • `PDF` - party Code (Stars 1.3k)`](https://github.com/megvii-model/ShuffleNet-Series) ***TL;DR**: The authors introduce an extremely computation-efficient CNN architecture named ShuffleNet, which utilizes two new operations, pointwise group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy (a channel-shuffle sketch appears after this list).*
  • `PDF` - cifar10) ***TL;DR**: The authors propose mixup, a simple learning principle/data augmentation that trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples (a minimal sketch appears after this list).*
  • `PDF` - party Code (Stars 11.7k)`](https://github.com/pytorch/vision/blob/main/torchvision/models/shufflenetv2.py) ***TL;DR**: Prior architecture design is mostly guided by the indirect metric of computation complexity (i.e., FLOPs). In contrast, the authors propose to use the direct metric (i.e., speed on the target platform) and derive several practical guidelines for efficient network (ShuffleNetV2) design from empirical observations.*
  • `PDF` - party Code (Stars 11.7k)`](https://github.com/pytorch/vision/blob/main/torchvision/models/densenet.py) ***TL;DR**: The authors observe that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. Based on this, they introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion.*
  • `PDF` - party Code (Stars 8.5k)`](https://github.com/Cadene/pretrained-models.pytorch/blob/master/pretrainedmodels/models/xception.py) ***TL;DR**: The authors present an interpretation of Inception modules in convolutional neural networks as being an intermediate step in-between regular convolution and the depthwise separable convolution operation (a depthwise convolution followed by a pointwise convolution); a minimal sketch appears after this list.*
  • `PDF` - mlrg/Cutout) ***TL;DR**: The authors show that the simple regularization technique of randomly masking out square regions of the input during training, called cutout, can be used to improve the robustness and overall performance of convolutional neural networks (a minimal sketch appears after this list).*
  • `PDF` - residual-networks) [`Third-party Code (Stars 11.7k)`](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py) ***TL;DR**: This paper presents a residual learning framework (ResNet) to ease the training of networks that are substantially deeper than those used previously; it reformulates the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions (a minimal residual-block sketch appears after this list).*
  • `PDF` - party Code (Stars 11.7k)`](https://github.com/pytorch/vision/blob/main/torchvision/models/inception.py) ***TL;DR**: Building on versions 1 and 2 of the Inception family, the authors explore ways to scale up networks that utilize the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.*
  • `PDF` - level accuracy on ImageNet with 50x fewer parameters.*
  • `PDF` - party Code (Stars 11.7k)`](https://github.com/pytorch/vision/blob/main/torchvision/models/vgg.py) ***TL;DR**: From the empirical results, the authors found that a network (VGG) with increasing depth and very small (3x3) convolution filters leads to a significant performance improvement over the prior-art configurations.*
  • `PDF` - branch topology, which allows increasing the depth and width of the network while keeping the computational budget constant.*
  • `PDF` - party Code (Stars 11.7k)`](https://github.com/pytorch/vision/blob/main/torchvision/models/alexnet.py) ***TL;DR**: This is a pioneering work that exploits a deep convolutional neural network (AlexNet) for the large-scale image classification task (ImageNet), achieving very impressive performance.*
  • `Download Link` - 1k** and **ImageNet-21k**. **1)** ImageNet-1k contains 1,281,167 training images and 50,000 validation images across 1,000 object classes. **2)** ImageNet-21k, which is bigger and more diverse, consists of 14,197,122 images, each tagged in a single-label fashion with one of 21,841 possible classes. The dataset has no official train-validation split, and the classes are not well balanced - some classes contain only 1-10 samples, while others contain thousands of samples. Lastly, it is recommended to download this dataset from [Academic Torrents](https://academictorrents.com/browse.php?search=ImageNet) instead of the official website.* `How to cite:` **ImageNet: A Large-Scale Hierarchical Image Database** `Cited by 38.9k` `CVPR` `2009` `Princeton University` `ImageNet` [`PDF`](https://image-net.org/static_files/papers/imagenet_cvpr09.pdf)
  • `Download Link` - 10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. **2)** The CIFAR-100 is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. Please refer to the [official website](https://www.cs.toronto.edu/~kriz/cifar.html) for more details.* `How to cite:` **Learning Multiple Layers of Features from Tiny Images** `Cited by 15.4k` `Tech Report` `2009` `Alex Krizhevsky` `CIFAR-10` `CIFAR-100` [`PDF`](https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf)
  • `Download Link` - 101 – Mining Discriminative Components with Random Forests** `Cited by 912` `ECCV` `2014` `ETH Zürich` `Food-101` [`PDF`](https://link.springer.com/content/pdf/10.1007/978-3-319-10599-4_29.pdf)
  • **rwightman/pytorch-image-models** - loaders / augmentations, and reference training / validation scripts that aim to pull together a wide variety of SOTA models with the ability to reproduce ImageNet training results.*
  • **Papers with Code** - state-of-the-art papers and a leaderboard/benchmark of SoTA results on various datasets.*
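
The sketches below illustrate a few of the techniques summarized in the list above. They are minimal, hypothetical PyTorch-style implementations written for clarity, not code taken from the linked papers or repositories; all class and function names are illustrative.

**Residual block (ResNet).** A sketch of the residual-learning idea: the block computes a residual F(x) with a small stack of convolutions and adds it back to its input through a shortcut connection, so the layers learn functions referenced to the input.

```python
import torch
from torch import nn

class BasicResidualBlock(nn.Module):
    """Computes y = F(x) + x, so the convolutions only learn the residual F(x)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # shortcut connection adds the input back

x = torch.randn(1, 64, 56, 56)
print(BasicResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```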
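
**Squeeze-and-Excitation block (SENet).** A sketch of the SE unit: global average pooling "squeezes" each channel to a single statistic, and a small gating MLP "excites" the channels by producing per-channel scales that recalibrate the feature map.

```python
import torch
from torch import nn

class SEBlock(nn.Module):
    """Rescales channels using weights computed from globally pooled features."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        scale = self.fc(x.mean(dim=(2, 3)))  # squeeze: (B, C) channel descriptors
        return x * scale.view(b, c, 1, 1)    # excite: channel-wise recalibration

x = torch.randn(2, 64, 56, 56)
print(SEBlock(64)(x).shape)  # torch.Size([2, 64, 56, 56])
```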
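
**Depthwise separable convolution (Xception, MobileNets).** A sketch of the factorization described above: a depthwise convolution filters each channel independently, and a 1x1 pointwise convolution then mixes information across channels, at a fraction of the cost of a regular convolution.

```python
import torch
from torch import nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise (per-channel) convolution followed by a 1x1 pointwise convolution."""

    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 32, 28, 28)
print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 28, 28])
```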
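
**Channel shuffle (ShuffleNet).** A sketch of the channel-shuffle operation: channels are reshaped into groups, transposed, and flattened back, so that subsequent pointwise group convolutions see channels from every group.

```python
import torch

def channel_shuffle(x, groups):
    """Interleaves channels across groups so information flows between group convolutions."""
    b, c, h, w = x.shape
    assert c % groups == 0
    return (x.view(b, groups, c // groups, h, w)
             .transpose(1, 2)
             .contiguous()
             .view(b, c, h, w))

x = torch.randn(1, 12, 8, 8)
print(channel_shuffle(x, groups=3).shape)  # torch.Size([1, 12, 8, 8])
```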
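
**mixup.** A sketch of the mixup augmentation: each mini-batch is blended with a shuffled copy of itself using a Beta-distributed coefficient, and the one-hot labels are blended with the same coefficient.

```python
import torch
import torch.nn.functional as F

def mixup_batch(x, y, alpha=0.2, num_classes=10):
    """Returns convex combinations of example pairs and of their one-hot labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    index = torch.randperm(x.size(0))
    x_mixed = lam * x + (1.0 - lam) * x[index]
    y_onehot = F.one_hot(y, num_classes).float()
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[index]
    return x_mixed, y_mixed

images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))
mixed_images, mixed_targets = mixup_batch(images, labels)
print(mixed_images.shape, mixed_targets.shape)  # (8, 3, 32, 32) (8, 10)
```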
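
**Cutout.** A sketch of the cutout regularizer: a square region at a random location is zeroed out in each training image.

```python
import torch

def cutout(x, size=8):
    """Masks a randomly positioned square region in every image of the batch."""
    b, _, h, w = x.shape
    x = x.clone()
    for i in range(b):
        cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
        y1, y2 = max(cy - size // 2, 0), min(cy + size // 2, h)
        x1, x2 = max(cx - size // 2, 0), min(cx + size // 2, w)
        x[i, :, y1:y2, x1:x2] = 0.0
    return x

images = torch.randn(4, 3, 32, 32)
print(cutout(images, size=16).shape)  # torch.Size([4, 3, 32, 32])
```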
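
**CutMix.** A sketch of the CutMix augmentation: a rectangular patch from one image is pasted into another, and the labels are mixed in proportion to the pasted area.

```python
import torch
import torch.nn.functional as F

def cutmix_batch(x, y, alpha=1.0, num_classes=10):
    """Pastes a random patch from a shuffled copy of the batch and mixes the labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    index = torch.randperm(x.size(0))
    _, _, h, w = x.shape
    # Box whose area is roughly (1 - lam) of the image.
    cut_h, cut_w = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    x = x.clone()
    x[:, :, y1:y2, x1:x2] = x[index, :, y1:y2, x1:x2]
    lam_adj = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)  # proportion kept from the original image
    y_onehot = F.one_hot(y, num_classes).float()
    return x, lam_adj * y_onehot + (1.0 - lam_adj) * y_onehot[index]

images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))
mixed_images, mixed_targets = cutmix_batch(images, labels)
print(mixed_images.shape, mixed_targets.shape)  # (8, 3, 32, 32) (8, 10)
```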