{"id":60841,"url":"https://github.com/axruff/DeepLearning","name":"DeepLearning","description":"A collection of research papers, datasets and software on Deep Learning","projects_count":307,"last_synced_at":"2026-07-08T22:00:27.183Z","repository":{"id":109681506,"uuid":"167366902","full_name":"axruff/DeepLearning","owner":"axruff","description":"A collection of research papers, datasets and software on Deep Learning","archived":false,"fork":false,"pushed_at":"2023-06-15T12:09:52.000Z","size":992,"stargazers_count":31,"open_issues_count":0,"forks_count":7,"subscribers_count":3,"default_branch":"master","last_synced_at":"2026-06-20T23:04:04.655Z","etag":null,"topics":["awesome-list","best-practices","computer-vision","deep-learning","machine-learning","neural-network","neural-networks","papers","research","supervised-learning","survey","tomography","unsupervised-learning"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/axruff.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-01-24T12:50:27.000Z","updated_at":"2026-05-24T15:28:57.000Z","dependencies_parsed_at":null,"dependency_job_id":"86848596-ddb1-4042-a4ac-0b9f7d987012","html_url":"https://github.com/axruff/DeepLearning","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/axruff/DeepLearning","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/axruff%2FDeepLearning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/axruf
f%2FDeepLearning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/axruff%2FDeepLearning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/axruff%2FDeepLearning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/axruff","download_url":"https://codeload.github.com/axruff/DeepLearning/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/axruff%2FDeepLearning/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35279442,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-08T02:00:06.796Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"created_at":"2024-05-15T00:00:17.992Z","updated_at":"2026-07-08T22:00:27.183Z","primary_language":null,"list_of_lists":false,"displayable":true,"categories":["Reinforcement Learning","Models","Semi Supervised","Pruning and Compression","Unsupervised Learning","Analysis and Interpretability","Multitask Learning","Optical Flow","Optimization and Regularisation","Mutual Learning","Weakly Supervised","Segmentation","Instance Segmentation","Interactive Segmentation","Anomaly Detection","Semantic Correspondence","Transfer Learning"],"sub_categories":["Inverse Reinforcement Learning","Logic and Semantics","Multi-level","Transformers","Context and 
Attention","Composition","Capsule Networks","3D Shape"],"readme":"# Deep Learning papers and resources\n\n##### Table of Contents\n\n- [💎 Neural Networks](#neural-networks)\n  - [⭕ Models](#models)\n    - [Multi-level](#multi-level)\n    - [Context and Attention](#context-and-attention)\n    - [Composition](#composition)\n    - [Capsule Networks](#capsule-networks)\n    - [Transformers](#transformers)\n    - [3D Shape and Neural Rendering](#3d-shape)\n    - [Logic and Semantics](#logic-and-semantics)\n  - [💪 Optimization](#optimization)\n    - [Optimization and Regularisation](#optimization-and-regularisation)\n    - [Pruning, Compression](#pruning-and-compression)\n  - [📊 Analysis and Interpretability](#analysis-and-interpretability) \n- [📜 Tasks](#tasks)\n  - [Segmentation](#segmentation)\n  - [Instance Segmentation](#instance-segmentation)\n  - [Interactive Segmentation](#interactive-segmentation)\n  - [Semantic Correspondence](#semantic-correspondence)\n  - [Anomaly Detection](#anomaly-detection)\n  - [Optical Flow](#optical-flow)\n- [⚙️ Methods](#neural-networks)\n  - [TL - Transfer Learning](#transfer-learning)\n  - [GM - Generative Modelling](#generative-modelling)\n  - [WS - Weakly Supervised Learning](#weakly-supervised)\n  - [SSL - Semi-supervised Learning](#semi-supervised)\n  - [USL - Un- and Self-supervised Learning](#unsupervised-learning)\n  - [CL - Collaborative Learning](#mutual-learning)\n  - [MTL - Multi-task Learning](#multitask-learning)\n  - [AD - Anomaly Detection](#anomaly-detection)\n  - [RL - Reinforcement Learning](#reinforcement-learning)\n  - [IRL - Inverse Reinforcement Learning](#inverse-reinforcement-learning)\n- [🎁 Datasets](#datasets)\n- [⚔ Benchmarks](#benchmarks)\n- [🌍 Applications](#applications)\n  - [Applications: Medical Imaging](#applications-medical-imaging)\n  - [Applications: X-ray Imaging](#applications-x-ray-imaging)\n  - [Applications: Image Registration](#applications-image-registration)\n  - [Applications: Video and 
Motion](#applications-video)\n  - [Applications: Denoising and Superresolution](#application-denoising-and-superresolution)\n  - [Applications: Inpainting](#applications-inpainting)\n  - [Applications: Photography](#applications-photography)\n  - [Applications: Misc](#applications-misc)\n- [💻 Software](#software)\n- [📈 Overview](#overview)\n- [💬 Opinions](#opinions)\n\n\n\n\n**Notations**\n\n:white_check_mark: - Checked\n\n⭕ - To Check\n\n📜 - Survey\n\n\n# Neural Networks\n\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n## Models\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n[1998 - **[LeNet]**: Gradient-based learning applied to document recognition](https://ieeexplore.ieee.org/document/726791)\n\n\n[2012 - **[AlexNet]** ImageNet Classification with Deep Convolutional Neural Networks](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) ✅\n\n\n[2013 - Learning Hierarchical Features for Scene Labeling](http://yann.lecun.com/exdb/publis/pdf/farabet-pami-13.pdf)\n\n---\n[2013 - **[R-CNN]** Rich feature hierarchies for accurate object detection and semantic segmentation](https://arxiv.org/abs/1311.2524)\n\n\u003csup\u003eObject detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. 
In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012---achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also compare R-CNN to OverFeat, a recently proposed sliding-window detector based on a similar CNN architecture. We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset.\u003c/sup\u003e\n\n\u003cimg src=\"http://3.bp.blogspot.com/-aM69pqJLP9k/VT2927f8WmI/AAAAAAAAAv8/7S49kEq5Ss0/s1600/%E6%93%B7%E5%8F%96.PNG\" width=\"400\"\u003e\n\n[2014 - **[OverFeat]**: Integrated Recognition, Localization and Detection using Convolutional Networks](https://arxiv.org/pdf/1312.6229.pdf)\n\n[2014 - **[Seq2Seq]**: Sequence to Sequence Learning with Neural Networks](https://arxiv.org/pdf/1409.3215.pdf)\n\n\n---\n[2014 - **[VGG]** Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556) ✅\n\n\u003csub\u003eIn this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. 
These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.\u003c/sub\u003e\n\n\u003cimg src=\"https://cdn-images-1.medium.com/max/1000/1*HzxRI1qHXjiVXla-_NiMBA.png\" width=\"350\"\u003e\n\n---\n[2014 - **[GoogleNet]** Going Deeper with Convolutions](https://arxiv.org/abs/1409.4842) ✅\n\n\u003csub\u003eWe propose a deep convolutional neural network architecture codenamed \"Inception\", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. This was achieved by a carefully crafted design that allows for increasing the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC 2014 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.\u003c/sub\u003e\n\n\u003cimg src=\"https://miro.medium.com/max/5176/1*ZFPOSAted10TPd3hBQU8iQ.png\" width=\"350\"\u003e\n\n\n[2014 - Neural Turing Machines](https://arxiv.org/abs/1410.5401)\n\n---\n[2015 - **[ResNet]** Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) ✅\n\n\u003csub\u003eDeeper neural networks are more difficult to train. 
We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers.\nThe depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC \u0026 COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.\u003c/sub\u003e\n\n\u003cimg src=\"https://miro.medium.com/max/3048/1*6hF97Upuqg_LdsqWY6n_wg.png\" width=\"350\"\u003e\n\n---\n[2015 - Spatial Transformer Networks](https://arxiv.org/abs/1506.02025)\n\n\u003csub\u003eConvolutional Neural Networks define an exceptionally powerful class of models, but are still limited by the lack of ability to be spatially invariant to the input data in a computationally and parameter efficient manner. In this work we introduce a new learnable module, the Spatial Transformer, which \u003cb\u003eexplicitly allows the spatial manipulation\u003c/b\u003e of data within the network. 
\u003cb\u003eThis differentiable module\u003c/b\u003e can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimisation process. We show that the use of spatial transformers results in models which learn \u003cb\u003einvariance to translation, scale, rotation and more generic warping\u003c/b\u003e, resulting in state-of-the-art performance on several benchmarks, and for a number of classes of transformations.\u003c/sub\u003e\n\n\u003cimg src=\"https://miro.medium.com/max/1104/0*n3FxIWWb46ARPww-\" width=\"350\"\u003e\n\n---\n[2016 - **[WRN]**: Wide Residual Networks](https://arxiv.org/abs/1605.07146) [[github]](https://github.com/szagoruyko/wide-residual-networks)\n\n[2015 - **[FCN]** Fully Convolutional Networks for Semantic Segmentation](https://arxiv.org/abs/1411.4038)\n\n\u003cimg src=\"http://deeplearning.net/tutorial/_images/cat_segmentation.png\" width=\"400\"\u003e\n\n\n[2015 - **[U-net]**: Convolutional networks for biomedical image segmentation](https://arxiv.org/abs/1505.04597) ✅\n\n[2016 - **[Xception]**: Deep Learning with Depthwise Separable Convolutions](https://arxiv.org/abs/1610.02357)\n[Implementation](https://colab.research.google.com/drive/1BT_t64JCzr8ge51orG8uLBLIL7w1Hos4)\n\n[2016 - **[V-Net]**: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation](https://arxiv.org/abs/1606.04797)\n\n[2017 - **[MobileNets]**: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861)\n\n[Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials](https://arxiv.org/abs/1210.5644)\n\n\u003cimg src=\"http://vladlen.info/wp-content/uploads/2011/12/densecrf1.png\" width=\"300\"\u003e\n\n\n[2018 - **[TernausNet]**: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image 
Segmentation](https://arxiv.org/abs/1801.05746) ✅\n\n\n---\n[2018 - CubeNet: Equivariance to 3D Rotation and Translation](http://openaccess.thecvf.com/content_ECCV_2018/papers/Daniel_Worrall_CubeNet_Equivariance_to_ECCV_2018_paper.pdf)[[github]](https://github.com/deworrall92/cubenet), [*[video]*](https://www.youtube.com/watch?v=TlzRyHbWeP0\u0026feature=youtu.be) ⭕\n\n\u003cimg src=\"https://i.pinimg.com/564x/8c/c8/44/8cc844bb8784d93790f9d2d2552297bf.jpg\" width=\"350\"\u003e\n\n---\n[2018 - Deep Rotation Equivariant Network](https://arxiv.org/abs/1705.08623)[[github]](https://github.com/ZJULearning/DREN) ⭕\n\n\u003cimg src=\"https://github.com/ZJULearning/DREN/raw/master/img/rotate_equivariant.png\" width=\"350\"\u003e\n\n---\n[2018 - ArcFace: Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698)\n\n\u003cimg src=\"https://s3.us-west-2.amazonaws.com/secure.notion-static.com/b69cb596-c002-4f82-9ad8-ff733a3214f6/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=AKIAT73L2G45O3KS52Y5%2F20210209%2Fus-west-2%2Fs3%2Faws4_request\u0026X-Amz-Date=20210209T103524Z\u0026X-Amz-Expires=86400\u0026X-Amz-Signature=c918998a89773ae0cba4ec47e8f110f873df01872b1b1e33756085dc26609007\u0026X-Amz-SignedHeaders=host\u0026response-content-disposition=filename%20%3D%22Untitled.png%22\" width=\"350\"\u003e\n\n---\n[2019 - **[PacNet]**: Pixel-Adaptive Convolutional Neural Networks](https://arxiv.org/abs/1904.05373)\n\n\u003cimg src=\"https://suhangpro.github.io/pac/fig/pac.png\" width=\"350\"\u003e\n\n---\n[2019 - Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation](https://arxiv.org/abs/1903.02120v3) [[github]](https://github.com/LinZhuoChen/DUpsampling) ⭕\n\n\u003cimg src=\"https://tonghe90.github.io/papers/cvpr2019_tz.png\" width=\"400\"\u003e\n\n[2019 - Panoptic Feature Pyramid 
Networks](http://openaccess.thecvf.com/content_CVPR_2019/html/Kirillov_Panoptic_Feature_Pyramid_Networks_CVPR_2019_paper.html) ⭕\n\n---\n[2019 - **[DeeperLab]**: Single-Shot Image Parser](https://arxiv.org/abs/1902.05093) ⭕\n\n\u003cimg src=\"http://deeperlab.mit.edu/deeperlab_illustration.png\" width=\"350\"\u003e\n\n---\n[2019 - **[EfficientNet]**: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) ⭕\n\n\u003cimg src=\"https://miro.medium.com/max/4044/1*xQCVt1tFWe7XNWVEmC6hGQ.png\" width=\"350\"\u003e\n\n---\n[2019 - Hamiltonian Neural Networks](https://arxiv.org/abs/1906.01563)\n\n\u003csub\u003eEven though neural networks enjoy widespread use, they still struggle to learn the basic laws of physics. How might we endow them with better inductive biases? In this paper, we draw inspiration from Hamiltonian mechanics to train models that learn and respect exact conservation laws in an unsupervised manner. We evaluate our models on problems where conservation of energy is important, including the two-body problem and pixel observations of a pendulum. Our model trains faster and generalizes better than a regular neural network. 
An interesting side effect is that our model is perfectly reversible in time.\u003c/sub\u003e\n\n\n\u003cimg src=\"https://greydanus.github.io/assets/hamiltonian-nns/overall-idea.png\" width=\"350\"\u003e\n\n---\n[2020 - Roto-Translation Equivariant Convolutional Networks: Application to Histopathology Image Analysis](https://arxiv.org/abs/2002.08725)\n\n\u003cimg src=\"https://storage.googleapis.com/groundai-web-prod/media/users/user_14/project_408932/images/x1.png\" width=\"350\"\u003e\n\n---\n[2020 - Neural Operator: Graph Kernel Network for Partial Differential Equations](https://arxiv.org/abs/2003.03485)\n\n\u003cimg src=\"https://i.pinimg.com/564x/ca/d3/4a/cad34a3e6ef844515239d0ba80d40f8a.jpg\" width=\"350\"\u003e\n\n---\n[2021 - Learning Neural Network Subspaces](https://arxiv.org/abs/2102.10472)\n\n\u003cimg src=\"https://s3.us-west-2.amazonaws.com/secure.notion-static.com/b95defb7-3c23-4997-9e83-98205cdc7b38/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=AKIAT73L2G45O3KS52Y5%2F20210301%2Fus-west-2%2Fs3%2Faws4_request\u0026X-Amz-Date=20210301T145215Z\u0026X-Amz-Expires=86400\u0026X-Amz-Signature=7c3ecb403e56f47292957a7f029fc7e538f68c8d24f3a8f50fb18f2256ac6ee5\u0026X-Amz-SignedHeaders=host\u0026response-content-disposition=filename%20%3D%22Untitled.png%22\" width=\"250\"\u003e\n\n\u003csub\u003eRecent observations have advanced our understanding of the neural network optimization landscape, revealing the existence of (1) paths of high accuracy containing diverse solutions and (2) wider minima offering improved performance. Previous methods observing diverse paths require multiple training runs. In contrast we aim to leverage both property (1) and (2) with a single method and in a single training run. With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks. 
These neural network subspaces contain diverse solutions that can be ensembled, approaching the ensemble performance of independently trained networks without the training cost. Moreover, using the subspace midpoint boosts accuracy, calibration, and robustness to label noise, outperforming Stochastic Weight Averaging.\u003c/sub\u003e\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n### Multi-level\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n---\n[2014 - **[SPP-Net]** Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition](https://arxiv.org/abs/1406.4729)\n\n\u003cimg src=\"http://kaiminghe.com/eccv14sppnet/img/sppnet.jpg\" width=\"350\"\u003e\n\n---\n[2016 - **[ParseNet]**: Looking Wider to See Better](https://arxiv.org/abs/1506.04579)\n\n\u003cimg src=\"https://miro.medium.com/max/700/1*dRhGetHArI_bs6IdiIFhkA.png\" width=\"350\"\u003e\n\n---\n[2016 - **[PSPNet]**: Pyramid Scene Parsing Network](https://arxiv.org/abs/1612.01105v2) 
[[github]](https://github.com/hszhao/PSPNet) ✅\n\n\u003cimg src=\"https://hszhao.github.io/projects/pspnet/figures/pspnet.png\" width=\"400\"\u003e\n\n---\n[2016 - **[DeepLab]**: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs](https://arxiv.org/pdf/1606.00915.pdf) ✅\n\n---\n[2015 - Zoom Better to See Clearer: Human and Object Parsing with Hierarchical Auto-Zoom Net](https://arxiv.org/abs/1511.06881)\n\n\u003cimg src=\"https://media.springernature.com/original/springer-static/image/chp%3A10.1007%2F978-3-319-46454-1_39/MediaObjects/419978_1_En_39_Fig1_HTML.gif\" width=\"350\"\u003e\n\n---\n[2016 - Attention to Scale: Scale-aware Semantic Image Segmentation](https://arxiv.org/abs/1511.03339)\n\n\u003cimg src=\"http://liangchiehchen.com/fig/attention.jpg\" width=\"350\"\u003e\n\n[2017 - Rethinking Atrous Convolution for Semantic Image Segmentation](https://arxiv.org/pdf/1706.05587.pdf)\n\n\n---\n[2017 - Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144)\n\n\u003cimg src=\"https://1.bp.blogspot.com/-Q0-o_ej8BDU/WTYnS568nPI/AAAAAAAAADQ/TTBczrPIQi8IvXZrjy3suRDBlo_p1pONQCLcB/s640/r1.png\" width=\"400\"\u003e\n\n---\n[2018 - **[DeepLabv3]**: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1802.02611)\n\n\u003cimg src=\"https://2.bp.blogspot.com/-gxnbZ9w2Dro/WqMOQTJ_zzI/AAAAAAAACeA/dyLgkY5TnFEf2j6jyXDXIDWj_wrbHhteQCLcBGAs/s640/image2.png\" width=\"400\"\u003e\n\n\n---\n[2019 - **[FastFCN]**: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation](https://arxiv.org/abs/1903.11816v1) [[github]](https://github.com/wuhuikai/FastFCN) ✅\n\n\u003cimg src=\"http://wuhuikai.me/FastFCNProject/images/Framework.png\" width=\"350\"\u003e\n\n---\n[2019 - Making Convolutional Networks Shift-Invariant Again](https://arxiv.org/abs/1904.11486)\n\n\u003cimg src=\"https://i.pinimg.com/564x/72/a2/5c/72a25c7d87e1c4dfef45bec81adee2e7.jpg\" 
width=\"250\"\u003e\n\n---\n[2019 - **[LEDNet]**: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation](https://arxiv.org/abs/1905.02423v1)\n\n\u003cimg src=\"http://www.programmersought.com/images/387/eb5e83159442106d19fbd79698e299eb.png\" width=\"300\"\u003e\n\n---\n[2019 - Feature Pyramid Encoding Network for Real-time Semantic Segmentation](https://arxiv.org/abs/1909.08599v1)\n\n\u003cimg src=\"https://storage.googleapis.com/groundai-web-prod/media%2Fusers%2Fuser_290654%2Fproject_390693%2Fimages%2FFPENet.png\" width=\"350\"\u003e\n\n---\n[2019 - Efficient Segmentation: Learning Downsampling Near Semantic Boundaries](https://arxiv.org/abs/1907.07156)\n\n\u003cimg src=\"https://images.deepai.org/converted-papers/1907.07156/x5.png\" width=\"250\"\u003e\n\n---\n[2019 - PointRend: Image Segmentation as Rendering](https://arxiv.org/abs/1912.08193)\n\n\u003cimg src=\"https://media.arxiv-vanity.com/render-output/1976701/x3.png\" width=\"300\"\u003e\n\n---\n[2019 - Fixing the train-test resolution discrepancy](https://arxiv.org/abs/1906.06423) ✅\n\n\u003cimg src=\"https://raw.githubusercontent.com/facebookresearch/FixRes/master/image/image2.png\" width=\"350\"\u003e\n\n\u003e This paper first shows that existing augmentations induce a significant discrepancy between the typical size of the objects seen by the classifier at train and test time. 
\n\n\u003e We experimentally validate that, for a target test resolution, using a lower train resolution offers better classification at test time.\n\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n### Context and Attention\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n---\n[2016 - Image Captioning with Semantic Attention](https://arxiv.org/abs/1603.03925)\n\n\u003cimg src=\"http://cdn-ak.f.st-hatena.com/images/fotolife/P/PDFangeltop1/20160406/20160406161035.png\" width=\"350\"\u003e\n\n---\n[2018 - **[EncNet]** Context Encoding for Semantic Segmentation](https://arxiv.org/abs/1803.08904v1) [[github]](https://github.com/zhanghang1989/PyTorch-Encoding) ⭕\n\n\u003cimg src=\"https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRZS4gdvv26N8N7dpr92pPoHmVP3RQ8ztddravjJlwHr1Sw5fCT\" width=\"400\"\u003e\n\n---\n[2018 - Tell Me Where to Look: Guided Attention Inference 
Network](https://arxiv.org/abs/1802.10171)\n\n\u003cimg src=\"https://storage.googleapis.com/groundai-web-prod/media%2Fusers%2Fuser_55108%2Fproject_88090%2Fimages%2Fx1.png\" width=\"350\"\u003e\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n### Composition\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n\n[2005 - Image Parsing: Unifying Segmentation, Detection, and Recognition](https://link.springer.com/article/10.1007/s11263-005-6642-x) ⭕\n\n\n[2013 - Complexity of Representation and Inference in Compositional Models with Part Sharing](https://arxiv.org/abs/1301.3560)\n\n---\n[2017 - Interpretable Convolutional Neural Networks](https://arxiv.org/abs/1710.00935) ⭕\n\n\u003cimg src=\"https://miro.medium.com/max/2712/0*DGs0o1DFHCaCMZvY\" width=\"350\"\u003e\n\n---\n[2019 - Local Relation Networks for Image Recognition 
](https://arxiv.org/pdf/1904.11491.pdf)\n\n\u003cimg src=\"https://storage.googleapis.com/groundai-web-prod/media%2Fusers%2Fuser_10859%2Fproject_356834%2Fimages%2Fx1.png\" width=\"350\"\u003e\n\n---\n[2017 - Teaching Compositionality to CNNs](https://www.semanticscholar.org/paper/Teaching-Compositionality-to-CNNs-Stone-Wang/3726b82007512a15a530fd1adad57af58a9abb62) ⭕\n\n\u003cimg src=\"https://www.vicarious.com/wp-content/uploads/2017/10/compositionality3.png\" width=\"350\"\u003e\n\n---\n[2020 - Concept Bottleneck Models](https://arxiv.org/abs/2007.04612) ⭕\n\n\u003cimg src=\"https://images.deepai.org/converted-papers/2007.04612/figures/teaser.png\" width=\"300\"\u003e\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n### Capsule Networks\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n---\n[2017 - Dynamic Routing Between Capsules](https://arxiv.org/abs/1710.09829) 
⭕\n\n\u003cimg src=\"https://cdn-images-1.medium.com/fit/t/1600/480/0*9fvb_xaSSqW7XVb_.png\" width=\"350\"\u003e\n\n\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n### Transformers\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n[2020 - **SURVEY**: A Survey on Visual Transformer](https://arxiv.org/abs/2012.12556) 📜\n\n[2021 - **SURVEY**: Transformers in Vision: A Survey](https://arxiv.org/abs/2101.01169) 📜\n\n[2023 - **SURVEY**: A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks](https://paperswithcode.com/paper/a-comprehensive-survey-on-applications-of) 📜\n\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- 
------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n### 3D Shape\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n---\n[2020 - **[NeRF]**: Representing Scenes as Neural Radiance Fields for View Synthesis ](https://arxiv.org/abs/2003.08934) ⭕\n\n\u003cimg src=\"https://uploads-ssl.webflow.com/51e0d73d83d06baa7a00000f/5e700ef6067b43821ed52768_pipeline_website-01-p-800.png\" width=\"350\"\u003e\n\n\u003csub\u003eWe present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying \u003cb\u003econtinuous volumetric scene function \u003c/b\u003e using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x,y,z) and viewing direction (θ,ϕ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location. 
We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.\u003c/sub\u003e\n\n---\n[2020 - [BLOG] NeRF Explosion 2020](https://dellaert.github.io/NeRF/)\n\n---\n[2020 - **[SURVEY]** State of the Art on Neural Rendering](https://arxiv.org/abs/2004.03805) 📜\n\n\u003cimg src=\"https://s3.us-west-2.amazonaws.com/secure.notion-static.com/c8e90a05-3207-43cc-9a16-00be7bdae536/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=AKIAT73L2G45O3KS52Y5%2F20210205%2Fus-west-2%2Fs3%2Faws4_request\u0026X-Amz-Date=20210205T142824Z\u0026X-Amz-Expires=86400\u0026X-Amz-Signature=432f6096a6b940cadaade65b71dd7d85628e4567565489a3ced22b0fd6160f52\u0026X-Amz-SignedHeaders=host\u0026response-content-disposition=filename%20%3D%22Untitled.png%22\" width=\"350\"\u003e\n\n---\n[2020 - AutoInt: Automatic Integration for Fast Neural Volume Rendering](https://arxiv.org/abs/2012.01714?s=09)\n\n\u003cimg src=\"https://i.pinimg.com/564x/cb/00/d8/cb00d86700bc4e926170f5b80d5503a2.jpg\" width=\"250\"\u003e\n\n\u003cimg 
src=\"https://s3.us-west-2.amazonaws.com/secure.notion-static.com/42fac0d1-1396-4239-8ace-df37606f50b6/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=AKIAT73L2G45O3KS52Y5%2F20210208%2Fus-west-2%2Fs3%2Faws4_request\u0026X-Amz-Date=20210208T124231Z\u0026X-Amz-Expires=86400\u0026X-Amz-Signature=d6ac5d7cc25bc2a9ff00a0db5cdf1fc31695353e532a0d6b1d03c9c53793e019\u0026X-Amz-SignedHeaders=host\u0026response-content-disposition=filename%20%3D%22Untitled.png%22\" width=\"350\"\u003e\n\n\u003csub\u003eNumerical integration is a foundational technique in scientific computing and is at the core of many computer vision applications. Among these applications, implicit neural volume rendering has recently been proposed as a new paradigm for view synthesis, achieving photorealistic image quality. However, a fundamental obstacle to making these methods practical is the extreme computational and memory requirements caused by the required volume integrations along the rendered rays during training and inference. Millions of rays, each requiring hundreds of forward passes through a neural network are needed to approximate those integrations with Monte Carlo sampling. Here, \u003cb\u003ewe propose automatic integration\u003c/b\u003e, a new framework for learning efficient, closed-form solutions to integrals using implicit neural representation networks. For training, we instantiate the computational graph corresponding to the derivative of the implicit neural representation. The graph is fitted to the signal to integrate. After optimization, we reassemble the graph to obtain a network that represents the \u003cb\u003eantiderivative\u003c/b\u003e. By the fundamental theorem of calculus, this enables the calculation of any definite integral in two evaluations of the network. 
Using this approach, we demonstrate a greater than 10x improvement in computation requirements, enabling fast neural volume rendering.\u003c/sub\u003e\n\n---\n[2020 - A Curvature and Density‐based Generative Representation of Shapes](https://onlinelibrary.wiley.com/doi/full/10.1111/cgf.14094)\n\n\u003cimg src=\"https://i.pinimg.com/564x/b7/8d/35/b78d351ffc32e224cac2f243b70275e2.jpg\" width=\"350\"\u003e\n\n\u003csub\u003e This paper introduces a \u003cb\u003egenerative model\u003c/b\u003e for 3D surfaces based on a representation of shapes with \u003cb\u003emean curvature and metric\u003c/b\u003e, which are \u003cb\u003einvariant under rigid transformation\u003c/b\u003e. Hence, compared with existing 3D machine learning frameworks, our model substantially reduces the influence of translation and rotation. In addition, the local structure of shapes will be more precisely captured, since the curvature is explicitly encoded in our model. Specifically, every surface is first conformally \u003cb\u003emapped to a canonical domain\u003c/b\u003e, such as a \u003cb\u003eunit disk\u003c/b\u003e or a \u003cb\u003eunit sphere\u003c/b\u003e. Then, it is represented by two functions: the mean curvature half‐density and the vertex density, over this canonical domain. Assuming that input shapes follow a certain distribution in a latent space, we use the variational autoencoder to learn the latent space representation. After the learning, we can generate variations of shapes by randomly sampling the distribution in the latent space. Surfaces with triangular meshes can be reconstructed from the generated data by applying isotropic remeshing and spin transformation, which is given by Dirac equation. We demonstrate the effectiveness of our model on datasets of man‐made and biological shapes and compare the results with other methods. 
\u003c/sub\u003e\n\n---\n[2021 - Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks](https://paschalidoud.github.io/neural_parts)\n\n\u003cimg src=\"https://paschalidoud.github.io/projects/neural_parts/architecture.png\" width=\"450\"\u003e\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n### Logic and Semantics\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n[2019 - Neural Logic Machines](https://arxiv.org/abs/1904.11694)\n\n\u003csub\u003eWe propose the Neural Logic Machine (NLM), a neural-symbolic architecture for both inductive learning and logic reasoning. NLMs exploit the power of both neural networks---as function approximators, and logic programming---as a symbolic processor for objects with properties, relations, logic connectives, and quantifiers. 
After being trained on small-scale tasks (such as sorting short arrays), NLMs can recover lifted rules, and generalize to large-scale tasks (such as sorting longer arrays). In our experiments, NLMs achieve perfect generalization in a number of tasks, from relational reasoning tasks on the family tree and general graphs, to decision making tasks including sorting arrays, finding shortest paths, and playing the blocks world. Most of these tasks are hard to accomplish for neural networks or inductive logic programming alone.\u003c/sub\u003e\n\n\u003cimg src=\"https://s3.us-west-2.amazonaws.com/secure.notion-static.com/f780c5e1-9adc-4ea5-b856-87091ec636ce/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=AKIAT73L2G45O3KS52Y5%2F20210614%2Fus-west-2%2Fs3%2Faws4_request\u0026X-Amz-Date=20210614T092347Z\u0026X-Amz-Expires=86400\u0026X-Amz-Signature=4376dfc3297b1f9995bd7b265aad5ef9b1d0667f2757ef26d3627a27f48632a4\u0026X-Amz-SignedHeaders=host\u0026response-content-disposition=filename%20%3D%22Untitled.png%22\" width=\"350\"\u003e\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n## Optimization and Regularisation\n\u003c!--- ------------------------------------------------------------------------------- 
--\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n\n[Random search for hyper-parameter optimisation](http://www.jmlr.org/papers/v13/bergstra12a.html)\n\n[Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/pdf/1502.03167.pdf)\n\n[**[Adam]**: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980)\n\n[**[Dropout]**: A Simple Way to Prevent Neural Networks from Overfitting](http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf)\n\n\n[Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks](https://arxiv.org/abs/1406.6909) ✅\n\n[Multi-Scale Context Aggregation by Dilated Convolutions](https://arxiv.org/abs/1511.07122)\n\n\u003cimg src=\"https://user-images.githubusercontent.com/22321977/48708394-7121c980-ec3d-11e8-98ab-2c116df0aaae.png\" width=\"300\"\u003e\n\n---\n[2017 - The Marginal Value of Adaptive Gradient Methods in Machine Learning](https://papers.nips.cc/paper/7003-the-marginal-value-of-adaptive-gradient-methods-in-machine-learning)\n\n\u003e (i) Adaptive methods find solutions that generalize worse than those found by non-adaptive methods.\n\n\u003e (ii) Even when the adaptive methods achieve\nthe same training loss or lower than non-adaptive methods, the development or test 
performance\nis worse.\n\n\u003e (iii) Adaptive methods often display faster initial progress on the training set, but their\nperformance quickly plateaus on the development set. \n\n\u003e (iv) Though conventional wisdom suggests\nthat Adam does not require tuning, we find that tuning the initial learning rate and decay scheme for\nAdam yields significant improvements over its default settings in all cases.\n\n[DARTS: Differentiable Architecture Search](https://arxiv.org/abs/1806.09055)\n\n[**Bag of Tricks** for Image Classification with Convolutional Neural Networks](https://arxiv.org/abs/1812.01187v1) ✅\n\n[2018 - **Tune**: A Research Platform for Distributed Model Selection and Training](https://arxiv.org/abs/1807.05118) [[github]](https://github.com/ray-project/ray/tree/master/python/ray/tune)\n\n[2017 - Equilibrium Propagation: Bridging the Gap Between Energy-Based Models and Backpropagation](https://arxiv.org/abs/1602.05179)\n\n[2017 - Understanding deep learning requires rethinking generalization](https://arxiv.org/abs/1611.03530) ⭕\n\n[2018 - Error Forward-Propagation: Reusing Feedforward Connections to Propagate Errors in Deep Learning](https://arxiv.org/abs/1808.03357)\n\n[2018 - An Empirical Model of Large-Batch Training](https://arxiv.org/abs/1812.06162v1)\n\n\u003cimg src=\"https://i.pinimg.com/564x/36/bb/e4/36bbe4d951a1c100714ea7baa43e0e44.jpg\" width=\"350\"\u003e\n\n[2018 - A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay](https://arxiv.org/abs/1803.09820) ⭕\n\n[2019 - Training Neural Networks with Local Error Signals](https://arxiv.org/abs/1901.06656) [[github]](https://github.com/anokland/local-loss) ⭕\n\n[2019 - Switchable Normalization for Learning-to-Normalize Deep Representation](https://arxiv.org/abs/1907.10473)\n\n\u003cimg src=\"http://luoping.me/post/family-normalization/SN.png\" width=\"350\"\u003e\n\n[2019 - Revisiting Small Batch Training for Deep Neural 
Networks](https://arxiv.org/abs/1804.07612)\n\n[2019 - Cyclical Learning Rates for Training Neural Networks](https://arxiv.org/abs/1506.01186)\n\n\u003cimg src=\"https://www.pyimagesearch.com/wp-content/uploads/2019/07/keras_clr_triangular2.png\" width=\"350\"\u003e\n\n[2019 - DeepOBS: A Deep Learning Optimizer Benchmark Suite](https://arxiv.org/abs/1903.05499)\n\n\u003cimg src=\"https://github.com/fsschneider/DeepOBS/raw/master/docs/deepobs_banner.png\" width=\"350\"\u003e\n\n[2019 - A Recipe for Training Neural Networks. Andrej Karpathy's Blog](http://karpathy.github.io/2019/04/25/recipe/)\n\n[2020 - Fantastic Generalization Measures and Where to Find Them](https://arxiv.org/abs/1912.02178) ✅\n\n\u003e The most direct and principled approach for studying\ngeneralization in deep learning is to prove a **generalization bound** which is typically an upper\nbound on the test error based on some quantity that can be calculated on the training set.\n\n\u003e **Kendall’s Rank-Correlation Coefficient**: Given a set of models\nresulted by training with hyperparameters in the set Θ, their associated generalization gap {g(θ)| θ ∈\nΘ}, and their respective values of the measure {µ(θ)| θ ∈ Θ}, our goal is to analyze how consistent\na measure (e.g. L2 norm of network weights) is with the empirically observed generalization. \nIf complexity and generalization are independent, the coefficient becomes zero\n\n\u003e **VC-dimension** as well as the number of parameters are **negatively correlated** with\ngeneralization gap which confirms the widely known empirical observation that overparametrization\nimproves generalization in deep learning.\n\n\u003e These results confirm the general understanding that larger margin, **lower cross-entropy** and higher entropy would\nlead to **better generalization**\n\n\u003e we observed that the **initial phase** (to reach cross-entropy value of 0.1) of the optimization is **negatively\ncorrelated** with the ??speed of optimization?? (error?) 
for both τ and Ψ. This would suggest that the **difficulty\nof optimization** during the initial phase of the optimization **benefits the final generalization**.\n\n\u003e Towards the end of the training, the variance of the gradients also\ncaptures a particular type of “flatness” of the local minima. This measure is surprisingly predictive\nof the generalization both in terms of τ and Ψ, and more importantly, is positively correlated across\nevery type of hyperparameter. \n\n\u003e There are **mixed** results about how the **optimization speed** is relevant to generalization. On one hand\nwe know that adding Batch Normalization or using shortcuts in residual architectures help both\noptimization and generalization. On the other hand, there are empirical results showing that adaptive\noptimization methods that are faster, usually generalize worse (Wilson et al., 2017b).\n\n\u003e Based on empirical observations made by the community as a whole, the canonical ordering we give\nto each of the hyper-parameter categories are as follows:\n\u003e 1. Batchsize: smaller batchsize leads to smaller generalization gap\n\u003e 2. Depth: deeper network leads to smaller generalization gap\n\u003e 3. Width: wider network leads to smaller generalization gap\n\u003e 4. Dropout: The higher the dropout (≤ 0.5) the smaller the generalization gap\n\u003e 5. Weight decay: The higher the weight decay (smaller than the maximum for each optimizer)\nthe smaller the generalization gap\n\u003e 6. Learning rate: The higher the learning rate (smaller than the maximum for each optimizer)\nthe smaller the generalization gap\n\u003e 7. 
Optimizer: Generalization gap of Momentum SGD \u003c Generalization gap of Adam \u003c Generalization gap of RMSProp\n\n[2020 - Descending through a Crowded Valley -- Benchmarking Deep Learning Optimizers](https://arxiv.org/abs/2007.01547)\n\n\u003cimg src=\"https://user-images.githubusercontent.com/544269/95753705-f18fbe80-0cdc-11eb-9499-6bf22fa456e0.png\" width=\"250\"\u003e\n\n[2020 - Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth](https://arxiv.org/abs/2010.15327)\n\u003cimg src=\"https://i.pinimg.com/564x/15/8b/af/158baf37ea0b6f05cc0b0d1fd2f364d2.jpg\" width=\"250\"\u003e\n\n[2021 - Revisiting ResNets: Improved Training and Scaling Strategies](https://arxiv.org/abs/2103.07579)\n\n\u003csub\u003eNovel computer vision architectures monopolize the spotlight, but the impact of the model architecture is often conflated with simultaneous changes to training methodology and scaling strategies. Our work revisits the canonical ResNet (He et al., 2015) and studies these three aspects in an effort to disentangle them. Perhaps surprisingly, we find that training and scaling strategies may matter more than architectural changes, and further, that the resulting ResNets match recent state-of-the-art models. We show that the best performing scaling strategy depends on the training regime and offer two new scaling strategies: (1) scale model depth in regimes where overfitting can occur (width scaling is preferable otherwise); (2) increase image resolution more slowly than previously recommended (Tan \u0026 Le, 2019). Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet. In a large-scale semi-supervised learning setup, ResNet-RS achieves 86.2% top-1 ImageNet accuracy, while being 4.7x faster than EfficientNet NoisyStudent. 
The training techniques improve transfer performance on a suite of downstream tasks (rivaling state-of-the-art self-supervised algorithms) and extend to video classification on Kinetics-400. We recommend practitioners use these simple revised ResNets as baselines for future research.\u003c/sub\u003e\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n## Pruning and Compression\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n\n[2013 - Do Deep Nets Really Need to be 
Deep?](https://arxiv.org/abs/1312.6184)\n\n[2015 - Learning both Weights and Connections for Efficient Neural Networks](https://arxiv.org/abs/1506.02626)\n\n\u003cimg src=\"https://xmfbit.github.io/img/paper-pruning-network-demo.png\" width=\"350\"\u003e\n\n[2015 - Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding](https://arxiv.org/abs/1510.00149)\n\n\u003cimg src=\"https://anandj.in/wp-content/uploads/dc.png\" width=\"350\"\u003e\n\n[2015 - Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531) ⭕\n\n[2017 - Learning Efficient Convolutional Networks through Network Slimming](https://arxiv.org/abs/1708.06519) - [[github]](https://github.com/liuzhuang13/slimming) ⭕\n\n\u003cimg src=\"https://user-images.githubusercontent.com/8370623/29604272-d56a73f4-879b-11e7-80ea-0702de6bd584.jpg\" width=\"350\"\u003e\n\n[2018 - Rethinking the Value of Network Pruning](https://arxiv.org/abs/1810.05270) ✅\n\n\u003cimg src=\"https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTRq9LlknFNmCyXoKoEVqfMX3JgP66T5Ezpbh4FF9xUVLBU0jO6\" width=\"350\"\u003e\n\n\n\u003e For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model only gives\ncomparable or worse performance than training that model with randomly initialized weights. 
For pruning algorithms which assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch.\n\n\u003e Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: \n\n\u003e 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model\n\n\u003e 2) learned “important” weights of the large model are typically not useful for the small pruned\nmodel\n\n\u003e 3) the pruned architecture itself, rather than a set of inherited “important”\nweights, is more crucial to the efficiency in the final model, which suggests that in\nsome cases pruning can be useful as an architecture search paradigm.\n\n[2018 - Slimmable Neural Networks](https://arxiv.org/abs/1812.08928)\n\n\u003cimg src=\"https://user-images.githubusercontent.com/22609465/50390872-1b3fb600-0702-11e9-8034-d0f41825d775.png\" width=\"350\"\u003e\n\n\n[2019 - Universally Slimmable Networks and Improved Training Techniques](https://arxiv.org/abs/1903.05134)\n\n\u003cimg src=\"https://user-images.githubusercontent.com/22609465/54562571-45b5ae00-4995-11e9-8984-49e32d07e325.png\" width=\"300\"\u003e\n\n\n\n[2019 - The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks](https://arxiv.org/abs/1803.03635) ✅\n\n\u003cimg src=\"https://miro.medium.com/max/2916/1*IraKnowykSyMZtrW1dJOVA.png\" width=\"350\"\u003e\n\n\u003e Based on these results, we articulate the lottery ticket hypothesis: dense, randomly-initialized, feed-forward\nnetworks contain subnetworks (winning tickets) that—when trained in isolation—\nreach test accuracy comparable to the original network in a similar number of\niterations.\n\n\u003e The winning tickets we find have won the **initialization** lottery: their\nconnections have initial weights that make training particularly effective.\n\n[2019 - AutoSlim: Towards One-Shot Architecture Search for Channel 
Numbers](https://arxiv.org/abs/1903.11728)\n\n\u003cimg src=\"https://storage.googleapis.com/groundai-web-prod/media%2Fusers%2Fuser_14%2Fproject_372245%2Fimages%2Fx1.png\" width=\"350\"\u003e\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n## Analysis and Interpretability\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n\n[2015 - Visualizing and Understanding Recurrent Networks](https://arxiv.org/abs/1506.02078)\n\n[2016 - Discovering Causal Signals in 
Images](https://arxiv.org/abs/1605.08179)\n\n\u003cimg src=\"https://2.bp.blogspot.com/-ZS7WHgo3f9U/XD26idxNEEI/AAAAAAAABl8/DipJ1Fm3ZK0C3tXhu03psC4nByTlID-sQCLcBGAs/s1600/Screen%2BShot%2B2019-01-15%2Bat%2B19.48.13.png\" width=\"400\"\u003e\n\n[2016 - **[Grad-CAM]**: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization](https://arxiv.org/abs/1610.02391) [[github]](https://github.com/jacobgil/pytorch-grad-cam)\n\n\u003cimg src=\"https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSR95EORUuYqxk3MtWiiQoDmHnizHVPxr1JnGVbfWJrHesJjZln\u0026s\" width=\"350\"\u003e\n\n[2017 - Visualizing the Loss Landscape of Neural Nets](https://arxiv.org/abs/1712.09913)\n\n\u003cimg src=\"https://github.com/tomgoldstein/loss-landscape/raw/master/doc/images/resnet56_noshort_small.jpg\" width=\"250\"\u003e\n\n[2019 - **[SURVEY]** Visual Analytics in Deep Learning: An Interrogative Survey for the Next Frontiers](https://ieeexplore.ieee.org/document/8371286) 📜\n\n\n[2018 - GAN Dissection: Visualizing and Understanding Generative Adversarial Networks](https://arxiv.org/abs/1811.10597v1)\n\n\u003cimg src=\"https://i.pinimg.com/originals/5a/df/e9/5adfe97e85a9023d7f11499ab57e7daf.png\" width=\"350\"\u003e\n\n[2018 - Interactive tool](https://gandissect.csail.mit.edu/)\n\n[**[Netron]** Visualizer for deep learning and machine learning models](https://github.com/lutzroeder/Netron)\n\n\u003cimg src=\"https://raw.githubusercontent.com/lutzroeder/netron/master/media/screenshot.png\" width=\"400\"\u003e\n\n[2019 - **[Distill]**: Computing Receptive Fields of Convolutional Neural Networks](https://distill.pub/2019/computing-receptive-fields/)\n\n[2019 - On the Units of GANs](https://arxiv.org/abs/1901.09887)\n\n\u003cimg src=\"https://neurohive.io/wp-content/uploads/2018/12/unit-distr-770x382.jpg\" width=\"350\"\u003e\n\n[2019 - Unmasking Clever Hans Predictors and Assessing What Machines Really Learn](https://arxiv.org/abs/1902.10178)\n\n\u003cimg 
src=\"https://s3.us-west-2.amazonaws.com/secure.notion-static.com/7d26dceb-3ca2-4039-92c2-0fcb75f7dbfc/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=AKIAT73L2G45O3KS52Y5%2F20210429%2Fus-west-2%2Fs3%2Faws4_request\u0026X-Amz-Date=20210429T131615Z\u0026X-Amz-Expires=86400\u0026X-Amz-Signature=41283711dcab0235330a252327d62c73224e73b650a1ab6288646fb491485af2\u0026X-Amz-SignedHeaders=host\u0026response-content-disposition=filename%20%3D%22Untitled.png%22\" width=\"350\"\u003e\n\n\u003csub\u003eCurrent learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly \"intelligent\" behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted, to well-informed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis that provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem that it was conceived for. 
Furthermore, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner.\u003c/sub\u003e\n\n\n[2020 - Actionable Attribution Maps for Scientific Machine Learning](https://arxiv.org/abs/2006.16533)\n\n\u003cimg src=\"https://i.pinimg.com/564x/45/b0/51/45b05100bff866b98ff050433d4e64dd.jpg\" width=\"350\"\u003e\n\n\n[2020 - Shortcut Learning in Deep Neural Networks](https://arxiv.org/abs/2004.07780)\n\n\u003cimg src=\"https://s3.us-west-2.amazonaws.com/secure.notion-static.com/1f0c83d4-1c8a-41aa-8664-02828932bc0c/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=AKIAT73L2G45O3KS52Y5%2F20210429%2Fus-west-2%2Fs3%2Faws4_request\u0026X-Amz-Date=20210429T131515Z\u0026X-Amz-Expires=86400\u0026X-Amz-Signature=b3069a55cb86532fbd4cad3a15eca0988024fedd3593be311e150c99196b5317\u0026X-Amz-SignedHeaders=host\u0026response-content-disposition=filename%20%3D%22Untitled.png%22\" width=\"350\"\u003e\n\n\u003csub\u003eDeep learning has triggered the current rise of artificial intelligence and is the workhorse of today's machine intelligence. Numerous success stories have rapidly spread all over science, industry and society, but its limitations have only recently come into focus. In this perspective we seek to distil how many of deep learning's problem can be seen as different symptoms of the same underlying problem: shortcut learning. Shortcuts are decision rules that perform well on standard benchmarks but fail to transfer to more challenging testing conditions, such as real-world scenarios. Related issues are known in Comparative Psychology, Education and Linguistics, suggesting that shortcut learning may be a common characteristic of learning systems, biological and artificial alike. 
Based on these observations, \u003cb\u003ewe develop a set of recommendations for model interpretation and benchmarking\u003c/b\u003e, highlighting recent advances in machine learning to improve robustness and transferability from the lab to real-world applications.\u003c/sub\u003e\n\n\n--- \n[2021 - VIDEO: CVPR 2021 Workshop.  Interpretable Neural Networks for Computer Vision: Clinical Decisions that are Aided, not Automated](https://www.youtube.com/watch?v=x7U5qC6eMnE)\n\n\u003cimg src=\"https://s3.us-west-2.amazonaws.com/secure.notion-static.com/39b2474e-5930-489d-a215-1aa51be40681/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=AKIAT73L2G45O3KS52Y5%2F20210622%2Fus-west-2%2Fs3%2Faws4_request\u0026X-Amz-Date=20210622T114331Z\u0026X-Amz-Expires=86400\u0026X-Amz-Signature=73f043a653249e6520a3d770efbbe4845f27dd2c7842e499ad409744b01fb606\u0026X-Amz-SignedHeaders=host\u0026response-content-disposition=filename%20%3D%22Untitled.png%22\" width=\"350\"\u003e\n\n\n---\n[2021 - VIDEO. CVPR 2021 Workshop. 
Interpreting Deep Generative Models for Interactive AI Content Creation by Bolei Zhou (CUHK)](https://www.youtube.com/watch?v=PtRU2B6Iml4)\n\n\u003cimg src=\"https://s3.us-west-2.amazonaws.com/secure.notion-static.com/962a7de7-a418-4d8b-a1b5-e009034a6506/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=AKIAT73L2G45O3KS52Y5%2F20210622%2Fus-west-2%2Fs3%2Faws4_request\u0026X-Amz-Date=20210622T114826Z\u0026X-Amz-Expires=86400\u0026X-Amz-Signature=b38aa5bcb1a300935b2aa062d48404426716803cf630ba22c055130b1a95385c\u0026X-Amz-SignedHeaders=host\u0026response-content-disposition=filename%20%3D%22Untitled.png%22\" width=\"350\"\u003e\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n# Tasks\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- 
=============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n## Segmentation\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n[2019 - Panoptic Segmentation](http://openaccess.thecvf.com/content_CVPR_2019/html/Kirillov_Panoptic_Segmentation_CVPR_2019_paper.html) ✅\n\n\u003cimg src=\"https://miro.medium.com/max/1400/1*OelVuv2thUGAj_400WfseQ.png\" width=\"350\"\u003e\n\n[2019 - The Best of Both Modes: Separately Leveraging RGB and Depth for Unseen Object Instance Segmentation](https://arxiv.org/abs/1907.13236) ✅\n\n\u003cimg src=\"https://i.pinimg.com/564x/31/a7/a1/31a7a1a70bd76e035d92f811cb4701d0.jpg\" width=\"350\"\u003e\n\n\u003e Recognizing unseen objects is a challenging perception task\nsince the robot needs to learn the concept of “objects” and generalize it to unseen objects\n\n\u003e An ideal method would combine the generalization capability of training on synthetic depth\nand the ability to produce sharp masks by training on RGB.\n\n\u003e Training DSN with depth images allows for better generalization to the real world data\n\n\u003e We posit that mask refinement is an easier problem than directly using RGB as input to produce instance masks.\n\n\u003e For the semantic segmentation loss, we use a weighted cross entropy as this\nhas been shown to work well in detecting object boundaries in imbalanced images [29].\n\n\u003e In order to train the RRN, we need examples of perturbed masks along with ground truth masks.\nSince such perturbations do not exist, this problem can be seen as a data augmentation task where we\naugment the ground truth mask into something that resembles an 
initial mask\n\n\u003e In order to seek a fair comparison, all models trained in this section are trained for 100k iterations\nof SGD using a fixed learning rate of 1e-2 and batch size of 8. \n\n[2019 - ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors](https://arxiv.org/abs/1904.03239)\n\n\u003cimg src=\"https://storage.googleapis.com/groundai-web-prod/media/users/user_225114/project_350444/images/figures/shapemask_fig1_v3.jpg\" width=\"300\"\u003e\n\n[2019 - Learning to Segment via Cut-and-Paste](https://arxiv.org/abs/1803.06414)\n\n\u003cimg src=\"https://media.springernature.com/original/springer-static/image/chp%3A10.1007%2F978-3-030-01234-2_3/MediaObjects/474212_1_En_3_Fig3_HTML.gif\" width=\"350\"\u003e\n\n[2019 - YOLACT Real-time Instance Segmentation](https://arxiv.org/abs/1904.02689)[[github]](https://github.com/dbolya/yolact)\n\n\u003cimg src=\"https://i.pinimg.com/564x/52/0c/3e/520c3ee5e0695482c12a73e096dd4b9f.jpg\" width=\"350\"\u003e\n\n---\n\n[2021 - Boundary IoU: Improving Object-Centric Image Segmentation Evaluation](https://arxiv.org/abs/2103.16562) [[github]](https://bowenc0221.github.io/boundary-iou/)\n\n\u003csub\u003eWe present Boundary IoU (Intersection-over-Union), a new segmentation evaluation measure focused on boundary quality. We perform an extensive analysis across different error types and object sizes and show that Boundary IoU is significantly more sensitive than the standard Mask IoU measure to boundary errors for large objects and does not over-penalize errors on smaller objects. The new quality measure displays several desirable characteristics like symmetry w.r.t. prediction/ground truth pairs and balanced responsiveness across scales, which makes it more suitable for segmentation evaluation than other boundary-focused measures like Trimap IoU and F-measure. 
Based on Boundary IoU, we update the standard evaluation protocols for instance and panoptic segmentation tasks by proposing the Boundary AP (Average Precision) and Boundary PQ (Panoptic Quality) metrics, respectively. Our experiments show that the new evaluation metrics track boundary quality improvements that are generally overlooked by current Mask IoU-based evaluation metrics. We hope that the adoption of the new boundary-sensitive evaluation metrics will lead to rapid progress in segmentation methods that improve boundary quality.\u003c/sub\u003e\n\n\u003cimg src=\"https://bowenc0221.github.io/boundary-iou/boundary_iou.png\" width=\"400\"\u003e\n\n\n\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n## Instance Segmentation\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n[2017 - Mask R-CNN](https://arxiv.org/abs/1703.06870v3) ⭕\n\n\u003csub\u003eWe present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. 
Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition.\u003c/sub\u003e\n\n\n\u003cimg src=\"https://paperswithcode.com/media/methods/Screen_Shot_2020-05-23_at_7.44.34_PM.png\" width=\"350\"\u003e\n\n[2019 - Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth](https://arxiv.org/abs/1906.11109) [[github]](https://github.com/axruff/SpatialEmbeddings) ⭕\n\n\u003csub\u003eCurrent state-of-the-art instance segmentation methods are not suited for real-time applications like autonomous driving, which require fast execution times at high accuracy. Although the currently dominant proposal-based methods have high accuracy, they are slow and generate masks at a fixed and low resolution. Proposal-free methods, by contrast, can generate masks at high resolution and are often faster, but fail to reach the same accuracy as the \u003cb\u003eproposal-based methods\u003c/b\u003e. In this work we propose a new clustering loss function for proposal-free instance segmentation. The loss function pulls the spatial embeddings of pixels belonging to the same instance together and jointly learns an \u003cb\u003einstance-specific clustering bandwidth\u003c/b\u003e, maximizing the intersection-over-union of the resulting instance mask. When combined with a fast architecture, the network can perform instance segmentation in real-time while maintaining a high accuracy. 
We evaluate our method on the challenging Cityscapes benchmark and achieve top results (5\\% improvement over Mask R-CNN) at more than 10 fps on 2MP images.\u003c/sub\u003e\n\n\u003cimg src=\"https://github.com/axruff/SpatialEmbeddings/raw/master/static/teaser.jpg\" width=\"350\"\u003e\n\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n## Interactive Segmentation\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n[2020 - Continuous Adaptation for Interactive Object Segmentation by Learning from Corrections](https://arxiv.org/abs/1911.12709) ⭕\n\n\u003csub\u003eIn interactive object segmentation a user collaborates with a computer vision model to segment an object. Recent works employ convolutional neural networks for this task: Given an image and a set of corrections made by the user as input, they output a segmentation mask. These approaches achieve strong performance by training on large datasets but they keep the model parameters unchanged at test time. Instead, we recognize that user corrections can serve as sparse training examples and we propose a method that capitalizes on that idea to update the model parameters on-the-fly to the data at hand. Our approach enables the adaptation to a particular object and its background, to distributions shifts in a test set, to specific object classes, and even to large domain changes, where the imaging modality changes between training and testing. 
We perform extensive experiments on 8 diverse datasets and show: Compared to a model with frozen parameters, our method reduces the required corrections (i) by 9%-30% when distribution shifts are small between training and testing; (ii) by 12%-44% when specializing to a specific class; (iii) and by 60% and 77% when we completely change domain between training and testing.\u003c/sub\u003e\n\n\u003cimg src=\"https://s3.us-west-2.amazonaws.com/secure.notion-static.com/fc0ced7e-beaf-4ee4-9fce-90ee0f9d31c0/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=AKIAT73L2G45O3KS52Y5%2F20210617%2Fus-west-2%2Fs3%2Faws4_request\u0026X-Amz-Date=20210617T081829Z\u0026X-Amz-Expires=86400\u0026X-Amz-Signature=ffc1d89cc0dced220a2a00ae435121b7fabe7c3faeb6e1155f717139ebe2b357\u0026X-Amz-SignedHeaders=host\u0026response-content-disposition=filename%20%3D%22Untitled.png%22\" width=\"350\"\u003e\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n## Anomaly Detection\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n[2009 - Anomaly Detection: A Survey](https://www.vs.inf.ethz.ch/edu/HS2011/CPS/papers/chandola09_anomaly-detection-survey.pdf) 📜\n\n[2017 - Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery](https://arxiv.org/abs/1703.05921v1)\n\n\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- 
=============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n## Semantic Correspondence\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n[2017 - End-to-end weakly-supervised semantic alignment](https://arxiv.org/abs/1712.06861)\n\n\u003cimg src=\"https://camo.githubusercontent.com/c05b4ff567b7341240ebc406ae37739f31e41aea17e0e497d530dcabd2f7cd54/687474703a2f2f7777772e64692e656e732e66722f77696c6c6f772f72657365617263682f7765616b616c69676e2f696d616765732f7465617365722e6a7067\" width=\"350\"\u003e\n\n[2019 - SFNet: Learning Object-aware Semantic Correspondence](https://arxiv.org/abs/1904.01810) - [[github]](https://github.com/cvlab-yonsei/SFNet)\n\n\u003cimg src=\"https://cvlab.yonsei.ac.kr/projects/SFNet/SFNet_files/teaser.png\" width=\"350\"\u003e\n\n[2020 - Deep Semantic Matching with Foreground Detection and Cycle-Consistency](https://arxiv.org/abs/2004.00144)\n\n\u003cimg src=\"https://i.pinimg.com/564x/e9/71/20/e971200126e02c86f8ac2ce349ded90e.jpg\" width=\"350\"\u003e\n\n\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n## Optical Flow\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- 
--\u003e\n\n\n[2019 - SelFlow: Self-Supervised Learning of Optical Flow](https://arxiv.org/abs/1904.09117) - [[github]](https://github.com/ppliuboy/SelFlow)\n\n\u003cimg src=\"https://i.pinimg.com/564x/80/87/74/80877422d35afa1aa17fe6eedf6eaaf6.jpg\" width=\"350\"\u003e\n\n\u003csub\u003eWe present a self-supervised learning approach for optical flow. Our method \u003cb\u003edistills reliable flow estimations from non-occluded pixels\u003c/b\u003e, and uses these predictions as ground truth to learn optical flow for hallucinated occlusions. We further design a simple CNN to utilize temporal information from multiple frames for better flow estimation. These two principles lead to an approach that yields the best performance for unsupervised optical flow learning on the challenging benchmarks including MPI Sintel, KITTI 2012 and 2015. More notably, our self-supervised pre-trained model provides an excellent initialization for supervised fine-tuning. Our fine-tuned models achieve state-of-the-art results on all three datasets. 
At the time of writing, we achieve EPE=4.26 on the Sintel benchmark, outperforming all submitted methods.\u003c/sub\u003e\n\n\n[2021 - AutoFlow: Learning a Better Training Set for Optical Flow](http://people.csail.mit.edu/celiu/pdfs/CVPR21_AutoFlow.pdf)\n\n\u003cimg src=\"https://s3.us-west-2.amazonaws.com/secure.notion-static.com/1571aa7a-bff8-4e78-a843-170e2e6f43e3/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=AKIAT73L2G45O3KS52Y5%2F20210429%2Fus-west-2%2Fs3%2Faws4_request\u0026X-Amz-Date=20210429T074617Z\u0026X-Amz-Expires=86400\u0026X-Amz-Signature=dad304d49177674ec93b9b66edf658387f6849659aecce407e0434d8e94778ae\u0026X-Amz-SignedHeaders=host\u0026response-content-disposition=filename%20%3D%22Untitled.png%22\" width=\"350\"\u003e\n\n\u003csub\u003e\u003cb\u003eSynthetic datasets\u003c/b\u003e play a critical role in pre-training CNN models for optical flow, but they are painstaking to generate and hard to adapt to new applications. To automate the process, we present AutoFlow, a simple and effective method to render training data for optical flow that \u003cb\u003eoptimizes the performance of a model on a target dataset\u003c/b\u003e. AutoFlow takes a layered approach to render synthetic data, where the motion, shape, and appearance of each layer are controlled by \u003cb\u003elearnable hyperparameters\u003c/b\u003e. 
Experimental results show that AutoFlow achieves state-of-the-art accuracy in pre-training both PWC-Net and RAFT.\u003c/sub\u003e\n\n[2021 - SMURF: Self-Teaching Multi-Frame Unsupervised RAFT with Full-Image Warping](https://arxiv.org/abs/2105.07014)\n\n\u003cimg src=\"https://s3.us-west-2.amazonaws.com/secure.notion-static.com/62334cbd-9fd4-45c4-abba-f2e20ac2fb6c/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=AKIAT73L2G45O3KS52Y5%2F20210713%2Fus-west-2%2Fs3%2Faws4_request\u0026X-Amz-Date=20210713T162556Z\u0026X-Amz-Expires=86400\u0026X-Amz-Signature=4af32ddf83a24615ed0af6315aebfaa02afbb69da3aa0cdccaed6532810e69ec\u0026X-Amz-SignedHeaders=host\u0026response-content-disposition=filename%20%3D%22Untitled.png%22\" width=\"350\"\u003e\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n# Methods\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- 
--\u003e\n\n\n\n\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n## Transfer Learning\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n- [Transfer Learning](https://github.com/axruff/TransferLearning)\n- [Domain Adaptation](https://github.com/axruff/TransferLearning)\n- [Domain Randomization](https://github.com/axruff/TransferLearning#domain-randomization)\n- [Style Transfer](https://github.com/axruff/TransferLearning#style-transfer)\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n## Generative Modelling\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n- [Generative Models](https://github.com/axruff/TransferLearning#generative-models)\n\n \u003c!--- ===================================================================================\n \u003c!---   ____                 _                                       _              _ \n \u003c!---  / ___|  ___ _ __ ___ (_)      ___ _   _ _ __   ___ _ ____   _(_)___  ___  __| |\n \u003c!---  \\___ \\ / _ \\ '_ ` _ 
\\| |_____/ __| | | | '_ \\ / _ \\ '__\\ \\ / / / __|/ _ \\/ _` |\n \u003c!---   ___) |  __/ | | | | | |_____\\__ \\ |_| | |_) |  __/ |   \\ V /| \\__ \\  __/ (_| |\n \u003c!---  |____/ \\___|_| |_| |_|_|     |___/\\__,_| .__/ \\___|_|    \\_/ |_|___/\\___|\\__,_|\n \u003c!---                                         |_|                                     \n\u003c!--- ===================================================================================\n\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n## Weakly Supervised\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n\n[2015 - Constrained Convolutional Neural Networks for Weakly Supervised Segmentation](https://arxiv.org/abs/1506.03648)\n\n\u003cimg src=\"https://people.eecs.berkeley.edu/~pathak/images/iccv15.png\" width=\"300\"\u003e\n\n[2018 - Deep Learning with Mixed Supervision for Brain Tumor Segmentation](https://arxiv.org/abs/1812.04571)\n\n\u003cimg src=\"https://www.spiedigitallibrary.org/ContentImages/Journals/JMIOBU/6/3/034002/WebImages/JMI_6_3_034002_f001.png\" width=\"350\"\u003e\n\n[2019 - Localization with Limited Annotation for Chest X-rays](https://arxiv.org/abs/1909.08842v1)\n\n\u003cimg src=\"https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTOFaxbxbwuKln6SgbFVWyVP2A7tj-CTQe05isVKH3gb1IGqg84ig\u0026s\" width=\"350\"\u003e\n\n[2019 - Doubly Weak Supervision of Deep Learning Models for Head CT](https://jdunnmon.github.io/miccai_crc.pdf)\n\n\u003cimg src=\"https://media.springernature.com/original/springer-static/image/chp%3A10.1007%2F978-3-030-32248-9_90/MediaObjects/490277_1_En_90_Fig2_HTML.png\" width=\"350\"\u003e\n\n[2019 - Training Complex Models with Multi-Task Weak Supervision](https://www.ncbi.nlm.nih.gov/pubmed/31565535)\n\n\u003csub\u003eAs machine learning models continue to increase in complexity, collecting large hand-labeled training sets has become one of the biggest 
roadblocks in practice. Instead, weaker forms of supervision that provide noisier but cheaper labels are often used. However, these weak supervision sources have diverse and unknown accuracies, may output correlated labels, and may label different tasks or apply at different levels of granularity. We propose a framework for integrating and modeling such weak supervision sources by viewing them as labeling different related sub-tasks of a problem, which we refer to as the multi-task weak supervision setting. We show that by solving a matrix completion-style problem, we can recover the accuracies of these multi-task sources given their dependency structure, but without any labeled data, leading to higher-quality supervision for training an end model. Theoretically, we show that the generalization error of models trained with this approach improves with the number of unlabeled data points, and characterize the scaling with respect to the task and dependency structures. On three fine-grained classification problems, we show that our approach leads to average gains of 20.2 points in accuracy over a traditional supervised approach, 6.8 points over a majority vote baseline, and 4.1 points over a previously proposed weak supervision method that models tasks separately.\u003c/sub\u003e\n\n\u003cimg src=\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765366/bin/nihms-1037643-f0001.jpg\" width=\"350\"\u003e\n\n[2020 - Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods](https://arxiv.org/abs/2002.11955)\n\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n## Semi Supervised\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n\n[2014 - Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks](https://arxiv.org/abs/1406.6909) ✅\n\n\u003cimg 
src=\"https://www.inference.vc/content/images/2017/05/Screen-Shot-2017-05-11-at-9.31.37-AM.png\" width=\"300\"\u003e\n \n[2017 - Random Erasing Data Augmentation](https://arxiv.org/abs/1708.04896v2)[[github]](https://github.com/zhunzhong07/Random-Erasing)\n\n\u003cimg src=\"https://github.com/zhunzhong07/Random-Erasing/raw/master/all_examples-page-001.jpg\" width=\"350\"\u003e\n\n[2017 - Smart Augmentation - Learning an Optimal Data Augmentation Strategy](https://arxiv.org/abs/1703.08383)\n\n[2017 - Population Based Training of Neural Networks](https://arxiv.org/abs/1711.09846) ⭕\n\n\n[2018 - **[Survey]**: Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis](https://arxiv.org/abs/1804.06353) 📜\n\n[2018 - Albumentations: fast and flexible image augmentations](https://arxiv.org/abs/1809.06839) - [[github]](https://github.com/albu/albumentations) ✅\n\n\n[2018 - Data Augmentation by Pairing Samples for Images Classification](https://arxiv.org/abs/1801.02929)\n\n[2018 - **[AutoAugment]**: Learning Augmentation Policies from Data](https://arxiv.org/abs/1805.09501)\n\n[2018 - Synthetic Data Augmentation using GAN for Improved Liver Lesion Classification](https://arxiv.org/abs/1801.02385)\n\n[2018 - GAN Augmentation: Augmenting Training Data using Generative Adversarial Networks](https://arxiv.org/abs/1810.10863)\n\n[2019 - **[UDA]**: Unsupervised Data Augmentation for Consistency Training](https://arxiv.org/abs/1904.12848) - [[github]](https://github.com/google-research/uda) ⭕\n\n\u003csub\u003eCommon among recent approaches is the use of \u003cb\u003econsistency training\u003c/b\u003e on a large amount of unlabeled data to constrain model predictions to be invariant to input noise. 
In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically those produced by \u003cb\u003eadvanced data augmentation methods\u003c/b\u003e, plays a \u003cb\u003ecrucial role\u003c/b\u003e in semi-supervised learning. Our method also combines well with \u003cb\u003etransfer learning\u003c/b\u003e, e.g., when finetuning from BERT, and yields improvements in high-data regime, such as ImageNet, whether when there is only 10% labeled data or when a full labeled set with 1.3M extra unlabeled examples is used.\u003c/sub\u003e\n\n\u003cimg src=\"https://camo.githubusercontent.com/0896cb65f9a87983bee3f2f71f3c064c33216413/68747470733a2f2f692e696d6775722e636f6d2f4c38476b3634622e706e67\" width=\"350\"\u003e\n\n[2019 - **[MixMatch]**: A Holistic Approach to Semi-Supervised Learning](https://arxiv.org/abs/1905.02249) ⭕\n\n\u003csub\u003eSemi-supervised learning has proven to be a powerful paradigm for leveraging unlabeled data to mitigate the reliance on large labeled datasets. In this work, we unify the current dominant approaches for semi-supervised learning to produce a new algorithm, MixMatch, that works by \u003cb\u003eguessing low-entropy labels\u003c/b\u003e for data-augmented unlabeled examples and \u003cb\u003emixing labeled and unlabeled\u003c/b\u003e data using MixUp. 
We show that MixMatch obtains state-of-the-art results by a large margin across many datasets and labeled data amounts.\u003c/sub\u003e\n\n\u003cimg src=\"https://miro.medium.com/max/1402/1*i4OfXztihCXgrxR52ZlowQ.png\" width=\"350\"\u003e\n\n\n[2019 - **[RealMix]**: Towards Realistic Semi-Supervised Deep Learning Algorithms](https://arxiv.org/abs/1912.08766v1) ✅\n\n\u003cimg src=\"https://storage.googleapis.com/groundai-web-prod/media/users/user_14/project_402411/images/RealMix.png\" width=\"350\"\u003e\n\n\n[2019 - Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules](https://arxiv.org/abs/1905.05393) [[github]](https://github.com/arcelien/pba) ✅\n\n[2019 - **[AugMix]**: A Simple Data Processing Method to Improve Robustness and Uncertainty](https://arxiv.org/abs/1912.02781v1) [[github]](https://github.com/google-research/augmix) ✅\n\n\u003cimg src=\"https://pythonawesome.com/content/images/2019/12/AugMix.jpg\" width=\"350\"\u003e\n\n[2019 - Self-training with **[Noisy Student]** improves ImageNet classification](https://arxiv.org/abs/1911.04252) ✅\n\n\u003csub\u003eWe present a simple \u003cb\u003eself-training\u003c/b\u003e method that achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. On \u003cb\u003erobustness test sets\u003c/b\u003e, it improves ImageNet-A top-1 accuracy from 61.0% to 83.7%.\nTo achieve this result, we first train an EfficientNet model on labeled ImageNet images and \u003cb\u003euse it as a teacher to generate pseudo labels\u003c/b\u003e on 300M unlabeled images. We then train a larger EfficientNet as \u003cb\u003ea student model on the combination of labeled and pseudo labeled images\u003c/b\u003e. We \u003cb\u003eiterate this process\u003c/b\u003e by putting back the student as the teacher. During the generation of the pseudo labels, the teacher is not noised so that the pseudo labels are as accurate as possible. 
However, during the learning of the student, we \u003cb\u003einject noise such as dropout, stochastic depth and data augmentation\u003c/b\u003e via RandAugment to the student so that the student generalizes better than the teacher.\u003c/sub\u003e\n\n\u003cimg src=\"https://storage.googleapis.com/groundai-web-prod/media%2Fusers%2Fuser_23782%2Fproject_397607%2Fimages%2Fx1.png\" width=\"250\"\u003e\n\n\n[2020 - Rain rendering for evaluating and improving robustness to bad weather](https://arxiv.org/abs/2009.03683)\n\n\u003cimg src=\"https://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs11263-020-01366-3/MediaObjects/11263_2020_1366_Fig13_HTML.png\" width=\"350\"\u003e\n\n\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n## Unsupervised Learning\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\u003c!--- =============================================================================== --\u003e\n\u003c!--- ------------------------------------------------------------------------------- --\u003e\n\n---\n[2015 - Unsupervised Visual Representation Learning by Context Prediction](https://arxiv.org/abs/1505.05192)\n\n\u003csub\u003eThis work explores the use of spatial context as a source of free and plentiful supervisory signal for training a rich visual representation. Given only a large, unlabeled image collection, we extract random pairs of patches from each image and train a convolutional neural net to predict the position of the second patch relative to the first. We argue that doing well on this task requires the model to learn to recognize objects and their parts. 
We demonstrate that the feature representation learned using this within-image context indeed captures visual similarity across images. For example, this representation allows us to perform unsupervised visual discovery of objects like cats, people, and even birds from the Pascal VOC 2011 detection dataset. Furthermore, we show that the learned ConvNet can be used in the R-CNN framework and provides a significant boost over a randomly-initialized ConvNet, resulting in state-of-the-art performance among algorithms which use only Pascal-provided training set annotations.\u003c/sub\u003e\n\n\u003cimg src=\"https://davidstutz.de/wordpress/wp-content/uploads/2017/03/doersch.jpg\" width=\"350\"\u003e\n\n---\n[2016 - Colorful Image Colorization](https://arxiv.org/abs/1603.08511)\n\n\u003cimg src=\"https://richzhang.github.io/colorization/resources/images/net_diagram.jpg\" width=\"350\"\u003e\n\n---\n[2016 - Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles](https://arxiv.org/abs/1603.09246)\n\n\u003csub\u003eIn this paper we study the problem of image representation learning without human annotation. By following the principles of self-supervision, we build a convolutional neural network (CNN) that can be trained to solve \u003cb\u003eJigsaw puzzles as a pretext task\u003c/b\u003e, which requires no manual labeling, and then later repurposed to solve object classification and detection. To maintain the compatibility across tasks we introduce the \u003cb\u003econtext-free network (CFN), a siamese-ennead CNN\u003c/b\u003e. The CFN takes image tiles as input and explicitly limits the receptive field (or context) of its early processing units to one tile at a time. We show that the CFN includes fewer parameters than AlexNet while preserving the same semantic learning capabilities. By training the CFN to solve Jigsaw puzzles, we learn both a feature mapping of object parts as well as their correct spatial arrangement. 
Our experimental evaluations show that the learned features capture semantically relevant content. Our proposed method for learning visual representations outperforms state of the art methods in several transfer learning benchmarks.\u003c/sub\u003e\n\n\u003cimg src=\"https://s3.us-west-2.amazonaws.com/secure.notion-static.com/976fef1e-c4fe-459c-86b2-3538814e5924/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=AKIAT73L2G45O3KS52Y5%2F20210621%2Fus-west-2%2Fs3%2Faws4_request\u0026X-Amz-Date=20210621T091009Z\u0026X-Amz-Expires=86400\u0026X-Amz-Signature=50221238f395513f2052afda5609e807ca293474018ec8e87d75bece815791ac\u0026X-Amz-SignedHeaders=host\u0026response-content-disposition=filename%20%3D%22Untitled.png%22\" width=\"400\"\u003e\n\n---\n[2016 - Context Encoders: Feature Learning by Inpainting](https://www.semanticscholar.org/paper/Context-Encoders%3A-Feature-Learning-by-Inpainting-Pathak-Kr%C3%A4henb%C3%BChl/7d0effebfa4bed19b6ba41f3af5b7e5b6890de87)\n\n\u003cimg src=\"https://i.pinimg.com/564x/c1/2a/9b/c12a9bb34f048531dd086f9706d4306f.jpg\" width=\"350\"\u003e\n\n---\n[2018 - Unsupervised Representation Learning by Predicting Image Rotations](https://www.semanticscholar.org/paper/Unsupervised-Representation-Learning-by-Predicting-Gidaris-Singh/aab368284210c1bb917ec2d31b84588e3d2d7eb4)\n\n\u003cimg src=\"https://media.arxiv-vanity.com/render-output/4649620/x1.png\" width=\"350\"\u003e\n\n[2019 - Greedy InfoMax for Biologically Plausible Self-Supervised Representation Learning](https://arxiv.org/abs/1905.11786)\n\n[2019 - Unsupervised Learning via Meta-Learning](https://arxiv.org/abs/1810.02334)\n\n---\n[2019 - **[PIRL]**: Self-Supervised Learning of Pretext-Invariant Representations](https://www.semanticscholar.org/paper/Self-Supervised-Learning-of-Pretext-Invariant-Misra-Maaten/0170bb0b524df2c81b5adc3062c6001a2eb34c96)\n\u003csub\u003eIshan Misra, L. V. D. 
Maaten\u003c/sub\u003e\n\n\u003csub\u003eThe goal of self-supervised learning from images is to construct image representations that are semantically meaningful via pretext tasks that do not require semantic annotations. Many pretext tasks lead to representations that are covariant with image transformations. We argue that, instead, semantic representations ought to be invariant under such transformations. Specifically, we develop Pretext-Invariant Representation Learning (PIRL, pronounced as 'pearl') that learns invariant representations based on pretext tasks. We use PIRL with a commonly used pretext task that involves solving jigsaw puzzles. We find that PIRL substantially improves the semantic quality of the learned image representations. Our approach sets a new state-of-the-art in self-supervised learning from images on several popular benchmarks for self-supervised learning. Despite being unsupervised, PIRL outperforms supervised pre-training in learning image representations for object detection. Altogether, our results demonstrate the potential of self-supervised representations with good invariance properties.\u003c/sub\u003e\n\n\u003cimg src=\"https://i.pinimg.com/564x/04/82/84/048284efc48f9a6252cd3891a0640be3.jpg\" width=\"350\"\u003e\n\n[2019 - Representation Learning with Contrastive Predictive Coding](https://arxiv.org/abs/1807.03748)\n\n---\n[2019 - **[MoCo]**: Momentum Contrast for Unsupervised Visual Representation Learning](https://arxiv.org/abs/1911.05722)\n\n\u003csub\u003eWe present Momentum Contrast (MoCo) for unsupervised \u003cb\u003evisual representation learning\u003c/b\u003e. From a perspective on contrastive learning as dictionary look-up, we build a \u003cb\u003edynamic dictionary with a queue\u003c/b\u003e and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. 
MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that \u003cb\u003ethe gap between unsupervised and supervised\u003c/b\u003e representation learning has been largely closed in many vision tasks.\u003c/sub\u003e\n\n\u003cimg src=\"https://pythonawesome.com/content/images/2020/03/MoCo.png\" width=\"350\"\u003e\n\n[2019 - Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey](https://arxiv.org/abs/1902.06162)\n\n---\n[2020 - **[SimCLR]**: A Simple Framework for Contrastive Learning of Visual Representations](https://arxiv.org/abs/2002.05709) ✅\n\n\u003csub\u003eThis paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. 
A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100X fewer labels.\u003c/sub\u003e\n\n\u003cimg src=\"https://miro.medium.com/max/8300/1*1uaA1tE5PDnVpSljxSTEoQ.png\" width=\"250\"\u003e\n\n[2020 - **::SURVEY::** Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey](https://arxiv.org/abs/1902.06162) 📜⭕\n\n[2020 - **[NeurIPS 2020 Workshop]**: Self-Supervised Learning - Theory and Practice](https://sslneuips20.github.io/pages/Accepted%20Paper.html) ⭕\n\n---\n[2020 - **[BYOL]**: Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning](https://www.semanticscholar.org/paper/Bootstrap-Your-Own-Latent%3A-A-New-Approach-to-Grill-Strub/38f93092ece8eee9771e61c1edaf11b1293cae1b)\n\n\u003csub\u003eWe introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network. While state-of-the-art methods rely on negative pairs, BYOL achieves a new state of the art without them. 
\u003c/sub\u003e\n\n\u003cimg src=\"https://s3.us-west-2.amazonaws.com/secure.notion-static.com/879f4ca1-8f25-4f32-8701-bacb1bd972c5/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=AKIAT73L2G45O3KS52Y5%2F20210621%2Fus-west-2%2Fs3%2Faws4_request\u0026X-Amz-Date=20210621T094750Z\u0026X-Amz-Expires=86400\u0026X-Amz-Signature=32d57952ec190524478619c83f4efe0552b58c8f76472ec2b02ee4580d637cae\u0026X-Amz-SignedHeaders=host\u0026response-content-disposition=filename%20%3D%22Untitled.png%22\" width=\"350\"\u003e\n\n---\n[2021 - [POST] Facebook: Self-supervised learning: The dark matter of intelligence](https://ai.facebook.com/blog/self-supervised-learning-the-dark-matter-of-intelligence/)\n\n\u003cimg src=\"http","projects_url":"https://awesome.ecosyste.ms/api/v1/lists/axruff%2Fdeeplearning/projects"}