# Deep Learning Roadmap
My own deep learning mastery roadmap, inspired by [Deep Learning Papers Reading Roadmap](https://github.com/floodsung/Deep-Learning-Papers-Reading-Roadmap).

**There are some customized differences:**
- not only academic papers but also blog posts, online courses, and other references are included
- customized for my own plans - may not include RL, NLP, etc.
- updated for 2019 SOTA

### Introductory Courses
- [Deeplearning.ai courses by Andrew Ng](https://www.youtube.com/channel/UCcIXc5mJsHVYTZR1maL5l9w)
- [Fast.ai](https://www.fast.ai/)
- [Dive Into Deep Learning](http://d2l.ai/)
- [Stanford's CS231n Class Notes](http://cs231n.github.io/)

### Basic CNN Architectures
- [x] **AlexNet** (2012) [[paper](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks)]
- Alex Krizhevsky et al. "ImageNet Classification with Deep Convolutional Neural Networks"
- [x] **ZFNet** (2013) [[paper](https://arxiv.org/abs/1311.2901)]
- Zeiler et al. "Visualizing and Understanding Convolutional Networks"
- [x] **VGG** (2014)
- Simonyan et al. "Very Deep Convolutional Networks for Large-Scale Image Recognition" (2014) [Google DeepMind & Oxford's Visual Geometry Group (VGG)] [[paper](https://arxiv.org/abs/1409.1556)]
- Related: Zhang et al. "Accelerating Very Deep Convolutional Networks for Classification and Detection" [[paper](https://arxiv.org/abs/1505.06798?context=cs)]
- [x] **GoogLeNet**, a.k.a **Inception v.1** (2014) [[paper](https://arxiv.org/abs/1409.4842)]
- Szegedy et al. "Going Deeper with Convolutions" [Google]
- Original [LeNet page](http://yann.lecun.com/exdb/lenet/) from Yann LeCun's homepage.
- [x] **Inception v.2 and v.3** (2015) Szegedy et al. "Rethinking the Inception Architecture for Computer Vision" [[paper](https://arxiv.org/abs/1512.00567)]
- [x] **Inception v.4 and InceptionResNet** (2016) Szegedy et al. "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning" [[paper](https://arxiv.org/abs/1602.07261)]
- "A Simple Guide to the Versions of the Inception Network" [[blogpost](https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202)]
- [x] **ResNet** (2015) [[paper](https://arxiv.org/abs/1512.03385)]
- He et al. "Deep Residual Learning for Image Recognition"
- [x] **Xception** (2016) [[paper](https://arxiv.org/abs/1610.02357)]
- Chollet, Francois - "Xception: Deep Learning with Depthwise Separable Convolutions"
- [x] **MobileNet** (2017) [[paper](https://arxiv.org/abs/1704.04861)]
- Howard et al. "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications"
- A nice paper about reducing CNN parameter sizes while maintaining performance (a minimal parameter-count sketch follows this list).
- [x] **DenseNet** (2016) [[paper](https://arxiv.org/abs/1608.06993)]
- Huang et al. "Densely Connected Convolutional Networks"
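
As a side note on the depthwise-separable idea behind Xception and MobileNet, here is a minimal parameter-count sketch; the layer sizes are made-up examples, not taken from either paper:

```python
# Parameter count of a standard conv vs. a depthwise-separable conv.
# Shapes are illustrative only.
k, c_in, c_out = 3, 128, 256          # kernel size, input/output channels

standard = k * k * c_in * c_out       # one dense 3x3 convolution
depthwise = k * k * c_in              # one 3x3 filter per input channel
pointwise = 1 * 1 * c_in * c_out      # 1x1 conv mixing channels
separable = depthwise + pointwise

print(f"standard:  {standard:,} params")    # 294,912
print(f"separable: {separable:,} params")   # 33,920 (~8.7x fewer)
```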

### Generative adversarial networks
- [x] **GAN** (2014.6) [[paper](https://arxiv.org/abs/1406.2661)]
- Goodfellow et al. "Generative Adversarial Networks"
- [x] **DCGAN** (2015.11) [[paper](https://arxiv.org/abs/1511.06434)]
- Radford et al. "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks"
- [x] **Info GAN** (2016.6) [[paper](https://arxiv.org/abs/1606.03657)]
- Chen et al. "InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets"
- [x] **Improved Techniques for Training GANs** (2016.6) [[paper](https://arxiv.org/abs/1606.03498)]
- Salimans et al. "Improved Techniques for Training GANs"
- This paper suggests multiple GAN training techniques such as _feature matching_, _minibatch discrimination_, _one-sided label smoothing_, and _virtual batch normalization_.
- It also introduces a well-known generator performance metric, the **inception score**.
- [x] **f-GAN** (2016.6) [[paper](https://arxiv.org/abs/1606.00709)]
- Nowozin et al. "f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization"
- [x] **Unrolled GAN** (2016.11) [[paper](https://arxiv.org/abs/1611.02163)]
- Metz et al. "Unrolled Generative Adversarial Networks"
- [x] **ACGAN** (2016.10) [[paper](https://arxiv.org/abs/1610.09585)]
- Odena et al. "Conditional Image Synthesis With Auxiliary Classifier GANs"
- [x] **LSGAN** (2016.11) [[paper](https://arxiv.org/abs/1611.04076)]
- Mao et al. "Least Squares Generative Adversarial Networks"
- [x] **Pix2Pix** (2016.11) [[paper](https://arxiv.org/abs/1611.07004)]
- Isola et al. "Image-to-Image Translation with Conditional Adversarial Networks"
- [x] **EBGAN** (2016.11) [[paper](https://arxiv.org/abs/1609.03126)]
- Zhao et al. "Energy-based Generative Adversarial Network"
- [x] **WGAN** (2017.1) [[paper](https://arxiv.org/abs/1701.07875)]
- Arjovsky et al., "Wasserstein GAN"
- [x] **WGAN_GP** (2017.5) [[paper](https://arxiv.org/abs/1704.00028)]
- Gulrajani et al., "Improved Training of Wasserstein GANs"
- Improves training stability by adding a **"gradient penalty (GP)"** term to the loss function (see the sketch after this list)
- [x] **BEGAN** (2017.5) [[paper](https://arxiv.org/abs/1703.10717)]
- Berthelot et al. "BEGAN: Boundary Equilibrium Generative Adversarial Networks"
- Introduces a _diversity ratio_ (an _equilibrium_ hyperparameter) that controls the trade-off between sample diversity and quality, and also proposes a convergence measure based on it.
- [x] **CycleGAN** (2017.5) [[paper](https://arxiv.org/abs/1703.10593)]
- Zhu et al. "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks"
- [x] **DiscoGAN** (2017.5) [[paper](https://arxiv.org/abs/1703.05192)]
- Kim et al. "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks"
- DiscoGAN and CycleGAN propose essentially the same cycle-consistency technique for GAN-based style transfer, developed independently at around the same time.
- [x] **Frechet Inception Distance (FID)** (2017.6) [[paper](https://arxiv.org/abs/1706.08500)]
- Heusel et al. "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium"
- The paper's main contribution is a training technique called the Two Time-Scale Update Rule (TTUR), but it is mostly known for the distance metric called the _Frechet Inception Distance_, which measures the distance between two distributions of Inception activation values.
- [ ] **ProGAN** (2017.10) [[paper](https://arxiv.org/abs/1710.10196)]
- Karras et al. "Progressive Growing of GANs for Improved Quality, Stability, and Variation"
- [ ] **PacGAN** (2017.12) [[paper](https://arxiv.org/abs/1712.04086)]
- Lin et al. "PacGAN: The power of two samples in generative adversarial networks"
- [ ] **BigGAN** (2018) [[paper](https://arxiv.org/abs/1809.11096)]
- Brock et al. "Large Scale GAN Training for High Fidelity Natural Image Synthesis"
- [ ] **GauGAN** (2019.3) [[paper](https://arxiv.org/abs/1903.07291)]
- Park et al. "Semantic Image Synthesis with Spatially-Adaptive Normalization"
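
A minimal PyTorch-style sketch of the WGAN-GP gradient penalty mentioned above; the `critic` module, input shapes, and `lambda_gp` value are placeholders, not taken from the paper's reference code:

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """WGAN-GP style penalty: push the critic's gradient norm toward 1
    at points interpolated between real and fake samples.
    Assumes image-shaped inputs of shape (N, C, H, W)."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True)[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()

# Added to the critic loss, e.g.:
# loss_D = fake_scores.mean() - real_scores.mean() + gradient_penalty(critic, real, fake)
```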

### Advanced GANs
- [ ] **DRAGAN** (2017.5) [[paper](https://arxiv.org/abs/1705.07215)]
- Kodali et al. "On Convergence and Stability of GANs"
- [ ] **Are GANs Created Equal?** (2017.11) [[paper](https://arxiv.org/abs/1711.10337)]
- Lucic et al. "Are GANs Created Equal? A Large-Scale Study"
- [ ] **SGAN** (2017.12) [[paper](https://arxiv.org/abs/1712.02330)]
- Chavdarova et al. "SGAN: An Alternative Training of Generative Adversarial Networks"
- [ ] **MaskGAN** (2018.1) [[paper](https://arxiv.org/abs/1801.07736)]
- Fedus et al. "MaskGAN: Better Text Generation via Filling in the _____"
- [ ] **Spectral Normalization** (2018.2) [[paper](https://arxiv.org/abs/1802.05957)]
- Miyato et al. "Spectral Normalization for Generative Adversarial Networks"
- [ ] **SAGAN** (2018.5) [[paper](https://arxiv.org/abs/1805.08318)] [[tensorflow](https://github.com/brain-research/self-attention-gan)]
- Zhang et al. "Self-Attention Generative Adversarial Networks"
- [ ] **Unusual Effectiveness of Averaging in GAN Training** (2018) [[paper](https://arxiv.org/abs/1806.04498)]
- "Benefitting from training on past snapshots."
- Uses an exponential moving average (EMA) of the weights (see the sketch after this list)
- [ ] **Disconnected Manifold Learning** (2018.6) [[paper](https://arxiv.org/abs/1806.00880)]
- Khayatkhoei, et al. "Disconnected Manifold Learning for Generative Adversarial Networks"
- [ ] **A Note on the Inception Score** (2018.6) [[paper](https://arxiv.org/abs/1801.01973)]
- Barratt et al., "A Note on the Inception Score"
- [ ] **Which Training Methods for GAN do actually converge?** (2018.7) [[paper](https://arxiv.org/abs/1801.04406)]
- Mescheder et al., "Which Training Methods for GANs do actually Converge?"
- [ ] **GAN Dissection** (2018.11) [[paper](https://arxiv.org/abs/1811.10597)]
- Bau et al. "GAN Dissection: Visualizing and Understanding Generative Adversarial Networks"
- [ ] **Improving Generalization and Stability for GANs** (2019.2) [[paper](https://arxiv.org/abs/1902.03984)]
- Thanh-Tung et al., "Improving Generalization and Stability of Generative Adversarial Networks"
- [ ] Augustus Odena - _"Open Questions about GANs"_ (2019.4) [[distill.pub](https://distill.pub/2019/gan-open-problems/)]
- A very nice article about the current state of GAN research; discusses problems yet to be solved.
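
The EMA idea from the averaging paper above is easy to state in code. A minimal sketch, assuming two PyTorch modules with identical architectures (`model` being trained and `ema_model` holding the averaged weights; the decay value is just a common choice):

```python
import copy
import torch

@torch.no_grad()
def update_ema(model, ema_model, decay=0.999):
    """Keep an exponential moving average of the model's parameters."""
    for p, ema_p in zip(model.parameters(), ema_model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1 - decay)

# ema_model = copy.deepcopy(generator)   # snapshot used for evaluation/sampling
# after every optimizer step: update_ema(generator, ema_model)
```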

### Autoencoders
- [ ] Original autoencoder (1986) [[paper](https://web.stanford.edu/class/psych209a/ReadingsByDate/02_06/PDPVolIChapter8.pdf)]
- Rumelhart, Hinton, and Williams, "Learning Internal Representations by Error Propagation"
- [x] **AutoEncoder** (2006) [[science](https://www.cs.toronto.edu/~hinton/science.pdf)]
- Hinton et al., "Reducing the Dimensionality of Data with Neural Networks"
- [ ] **Denoising Autoencoders** (2008) [[paper](http://www.cs.toronto.edu/~larocheh/publications/icml-2008-denoising-autoencoders.pdf)]
- Vincent et al. "Extracting and Composing Robust Features with Denoising Autoencoders"
- [ ] **Wasserstein Autoencoder** (2017) [[paper](https://arxiv.org/abs/1711.01558)]
- Tolstikhin et al. "Wasserstein Auto-Encoders"

### Autoregressive models
- [ ] **PixelCNN** (2016) [[paper](https://arxiv.org/abs/1606.05328)]
- van den Oord et al. "Conditional image generation with PixelCNN decoders."
- [ ] **WaveNet** (2016) [[paper](https://arxiv.org/abs/1609.03499)]
- van den Oord et al. "WaveNet: A Generative Model for Raw Audio"
- [ ] tacotron?

### Layer Normalizations
- [x] **Batch Normalization** (2015.2) [[paper](https://arxiv.org/abs/1502.03167)]
- Ioffe et al. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"
- [ ] **Group Normalization** (2018.3) [[paper](https://arxiv.org/abs/1803.08494)]
- Wu et al. "Group Normalization"
- [x] **Instance Normalization** (2016.7) [[paper](https://arxiv.org/abs/1607.08022)]
- Ulyanov et al. "Instance Normalization: The Missing Ingredient for Fast Stylization"
- [ ] Santurkar et al. **"How does Batch Normalization help Optimization?"** (2018.5) [[paper](https://arxiv.org/abs/1805.11604)]
- [ ] **Switchable Normalization** (2019) [[paper](https://arxiv.org/abs/1806.10779)]
- Luo et al. "Differentiable Learning-to-Normalize via Switchable Normalization"
- [ ] **Weight Standardization** (2019.3) [[paper](https://arxiv.org/abs/1903.10520)]
- Qiao et al. "Weight Standardization"

### Initializations
- [ ] **Xavier Initialization** (2010) [[paper](http://proceedings.mlr.press/v9/glorot10a.html)]
- Glorot et al., "Understanding the difficulty of training deep feedforward neural networks"
- [ ] **Kaiming (He) Initialization** (2015.2) [[paper](https://arxiv.org/abs/1502.01852)]
- He et al., "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification"
- [ ] **All you need is a good init** (2015.11) [[paper](https://arxiv.org/abs/1511.06422)]
- Mishkin et al., "All you need is a good init"
- [ ] **All you need is beyond a good init** (2017.4) [[paper](https://arxiv.org/abs/1703.01827)]
- Xie et al. "All You Need is Beyond a Good Init: Exploring Better Solution for Training Extremely Deep Convolutional Neural Networks with Orthonormality and Modulation"

### Dropouts
- **Dropout** (2014) [[paper](http://jmlr.org/papers/v15/srivastava14a.html)]
- Srivastava et al. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting"
- **Inverted Dropouts** [[notes on CS231n](http://cs231n.github.io/neural-networks-2/#reg)]
- Multiplies activations by the inverse of _keep_prob_ during training so that expected values at inference (test) time stay consistent (a minimal sketch follows this list).
- [ ] Li et al., **"Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift"** (2018.1) [[paper](https://arxiv.org/abs/1801.05134)]
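
A minimal NumPy sketch of inverted dropout as described in the CS231n notes; the array shape and `keep_prob` value are arbitrary examples:

```python
import numpy as np

def inverted_dropout(x, keep_prob=0.8, training=True):
    """Drop units at train time and rescale by 1/keep_prob,
    so no extra scaling is needed at inference time."""
    if not training:
        return x                                      # test-time pass is unchanged
    mask = (np.random.rand(*x.shape) < keep_prob) / keep_prob
    return x * mask

activations = np.random.randn(4, 16)
print(inverted_dropout(activations).mean(), activations.mean())  # similar in expectation
```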

### Meta-Learning / Representation Learning (Zero-Shot learning, Few-Shot learning)
- [x] **Zero-Data Learning** (2008) [[paper](https://www.aaai.org/Papers/AAAI/2008/AAAI08-103.pdf)]
- Larochelle et al., "Zero-data Learning of New Tasks"
- [ ] Palatucci et al., **"Zero-shot Learning with Semantic Output Codes"** (NIPS 2009) [[paper](https://papers.nips.cc/paper/3650-zero-shot-learning-with-semantic-output-codes)]
- [ ] Socher et al., **"Zero-Shot Learning Through Cross-Modal Transfer"** (2013.1) [[paper](https://arxiv.org/abs/1301.3666)]
- [ ] Lampert et al., **"Attribute-Based Classification for Zero-Shot Visual Object Categorization"** (2013.7) [[paper](https://ieeexplore.ieee.org/abstract/document/6571196)]
- [ ] Dinu et al., **"Improving zero-shot learning by mitigating the hubness problem"** (2014.12) [[paper](https://arxiv.org/abs/1412.6568)]
- [ ] Romera-Paredes et al. - **"An embarrassingly simple approach to zero-shot learning"** (2015) [[paper](http://proceedings.mlr.press/v37/romera-paredes15.pdf)]
- [ ] **Prototypical Networks** (2017.3) [[paper](https://arxiv.org/abs/1703.05175)]
- Snell et al., "Prototypical Networks for Few-shot Learning"
- [ ] **Zero-Shot Learning - the Good, the Bad and the Ugly** (2017.3) [[paper](https://arxiv.org/abs/1703.04394)]
- Xian et al., "Zero-Shot Learning - The Good, the Bad and the Ugly"
- [ ] **In defence of the Triplet Loss** (2017.3) [[paper](https://arxiv.org/abs/1703.07737)]
- Hermans et al., "In Defense of the Triplet Loss for Person Re-Identification"
- [ ] **MAML** (2017.3) [[paper](https://arxiv.org/abs/1703.03400)]
- Finn et al, "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks"
- [ ] **Triplet Loss and Online Triplet Mining in Tensorflow** (2018.3) [[Oliver Moindrot Blog](https://omoindrot.github.io/triplet-loss)]
- [ ] **Few-Shot learning Survey** (2019.4) [[paper](https://arxiv.org/abs/1904.05046)]
- Wang et al. "Few-shot Learning: A Survey"

### Transfer learning
- [ ] **Survey 2018** (2018) [[paper](https://arxiv.org/abs/1808.01974)]
- Tan et al. "A Survey on Deep Transfer Learning"

### Geometric learning
- **Geometric Deep Learning** (2016) [[paper](https://arxiv.org/abs/1611.08097)]
- Bronstein et al. "Geometric deep learning: going beyond Euclidean data"

### Variational Autoencoders (VAE)
- [ ] **VQ-VAE** (2017.11) [[paper](https://arxiv.org/abs/1711.00937)]
- van den Oord et al., "Neural Discrete Representation Learning"
- [ ] **Semi-Amortized Variational Autoencoders** (2018.2) [[paper](https://arxiv.org/abs/1802.02550)]
- Kim et al. "Semi-Amortized Variational Autoencoders"

### Object detection
- [ ] RCNN: https://arxiv.org/abs/1311.2524
- [ ] Fast-RCNN: https://arxiv.org/abs/1504.08083
- [ ] Faster-RCNN: https://arxiv.org/abs/1506.01497
- [ ] SSD: https://arxiv.org/abs/1512.02325
- [ ] YOLO: https://arxiv.org/abs/1506.02640
- [ ] YOLO9000: https://arxiv.org/abs/1612.08242

### Semantic Segmentation
- [ ] FCN: https://arxiv.org/abs/1411.4038
- [ ] SegNet: https://arxiv.org/abs/1511.00561
- [ ] UNet: https://arxiv.org/abs/1505.04597
- [ ] PSPNet: https://arxiv.org/abs/1612.01105
- [ ] DeepLab: https://arxiv.org/abs/1606.00915
- [ ] ICNet: https://arxiv.org/abs/1704.08545
- [ ] ENet: https://arxiv.org/abs/1606.02147
- [Nice survey](https://github.com/hoya012/deep_learning_object_detection)

### Sequential Model
- [ ] **Seq2Seq** (2014) [[paper](https://arxiv.org/abs/1409.3215)]
- Sutskever et al. "Sequence to sequence learning with neural networks."

### Neural Turing Machine
- [ ] **Neural Turing Machines** (2014) [[paper](https://arxiv.org/abs/1410.5401)]
- Graves et al., "Neural turing machines."
- [ ] **Pointer Networks** (2015) [[paper](https://arxiv.org/abs/1506.03134)]
- Vinyals et al., "Pointer networks."

### Attention / Question-Answering
- [ ] **NMT (Neural Machine Translation)** (2014) [[paper](https://arxiv.org/abs/1409.0473)]
- Bahdanau et al, "Neural Machine Translation by Jointly Learning to Align and Translate"
- [ ] **Stanford Attentive Reader** (2016.6) [[paper](https://arxiv.org/abs/1606.02858)]
- Chen et al. "A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task"
- [ ] **BiDAF** (2016.11) [[paper](https://arxiv.org/abs/1611.01603)]
- Seo et al. "Bidirectional Attention Flow for Machine Comprehension"
- [ ] **DrQA** or **Stanford Attentive Reader++** (2017.3) [[paper](https://arxiv.org/abs/1704.00051)]
- Chen et al. "Reading Wikipedia to Answer Open-Domain Questions"
- [ ] **Transformer** (2017.8) [[paper](https://arxiv.org/abs/1706.03762)] [[google ai blog](https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html)]
- Vaswani et al. "Attention is all you need"
- [ ] [read] Lilian Weng - **"Attention? Attention!"** (2018) [[blog_post](https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html)]
- A nice explanation of attention mechanism and its concepts.
- [ ] **BERT** (2018.10) [[paper](https://arxiv.org/abs/1810.04805)]
- Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"
- [ ] **GPT-2** (2019) [[paper (pdf)](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf)]
- Radford et al. "Language Models are Unsupervised Multitask Learners"

### Advanced RNNs
- Unitary evolution RNNs : https://arxiv.org/abs/1511.06464
- Recurrent Batch Norm : https://arxiv.org/abs/1603.09025
- Zoneout : https://arxiv.org/abs/1606.01305
- IndRNN : https://arxiv.org/abs/1803.04831
- DilatedRNNs : https://arxiv.org/abs/1710.02224

### Model Compression
- **MobileNet** (2017) (see above: [Basic CNN Architectures](https://github.com/deNsuh/deep-learning-roadmap/blob/master/README.md#basic-cnn-architectures))
- [ ] **ShuffleNet** (2017)
- Zhang et al. "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices"

### Neural Processes
- [ ] **Neural Processes** (2018) [[paper](https://arxiv.org/abs/1807.01622)]
- Garnelo et al. "Neural Processes"
- [ ] **Attentive Neural Processes** (2019) [[paper](https://arxiv.org/abs/1901.05761)]
- Kim et al. "Attentive Neural Processes"
- [ ] **A Visual Exploration of Gaussian Processes** (2019) [[Distill.pub](https://distill.pub/2019/visual-exploration-gaussian-processes/)]
- Not a neural process, but gives very nice intuition about Gaussian Processes. Good Read.

### Self-supervised learning
- [x] Denoising AE https://www.iro.umontreal.ca/~vincentp/Publications/denoising_autoencoders_tr1316.pdf
- [ ] Exemplar Nets https://arxiv.org/abs/1406.6909
- [ ] Co-occ https://arxiv.org/abs/1511.06811
- [ ] Egomotion https://arxiv.org/abs/1505.01596
- [ ] Jigsaw https://arxiv.org/abs/1603.09246
- [ ] Context Encoders https://arxiv.org/abs/1604.07379
- [ ] Split-brain autoencoders https://arxiv.org/abs/1611.09842
- [ ] multi-task self-supervised learning https://arxiv.org/abs/1708.07860
- [ ] Audio-visual scene analysis https://arxiv.org/abs/1804.03641
- [ ] a survey https://slideplayer.com/slide/13195863/
- [ ] Supervising unsupervised learning https://arxiv.org/abs/1709.05262
- [ ] Unsupervised Representation Learning by Predicting Image Rotations https://arxiv.org/abs/1803.07728
- [ ] Mahjourian et al., **"Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints"** (2018.2) [[paper](https://arxiv.org/abs/1802.05522)]
- [ ] Gordon et al., **"Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras"** (2019.4) [[paper](https://arxiv.org/abs/1904.04998)]

### Data Augmentation
- [ ] **Shake Shake Regularization** (2017.5) [[paper](https://arxiv.org/abs/1705.07485)]
- Gastaldi, Xavier - "Shake-Shake Regularization"

### Interpretation and Theory on Generalization, Overfitting, and Learning Capacity
- [ ] **MDL (Minimum Description Length)**
- Peter Grunwald - "A tutorial introduction to the minimum description length principle" (2004) [[paper](https://arxiv.org/abs/math/0406077)]
- [ ] Grunwald et al., - **"Shannon Information and Kolmogorov Complexity"** (2010) [[paper](https://arxiv.org/abs/cs/0410002)]
- [ ] Dauphin et al. **"Identifying and attacking the saddle point problem in high-dimensional non-convex optimization"** (2014.6) [[paper](https://arxiv.org/abs/1406.2572)]
- [ ] Choromanska et al. **"The Loss Surfaces of Multilayer Networks"** (2014.11) [[paper](https://arxiv.org/abs/1412.0233)]
- argues that non-convexity in NNs is not a huge problem
- [ ] **Knowledge Distillation** (2015.3) [[paper](https://arxiv.org/abs/1503.02531)]
- Hinton et al., "Distilling the Knowledge in a Neural Network"
- [ ] **3-Part Learning Theory** by Mostafa Samir
- [part 1: Introduction](https://mostafa-samir.github.io/ml-theory-pt1/)
- [part 2: Generalization Bounds](https://mostafa-samir.github.io/ml-theory-pt2/)
- [part 3: Regularization and Variance-Bias Tradeoff](https://mostafa-samir.github.io/ml-theory-pt3/)
- [ ] **Deconvolution and Checkerboard Artifacts** - Odena (2016) [[distill.pub article](https://distill.pub/2016/deconv-checkerboard/)]
- [ ] Keskar et al. "**On Large-Batch Training for Deep Learning**: Generalization Gap and Sharp Minima" (2016.9) [[paper](https://arxiv.org/abs/1609.04836)]
- [ ] **Rethinking Generalization** (2016.11) [[paper](https://arxiv.org/abs/1611.03530)]
- Zhang et al. "Understanding deep learning requires rethinking generalization"
- [ ] **Information Bottleneck** (2017) [[paper](https://arxiv.org/abs/1703.00810)] [[original paper on information bottleneck (2000)](https://arxiv.org/abs/physics/0004057)] [[youtube-talk](https://youtu.be/bLqJHjXihK8)] [[article in quantamagazine](https://www.quantamagazine.org/new-theory-cracks-open-the-black-box-of-deep-learning-20170921/)]
- Shwartz-Ziv and Tishby, "Opening the Black Box of Deep Neural Networks via Information"
- [ ] Neyshabur et al, "Exploring Generalization in Deep Learning" (2017.7) [[paper](https://arxiv.org/abs/1706.08947)]
- [ ] Sun et al., **"Revisiting Unreasonable Effectiveness of Data in Deep Learning Era"** (2017.7) [[paper](https://arxiv.org/abs/1707.02968)]
- [ ] **Super-Convergence** (2017.8) [[paper](https://arxiv.org/abs/1708.07120)]
- Smith et al. - "Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates"
- [ ] **Don't Decay the Learning Rate, Increase the Batch Size** (2017.11) [[paper](https://arxiv.org/abs/1711.00489)]
- Smith et al. "Don't Decay the Learning Rate, Increase the Batch Size"
- [ ] Hestness et al. **"Deep Learning Scaling is Predictable, Empirically"** (2017.12) [[paper](https://arxiv.org/abs/1712.00409)]
- [ ] **Visualizing loss landscape of neural nets** (2018) [[paper](https://arxiv.org/abs/1712.09913v3)]
- [ ] Olson et al., **"Modern Neural Networks Generalize on Small Data Sets"** (NeurIPS 2018) [[paper](https://papers.nips.cc/paper/7620-modern-neural-networks-generalize-on-small-data-sets)]
- [x] **Lottery Ticket Hypothesis** (2018.3) [[paper](https://arxiv.org/abs/1803.03635)]
- Frankle et al., "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks"
- Empirically showed that zeroing out small weights after training, rewinding the remaining weights to their initial values, and then re-training the 'pruned' network can give even better results (a minimal sketch follows this list).
- [ ] **Intrinsic Dimension** (2018.4) [[paper](https://arxiv.org/abs/1804.08838)]
- Li et al., "Measuring the Intrinsic Dimension of Objective Landscapes"
- [ ] Geirhos et al. "**ImageNet-trained CNNs are biased towards texture;** increasing shape bias improves accuracy and robustness" (2018.11) [[paper](https://arxiv.org/abs/1811.12231)]
- [ ] Belkin et al. **"Reconciling modern machine learning and the bias-variance trade-off"** (2018.12) [[paper](https://arxiv.org/abs/1812.11118)]
- [ ] Graetz - "How to visualize convolution features in 40 lines of code" (2019) [[medium](https://towardsdatascience.com/how-to-visualize-convolutional-features-in-40-lines-of-code-70b7d87b0030)]
- [ ] Geiger et al. **"Scaling description of generalization with number of parameters in deep learning"** (2019.1) [[paper](https://arxiv.org/abs/1901.01608)]
- [ ] **Are all layers created equal?** (2019.2) [[paper](https://arxiv.org/abs/1902.01996)]
- Zhang et al. "Are all layers created equal?"
- [x] Lilian Weng - **"Are Deep Neural Networks Dramatically Overfitted?"** (2019.4) [[lil'log](https://lilianweng.github.io/lil-log/2019/03/14/are-deep-neural-networks-dramatically-overfitted.html)]
- Excellent article about generalization and overfitting of deep neural networks
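
A minimal sketch of the prune-and-rewind procedure described in the Lottery Ticket Hypothesis entry above; this is one-shot magnitude pruning only, and `train_fn`, the pruning fraction, and the omitted iterative pruning are simplifications of mine, not the paper's exact recipe:

```python
import copy
import torch

def lottery_ticket_round(model, train_fn, prune_fraction=0.2):
    """One prune-and-rewind round: train, zero the smallest-magnitude weights,
    rewind the surviving weights to their initial values, then re-train."""
    init_state = copy.deepcopy(model.state_dict())      # initial weights (theta_0)
    train_fn(model)                                      # train to completion

    masks = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:                                  # skip biases / norm params
            continue
        k = max(1, int(prune_fraction * p.numel()))
        threshold = p.abs().flatten().kthvalue(k).values
        masks[name] = (p.abs() > threshold).float()      # 0 = pruned, 1 = kept

    with torch.no_grad():                                # rewind + apply mask
        for name, p in model.named_parameters():
            if name in masks:
                p.copy_(init_state[name] * masks[name])

    # NOTE: during re-training the masks must be re-applied after every
    # optimizer step so pruned weights stay at zero (omitted here).
    train_fn(model)
    return masks
```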

### Adversarial Attacks and Defense against attacks (RobustML)
- [RobustML site](https://www.robust-ml.org/)
- [x] **Adversarial Examples** Szegedy et al. - Intriguing Properties of Neural Networks (2013.12) [[paper](https://arxiv.org/abs/1312.6199)]
- induces misclassification by applying small perturbations
- this paper was the first to coin the term "Adversarial Example"
- [ ] **Fast Gradient Sign Attack (FGSM)** (2014.12)
- Goodfellow et al., "Explaining and Harnessing Adversarial Examples" (ICLR 2015) [[paper](https://arxiv.org/abs/1412.6572)]
- This paper presented the famous "panda example" (as also seen in the [pytorch tutorial](https://pytorch.org/tutorials/beginner/fgsm_tutorial.html)); a minimal FGSM sketch follows this list
- [ ] Kurakin et al., **"Adversarial Machine Learning at Scale"** (2016.11) [[paper](https://arxiv.org/abs/1611.01236)]
- [ ] Madry et al., **"Towards Deep Learning Models Resistant to Adversarial Attacks"** (2017.6) [[paper](https://arxiv.org/abs/1706.06083)]
- [ ] Carlini et al., **"Audio Adversarial Examples: Targeted Attacks on Speech-to-Text"** (2018.1) [[paper](https://arxiv.org/abs/1801.01944)]
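
A minimal PyTorch-style FGSM sketch of the perturbation described above; the `model`, loss choice, and `epsilon` value are placeholders, and pixel preprocessing is ignored:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Perturb x one step in the direction of the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()   # keep pixels in a valid range
```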

### Neural architecture search (NAS) and AutoML
- **GREAT AutoML Website** [[site](https://www.automl.org/)]
- They maintain a blog, a list of NAS literature, an analysis page, and a web book.
- [ ] **AdaNet** (2016.7) [[paper](https://arxiv.org/abs/1607.01097?context=cs)] [[GoogleAI blog](https://ai.googleblog.com/2018/10/introducing-adanet-fast-and-flexible.html)]
- Cortes et al. "AdaNet: Adaptive Structural Learning of Artificial Neural Networks"
- [ ] **NAS** (2016.12) [[paper](https://arxiv.org/abs/1611.01578)]
- Zoph et al. "Neural Architecture Search with Reinforcement Learning"
- [ ] **PNAS** (2017.12) [[paper](https://arxiv.org/abs/1712.00559)]
- Liu et al. "Progressive Neural Architecture Search"
- [ ] **ENAS** (2018.2) [[paper](https://arxiv.org/abs/1802.03268)]
- Pham et al. "Efficient Neural Architecture Search via Parameter Sharing"
- [ ] **DARTS** (2018.6) [[paper](https://arxiv.org/abs/1806.09055)]
- Liu et al. "DARTS: Differentiable Architecture Search"
- Uses a continuous relaxation over the discrete neural architecture space (see the sketch after this list).
- [ ] **RandWire** (2019) [[paper](https://arxiv.org/abs/1904.01569)]
- Xie et al. "Exploring Randomly Wired Neural Networks for Image Recognition" [Facebook AI Research]
- [ ] **A Survey on Neural Architecture Search** (2019) [[paper](https://arxiv.org/abs/1905.01392)]
- Wistuba et al., "A Survey on Neural Architecture Search"
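
To illustrate the continuous relaxation used by DARTS, here is a minimal sketch of a mixed operation; the candidate ops and channel count are arbitrary, and the bilevel optimization of the actual method is omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Weighted sum of candidate ops; the softmax weights (alpha) are learned,
    turning the discrete architecture choice into a differentiable one."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture parameters

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# After the search, the op with the largest alpha is kept in the discrete architecture.
```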


### Practical Techniques
- [x] Andrej Karpathy - **"A recipe for training neural networks"** (2019) [[Andrej Karpathy Blog Post](http://karpathy.github.io/2019/04/25/recipe/)]

## DL roadmap reference
- https://github.com/songrotek/Deep-Learning-Papers-Reading-Roadmap
- https://github.com/terryum/awesome-deep-learning-papers
- Which DL algorithms should I implement to learn? [[reddit](https://www.reddit.com/r/MachineLearning/comments/8vmuet/d_what_deep_learning_papers_should_i_implement_to/)]

### Theory
- [The MML(Mathematics for Machine Learning) book](https://mml-book.github.io/)
- [Andrej Karpathy - Yes You Should Understand Backprop](https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b)
- [Theoretical principles for Deep Learning](http://mitliagkas.github.io/ift6085-dl-theory-class-2019/?fbclid=IwAR02mfw0hMM5UFaBuxBgDi5wfT6TaIt35wktZGpCmmhu0_3GMA6HqFN1GFs)
- [Stanford STATS 385 - Theories of Deep Learning](https://stats385.github.io/)
- [Lecture Videos](https://www.researchgate.net/project/Theories-of-Deep-Learning?fbclid=IwAR0dwnuswA1jMwIOuydb_a83AM22FfuD6PpAWPiIW-76OCemcRBrVVLKLoM)
- CSC 321 notes : http://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/

### Resources
- **A Selective Overview of Deep Learning** (2019) [[paper](https://arxiv.org/abs/1904.05526)]
- Fan et al. "A Selective Overview of Deep Learning"
- A nice overview paper on deep learning up to early 2019 (about 30 pages)