{"id":15647687,"url":"https://github.com/dansuh17/deep-learning-roadmap","last_synced_at":"2026-01-07T09:41:11.468Z","repository":{"id":98775533,"uuid":"179651716","full_name":"dansuh17/deep-learning-roadmap","owner":"dansuh17","description":"my own deep learning mastery roadmap","archived":false,"fork":false,"pushed_at":"2019-07-22T17:00:55.000Z","size":143,"stargazers_count":70,"open_issues_count":0,"forks_count":14,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-05T00:46:43.365Z","etag":null,"topics":["ai","cnn","deep-learning","deep-neural-networks","gan","gans","generative-adversarial-network","nas","neural-architecture-search","neural-network","papers","roadmap"],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dansuh17.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-04-05T09:09:14.000Z","updated_at":"2025-01-30T06:04:27.000Z","dependencies_parsed_at":"2023-05-25T07:15:14.946Z","dependency_job_id":null,"html_url":"https://github.com/dansuh17/deep-learning-roadmap","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dansuh17%2Fdeep-learning-roadmap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dansuh17%2Fdeep-learning-roadmap/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dansuh17%2Fdeep-learning-roadmap/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/dansuh17%2Fdeep-learning-roadmap/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dansuh17","download_url":"https://codeload.github.com/dansuh17/deep-learning-roadmap/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246258228,"owners_count":20748538,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","cnn","deep-learning","deep-neural-networks","gan","gans","generative-adversarial-network","nas","neural-architecture-search","neural-network","papers","roadmap"],"created_at":"2024-10-03T12:20:36.988Z","updated_at":"2026-01-07T09:41:11.424Z","avatar_url":"https://github.com/dansuh17.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Deep Learning Roadmap\nMy own deep learning mastery roadmap, inspired by [Deep Learning Papers Reading Roadmap](https://github.com/floodsung/Deep-Learning-Papers-Reading-Roadmap).\n\n**There are some customized differences:**\n- not only academic papers but also blog posts, online courses, and other references are included\n- customized for my own plans - may not include RL, NLP, etc.\n- updated for 2019 SOTA\n\n### Introductory Courses\n- [Deeplearning.ai courses by Andrew Ng](https://www.youtube.com/channel/UCcIXc5mJsHVYTZR1maL5l9w)\n- [Fast.ai](https://www.fast.ai/)\n- [Dive Into Deep Learning](http://d2l.ai/)\n- [Stanford's CS231n Class Notes](http://cs231n.github.io/)\n  \n### Basic CNN Architectures\n- [x] **AlexNet** (2012) 
[[paper](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks)]\n    - Alex Krizhevsky et al. \"ImageNet Classification with Deep Convolutional Neural Networks\"\n- [x] **ZFNet** (2013) [[paper](https://arxiv.org/abs/1311.2901)]\n    - Zeiler et al. \"Visualizing and Understanding Convolutional Networks\"\n- [x] **VGG** (2014)\n    - Simonyan et al. \"Very Deep Convolutional Networks for Large-Scale Image Recognition\" (2014) [Google DeepMind \u0026 Oxford's Visual Geometry Group (VGG)] [[paper](https://arxiv.org/abs/1409.1556)]\n    - _VGG-16_: Zhang et al. \"Accelerating Very Deep Convolutional Networks for Classification and Detection\" [[paper](https://arxiv.org/abs/1505.06798?context=cs)]\n- [x] **GoogLeNet**, a.k.a **Inception v.1** (2014) [[paper](https://arxiv.org/abs/1409.4842)]\n    - Szegedy et al. \"Going Deeper with Convolutions\" [Google]\n    - Original [LeNet page](http://yann.lecun.com/exdb/lenet/) from Yann LeCun's homepage.\n    - [x] **Inception v.2 and v.3** (2015) Szegedy et al. \"Rethinking the Inception Architecture for Computer Vision\" [[paper](https://arxiv.org/abs/1512.00567)]\n    - [x] **Inception v.4 and InceptionResNet** (2016) Szegedy et al. \"Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning\" [[paper](https://arxiv.org/abs/1602.07261)]\n    - \"A Simple Guide to the Versions of the Inception Network\" [[blogpost](https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202)]\n- [x] **ResNet** (2015) [[paper](https://arxiv.org/abs/1512.03385)]\n    - He et al. \"Deep Residual Learning for Image Recognition\"\n- [x] **Xception** (2016) [[paper](https://arxiv.org/abs/1610.02357)]\n    - Chollet, Francois - \"Xception: Deep Learning with Depthwise Separable Convolutions\"\n- [x] **MobileNet** (2016) [[paper](https://arxiv.org/abs/1704.04861)]\n    - Howard et al. 
\"MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications\"\n    - A nice paper about reducing CNN parameter sizes while maintaining performance.\n- [x] **DenseNet** (2016) [[paper](https://arxiv.org/abs/1608.06993)]\n    - Huang et al. \"Densely Connected Convolutional Networks\"\n\n\n### Generative adversarial networks\n- [x] **GAN** (2014.6) [[paper](https://arxiv.org/abs/1406.2661)]\n    - Goodfellow et al. \"Generative Adversarial Networks\"\n- [x] **DCGAN** (2015.11) [[paper](https://arxiv.org/abs/1511.06434)]\n    - Radford et al. \"Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks\"\n- [x] **InfoGAN** (2016.6) [[paper](https://arxiv.org/abs/1606.03657)]\n    - Chen et al. \"InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets\"\n- [x] **Improved Techniques for Training GANs** (2016.6) [[paper](https://arxiv.org/abs/1606.03498)]\n    - Salimans et al. \"Improved Techniques for Training GANs\"\n    - This paper suggests multiple GAN training techniques such as _feature matching_, _minibatch discrimination_, _one-sided label smoothing_, and _virtual batch normalization_.\n    - It also introduces a well-known generator performance metric, the **inception score**.\n- [x] **f-GAN** (2016.6) [[paper](https://arxiv.org/abs/1606.00709)]\n    - Nowozin et al. \"f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization\"\n- [x] **Unrolled GAN** (2016.7) [[paper](https://arxiv.org/abs/1611.02163)]\n    - Metz et al. \"Unrolled Generative Adversarial Networks\"\n- [x] **ACGAN** (2016.10) [[paper](https://arxiv.org/abs/1610.09585)]\n    - Odena et al. \"Conditional Image Synthesis With Auxiliary Classifier GANs\"\n- [x] **LSGAN** (2016.11) [[paper](https://arxiv.org/abs/1611.04076)]\n    - Mao et al. 
\"Least Squares Generative Adversarial Networks\"\n- [x] **Pix2Pix** (2016.11) [[paper](https://arxiv.org/abs/1611.07004)]\n    - Isola et al. \"Image-to-Image Translation with Conditional Adversarial Networks\"\n- [x] **EBGAN** (2016.11) [[paper](https://arxiv.org/abs/1609.03126)]\n    - Zhao et al. \"Energy-based Generative Adversarial Network\"\n- [x] **WGAN** (2017.4) [[paper](https://arxiv.org/abs/1701.07875)]\n    - Arjovsky et al., \"Wasserstein GAN\"\n- [x] **WGAN-GP** (2017.5) [[paper](https://arxiv.org/abs/1704.00028)]\n    - Gulrajani et al., \"Improved Training of Wasserstein GANs\"\n    - Improves the training stability by applying a **\"gradient penalty (GP)\"** to the loss function\n- [x] **BEGAN** (2017.5) [[paper](https://arxiv.org/abs/1703.10717)]\n    - Berthelot et al. \"BEGAN: Boundary Equilibrium Generative Adversarial Networks\"\n    - Introduces a _diversity ratio_, or an _equilibrium constant_ that controls the variety - quality tradeoff, and also proposes a convergence measure using it.\n- [x] **CycleGAN** (2017.5) [[paper](https://arxiv.org/abs/1703.10593)]\n    - [x] **DiscoGAN** (2017.5) [[paper](https://arxiv.org/abs/1703.05192)]\n    - DiscoGAN and CycleGAN propose the EXACT SAME learning techniques for the style transfer task using GANs, developed independently at the same time.\n- [x] **Frechet Inception Distance (FID)** (2017.6) [[paper](https://arxiv.org/abs/1706.08500)]\n    - Heusel et al. \"GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium\"\n    - The paper's main contribution is a technique called Two Time-Scale Update Rule (TTUR), but it is mostly known for the distance metric called _Frechet Inception Distance_ that measures the distance between two distributions of activation values.\n- [ ] **ProGAN** (2017.10) [[paper](https://arxiv.org/abs/1710.10196)]\n    - Karras et al. 
\"Progressive Growing of GANs for Improved Quality, Stability, and Variation\"\n- [ ] **PacGAN** (2017.12) [[paper](https://arxiv.org/abs/1712.04086)]\n    - Lin et al. \"PacGAN: The power of two samples in generative adversarial networks\"\n- [ ] **BigGAN** (2018) [[paper](https://arxiv.org/abs/1809.11096)]\n- [ ] **GauGAN** (2019.3) [[paper](https://arxiv.org/abs/1903.07291)]\n    - Park et al. \"Semantic Image Synthesis with Spatially-Adaptive Normalization\"\n    \n\n### Advanced GANs\n- [ ] **DRAGAN** (2017.5) [[paper](https://arxiv.org/abs/1705.07215)]\n    - Kodali et al. \"On Convergence and Stability of GANs\"\n- [ ] **Are GANs Created Equal?** (2017.11) [[paper](https://arxiv.org/abs/1711.10337)]\n    - Lucic et al. \"Are GANs Created Equal? A Large-Scale Study\"\n- [ ] **SGAN** (2017.12) [[paper](https://arxiv.org/abs/1712.02330)]\n    - Chavdarova et al. \"SGAN: An Alternative Training of Generative Adversarial Networks\"\n- [ ] **MaskGAN** (2018.1) [[paper](https://arxiv.org/abs/1801.07736)]\n    - Fedus et al. \"MaskGAN: Better Text Generation via Filling in the _____\"\n- [ ] **Spectral Normalization** (2018.2) [[paper](https://arxiv.org/abs/1802.05957)]\n    - Miyato et al. \"Spectral Normalization for Generative Adversarial Networks\"\n- [ ] **SAGAN** (2018.5)  [[paper](https://arxiv.org/abs/1805.08318)]  [[tensorflow](https://github.com/brain-research/self-attention-gan)]\n    - Zhang et al. \"Self-Attention Generative Adversarial Networks\"\n- [ ] **Unusual Effectiveness of Averaging in GAN Training** (2018) [[paper](https://arxiv.org/abs/1806.04498)]\n    - \"Benefitting from training on past snapshots.\"\n    - Uses exponential moving averaging (EMA)\n- [ ] **Disconnected Manifold Learning** (2018.6) [[paper](https://arxiv.org/abs/1806.00880)]\n    - Khayatkhoei, et al. 
\"Disconnected Manifold Learning for Generative Adversarial Networks\"\n- [ ] **A Note on the Inception Score** (2018.6) [[paper](https://arxiv.org/abs/1801.01973)]\n    - Barratt et al., \"A Note on the Inception Score\"\n- [ ] **Which Training Methods for GANs do actually Converge?** (2018.7) [[paper](https://arxiv.org/abs/1801.04406)]\n    - Mescheder et al., \"Which Training Methods for GANs do actually Converge?\"\n- [ ] **GAN Dissection** (2018.11) [[paper](https://arxiv.org/abs/1811.10597)]\n    - Bau et al. \"GAN Dissection: Visualizing and Understanding Generative Adversarial Networks\"\n- [ ] **Improving Generalization and Stability for GANs** (2019.2) [[paper](https://arxiv.org/abs/1902.03984)]\n    - Thanh-Tung et al., \"Improving Generalization and Stability of Generative Adversarial Networks\"\n- [ ] Augustus Odena - _\"Open Questions about GANs\"_ (2019.4) [[distill.pub](https://distill.pub/2019/gan-open-problems/)]\n    - A very nice article on the current state of GAN research and the problems yet to be solved.\n\n### Autoencoders\n- [ ] Original autoencoder (1986) [[paper](https://web.stanford.edu/class/psych209a/ReadingsByDate/02_06/PDPVolIChapter8.pdf)]\n    - Rumelhart, Hinton, and Williams, \"Learning Internal Representations by Error Propagation\"\n- [x] **AutoEncoder** [[science](https://www.cs.toronto.edu/~hinton/science.pdf)]\n    - Hinton et al., \"Reducing the Dimensionality of Data with Neural Networks\"\n- [ ] **Denoising Autoencoders** (2008) [[paper](http://www.cs.toronto.edu/~larocheh/publications/icml-2008-denoising-autoencoders.pdf)]\n    - Vincent et al. \"Extracting and Composing Robust Features with Denoising Autoencoders\"\n- [ ] **Wasserstein Autoencoder** (2017) [[paper](https://arxiv.org/abs/1711.01558)]\n    - Tolstikhin et al. \"Wasserstein Auto Encoders\"\n\n### Autoregressive models\n- [ ] **PixelCNN** (2016) [[paper](https://arxiv.org/abs/1606.05328)]\n    - van den Oord et al. 
\"Conditional image generation with PixelCNN decoders.\"\n- [ ] **WaveNet** (2016) [[paper](https://arxiv.org/abs/1609.03499)]\n    - van den Oord et al. \"WaveNet: A Generative Model for Raw Audio\"\n- [ ] tacotron?\n\n\n### Layer Normalizations\n- [x] **Batch Normalization** (2015.2) [[paper](https://arxiv.org/abs/1502.03167)]\n    - Ioffe et al. \"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift\"\n- Group Norm\n- [x] **Instance Normalization** (2016.7) [[paper](https://arxiv.org/abs/1607.08022)]\n    - Ulyanov et al. \"Instance Normalization: The Missing Ingredient for Fast Stylization\"\n- [ ] Santurkar et al. **\"How does Batch Normalization help Optimization?\"** (2018.5) [[paper](https://arxiv.org/abs/1805.11604)]\n- [ ] **Switchable Normalization** (2019) [[paper](https://arxiv.org/abs/1806.10779)]\n    - Luo et al. \"Differentiable Learning-to-Normalize via Switchable Normalization\"\n- [ ] **Weight Standardization** (2019.3) [[paper](https://arxiv.org/abs/1903.10520)]\n    - Qiao et al. \"Weight Standardization\"\n    \n### Initializations\n- [ ] **Xavier Initialization** (2010) [[paper](http://proceedings.mlr.press/v9/glorot10a.html)]\n    - Glorot et al., \"Understanding the difficulty of training deep feedforward neural networks\"\n- [ ] **Kaiming (He) Initialization** (2015.2) [[paper](https://arxiv.org/abs/1502.01852)]\n    - He et al., \"Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification\"\n- [ ] **All you need is a good init** (2015.11) [[paper](https://arxiv.org/abs/1511.06422)]\n    - Mishkin et al., \"All you need is a good init\"\n- [ ] **All you need is beyond a good init** (2017.4) [[paper](https://arxiv.org/abs/1703.01827)]\n    - Xie et al. 
\"All You Need is Beyond a Good Init: Exploring Better Solution for Training Extremely Deep Convolutional Neural Networks with Orthonormality and Modulation\"\n\n### Dropouts\n- **Dropout** (2014) [[paper](http://jmlr.org/papers/v15/srivastava14a.html)]\n    - Srivastava et al. \"Dropout: A Simple Way to Prevent Neural Networks from Overfitting\"\n- **Inverted Dropouts** [[notes on CS231n](http://cs231n.github.io/neural-networks-2/#reg)]\n    - Scales activations by 1 / _keep_prob_ during training so that their expected values stay consistent at inference (or test) time.\n- [ ] Li et al., **\"Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift\"** (2018.1) [[paper](https://arxiv.org/abs/1801.05134)]\n\n\n### Meta-Learning / Representation Learning (Zero-Shot learning, Few-Shot learning)\n- [x] **Zero-Data Learning** (2008) [[paper](https://www.aaai.org/Papers/AAAI/2008/AAAI08-103.pdf)]\n    - Larochelle et al., \"Zero-data Learning of New Tasks\"\n- [ ] Palatucci et al., **\"Zero-shot Learning with Semantic Output Codes\"** (NIPS 2009) [[paper](https://papers.nips.cc/paper/3650-zero-shot-learning-with-semantic-output-codes)]\n- [ ] Socher et al., **\"Zero-Shot Learning Through Cross-Modal Transfer\"** (2013.1) [[paper](https://arxiv.org/abs/1301.3666)]\n- [ ] Lampert et al., **\"Attribute-Based Classification for Zero-Shot Visual Object Categorization\"** (2013.7) [[paper](https://ieeexplore.ieee.org/abstract/document/6571196)]\n- [ ] Dinu et al., **\"Improving zero-shot learning by mitigating the hubness problem\"** (2014.12) [[paper](https://arxiv.org/abs/1412.6568)]\n- [ ] Romera-Paredes et al. 
- **\"An embarrassingly simple approach to zero-shot learning\"** (2015) [[paper](http://proceedings.mlr.press/v37/romera-paredes15.pdf)]\n- [ ] **Prototypical Networks** (2017.3) [[paper](https://arxiv.org/abs/1703.05175)]\n    - Snell et al., \"Prototypical Networks for Few-shot Learning\"\n- [ ] **Zero-shot learning - the Good, the Bad and the Ugly** (2017.3) [[paper](https://arxiv.org/abs/1703.04394)]\n    - Xian et al., \"Zero-Shot Learning - The Good, the Bad and the Ugly\"\n- [ ] **In defence of the Triplet Loss** (2017.3) [[paper](https://arxiv.org/abs/1703.07737)]\n    - Hermans et al., \"In Defense of the Triplet Loss for Person Re-Identification\"\n- [ ] **MAML** (2017.3) [[paper](https://arxiv.org/abs/1703.03400)]\n    - Finn et al., \"Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks\"\n- [ ] **Triplet Loss and Online Triplet Mining in Tensorflow** (2018.3) [[Olivier Moindrot Blog](https://omoindrot.github.io/triplet-loss)]\n- [ ] **Few-Shot learning Survey** (2019.4) [[paper](https://arxiv.org/abs/1904.05046)]\n    - Wang et al. \"Few-shot Learning: A Survey\"\n\n\n### Transfer learning\n- [ ] **Survey 2018** (2018) [[paper](https://arxiv.org/abs/1808.01974)]\n    - Tan et al. \"A Survey on Deep Transfer Learning\"\n\n### Geometric learning\n- **Geometric Deep Learning** (2016) [[paper](https://arxiv.org/abs/1611.08097)]\n    - Bronstein et al. \"Geometric deep learning: going beyond Euclidean data\"\n\n### Variational Autoencoders (VAE)\n- [ ] **VQ-VAE** (2017.11) [[paper](https://arxiv.org/abs/1711.00937)]\n    - van den Oord et al., \"Neural Discrete Representation Learning\"\n- [ ] **Semi-Amortized Variational Autoencoders** (2018.2) [[paper](https://arxiv.org/abs/1802.02550)]\n    - Kim et al. 
\"Semi-Amortized Variational Autoencoders\"\n\n### Object detection\n- [ ] RCNN: https://arxiv.org/abs/1311.2524\n- [ ] Fast-RCNN: https://arxiv.org/abs/1504.08083\n- [ ] Faster-RCNN: https://arxiv.org/abs/1506.01497\n- [ ] SSD: https://arxiv.org/abs/1512.02325\n- [ ] YOLO: https://arxiv.org/abs/1506.02640\n- [ ] YOLO9000: https://arxiv.org/abs/1612.08242\n\n### Semantic Segmentation\n- [ ] FCN: https://arxiv.org/abs/1411.4038\n- [ ] SegNet: https://arxiv.org/abs/1511.00561\n- [ ] UNet: https://arxiv.org/abs/1505.04597\n- [ ] PSPNet: https://arxiv.org/abs/1612.01105\n- [ ] DeepLab: https://arxiv.org/abs/1606.00915\n- [ ] ICNet: https://arxiv.org/abs/1704.08545\n- [ ] ENet: https://arxiv.org/abs/1606.02147\n- [Nice survey](https://github.com/hoya012/deep_learning_object_detection)\n\n### Sequential Model\n- [ ] **Seq2Seq** (2014) [[paper](https://arxiv.org/abs/1409.3215)]\n    - Sutskever et al. \"Sequence to sequence learning with neural networks.\"\n\n### Neural Turing Machine\n- [ ] **Neural Turing Machines** (2014) [[paper](https://arxiv.org/abs/1410.5401)]\n    - Graves et al., \"Neural turing machines.\"\n- [ ] **Pointer Networks** (2015) [[paper](https://arxiv.org/abs/1506.03134)]\n    - Vinyals et al., \"Pointer networks.\"\n\n### Attention / Question-Answering\n- [ ] **NMT (Neural Machine Translation)** (2014) [[paper](https://arxiv.org/abs/1409.0473)]\n    - Bahdanau et al, \"Neural Machine Translation by Jointly Learning to Align and Translate\"\n- [ ] **Stanford Attentive Reader** (2016.6) [[paper](https://arxiv.org/abs/1606.02858)]\n    - Chen et al. \"A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task\"\n- [ ] **BiDAF** (2016.11) [[paper](https://arxiv.org/abs/1611.01603)]\n    - Seo et al. \"Bidirectional Attention Flow for Machine Comprehension\"\n- [ ] **DrQA** or **Stanford Attentive Reader++** (2017.3) [[paper](https://arxiv.org/abs/1704.00051)]\n    - Chen et al. 
\"Reading Wikipedia to Answer Open-Domain Questions\"\n- [ ] **Transformer** (2017.8) [[paper](https://arxiv.org/abs/1706.03762)] [[google ai blog](https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html)]\n     -  Vaswani et al. \"Attention is all you need\"    \n- [ ] [read] Lilian Weng - **\"Attention? Attention!\"** (2018) [[blog_post](https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html)]\n    - A nice explanation of attention mechanism and its concepts.\n- [ ] **BERT** (2018.10) [[paper](https://arxiv.org/abs/1810.04805)]\n    - Devlin et al., \"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding\"\n- [ ] **GPT-2** (2019) [[paper (pdf)](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf)]\n    - Radford et al. \"Language Models are Unsupervised Multitask Learners\"\n\n### Advanced RNNs\n- Unitary evolution RNNs : https://arxiv.org/abs/1511.06464\n- Recurrent Batch Norm : https://arxiv.org/abs/1603.09025\n- Zoneout : https://arxiv.org/abs/1606.01305\n- IndRNN : https://arxiv.org/abs/1803.04831\n- DilatedRNNs : https://arxiv.org/abs/1710.02224\n\n### Model Compression\n- **MobileNet** (2016) (see above: [Basic CNN Architectures](https://github.com/deNsuh/deep-learning-roadmap/blob/master/README.md#basic-cnn-architectures))\n- [ ] **ShuffleNet** (2017)\n    - Zhang et al. \"ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices\"\n\n### Neural Processes\n- [ ] **Neural Processes** (2018) [[paper](https://arxiv.org/abs/1807.01622)]\n    - Garnelo et al. \"Neural Processes\"\n- [ ] **Attentive Neural Processes** (2019) [[paper](https://arxiv.org/abs/1901.05761)]\n    - Kim et al. \"Attentive Neural Processes\"\n- [ ] **A Visual Exploration of Gaussian Processes** (2019) [[Distill.pub](https://distill.pub/2019/visual-exploration-gaussian-processes/)]\n    - Not a neural process, but gives very nice intuition about Gaussian Processes. 
Good Read.\n\n### Self-supervised learning\n- [x] Denoising AE https://www.iro.umontreal.ca/~vincentp/Publications/denoising_autoencoders_tr1316.pdf\n- [ ] Exemplar Nets https://arxiv.org/abs/1406.6909\n- [ ] Co-occ https://arxiv.org/abs/1511.06811\n- [ ] Egomotion https://arxiv.org/abs/1505.01596\n- [ ] Jigsaw https://arxiv.org/abs/1603.09246\n- [ ] Context Encoders https://arxiv.org/abs/1604.07379\n- [ ] Split-brain autoencoders https://arxiv.org/abs/1611.09842\n- [ ] multi-task self-supervised learning https://arxiv.org/abs/1708.07860\n- [ ] Audio-visual scene analysis https://arxiv.org/abs/1804.03641\n- [ ] a survey https://slideplayer.com/slide/13195863/\n- [ ] Supervising unsupervised learning https://arxiv.org/abs/1709.05262\n- [ ] Unsupervised Representation Learning by Predicting Image Rotations https://arxiv.org/abs/1803.07728\n- [ ] Mahjourian et al., **\"Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints\"** (2018.2) [[paper](https://arxiv.org/abs/1802.05522)]\n- [ ] Gordon et al., **\"Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras\"** (2019.4) [[paper](https://arxiv.org/abs/1904.04998)]\n\n\n### Data Augmentation\n- [ ] **Shake Shake Regularization** (2017.5) [[paper](https://arxiv.org/abs/1705.07485)]\n    - Gastaldi, Xavier - \"Shake-Shake Regularization\"\n\n\n### Interpretation and Theory on Generalization, Overfitting, and Learning Capacity\n- [ ] **MDL (Minimum Description Length)**\n    - Peter Grunwald - \"A tutorial introduction to the minimum description length principle\" (2004) [[paper](https://arxiv.org/abs/math/0406077)]\n- [ ] Grunwald et al., - **\"Shannon Information and Kolmogorov Complexity\"** (2010) [[paper](https://arxiv.org/abs/cs/0410002)]\n- [ ] Dauphin et al. **\"Identifying and attacking the saddle point problem in high-dimensional non-convex optimization\"** (2014.6) [[paper](https://arxiv.org/abs/1406.2572)]\n- [ ] Choromanska et al. 
**\"The Loss Surfaces of Multilayer Networks\"** (2014.11) [[paper](https://arxiv.org/abs/1412.0233)]\n    - argues that non-convexity in NNs is not a huge problem\n- [ ] **Knowledge Distillation** (2015.3) [[paper](https://arxiv.org/abs/1503.02531)]\n    - Hinton et al., \"Distilling the Knowledge in a Neural Network\"\n- [ ] **3-Part Learning Theory** by Mostafa Samir\n    - [part 1: Introduction](https://mostafa-samir.github.io/ml-theory-pt1/)\n    - [part 2: Generalization Bounds](https://mostafa-samir.github.io/ml-theory-pt2/)\n    - [part 3: Regularization and Variance-Bias Tradeoff](https://mostafa-samir.github.io/ml-theory-pt3/)\n- [ ] **Deconvolution and Checkerboard Artifacts** - Odena (2016) [[distill.pub article](https://distill.pub/2016/deconv-checkerboard/)]\n- [ ] Keskar et al. \"**On Large-Batch Training for Deep Learning**: Generalization Gap and Sharp Minima\" (2016.9) [[paper](https://arxiv.org/abs/1609.04836)]\n- [ ] **Rethinking Generalization** (2016.11) [[paper](https://arxiv.org/abs/1611.03530)]\n    - Zhang et al. \"Understanding deep learning requires rethinking generalization\"\n- [ ] **Information Bottleneck** (2017) [[paper](https://arxiv.org/abs/1703.00810)] [[original paper on information bottleneck (2000)](https://arxiv.org/abs/physics/0004057)] [[youtube-talk](https://youtu.be/bLqJHjXihK8)] [[article in quantamagazine](https://www.quantamagazine.org/new-theory-cracks-open-the-black-box-of-deep-learning-20170921/)]\n    - Shwartz-Ziv and Tishby, \"Opening the Black Box of Deep Neural Networks via Information\"\n- [ ] Neyshabur et al, \"Exploring Generalization in Deep Learning\" (2017.7) [[paper](https://arxiv.org/abs/1706.08947)]\n- [ ] Sun et al., **\"Revisiting Unreasonable Effectiveness of Data in Deep Learning Era\"** (2017.7) [[paper](https://arxiv.org/abs/1707.02968)]\n- [ ] **Super-Convergence** (2017.8) [[paper](https://arxiv.org/abs/1708.07120)]\n    - Smith et al. 
- \"Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates\"\n- [ ] **Don't Decay the Learning Rate, Increase the Batch Size** (2017.11) [[paper](https://arxiv.org/abs/1711.00489)]\n    - Smith et al. \"Don't Decay the Learning Rate, Increase the Batch Size\"\n- [ ] Hestness et al. **\"Deep Learning Scaling is Predictable, Empirically\"** (2017.12) [[paper](https://arxiv.org/abs/1712.00409)]\n- [ ] **Visualizing loss landscape of neural nets** (2018) [[paper](https://arxiv.org/abs/1712.09913v3)]\n- [ ] Olson et al., **\"Modern Neural Networks Generalize on Small Data Sets\"** (NeurIPS 2018) [[paper](https://papers.nips.cc/paper/7620-modern-neural-networks-generalize-on-small-data-sets)]\n- [x] **Lottery Ticket Hypothesis** (2018.3) [[paper](https://arxiv.org/abs/1803.03635)]\n    - Frankle et al., \"The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks\"\n    - Empirically showed that zeroing small weights after training, rewinding the remaining (non-zeroed) weights to their initial values, and then retraining the pruned network gave even better results.\n- [ ] **Intrinsic Dimension** (2018.4) [[paper](https://arxiv.org/abs/1804.08838)]\n    - Li et al., \"Measuring the Intrinsic Dimension of Objective Landscapes\"\n- [ ] Geirhos et al. \"**ImageNet-trained CNNs are biased towards texture;** increasing shape bias improves accuracy and robustness\" (2018.11) [[paper](https://arxiv.org/abs/1811.12231)]\n- [ ] Belkin et al. **\"Reconciling modern machine learning and the bias-variance trade-off\"** (2018.12) [[paper](https://arxiv.org/abs/1812.11118)]\n- [ ] Graetz - \"How to visualize convolution features in 40 lines of code\" (2019) [[medium](https://towardsdatascience.com/how-to-visualize-convolutional-features-in-40-lines-of-code-70b7d87b0030)]\n- [ ] Geiger et al. 
**\"Scaling description of generalization with number of parameters in deep learning\"** (2019.1) [[paper](https://arxiv.org/abs/1901.01608)]\n- [ ] **Are all layers created equal?** (2019.2) [[paper](https://arxiv.org/abs/1902.01996)]\n    - Zhang et al. \"Are all layers created equal?\" \n- [x] Lilian Weng - **\"Are Deep Neural Networks Dramatically Overfitted?\"** (2019.4) [[lil'log](https://lilianweng.github.io/lil-log/2019/03/14/are-deep-neural-networks-dramatically-overfitted.html)]\n    - Excellent article about generalization and overfitting of deep neural networks\n    \n### Adversarial Attacks and Defense against attacks (RobustML)\n- [RobustML site](https://www.robust-ml.org/)\n- [x] **Adversarial Examples** Szegedy et al. - Intriguing Properties of Neural Networks (2013.12) [[paper](https://arxiv.org/abs/1312.6199)]\n    - induces misclassification by applying small perturbations\n    - this paper was the first to coin the term \"Adversarial Example\"\n- [ ] **Fast Gradient Sign Method (FGSM)** (2014.12)\n    - Goodfellow et al., \"Explaining and Harnessing Adversarial Examples\" (ICLR 2015) [[paper](https://arxiv.org/abs/1412.6572)]\n    - This paper presented the famous \"panda example\" (as also seen in [pytorch tutorial](https://pytorch.org/tutorials/beginner/fgsm_tutorial.html))\n- [ ] Kurakin et al., **\"Adversarial Machine Learning at Scale\"** (2016.11) [[paper](https://arxiv.org/abs/1611.01236)]\n- [ ] Madry et al., **\"Towards Deep Learning Models Resistant to Adversarial Attacks\"** (2017.6) [[paper](https://arxiv.org/abs/1706.06083)]\n- [ ] Carlini et al., **\"Audio Adversarial Examples: Targeted Attacks on Speech-to-Text\"** (2018.1) [[paper](https://arxiv.org/abs/1801.01944)]\n\n\n### Neural architecture search (NAS) and AutoML\n- **GREAT AutoML Website** [[site](https://www.automl.org/)]\n    - They maintain a blog, a list of NAS literature, an analysis page, and a web book.\n- [ ] **AdaNet** (2016.7) 
[[paper](https://arxiv.org/abs/1607.01097?context=cs)] [[GoogleAI blog](https://ai.googleblog.com/2018/10/introducing-adanet-fast-and-flexible.html)]\n    - Cortes et al. \"AdaNet: Adaptive Structural Learning of Artificial Neural Networks\"\n- [ ] **NAS** (2016.12) [[paper](https://arxiv.org/abs/1611.01578)]\n    - Zoph et al. \"Neural Architecture Search with Reinforcement Learning\"\n- [ ] **PNAS** (2017.12) [[paper](https://arxiv.org/abs/1712.00559)]\n    - Liu et al. \"Progressive Neural Architecture Search\"\n- [ ] **ENAS** (2018.2) [[paper](https://arxiv.org/abs/1802.03268)]\n    - Pham et al. \"Efficient Neural Architecture Search via Parameter Sharing\"\n- [ ] **DARTS** (2018.6) [[paper](https://arxiv.org/abs/1806.09055)]\n    - Liu et al. \"DARTS: Differentiable Architecture Search\"\n    - Uses a continuous relaxation over the discrete neural architecture space.\n- [ ] **RandWire** (2019) [[paper](https://arxiv.org/abs/1904.01569)]\n    - Xie et al. \"Exploring Randomly Wired Neural Networks for Image Recognition\" [Facebook AI Research]\n- [ ] **A Survey on Neural Architecture Search** (2019) [[paper](https://arxiv.org/abs/1905.01392)]\n    - Wistuba et al., \"A Survey on Neural Architecture Search\"\n    \n    \n### Practical Techniques\n- [x] Andrej Karpathy - **\"A recipe for training neural networks\"** (2019) [[Andrej Karpathy Blog Post](http://karpathy.github.io/2019/04/25/recipe/)]\n    \n\n## DL roadmap reference\n- https://github.com/songrotek/Deep-Learning-Papers-Reading-Roadmap\n- https://github.com/terryum/awesome-deep-learning-papers\n- which DL algorithms should I implement to learn?\nhttps://www.reddit.com/r/MachineLearning/comments/8vmuet/d_what_deep_learning_papers_should_i_implement_to/\n\n\n### Theory\n- [The MML(Mathematics for Machine Learning) book](https://mml-book.github.io/)\n- [Andrej Karpathy - Yes You Should Understand Backprop](https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b)\n- [Theoretical 
principles for Deep Learning](http://mitliagkas.github.io/ift6085-dl-theory-class-2019/?fbclid=IwAR02mfw0hMM5UFaBuxBgDi5wfT6TaIt35wktZGpCmmhu0_3GMA6HqFN1GFs)\n- [Stanford STATS 385 - Theories of Deep Learning](https://stats385.github.io/)\n    - [Lecture Videos](https://www.researchgate.net/project/Theories-of-Deep-Learning?fbclid=IwAR0dwnuswA1jMwIOuydb_a83AM22FfuD6PpAWPiIW-76OCemcRBrVVLKLoM)\n- CSC 321 notes : http://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/\n\n\n### Resources\n- **A Selective Overview of Deep Learning** (2019) [[paper](https://arxiv.org/abs/1904.05526)]\n    - Fan et al. \"A Selective Overview of Deep Learning\"\n    - A nice overview paper on deep learning up to early 2019 (about 30 pages)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdansuh17%2Fdeep-learning-roadmap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdansuh17%2Fdeep-learning-roadmap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdansuh17%2Fdeep-learning-roadmap/lists"}