https://github.com/robertsdionne/neural-network-papers

Host: GitHub
Owner: robertsdionne
Created: 2015-09-06T20:45:54.000Z (almost 10 years ago)
Default Branch: master
Last Pushed: 2020-07-19T08:17:52.000Z (almost 5 years ago)
Last Synced: 2025-05-15T23:03:03.644Z (about 1 month ago)
Topics: awesome-lists
Language: JavaScript
Size: 223 KB
Stars: 1,969
Watchers: 183
Forks: 378
Open Issues: 2

# neural-network-papers

## Table of Contents

1. [Other Lists](#other-lists)

2. [Surveys](#surveys)

3. [Books](#books)

4. [Datasets](#datasets)

5. [Pretrained Models](#pretrained-models)

6. [Programming Frameworks](#programming-frameworks)

7. [Learning to Compute](#learning-to-compute)

8. [Natural Language Processing](#natural-language-processing)

9. [Convolutional Neural Networks](#convolutional-neural-networks)

10. [Recurrent Neural Networks](#recurrent-neural-networks)

11. [Convolutional Recurrent Neural Networks](#convolutional-recurrent-neural-networks)

12. [Adversarial Neural Networks](#adversarial-neural-networks)

13. [Autoencoders](#autoencoders)

14. [Restricted Boltzmann Machines](#restricted-boltzmann-machines)

15. [Biologically Plausible Learning](#biologically-plausible-learning)

16. [Supervised Learning](#supervised-learning)

17. [Unsupervised Learning](#unsupervised-learning)

18. [Reinforcement Learning](#reinforcement-learning)

19. [Theory](#theory)

20. [Quantum Computing](#quantum-computing)

21. [Training Innovations](#training-innovations)

22. [Parallel Training](#parallel-training)

23. [Weight Compression](#weight-compression)

24. [Numerical Precision](#numerical-precision)

25. [Numerical Optimization](#numerical-optimization)

26. [Motion Planning](#motion-planning)

27. [Simulation](#simulation)

28. [Hardware](#hardware)

29. [Cognitive Architectures](#cognitive-architectures)

30. [Computational Creativity](#computational-creativity)

31. [Cryptography](#cryptography)

32. [Distributed Computing](#distributed-computing)

33. [Clustering](#clustering)

## Other Lists

* [DeepLearning.University – An Annotated Deep Learning Bibliography | Memkite](http://memkite.com/deep-learning-bibliography/ "Amund Tveit") ([github.com/memkite/DeepLearningBibliography](https://github.com/memkite/DeepLearningBibliography))

* [Deep Learning for NLP resources](https://github.com/andrewt3000/DL4NLP "Andrew Thomas")

* [Reading List « Deep Learning](http://deeplearning.net/reading-list/ "Caglar Gulcehre et al.")

* [Reading lists for new MILA students](https://docs.google.com/document/d/1IXF3h0RU5zz4ukmTrVKVotPQypChscNGf5k6E25HGvA/edit "Institut des algorithmes d'apprentissage de Montréal (Montreal Institute for Learning Algorithms)")

* [Awesome Recurrent Neural Networks](https://github.com/kjw0612/awesome-rnn "Jiwon Kim")

* [Awesome Deep Learning](https://github.com/ChristosChristofidis/awesome-deep-learning "Christos Christofidis")

* [Deep learning Reading List](http://jmozah.github.io/links/ "J Mohamed Zahoor")

* [A curated list of speech and natural language processing resources](https://medium.com/@joshdotai/a-curated-list-of-speech-and-natural-language-processing-resources-4d89f94c032a "Paul Dixon") ([github.com/edobashira/speech-language-processing](https://github.com/edobashira/speech-language-processing))

* [CS089/CS189 | Deep Learning | Spring 2015](http://www.cs.dartmouth.edu/~lorenzo/teaching/cs189/readinglist.html "Lorenzo Torresani")

## Surveys

* [Deep Learning](http://rdcu.be/cW4c "Yann LeCun, Yoshua Bengio, Geoffrey Hinton")

* [Deep Learning in Neural Networks: An Overview](http://arxiv.org/abs/1404.7828 "Juergen Schmidhuber")

* [Deep neural networks: a new framework for modelling biological vision and brain information processing](http://biorxiv.org/content/biorxiv/early/2015/10/26/029876.abstract "Nikolaus Kriegeskorte")

* [A Primer on Neural Network Models for Natural Language Processing](http://u.cs.biu.ac.il/~yogo/nnlp.pdf "Yoav Goldberg")

* [Natural Language Understanding with Distributed Representation](http://arxiv.org/abs/1511.07916 "Kyunghyun Cho")

## Books

* [Deep Learning](http://www.iro.umontreal.ca/~bengioy/dlbook/ "Yoshua Bengio, Ian J. Goodfellow, Aaron Courville")

* [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/ "Michael Nielsen")

## Datasets

* [Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks](http://arxiv.org/abs/1502.05698 "Jason Weston, Antoine Bordes, Sumit Chopra, Tomas Mikolov, Alexander M. Rush") ([fb.ai/babi](http://fb.ai/babi))

* [Teaching Machines to Read and Comprehend](http://arxiv.org/abs/1506.03340 "Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, Phil Blunsom") ([github.com/deepmind/rc-data](https://github.com/deepmind/rc-data))

* [One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling](http://arxiv.org/abs/1312.3005 "Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Philipp Koehn, Tony Robinson") ([github.com/ciprian-chelba/1-billion-word-language-modeling-benchmark](https://github.com/ciprian-chelba/1-billion-word-language-modeling-benchmark))

* [The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems](http://arxiv.org/abs/1506.08909 "Ryan Lowe, Nissan Pow, Iulian Serban, Joelle Pineau") ([cs.mcgill.ca/~jpineau/datasets/ubuntu-corpus-1.0](http://cs.mcgill.ca/~jpineau/datasets/ubuntu-corpus-1.0/))

* [Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books](http://arxiv.org/abs/1506.06724 "Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler") ([BookCorpus](http://www.cs.toronto.edu/~mbweb/))

* [Every publicly available Reddit comment, for research.](https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/ "Stuck_In_the_Matrix")

* [Stack Exchange Data Dump](https://archive.org/details/stackexchange "Stack Exchange")

* [Europarl: A Parallel Corpus for Statistical Machine Translation](http://www.iccs.inf.ed.ac.uk/~pkoehn/publications/europarl-mtsummit05.pdf "Philipp Koehn") ([www.statmt.org/europarl/](http://www.statmt.org/europarl/))

* [RTE Knowledge Resources](http://aclweb.org/aclwiki/index.php?title=RTE_Knowledge_Resources)

## Pretrained Models

* [Model Zoo](https://github.com/BVLC/caffe/wiki/Model-Zoo "Berkeley Vision and Learning Center")

* [word2vec](https://code.google.com/p/word2vec/ "Tomas Mikolov")

  * [GoogleNews-vectors-negative300.bin.gz](https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing)

  * [freebase-vectors-skipgram1000.bin.gz](https://docs.google.com/file/d/0B7XkCwpI5KDYaDBDQm1tZGNDRHc/edit?usp=sharing)

* [GloVe](http://nlp.stanford.edu/projects/glove/ "Jeffrey Pennington, Richard Socher, Christopher D. Manning")

* [SENNA](http://ronan.collobert.com/senna/ "R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa")

## Programming Frameworks

* [TensorFlow](http://download.tensorflow.org/paper/whitepaper2015.pdf "Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng") ([tensorflow.org](http://tensorflow.org)) ([github.com/tensorflow/tensorflow](https://github.com/tensorflow/tensorflow))

* [Caffe: Convolutional Architecture for Fast Feature Embedding](http://arxiv.org/abs/1408.5093 "Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell") ([github.com/BVLC/caffe](https://github.com/BVLC/caffe)) ([github.com/amd/OpenCL-caffe](https://github.com/amd/OpenCL-caffe))

  * [Improving Caffe: Some Refactoring](http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-improving.pdf "Yangqing Jia") ([github.com/Yangqing/caffe2](https://github.com/Yangqing/caffe2))

* [Theano: A CPU and GPU Math Compiler in Python](http://www.iro.umontreal.ca/~lisa/pointeurs/theano_scipy2010.pdf "James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, Yoshua Bengio") ([github.com/Theano/Theano](https://github.com/Theano/Theano))

  * [Theano: new features and speed improvements](http://arxiv.org/abs/1211.5590 "Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, James Bergstra, Ian Goodfellow, Arnaud Bergeron, Nicolas Bouchard, David Warde-Farley, Yoshua Bengio")

  * [Blocks and Fuel: Frameworks for deep learning](http://arxiv.org/abs/1506.00619 "Bart van Merriënboer, Dzmitry Bahdanau, Vincent Dumoulin, Dmitriy Serdyuk, David Warde-Farley, Jan Chorowski, Yoshua Bengio") ([github.com/mila-udem/blocks](https://github.com/mila-udem/blocks)) ([github.com/mila-udem/blocks-examples](https://github.com/mila-udem/blocks-examples)) ([github.com/mila-udem/fuel](https://github.com/mila-udem/fuel))

  * [Announcing Computation Graph Toolkit](<http://joschu.github.io/index.html#Announcing CGT> "John Schulman") ([github.com/joschu/cgt](https://github.com/joschu/cgt))

* [Torch7: A Matlab-like Environment for Machine Learning](http://ronan.collobert.com/pub/matos/2011_torch7_nipsw.pdf "Ronan Collobert, Koray Kavukcuoglu, Clément Farabet") ([github.com/torch/distro](https://github.com/torch/distro))

* [Brainstorm](https://github.com/IDSIA/brainstorm "Istituto Dalle Molle di studi sull'intelligenza artificiale")

* [Deeplearning4j - Open-source, distributed deep learning for the JVM](http://deeplearning4j.org/ "Adam Gibson, Chris Nicholson, Josh Patterson et al.") ([github.com/deeplearning4j/deeplearning4j](https://github.com/deeplearning4j/deeplearning4j))

  * [ND4J: N-Dimensional Arrays for Java - N-Dimensional Scientific Computing for Java](http://nd4j.org/ "Adam Gibson, Chris Nicholson, Josh Patterson et al.") ([github.com/deeplearning4j/nd4j](https://github.com/deeplearning4j/nd4j))

* [linalg: Matrix Computations in Apache Spark](http://arxiv.org/abs/1509.02256 "Reza Bosagh Zadeh, Xiangrui Meng, Burak Yavuz, Aaron Staple, Li Pu, Shivaram Venkataraman, Evan Sparks, Alexander Ulanov, Matei Zaharia")

* [cuDNN: Efficient Primitives for Deep Learning](http://arxiv.org/abs/1410.0759 "Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, Evan Shelhamer")

* [Fast Convolutional Nets With fbfft: A GPU Performance Evaluation](http://arxiv.org/abs/1412.7580 "Nicolas Vasilache, Jeff Johnson, Michael Mathieu, Soumith Chintala, Serkan Piantino, Yann LeCun") ([github.com/facebook/fbcuda](https://github.com/facebook/fbcuda))

* [Guide to NumPy](http://web.mit.edu/dvp/Public/numpybook.pdf "Travis Oliphant")

* [Probabilistic Programming in Python using PyMC](http://arxiv.org/abs/1507.08050 "John Salvatier, Thomas Wiecki, Christopher Fonnesbeck")

## Learning to Compute

* [Neural GPUs Learn Algorithms](http://arxiv.org/abs/1511.08228 "Łukasz Kaiser, Ilya Sutskever")

* [A Roadmap towards Machine Intelligence](http://arxiv.org/abs/1511.08130 "Tomas Mikolov, Armand Joulin, Marco Baroni")

* [On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models](http://arxiv.org/abs/1511.09249 "Juergen Schmidhuber")

* [Binding via Reconstruction Clustering](http://arxiv.org/abs/1511.06418 "Klaus Greff, Rupesh Kumar Srivastava, Jürgen Schmidhuber")

* [Neural Random-Access Machines](http://arxiv.org/abs/1511.06392 "Karol Kurach, Marcin Andrychowicz, Ilya Sutskever")

* [Learning Simple Algorithms from Examples](http://arxiv.org/abs/1511.07275 "Wojciech Zaremba, Tomas Mikolov, Armand Joulin, Rob Fergus")

* [Neural Programmer: Inducing Latent Programs with Gradient Descent](http://arxiv.org/abs/1511.04834 "Arvind Neelakantan, Quoc V. Le, Ilya Sutskever")

* [Neural Programmer-Interpreters](http://arxiv.org/abs/1511.06279 "Scott Reed, Nando de Freitas")

* [Neural Turing Machines](http://arxiv.org/abs/1410.5401 "Alex Graves, Greg Wayne, Ivo Danihelka")

  * [Reinforcement Learning Neural Turing Machines](http://arxiv.org/abs/1505.00521 "Wojciech Zaremba, Ilya Sutskever")

  * [Structured Memory for Neural Turing Machines](http://arxiv.org/abs/1510.03931 "Wei Zhang, Yang Yu, Bowen Zhou")

* [Memory Networks](http://arxiv.org/abs/1410.3916 "Jason Weston, Sumit Chopra, Antoine Bordes") ([github.com/facebook/MemNN](https://github.com/facebook/MemNN))

  * [End-To-End Memory Networks](http://arxiv.org/abs/1503.08895 "Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus")

* [Learning to Transduce with Unbounded Memory](http://arxiv.org/abs/1506.02516 "Edward Grefenstette, Karl Moritz Hermann, Mustafa Suleyman, Phil Blunsom")

* [Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets](http://arxiv.org/abs/1503.01007 "Armand Joulin, Tomas Mikolov") ([github.com/facebook/Stack-RNN](https://github.com/facebook/Stack-RNN))

  * [Learning Context Free Grammars: Limitations of a Recurrent Neural Network with an External Stack Memory](https://clgiles.ist.psu.edu/papers/Cog.Sci.conf.14th.NNPDA.pdf "Sreerupa Das, C. Lee Giles, Guo-Zheng Sun")

  * [A connectionist symbol manipulator that discovers the structure of context-free languages](http://papers.nips.cc/paper/626-a-connectionist-symbol-manipulator-that-discovers-the-structure-of-context-free-languages.pdf "Michael C. Mozer, Sreerupa Das")

* [Feedforward Sequential Memory Neural Networks without Recurrent Feedback](http://arxiv.org/abs/1510.02693 "ShiLiang Zhang, Hui Jiang, Si Wei, LiRong Dai")

* [Pointer Networks](http://arxiv.org/abs/1506.03134 "Oriol Vinyals, Meire Fortunato, Navdeep Jaitly")

* [On End-to-End Program Generation from User Intention by Deep Neural Networks](http://arxiv.org/abs/1510.07211 "Lili Mou, Rui Men, Ge Li, Lu Zhang, Zhi Jin")

* [Deep Knowledge Tracing](http://arxiv.org/abs/1506.05908 "Chris Piech, Jonathan Spencer, Jonathan Huang, Surya Ganguli, Mehran Sahami, Leonidas Guibas, Jascha Sohl-Dickstein") ([github.com/chrispiech/DeepKnowledgeTracing](https://github.com/chrispiech/DeepKnowledgeTracing))

* [Learning to Execute](http://arxiv.org/abs/1410.4615 "Wojciech Zaremba, Ilya Sutskever")

* [Tree-structured composition in neural networks without tree-structured architectures](http://arxiv.org/abs/1506.04834 "Samuel R. Bowman, Christopher D. Manning, Christopher Potts")

* [Grammar as a Foreign Language](http://arxiv.org/abs/1412.7449 "Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton")

* [Learning To Learn Using Gradient Descent](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.5.323 "Sepp Hochreiter, A. Steven Younger, Peter R. Conwell")

* [Learning to control fast-weight memories: An alternative to recurrent nets](http://people.idsia.ch/~juergen/fastweights/ncfastweightsrev.html "Jürgen Schmidhuber") ([ftp.idsia.ch/pub/juergen/fastweights.ps.gz](ftp://ftp.idsia.ch/pub/juergen/fastweights.ps.gz))

* [An introspective network that can learn to run its own weight change algorithm](http://people.idsia.ch/~juergen/rnn.html "Jürgen Schmidhuber") ([ftp.idsia.ch/pub/juergen/iee93self.ps.gz](ftp://ftp.idsia.ch/pub/juergen/iee93self.ps.gz))

* [Goedel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements](http://arxiv.org/abs/cs/0309048 "Juergen Schmidhuber")

* [Optimal Ordered Problem Solver](http://people.idsia.ch/~juergen/oops.html "Jürgen Schmidhuber") ([ftp.idsia.ch/pub/juergen/oopsmlj.pdf](ftp://ftp.idsia.ch/pub/juergen/oopsmlj.pdf))

  * [The Fastest and Shortest Algorithm for All Well-Defined Problems](http://people.idsia.ch/~juergen/optimalsearch.html "Marcus Hutter") ([ftp.idsia.ch/pub/techrep/IDSIA-16-00.ps.gz](ftp://ftp.idsia.ch/pub/techrep/IDSIA-16-00.ps.gz))

  * [The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions](http://people.idsia.ch/~juergen/speedprior.html "Jürgen Schmidhuber") ([ftp.idsia.ch/pub/juergen/coltspeed.pdf](ftp://ftp.idsia.ch/pub/juergen/coltspeed.pdf))

* [Learning Game of Life with a Convolutional Neural Network](http://danielrapp.github.io/cnn-gol/ "Daniel Rapp") ([github.com/DanielRapp/cnn-gol](https://github.com/DanielRapp/cnn-gol))

## Natural Language Processing

* [Deep Learning, NLP, and Representations](http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/ "Christopher Olah")

* [Language Models for Image Captioning: The Quirks and What Works](http://arxiv.org/abs/1505.01809 "Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, Margaret Mitchell")

* [Zero-Shot Learning Through Cross-Modal Transfer](http://arxiv.org/abs/1301.3666 "Richard Socher, Milind Ganjoo, Hamsa Sridhar, Osbert Bastani, Christopher D. Manning, Andrew Y. Ng")

* [On Using Very Large Target Vocabulary for Neural Machine Translation](http://arxiv.org/abs/1412.2007 "Sébastien Jean, Kyunghyun Cho, Roland Memisevic, Yoshua Bengio")

* [BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies](http://arxiv.org/abs/1511.06909 "Shihao Ji, S. V. N. Vishwanathan, Nadathur Satish, Michael J. Anderson, Pradeep Dubey")

* [Deep Unordered Composition Rivals Syntactic Methods for Text Classification](http://cs.umd.edu/~miyyer/pubs/2015_acl_dan.pdf "Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, Hal Daumé III")

### Word Vectors

* [So similar and yet incompatible: Toward automated identification of semantically compatible words](http://clic.cimec.unitn.it/marco/publications/kruszewski-baroni-compatibility-naacl-2015.pdf "Germán Kruszewski, Marco Baroni") ([github.com/germank/compatibility-naacl2015](https://github.com/germank/compatibility-naacl2015))

* [Controlled Experiments for Word Embeddings](http://arxiv.org/abs/1510.02675 "Benjamin J. Wilson, Adriaan M. J. Schakel") ([github.com/benjaminwilson/word2vec-norm-experiments](https://github.com/benjaminwilson/word2vec-norm-experiments))

* [Natural Language Processing (almost) from Scratch](http://arxiv.org/abs/1103.0398 "Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa")

* [Efficient Estimation of Word Representations in Vector Space](http://arxiv.org/abs/1301.3781 "Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean")

  * [Distributed Representations of Words and Phrases and their Compositionality](http://arxiv.org/abs/1310.4546 "Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean")

  * [Exploiting Similarities among Languages for Machine Translation](http://arxiv.org/abs/1309.4168 "Tomas Mikolov, Quoc V. Le, Ilya Sutskever")

* [GloVe: Global Vectors for Word Representation](http://nlp.stanford.edu/projects/glove/glove.pdf "Jeffrey Pennington, Richard Socher, Christopher D. Manning")

* [Learning to Understand Phrases by Embedding the Dictionary](http://arxiv.org/abs/1504.00548 "Felix Hill, Kyunghyun Cho, Anna Korhonen, Yoshua Bengio")

* [Inverted indexing for cross-lingual NLP](http://cst.dk/anders/inverted.pdf "Anders Søgaard, Željko Agić, Héctor Martínez Alonso, Barbara Plank, Bernd Bohnet, Anders Johannsen")

* [Random walks on discourse spaces: a new generative language model with applications to semantic word embeddings](http://arxiv.org/abs/1502.03520 "Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski")

* [Breaking Sticks and Ambiguities with Adaptive Skip-gram](http://arxiv.org/abs/1502.07257 "Sergey Bartunov, Dmitry Kondrashkin, Anton Osokin, Dmitry Vetrov")

* [Language Recognition using Random Indexing](http://arxiv.org/abs/1412.7026 "Aditya Joshi, Johan Halseth, Pentti Kanerva")

### Sentence and Paragraph Vectors

* [Generating Sentences from a Continuous Space](http://arxiv.org/abs/1511.06349 "Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio")

* [Distributed Representations of Sentences and Documents](http://arxiv.org/abs/1405.4053 "Quoc V. Le, Tomas Mikolov")

* [Document Embedding with Paragraph Vectors](http://arxiv.org/abs/1507.07998 "Andrew M. Dai, Christopher Olah, Quoc V. Le")

* [A Fixed-Size Encoding Method for Variable-Length Sequences with its Application to Neural Network Language Models](http://arxiv.org/abs/1505.01504 "Shiliang Zhang, Hui Jiang, Mingbin Xu, Junfeng Hou, Lirong Dai")

* [Skip-Thought Vectors](http://arxiv.org/abs/1506.06726 "Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler") ([github.com/ryankiros/skip-thoughts](https://github.com/ryankiros/skip-thoughts))

* [From Word Embeddings To Document Distances](http://jmlr.org/proceedings/papers/v37/kusnerb15.pdf "Matt J. Kusner, Yu Sun, Nicholas I. Kolkin, Kilian Q. Weinberger")

### Character Vectors

* [Alternative structures for character-level RNNs](http://arxiv.org/abs/1511.06303 "Piotr Bojanowski, Armand Joulin, Tomas Mikolov")

* [Character-based Neural Machine Translation](http://arxiv.org/abs/1511.04586 "Wang Ling, Isabel Trancoso, Chris Dyer, Alan W Black")

* [Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation](http://arxiv.org/abs/1508.02096 "Wang Ling, Tiago Luís, Luís Marujo, Ramón Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W. Black, Isabel Trancoso") ([github.com/wlin12/JNN](https://github.com/wlin12/JNN))

* [Character-Aware Neural Language Models](http://arxiv.org/abs/1508.06615 "Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush") ([github.com/yoonkim/lstm-char-cnn](https://github.com/yoonkim/lstm-char-cnn))

* [Modeling Order in Neural Word Embeddings at Scale](http://arxiv.org/abs/1506.02338 "Andrew Trask, David Gilmore, Matthew Russell")

* [Improved Transition-Based Parsing by Modeling Characters instead of Words with LSTMs](http://arxiv.org/abs/1508.00657 "Miguel Ballesteros, Chris Dyer, Noah A. Smith")

### Attention Mechanisms

* [Neural Machine Translation by Jointly Learning to Align and Translate](http://arxiv.org/abs/1409.0473 "Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio")

* [Ask Me Anything: Dynamic Memory Networks for Natural Language Processing](http://arxiv.org/abs/1506.07285 "Ankit Kumar, Ozan Irsoy, Jonathan Su, James Bradbury, Robert English, Brian Pierce, Peter Ondruska, Mohit Iyyer, Ishaan Gulrajani, Richard Socher")

* [Attention with Intention for a Neural Network Conversation Model](http://arxiv.org/abs/1510.08565 "Kaisheng Yao, Geoffrey Zweig, Baolin Peng")

### Sequence-to-Sequence Learning

* [Multi-task Sequence to Sequence Learning](http://arxiv.org/abs/1511.06114 "Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser")

* [Order Matters: Sequence to sequence for sets](http://arxiv.org/abs/1511.06391 "Oriol Vinyals, Samy Bengio, Manjunath Kudlur")

* [Task Loss Estimation for Sequence Prediction](http://arxiv.org/abs/1511.06456 "Dzmitry Bahdanau, Dmitriy Serdyuk, Philémon Brakel, Nan Rosemary Ke, Jan Chorowski, Aaron Courville, Yoshua Bengio")

* [Semi-supervised Sequence Learning](http://arxiv.org/abs/1511.01432 "Andrew M. Dai, Quoc V. Le")

* [A Hierarchical Neural Autoencoder for Paragraphs and Documents](http://arxiv.org/abs/1506.01057 "Jiwei Li, Minh-Thang Luong, Dan Jurafsky") ([github.com/jiweil/Hierarchical-Neural-Autoencoder](https://github.com/jiweil/Hierarchical-Neural-Autoencoder))

* [Sequence to Sequence Learning with Neural Networks](http://arxiv.org/abs/1409.3215 "Ilya Sutskever, Oriol Vinyals, Quoc V. Le")

* [Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation](http://arxiv.org/abs/1406.1078 "Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio")

* [Neural Transformation Machine: A New Architecture for Sequence-to-Sequence Learning](http://arxiv.org/abs/1506.06442 "Fandong Meng, Zhengdong Lu, Zhaopeng Tu, Hang Li, Qun Liu")

* [On Using Monolingual Corpora in Neural Machine Translation](http://arxiv.org/abs/1503.03535 "Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio")

### Language Understanding

* [Reasoning about Entailment with Neural Attention](http://arxiv.org/abs/1509.06664 "Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Phil Blunsom")

* [The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations](http://arxiv.org/abs/1511.02301 "Felix Hill, Antoine Bordes, Sumit Chopra, Jason Weston")

* [Investigation of Recurrent-Neural-Network Architectures and Learning Methods for Spoken Language Understanding](http://www.iro.umontreal.ca/~lisa/pointeurs/RNNSpokenLanguage2013.pdf "Grégoire Mesnil, Xiaodong He, Li Deng, Yoshua Bengio")

* [Language Understanding for Text-based Games Using Deep Reinforcement Learning](http://arxiv.org/abs/1506.08941 "Karthik Narasimhan, Tejas Kulkarni, Regina Barzilay") ([github.com/karthikncode/text-world-player](https://github.com/karthikncode/text-world-player))

### Question Answering and Conversing

* [A Cognitive Neural Architecture Able to Learn and Communicate through Natural Language](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0140866 "Bruno Golosio, Angelo Cangelosi, Olesya Gamotina, Giovanni Luca Masala") ([github.com/golosio/annabell](https://github.com/golosio/annabell))

* [Large-scale Simple Question Answering with Memory Networks](http://arxiv.org/abs/1506.02075 "Antoine Bordes, Nicolas Usunier, Sumit Chopra, Jason Weston")

* [Reasoning in Vector Space: An Exploratory Study of Question Answering](http://arxiv.org/abs/1511.06426 "Moontae Lee, Xiaodong He, Wen-tau Yih, Jianfeng Gao, Li Deng, Paul Smolensky")

* [Deep Learning for Answer Sentence Selection](http://arxiv.org/abs/1412.1632 "Lei Yu, Karl Moritz Hermann, Phil Blunsom, Stephen Pulman")

* [Neural Responding Machine for Short-Text Conversation](http://arxiv.org/abs/1503.02364 "Lifeng Shang, Zhengdong Lu, Hang Li")

* [A Neural Conversational Model](http://arxiv.org/abs/1506.05869 "Oriol Vinyals, Quoc Le")

* [VQA: Visual Question Answering](http://arxiv.org/abs/1505.00468 "Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh")

* [Question Answering with Subgraph Embeddings](https://research.facebook.com/publications/1473550739586509/question-answering-with-subgraph-embeddings/ "Antoine Bordes, Sumit Chopra, Jason Weston")

* [Hierarchical Neural Network Generative Models for Movie Dialogues](http://arxiv.org/abs/1507.04808v1 "Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, Joelle Pineau")

* [Ask Your Neurons: A Neural-based Approach to Answering Questions about Images](http://arxiv.org/abs/1505.01121 "Mateusz Malinowski, Marcus Rohrbach, Mario Fritz")

* [Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering](http://arxiv.org/abs/1505.05612 "Haoyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, Wei Xu")

### Convolutional

* [Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks.](http://biorxiv.org/content/early/2015/10/05/028399.abstract "David R Kelley, Jasper Snoek, John Rinn") ([github.com/davek44/Basset](https://github.com/davek44/Basset))

* [A Convolutional Neural Network for Modelling Sentences](http://arxiv.org/abs/1404.2188 "Nal Kalchbrenner, Edward Grefenstette, Phil Blunsom")

* [Convolutional Neural Networks for Sentence Classification](http://arxiv.org/abs/1408.5882 "Yoon Kim") ([github.com/yoonkim/CNN_sentence](https://github.com/yoonkim/CNN_sentence))

* [Text Understanding from Scratch](http://arxiv.org/abs/1502.01710 "Xiang Zhang, Yann LeCun")

  * [Character-level Convolutional Networks for Text Classification](http://arxiv.org/abs/1509.01626 "Xiang Zhang, Junbo Zhao, Yann LeCun")

* [DeepWriterID: An End-to-end Online Text-independent Writer Identification System](http://arxiv.org/abs/1508.04945 "Weixin Yang, Lianwen Jin, Manfei Liu")

* [Encoding Source Language with Convolutional Neural Network for Machine Translation](http://arxiv.org/abs/1503.01838 "Fandong Meng, Zhengdong Lu, Mingxuan Wang, Hang Li, Wenbin Jiang, Qun Liu")

* [Semantic Relation Classification via Convolutional Neural Networks with Simple Negative Sampling](http://arxiv.org/abs/1506.07650 "Kun Xu, Yansong Feng, Songfang Huang, Dongyan Zhao")

* [Convolutional Neural Network Architectures for Matching Natural Language Sentences](http://arxiv.org/abs/1503.03244 "Baotian Hu, Zhengdong Lu, Hang Li, Qingcai Chen")

### Recurrent

* [Long Short-Term Memory Over Tree Structures](http://arxiv.org/abs/1503.04881 "Xiaodan Zhu, Parinaz Sobhani, Hongyu Guo")

* [Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks](http://arxiv.org/abs/1503.00075 "Kai Sheng Tai, Richard Socher, Christopher D. Manning")

* [CCG Supertagging with a Recurrent Neural Network](http://www.aclweb.org/anthology/P15-2041 "Wenduan Xu, Michael Auli, Stephen Clark")

## Convolutional Neural Networks

* [Spatial Transformer Networks](http://arxiv.org/abs/1506.02025 "Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu")

* [SimNets: A Generalization of Convolutional Networks](http://arxiv.org/abs/1410.0781 "Nadav Cohen, Amnon Shashua")

* [Fast Algorithms for Convolutional Neural Networks](http://arxiv.org/abs/1509.09308 "Andrew Lavin")

* [Striving for Simplicity: The All Convolutional Net](http://arxiv.org/abs/1412.6806 "Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller")

* [Very Deep Convolutional Networks for Large-Scale Image Recognition](http://arxiv.org/abs/1409.1556 "Karen Simonyan, Andrew Zisserman")

* [Very Deep Multilingual Convolutional Neural Networks for LVCSR](http://arxiv.org/abs/1509.08967 "Tom Sercu, Christian Puhrsch, Brian Kingsbury, Yann LeCun")

* [Network In Network](http://arxiv.org/abs/1312.4400 "Min Lin, Qiang Chen, Shuicheng Yan")

* [Going Deeper with Convolutions](http://arxiv.org/abs/1409.4842 "Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich") ([github.com/google/inception](https://github.com/google/inception))

* [Convolutional Networks on Graphs for Learning Molecular Fingerprints](http://arxiv.org/abs/1509.09292 "David Duvenaud, Dougal Maclaurin, Jorge Aguilera-Iparraguirre, Rafael Gómez-Bombarelli, Timothy Hirzel, Alán Aspuru-Guzik, Ryan P. Adams") ([github.com/HIPS/neural-fingerprint](https://github.com/HIPS/neural-fingerprint))

* [Deep Learning for Single-View Instance Recognition](http://arxiv.org/abs/1507.08286 "David Held, Sebastian Thrun, Silvio Savarese")

* [Learning to Generate Chairs with Convolutional Neural Networks](http://arxiv.org/abs/1411.5928 "Alexey Dosovitskiy, Jost Tobias Springenberg, Thomas Brox") ([github.com/stokasto/caffe/tree/chairs_deconv](https://github.com/stokasto/caffe/tree/chairs_deconv))

* [Deep Convolutional Inverse Graphics Network](http://arxiv.org/abs/1503.03167 "Tejas D. Kulkarni, Will Whitney, Pushmeet Kohli, Joshua B. Tenenbaum")

* [Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks](http://arxiv.org/abs/1506.05751 "Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus")

* [Long-term Recurrent Convolutional Networks for Visual Recognition and Description](http://arxiv.org/abs/1411.4389 "Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell")

* [A Machine Learning Approach for Filtering Monte Carlo Noise](http://cvc.ucsb.edu/graphics/Papers/SIGGRAPH2015_LBF "Nima Khademi Kalantari, Steve Bako, Pradeep Sen")

* [Image Super-Resolution Using Deep Convolutional Networks](http://arxiv.org/abs/1501.00092 "Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang")

* [Learning to Deblur](http://arxiv.org/abs/1406.7444 "Christian J. Schuler, Michael Hirsch, Stefan Harmeling, Bernhard Schölkopf")

* [Monocular Object Instance Segmentation and Depth Ordering with CNNs](http://arxiv.org/abs/1505.03159 "Ziyu Zhang, Alexander G. Schwing, Sanja Fidler, Raquel Urtasun")

* [FlowNet: Learning Optical Flow with Convolutional Networks](http://arxiv.org/abs/1504.06852 "Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox")

* [DeepStereo: Learning to Predict New Views from the World's Imagery](http://arxiv.org/abs/1506.06825 "John Flynn, Ivan Neulander, James Philbin, Noah Snavely")

* [Deep convolutional filter banks for texture recognition and segmentation](http://arxiv.org/abs/1411.6836 "Mircea Cimpoi, Subhransu Maji, Andrea Vedaldi")

* [FaceNet: A Unified Embedding for Face Recognition and Clustering](http://arxiv.org/abs/1503.03832 "Florian Schroff, Dmitry Kalenichenko, James Philbin") ([github.com/cmusatyalab/openface](https://github.com/cmusatyalab/openface))

* [DeepFace: Closing the Gap to Human-Level Performance in Face Verification](https://research.facebook.com/publications/480567225376225/deepface-closing-the-gap-to-human-level-performance-in-face-verification/ "Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf")

* [Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network](http://arxiv.org/abs/1504.04658 "Andrew J.R. Simpson, Gerard Roma, Mark D. Plumbley")

* [3D ConvNets with Optical Flow Based Regularization](http://cs231n.stanford.edu/reports/kjchavez_final.pdf "Kevin Chavez")

* [DeepPose: Human Pose Estimation via Deep Neural Networks](http://arxiv.org/abs/1312.4659 "Alexander Toshev, Christian Szegedy")

* [Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification](http://arxiv.org/abs/1502.01852 "Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun")

* [Rotation-invariant convolutional neural networks for galaxy morphology prediction](http://arxiv.org/abs/1503.07077 "Sander Dieleman, Kyle W. Willett, Joni Dambre")

* [Deep Fried Convnets](http://arxiv.org/abs/1412.7149 "Zichao Yang, Marcin Moczulski, Misha Denil, Nando de Freitas, Alex Smola, Le Song, Ziyu Wang")

* [Fractional Max-Pooling](http://arxiv.org/abs/1412.6071 "Benjamin Graham")

* [Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps](http://arxiv.org/abs/1312.6034 "Karen Simonyan, Andrea Vedaldi, Andrew Zisserman")

* [Learning FRAME Models Using CNN Filters for Knowledge Visualization](http://arxiv.org/abs/1509.08379v2 "Yang Lu, Song-Chun Zhu, Ying Nian Wu") ([code](http://www.stat.ucla.edu/~yang.lu/project/deepFrame/main.html))

* [Invariant backpropagation: how to train a transformation-invariant neural network](http://arxiv.org/abs/1502.04434 "Sergey Demyanov, James Bailey, Ramamohanarao Kotagiri, Christopher Leckie")

* [Recommending music on Spotify with deep learning](http://benanne.github.io/2014/08/05/spotify-cnns.html "Sander Dieleman")

* [Conv Nets: A Modular Perspective](http://colah.github.io/posts/2014-07-Conv-Nets-Modular/ "Christopher Olah")

* [Learning 3D Shape (1)](http://danfischetti.github.io/jekyll/update/2015/10/24/understanding-shape-1.html "Dan Fischetti") ([github.com/danfischetti/shape-classifier](https://github.com/danfischetti/shape-classifier))

## Recurrent Neural Networks

* [Unitary Evolution Recurrent Neural Networks](http://arxiv.org/abs/1511.06464v1 "Martin Arjovsky, Amar Shah, Yoshua Bengio")

* [Regularizing RNNs by Stabilizing Activations](http://arxiv.org/abs/1511.08400 "David Krueger, Roland Memisevic")

* [Training recurrent networks online without backtracking](http://arxiv.org/abs/1507.07680 "Yann Ollivier, Guillaume Charpiat")

* [Modeling sequential data using higher-order relational features and predictive training](http://arxiv.org/abs/1402.2333 "Vincent Michalski, Roland Memisevic, Kishore Konda") ([github.com/memisevic/grammar-cells](https://github.com/memisevic/grammar-cells))

* [Recurrent Neural Network Regularization](http://arxiv.org/abs/1409.2329 "Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals")

* [How to Construct Deep Recurrent Neural Networks](http://arxiv.org/abs/1312.6026 "Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio")

* [DAG-Recurrent Neural Networks For Scene Labeling](http://arxiv.org/abs/1509.00552 "Bing Shuai, Zhen Zuo, Gang Wang, Bing Wang")

* [Long Short-Term Memory](http://people.idsia.ch/~juergen/rnn.html "Sepp Hochreiter, Jürgen Schmidhuber") ([pdf](ftp://ftp.idsia.ch/pub/juergen/lstm.pdf))

  * [LSTM: A Search Space Odyssey](http://arxiv.org/abs/1503.04069 "Klaus Greff, Rupesh Kumar Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber")

  * [Grid Long Short-Term Memory](http://arxiv.org/abs/1507.01526 "Nal Kalchbrenner, Ivo Danihelka, Alex Graves")

  * [Depth-Gated LSTM](http://arxiv.org/abs/1508.03790 "Kaisheng Yao, Trevor Cohn, Katerina Vylomova, Kevin Duh, Chris Dyer")

* [Learning Longer Memory in Recurrent Neural Networks](http://arxiv.org/abs/1412.7753 "Tomas Mikolov, Armand Joulin, Sumit Chopra, Michael Mathieu, Marc'Aurelio Ranzato")

* [A Simple Way to Initialize Recurrent Networks of Rectified Linear Units](http://arxiv.org/abs/1504.00941 "Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton")

* [A Clockwork RNN](http://arxiv.org/abs/1402.3511 "Jan Koutník, Klaus Greff, Faustino Gomez, Jürgen Schmidhuber")

* [DRAW: A Recurrent Neural Network For Image Generation](http://arxiv.org/abs/1502.04623 "Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra")

* [Gated Feedback Recurrent Neural Networks](http://arxiv.org/abs/1502.02367 "Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio")

* [A Recurrent Latent Variable Model for Sequential Data](http://arxiv.org/abs/1506.02216 "Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio")

* [ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks](http://arxiv.org/abs/1505.00393 "Francesco Visin, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville, Yoshua Bengio")

* [Translating Videos to Natural Language Using Deep Recurrent Neural Networks](http://arxiv.org/abs/1412.4729 "Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko")

* [Unsupervised Learning of Video Representations using LSTMs](http://arxiv.org/abs/1502.04681 "Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov")

* [Visualizing and Understanding Recurrent Networks](http://arxiv.org/abs/1506.02078 "Andrej Karpathy, Justin Johnson, Fei-Fei Li")

* [Advances in Optimizing Recurrent Networks](http://arxiv.org/abs/1212.0901 "Yoshua Bengio, Nicolas Boulanger-Lewandowski, Razvan Pascanu")

* [Learning Stochastic Recurrent Networks](http://arxiv.org/abs/1411.7610 "Justin Bayer, Christian Osendorfer")

* [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/ "Christopher Olah")

* [Optimizing RNN performance](https://svail.github.io/ "Erich Elsen")

* [Mastering the Game of Go with Deep Neural Networks and Tree Search](http://www.readcube.com/articles/10.1038%2Fnature16961 "David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis")

## Convolutional Recurrent Neural Networks

* [Recurrent Spatial Transformer Networks](http://arxiv.org/abs/1509.05329 "Søren Kaae Sønderby, Casper Kaae Sønderby, Lars Maaløe, Ole Winther") ([github.com/skaae/recurrent-spatial-transformer-code](https://github.com/skaae/recurrent-spatial-transformer-code))

* [Recurrent Models of Visual Attention](http://arxiv.org/abs/1406.6247 "Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu")

  * [Multiple Object Recognition with Visual Attention](http://arxiv.org/abs/1412.7755 "Jimmy Ba, Volodymyr Mnih, Koray Kavukcuoglu")

* [Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting](http://arxiv.org/abs/1506.04214 "Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, Wang-chun Woo")

* [Describing Multimedia Content using Attention-based Encoder-Decoder Networks](http://arxiv.org/abs/1507.01053 "Kyunghyun Cho, Aaron Courville, Yoshua Bengio")

## Adversarial Neural Networks

* [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](http://arxiv.org/abs/1511.06434 "Alec Radford, Luke Metz, Soumith Chintala")

* [Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks](http://arxiv.org/abs/1511.06390 "Jost Tobias Springenberg")

* [Adversarial Autoencoders](http://arxiv.org/abs/1511.05644v1 "Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow")

## Autoencoders

* [Correlational Neural Networks](http://arxiv.org/abs/1504.07225 "Sarath Chandar, Mitesh M. Khapra, Hugo Larochelle, Balaraman Ravindran")

* [Optimizing Neural Networks that Generate Images](http://www.cs.toronto.edu/~tijmen/tijmen_thesis.pdf "Tijmen Tieleman") ([github.com/mrkulk/Unsupervised-Capsule-Network](https://github.com/mrkulk/Unsupervised-Capsule-Network))

* [Auto-Encoding Variational Bayes](http://arxiv.org/abs/1312.6114 "Diederik P Kingma, Max Welling")

* [Analyzing noise in autoencoders and deep networks](http://arxiv.org/abs/1406.1831 "Ben Poole, Jascha Sohl-Dickstein, Surya Ganguli")

* [MADE: Masked Autoencoder for Distribution Estimation](http://arxiv.org/abs/1502.03509 "Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle") ([github.com/mgermain/MADE](https://github.com/mgermain/MADE))

* [Winner-Take-All Autoencoders](http://arxiv.org/abs/1409.2752 "Alireza Makhzani, Brendan Frey") ([github.com/stephenbalaban/convnet](https://github.com/stephenbalaban/convnet))

* [k-Sparse Autoencoders](http://arxiv.org/abs/1312.5663 "Alireza Makhzani, Brendan Frey") ([github.com/stephenbalaban/convnet](https://github.com/stephenbalaban/convnet))

* [Zero-bias autoencoders and the benefits of co-adapting features](http://arxiv.org/abs/1402.3337 "Kishore Konda, Roland Memisevic, David Krueger")

* [Importance Weighted Autoencoders](http://arxiv.org/abs/1509.00519 "Yuri Burda, Roger Grosse, Ruslan Salakhutdinov") ([github.com/yburda/iwae](https://github.com/yburda/iwae))

* [Generalized Denoising Auto-Encoders as Generative Models](http://arxiv.org/abs/1305.6663 "Yoshua Bengio, Li Yao, Guillaume Alain, Pascal Vincent")

* [Marginalized Denoising Auto-encoders for Nonlinear Representations](http://www.cse.wustl.edu/~mchen/papers/deepmsda.pdf "Minmin Chen, Kilian Weinberger, Fei Sha, Yoshua Bengio")

  * [Marginalized Denoising Autoencoders for Domain Adaptation](http://arxiv.org/abs/1206.4683 "Minmin Chen, Zhixiang Xu, Kilian Weinberger, Fei Sha")

* [Real-time Hebbian Learning from Autoencoder Features for Control Tasks](http://mitpress.mit.edu/sites/default/files/titles/content/alife14/ch034.html "Justin K. Pugh, Andrea Soltoggio, Kenneth O. Stanley")

* [Procedural Modeling Using Autoencoder Networks](http://www.meyumer.com/pm_autoencoder.html "Mehmet Ersin Yumer, Paul Asente, Radomir Mech, Levent Burak Kara") ([pdf](http://www.meyumer.com/pdfs/PmAutoencoder.pdf)) ([youtu.be/wl3h4S1g2u4](http://youtu.be/wl3h4S1g2u4))

* [Is Joint Training Better for Deep Auto-Encoders?](http://arxiv.org/abs/1405.1380 "Yingbo Zhou, Devansh Arpit, Ifeoma Nwogu, Venu Govindaraju")

* [Towards universal neural nets: Gibbs machines and ACE](http://arxiv.org/abs/1508.06585 "Galin Georgiev")

* [Transforming Auto-encoders](http://www.cs.toronto.edu/~fritz/absps/transauto6.pdf "G. E. Hinton, A. Krizhevsky, S. D. Wang")

* [Discovering Hidden Factors of Variation in Deep Networks](http://arxiv.org/abs/1412.6583 "Brian Cheung, Jesse A. Livezey, Arjun K. Bansal, Bruno A. Olshausen")

## Restricted Boltzmann Machines

* [The wake-sleep algorithm for unsupervised neural networks](https://www.cs.toronto.edu/~hinton/absps/ws.pdf "Geoffrey E Hinton, Peter Dayan, Brendan J Frey, Radford M Neal")

  * [A simple algorithm that discovers efficient perceptual codes](https://www.cs.toronto.edu/~hinton/absps/percepts.pdf "Brendan J. Frey, Peter Dayan, Geoffrey E. Hinton")

  * [Reweighted Wake-Sleep](http://arxiv.org/abs/1406.2751 "Jörg Bornschein, Yoshua Bengio")

* [An Infinite Restricted Boltzmann Machine](http://arxiv.org/abs/1502.02476 "Marc-Alexandre Côté, Hugo Larochelle")

* [Quantum Inspired Training for Boltzmann Machines](http://arxiv.org/abs/1507.02642 "Nathan Wiebe, Ashish Kapoor, Christopher Granade, Krysta M Svore")

* [Training Bidirectional Helmholtz Machines](http://arxiv.org/abs/1506.03877 "Jörg Bornschein, Samira Shabanian, Asja Fischer, Yoshua Bengio")

## Biologically Plausible Learning

* [How Auto-Encoders Could Provide Credit Assignment in Deep Networks via Target Propagation](http://arxiv.org/abs/1407.7906 "Yoshua Bengio")

  * [Difference Target Propagation](http://arxiv.org/abs/1412.7525 "Dong-Hyun Lee, Saizheng Zhang, Asja Fischer, Antoine Biard, Yoshua Bengio")

  * [Towards Biologically Plausible Deep Learning](http://arxiv.org/abs/1502.04156 "Yoshua Bengio, Dong-Hyun Lee, Jörg Bornschein, Zhouhan Lin")

* [How Important is Weight Symmetry in Backpropagation?](http://arxiv.org/abs/1510.05067 "Qianli Liao, Joel Z. Leibo, Tomaso Poggio")

* [Random feedback weights support learning in deep neural networks](http://arxiv.org/abs/1411.0247 "Timothy P. Lillicrap, Daniel Cownden, Douglas B. Tweed, Colin J. Akerman")

## Supervised Learning

* [Fast Label Embeddings via Randomized Linear Algebra](http://arxiv.org/abs/1412.6547 "Paul Mineiro, Nikos Karampatziakis")

  * [Fast Label Embeddings for Extremely Large Output Spaces](http://arxiv.org/abs/1503.08873 "Paul Mineiro, Nikos Karampatziakis")

* [Locally Non-linear Embeddings for Extreme Multi-label Learning](http://arxiv.org/abs/1507.02743 "Kush Bhatia, Himanshu Jain, Purushottam Kar, Prateek Jain, Manik Varma")

* [Efficient and Parsimonious Agnostic Active Learning](http://arxiv.org/abs/1506.08669 "Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire")

## Unsupervised Learning

* [Towards Principled Unsupervised Learning](http://arxiv.org/abs/1511.06440 "Ilya Sutskever, Rafal Jozefowicz, Karol Gregor, Danilo Rezende, Tim Lillicrap, Oriol Vinyals")

* [Index-learning of unsupervised low dimensional embedding](http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/graham/indexlearning.pdf "Ben Graham")

* [An Analysis of Unsupervised Pre-training in Light of Recent Advances](http://arxiv.org/abs/1412.6597 "Tom Le Paine, Pooya Khorrami, Wei Han, Thomas S. Huang") ([github.com/ifp-uiuc/an-analysis-of-unsupervised-pre-training-iclr-2015](https://github.com/ifp-uiuc/an-analysis-of-unsupervised-pre-training-iclr-2015))

* [Is Joint Training Better for Deep Auto-Encoders?](http://arxiv.org/abs/1405.1380 "Yingbo Zhou, Devansh Arpit, Ifeoma Nwogu, Venu Govindaraju")

* [Unsupervised Feature Learning from Temporal Data](http://arxiv.org/abs/1504.02518 "Ross Goroshin, Joan Bruna, Jonathan Tompson, David Eigen, Yann LeCun")

* [Learning to Linearize Under Uncertainty](http://arxiv.org/abs/1506.03011 "Ross Goroshin, Michael Mathieu, Yann LeCun")

* [Semi-Supervised Learning with Ladder Network](http://arxiv.org/abs/1507.02672 "Antti Rasmus, Harri Valpola, Mikko Honkala, Mathias Berglund, Tapani Raiko") ([github.com/arasmus/ladder](https://github.com/arasmus/ladder))

  * [Denoising autoencoder with modulated lateral connections learns invariant representations of natural images](http://arxiv.org/abs/1412.7210 "Antti Rasmus, Tapani Raiko, Harri Valpola")

  * [Lateral Connections in Denoising Autoencoders Support Supervised Learning](http://arxiv.org/abs/1504.08215 "Antti Rasmus, Harri Valpola, Tapani Raiko")

* [Semi-Supervised Learning with Deep Generative Models](http://arxiv.org/abs/1406.5298 "Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, Max Welling")

* [Rectified Factor Networks](http://arxiv.org/abs/1502.06464 "Djork-Arné Clevert, Andreas Mayr, Thomas Unterthiner, Sepp Hochreiter")

* [An Analysis of Single-Layer Networks in Unsupervised Feature Learning](http://ai.stanford.edu/~ang/papers/nipsdlufl10-AnalysisSingleLayerUnsupervisedFeatureLearning.pdf "Adam Coates, Honglak Lee, Andrew Y. Ng")

* [Deep Unsupervised Learning using Nonequilibrium Thermodynamics](http://arxiv.org/abs/1503.03585 "Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli")

* [On-the-Fly Learning in a Perpetual Learning Machine](http://arxiv.org/abs/1509.00913 "Andrew J.R. Simpson")

## Reinforcement Learning

* [Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning](http://arxiv.org/abs/1509.08731 "Shakir Mohamed, Danilo Jimenez Rezende")

* [Prioritized Experience Replay](http://arxiv.org/abs/1511.05952 "Tom Schaul, John Quan, Ioannis Antonoglou, David Silver")

* [Human-level control through deep reinforcement learning](http://rdcu.be/cdlg "Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei Rusu, Joel Veness, Marc Bellemare, Alex Graves, Martin Riedmiller, Andreas Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis") ([sites.google.com/a/deepmind.com/dqn](https://sites.google.com/a/deepmind.com/dqn))

* [Playing Atari with Deep Reinforcement Learning](http://arxiv.org/abs/1312.5602 "Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller")

* [Universal Value Function Approximators](http://jmlr.org/proceedings/papers/v37/schaul15.html "Tom Schaul, Daniel Horgan, Karol Gregor, David Silver")

* [Giraffe: Using Deep Reinforcement Learning to Play Chess](http://arxiv.org/abs/1509.01549 "Matthew Lai") ([bitbucket.org/waterreaction/giraffe](https://bitbucket.org/waterreaction/giraffe))

## Theory

* [The Human Kernel](http://arxiv.org/abs/1510.07389 "Andrew Gordon Wilson, Christoph Dann, Christopher G. Lucas, Eric P. Xing")

* [Why Neurons Have Thousands of Synapses, A Theory of Sequence Memory in Neocortex](http://arxiv.org/abs/1511.00083 "Jeff Hawkins, Subutai Ahmad")

* [Deep Manifold Traversal: Changing Labels with Convolutional Features](http://arxiv.org/abs/1511.06421 "Jacob R. Gardner, Matt J. Kusner, Yixuan Li, Paul Upchurch, Kilian Q. Weinberger, John E. Hopcroft")

* [On the Expressive Power of Deep Learning: A Tensor Analysis](http://arxiv.org/abs/1509.05009 "Nadav Cohen, Or Sharir, Amnon Shashua")

* [ℓ1-regularized Neural Networks are Improperly Learnable in Polynomial Time](http://arxiv.org/abs/1510.03528 "Yuchen Zhang, Jason D. Lee, Michael I. Jordan")

* [Provable approximation properties for deep neural networks](http://arxiv.org/abs/1509.07385 "Uri Shaham, Alexander Cloninger, Ronald R. Coifman")

* [Steps Toward Deep Kernel Methods from Infinite Neural Networks](http://arxiv.org/abs/1508.05133 "Tamir Hazan, Tommi Jaakkola")

* [On the Number of Linear Regions of Deep Neural Networks](http://arxiv.org/abs/1402.1869 "Guido Montúfar, Razvan Pascanu, Kyunghyun Cho, Yoshua Bengio")

* [On the saddle point problem for non-convex optimization](http://arxiv.org/abs/1405.4604 "Razvan Pascanu, Yann N. Dauphin, Surya Ganguli, Yoshua Bengio")

  * [Identifying and attacking the saddle point problem in high-dimensional non-convex optimization](http://arxiv.org/abs/1406.2572 "Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio")

* [The Loss Surfaces of Multilayer Networks](http://arxiv.org/abs/1412.0233 "Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, Yann LeCun")

* [Statistical mechanics of complex neural systems and high dimensional data](http://arxiv.org/abs/1301.7115 "Madhu Advani, Subhaneil Lahiri, Surya Ganguli")

* [Qualitatively characterizing neural network optimization problems](http://arxiv.org/abs/1412.6544 "Ian J. Goodfellow, Oriol Vinyals, Andrew M. Saxe")

* [An average-case depth hierarchy theorem for Boolean circuits](http://arxiv.org/abs/1504.03398 "Benjamin Rossman, Rocco A. Servedio, Li-Yang Tan")

* [An exact mapping between the Variational Renormalization Group and Deep Learning](http://arxiv.org/abs/1410.3831 "Pankaj Mehta, David J. Schwab")

* [Why does Deep Learning work? - A perspective from Group Theory](http://arxiv.org/abs/1412.6621 "Arnab Paul, Suresh Venkatasubramanian")

* [A Group Theoretic Perspective on Unsupervised Deep Learning](http://arxiv.org/abs/1504.02462 "Arnab Paul, Suresh Venkatasubramanian")

* [Exact solutions to the nonlinear dynamics of learning in deep linear neural networks](http://arxiv.org/abs/1312.6120 "Andrew M. Saxe, James L. McClelland, Surya Ganguli")

* [On the Stability of Deep Networks](http://arxiv.org/abs/1412.5896 "Raja Giryes, Guillermo Sapiro, Alex M. Bronstein")

* [Over-Sampling in a Deep Neural Network](http://arxiv.org/abs/1502.03648 "Andrew J.R. Simpson")

* [A theoretical argument for complex-valued convolutional networks](http://arxiv.org/abs/1503.03438 "Joan Bruna, Soumith Chintala, Yann LeCun, Serkan Piantino, Arthur Szlam, Mark Tygert")

* [Spectral Representations for Convolutional Neural Networks](http://arxiv.org/abs/1506.03767 "Oren Rippel, Jasper Snoek, Ryan P. Adams")

* [A Probabilistic Theory of Deep Learning](http://arxiv.org/abs/1504.00641 "Ankit B. Patel, Tan Nguyen, Richard G. Baraniuk")

* [Deep Convolutional Networks on Graph-Structured Data](http://arxiv.org/abs/1506.05163 "Mikael Henaff, Joan Bruna, Yann LeCun") ([github.com/mbhenaff/spectral-lib](https://github.com/mbhenaff/spectral-lib))

* [Learning with Group Invariant Features: A Kernel Perspective](http://arxiv.org/abs/1506.02544 "Youssef Mroueh, Stephen Voinea, Tomaso Poggio")

* [Randomized algorithms for matrices and data](http://arxiv.org/abs/1104.5557 "Michael W. Mahoney")

* [Calculus on Computational Graphs: Backpropagation](http://colah.github.io/posts/2015-08-Backprop/ "Christopher Olah")

* [Understanding Convolutions](http://colah.github.io/posts/2014-07-Understanding-Convolutions/ "Christopher Olah")

* [Groups & Group Convolutions](http://colah.github.io/posts/2014-12-Groups-Convolution/ "Christopher Olah")

* [Neural Networks, Manifolds, and Topology](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/ "Christopher Olah")

* [Neural Networks, Types, and Functional Programming](http://colah.github.io/posts/2015-09-NN-Types-FP/ "Christopher Olah")

* [Causal Entropic Forces](http://www.alexwg.org/publications/PhysRevLett_110-168702.pdf "A. D. Wissner-Gross, C. E. Freer")

* [On the Computability of AIXI](http://arxiv.org/abs/1510.05572 "Jan Leike, Marcus Hutter")

* [Physics, Topology, Logic and Computation: A Rosetta Stone](http://arxiv.org/abs/0903.0340 "John C. Baez, Mike Stay")

## Quantum Computing

* [Analyzing Big Data with Dynamic Quantum Clustering](http://arxiv.org/abs/1310.2700 "M. Weinstein, F. Meirer, A. Hume, Ph. Sciau, G. Shaked, R. Hofstetter, E. Persi, A. Mehta, D. Horn")

* [Quantum algorithms for supervised and unsupervised machine learning](http://arxiv.org/abs/1307.0411 "Seth Lloyd, Masoud Mohseni, Patrick Rebentrost")

* [Entanglement-Based Machine Learning on a Quantum Computer](http://arxiv.org/abs/1409.7770 "X.-D. Cai, D. Wu, Z.-E. Su, M.-C. Chen, X.-L. Wang, L. Li, N.-L. Liu, C.-Y. Lu, J.-W. Pan")

* [A quantum speedup in machine learning: Finding a N-bit Boolean function for a classification](http://arxiv.org/abs/1303.6055 "Seokwon Yoo, Jeongho Bang, Changhyoup Lee, Jinhyoung Lee")

* [Application of Quantum Annealing to Training of Deep Neural Networks](http://arxiv.org/abs/1510.06356 "Steven H. Adachi, Maxwell P. Henderson")

* [Quantum Deep Learning](http://arxiv.org/abs/1412.3489 "Nathan Wiebe, Ashish Kapoor, Krysta M. Svore")

* [Experimental Realization of Quantum Artificial Intelligence](http://arxiv.org/abs/1410.1054 "Li Zhaokai, Liu Xiaomei, Xu Nanyang, Du Jiangfeng")

## Training Innovations

* [Adding Gradient Noise Improves Learning for Very Deep Networks](http://arxiv.org/abs/1511.06807 "Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens")

* [Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)](http://arxiv.org/abs/1511.07289 "Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter")

* [Net2Net: Accelerating Learning via Knowledge Transfer](http://arxiv.org/abs/1511.05641 "Tianqi Chen, Ian Goodfellow, Jonathon Shlens")

* [Learning the Architecture of Deep Neural Networks](http://arxiv.org/abs/1511.05497 "Suraj Srinivas, R. Venkatesh Babu")

* [GradNets: Dynamic Interpolation Between Neural Architectures](http://arxiv.org/abs/1511.06827 "Diogo Almeida, Nate Sauder")

* [Reducing the Training Time of Neural Networks by Partitioning](http://arxiv.org/abs/1511.02954 "Conrado S. Miranda, Fernando J. Von Zuben")

* [The Effects of Hyperparameters on SGD Training of Neural Networks](http://arxiv.org/abs/1508.02788 "Thomas M. Breuel")

* [Gradient-based Hyperparameter Optimization through Reversible Learning](http://arxiv.org/abs/1502.03492 "Dougal Maclaurin, David Duvenaud, Ryan P. Adams") ([github.com/HIPS/hypergrad](https://github.com/HIPS/hypergrad))

* [Learning Ordered Representations with Nested Dropout](http://arxiv.org/abs/1402.0915 "Oren Rippel, Michael A. Gelbart, Ryan P. Adams")

* [Learning Compact Convolutional Neural Networks with Nested Dropout](http://arxiv.org/abs/1412.7155 "Chelsea Finn, Lisa Anne Hendricks, Trevor Darrell")

* [Reducing Overfitting in Deep Networks by Decorrelating Representations](http://arxiv.org/abs/1511.06068 "Michael Cogswell, Faruk Ahmed, Ross Girshick, Larry Zitnick, Dhruv Batra")

* [Efficient Exact Gradient Update for training Deep Networks with Very Large Sparse Targets](http://arxiv.org/abs/1412.7091 "Pascal Vincent, Alexandre de Brébisson, Xavier Bouthillier")

* [Efficient Per-Example Gradient Computations](http://arxiv.org/abs/1510.01799 "Ian Goodfellow")

* [Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks](http://arxiv.org/abs/1506.03099 "Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer")

* [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](http://arxiv.org/abs/1502.03167 "Sergey Ioffe, Christian Szegedy")

  * [Batch Normalized Recurrent Neural Networks](http://arxiv.org/abs/1510.01378 "César Laurent, Gabriel Pereyra, Philémon Brakel, Ying Zhang, Yoshua Bengio")

* [Highway Networks](http://arxiv.org/abs/1505.00387 "Rupesh Kumar Srivastava, Klaus Greff, Jürgen Schmidhuber")

  * [Training Very Deep Networks](http://arxiv.org/abs/1507.06228 "Rupesh Kumar Srivastava, Klaus Greff, Jürgen Schmidhuber")

* [Random Walk Initialization for Training Very Deep Feedforward Networks](http://arxiv.org/abs/1412.6558 "David Sussillo, L.F. Abbott")

* [Deeply-Supervised Nets](http://arxiv.org/abs/1409.5185 "Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, Zhuowen Tu")

* [Improving neural networks by preventing co-adaptation of feature detectors](http://arxiv.org/abs/1207.0580 "Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov")

  * [Efficient batchwise dropout training using submatrices](http://arxiv.org/abs/1502.02478 "Ben Graham, Jeremy Reizenstein, Leigh Robinson") ([github.com/btgraham/Batchwise-Dropout](https://github.com/btgraham/Batchwise-Dropout))

  * [Dropout Training for Support Vector Machines](http://arxiv.org/abs/1404.4171 "Ning Chen, Jun Zhu, Jianfei Chen, Bo Zhang")

  * [Dropout as data augmentation](http://arxiv.org/abs/1506.08700 "Kishore Konda, Xavier Bouthillier, Roland Memisevic, Pascal Vincent")

  * [Partitioning Large Scale Deep Belief Networks Using Dropout](http://arxiv.org/abs/1508.07096 "Yanping Huang, Sai Zhang")

* [Maxout Networks](http://arxiv.org/abs/1302.4389 "Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio")

* [Regularization of Neural Networks using DropConnect](http://jmlr.org/proceedings/papers/v28/wan13.html "Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, Rob Fergus")

* [Distilling the Knowledge in a Neural Network](http://arxiv.org/abs/1503.02531 "Geoffrey Hinton, Oriol Vinyals, Jeff Dean")

* [Domain-Adversarial Neural Networks](http://arxiv.org/abs/1412.4446 "Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand")

* [Weight Uncertainty in Neural Networks](http://arxiv.org/abs/1505.05424 "Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, Daan Wierstra")

* [Notes on Noise Contrastive Estimation and Negative Sampling](http://arxiv.org/abs/1410.8251 "Chris Dyer")

* [Scale-invariant learning and convolutional networks](http://arxiv.org/abs/1506.08230 "Soumith Chintala, Marc'Aurelio Ranzato, Arthur Szlam, Yuandong Tian, Mark Tygert, Wojciech Zaremba")

* [Empirical Evaluation of Rectified Activations in Convolutional Network](http://arxiv.org/abs/1505.00853 "Bing Xu, Naiyan Wang, Tianqi Chen, Mu Li")

* [Deep Boosting](http://jmlr.org/proceedings/papers/v32/cortesb14.html "Corinna Cortes, Mehryar Mohri, Umar Syed") ([github.com/google/deepboost](https://github.com/google/deepboost))

* [No Regret Bound for Extreme Bandits](http://arxiv.org/abs/1508.02933 "Robert Nishihara, David Lopez-Paz, Léon Bottou")

## Parallel Training

* [Scalable Distributed DNN Training Using Commodity GPU Cloud Computing](https://drive.google.com/file/d/0B6dKRGPLFSd0UGNOYkNaSC1UZTA/view "Nikko Strom")

* [AdaDelay: Delay Adaptive Distributed Stochastic Convex Optimization](http://arxiv.org/abs/1508.05003 "Suvrit Sra, Adams Wei Yu, Mu Li, Alexander J. Smola")

* [Large Scale Distributed Deep Networks](http://research.google.com/archive/large_deep_networks_nips2012.html "Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc'Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Andrew Y. Ng")

* [HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent](http://arxiv.org/abs/1106.5730 "Feng Niu, Benjamin Recht, Christopher Ré, Stephen J. Wright")

## Weight Compression

* [Tensorizing Neural Networks](http://arxiv.org/abs/1509.06569 "Alexander Novikov, Dmitry Podoprikhin, Anton Osokin, Dmitry Vetrov") ([github.com/Bihaqo/TensorNet](https://github.com/Bihaqo/TensorNet)) ([github.com/vadimkantorov/tensornet.torch](https://github.com/vadimkantorov/tensornet.torch))

  * [Tensorizing Neural Networks presentation](http://matrix.inm.ras.ru/presentations-2015/Novikov.pdf "Alexander Novikov, Dmitry Podoprikhin, Anton Osokin, Dmitry Vetrov")

  * [Tensor-Train Decomposition](http://epubs.siam.org/doi/abs/10.1137/090752286 "I. V. Oseledets") ([pdf](http://spring.inm.ras.ru/osel/wp-content/plugins/wp-publications-archive/openfile.php?action=open&file=28)) ([github.com/oseledets/TT-Toolbox](https://github.com/oseledets/TT-Toolbox))

  * [Spectral tensor-train decomposition](http://arxiv.org/abs/1405.5713 "Daniele Bigoni, Allan P. Engsig-Karup, Youssef M. Marzouk")

* [Structured Transforms for Small-Footprint Deep Learning](http://arxiv.org/abs/1510.01722 "Vikas Sindhwani, Tara N. Sainath, Sanjiv Kumar")

* [An exploration of parameter redundancy in deep networks with circulant projections](http://arxiv.org/abs/1502.03436 "Yu Cheng, Felix X. Yu, Rogerio S. Feris, Sanjiv Kumar, Alok Choudhary, Shih-Fu Chang")

* [A Deep Neural Network Compression Pipeline: Pruning, Quantization, Huffman Encoding](http://arxiv.org/abs/1510.00149 "Song Han, Huizi Mao, William J. Dally")

* [Learning both Weights and Connections for Efficient Neural Networks](http://arxiv.org/abs/1506.02626 "Song Han, Jeff Pool, John Tran, William J. Dally")

* [Compressing Neural Networks with the Hashing Trick](http://arxiv.org/abs/1504.04788 "Wenlin Chen, James T. Wilson, Stephen Tyree, Kilian Q. Weinberger, Yixin Chen")

* [Flattened Convolutional Neural Networks for Feedforward Acceleration](http://arxiv.org/abs/1412.5474 "Jonghoon Jin, Aysegul Dundar, Eugenio Culurciello") ([github.com/jhjin/flattened-cnn](https://github.com/jhjin/flattened-cnn))

* [Predicting Parameters in Deep Learning](http://arxiv.org/abs/1306.0543 "Misha Denil, Babak Shakibi, Laurent Dinh, Marc'Aurelio Ranzato, Nando de Freitas")

## Numerical Precision

* [Neural Networks with Few Multiplications](http://arxiv.org/abs/1510.03009 "Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, Yoshua Bengio")

* [Deep Learning with Limited Numerical Precision](http://arxiv.org/abs/1502.02551 "Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, Pritish Narayanan")

* [Low precision storage for deep learning](http://arxiv.org/abs/1412.7024 "Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David")

* [1-Bit Stochastic Gradient Descent and Application to Data-Parallel Distributed Training of Speech DNNs](http://research.microsoft.com/apps/pubs/?id=230137 "Frank Seide, Hao Fu, Jasha Droppo, Gang Li, Dong Yu")

## Numerical Optimization

* [Recursive Decomposition for Nonconvex Optimization](http://homes.cs.washington.edu/~pedrod/papers/ijcai15.pdf "Abram L. Friesen, Pedro Domingos") ([github.com/afriesen/rdis](https://github.com/afriesen/rdis))

  * [Recursive Decomposition for Nonconvex Optimization: Supplementary Material](http://homes.cs.washington.edu/~afriesen/papers/ijcai2015sp.pdf "Abram L. Friesen, Pedro Domingos")

* [Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods](http://arxiv.org/abs/1506.08473 "Majid Janzamin, Hanie Sedghi, Anima Anandkumar")

* [Adapting Resilient Propagation for Deep Learning](http://arxiv.org/abs/1509.04612 "Alan Mosca, George D. Magoulas")

* [Accelerating Stochastic Gradient Descent via Online Learning to Sample](http://arxiv.org/abs/1506.09016 "Guillaume Bouchard, Théo Trouillon, Julien Perez, Adrien Gaidon")

* [Preconditioned Spectral Descent for Deep Learning](http://infoscience.epfl.ch/record/211011/files/nips_spectral.pdf "David Carlson, Edo Collins, Ya-Ping Hsieh, Lawrence Carin, Volkan Cevher")

  * [Preconditioned Spectral Descent for Deep Learning: Supplemental Material](http://infoscience.epfl.ch/record/211011/files/nips_spectral_supplement.pdf "David Carlson, Edo Collins, Ya-Ping Hsieh, Lawrence Carin, Volkan Cevher")

* [Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks](http://arxiv.org/abs/1502.05336 "José Miguel Hernández-Lobato, Ryan P. Adams")

* [Beyond Convexity: Stochastic Quasi-Convex Optimization](http://arxiv.org/abs/1507.02030 "Elad Hazan, Kfir Y. Levy, Shai Shalev-Shwartz")

* [Graphical Newton](http://arxiv.org/abs/1508.00952 "Akshay Srinivasan, Emanuel Todorov")

* [Gradient Estimation Using Stochastic Computation Graphs](http://arxiv.org/abs/1506.05254 "John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel")

* [Equilibrated adaptive learning rates for non-convex optimization](http://arxiv.org/abs/1502.04390 "Yann N. Dauphin, Harm de Vries, Yoshua Bengio")

* [Path-SGD: Path-Normalized Optimization in Deep Neural Networks](http://arxiv.org/abs/1506.02617 "Behnam Neyshabur, Ruslan Salakhutdinov, Nathan Srebro")

* [Deep learning via Hessian-free optimization](http://www.cs.toronto.edu/~jmartens/docs/Deep_HessianFree.pdf "James Martens")

* [On the importance of initialization and momentum in deep learning](http://jmlr.org/proceedings/papers/v28/sutskever13.pdf "Ilya Sutskever, James Martens, George Dahl, Geoffrey Hinton")

* [Adaptive Subgradient Methods for Online Learning and Stochastic Optimization](http://www.magicbroom.info/Papers/DuchiHaSi10.pdf "John Duchi, Elad Hazan, Yoram Singer")

* [ADADELTA: An Adaptive Learning Rate Method](http://arxiv.org/abs/1212.5701 "Matthew D. Zeiler")

* [ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient](http://arxiv.org/abs/1412.7419 "Caglar Gulcehre, Yoshua Bengio")

* [Adam: A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980 "Diederik Kingma, Jimmy Ba")

* [A sufficient and necessary condition for global optimization](http://www.sciencedirect.com/science/article/pii/S0893965909002869 "Dong-Hua Wu, Wu-Yang Yu, Quan Zheng")

  * [A Level-Value Estimation Method and Stochastic Implementation for Global Optimization](http://link.springer.com/article/10.1007/s10957-012-0151-1 "Zheng Peng, Donghua Wu, Quan Zheng")

* [Unit Tests for Stochastic Optimization](http://arxiv.org/abs/1312.6055 "Tom Schaul, Ioannis Antonoglou, David Silver")

* [A* Sampling](http://arxiv.org/abs/1411.0030 "Chris J. Maddison, Daniel Tarlow, Tom Minka")

* [Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems](http://arxiv.org/abs/1505.05114 "Yuxin Chen, Emmanuel J. Candes")

* [When Are Nonconvex Problems Not Scary?](http://arxiv.org/abs/1510.06096 "Ju Sun, Qing Qu, John Wright")

* [Automatic differentiation in machine learning: a survey](http://arxiv.org/abs/1502.05767 "Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, Jeffrey Mark Siskind")

## Motion Planning

* [Interactive Control of Diverse Complex Characters with Neural Networks](http://www.eecs.berkeley.edu/~igor.mordatch/policy/index.html "Igor Mordatch, Kendall Lowrey, Galen Andrew, Zoran Popović, Emanuel Todorov") ([video](http://www.eecs.berkeley.edu/~igor.mordatch/policy/video.mov))

* [Continuous control with deep reinforcement learning](http://arxiv.org/abs/1509.02971 "Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra")

* [Continuous Character Control with Low-Dimensional Embeddings](https://graphics.stanford.edu/projects/ccclde/ "Sergey Levine, Jack M. Wang, Alexis Haraux, Zoran Popović, Vladlen Koltun")

* [Supersizing Self-supervision: Learning to Grasp from 50K Tries and 700 Robot Hours](http://arxiv.org/abs/1509.06825 "Lerrel Pinto, Abhinav Gupta") ([youtu.be/oSqHc0nLkm8](http://youtu.be/oSqHc0nLkm8))

* [End-to-End Training of Deep Visuomotor Policies](http://arxiv.org/abs/1504.00702 "Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel") ([youtu.be/Q4bMcUk6pcw](http://youtu.be/Q4bMcUk6pcw))

* [Deep Spatial Autoencoders for Visuomotor Learning](http://rll.berkeley.edu/dsae/ "Chelsea Finn, Xin Yu Tan, Yan Duan, Trevor Darrell, Sergey Levine, Pieter Abbeel") ([youtu.be/TsPpoxKST2A](http://youtu.be/TsPpoxKST2A))

* [From Pixels to Torques: Policy Learning with Deep Dynamical Models](http://arxiv.org/abs/1502.02251 "Niklas Wahlström, Thomas B. Schön, Marc Peter Deisenroth, John-Alexander M. Assael") ([thesis](http://www.doc.ic.ac.uk/~mpd37/theses/DeepConvDynModels_JohnAssael2015.pdf)) ([github.com/iassael/torch-ddcnn](https://github.com/iassael/torch-ddcnn))

* [Sampling-based Algorithms for Optimal Motion Planning](http://arxiv.org/abs/1105.1186 "Sertac Karaman, Emilio Frazzoli") ([youtu.be/r34XWEZ41HA](http://youtu.be/r34XWEZ41HA))

  * [Informed RRT*: Optimal Sampling-based Path Planning Focused via Direct Sampling of an Admissible Ellipsoidal Heuristic](http://arxiv.org/abs/1404.2334 "Jonathan D. Gammell, Siddhartha S. Srinivasa, Timothy D. Barfoot") ([youtu.be/nsl-5MZfwu4](http://youtu.be/nsl-5MZfwu4))

  * [Batch Informed Trees (BIT*): Sampling-based Optimal Planning via the Heuristically Guided Search of Implicit Random Geometric Graphs](http://arxiv.org/abs/1405.5848 "Jonathan D. Gammell, Siddhartha S. Srinivasa, Timothy D. Barfoot") ([youtu.be/TQIoCC48gp4](http://youtu.be/TQIoCC48gp4)) ([github.com/utiasASRL/batch-informed-trees](https://github.com/utiasASRL/batch-informed-trees))

* [Planning biped locomotion using motion capture data and probabilistic roadmaps](http://dl.acm.org/citation.cfm?id=636889 "Min Gyu Choi, Jehee Lee, Sung Yong Shin") ([youtu.be/cKrcjrdnD-M](http://youtu.be/cKrcjrdnD-M))

* [Stability of Surface Contacts for Humanoid Robots: Closed-Form Formulae of the Contact Wrench Cone for Rectangular Support Areas](http://arxiv.org/abs/1501.04719 "Stéphane Caron, Quang-Cuong Pham, Yoshihiko Nakamura") ([github.com/Tastalian/surface-contacts-icra-2015](https://github.com/Tastalian/surface-contacts-icra-2015))

## Simulation

* [Data-Driven Fluid Simulations using Regression Forests](http://people.inf.ethz.ch/sjeong/physicsforests/index.html "Ľubor Ladický, SoHyeon Jeong, Barbara Solenthaler, Marc Pollefeys, Markus Gross") ([vimeo.com/144267433](https://vimeo.com/144267433)) ([vimeo.com/144266101](https://vimeo.com/144266101))

## Hardware

* [Towards Trainable Media: Using Waves for Neural Network-Style Training](http://arxiv.org/abs/1510.03776 "Michiel Hermans, Thomas Van Vaerenbergh")

* [Random Projections through multiple optical scattering: Approximating kernels at the speed of light](http://arxiv.org/abs/1510.06664 "Alaa Saade, Francesco Caltagirone, Igor Carron, Laurent Daudet, Angélique Drémeau, Sylvain Gigan, Florent Krzakala")

* [VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing](http://arxiv.org/abs/1509.08972 "Arash Ardakani, François Leduc-Primeau, Naoya Onizawa, Takahiro Hanyu, Warren J. Gross")

* [Training and operation of an integrated neuromorphic network based on metal-oxide memristors](http://www.nature.com/nature/journal/v521/n7550/full/nature14441.html "M. Prezioso, F. Merrikh-Bayat, B. D. Hoskins, G. C. Adam, K. K. Likharev, D. B. Strukov")

* [AHaH Computing–From Metastable Switches to Attractors to Machine Learning](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0085175 "Michael Alexander Nugent, Timothy Wesley Molter")

* [Finding a roadmap to achieve large neuromorphic hardware systems](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3767911/ "Jennifer Hasler, Bo Marr")

## Cognitive Architectures

* [A Large-Scale Model of the Functioning Brain](http://www.sciencemag.org/content/338/6111/1202.full?ijkey=y5vph.jw5AgRQ&keytype=ref&siteid=sci "Chris Eliasmith, Terrence C. Stewart, Xuan Choo, Trevor Bekolay, Travis DeWolf, Yichuan Tang, Daniel Rasmussen")

  * [How to Build a Brain: A Neural Architecture for Biological Cognition](http://www.amazon.com/How-Build-Brain-Architecture-Architectures/dp/0199794545 "Chris Eliasmith")

* [Derivation of a novel efficient supervised learning algorithm from cortical-subcortical loops](http://journal.frontiersin.org/article/10.3389/fncom.2011.00050/full "Ashok Chandrashekar, Richard Granger")

* [A Minimal Architecture for General Cognition](http://arxiv.org/abs/1508.00019 "Michael S. Gashler, Zachariah Kindle, Michael R. Smith") ([github.com/mikegashler/manic](https://github.com/mikegashler/manic))

## Computational Creativity

* [Inceptionism: Going Deeper into Neural Networks](http://googleresearch.blogspot.com/2015/06/inceptionism-going-deeper-into-neural.html "Alexander Mordvintsev, Christopher Olah, Mike Tyka")

  * [DeepDream - a code example for visualizing Neural Networks](http://googleresearch.blogspot.com/2015/07/deepdream-code-example-for-visualizing.html "Alexander Mordvintsev, Christopher Olah, Mike Tyka") ([github.com/google/deepdream](https://github.com/google/deepdream))

* [A Neural Algorithm of Artistic Style](http://arxiv.org/abs/1508.06576 "Leon A. Gatys, Alexander S. Ecker, Matthias Bethge")

* [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/ "Andrej Karpathy") ([github.com/karpathy/char-rnn](https://github.com/karpathy/char-rnn))

* [GRUV: Algorithmic Music Generation using Recurrent Neural Networks](http://cs224d.stanford.edu/reports/NayebiAran.pdf "Aran Nayebi, Matt Vitelli") ([github.com/MattVitelli/GRUV](https://github.com/MattVitelli/GRUV))

* [Composing Music With Recurrent Neural Networks](http://www.hexahedria.com/2015/08/03/composing-music-with-recurrent-neural-networks/ "Daniel Johnson") ([github.com/hexahedria/biaxial-rnn-music-composition](https://github.com/hexahedria/biaxial-rnn-music-composition))

## Cryptography

* [Crypto-Nets: Neural Networks over Encrypted Data](http://arxiv.org/abs/1412.6181 "Pengtao Xie, Misha Bilenko, Tom Finley, Ran Gilad-Bachrach, Kristin Lauter, Michael Naehrig")

## Distributed Computing

* [Dimension Independent Similarity Computation](http://arxiv.org/abs/1206.2082 "Reza Bosagh Zadeh, Ashish Goel")

  * [Dimension Independent Matrix Square using MapReduce](http://arxiv.org/abs/1304.1467 "Reza Bosagh Zadeh, Gunnar Carlsson")

  * [All-pairs similarity via DIMSUM](https://blog.twitter.com/2014/all-pairs-similarity-via-dimsum "Reza Zadeh")

* [A Fast, Minimal Memory, Consistent Hash Algorithm](http://arxiv.org/abs/1406.2294 "John Lamping, Eric Veach")

## Clustering

* [Convolutional Clustering for Unsupervised Learning](http://arxiv.org/abs/1511.06241 "Aysegul Dundar, Jonghoon Jin, Eugenio Culurciello")

* [Deep clustering: Discriminative embeddings for segmentation and separation](http://arxiv.org/abs/1508.04306 "John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe")

* [Clustering is Easy When ....What?](http://arxiv.org/abs/1510.05336 "Shai Ben-David")

* [Clustering by fast search and find of density peaks](https://dl.dropboxusercontent.com/u/182368464/2014-rodriguez.pdf "Alex Rodriguez, Alessandro Laio")