{"id":102820,"url":"https://github.com/mlpapers/neural-nets","name":"neural-nets","description":"Awesome papers on Neural Networks and Deep Learning","projects_count":77,"last_synced_at":"2026-06-18T18:00:30.122Z","repository":{"id":301067498,"uuid":"253095429","full_name":"mlpapers/neural-nets","owner":"mlpapers","description":"Awesome papers on Neural Networks and Deep Learning","archived":false,"fork":false,"pushed_at":"2026-02-14T21:38:08.000Z","size":16,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-06-02T02:03:08.096Z","etag":null,"topics":["activation-functions","autoencoders","awesome","awesome-list","bayesian-neural-networks","bnn","deep-learning","gan","lstm","neural-networks","perceptron"],"latest_commit_sha":null,"homepage":"https://mlpapers.org/neural-nets/","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mlpapers.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2020-04-04T20:45:19.000Z","updated_at":"2026-02-14T21:38:11.000Z","dependencies_parsed_at":"2026-03-02T12:01:06.853Z","dependency_job_id":null,"html_url":"https://github.com/mlpapers/neural-nets","commit_stats":null,"previous_names":["mlpapers/neural-nets"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mlpapers/neural-nets","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlpapers%2Fneural-nets","tags_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlpapers%2Fneural-nets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlpapers%2Fneural-nets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlpapers%2Fneural-nets/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mlpapers","download_url":"https://codeload.github.com/mlpapers/neural-nets/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlpapers%2Fneural-nets/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34501482,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-18T02:00:06.871Z","response_time":128,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"created_at":"2026-01-02T00:00:41.082Z","updated_at":"2026-06-18T18:00:30.123Z","primary_language":null,"list_of_lists":false,"displayable":true,"categories":["Uncategorized","Related Topics"],"sub_categories":["Uncategorized"],"readme":"# Neural networks, deep learning papers\n\n- **Overview**\n  - [Deep Learning](http://www.deeplearningbook.org/) (2016) *Ian Goodfellow, Yoshua Bengio, Aaron Courville*\n  - [Deep Learning in Neural Networks: An Overview](https://arxiv.org/pdf/1404.7828.pdf) (2014) *Jurgen Schmidhuber*\n\n### Feedforward Neural Networks (FNN)\n  - [The perceptron: a 
probabilistic model for information storage and organization in the brain](https://www.ling.upenn.edu/courses/cogs501/Rosenblatt1958.pdf) (1958) *F. Rosenblatt*\n  - [Multilayer Feedforward Networks are Universal Approximators](http://cognitivemedium.com/magic_paper/assets/Hornik.pdf) (1989) *K. Hornik*\n  - [Deep Big Simple Neural Nets Excel on Hand-written Digit Recognition](https://arxiv.org/pdf/1003.0358.pdf) (2010) *Dan Claudiu Cireşan, Ueli Meier, Luca Maria Gambardella, Jürgen Schmidhuber*\n- **GMDH** Group method of data handling ([Website](http://gmdh.net/), [Wiki](https://en.wikipedia.org/wiki/Group_method_of_data_handling))\n  - [Polynomial Theory of Complex Systems](http://www.gmdh.net/articles/history/polynomial.pdf) (1971) *Ivakhnenko A.G.*\n  - [The Review of Problems Solvable by Algorithms of the Group Method of Data Handling](http://gmdh.net/articles/review/algorith.pdf) (1995) *Ivakhnenko A.G., Ivakhnenko G.A.*\n- **Binarized Neural Networks**\n  - [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](https://arxiv.org/abs/1602.02830) (2016) *Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio*\n  - [How to Train a Compact Binary Neural Network with High Accuracy?](https://www.ganghua.org/publication/AAAI17.pdf) (2017) *Wei Tang, Gang Hua, Liang Wang*\n\n### Convolutional Neural Networks (CNN)\n  - One of the papers on convolutional nets - [Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position](https://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf) (1980) *K. 
Fukushima*\n  - [A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects](https://arxiv.org/pdf/2004.02806.pdf) (2020) *Zewen Li, Wenjie Yang, Shouheng Peng, Fan Liu*\n  - [Flexible, High Performance Convolutional Neural Networks for Image Classification](http://people.idsia.ch/~juergen/ijcai2011.pdf) (2011) *Dan C. Ciresan, Ueli Meier, Jonathan Masci, Luca M. Gambardella, Jurgen Schmidhuber*\n\n### Recurrent Neural Networks (RNN)\n- **Boltzmann machines**\n  - [Learning and relearning in Boltzmann machines](https://www.researchgate.net/publication/242509302_Learning_and_relearning_in_Boltzmann_machines) (1986) *G. E. Hinton, T. J. Sejnowski*\n- **LSTM**\n  - [Long Short-term Memory](https://www.researchgate.net/publication/13853244_Long_Short-term_Memory) (1997) *S. Hochreiter, J. Schmidhuber*\n  - [Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures](https://www.cs.toronto.edu/~graves/nn_2005.pdf) (2005) *Alex Graves, Jurgen Schmidhuber*\n\n### Unsupervised\n- **Competitive learning**\n  - [Feature Discovery by Competitive Learning](http://csjarchive.cogsci.rpi.edu/1985v09/i01/p0075p0112/MAIN.PDF) (1985) *David E. Rumelhart*\n- **Autoencoders**\n  - [Modular learning in neural networks](https://www.aaai.org/Papers/AAAI/1987/AAAI87-050.pdf) (1987) *D.H. Ballard*\n  - [Extracting and composing robust features with denoising autoencoders](https://www.iro.umontreal.ca/~vincentp/Publications/denoising_autoencoders_tr1316.pdf) (2008) *P. Vincent, H. Larochelle, Y. Bengio, P.A. Manzagol*\n  - From Deep Learning book - [Autoencoders (ch. 14)](http://www.deeplearningbook.org/contents/autoencoders.html) (2016) *Ian Goodfellow, Yoshua Bengio, Aaron Courville*\n  - [An Introduction to Variational Autoencoders](https://arxiv.org/pdf/1906.02691.pdf) (2019) *Diederik P. 
Kingma, Max Welling*\n  - [Contractive Auto-Encoders: Explicit Invariance During Feature Extraction](https://www.iro.umontreal.ca/~lisa/pointeurs/ICML2011_explicit_invariance.pdf) (2011) *S. Rifai, P. Vincent, X. Muller, X. Glorot, Y. Bengio*\n  - [Deep AutoRegressive Networks](https://arxiv.org/pdf/1310.8499.pdf) (2014) *Karol Gregor, Ivo Danihelka, Andriy Mnih, Charles Blundell, Daan Wierstra*\n- **Denoising Autoencoders**\n- **VAE** Variational autoencoders\n  - [Auto-Encoding Variational Bayes](https://arxiv.org/pdf/1312.6114.pdf) (2014) *Diederik P Kingma, Max Welling*\n  - [Tutorial on Variational Autoencoders](https://arxiv.org/pdf/1606.05908.pdf) (2016) *Carl Doersch*\n  - [Variational Autoencoder for Deep Learning of Images, Labels and Captions](https://papers.nips.cc/paper/6528-variational-autoencoder-for-deep-learning-of-images-labels-and-captions.pdf) (2016) *Yunchen Pu, Zhe Gan, Ricardo Henao, Xin Yuan, Chunyuan Li, Andrew Stevens, Lawrence Carin*\n- **SOM** Self-organizing maps\n- **Cresceptron** (Max-Pooling layers)\n  - [Cresceptron: A Self-organizing Neural Network Which Grows Adaptively](http://www.cse.msu.edu/~weng/research/CresceptronIJCNN1992.pdf) (1992) *John (Juyang) Weng, Narendra Ahuja, Thomas S. Huang*\n\n### Generative Adversarial Networks (GAN)\n- [Generative Adversarial Networks](https://arxiv.org/pdf/1406.2661v1.pdf) (2014) *Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio*\n- [Time-series Generative Adversarial Networks](https://papers.nips.cc/paper/8789-time-series-generative-adversarial-networks.pdf) (2019) *J. Yoon, D. Jarrett, M. van der Schaar*\n- **Conditional GAN**\n  - [Probabilistic Forecasting of Sensory Data with Generative Adversarial Networks](https://arxiv.org/abs/1903.12549) (2019) *A. Koochali, P. Schichtel, S. Ahmed, A. 
Dengel*\n\n### Bayesian Neural Networks (BNN)\n- [A Practical Bayesian Framework for Backpropagation Networks](https://authors.library.caltech.edu/13793/1/MACnc92b.pdf) (1992) *David J. C. MacKay*\n- [Bayesian Learning for Neural Networks](http://www.csri.utoronto.ca/~radford/ftp/thesis.pdf) (1995) *R.M. Neal*\n- [Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks](https://pdfs.semanticscholar.org/3ce9/da2d2182a2fbc4b460bdb56d3c34110b3e39.pdf) (1995) *David J. C. MacKay*\n- [Practical Variational Inference for Neural Networks](https://www.cs.toronto.edu/~graves/nips_2011.pdf) (2011) *Alex Graves*\n- [Weight Uncertainty in Neural Networks](https://arxiv.org/pdf/1505.05424.pdf) (2015) *Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, Daan Wierstra*\n- [Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning](https://arxiv.org/pdf/1506.02142.pdf) (2016) *Y. Gal, Z. Ghahramani*\n- [Stochastic Gradient Descent as Approximate Bayesian Inference](http://www.cs.columbia.edu/~blei/papers/MandtHoffmanBlei2017.pdf) (2017) *S. Mandt, M.D. Hoffman, D.M. Blei*\n- [Deep neural networks as Gaussian Processes](https://arxiv.org/pdf/1711.00165.pdf) (2018) *Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel S. 
Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein*\n- [Noisy Natural Gradient as Variational Inference](https://arxiv.org/pdf/1712.02390.pdf) (2018) *Guodong Zhang, Shengyang Sun, David Duvenaud, Roger Grosse*\n- [Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam](https://arxiv.org/abs/1806.04854) (2018) *Mohammad Emtiyaz Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Yarin Gal, Akash Srivastava*\n- [Understanding Priors in Bayesian Neural Networks at the Unit Level](https://arxiv.org/pdf/1810.05193.pdf) (2019) *Mariia Vladimirova, Jakob Verbeek, Pablo Mesejo, Julyan Arbel*\n- [Bayesian Deep Learning and a Probabilistic Perspective of Generalization](https://arxiv.org/pdf/2002.08791.pdf) (2020) *Andrew Gordon Wilson, Pavel Izmailov*\n\n### Weightless Neural Networks (WNN)\n- Based on Random Access Memory (RAM) nodes\n- [Advances in Weightless Neural Systems](https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2014-7.pdf) (2014) *F.M.G. França, M. De Gregorio, P.M.V. Lima, W.R. de Oliveira*\n- **WiSARD** \n- **PLN** Probabilistic Logic Nodes\n- **GSN** Goal Seeking Neurons\n- **GRAM** \n\n### Activation functions\n- **Sigmoid**\n- **HardSigmoid**\n- **SiLU, dSiLU**\n- **Tanh, HardTanh**\n- **Softmax**\n- **Softplus**\n- **Softsign**\n- **ReLU** Rectified Linear Unit\n  - [Rectified Linear Units Improve Restricted Boltzmann Machines](https://www.cs.toronto.edu/~hinton/absps/reluICML.pdf) (2010) *V. Nair, G.E. Hinton*\n  - [Deep Sparse Rectifier Neural Networks](http://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf) (2011) *X. Glorot, A. Bordes, Y. Bengio*\n- **LReLU** Leaky ReLU\n  - [Rectifier Nonlinearities Improve Neural Network Acoustic Models](https://ai.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf) (2013) *A.L. Maas, A.Y. Hannun, A.Y. 
Ng*\n- **PReLU** Parametric ReLU\n  - [Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification](https://arxiv.org/pdf/1502.01852.pdf) (2015) *Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun*\n- **RReLU** Randomized ReLU\n  - [Empirical Evaluation of Rectified Activations in Convolutional Network](https://arxiv.org/pdf/1505.00853.pdf) (2015) *Bing Xu, Naiyan Wang, Tianqi Chen, Mu Li*\n- **SReLU** \n- **ELU**\n  - [Fast and Accurate Deep Network Learning by Exponential Linear Units](https://arxiv.org/pdf/1511.07289.pdf) (2015) *Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter*\n- **PELU**\n  - [Parametric Exponential Linear Unit for Deep Convolutional Neural Network](https://arxiv.org/pdf/1605.09332v1.pdf) (2016) *L. Trottier, P. Giguère, B. Chaib-draa*\n- **SELU**\n- **Maxout**\n- **Mish**\n  - [Mish: A Self Regularized Non-Monotonic Neural Activation Function](https://arxiv.org/pdf/1908.08681.pdf) (2019) *Diganta Misra*\n- **Swish**\n- **ELiSH**\n- **HardELiSH**\n\n### Inference\n- **Weight guessing**\n- **Vanishing gradient problem** ([Wiki](https://en.wikipedia.org/wiki/Vanishing_gradient_problem))\n- **Double descent**\n  - [Deep Double Descent: Where Bigger Models and More Data Hurt](https://arxiv.org/pdf/1912.02292) (2019) *Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever*\n- **BP** Back-propagation\n  - [Learning representations by back-propagating errors](http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf) (1986) *David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams*\n  - [Backpropagation Applied to Handwritten Zip Code Recognition](http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf) (1989) *Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel*\n- **Pruning** - reduces computational cost, improves generalization\n  - [Optimal Brain Damage](http://yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf) (1990) *Yann Le Cun, John S. Denker, Sara A. 
Solla*\n  - [Learning both Weights and Connections for Efficient Neural Networks](https://arxiv.org/pdf/1506.02626.pdf) (2015) *Song Han, Jeff Pool, John Tran, William J. Dally*\n  - [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/pdf/1611.06440.pdf) (2017) *Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, Jan Kautz*\n  - [Learning Sparse Neural Networks through L0 Regularization](https://arxiv.org/pdf/1712.01312.pdf) (2018) *Christos Louizos, Max Welling, Diederik P. Kingma*\n- **Pretraining**\n  - [Why Does Unsupervised Pre-training Help Deep Learning?](http://jmlr.csail.mit.edu/papers/volume11/erhan10a/erhan10a.pdf) (2010) *D. Erhan, Y. Bengio, A. Courville, P.A. Manzagol, P. Vincent, S. Bengio*\n- **Dropout**\n  - [Improving neural networks by preventing co-adaptation of feature detectors](https://arxiv.org/pdf/1207.0580.pdf) (2012) *G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R. R. Salakhutdinov*\n  - [Adaptive dropout for training deep neural networks](https://papers.nips.cc/paper/5032-adaptive-dropout-for-training-deep-neural-networks.pdf) (2013) *L.J. Ba, B. Frey*\n  - [The Dropout Learning Algorithm](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3996711/pdf/nihms-570835.pdf) (2014) *P. Baldi, P. Sadowski*\n  - [Fast dropout training](https://nlp.stanford.edu/pubs/sidaw13fast.pdf) (2013) *S.I. Wang, C.D. 
Manning*\n\n### Compression\n- **Knowledge Distillation**\n  - Large neural networks (teacher networks) transfer knowledge to smaller networks (student networks)\n- **Neural Network Pruning**\n  - Removing unimportant weights\n- **Quantization**\n  - Reducing the number of bits used to store the weights\n- **Software**\n  - [KD-Lib: A PyTorch library for Knowledge Distillation, Pruning and Quantization](https://arxiv.org/pdf/2011.14691.pdf) (2020) *Het Shah, Avishree Khare, Neelay Shah, Khizir Siddiqui*\n\n### Ensembles\n- [Neural Network Ensembles](http://machine-learning.martinsewell.com/ensembles/HansenSalamon1990.pdf) (1990) *L. K. Hansen, P. Salamon*\n- [When Networks Disagree: Ensemble Methods for Hybrid Neural Networks](https://www.researchgate.net/publication/2438296_When_Networks_Disagree_Ensemble_Methods_for_Hybrid_Neural_Networks) (1993) *M.P. Perrone, L.N. Cooper*\n- [Neural Network Ensembles, Cross Validation, and Active Learning](https://papers.nips.cc/paper/1001-neural-network-ensembles-cross-validation-and-active-learning.pdf) (1995) *A. Krogh, J. Vedelsby*\n- [When Ensembling Smaller Models is More Efficient than Single Large Models](https://arxiv.org/pdf/2005.00570.pdf) (2020) *D. Kondratyuk, M. Tan, M. Brown, B. Gong*\n\n\n## Related Topics\n- [Optimization](https://mlpapers.org/optimization/)\n- [Ensemble Learning](https://mlpapers.org/ensemble-learning/)\n- [Bayesian Inference](https://mlpapers.org/bayesian-inference/)\n- [Reinforcement Learning](https://mlpapers.org/reinforcement-learning/)\n- [Interpretability](https://mlpapers.org/interpretability/)\n- [NLP](https://mlpapers.org/nlp/)\n- [Computer Vision](https://mlpapers.org/computer-vision/)\n- [Generative Models](https://mlpapers.org/generative-models/)\n- [Graph Neural Networks](https://mlpapers.org/graph-neural-networks/)\n","projects_url":"https://awesome.ecosyste.ms/api/v1/lists/mlpapers%2Fneural-nets/projects"}