Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
List: awesome-nn-optimization
Awesome list for Neural Network Optimization methods.
- Host: GitHub
- URL: https://github.com/harsh306/awesome-nn-optimization
- Owner: harsh306
- License: cc-by-4.0
- Created: 2019-08-04T04:28:29.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-03-10T23:23:45.000Z (9 months ago)
- Last Synced: 2024-05-21T08:34:01.248Z (7 months ago)
- Topics: awesome, awesome-list, bifurcation, continuation, convergence-analysis, convex-optimization, curriculum-learning, deep-learning, dynamical-systems, generalization, local-minima, loss-surface, neural-network, non-convex-optimization, optimization
- Homepage:
- Size: 194 KB
- Stars: 71
- Watchers: 3
- Forks: 9
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project
- awesome-of-awesome-ml - awesome-nn-optimization (by harsh306)
- ultimate-awesome - awesome-nn-optimization - Awesome list for Neural Network Optimization methods. (Other Lists / PowerShell Lists)
README
## Content
#### Popular Optimization algorithms
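For quick reference before the paper list, a minimal NumPy sketch of the plain SGD, heavy-ball momentum, and Adam update rules (hyperparameter defaults here are illustrative, not taken from any single reference):

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """Plain SGD: w <- w - lr * grad."""
    return w - lr * grad

def momentum_step(w, grad, velocity, lr=0.1, beta=0.9):
    """Heavy-ball momentum: accumulate an exponentially weighted velocity."""
    velocity = beta * velocity + grad
    return w - lr * velocity, velocity

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam (Kingma & Ba, 2014): bias-corrected first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias correction for the mean
    v_hat = v / (1 - beta2 ** t)   # bias correction for the variance
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy usage: minimize f(w) = ||w||^2 with Adam.
w = np.array([3.0, -2.0])
m = v = np.zeros_like(w)
for t in range(1, 301):
    grad = 2 * w                   # gradient of ||w||^2
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)
print(w)                           # approaches the minimizer at the origin
```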
- SGD [[Book]](https://www.deeplearningbook.org/contents/optimization.html)
- Momentum [[Book]](https://www.deeplearningbook.org/contents/optimization.html)
- RMSProp [[Book]](https://www.deeplearningbook.org/contents/optimization.html)
- AdaGrad [[Link]](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
- ADAM [[Link]](https://arxiv.org/abs/1412.6980)
- AdaBound [[Link]](https://arxiv.org/abs/1902.09843) [[Github]](https://github.com/Luolc/AdaBound)
- ADAMAX [[Link]](https://arxiv.org/abs/1412.6980)
- NADAM [[Link]](https://openreview.net/pdf?id=OM0jvwB8jIp57ZJjtNEZ)
- ADAMW [[Link]](https://openreview.net/forum?id=rk6qdGgCZ)
- AdaLOMO [Link](https://arxiv.org/pdf/2310.10195.pdf)
- All optimizers list [Awesome-Optimizer](https://github.com/zoq/Awesome-Optimizer)

#### Normalization Methods
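Before the normalization papers, a minimal training-mode batch-normalization forward pass in NumPy (running statistics and the backward pass are omitted; gamma and beta are the usual learned scale and shift):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """BatchNorm (Ioffe & Szegedy, 2015), training-mode forward pass.

    x: (batch, features); gamma, beta: (features,) learned scale and shift.
    """
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # normalize to ~zero mean, unit variance
    return gamma * x_hat + beta               # learned scale/shift restores expressiveness

x = np.random.randn(32, 8) * 5.0 + 3.0        # badly scaled activations
y = batch_norm_forward(x, gamma=np.ones(8), beta=np.zeros(8))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))   # ~0 and ~1 per feature
```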
- BatchNorm [[Link]](https://arxiv.org/abs/1502.03167)
- Weight Norm [[Link]](http://papers.nips.cc/paper/6113-weight-normalization-a-simple-reparameterization-to-accelerate-training-of-deep-neural-networks)
- Spectral Norm [[Link]](https://arxiv.org/abs/1802.05957)
- Cosine Normalization [[Link]](https://arxiv.org/pdf/1702.05870.pdf)
- L2 Regularization versus Batch and Weight Normalization [Link](https://arxiv.org/pdf/1706.05350.pdf)
- WHY GRADIENT CLIPPING ACCELERATES TRAINING: A THEORETICAL JUSTIFICATION FOR ADAPTIVITY [Link](https://openreview.net/pdf?id=BJgnXpVYwS)

#### On Convexity and Generalization of Neural Networks
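As a concrete companion to the Input Convex Neural Networks entry below, a toy fully input-convex network in NumPy: convexity in the input follows from non-negative weights on the hidden-state path and convex, non-decreasing activations. The layer sizes and the abs-reparameterization are illustrative choices, not the paper's exact construction:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def icnn_forward(x, Wz_list, Wx_list, b_list):
    """Toy input-convex network (in the spirit of Amos et al., 2017).

    Convexity in x holds because each Wz is constrained non-negative and
    ReLU is convex and non-decreasing; the Wx layers may be arbitrary.
    """
    z = relu(Wx_list[0] @ x + b_list[0])
    for Wz, Wx, b in zip(Wz_list, Wx_list[1:], b_list[1:]):
        z = relu(np.abs(Wz) @ z + Wx @ x + b)   # np.abs enforces non-negativity
    return z

# Random small ICNN: scalar output, 2-D input, one hidden layer of width 4.
rng = np.random.default_rng(0)
Wx_list = [rng.normal(size=(4, 2)), rng.normal(size=(1, 2))]
Wz_list = [rng.normal(size=(1, 4))]
b_list = [rng.normal(size=4), rng.normal(size=1)]

# Spot-check convexity along a segment: f(midpoint) <= average of endpoint values.
a, b = rng.normal(size=2), rng.normal(size=2)
f = lambda x: icnn_forward(x, Wz_list, Wx_list, b_list)[0]
print(f((a + b) / 2) <= 0.5 * (f(a) + f(b)) + 1e-9)   # True for a convex f
```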
- Convex Neural Networks [[Link]](http://papers.nips.cc/paper/2800-convex-neural-networks.pdf)
- Breaking the Curse of Dimensionality with Convex Neural Networks [[Link]](http://jmlr.org/papers/volume18/14-546/14-546.pdf)
- UNDERSTANDING DEEP LEARNING REQUIRES RETHINKING GENERALIZATION [[Link]](https://arxiv.org/pdf/1611.03530.pdf)
- Optimal Control Via Neural Networks: A Convex Approach. [[Link]](https://openreview.net/forum?id=H1MW72AcK7)
- Input Convex Neural Networks [[Link]](https://arxiv.org/pdf/1609.07152.pdf)
- A New Concept of Convex based Multiple Neural Networks Structure. [[Link]](http://www.ifaamas.org/Proceedings/aamas2019/pdfs/p1306.pdf)
- SGD Converges to Global Minimum in Deep Learning via Star-convex Path [[Link]](https://arxiv.org/abs/1901.00451)
- A Convergence Theory for Deep Learning via Over-Parameterization [Link](https://arxiv.org/abs/1811.03962)

#### Continuation Methods and Curriculum Learning
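To make the basic recipe behind many of the curriculum-learning papers below concrete, here is a minimal "easy examples first" pacing sketch; the difficulty score and the linear pacing schedule are illustrative assumptions, not taken from any single paper:

```python
import numpy as np

def curriculum_batches(X, y, difficulty, n_stages=4, seed=0):
    """Toy curriculum: sort examples by a difficulty score and grow the
    training pool stage by stage (easy examples first, hard ones added later)."""
    rng = np.random.default_rng(seed)
    order = np.argsort(difficulty)                  # easiest -> hardest
    for stage in range(1, n_stages + 1):
        cutoff = int(len(X) * stage / n_stages)     # linear pacing schedule
        pool = order[:cutoff].copy()
        rng.shuffle(pool)
        yield stage, X[pool], y[pool]

# Illustration: difficulty = (negated) distance to a decision boundary.
X = np.random.default_rng(1).normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
difficulty = -np.abs(X[:, 0] + X[:, 1])            # near-boundary points are "hard"
for stage, Xs, ys in curriculum_batches(X, y, difficulty):
    print(f"stage {stage}: training on the {len(Xs)} easiest examples")
```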
- Curriculum Learning [[Link]](https://ronan.collobert.com/pub/matos/2009_curriculum_icml.pdf)
- SOLVING RUBIK’S CUBE WITH A ROBOT HAND [Link](https://arxiv.org/pdf/1910.07113.pdf)
- Noisy Activation Function [[Link]](http://proceedings.mlr.press/v48/gulcehre16.pdf)
- Mollifying Networks [[Link]](https://arxiv.org/abs/1608.04980)
- Curriculum Learning by Transfer Learning: Theory and Experiments with Deep Networks [Link](https://arxiv.org/pdf/1802.03796.pdf) [Talk](https://vimeo.com/287808087)
- Automated Curriculum Learning for Neural Networks [Link](http://proceedings.mlr.press/v70/graves17a/graves17a.pdf)
- On The Power of Curriculum Learning in Training Deep Networks [Link](https://arxiv.org/pdf/1904.03626.pdf)
- On-line Adaptative Curriculum Learning for GANs [Link](https://arxiv.org/abs/1808.00020)
- Parameter Continuation with Secant Approximation for Deep Neural Networks and Step-up GAN [Link](https://digitalcommons.wpi.edu/etd-theses/1256/)
- HashNet: Deep Learning to Hash by Continuation. [[Link]](https://arxiv.org/abs/1702.00758)
- Learning Combinations of Activation Functions. [[Link]](https://arxiv.org/pdf/1801.09403.pdf)
- Learning and development in neural networks: The importance of starting small (1993) [Link](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.128.4487&rep=rep1&type=pdf)
- Flexible shaping: How learning in small steps helps [Link](https://www.sciencedirect.com/science/article/pii/S0010027708002850)
- Curriculum Labeling: Self-paced Pseudo-Labeling for Semi-Supervised Learning [Link](https://arxiv.org/pdf/2001.06001.pdf)
- RETHINKING CURRICULUM LEARNING WITH INCREMENTAL LABELS AND ADAPTIVE COMPENSATION [Link](https://arxiv.org/pdf/2001.04529.pdf)
- Parameter Continuation Methods for the Optimization of Deep Neural Networks [Link](https://ieeexplore.ieee.org/abstract/document/8999318)
- Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection [Link](https://www.aclweb.org/anthology/W18-6314.pdf)
- Reinforcement Learning based Curriculum Optimization for Neural Machine Translation [Link](https://www.aclweb.org/anthology/N19-1208.pdf)
- EVOLUTIONARY POPULATION CURRICULUM FOR SCALING MULTI-AGENT REINFORCEMENT LEARNING [Link](https://openreview.net/pdf?id=SJxbHkrKDH)
- ENTROPY-SGD: BIASING GRADIENT DESCENT INTO WIDE VALLEYS [Link](https://arxiv.org/pdf/1611.01838.pdf)
- NEIGHBOURHOOD DISTILLATION: ON THE BENEFITS OF NON END-TO-END DISTILLATION [Link](https://arxiv.org/abs/2010.01189)
- LEARNING TO EXECUTE [Link](https://arxiv.org/pdf/1410.4615.pdf)
- Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing [Link](https://arxiv.org/pdf/1903.10145.pdf)
- Data Parameters: A New Family of Parameters for Learning a Differentiable Curriculum [Link](https://proceedings.neurips.cc/paper/2019/file/926ffc0ca56636b9e73c565cf994ea5a-Paper.pdf)
- Breaking the Curse of Space Explosion: Towards Efficient NAS with Curriculum Search [Link](http://proceedings.mlr.press/v119/guo20b.html)
- Continuation Methods and Curriculum Learning for Learning to Rank [Link](http://www.dei.unipd.it/~ferro/papers/2018/CIKM2018_FLMP.pdf)

#### On Loss Surfaces and Generalization of Deep Neural Networks
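A tiny sketch of the 1-D loss-slice idea used in visualization work such as "Visualizing the Loss Landscape of Neural Nets" below, simplified to a single random direction with no filter normalization, evaluated on a toy quadratic:

```python
import numpy as np

def loss_slice(loss_fn, w, direction, radius=1.0, n_points=21):
    """Evaluate the loss along the 1-D slice w + alpha * direction
    (a simplified version of the visualizations in Li et al., 2018)."""
    direction = direction / np.linalg.norm(direction)   # unit direction
    alphas = np.linspace(-radius, radius, n_points)
    return alphas, np.array([loss_fn(w + a * direction) for a in alphas])

# Toy quadratic "landscape" with an ill-conditioned Hessian.
H = np.diag([1.0, 100.0])
loss_fn = lambda w: 0.5 * w @ H @ w
w_star = np.zeros(2)                                     # the minimizer
rng = np.random.default_rng(0)
alphas, losses = loss_slice(loss_fn, w_star, rng.normal(size=2))
for a, l in zip(alphas[::5], losses[::5]):
    print(f"alpha={a:+.2f}  loss={l:.3f}")               # sharp vs. flat depends on the direction
```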
- Exact solutions to the nonlinear dynamics of learning in deep linear neural networks [Link](https://arxiv.org/abs/1312.6120)
- QUALITATIVELY CHARACTERIZING NEURAL NETWORK OPTIMIZATION PROBLEMS [[Link]](https://arxiv.org/pdf/1412.6544.pdf)
- The Loss Surfaces of Multilayer Networks [[Link]](https://arxiv.org/abs/1412.0233)
- Visualizing the Loss Landscape of Neural Nets [[Link]](https://papers.nips.cc/paper/7875-visualizing-the-loss-landscape-of-neural-nets.pdf)
- The Loss Surface Of Deep Linear Networks Viewed Through The Algebraic Geometry Lens [[Link]](https://arxiv.org/pdf/1810.07716.pdf)
- How regularization affects the critical points in linear networks. [[Link]](http://papers.nips.cc/paper/6844-how-regularization-affects-the-critical-points-in-linear-networks.pdf)
- Local minima in training of neural networks [[Link]](https://arxiv.org/abs/1611.06310)
- Necessary and Sufficient Geometries for Gradient Methods [Link](http://papers.nips.cc/paper/9325-necessary-and-sufficient-geometries-for-gradient-methods)
- Fine-grained Optimization of Deep Neural Networks [Link](http://papers.nips.cc/paper/8425-fine-grained-optimization-of-deep-neural-networks)
- SCORE-BASED GENERATIVE MODELING THROUGH STOCHASTIC DIFFERENTIAL EQUATIONS [Link](https://openreview.net/pdf?id=PxTIG12RRHS)

#### Dynamics, Bifurcations, and the Difficulty of Training RNNs
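A small numerical illustration of why recurrent networks are hard to train (cf. "On the difficulty of training recurrent neural networks" below): for a linear recurrence, the backpropagated Jacobian is a matrix power, so its norm vanishes or explodes with the spectral radius of the recurrent weights. The 50-step horizon and the scales here are arbitrary illustrative choices:

```python
import numpy as np

def gradient_norm_through_time(W, T=50):
    """Norm of the product of T recurrent Jacobians for a linear RNN.

    For h_t = W h_{t-1}, d h_T / d h_0 = W^T; its norm vanishes or explodes
    depending on whether the spectral radius of W is below or above 1
    (cf. Pascanu et al., 2013).
    """
    J = np.linalg.matrix_power(W, T)
    return np.linalg.norm(J, 2)

rng = np.random.default_rng(0)
base = rng.normal(size=(32, 32)) / np.sqrt(32)          # roughly unit-scale random matrix
for scale in (0.8, 1.0, 1.2):
    W = scale * base
    rho = max(abs(np.linalg.eigvals(W)))                # spectral radius
    print(f"scale={scale}: spectral radius={rho:.2f}, "
          f"|dh_T/dh_0|={gradient_norm_through_time(W):.2e}")
```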
- Deep Equilibrium Models [Link](http://papers.nips.cc/paper/8358-deep-equilibrium-models.pdf)
- Bifurcations of Recurrent Neural Networks in Gradient Descent Learning [[Link]](https://pdfs.semanticscholar.org/b579/27b713a6f9b73c7941f99144165396483478.pdf)
- On the difficulty of training recurrent neural networks [[Link]](http://proceedings.mlr.press/v28/pascanu13.pdf)
- Understanding and Controlling Memory in Recurrent Neural Networks [[Link]](https://arxiv.org/pdf/1902.07275.pdf)
- Dynamics and Bifurcation of Neural Networks [[Link]](https://pdfs.semanticscholar.org/a413/4a36fef5ef55d0ff7dae029d6b8f55140cf7.pdf)
- Context Aware Machine Learning [[Link]](https://arxiv.org/pdf/1901.03415.pdf)
- The trade-off between long-term memory and smoothness for recurrent networks [[Link]](https://arxiv.org/pdf/1906.08482.pdf)
- Dynamical complexity and computation in recurrent neural networks beyond their fixed point [[Link]](https://www.nature.com/articles/s41598-018-21624-2.pdf)
- Bifurcations in discrete-time neural networks: controlling complex network behaviour with inputs [[Link]](https://pub.uni-bielefeld.de/record/2302580)
- Interpreting Recurrent Neural Networks Behaviour via Excitable Network Attractors [[Link]](https://link.springer.com/article/10.1007/s12559-019-09634-2#Sec11)
- Bifurcation analysis of a neural network model [Link](https://link.springer.com/article/10.1007/BF00203668)
- A Differentiable Physics Engine for Deep Learning in Robotics [Link](https://www.frontiersin.org/articles/10.3389/fnbot.2019.00006/full)
- Deep learning for universal linear embeddings of nonlinear dynamics [Link](https://arxiv.org/pdf/1712.09707.pdf)
- Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations [Link](http://www.jmlr.org/papers/volume19/18-046/18-046.pdf)
- Analysis of gradient descent learning algorithms for multilayer feedforward neural networks [Link](https://ieeexplore.ieee.org/abstract/document/203921)
- A dynamical model for the analysis and acceleration of learning in feedforward networks [Link](https://www.sciencedirect.com/science/article/abs/pii/S0893608001000521)
- A bio-inspired bistable recurrent cell allows for long-lasting memory [Link](https://arxiv.org/abs/2006.05252)
- Equilibrium Propagation: Bridging the Gap between Energy-Based Models and Backpropagation [Link](https://www.frontiersin.org/articles/10.3389/fncom.2017.00024/full)

#### Poor Local Minima? and Sharp Minima
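A crude sharpness proxy to accompany the sharp/flat-minima papers below: the average loss increase under random weight perturbations of a fixed radius. This is only an illustrative measure (and, as "Sharp Minima Can Generalize For Deep Nets" below argues, reparameterization-sensitive), not any paper's exact definition:

```python
import numpy as np

def sharpness_proxy(loss_fn, w, radius=0.05, n_samples=100, seed=0):
    """Average loss increase under random perturbations of norm `radius`."""
    rng = np.random.default_rng(seed)
    base = loss_fn(w)
    increases = []
    for _ in range(n_samples):
        d = rng.normal(size=w.shape)
        d *= radius / np.linalg.norm(d)          # rescale to the chosen radius
        increases.append(loss_fn(w + d) - base)
    return float(np.mean(increases))

# Two toy quadratic minima with different curvature: the "sharp" one shows a
# much larger loss increase under the same perturbation radius.
flat = lambda w: 0.5 * w @ np.diag([1.0, 1.0]) @ w
sharp = lambda w: 0.5 * w @ np.diag([100.0, 100.0]) @ w
w0 = np.zeros(2)
print("flat :", sharpness_proxy(flat, w0))
print("sharp:", sharpness_proxy(sharp, w0))
```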
- Adding One Neuron Can Eliminate All Bad Local Minima [Link](https://papers.nips.cc/paper/7688-adding-one-neuron-can-eliminate-all-bad-local-minima.pdf)
- Deep Learning without Poor Local Minima [Link](https://papers.nips.cc/paper/6112-deep-learning-without-poor-local-minima.pdf)
- Elimination of All Bad Local Minima in Deep Learning [Link](https://arxiv.org/pdf/1901.00279.pdf)
- How to escape saddle points efficiently. [Link](https://arxiv.org/pdf/1703.00887.pdf)
- Depth with Nonlinearity Creates No Bad Local Minima in ResNets [Link](https://arxiv.org/abs/1810.09038)
- Sharp Minima Can Generalize For Deep Nets [Link](https://arxiv.org/pdf/1703.04933.pdf)
- Asymmetric Valleys: Beyond Sharp and Flat Local Minima [Link](https://papers.nips.cc/paper/2019/file/01d8bae291b1e4724443375634ccfa0e-Paper.pdf)
- A Reparameterization-Invariant Flatness Measure for Deep Neural Networks [Link](https://arxiv.org/pdf/1912.00058.pdf)
- A Simple Weight Decay Can Improve Generalization [Link](https://papers.nips.cc/paper/1991/file/8eefcfdf5990e441f0fb6f3fad709e21-Paper.pdf)
- Finding Critical and Gradient-Flat Points of Deep Neural Network Loss Functions [Link](https://escholarship.org/content/qt4fw6x5b3/qt4fw6x5b3_noSplash_14ef3ae1644808c863f9b2eb344addcc.pdf?t=qhtt5i)
- The Loss Surface Of Deep Linear Networks Viewed Through The Algebraic Geometry Lens [Link](https://arxiv.org/pdf/1810.07716.pdf)
- Theoretical Issues in Deep Networks: Approximation, Optimization and Generalization [Link](https://arxiv.org/pdf/1908.09375.pdf)
- Flatness is a False Friend [Link](https://arxiv.org/pdf/2006.09091.pdf)
- Are Saddles Good Enough for Deep Learning? [Link](https://www.researchgate.net/publication/317399405_Are_Saddles_Good_Enough_for_Deep_Learning)

#### Initialization of Neural Networks
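Two standard initialization schemes referenced in the notes and papers below, sketched in NumPy, plus a quick check that activations keep a stable scale through depth; the 20-layer ReLU stack is just a toy illustration:

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Glorot & Bengio (2010): keep activation variance stable for tanh-like units."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

def he_normal(fan_in, fan_out, rng):
    """He et al. (2015): variance 2 / fan_in, the common choice for ReLU networks."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))

# Sanity check: forward activation scale stays roughly constant through depth.
rng = np.random.default_rng(0)
x = rng.normal(size=256)
for _ in range(20):                                  # 20-layer ReLU stack
    W = he_normal(256, 256, rng)
    x = np.maximum(W @ x, 0.0)
print("activation std after 20 ReLU layers:", x.std().round(3))   # stays O(1)
```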
- Deep learning course notes [Link](https://www.deeplearning.ai/ai-notes/initialization/)
- On the importance of initialization and momentum in deep learning [Link](http://proceedings.mlr.press/v28/sutskever13.html)
- The Break-Even Point on Optimization Trajectories of Deep Neural Networks [Link](https://arxiv.org/abs/2002.09572)
- THE EARLY PHASE OF NEURAL NETWORK TRAINING [Link](https://research.fb.com/wp-content/uploads/2020/02/The-Early-Phase-of-Neural-Network-Training.pdf)
- One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers [Link](http://papers.nips.cc/paper/8739-one-ticket-to-win-them-all-generalizing-lottery-ticket-initializations-across-datasets-and-optimizers.pdf)
- PCA-Initialized Deep Neural Networks Applied To Document Image Analysis [Link](https://arxiv.org/abs/1702.00177)
- Understanding the difficulty of training deep feedforward neural networks [Link](http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf)
- Unitary Evolution of RNNs [Link](https://arxiv.org/abs/1511.06464)

#### Momentum in Optimization
- RETHINKING THE HYPERPARAMETERS FOR FINE-TUNING [Link](https://openreview.net/pdf?id=B1g8VkHFPH)
- Momentum Residual Neural Networks [Link](https://proceedings.mlr.press/v139/sander21a.html)
- Smooth momentum: improving lipschitzness in gradient descent [Link](https://doi.org/10.1007/s10489-022-04207-7)
- Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning [Link](https://arxiv.org/pdf/2211.03186.pdf)

#### Batch Size Optimization
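A common heuristic tying batch size and learning rate together in the large-batch papers below is the linear scaling rule, usually paired with a learning-rate warmup. A minimal sketch (the specific numbers are purely illustrative):

```python
def scaled_learning_rate(base_lr, base_batch_size, batch_size):
    """Linear scaling heuristic: grow the learning rate in proportion to the
    batch size (a rule of thumb that can break down for very large batches)."""
    return base_lr * batch_size / base_batch_size

def warmup_lr(target_lr, step, warmup_steps):
    """Gradual warmup often paired with the scaling rule: ramp linearly from 0
    to the scaled target over the first warmup_steps updates."""
    return target_lr * min(1.0, step / warmup_steps)

# Example: a recipe tuned at batch size 256 with lr 0.1, scaled to batch size 4096.
target = scaled_learning_rate(0.1, 256, 4096)       # 1.6
print([round(warmup_lr(target, s, warmup_steps=1000), 3) for s in (0, 250, 500, 1000)])
```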
- ON LARGE-BATCH TRAINING FOR DEEP LEARNING: GENERALIZATION GAP AND SHARP MINIMA [Link](https://arxiv.org/pdf/1609.04836.pdf)
- Revisiting Small Batch Training for Deep Neural Networks [Link](https://arxiv.org/abs/1804.07612)
- LARGE BATCH TRAINING OF CONVOLUTIONAL NETWORKS [Link](https://arxiv.org/pdf/1708.03888.pdf)
- Large Batch Optimization for Deep Learning: Training BERT in 76 minutes [Link](https://arxiv.org/abs/1904.00962)
- DON’T DECAY THE LEARNING RATE, INCREASE THE BATCH SIZE [Link](https://arxiv.org/abs/1711.00489)

#### Degeneracy of Neural Networks
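A quick numerical companion to the dynamical-isometry and orthogonal-initialization papers below: the singular values of the end-to-end Jacobian of a deep linear network stay at exactly 1 under orthogonal initialization, but spread out under i.i.d. Gaussian initialization. Depth and width are arbitrary illustrative values:

```python
import numpy as np

def end_to_end_singular_values(depth, width, init, rng):
    """Singular values of the input-output Jacobian of a deep *linear* network."""
    J = np.eye(width)
    for _ in range(depth):
        if init == "orthogonal":
            W, _ = np.linalg.qr(rng.normal(size=(width, width)))   # orthogonal layer
        else:
            W = rng.normal(size=(width, width)) / np.sqrt(width)   # scaled Gaussian layer
        J = W @ J
    return np.linalg.svd(J, compute_uv=False)

rng = np.random.default_rng(0)
for init in ("orthogonal", "gaussian"):
    s = end_to_end_singular_values(depth=50, width=64, init=init, rng=rng)
    print(f"{init:>10}: max singular value={s.max():.2e}, min={s.min():.2e}")
```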
- Exact solutions to the nonlinear dynamics of learning in deep linear neural networks [Link](https://arxiv.org/pdf/1312.6120.pdf)
- Avoiding pathologies in very deep networks [Link](https://arxiv.org/abs/1402.5836)
- Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice [Link](https://arxiv.org/abs/1711.04735)
- SKIP CONNECTIONS ELIMINATE SINGULARITIES [Link](https://openreview.net/pdf?id=HkwBEMWCZ)
- How degenerate is the parametrization of neural networks with the ReLU activation function? [Link](https://arxiv.org/pdf/1905.09803.pdf)
- Theory of Deep Learning III: explaining the non-overfitting puzzle [Link](https://cbmm.mit.edu/sites/default/files/publications/CBMM-Memo-073v2_0.pdf)
- Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks [Link](https://openreview.net/forum?id=rkgqN1SYvr)
- Understanding Deep Learning: Expected Spanning Dimension and Controlling the Flexibility of Neural Networks [Link](https://www.frontiersin.org/articles/10.3389/fams.2020.572539/full)
- The Loss Surface Of Deep Linear Networks Viewed Through The Algebraic Geometry Lens [Link](https://arxiv.org/pdf/1810.07716.pdf)
- PYHESSIAN: Neural Networks Through the Lens of the Hessian [Link](https://arxiv.org/pdf/1912.07145.pdf)

#### Convergence Analysis in Deep Learning
- A CONVERGENCE ANALYSIS OF GRADIENT DESCENT FOR DEEP LINEAR NEURAL NETWORKS [Link](https://openreview.net/pdf?id=SkMQg3C5K7)
- A Convergence Theory for Deep Learning via Over-Parameterization [Link](http://proceedings.mlr.press/v97/allen-zhu19a/allen-zhu19a.pdf)
- Convergence Analysis of Homotopy-SGD for Non-Convex Optimization [Link](https://openreview.net/forum?id=Twf5rUVeU-I)

#### Multi-Task Learning with Curricula
- Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning. [Link](https://www.aclweb.org/anthology/P16-1013.pdf)
- Learning a Multitask Curriculum for Neural Machine Translation. [Link](https://arxiv.org/pdf/1908.10940.pdf)
- Self-paced Curriculum Learning. [Link](http://www.cs.cmu.edu/~lujiang/camera_ready_papers/AAAI_SPCL_2015.pdf)
- Curriculum Learning of Multiple Tasks. [Link](http://openaccess.thecvf.com/content_cvpr_2015/papers/Pentina_Curriculum_Learning_of_2015_CVPR_paper.pdf)

#### Constrained Optimization for Deep Learning
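A generic primal-dual gradient sketch for training under a constraint, in the spirit of (but not identical to) the formulation in the paper below: descend the Lagrangian in the weights and ascend in the multiplier, projecting the multiplier to stay non-negative. The toy problem and step sizes are illustrative:

```python
import numpy as np

def primal_dual(f_grad, g, g_grad, w, lam=0.0, lr_w=0.05, lr_lam=0.5, steps=500):
    """Generic primal-dual gradient method for min f(w) s.t. g(w) <= 0."""
    for _ in range(steps):
        w = w - lr_w * (f_grad(w) + lam * g_grad(w))   # primal descent on the Lagrangian
        lam = max(0.0, lam + lr_lam * g(w))            # dual ascent, projected to >= 0
    return w, lam

# Toy problem: minimize ||w - (2, 2)||^2 subject to w_0 + w_1 <= 1.
f_grad = lambda w: 2 * (w - np.array([2.0, 2.0]))
g = lambda w: w[0] + w[1] - 1.0
g_grad = lambda w: np.array([1.0, 1.0])
w, lam = primal_dual(f_grad, g, g_grad, w=np.zeros(2))
print(w.round(3), round(lam, 3))   # constrained optimum is near (0.5, 0.5)
```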
- A Primal-Dual Formulation for Deep Learning with Constraints [Link](https://papers.nips.cc/paper/9385-a-primal-dual-formulation-for-deep-learning-with-constraints.pdf)

#### Reinforcement Learning and Curriculum
- Object-Oriented Curriculum Generation for Reinforcement Learning [Link](http://ifaamas.org/Proceedings/aamas2018/pdfs/p1026.pdf)
- Teacher-Student Curriculum Learning [Link](https://arxiv.org/abs/1707.00183)

#### Tutorials, Surveys and Blogs
- Curriculum Learning: A Survey [Link](https://arxiv.org/pdf/2101.10382.pdf)
- A Comprehensive Survey on Curriculum Learning [Link](https://arxiv.org/pdf/2010.13166.pdf)
- Off the Convex Path (blog) [Link](https://www.offconvex.org/)
- An overview of gradient descent optimization algorithms [[Link]](https://arxiv.org/pdf/1609.04747.pdf)
- Review of second-order optimization techniques in artificial neural networks backpropagation [Link](https://iopscience.iop.org/article/10.1088/1757-899X/495/1/012003/pdf)
- Linear Algebra and data [Link](https://github.com/harsh306/ML_Notes/blob/master/linear_algebra.md)
- Why Momentum Really Works [[Blog]](https://distill.pub/2017/momentum/)
- Optimization [[Book]](https://www.deeplearningbook.org/contents/optimization.html)
- Optimization for deep learning: theory and algorithms [Link](https://arxiv.org/pdf/1912.08957.pdf)
- Generalization Error in Deep Learning [Link](https://arxiv.org/pdf/1808.01174.pdf)
- Automatic Differentiation in Machine Learning: a Survey [Link](https://arxiv.org/pdf/1502.05767.pdf)
- Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey [Link](https://arxiv.org/pdf/2003.04960.pdf)
- Automatic Curriculum Learning For Deep RL: A Short Survey [Link](https://arxiv.org/abs/2003.04664)
- The Generalization Mystery: Sharp vs Flat Minima [Link](https://www.inference.vc/sharp-vs-flat-minima-are-still-a-mystery-to-me/)

#### Contributing
If you've found any informative resources that you think belong here, be sure to submit a pull request or create an issue!

##### If you find this helpful, you're welcome to buy me a coffee :)
- [![ko-fi](https://www.ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/F1F02R7JR)
- Or send me 2-4 dollars on my venmo account [@HARSHNILESH-PATHAK](https://venmo.com/HARSHNILESH-PATHAK)