{"id":17312944,"url":"https://github.com/digantamisra98/library","last_synced_at":"2026-01-06T12:44:22.398Z","repository":{"id":131277936,"uuid":"278243795","full_name":"digantamisra98/Library","owner":"digantamisra98","description":"Paper reading list","archived":false,"fork":false,"pushed_at":"2020-10-05T15:47:07.000Z","size":48,"stargazers_count":16,"open_issues_count":0,"forks_count":6,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-02-01T07:16:22.765Z","etag":null,"topics":["abstract-algebra","computer-vision","continual-learning","deep-learning","machine-learning","neural-networks","nonlinear-dynamics","optimization","theory"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/digantamisra98.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-07-09T02:35:01.000Z","updated_at":"2025-01-10T00:06:20.000Z","dependencies_parsed_at":null,"dependency_job_id":"950efe9c-bc02-48b4-83f7-08fc22f5baf0","html_url":"https://github.com/digantamisra98/Library","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digantamisra98%2FLibrary","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digantamisra98%2FLibrary/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digantamisra98%2FLibrary/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digantamisra98%2FLibrary/manifests
","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/digantamisra98","download_url":"https://codeload.github.com/digantamisra98/Library/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245762635,"owners_count":20668117,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["abstract-algebra","computer-vision","continual-learning","deep-learning","machine-learning","neural-networks","nonlinear-dynamics","optimization","theory"],"created_at":"2024-10-15T12:45:10.622Z","updated_at":"2026-01-06T12:44:22.197Z","avatar_url":"https://github.com/digantamisra98.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Library\n\n### Deep Learning: \n\n- [LCA: Loss Change Allocation for Neural Network Training](https://arxiv.org/abs/1909.01440)\n- [Asymptotics of Wide Networks from Feynman Diagrams](https://arxiv.org/abs/1909.11304)\n- [Neural networks and physical systems with emergent collective computational abilities](https://www.pnas.org/content/79/8/2554)\n- [Provable Certificates for Adversarial Examples: Fitting a Ball in the Union of Polytopes](https://arxiv.org/abs/1903.08778)\n- [Adversarial Robustness Through Local Lipschitzness](https://arxiv.org/abs/2003.02460)\n- [Lagrangian Neural Networks](https://arxiv.org/abs/2003.04630)\n- [Inherent Weight Normalization in Stochastic Neural Networks](https://openreview.net/forum?id=H1xDPEBx8r)\n- [Neural Arithmetic Units](https://openreview.net/forum?id=H1gNOeHKPS)\n- [Information Theory, Inference and Learning 
Algorithms](https://books.google.co.in/books/about/Information_Theory_Inference_and_Learnin.html?id=AKuMj4PN_EMC\u0026printsec=frontcover\u0026source=kp_read_button\u0026redir_esc=y#v=onepage\u0026q\u0026f=false)\n- [Intriguing properties of neural networks](https://arxiv.org/abs/1312.6199)\n- [An Effective and Efficient Initialization Scheme for Training Multi-layer Feedforward Neural Networks](https://arxiv.org/abs/2005.08027)\n- [Rigging the Lottery: Making All Tickets Winners](https://arxiv.org/abs/1911.11134)\n\n#### Mean Field Theory/ EOC/ Dynamic Isometry:\n\n- [Deep Information Propagation](https://arxiv.org/abs/1611.01232)\n- [Exponential expressivity in deep neural networks through transient chaos](https://arxiv.org/abs/1606.05340)\n- [Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice](https://arxiv.org/abs/1711.04735)\n- [Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function](https://arxiv.org/pdf/1809.08848v3.pdf)\n- [Mean Field Residual Networks: On the Edge of Chaos](https://arxiv.org/abs/1712.08969)\n- [Mean Field Theory of Activation Functions in Deep Neural Networks](https://arxiv.org/abs/1805.08786)\n- [Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks](https://arxiv.org/abs/1806.05393)\n- [On the Impact of the Activation Function on Deep Neural Networks Training](https://arxiv.org/abs/1902.06853)\n- [Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks](https://arxiv.org/abs/1806.05394)\n- [Disentangling trainability and generalization in deep learning](https://arxiv.org/abs/1912.13053)\n- [A Mean Field View of the Landscape of Two-Layers Neural Networks](https://arxiv.org/abs/1804.06561)\n- [A Mean Field Theory of Batch Normalization](https://openreview.net/forum?id=SyMDXnCcF7)\n- [Statistical field theory for neural 
networks](https://arxiv.org/abs/1901.10416)\n- [A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth](https://arxiv.org/abs/2003.05508)\n- [Towards understanding the true loss surface of deep neural networks using random matrix theory and iterative spectral methods](https://openreview.net/forum?id=H1gza2NtwH)\n\n#### Optimization/ Line Search/ Wolfe's Theorem:\n\n- [Convergence guarantees for RMSProp and ADAM in non-convex optimization and an empirical comparison to Nesterov acceleration](https://arxiv.org/pdf/1807.06766.pdf)\n- [Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence](https://arxiv.org/pdf/2002.10542.pdf)\n- [Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates](https://arxiv.org/pdf/1905.09997.pdf)\n- [Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Lojasiewicz Condition](https://arxiv.org/pdf/1608.04636.pdf)\n- [On the distance between two neural networks and the stability of learning](https://arxiv.org/abs/2002.03432)\n- [The large learning rate phase of deep learning: the catapult mechanism](https://arxiv.org/pdf/2003.02218.pdf)\n- [Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates](https://papers.nips.cc/paper/8630-painless-stochastic-gradient-interpolation-line-search-and-convergence-rates.pdf)\n- [A Fine-Grained Spectral Perspective on Neural Networks](https://arxiv.org/abs/1907.10599)\n- [The Geometry of Sign Gradient Descent](https://arxiv.org/abs/2002.08056)\n- [The Break-Even Point on Optimization Trajectories of Deep Neural Networks](https://arxiv.org/abs/2002.09572)\n- [Quasi-hyperbolic momentum and Adam for deep learning](https://openreview.net/forum?id=S1fUpoR5FQ)\n- [A new regret analysis for Adam-type algorithms](http://arxiv-export-lb.library.cornell.edu/abs/2003.09729)\n- [Disentangling Adaptive Gradient Methods from Learning 
Rates](https://arxiv.org/abs/2002.11803)\n- [Stochastic Flows and Geometric Optimization on the Orthogonal Group](https://arxiv.org/abs/2003.13563)\n- [Adaptive Multi-level Hyper-gradient Descent](https://arxiv.org/abs/2008.07277)\n\n##### Bonus:\n\n- [Convex optimization: Gradient Methods and Online Learning](https://sites.cs.ucsb.edu/~yuxiangw/classes/CS292A-2019Spring/)\n\n#### Non-Linear Dynamics: \n\n- [Regularizing activations in neural networks via distribution matching with the Wasserstein metric](https://openreview.net/pdf?id=rygwLgrYPB)\n- [Depth-Width Trade-offs for ReLU Networks via Sharkovsky's Theorem](https://arxiv.org/abs/1912.04378)\n- [Effect of Activation Functions on the Training of Overparametrized Neural Nets](https://arxiv.org/abs/1908.05660v4)\n- [Implicit Neural Representations with Periodic Activation Functions](https://arxiv.org/abs/2006.09661)\n- [Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem](https://arxiv.org/pdf/1812.05720.pdf)\n- [Small nonlinearities in activation functions create bad local minima in neural networks](https://openreview.net/forum?id=rke_YiRct7)\n- [Tempered Sigmoid Activations for Deep Learning with Differential Privacy](https://arxiv.org/abs/2007.14191)\n- [Neural Networks Fail to Learn Periodic Functions and How to Fix It](https://arxiv.org/abs/2006.08195)\n\n### Computer Vision:\n\n- [Making Convolutional Networks Shift-Invariant Again](https://arxiv.org/abs/1904.11486)\n- [GridDehazeNet: Attention-Based Multi-Scale Network for Image Dehazing](https://arxiv.org/abs/1908.03245)\n- [Butterfly Transform: An Efficient FFT Based Neural Architecture Design](https://arxiv.org/abs/1906.02256)\n- [ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network](https://arxiv.org/pdf/2007.00992.pdf)\n- [Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains](https://arxiv.org/abs/2006.10739)\n- [Learning 
One Convolutional Layer with Overlapping Patches](https://arxiv.org/abs/1802.02547)\n- [Batch-Shaping for Learning Conditional Channel Gated Networks](https://arxiv.org/abs/1907.06627)\n- [Convolutional Networks with Adaptive Inference Graphs](https://arxiv.org/abs/1711.11503)\n- [The Singular Values of Convolutional Layers](https://openreview.net/forum?id=rJevYoA9Fm)\n- [Rendering Natural Camera Bokeh Effect with Deep Learning](https://arxiv.org/abs/2006.05698)\n- [Towards Learning Convolutions from Scratch](https://arxiv.org/abs/2007.13657)\n- [Feature Products Yield Efficient Networks](https://arxiv.org/abs/2008.07930)\n\n### Incremental Learning/ Continual Learning/ Lifelong Learning:\n\n- [Conditional Channel Gated Networks for Task-Aware Continual Learning](https://arxiv.org/abs/2004.00070)\n- [Supermasks in Superposition](https://arxiv.org/pdf/2006.14769.pdf)\n\n### Mathematics (Mostly Abstract Algebra/ Topology/ Statistical Mechanics): \n\n- [Algebra, Topology, Differential Calculus, and Optimization Theory For Computer Science and Machine Learning](https://www.cis.upenn.edu/~jean/math-deep.pdf)\n- [ALGEBRA](https://solisinvicti.com/books/TheOlympiad/Books/AlgebraArtin.pdf)\n- [Contemporary Abstract Algebra](https://people.clas.ufl.edu/cmcyr/files/Abstract-Algebra-Text_Gallian-e8.pdf)\n- [Statistical Mechanics of Deep Learning](https://www.annualreviews.org/doi/full/10.1146/annurev-conmatphys-031119-050745)\n- [Linear Algebra](http://joshua.smcvt.edu/linearalgebra/)\n- [Linear Algebra Done Right](https://link.springer.com/book/10.1007/978-3-319-11080-6)\n\n\n### Immediate:\n\n- [A Simple Framework for Contrastive Learning of Visual Representations](https://arxiv.org/abs/2002.05709)\n- [Self-supervised Label Augmentation via Input Transformations](https://arxiv.org/abs/1910.05872)\n- [On the Loss Landscape of Adversarial Training: Identifying Challenges and How to Overcome Them](https://arxiv.org/abs/2006.08403)\n- [Structured Convolutions for Efficient 
Neural Network Design](https://deepai.org/publication/structured-convolutions-for-efficient-neural-network-design)\n- [Tensor Programs III: Neural Matrix Laws](https://arxiv.org/abs/2009.10685)\n- [An Investigation into Neural Net Optimization via Hessian Eigenvalue Density](https://arxiv.org/abs/1901.10159)\n- [The Hardware Lottery](https://arxiv.org/abs/2009.06489)\n- [Tensor Programs II: Neural Tangent Kernel for Any Architecture](https://arxiv.org/abs/2006.14548)\n- [PareCO: Pareto-aware Channel Optimization for Slimmable Neural Networks](https://arxiv.org/abs/2007.11752)\n- [Coupling-based Invertible Neural Networks Are Universal Diffeomorphism Approximators](https://arxiv.org/abs/2006.11469)\n- [Hypersolvers: Toward Fast Continuous-Depth Models](https://arxiv.org/abs/2007.09601)\n- [Residual Feature Distillation Network for Lightweight Image Super-Resolution](https://arxiv.org/abs/2009.11551)\n- [SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness](https://arxiv.org/abs/2009.10195)\n- [HyperNetworks](https://arxiv.org/abs/1609.09106)\n- [Understanding the Role of Individual Units in a Deep Neural Network](https://arxiv.org/abs/2009.05041)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdigantamisra98%2Flibrary","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdigantamisra98%2Flibrary","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdigantamisra98%2Flibrary/lists"}