{"id":43519990,"url":"https://github.com/daturkel/learning-papers","last_synced_at":"2026-02-03T14:11:42.346Z","repository":{"id":38396317,"uuid":"225688461","full_name":"daturkel/learning-papers","owner":"daturkel","description":"Landmark Papers in Machine Learning","archived":false,"fork":false,"pushed_at":"2025-08-14T17:50:09.000Z","size":78,"stargazers_count":664,"open_issues_count":1,"forks_count":47,"subscribers_count":26,"default_branch":"master","last_synced_at":"2025-08-14T19:33:48.163Z","etag":null,"topics":["machine-learning","papers"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/daturkel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-12-03T18:26:45.000Z","updated_at":"2025-08-14T17:50:12.000Z","dependencies_parsed_at":"2025-08-14T19:17:35.188Z","dependency_job_id":"225cca98-fb82-4b7f-81cf-f322921da8c6","html_url":"https://github.com/daturkel/learning-papers","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/daturkel/learning-papers","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daturkel%2Flearning-papers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daturkel%2Flearning-papers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daturkel%2Flearning-papers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daturkel%2Flearning-papers/manifests","owner_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/daturkel","download_url":"https://codeload.github.com/daturkel/learning-papers/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daturkel%2Flearning-papers/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29047500,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-03T10:09:22.136Z","status":"ssl_error","status_checked_at":"2026-02-03T10:09:16.814Z","response_time":96,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","papers"],"created_at":"2026-02-03T14:11:41.753Z","updated_at":"2026-02-03T14:11:42.336Z","avatar_url":"https://github.com/daturkel.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"### Landmark Papers in Machine Learning\n\n*This document attempts to collect the papers which developed important techniques in machine learning. Research is a collaborative process, discoveries are made independently, and the difference between the original version and a precursor can be subtle, but I’ve done my best to select the papers that I think are novel or significant.*\n\n*My opinions are by no means the final word on these topics. 
Please create an issue or pull request if you have a suggestion.*\n\n- [Landmark Papers in Machine Learning](#landmark-papers-in-machine-learning)\n  - [Key](#key)\n  - [Association Rule Learning](#association-rule-learning)\n  - [Datasets](#datasets)\n    - [Enron](#enron)\n    - [ImageNet](#imagenet)\n  - [Decision Trees](#decision-trees)\n  - [Deep Learning](#deep-learning)\n    - [AlexNet (image classification CNN)](#alexnet-image-classification-cnn)\n    - [Convolutional Neural Network](#convolutional-neural-network)\n    - [DeepFace (facial recognition)](#deepface-facial-recognition)\n    - [Generative Adversarial Network](#generative-adversarial-network)\n    - [GPT](#gpt)\n    - [Inception (classification/detection CNN)](#inception-classificationdetection-cnn)\n    - [Long Short-Term Memory (LSTM)](#long-short-term-memory-lstm)\n    - [Residual Neural Network (ResNet)](#residual-neural-network-resnet)\n    - [Transformer (sequence to sequence modeling)](#transformer-sequence-to-sequence-modeling)\n    - [U-Net (image segmentation CNN)](#u-net-image-segmentation-cnn)\n    - [VGG (image recognition CNN)](#vgg-image-recognition-cnn)\n  - [Ensemble Methods](#ensemble-methods)\n    - [AdaBoost](#adaboost)\n    - [Bagging](#bagging)\n    - [Gradient Boosting](#gradient-boosting)\n    - [Random Forest](#random-forest)\n  - [Games](#games)\n    - [AlphaGo](#alphago)\n    - [Deep Blue](#deep-blue)\n  - [Optimization](#optimization)\n    - [Adam](#adam)\n    - [Expectation Maximization](#expectation-maximization)\n    - [Stochastic Gradient Descent](#stochastic-gradient-descent)\n  - [Miscellaneous](#miscellaneous)\n    - [Non-negative Matrix Factorization](#non-negative-matrix-factorization)\n    - [PageRank](#pagerank)\n    - [DeepQA (Watson)](#deepqa-watson)\n  - [Natural Language Processing](#natural-language-processing)\n    - [Latent Dirichlet Allocation](#latent-dirichlet-allocation)\n    - [Latent Semantic Analysis](#latent-semantic-analysis)\n    - 
[Word2Vec](#word2vec)\n  - [Neural Network Components](#neural-network-components)\n    - [Autograd](#autograd)\n    - [Back-propagation](#back-propagation)\n    - [Batch Normalization](#batch-normalization)\n    - [Dropout](#dropout)\n    - [Gated Recurrent Unit](#gated-recurrent-unit)\n    - [Perceptron](#perceptron)\n  - [Recommender Systems](#recommender-systems)\n    - [Collaborative Filtering](#collaborative-filtering)\n    - [Matrix Factorization](#matrix-factorization)\n    - [Implicit Matrix Factorization](#implicit-matrix-factorization)\n  - [Regression](#regression)\n    - [Elastic Net](#elastic-net)\n    - [Lasso](#lasso)\n  - [Software](#software)\n    - [MapReduce](#mapreduce)\n    - [TensorFlow](#tensorflow)\n    - [Torch](#torch)\n  - [Supervised Learning](#supervised-learning)\n    - [k-Nearest Neighbors](#k-nearest-neighbors)\n    - [Support Vector Machine](#support-vector-machine)\n  - [Statistics](#statistics)\n    - [The Bootstrap](#the-bootstrap)\n- [Credits](#credits)\n\n#### Key\n\n| Icon | Meaning                                                      |\n| ---- | ------------------------------------------------------------ |\n| 🔒    | Paper behind paywall. In some cases, I provide an alternative link to the paper *if* it comes directly from one of the authors. |\n| 🔑    | Freely available version of a paywalled paper, directly from the author. |\n| 💽    | Code associated with the paper.                              |\n| 🏛️    | Precursor or historically relevant paper. This may be a fundamental breakthrough that paved the way for the concept in question to be developed. |\n| 🔬    | Iteration, advancement, elaboration, or major popularization of a technique. |\n| 📔    | Blog post or something other than a formal publication.      |\n| 🌐    | Website associated with the paper.                           |\n| 🎥    | Video associated with the paper.                             |\n| 📊    | Slides or images associated with the paper.                  
|\n\nPapers preceded by “See also” provide either additional historical context or major developments, breakthroughs, or applications.\n\n#### Association Rule Learning\n\n- **Scalable Algorithms for Association Mining (2000)**, Zaki, [@IEEE](https://ieeexplore.ieee.org/document/846291/metrics#metrics) 🔒.\n\n- **Mining Frequent Patterns without Candidate Generation (2000)**, Han, Pei, and Yin, [@ACM](https://dl.acm.org/doi/10.1145/335191.335372).\n\n- **Mining Association Rules between Sets of Items in Large Databases (1993)**, Agrawal, Imielinski, and Swami, [@CiteSeerX](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.40.6984) 🏛️.\n\n- See also: **The GUHA method of automatic hypotheses determination (1966)**, Hájek, Havel, and Chytil, [@Springer](https://link.springer.com/article/10.1007/BF02345483) 🔒 🏛️.\n\n#### Datasets\n\n##### Enron\n\n- **The Enron Corpus: A New Dataset for Email Classification Research (2004)**, Klimt and Yang, [@Springer](https://link.springer.com/chapter/10.1007/978-3-540-30115-8_22) 🔒 / [@author](https://bklimt.com/papers/2004_klimt_ecml.pdf) 🔑.\n- See also: **Introducing the Enron Corpus (2004)**, Klimt and Yang, [@author](https://bklimt.com/papers/2004_klimt_ceas.pdf).\n\n##### ImageNet\n\n- **ImageNet: A large-scale hierarchical image database (2009)**, Deng et al., [@IEEE](https://ieeexplore.ieee.org/document/5206848) 🔒 / [@author](http://www.image-net.org/papers/imagenet_cvpr09.pdf) 🔑.\n- See also: **ImageNet Large Scale Visual Recognition Challenge (2015)**, Russakovsky et al., [@Springer](https://link.springer.com/article/10.1007/s11263-015-0816-y) 🔒 / [@arXiv](https://arxiv.org/abs/1409.0575) 🔑 + [@author](http://www.image-net.org/challenges/LSVRC/) 🌐.\n\n#### Decision Trees\n\n- **Induction of Decision Trees (1986)**, Quinlan, [@Springer](https://link.springer.com/article/10.1007/BF00116251).\n\n#### Deep Learning\n\n##### AlexNet (image classification CNN)\n\n- **ImageNet Classification with Deep Convolutional Neural Networks 
(2012)**, Krizhevsky, Sutskever, and Hinton, [@NIPS](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks).\n\n##### Convolutional Neural Network\n\n- **Gradient-based learning applied to document recognition (1998)**, LeCun, Bottou, Bengio, and Haffner, [@IEEE](https://ieeexplore.ieee.org/document/726791/) 🔒 / [@author](http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf) 🔑.\n- See also: **Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position (1980)**, Fukushima, [@Springer](https://link.springer.com/article/10.1007/BF00344251) 🏛️.\n- See also: **Phoneme recognition using time-delay neural networks (1989)**, Waibel, Hanazawa, Hinton, Shikano, and Lang, [@IEEE](https://ieeexplore.ieee.org/document/21701) 🏛️.\n- See also: **Fully Convolutional Networks for Semantic Segmentation (2014)**, Long, Shelhamer, and Darrell, [@arXiv](https://arxiv.org/abs/1411.4038).\n\n##### DeepFace (facial recognition)\n\n- **DeepFace: Closing the Gap to Human-Level Performance in Face Verification (2014)**, Taigman, Yang, Ranzato, and Wolf, [@Facebook Research](https://research.fb.com/publications/deepface-closing-the-gap-to-human-level-performance-in-face-verification/).\n\n##### Generative Adversarial Network\n\n- **Generative Adversarial Nets (2014)**, Goodfellow et al., [@NIPS](https://papers.nips.cc/paper/5423-generative-adversarial-nets) + [@Github](https://github.com/goodfeli/adversarial) 💽.\n\n##### GPT\n\n- **Improving Language Understanding by Generative Pre-Training (2018)** *aka* GPT, Radford, Narasimhan, Salimans, and Sutskever, [@OpenAI](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf) + [@Github](https://github.com/openai/finetune-transformer-lm) 💽 + [@OpenAI](https://openai.com/blog/language-unsupervised/) 📔.\n- See also: **Language Models are Unsupervised Multitask Learners (2019)** *aka* GPT-2, Radford, Wu, Child, Luan, Amodei, and Sutskever, 
[@OpenAI](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) 🔬 + [@Github](https://github.com/openai/gpt-2) 💽 + [@OpenAI](https://openai.com/blog/better-language-models/) 📔.\n- See also: **Language Models are Few-Shot Learners (2020)** *aka* GPT-3, Brown et al., [@arXiv](https://arxiv.org/abs/2005.14165) + [@OpenAI](https://openai.com/blog/openai-api/) 📔.\n\n##### Inception (classification/detection CNN)\n\n- **Going Deeper with Convolutions (2014)**, Szegedy et al., [@ai.google](https://ai.google/research/pubs/pub43022) + [@Github](https://github.com/google/inception) 💽.\n- See also: **Rethinking the Inception Architecture for Computer Vision (2016)**, Szegedy, Vanhoucke, Ioffe, Shlens, and Wojna, [@ai.google](https://ai.google/research/pubs/pub44903) 🔬.\n- See also: **Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning (2016)**, Szegedy, Ioffe, Vanhoucke, and Alemi, [@ai.google](https://ai.google/research/pubs/pub45169) 🔬.\n\n##### Long Short-Term Memory (LSTM)\n\n- **Long Short-Term Memory (1997)**, Hochreiter and Schmidhuber, [@CiteSeerX](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.634).\n\n##### Residual Neural Network (ResNet)\n\n- **Deep Residual Learning for Image Recognition (2015)**, He, Zhang, Ren, and Sun, [@arXiv](https://arxiv.org/abs/1512.03385).\n\n##### Transformer (sequence to sequence modeling)\n\n- **Attention Is All You Need (2017)**, Vaswani et al., [@NIPS](http://papers.nips.cc/paper/7181-attention-is-all-you-need).\n\n##### U-Net (image segmentation CNN)\n\n- **U-Net: Convolutional Networks for Biomedical Image Segmentation (2015)**, Ronneberger, Fischer, and Brox, [@Springer](https://link.springer.com/chapter/10.1007/978-3-319-24574-4_28) 🔒 / [@arXiv](https://arxiv.org/abs/1505.04597) 🔑.\n\n##### VGG (image recognition CNN)\n\n- **Very Deep Convolutional Networks for Large-Scale Image Recognition (2015)**, Simonyan and Zisserman, 
[@arXiv](https://arxiv.org/abs/1409.1556) + [@author](http://www.robots.ox.ac.uk/~vgg/research/very_deep/) 🌐 + [@ICLR](https://iclr.cc/archive/www/lib/exe/fetch.php%3Fmedia=iclr2015:simonyan-iclr2015.pdf) 📊 + [@YouTube](https://www.youtube.com/watch?v=OQe-9P51Z0s) 🎥.\n\n#### Ensemble Methods\n\n##### AdaBoost\n\n- **A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting (1997; published as an abstract in 1995)**, Freund and Schapire, [@CiteSeerX](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.8918).\n- See also: **Experiments with a New Boosting Algorithm (1996)**, Freund and Schapire, [@CiteSeerX](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.133.1040) 🔬.\n\n##### Bagging\n\n- **Bagging Predictors (1996)**, Breiman, [@Springer](https://link.springer.com/article/10.1023/A:1018054314350).\n\n##### Gradient Boosting\n\n- **Greedy function approximation: A gradient boosting machine (2001)**, Friedman, [@Project Euclid](https://projecteuclid.org/euclid.aos/1013203451).\n- See also: **XGBoost: A Scalable Tree Boosting System (2016)**, Chen and Guestrin, [@arXiv](https://arxiv.org/abs/1603.02754) 🔬 + [@GitHub](https://github.com/dmlc/xgboost) 💽.\n\n##### Random Forest\n\n- **Random Forests (2001)**, Breiman, [@CiteSeerX](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.125.5395).\n\n#### Games\n\n##### AlphaGo\n\n- **Mastering the game of Go with deep neural networks and tree search (2016)**, Silver et al., [@Nature](https://www.nature.com/articles/nature16961).\n\n##### Deep Blue\n\n- **IBM's Deep Blue chess grandmaster chips (1999)**, Hsu, [@IEEE](https://ieeexplore.ieee.org/abstract/document/755469) 🔒.\n- See also: **Deep Blue (2002)**, Campbell, Hoane, and Hsu, [@ScienceDirect](https://www.sciencedirect.com/science/article/pii/S0004370201001291?via%3Dihub) 🔒.\n\n#### Optimization\n\n##### Adam\n\n- **Adam: A Method for Stochastic Optimization (2015)**, Kingma and Ba, 
[@arXiv](https://arxiv.org/abs/1412.6980).\n\n##### Expectation Maximization\n\n- **Maximum likelihood from incomplete data via the EM algorithm (1977)**, Dempster, Laird, and Rubin, [@CiteSeerX](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.133.4884).\n\n##### Stochastic Gradient Descent\n\n- **Stochastic Estimation of the Maximum of a Regression Function (1952)**, Kiefer and Wolfowitz, [@ProjectEuclid](https://projecteuclid.org/euclid.aoms/1177729392).\n- See also: **A Stochastic Approximation Method (1951)**, Robbins and Monro, [@ProjectEuclid](https://projecteuclid.org/euclid.aoms/1177729586) 🏛️.\n\n#### Miscellaneous\n\n##### Non-negative Matrix Factorization\n\n- **Learning the parts of objects by non-negative matrix factorization (1999)**, Lee and Seung, [@Nature](https://www.nature.com/articles/44565) 🔒.\n\n##### PageRank\n\n- **The PageRank Citation Ranking: Bringing Order to the Web (1998)**, Page, Brin, Motwani, and Winograd, [@CiteSeerX](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.5427).\n\n##### DeepQA (Watson)\n\n- **Building Watson: An Overview of the DeepQA Project (2010)**, Ferrucci et al., [@AAAI](https://www.aaai.org/ojs/index.php/aimagazine/article/view/2303).\n\n#### Natural Language Processing\n\n##### Latent Dirichlet Allocation\n\n- **Latent Dirichlet Allocation (2003)**, Blei, Ng, and Jordan, [@JMLR](http://jmlr.csail.mit.edu/papers/v3/blei03a.html).\n\n##### Latent Semantic Analysis\n\n- **Indexing by latent semantic analysis (1990)**, Deerwester, Dumais, Furnas, Landauer, and Harshman, [@CiteSeerX](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.8490).\n\n##### Word2Vec\n\n- **Efficient Estimation of Word Representations in Vector Space (2013)**, Mikolov, Chen, Corrado, and Dean, [@arXiv](https://arxiv.org/abs/1301.3781) + [@Google Code](https://code.google.com/archive/p/word2vec/) 💽.\n\n#### Neural Network Components\n\n##### Autograd\n\n- **Autograd: Effortless Gradients in Numpy (2015)**, Maclaurin, Duvenaud, and Adams, 
[@ICML](https://indico.ijclab.in2p3.fr/event/2914/contributions/6483/subcontributions/180/attachments/6060/7185/automl-short.pdf) + [@ICML](https://indico.ijclab.in2p3.fr/event/2914/contributions/6483/subcontributions/180/attachments/6059/7184/talk.pdf) 📊 + [@Github](https://github.com/HIPS/autograd) 💽.\n\n##### Back-propagation\n\n- **Learning representations by back-propagating errors (1986)**, Rumelhart, Hinton, and Williams, [@Nature](https://www.nature.com/articles/323533a0) 🔒.\n- See also: **Backpropagation Applied to Handwritten Zip Code Recognition (1989)**, LeCun et al., [@IEEE](https://ieeexplore.ieee.org/document/6795724) 🔒 🔬 / [@author](http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf) 🔑.\n\n##### Batch Normalization\n\n- **Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (2015)**, Ioffe and Szegedy, [@ICML via PMLR](http://proceedings.mlr.press/v37/ioffe15.html).\n\n##### Dropout\n\n- **Dropout: A Simple Way to Prevent Neural Networks from Overfitting (2014)**, Srivastava, Hinton, Krizhevsky, Sutskever, and Salakhutdinov, [@JMLR](http://jmlr.org/papers/v15/srivastava14a.html).\n\n##### Gated Recurrent Unit\n\n- **Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (2014)**, Cho et al., [@arXiv](https://arxiv.org/abs/1406.1078).\n\n##### Perceptron\n\n- **The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain (1958)**, Rosenblatt, [@CiteSeerX](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.588.3775).\n\n#### Recommender Systems\n\n##### Collaborative Filtering\n\n- **Using collaborative filtering to weave an information tapestry (1992)**, Goldberg, Nichols, Oki, and Terry, [@CiteSeerX](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.104.3739).\n\n##### Matrix Factorization\n\n- **Application of Dimensionality Reduction in Recommender System - A Case Study (2000)**, Sarwar, Karypis, Konstan, and Riedl, 
[@CiteSeerX](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.29.8381).\n- See also: **Learning Collaborative Information Filters (1998)**, Billsus and Pazzani, [@CiteSeerX](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.487.3789) 🏛️.\n- See also: **Netflix Update: Try This at Home (2006)**, Funk, [@author](https://sifter.org/~simon/journal/20061211.html) 📔 🔬.\n\n##### Implicit Matrix Factorization\n\n- **Collaborative Filtering for Implicit Feedback Datasets (2008)**, Hu, Koren, and Volinsky, [@IEEE](https://ieeexplore.ieee.org/document/4781121) 🔒 / [@author](http://yifanhu.net/PUB/cf.pdf) 🔑.\n\n#### Regression\n\n##### Elastic Net\n\n- **Regularization and variable selection via the Elastic Net (2005)**, Zou and Hastie, [@CiteSeerX](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.124.4696).\n\n##### Lasso\n\n- **Regression Shrinkage and Selection Via the Lasso (1994)**, Tibshirani, [@CiteSeerX](http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.35.7574).\n- See also: **Linear Inversion of Band-Limited Reflection Seismograms (1986)**, Santosa and Symes, [@SIAM](https://epubs.siam.org/doi/10.1137/0907087) 🏛️.\n\n#### Software\n\n##### MapReduce\n\n- **MapReduce: Simplified Data Processing on Large Clusters (2004)**, Dean and Ghemawat, [@ai.google](https://ai.google/research/pubs/pub62).\n\n##### TensorFlow\n\n- **TensorFlow: A system for large-scale machine learning (2016)**, Abadi et al., [@ai.google](https://ai.google/research/pubs/pub45381) + [@author](https://www.tensorflow.org/) 🌐.\n\n##### Torch\n\n- **Torch: A Modular Machine Learning Software Library (2002)**, Collobert, Bengio, and Mariéthoz, [@Idiap](http://publications.idiap.ch/index.php/publications/show/712) + [@author](http://torch.ch/) 🌐.\n- See also: **Automatic differentiation in PyTorch (2017)**, Paszke et al., [@OpenReview](https://openreview.net/forum?id=BJJsrmfCZ) 🔬 + [@Github](https://github.com/pytorch/pytorch) 💽.\n\n#### Supervised Learning\n\n##### k-Nearest 
Neighbors\n\n- **Nearest neighbor pattern classification (1967)**, Cover and Hart, [@IEEE](https://ieeexplore.ieee.org/abstract/document/1053964) 🔒.\n- See also: **E. Fix and J.L. Hodges (1951): An Important Contribution to Nonparametric Discriminant Analysis and Density Estimation (1989)**, Silverman and Jones, [@JSTOR](https://www.jstor.org/stable/1403796?seq=1) 🔒.\n\n##### Support Vector Machine\n\n- **Support Vector Networks (1995)**, Cortes and Vapnik, [@Springer](https://link.springer.com/article/10.1023/A:1022627411411).\n\n#### Statistics\n\n##### The Bootstrap\n\n- **Bootstrap Methods: Another Look at the Jackknife (1979)**, Efron, [@Project Euclid](https://projecteuclid.org/euclid.aos/1176344552).\n- See also: **Problems in Plane Sampling (1949)**, Quenouille, [@Project Euclid](https://projecteuclid.org/euclid.aoms/1177729989) 🏛️.\n- See also: **Notes on Bias in Estimation (1956)**, Quenouille, [@JSTOR](https://www.jstor.org/stable/2332914?seq=1) 🏛️.\n- See also: **Bias and Confidence in Not-quite Large Samples (1958)**, Tukey, [@Project Euclid](https://projecteuclid.org/euclid.aoms/1177706647) 🔬.\n\n### Credits\n\nA special thanks to Alexandre Passos for his comment on [this Reddit thread](https://www.reddit.com/r/MachineLearning/comments/hj4cx/classic_papers_in_machine_learning/c1vt6ny/), as well as the responders to [this Quora post](https://www.quora.com/What-are-some-of-the-best-research-papers-or-books-for-Machine-learning). They suggested many of the papers that got this list off to a great start.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaturkel%2Flearning-papers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdaturkel%2Flearning-papers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaturkel%2Flearning-papers/lists"}