https://github.com/bradleyboehmke/data-science-learning-resources

A collection of data science and machine learning resources that I've found helpful (I only post what I've read!)

data-science machine-learning


# Data Science Learning Resources

- [Programming](https://github.com/bradleyboehmke/data-science-learning-resources/blob/master/README.md#programming)
- [Machine Learning](https://github.com/bradleyboehmke/data-science-learning-resources/blob/master/README.md#machine-learning)
- [Leadership & Strategy](https://github.com/bradleyboehmke/data-science-learning-resources/blob/master/README.md#leadership--strategy)


## Programming

### General

- [The Pragmatic Programmer](https://www.amazon.com/Pragmatic-Programmer-Journeyman-Master/dp/020161622X/ref=sr_1_1?ie=UTF8&qid=1544653964&sr=8-1&keywords=the+pragmatic+programmer) (Book)
- [Clean Code](https://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882) (Book)
- [Architecture Playbook](https://nocomplexity.com/documents/arplaybook/index.html) (Online guide)

### Python

- [A Whirlwind Tour of Python](https://jakevdp.github.io/WhirlwindTourOfPython/index.html) (Book)
- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/) (Book)
- [Python Tricks](https://www.amazon.com/Python-Tricks-Buffet-Awesome-Features/dp/1775093301) (Book)
- [Learning Python](https://www.amazon.com/Learning-Python-5th-Mark-Lutz/dp/1449355730) (Book)
- [Effective Python](https://effectivepython.com/) (Book)

### R

- [R for Data Science](https://r4ds.had.co.nz/) (Book)
- [Advanced R](https://adv-r.hadley.nz/) (Book)
- [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/) (Book)
- [bookdown: Authoring Books and Technical Documents with R Markdown](https://bookdown.org/yihui/bookdown/) (Book)
- [Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving](https://www.amazon.com/Data-Science-Approach-Computational-Reasoning/dp/1482234815/ref=sr_1_4?keywords=data+science+in+r&qid=1578174035&s=books&sr=1-4) (Book)
- [Automated Data Collection with R](https://www.amazon.com/Automated-Data-Collection-Practical-Scraping-ebook/dp/B014T25K5O/ref=sr_1_2?keywords=Automated+Data+Collection+with+R&qid=1578174122&s=books&sr=1-2) (Book)
- [Introduction to Data Science](https://rafalab.github.io/dsbook/) (Book)

### Spark

- [Spark: The Definitive Guide: Big Data Processing Made Simple](https://www.amazon.com/Spark-Definitive-Guide-Processing-Simple/dp/1491912219/ref=sr_1_1?keywords=spark+oreilly&qid=1578175560&s=books&sr=1-1) (Book)
- [Learning Spark: Lightning-Fast Big Data Analysis](https://www.amazon.com/Learning-Spark-Lightning-Fast-Data-Analysis/dp/1449358624/ref=sr_1_2?keywords=spark+oreilly&qid=1578175595&s=books&sr=1-2) (Book)
- [Mastering Spark with R: The Complete Guide to Large-Scale Analysis and Modeling](https://www.amazon.com/Mastering-Spark-Complete-Large-Scale-Analysis/dp/149204637X/ref=sr_1_1?crid=3TYKU59XRH5HV&keywords=mastering+spark+with+r&qid=1578175640&s=books&sprefix=mastering+spark+%2Cstripbooks%2C152&sr=1-1) (Book)

### Command Line

- [The Missing Semester of Your CS Education](https://missing.csail.mit.edu/) (Online course)
- [Learning the bash Shell](https://www.amazon.com/Learning-bash-Shell-Programming-Nutshell/dp/0596009658/ref=sr_1_1?crid=3F1N8U9IGYOWA&keywords=learning+the+bash+shell%2C+3rd+edition&qid=1581944136&sprefix=learning+the+basdh%2Caps%2C345&sr=8-1) (Book)
- [The Art of the Command Line](https://github.com/jlevy/the-art-of-command-line) (GitHub resources)
- [explainshell.com](https://explainshell.com/) (Online help)

### Containers

- [Docker tips & tricks or just useful commands](https://medium.com/@clasikas/docker-tips-tricks-or-just-useful-commands-6e1fd8220450) (Online article)
- [Rocker: R configurations for Docker](https://github.com/rocker-org/rocker) (GitHub resources)
- [Docker and Python: making them play nicely and securely for Data Science and ML](https://us.pycon.org/2020/schedule/presentation/175/) (PyCon Talk)

### Functional Programming

- [An Introduction to the Basic Principles of Functional Programming](https://medium.freecodecamp.org/an-introduction-to-the-basic-principles-of-functional-programming-a2c2a15c84) (Online article)
- [R for Data Science, Ch. 21](https://r4ds.had.co.nz/iteration.html) (Book)
- [Advanced R, Ch. 9](https://adv-r.hadley.nz/functionals.html) (Book)
- [Jenny Bryan's purrr tutorials](https://jennybc.github.io/purrr-tutorial/) (Online tutorial)
- [Foundations of Functional Programming with purrr](https://www.datacamp.com/courses/foundations-of-functional-programming-with-purrr) (DataCamp)
- [Intermediate Functional Programming with purrr](https://www.datacamp.com/courses/intermediate-functional-programming-with-purrr) (DataCamp)

### Version Control

- [Excuse me, do you have a moment to talk about version control?](https://peerj.com/preprints/3159/) (Paper)
- [Happy Git and GitHub for the useR](http://happygitwithr.com/) (Book)
- [Learn Git](https://www.atlassian.com/git/tutorials/learn-git-with-bitbucket-cloud) (Online tutorial)
- [Introduction to Git In 16 Minutes](https://vickyikechukwu.hashnode.dev/introduction-to-git-in-16-minutes) (Online tutorial)
- [Git Commit Message Style Guide](http://udacity.github.io/git-styleguide/) (Online guide)

### Code Packaging

- [Python Packaging Authority](https://www.pypa.io/en/latest/)
- [Python Packaging User Guide](https://packaging.python.org/)
- [R Packages](https://r-pkgs.org/)

### Style Guide, Readability, Best Practices

- [The Art of Readable Code](https://www.amazon.com/Art-Readable-Code-Practical-Techniques-ebook/dp/B0064CZ1XE) (Book)
- [The Tidyverse Style Guide](https://style.tidyverse.org/) (Online book)
- [PEP 8 -- Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/) (Online guide)
- [Guidelines for code reviews](https://github.com/lyst/MakingLyst/tree/master/code-reviews) (README)
- [Code Review Best Practices](https://www.kevinlondon.com/2015/05/05/code-review-best-practices.html) (Blog post)

### Testing

- [Testing R Code](https://www.amazon.com/Testing-Code-Chapman-Hall-CRC/dp/1498763650) (Book)
- [Python Testing with pytest](https://www.amazon.com/Python-Testing-pytest-Effective-Scalable/dp/1680502409/ref=sr_1_3?keywords=Python+Testing+with+pytest&qid=1578174634&s=books&sr=1-3) (Book)
- [Multiply your Testing Effectiveness with Parameterized Testing](https://us.pycon.org/2020/schedule/presentation/172/) (PyCon Talk)
- [Test-Driven Development](https://www.amazon.com/Test-Driven-Development-Kent-Beck/dp/0321146530) (Book)

## Machine Learning

### General

- [Introduction to Statistical Learning](https://www-bcf.usc.edu/~gareth/ISL/) (Book)
- [Applied Predictive Modeling](http://appliedpredictivemodeling.com/) (Book)
- [Elements of Statistical Learning](https://web.stanford.edu/~hastie/ElemStatLearn/) (Book)
- [Computer Age Statistical Inference](https://www.amazon.com/Computer-Age-Statistical-Inference-Mathematical/dp/1107149894) (Book)
- [Statistical Modeling: The Two Cultures](http://www2.math.uu.se/~thulin/mm/breiman.pdf) (Paper)
- [Deep Learning](https://www.amazon.com/slp/deep-learning/qh2mz33y5875zb7) (Book)
- Hands-On Machine Learning with Scikit-Learn & TensorFlow ([Book](https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291/ref=sr_1_3?crid=R2UBOSHHEJPM&keywords=hands+on+machine+learning+with+scikit+learn+and+tensorflow&qid=1558957194&s=books&sprefix=hands+on+machine%2Cstripbooks%2C135&sr=1-3) | [GitHub](https://github.com/ageron/handson-ml))
- [Hands-On Machine Learning with R](https://bradleyboehmke.github.io/HOML/) (Book)
- [Google's Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course) (MOOC)
- [Rules of Machine Learning: Best Practices for ML Engineering](http://bit.ly/313AP5H) (Article)
- [How to Write Design Docs for Machine Learning Systems](https://eugeneyan.com/writing/ml-design-docs/?utm_source=convertkit&utm_medium=email&utm_campaign=Writing+Design+Docs+for+Machine+Learning+Systems%20-%205455802) (Article)

### Unsupervised Modeling

- [ISLR: Ch. 10.3 Clustering Methods](http://www-bcf.usc.edu/~gareth/ISL/) (Book chapter)
- [A K-Means Clustering Algorithm](https://www.jstor.org/stable/pdf/2346830.pdf?casa_token=DyTW0ZLNC4gAAAAA:VNX2TGwDfcs5foMa96ZxnOM2mjaQU1WuCOLL8qF6iDBWp6ClU8-i2-OSXKbtO1uHm6_1oda_2egpvgYCvaix8UxUqUryqZj-Pw3G4m771Ev5-4kL46Y) (Paper)
- [Generalized Low Rank Models](https://stanford.edu/~boyd/papers/pdf/glrm.pdf) (Paper)
- [Deep Learning Ch. 15 Autoencoders](https://www.amazon.com/slp/deep-learning/qh2mz33y5875zb7) (Book chapter)
- Hands-On Machine Learning with Scikit-Learn & TensorFlow Ch. 15 Autoencoders ([Book chapter](https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291/ref=sr_1_3?crid=R2UBOSHHEJPM&keywords=hands+on+machine+learning+with+scikit+learn+and+tensorflow&qid=1558957194&s=books&sprefix=hands+on+machine%2Cstripbooks%2C135&sr=1-3) | [GitHub resource](https://github.com/ageron/handson-ml/blob/master/15_autoencoders.ipynb))
- [Sparse autoencoder](https://web.stanford.edu/class/cs294a/sparseAutoencoder.pdf) (Andrew Ng CS294A lecture notes)

### A/B Testing

- [Lessons from Running Thousands of A/B Tests](https://exp-platform.com/Documents/2014-10-11MITCodeKohaviExP.pdf) (Online presentation with many references)
- [Online Controlled Experiments at Large Scale](http://www.academia.edu/download/40088649/Online_Controlled_Experiments_at_Large_S20151116-5472-1cpgarq.pdf) (Paper)
- [Peeking at A/B Tests](http://library.usc.edu.ph/ACM/KKD%202017/pdfs/p1517.pdf) (Paper)
- [Multi-armed Bandit](https://support.google.com/analytics/answer/2844870?hl=en&ref_topic=1745207) (Online tutorial)
- [A Modern Bayesian Look at the Multi-armed Bandit](https://pdfs.semanticscholar.org/0323/0c3c83dbb013c3e610702c6f650307f0ce5c.pdf) (Paper behind above online tutorial)
- [Predicting Search Satisfaction Metrics with Interleaved Comparisons](https://www.microsoft.com/en-us/research/publication/predicting-search-satisfaction-metrics-with-interleaved-comparisons/) (Paper)
- [Evaluating Retrieval Performance using Clickthrough Data](https://www.cs.cornell.edu/people/tj/publications/joachims_02b.pdf) (Paper)

### Multivariate Adaptive Regression Splines

- [Multivariate Adaptive Regression Splines](https://projecteuclid.org/download/pdf_1/euclid.aos/1176347963) (Friedman's original paper)
- [APM: Ch. 7.2 Multivariate Adaptive Regression Splines](http://appliedpredictivemodeling.com/) (Book chapter)
- [ESL: Ch. 9.4 Multivariate Adaptive Regression Splines](https://web.stanford.edu/~hastie/ElemStatLearn/) (Book chapter)
- [Notes on the __earth__ package](http://www.milbo.org/doc/earth-notes.pdf) (Paper)

### K-Nearest Neighbor

- [k-Nearest neighbour classifiers](https://www.researchgate.net/profile/Sarah_Delany/publication/228686398_k-Nearest_neighbour_classifiers/links/0fcfd50d0c1d1f41ad000000/k-Nearest-neighbour-classifiers.pdf) (Paper)
- [APM: Ch. 7.4 & 13.5 K-Nearest Neighbors](http://appliedpredictivemodeling.com/) (Book chapter)
- [ESL: Ch. 13.3 k-Nearest-Neighbor Classifiers](https://web.stanford.edu/~hastie/ElemStatLearn/) (Book chapter)

### Random Forests

- [An Introduction to Recursive Partitioning Using the RPART Routines](https://cran.r-project.org/web/packages/rpart/vignettes/longintro.pdf) (Paper)
- [Random Forests - Leo Breiman's original research paper](https://link.springer.com/content/pdf/10.1023/A:1010933404324.pdf) (Paper)

### Gradient Boosting Machines

- [How to explain gradient boosting](https://explained.ai/gradient-boosting/) (Online tutorial)
- [Trevor Hastie - Gradient Boosting & Random Forests at H2O World 2014](https://www.youtube.com/watch?v=wPqtzj5VZus) (YouTube)
- [Trevor Hastie - Data Science of GBM (2013)](http://www.slideshare.net/0xdata/gbm-27891077) (slides)
- [Mark Landry - Gradient Boosting Method and Random Forest at H2O World 2015](https://www.youtube.com/watch?v=9wn1f-30_ZY) (YouTube)
- [Peter Prettenhofer - Gradient Boosted Regression Trees in scikit-learn at PyData London 2014](https://www.youtube.com/watch?v=IXZKgIsZRm0) (YouTube)
- [Alexey Natekin and Alois Knoll - Gradient boosting machines, a tutorial](http://journal.frontiersin.org/article/10.3389/fnbot.2013.00021/full) (Paper)
- [How to Train XGBoost With Spark](https://databricks.com/blog/2020/11/16/how-to-train-xgboost-with-spark.html) (Blog)
- [Training XGBoost4J-Spark with PySpark](https://databricks.com/notebooks/xgboost/xgboost4j-spark-example.html) (Tutorial notebook)
- [Use XGBoost on Databricks](https://docs.databricks.com/applications/machine-learning/train-model/xgboost.html) (Tutorial notebooks)

### Deep Learning

- [Deep Learning with R](https://www.amazon.com/Deep-Learning-R-Francois-Chollet/dp/161729554X/ref=sr_1_3?keywords=deep+learning+with+r&qid=1573422070&sr=8-3) (Book)
- [Deep Learning with Python](https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438/ref=sr_1_3?crid=1C568FMWA1B0C&keywords=deep+learning+with+python&qid=1573422053&sprefix=deep+learning+with+%2Caps%2C154&sr=8-3) (Book)
- [Deep Learning Specialization](https://www.coursera.org/specializations/deep-learning) (MOOC)
- [keras.rstudio.com](https://keras.rstudio.com/) (Online articles & tutorials)
- [blogs.rstudio.com/tensorflow](https://blogs.rstudio.com/tensorflow/) (Online articles & tutorials)
- [Illustrated Guide to Recurrent Neural Networks](http://www.kurious.pub/blog/Illustrated-Guide-to-Recurrent-Neural-Networks-4) (Blog)
- [Illustrated Guide on Vanishing Gradients](http://www.kurious.pub/blog/Illustrated-Guide-on-Vanishing-Gradients-5) (Blog)
- [Illustrated Guide to LSTMs and GRUs](http://www.kurious.pub/blog/Illustrated-Guide-to-LSTMs-and-GRUs-A-step-by-step-explanation-6) (Blog)
- [Understanding LSTMs](https://colah.github.io/posts/2015-08-Understanding-LSTMs/) (Blog)
- [Rohan & Lenny: Recurrent Neural Networks & LSTMs](https://ayearofai.com/rohan-lenny-3-recurrent-neural-networks-10300100899b) (Blog)
- [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) (Blog)
- [Revisiting Small Batch Training for Deep Neural Networks](https://github.com/bradleyboehmke/data-science-learning-resources/blob/master/resources/Revisiting%20Small%20Batch%20Training%20for%20Deep%20Neural%20Networks.pdf) (Paper)
- [On Loss Functions for Deep Neural Networks in Classification](https://github.com/bradleyboehmke/data-science-learning-resources/blob/master/resources/On%20Loss%20Functions%20for%20Deep%20Neural%20Networks%20in%20Classification.pdf) (Paper)
- [Practical Recommendations for Gradient-Based Training of Deep Architectures](https://github.com/bradleyboehmke/data-science-learning-resources/blob/master/resources/Practical-Recommendations-for-Gradient-Based-Training-of-Deep-Architectures.pdf) (Paper)
- [Efficient BackProp](https://github.com/bradleyboehmke/data-science-learning-resources/blob/master/resources/efficient-backprop-lecun.pdf) (Paper)
- [Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification](https://github.com/bradleyboehmke/data-science-learning-resources/blob/master/resources/delving-deep-into-rectifiers-he-2015.pdf) (Paper)
- [Cyclical Learning Rates for Training Neural Networks](https://github.com/bradleyboehmke/data-science-learning-resources/blob/master/resources/Cyclical-Learning-Rates-for-Training-Neural-Networks.pdf) (Paper)
- [A Disciplined Approach to Neural Network Hyperparameters: Part 1 – Learning Rate, Batch Size, Momentum, and Weight Decay](https://github.com/bradleyboehmke/data-science-learning-resources/blob/master/resources/nn-hyperparameter-tuning.pdf) (Paper)
- [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/pdf/1506.01497.pdf) (Paper)

### Ensembles / Model Stacking / Super Learners

- [Ensemble Methods in Machine Learning](http://web.engr.oregonstate.edu/~tgd/publications/mcs-ensembles.pdf) (Paper)
- [Stacked Regressions](http://statistics.berkeley.edu/sites/default/files/tech-reports/367.pdf) (Paper)
- [Super Learner](https://www.degruyter.com/view/j/sagmb.2007.6.issue-1/sagmb.2007.6.1.1309/sagmb.2007.6.1.1309.xml) (Paper)

### Natural Language Processing / Text Mining

- [Text Mining with R](https://www.tidytextmining.com/) (Book)
- [Probabilistic Topic Models](http://www.cs.columbia.edu/~blei/papers/Blei2012.pdf) (Paper)
- [The Illustrated Word2vec](http://jalammar.github.io/illustrated-word2vec/) (Online tutorial)
- [Sebastian Ruder's series on Word Embeddings](https://ruder.io/word-embeddings-1/index.html) (Online articles & tutorials)
- [Neural Models for Information Retrieval](https://arxiv.org/pdf/1705.01509.pdf) (Paper)
- [Why do we use word embeddings in NLP?](https://towardsdatascience.com/why-do-we-use-embeddings-in-nlp-2f20e1b632d2) (Blog)

### Recommendation Systems

- [Collaborative Filters for Recommendation Systems](https://course.fast.ai/videos/?lesson=6) (Fast.ai Deep Learning Lesson, starts at 1:25:00)
- [How to Measure and Mitigate Position Bias](https://eugeneyan.com/writing/position-bias/) (Blog)
- [Counterfactual Evaluation for Recommendation Systems](https://eugeneyan.com/writing/counterfactual-evaluation/) (Blog)

### Tuning

- [Deep Learning Tuning Playbook](https://github.com/google-research/tuning_playbook) (GitHub repo README)
- [Hyperparameters and Tuning Strategies for Random Forest](https://arxiv.org/pdf/1804.03515.pdf) (Paper)
- [Tunability: Importance of Hyperparameters of Machine Learning Algorithms](https://arxiv.org/pdf/1802.09596.pdf) (Paper)
- [Machine Learning Benchmarks and Random Forest Regression](https://cloudfront.escholarship.org/dist/prd/content/qt35x3v9t4/qt35x3v9t4.pdf) (Paper)
- [Random Search for Hyperparameter Optimization](http://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf) (Paper)
- [Algorithms for Hyper-Parameter Optimization](https://papers.nips.cc/paper/2011/hash/86e8f7ab32cfd12577bc2619bc635690-Abstract.html) (Paper)
- [Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures](https://proceedings.mlr.press/v28/bergstra13.html) (Paper)

### Feature Engineering

- [Feature Engineering for Machine Learning](http://shop.oreilly.com/product/0636920049081.do) (Book)
- [Feature Engineering and Selection: A Practical Approach for Predictive Models](http://www.feat.engineering/) (Book)
- [Feature Stores - A Hierarchy of Needs](https://eugeneyan.com/writing/feature-stores/?utm_source=convertkit&utm_medium=email&utm_campaign=What+You+Need+to+Know+About+the+Feature+Store+Hierarchy+of+Needs%20-%205369173) (Article)

### Feature Selection

- [Feature Selection with the Boruta Package](https://www.jstatsoft.org/article/view/v036i11/v36i11.pdf) (Paper)
- [APM: Ch. 19 An Introduction to Feature Selection](http://appliedpredictivemodeling.com/) (Book chapter)

### Machine Learning Interpretability

- [Scott Lundberg's presentation on SHAP](https://youtu.be/B-c8tIgchu0)
- [H2O.ai Machine Learning Interpretability Resources](https://github.com/h2oai/mli-resources) (GitHub resources)
- [Patrick Hall's Awesome Machine Learning Interpretability Resources](https://github.com/jphall663/awesome-machine-learning-interpretability) (GitHub resources)
- [Interpretable Machine Learning](https://christophm.github.io/interpretable-ml-book/) (Book)
- [Visualizing the Feature Importance for Black Box Models](https://arxiv.org/abs/1804.06620) (Paper)
- [A Simple and Effective Model-Based Variable Importance Measure](https://arxiv.org/abs/1805.04755) (Paper)
- [Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation](https://arxiv.org/abs/1309.6392) (Paper)
- [pdp: An R Package for Constructing Partial Dependence Plots](https://journal.r-project.org/archive/2017/RJ-2017-016/RJ-2017-016.pdf) (Paper)
- ["Why Should I Trust You?": Explaining the Predictions of Any Classifier](https://arxiv.org/abs/1602.04938) (Paper)
- [A Unified Approach to Interpreting Model Predictions](https://arxiv.org/abs/1705.07874) (Paper)
- [Consistent Individualized Feature Attribution for Tree Ensembles](https://arxiv.org/abs/1802.03888) (Paper)
- [On the Art and Science of Machine Learning Explanations](https://arxiv.org/pdf/1810.02909) (Paper)
- [Explanation in artificial intelligence: Insights from the social sciences](https://arxiv.org/pdf/1706.07269) (Paper)
- [Please Stop Permuting Features: An Explanation and Alternatives](https://arxiv.org/abs/1905.03151) (Paper)
- [A Stratification Approach to Partial Dependence for Codependent Variables](https://arxiv.org/abs/1907.06698) (Paper)
- [Explaining Machine Learning Classifiers through Diverse Counterfactual Examples](https://www.microsoft.com/en-us/research/publication/explaining-machine-learning-classifiers-through-diverse-counterfactual-examples/) (Paper)

### Auto ML

- [A Review of Automatic Selection Methods for Machine Learning Algorithms and Hyperparameter Values](http://pages.cs.wisc.edu/~gangluo/automatic_selection_review.pdf) (Paper)
- [Learning Multiple Defaults for Machine Learning Algorithms](https://arxiv.org/abs/1811.09409) (Paper)

### Benchmarking

- [The Design and Analysis of Benchmark Experiments](https://www.jstor.org/stable/pdf/27594139.pdf) (Paper)
- [Szilard Pafka's ML Benchmarking Research](https://github.com/szilard/benchm-ml) (GitHub resources)
- [Data-driven advice for applying machine learning to bioinformatics problems](https://arxiv.org/pdf/1708.05070.pdf) (Paper)

### Resampling Procedures

- [Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning](https://arxiv.org/abs/1811.12808) (Paper)
- [Futility Analysis in the Cross-Validation of Machine Learning Models](https://arxiv.org/pdf/1405.6974.pdf) (Paper)
- [Estimating Classification Error Rate: Repeated Cross-validation, Repeated Hold-out, and Bootstrap](https://github.com/bradleyboehmke/data-science-learning-resources/blob/master/resources/resampling-comparison.pdf) (Paper)

### Productionalization

- [150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com](https://dl.acm.org/doi/10.1145/3292500.3330744) (Paper)
- [Hidden Technical Debt in Machine Learning Systems](https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf) (Paper)
- [Deep Learning in Production](https://github.com/ahkarami/Deep-Learning-in-Production) (GitHub resources)
- [Building Riviera: A Declarative Real-Time Feature Engineering Framework](https://doordash.engineering/2021/03/04/building-a-declarative-real-time-feature-engineering-framework/?utm_source=convertkit&utm_medium=email&utm_campaign=Writing+Design+Docs+for+Machine+Learning+Systems%20-%205455802) (Blog - DoorDash)
- [Software Engineering for Machine Learning: A Case Study](https://www.microsoft.com/en-us/research/uploads/prod/2019/03/amershi-icse-2019_Software_Engineering_for_Machine_Learning.pdf) (Paper - Microsoft)
- [The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction](https://research.google/pubs/pub46555/) (Paper - Google)
- [Designing Machine Learning Systems](https://a.co/d/fcDGU9O) (Book)
- [Machine Learning Operations (MLOps): Overview, Definition, and Architecture](https://arxiv.org/abs/2205.02302) (Paper)

### Model Monitoring

- [Data Management for ML-based Analytics and Beyond](https://ddkang.github.io/papers/2023/jds.pdf) (Paper)

## Leadership & Strategy

### Management & Leadership

- [Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist](https://multithreaded.stitchfix.com/blog/2019/03/11/FullStackDS-Generalists/) (Blog - Stitch Fix)
- [Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department](https://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/) (Blog - Stitch Fix)
- [The Engineer/Manager Pendulum](https://charity.wtf/2017/05/11/the-engineer-manager-pendulum/) (Blog)
- [Lessons learned managing the GitLab Data team](https://about.gitlab.com/blog/2020/02/10/lessons-learned-as-data-team-manager/) (Blog - GitLab)
- [The Manager's Path: A Guide for Tech Leaders Navigating Growth and Change](https://www.amazon.com/Managers-Path-Leaders-Navigating-Growth/dp/1491973897/ref=sr_1_1?crid=1SPIBADHM1RXO&dchild=1&keywords=The+Manager%27s+Path%3A+A+Guide+for+Tech+Leaders+Navigating+Growth+and+Change&qid=1609688389&sprefix=recessed+lighting+ang%2Caps%2C185&sr=8-1) (Book)
- [Who is fit to lead data science?](https://www.kdnuggets.com/2021/02/fit-lead-data-science.html) (Blog - KDnuggets)
- [Platform Revolution](https://www.amazon.com/Platform-Revolution-Networked-Markets-Transforming/dp/0393354350/ref=asc_df_0393354350/?tag=hyprod-20&linkCode=df0&hvadid=312175933381&hvpos=&hvnetw=g&hvrand=1035572653413444981&hvpone=&hvptwo=&hvqmt=&hvdev=c&hvdvcmdl=&hvlocint=&hvlocphy=9015829&hvtargid=pla-303822642427&psc=1&tag=&ref=&adgrpid=60258871817&hvpone=&hvptwo=&hvadid=312175933381&hvpos=&hvnetw=g&hvrand=1035572653413444981&hvqmt=&hvdev=c&hvdvcmdl=&hvlocint=&hvlocphy=9015829&hvtargid=pla-303822642427) (Book)
- [No Rules Rules: Netflix and the Culture of Reinvention](https://www.amazon.com/No-Rules-Netflix-Culture-Reinvention/dp/1984877860/ref=sr_1_2?dchild=1&keywords=no+rules+rules&qid=1603384965&s=books&sr=1-2) (Book)
- [Living by the Code](https://www.amazon.com/Living-Code-First-Developers-Innovators/dp/1942878826/ref=sr_1_1?dchild=1&keywords=living+by+the+code&qid=1614096281&sr=8-1) (Book)
- [The Best of Both Worlds: Unlocking the Potential of Hybrid Work for Software Engineers](https://www.microsoft.com/en-us/research/publication/the-best-of-both-worlds-unlocking-the-potential-of-hybrid-work-for-software-engineers/) (Paper)

### Cloud Strategy

- [The Cost of Cloud, a Trillion Dollar Paradox](https://a16z.com/2021/05/27/cost-of-cloud-paradox-market-cap-cloud-lifecycle-scale-growth-repatriation-optimization/) (Blog - Andreessen Horowitz)
- [From Cloud Computing to Sky Computing](https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s02-stoica.pdf) (Paper)

### Product

- [The Influential Product Manager: How to Lead and Launch Successful Technology Products](https://www.amazon.com/Influential-Product-Manager-Successful-Technology-ebook/dp/B07Y3ZG5NV/ref=sr_1_1_sspa?crid=2GQMTCRZIC8KP&dchild=1&keywords=the+influential+product+manager&qid=1603384998&sprefix=The+influential+product%2Cstripbooks%2C180&sr=8-1-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEyQVUxOVQ5RjRQRENVJmVuY3J5cHRlZElkPUEwMzIzNDAzQjlCOEdFWjZaVjA3JmVuY3J5cHRlZEFkSWQ9QTA4NTU1NzhYQzhWWUhaSUE1NFMmd2lkZ2V0TmFtZT1zcF9hdGYmYWN0aW9uPWNsaWNrUmVkaXJlY3QmZG9Ob3RMb2dDbGljaz10cnVl) (Book)
- [Mastering Product Management: A Step-by-Step Guide](https://www.amazon.com/Mastering-Product-Management-Step-Step/dp/1733839003/ref=sr_1_3?crid=PYM2K7DWJSNC&dchild=1&keywords=mastering+product+management+a+step-by-step+guide&qid=1607368736&sprefix=Mastering+Product%2Caps%2C202&sr=8-3) (Book)

### Performance Reviews

- [Preparing for performance reviews ahead of time](https://newsletter.pragmaticengineer.com/p/preparing-for-performance-reviews) (Blog)
- [Get your work recognized: write a brag document](https://jvns.ca/blog/brag-documents/) (Blog + template)
- [Don't do invisible work](https://www.youtube.com/watch?v=HiF83i1OLOM&list=PLYXaKIsOZBsu3h2SSKEovRn7rGy7wkUAV&index=30) (Presentation)
- [Work log template for Software Engineers](https://docs.google.com/document/d/10szdym6SXJ_3GH6-nefOOfzEX1lYlpSu74MAFbpX2t0/edit) (Template)
- [Sending weekly 5-15 updates](https://lethain.com/weekly-updates/) (Blog)