awesome-machine-learning-engineer
🤓 A curated awesome list of Machine Learning Engineering resources. Feel free to contribute!
https://github.com/superlinear-ai/awesome-machine-learning-engineer
-
Communication
- Nonviolent communication - How to deliver constructive feedback in difficult situations (15 min)
- Bike-shedding: how mature are you as an engineer? - How to avoid and call out bike-shedding (5 min)
- E-mail like a boss - How to write better e-mails (5 min)
- Stop Swiss Cheesing your calendar - How to manage your calendar so you can focus (15 min)
- Presentation Rules - How to create a great slide deck (30 min)
- SMART criteria - How to define goals (15 min)
- MECE principle - How to fully decompose a problem into a structured list (15 min)
- SCQA: What is it, how does it work, and how can it help me? - How to structure your presentations, proposals, and sales outlines (15 min)
- No More Misunderstandings - How to avoid miscommunication by paraphrasing (15 min)
- The Halo effect - How to recognize and use the Halo effect to your advantage (15 min)
- Four-sides model - How to communicate effectively by considering how the receiver interprets your message (30 min)
- BLUF: The Military Standard That Can Make Your Writing More Powerful - How to make your communication more powerful (5 min)
-
Software Engineering
-
API design
- Semantic Versioning - How to bump the version of your apps and packages (15 min)
- `__all__` and wild imports in Python - How `__all__` defines the public API of your Python packages (15 min)
- APIs for Machine Learning - How to design RESTful APIs for Machine Learning applications (30 min)
- FastAPI docs - How to build RESTful APIs that correspond one-to-one with an OpenAPI specification (1 day)
- The Rule of Three - When to build reusable components and when not to (15 min)
- Falsehoods programmers believe about names - How to avoid common pitfalls about names (15 min)
- Command Line Interface Guidelines - How to write great CLIs (1 hour)
- Zalando's RESTful API guidelines - How to design RESTful APIs (1 day)
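The core bump rules from the Semantic Versioning entry fit in a few lines of Python. This is a minimal sketch: `bump` is a hypothetical helper, and it assumes a plain `MAJOR.MINOR.PATCH` string with no pre-release or build metadata.

```python
def bump(version: str, part: str) -> str:
    """Return the next version after a "major", "minor", or "patch" bump.

    Assumes a plain MAJOR.MINOR.PATCH string (no pre-release or build metadata).
    """
    major, minor, patch = (int(n) for n in version.split("."))
    if part == "major":
        # Incompatible API change: reset minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backwards-compatible new functionality: reset patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backwards-compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("1.4.2", "minor"))  # 1.5.0
```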
-
Workflow
- The seven rules of a great Git commit message - How to write great Git commit messages (15 min)
- Learn Git Branching - Practice Git from beginner to advanced (1 hour)
- Keep a Changelog - How to keep a changelog for your apps and packages (30 min)
- Conventional Commits - How to prefix your commit messages to automate [Semantic Versioning](https://semver.org/) and [Keep a Changelog](https://keepachangelog.com/) (15 min)
- A successful Git branching model - How to release software with Git (15 min)
- Code Health: Respectful Reviews == Useful Reviews - How to communicate code review comments respectfully (15 min)
- The Code Review Pyramid - What to look for and what to automate when reviewing a Pull Request (15 min)
- Poetry workspace plugin - How to create and manage a Poetry-based monorepo (15 min)
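To see how Conventional Commits can drive Semantic Versioning, here is a simplified sketch that maps a batch of commit messages to a bump level: `feat` gives a minor bump, `fix` a patch bump, and a `!` after the type or a `BREAKING CHANGE:` footer gives a major bump. The regex and the `bump_level` helper are assumptions for illustration and ignore parts of the spec (for example the `BREAKING-CHANGE` synonym); tools such as commitizen handle this for real projects.

```python
import re
from typing import List, Optional

# Simplified Conventional Commits header: type(optional-scope)optional-!: description
HEADER = re.compile(r"^(?P<type>\w+)(?:\([^)]*\))?(?P<breaking>!)?: ")


def bump_level(messages: List[str]) -> Optional[str]:
    """Return "major", "minor", "patch", or None for a list of commit messages."""
    level: Optional[str] = None
    for message in messages:
        header, _, body = message.partition("\n")
        match = HEADER.match(header)
        if match is None:
            continue  # Not a Conventional Commit; this sketch ignores it.
        if match["breaking"] or "BREAKING CHANGE:" in body:
            return "major"  # Any breaking change wins outright.
        if match["type"] == "feat":
            level = "minor"
        elif match["type"] == "fix" and level is None:
            level = "patch"
    return level


print(bump_level(["fix: handle empty input", "feat(api): add /predict endpoint"]))  # minor
```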
-
Python patterns
- The Definitive Guide to Python import Statements - How to write import statements (30 min)
- Understanding Python's logging module - How to use the `logging` module effectively (30 min)
- Don't run code at import time - Why you shouldn't run code at import time
- Please fix your decorators - Why you should probably use [`wrapt`](https://github.com/GrahamDumpleton/wrapt) to write your decorators (30 min)
- Do not log - What you should be doing instead of logging (30 min)
- The Little Book of Python Anti-Patterns - A collection of Python anti-patterns (X hours)
- Effective Python - A collection of Python idioms (X hours)
- SOLID - A standard set of five object-oriented design principles (1 hour)
- What the f*ck Python! - How to master Python by understanding its edge cases (1 day)
-
Typing
- The Comprehensive Guide to mypy - How to write type annotations in Python (1 hour)
- Mypy generics - How to use `TypeVar`s to write generic types such as `List[T]` (30 min)
- Mypy protocols - How to use `Protocol`s to define interfaces such as `Iterable` (30 min)
- Enums - How to write `Enum`s in Python instead of type-unsafe magic values (15 min)
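The three typing tools above can be combined in one short sketch: a `TypeVar` for a generic function, a `Protocol` for a structural interface, and an `Enum` in place of magic strings. All names here (`first`, `SupportsArea`, `Square`, `Split`) are hypothetical examples, not taken from the linked guides.

```python
from enum import Enum
from typing import Iterable, Protocol, TypeVar

T = TypeVar("T")


def first(items: Iterable[T]) -> T:
    """Generic function: mypy infers first([1, 2]) as int and first(["a"]) as str."""
    for item in items:
        return item
    raise ValueError("first() called on an empty iterable")


class SupportsArea(Protocol):
    """Structural interface: any class with a matching area() method satisfies it."""

    def area(self) -> float: ...


class Square:
    # Square does not inherit from SupportsArea; it matches structurally.
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side**2


def total_area(shapes: Iterable[SupportsArea]) -> float:
    return sum(shape.area() for shape in shapes)


class Split(Enum):
    """A closed set of named values: a typo such as Split("trian") raises ValueError."""

    TRAIN = "train"
    TEST = "test"
```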
-
Curated Python packages
- cookiecutter - Scaffold new Python packages or apps quickly with a Cookiecutter template
- cruft - Update a Python package's underlying Cookiecutter scaffolding
- commitizen - Check that commit messages satisfy [Conventional Commits](https://www.conventionalcommits.org/) and automate [Semantic Versioning](https://semver.org/) and [Keep a Changelog](https://keepachangelog.com/)
- poetry - Manage the packaging and dependencies of your Python project
- poe - Define and run tasks in a Poetry project with Poe the Poet
- black - Automatically format your code
- isort - Automatically sort your import statements
- pre-commit - Automatically run code quality checks on commit
- bandit - Find common security issues
- darglint - Check that your docstrings match your function signature
- flake8 - Check your code for bugs and that your code style is [PEP8](https://peps.python.org/pep-0008/)-compliant
- flake8 extensions - An awesome list of Flake8 extensions
- mypy - Check the type-correctness of your code
- pre-commit hooks - A collection of [pre-commit](https://pre-commit.com/) hooks that check file quality
- pydocstyle - Check that your code is documented
- pygrep hooks - A collection of [pre-commit](https://pre-commit.com/) hooks that check for common Python code smells
- pytest-recording - Record and play back HTTP requests in your pytest tests
- pyupgrade - Automatically upgrade your code to use newer Python syntax
- safety - Check that your dependencies don't have any known security vulnerabilities
- shellcheck - Check the quality of your shell scripts
- coverage.py - Check your code's test coverage
- hypothesis - Write tests that automatically look for edge cases that break your code
- fastapi - Create RESTful APIs based on type annotations
- typer - Create CLIs based on type annotations
- streamlit - Create web apps with a single Python file
- bump2version - Release a new version of your package
- coloredlogs - Increase your logs' readability with colour
- hvplot - Create interactive plots from pandas dataframes
- mkdocs - Create developer documentation for your project
- pdoc - Generate API documentation for your code
- birdseye - Graphically debug your Python code
- scalene - Profile your code's CPU and memory usage by line
- viztracer - Visualize your code's performance with a [flamegraph](https://www.brendangregg.com/flamegraphs.html)
- tqdm - Easily add progress bars to long-running jobs
-
-
Machine Learning
-
Practical theory
- Bias-variance tradeoff - How a model's expected squared error decomposes into squared bias, variance, and irreducible noise (30 min)
- The two different uses of cross-validation - How to use nested cross-validation to combine the two different uses of cross-validation (30 min)
- Modes, Medians and Means: A Unifying Perspective - Why minimizing the Mean Absolute Error (MAE) is more robust than minimizing the Mean Squared Error (MSE) (30 min)
- Backpropagation is the chain rule to compute the gradient - How backpropagation is an algorithm to compute the objective function's gradient (30 min)
- Stacked generalization - How to stack models (30 min)
- What is the .632+ rule? - How to measure generalization performance with bootstrapping (30 min)
- Data Distribution Shifts and Monitoring - How to detect and address the different types of data shift (1 hour)
- Backprop is not just the chain rule - How backpropagation relates to Lagrange multipliers (30 min)
- Why ML algorithms are hard to tune - How to optimize multiple objectives when the Pareto front is concave (30 min)
- Deep learning model compression - How quantization, pruning, and distillation can be used to compress models (30 min)
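The robustness claim in the "Modes, Medians and Means" entry can be checked numerically: the constant that minimizes MSE is the mean, and the constant that minimizes MAE is the median, so a single outlier drags only the MSE optimum. This sketch uses a brute-force grid over constants purely for illustration; the data is made up.

```python
import statistics
from typing import List


def mse(c: float, ys: List[float]) -> float:
    """Mean squared error of predicting the constant c for every y."""
    return sum((y - c) ** 2 for y in ys) / len(ys)


def mae(c: float, ys: List[float]) -> float:
    """Mean absolute error of predicting the constant c for every y."""
    return sum(abs(y - c) for y in ys) / len(ys)


ys = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 is an outlier
candidates = [i / 10 for i in range(1001)]  # brute-force grid 0.0, 0.1, ..., 100.0

best_mse = min(candidates, key=lambda c: mse(c, ys))
best_mae = min(candidates, key=lambda c: mae(c, ys))

print(best_mse, statistics.mean(ys))    # 22.0 22.0: the outlier drags the MSE optimum
print(best_mae, statistics.median(ys))  # 3.0 3.0: the MAE optimum stays at the median
```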
-
Explainability
- SHAP: SHapley Additive exPlanations - How to explain a model's output with Shapley values (30 min)
- Intro to Shapley and SHAP - How Shapley values are approximated by SHAP (30 min)
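To make the Shapley values behind SHAP concrete, here is a sketch that computes them exactly by enumerating every coalition of features. As a simplifying assumption, "absent" features are replaced by a single baseline vector (SHAP itself works with a background dataset), and the toy `model` is hypothetical. Enumeration costs grow as 2^n in the number of features, which is why SHAP approximates.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(
    f: Callable[[List[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x, with absent features set to baseline."""
    n = len(x)

    def value(coalition: Set[int]) -> float:
        # Features in the coalition take their value from x; the rest from baseline.
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        # Weighted average of feature i's marginal contribution over all coalitions.
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def model(z: List[float]) -> float:
    return 2 * z[0] + 3 * z[1] + z[0] * z[1]  # toy model with an interaction term


print(shapley_values(model, x=[1.0, 1.0], baseline=[0.0, 0.0]))  # [2.5, 3.5]
```

The interaction term contributes 1.0 in total and is split equally between the two features, and the values sum to `model(x) - model(baseline)`.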
-
Unsupervised
- UMAP: Uniform Manifold Approximation and Projection - How to reduce dimensionality for visualization and modelling (30 min)
- PyNNDescent - How to find nearest neighbours in huge datasets (15 min)
-
Classification
- Precision and recall - How precision and recall measure a classifier's performance (30 min)
- Probability calibration - How and for which model types you should calibrate the model's output scores into probabilities (30 min)
- You're all calculating churn rates wrong - Correctly define what churn is (30 min)
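Precision and recall come straight from confusion-matrix counts: precision is TP / (TP + FP) and recall is TP / (TP + FN). A minimal sketch for binary labels (1 = positive); returning 0.0 when a denominator is zero is a convention chosen here, not the only option.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


print(precision_recall([1, 1, 0, 0, 1], [1, 0, 1, 1, 1]))  # (0.5, 0.6666666666666666)
```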
-
Regression
- Gaussian processes - From scratch - How to build probabilistic regression models with Gaussian Processes (1 hour)
-
Computer Vision
- Microsoft's Document Image Transformer - A self-supervised pre-trained model that achieves SotA performance on [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) and can be used for various downstream tasks (30 min)
-
Natural Language Processing
- Awesome Sentence Embedding - A curated list of pretrained sentence and word embedding models (15 min)
-
Time Series Analysis
- The Prophet model - How Meta's Prophet model decomposes a time series into trend, seasonality, and holiday components (30 min)
- Darts - Time Series Made Easy in Python - How to build forecasting models with `darts` (1 hour)
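For reference, Prophet's additive decomposition can be summarized as follows, where g(t) is the trend, s(t) the seasonality, h(t) the holiday effects, and ε_t the error term. Seasonality is modelled as a truncated Fourier series with period P (for example P = 365.25 for yearly seasonality when t is measured in days) and N terms.

```latex
y(t) = g(t) + s(t) + h(t) + \epsilon_t,
\qquad
s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)
```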
-
Recommender Systems
- Microsoft Recommenders - A comparison of recommender system models (30 min)
-
Pandas
- Modern Pandas series (Part 1 - 7) - Write idiomatic pandas (1 hour)
- Awesome Pandas - An awesome list of Pandas resources (1 hour)
-
Scikit-learn
- Using scikit-learn Pipelines and FeatureUnions - How to use `Pipeline`s to build end-to-end models (30 min)
- Transforming target in regression - How to transform the target to build more robust models (15 min)
- Hyperparameter optimization with successive halving - How to optimize hyperparameters efficiently by spending more compute only on the most promising candidates (30 min)
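The idea behind successive halving is simple enough to sketch in plain Python: score all candidates with a small budget, keep the best 1/eta of them, multiply the budget by eta, and repeat until one candidate is left. The `evaluate(candidate, budget)` callable is a hypothetical stand-in for "train with this budget and return a validation score". In scikit-learn, this is available as `HalvingGridSearchCV` and `HalvingRandomSearchCV`, which are experimental and need `from sklearn.experimental import enable_halving_search_cv` first.

```python
from typing import Callable, Hashable, Iterable, TypeVar

C = TypeVar("C", bound=Hashable)


def successive_halving(
    candidates: Iterable[C],
    evaluate: Callable[[C, int], float],
    min_budget: int = 1,
    eta: int = 2,
) -> C:
    """Return the last surviving candidate; higher evaluate() scores are better."""
    survivors = list(candidates)
    budget = min_budget
    while len(survivors) > 1:
        # Score every survivor with the current (small) budget.
        scores = {c: evaluate(c, budget) for c in survivors}
        survivors.sort(key=lambda c: scores[c], reverse=True)
        # Keep the best 1/eta of the candidates, then give each eta times more budget.
        survivors = survivors[: max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0]


# Hypothetical objective: pretend the best value is 7, regardless of budget.
print(successive_halving(range(10), lambda c, budget: -abs(c - 7)))  # 7
```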
-
Labelling
- Doccano - A tool for labelling text (30 min)
- CVAT: Computer Vision Annotation Tool - A tool for labelling images (30 min)
- Awesome Data Labelling - An awesome list of data labelling tools (30 min)
-
-
DevOps
-
CI/CD
- invoke - How to implement common tasks you run on your project as a CLI (30 min)
-
Environment and dependency management
- Modern Python Environments - dependency and workspace management - A comparison between pyenv, venv + pip, venv + pip-tools, poetry, pipenv, and conda (30 min)
- Conda: Myths and Misconceptions - Common misconceptions about Conda (15 min)
-
Docker
- Docker Curriculum - How to use Docker (4 hours)
- Awesome Docker - An awesome list of Docker resources (30 min)
-
Data pipelines
- Great Expectations - How to test and document your data and data pipelines (30 min)
-
Shell
- Cron best practices - How to best use cron to schedule tasks (30 min)
- A visual guide to SSH tunnels - How to forward ports and create tunnels with SSH (30 min)
- Safe ways to do things in bash - How to write safe and robust shell scripts (1 hour)
- Your terminal is not a terminal: An Introduction to Streams - How your terminal is a tool to manipulate streams (30 min)
- Bash Heredoc - How to pass multiline arguments to commands with a heredoc (30 min)
-
Terraform
- An Introduction to Terraform - How to use Terraform (1 hour)
- Terraform best practices - How to structure and write Terraform code following best practices (1 hour)
- Terraform pre-commit hooks collection - How to automate Terraform code quality checks with pre-commit (1 hour)
- Awesome Terraform - An awesome list of Terraform resources (30 min)
- Terraform Tutorial - How to get started with Terraform (1 hour)
-
Infrastructure
- Using Redis In-Memory Storage for your Python Applications - How to use Redis as an in-memory cache for your Python application (30 min)
- Python Kafka Consumers: at-least-once, at-most-once, exactly-once - How to write different types of Kafka consumers in Python (30 min)
- Kafka Exactly-Once-Semantics - How to produce and consume messages exactly once (1 hour)
- ZeroMQ: a socket library with message queue primitives - ZeroMQ is a lightweight messaging system without a message broker (8 hours)
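The difference between at-most-once and at-least-once consumers comes down to whether the offset is committed before or after the message is processed. The sketch below simulates this with an in-memory list standing in for a partition; it uses no Kafka client API, and `run_consumer` and its crash parameter are hypothetical. Exactly-once processing needs more machinery, such as Kafka's idempotent producers and transactions.

```python
from typing import Callable, List, Optional


def run_consumer(
    log: List[str],
    start: int,
    handle: Callable[[str], None],
    commit_first: bool,
    crash_at: Optional[int] = None,
) -> int:
    """Consume log[start:] and return the committed offset when the run stops.

    crash_at simulates the process dying between commit and handle at that offset.
    """
    committed = start
    for offset in range(start, len(log)):
        if commit_first:
            # At-most-once: commit the offset first, then process the message.
            committed = offset + 1
            if offset == crash_at:
                return committed  # Committed but never handled: the message is lost.
            handle(log[offset])
        else:
            # At-least-once: process the message first, then commit the offset.
            handle(log[offset])
            if offset == crash_at:
                return committed  # Handled but not committed: it is redelivered.
            committed = offset + 1
    return committed


log = ["a", "b", "c"]
for commit_first in (False, True):
    seen: List[str] = []
    # The first run crashes at offset 1; the restarted run resumes from the committed offset.
    resume_from = run_consumer(log, 0, seen.append, commit_first, crash_at=1)
    run_consumer(log, resume_from, seen.append, commit_first)
    print("at-most-once" if commit_first else "at-least-once", seen)
```

Running it prints `at-least-once ['a', 'b', 'b', 'c']` (a duplicate) followed by `at-most-once ['a', 'c']` (a lost message).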
-
-
Curated by Superlinear
-
Infrastructure
- Superlinear - based Machine Learning company.
-