awesome-machine-learning-engineer

🤓 A curated awesome list of Machine Learning Engineering resources. Feel free to contribute!
https://github.com/superlinear-ai/awesome-machine-learning-engineer


Communication
- Nonviolent communication - How to deliver constructive feedback in difficult situations (15 min)
- Bike-shedding: how mature are you as an engineer? - How to avoid and call out bike-shedding (5 min)
- E-mail like a boss - How to write better e-mails (5 min)
- Stop Swiss Cheesing your calendar - How to manage your calendar so you can focus (15 min)
- Presentation Rules - How to create a great slide deck (30 min)
- SMART criteria - How to define goals (15 min)
- MECE principle - How to fully decompose a problem into a structured list (15 min)
- SCQA: What is it, how does it work, and how can it help me? - How to structure your presentations, proposals, and sales outlines (15 min)
- No More Misunderstandings - How to avoid miscommunication by paraphrasing (15 min)
- The Halo effect - How to recognize and use the Halo effect to your advantage (15 min)
- Four-sides model - How to communicate effectively by considering how the receiver interprets your message (30 min)
- BLUF: The Military Standard That Can Make Your Writing More Powerful - How to make your communication more powerful (5 min)
Software Engineering
- API design
  - Semantic Versioning - How to bump the version of your apps and packages (15 min)
  - `__all__` and wild imports in Python - How `__all__` defines the public API of your Python packages (15 min)
  - APIs for Machine Learning - How to design RESTful APIs for Machine Learning applications (30 min)
  - FastAPI docs - How to build RESTful APIs that correspond one-to-one with an OpenAPI specification (1 day)
  - The Rule of Three - When to build reusable components and when not (15 min)
  - Falsehoods programmers believe about names - How to avoid common pitfalls about names (15 min)
  - Command Line Interface Guidelines - How to write great CLIs (1 hour)
  - Zalando's RESTful API guidelines - How to design RESTful APIs (1 day)
- Workflow
  - The seven rules of a great Git commit message - How to write great Git commit messages (15 min)
  - Learn Git Branching - Practice Git from beginner to advanced (1 hour)
  - Keep a Changelog - How to keep a changelog for your apps and packages (30 min)
  - Conventional Commits - How to prefix your commit messages to automate [Semantic Versioning](https://semver.org/) and [Keep a Changelog](https://keepachangelog.com/) (15 min)
  - A successful Git branching model - How to release software with Git (15 min)
  - Code Health: Respectful Reviews == Useful Reviews - How to communicate code review comments respectfully (15 min)
  - The Code Review Pyramid - What to look for and what to automate when reviewing a Pull Request (15 min)
  - Poetry workspace plugin - How to create and manage a Poetry-based monorepo (15 min)
- Python patterns
  - The Definitive Guide to Python import Statements - How to write import statements (30 min)
  - Understanding Python's logging module - How to use the `logging` module effectively (30 min)
  - Don't run code at import time - Why you shouldn't run code at import time
  - Please fix your decorators - Why you should probably use [`wrapt`](https://github.com/GrahamDumpleton/wrapt) to write your decorators (30 min)
  - Do not log - What you should be doing instead of logging (30 min)
  - The Little Book of Python Anti-Patterns - A collection of Python anti-patterns (X hours)
  - Effective Python - A collection of Python idioms (X hours)
  - SOLID - Five object-oriented design principles for maintainable software (1 hour)
  - What the f*ck Python! - How to master Python by understanding its edge cases (1 day)
- Typing
  - The Comprehensive Guide to mypy - How to write type annotations in Python (1 hour)
  - Mypy generics - How to use `TypeVar`s to write generic types such as `List[T]` (30 min)
  - Mypy protocols - How to use `Protocol`s to define interfaces such as `Iterable` (30 min)
  - Enums - How to write `Enum`s in Python instead of type-unsafe magic values (15 min)
- Curated Python packages
  - cookiecutter - Scaffold new Python packages or apps quickly with a Cookiecutter template
  - cruft - Update a Python package's underlying Cookiecutter scaffolding
  - commitizen - Check that commit messages satisfy [Conventional Commits](https://www.conventionalcommits.org/) and automate [Semantic Versioning](https://semver.org/) and [Keep a Changelog](https://keepachangelog.com/)
  - poetry - Manage the packaging and dependencies of your Python project
  - poe - Define and run tasks in a Poetry project with Poe the Poet
  - black - Automatically format your code
  - isort - Automatically sort your import statements
  - pre-commit - Automatically run code quality checks on commit
  - bandit - Find common security issues
  - darglint - Check that your docstrings match your function signature
  - flake8 - Check your code for bugs and that your code style is [PEP8](https://peps.python.org/pep-0008/)-compliant
  - flake8 extensions - An awesome list of Flake8 extensions
  - mypy - Check the type-correctness of your code
  - pre-commit hooks - A collection of [pre-commit](https://pre-commit.com/) hooks that check file quality
  - pydocstyle - Check that your code is documented
  - pygrep hooks - A collection of [pre-commit](https://pre-commit.com/) hooks that check for common Python code smells
  - pytest-recording - Record and play back HTTP requests in your pytest tests
  - pyupgrade - Check that your code is written using the latest Python language features
  - safety - Check that your dependencies don't have any known security vulnerabilities
  - shellcheck - Check the quality of your shell scripts
  - coverage.py - Check your code's test coverage
  - hypothesis - Write tests that automatically look for edge cases that break your code
  - fastapi - Create RESTful APIs based on type annotations
  - typer - Create CLIs based on type annotations
  - streamlit - Create web apps with a single Python file
  - bump2version - Release a new version of your package
  - coloredlogs - Increase your logs' readability with colour
  - hvplot - Create interactive plots from pandas dataframes
  - mkdocs - Create developer documentation for your project
  - pdoc - Generate API documentation for your code
  - birdseye - Graphically debug your Python code
  - scalene - Profile your code's CPU and memory usage by line
  - viztracer - Visualize your code's performance with a [flamegraph](https://www.brendangregg.com/flamegraphs.html)
  - tqdm - Easily add progress bars to long-running jobs
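
As a companion to the Typing resources above, here is a minimal sketch of the three ideas they cover: an `Enum` instead of magic values, a `Protocol` as a structural interface, and a `TypeVar` for a generic function. The names `Color`, `HasArea`, `Square`, `first`, and `total_area` are hypothetical and chosen only for illustration; they do not come from any of the linked guides.

```python
from enum import Enum
from typing import Iterable, List, Protocol, TypeVar


class Color(Enum):
    """An Enum replaces magic strings such as "red" with checked members."""

    RED = "red"
    GREEN = "green"


class HasArea(Protocol):
    """Structural interface: any object with a matching area() method qualifies."""

    def area(self) -> float: ...


class Square:
    """Does not inherit from HasArea, yet satisfies it structurally."""

    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


T = TypeVar("T")


def first(items: List[T]) -> T:
    """Generic function: the return type follows the list's element type."""
    return items[0]


def total_area(shapes: Iterable[HasArea]) -> float:
    """Accepts any iterable of objects that provide area()."""
    return sum(shape.area() for shape in shapes)
```

A type checker such as mypy should accept `total_area([Square(2.0)])` and infer `int` for `first([3, 1, 2])`, while `Color("blue")` raises a `ValueError` at runtime because no such member exists.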
Machine Learning
- Practical theory
  - Bias-variance tradeoff - How a model's expected error decomposes into bias, variance, and irreducible noise (30 min)
  - The two different uses of cross-validation - How to use nested cross-validation to combine the two different uses of cross-validation (30 min)
  - Modes, Medians and Means: A Unifying Perspective - Why minimizing the Mean Absolute Error (MAE) is more robust than minimizing the Mean Squared Error (MSE) (30 min)
  - Backpropagation is the chain rule to compute the gradient - How backpropagation is an algorithm to compute the objective function's gradient (30 min)
  - Stacked generalization - How to stack models (30 min)
  - What is the .632+ rule? - How to measure generalization performance with bootstrapping (30 min)
  - Data Distribution Shifts and Monitoring - How to detect and address the different types of data shift (1 hour)
  - Backprop is not just the chain rule - How backpropagation relates to Lagrange multipliers (30 min)
  - Why ML algorithms are hard to tune - How to optimize multiple objectives when the Pareto front is concave (30 min)
  - Deep learning model compression - How quantization, pruning, and distillation can be used to compress models (30 min)
- Explainability
  - SHAP: SHapley Additive exPlanations - How to explain a model's output with Shapley values (30 min)
  - Intro to Shapley and SHAP - How Shapley values are approximated by SHAP (30 min)
- Unsupervised
  - UMAP: Uniform Manifold Approximation and Projection - How to reduce dimensionality for visualization and modelling (30 min)
  - PyNNDescent - How to find nearest neighbours in huge datasets (15 min)
- Classification
  - Precision and recall - How precision and recall measure a classifier's performance (30 min)
  - Probability calibration - How and for which model types you should calibrate the model's output scores into probabilities (30 min)
  - You're all calculating churn rates wrong - Correctly define what churn is (30 min)
- Regression
  - Gaussian processes - From scratch - How to build probabilistic regression models with Gaussian Processes (1 hour)
- Computer Vision
  - Microsoft's Document Image Transformer - A self-supervised pre-trained model that achieves SotA performance on [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) and can be used for various downstream tasks (30 min)
- Natural Language Processing
  - Awesome Sentence Embedding - A curated list of pretrained sentence and word embedding models (15 min)
- Time Series Analysis
  - The Prophet model - How Meta's Prophet model decomposes a time series into a trend, seasonality, and holiday components (30 min)
  - Darts - Time Series Made Easy in Python - How to build forecasting models with `darts` (1 hour)
- Recommender Systems
  - Microsoft Recommenders - A comparison of recommender system models (30 min)
- Pandas
  - Modern Pandas series (Part 1 - 7) - Write idiomatic pandas (1 hour)
  - Awesome Pandas - An awesome list of Pandas resources (1 hour)
- Scikit-learn
  - Using scikit-learn Pipelines and FeatureUnions - How to use `Pipeline`s to build end-to-end models (30 min)
  - Transforming target in regression - How to transform the target to build more robust models (15 min)
  - Hyperparameter optimization with successive halving - How to optimize hyperparameters efficiently by giving more resources to the most promising candidates (30 min)
- Labelling
  - Doccano - A tool for labelling text (30 min)
  - CVAT: Computer Vision Annotation Tool - A tool for labelling images (30 min)
  - Awesome Data Labelling - An awesome list of data labelling tools (30 min)
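
To make the Classification entry on precision and recall concrete, the following is a small from-scratch sketch for binary labels. The function name `precision_recall` is hypothetical, and returning `0.0` when a denominator is zero is an assumption made here for simplicity; libraries may handle that case differently.

```python
from typing import Sequence, Tuple


def precision_recall(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Compute precision and recall for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Precision: of everything predicted positive, how much was correct?
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # assumption: 0.0 if undefined
    # Recall: of everything actually positive, how much was found?
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # assumption: 0.0 if undefined
    return precision, recall


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 0, 1, 0]
    print(precision_recall(y_true, y_pred))
```

In the example, the classifier makes two positive predictions of which one is right (precision 0.5) and finds one of the four actual positives (recall 0.25), which shows how the two metrics can diverge.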
DevOps
- CI/CD
  - invoke - How to implement common tasks you run on your project as a CLI (30 min)
- Environment and dependency management
  - Modern Python Environments - dependency and workspace management - A comparison between pyenv, venv + pip, venv + pip-tools, poetry, pipenv, and conda (30 min)
  - Conda: Myths and Misconceptions - Common misconceptions about Conda (15 min)
- Docker
  - Docker Curriculum - How to use Docker (4 hours)
  - Awesome Docker - An awesome list of Docker resources (30 min)
- Data pipelines
  - Great Expectations - How to test and document your data and data pipelines (30 min)
- Shell
  - Cron best practices - How to best use cron to schedule tasks (30 min)
  - A visual guide to SSH tunnels - How to forward ports and create tunnels with SSH (30 min)
  - Safe ways to do things in bash - How to write safe and robust shell scripts (1 hour)
  - Your terminal is not a terminal: An Introduction to Streams - How your terminal is a tool to manipulate streams (30 min)
  - Bash Heredoc - How to pass multiline arguments to commands with a heredoc (30 min)
- Terraform
  - An Introduction to Terraform - How to use Terraform (1 hour)
  - Terraform best practices - Terraform best practices (1 hour)
  - Terraform pre-commit hooks collection - How to automate Terraform code quality checks with pre-commit (1 hour)
  - Awesome Terraform - An awesome list of Terraform resources (30 min)
  - Terraform Tutorial - How to get started with Terraform (1 hour)
- Infrastructure
  - Using Redis In-Memory Storage for your Python Applications - How to use Redis as an in-memory cache for your Python application (30 min)
  - Python Kafka Consumers: at-least-once, at-most-once, exactly-once - How to write different types of Kafka consumers in Python (30 min)
  - Kafka Exactly-Once-Semantics - How to produce and consume messages exactly once (1 hour)
  - ZeroMQ: a socket library with message queue primitives - ZeroMQ is a lightweight messaging system without a message broker (8 hours)
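
The difference between at-most-once and at-least-once consumption, covered by the Kafka resources above, comes down to whether the offset is committed before or after a message is processed. The sketch below simulates this with an in-memory list rather than a real Kafka client; `InMemoryLog`, `consume`, `make_handler`, and `run` are all hypothetical names, and a `RuntimeError` stands in for a consumer crash.

```python
from typing import Callable, List


class InMemoryLog:
    """Hypothetical stand-in for one Kafka partition with a committed offset."""

    def __init__(self, messages: List[str]) -> None:
        self.messages = messages
        self.committed = 0  # offset of the next message to read after a restart


def consume(log: InMemoryLog, handler: Callable[[str], None], commit_first: bool) -> None:
    """Read from the committed offset and stop at the first simulated crash."""
    offset = log.committed
    while offset < len(log.messages):
        if commit_first:
            log.committed = offset + 1  # at-most-once: commit before processing
        try:
            handler(log.messages[offset])
        except RuntimeError:
            return  # simulated crash; a restart resumes from log.committed
        if not commit_first:
            log.committed = offset + 1  # at-least-once: commit after processing
        offset += 1


def make_handler(seen: List[str], crash_after_effect: bool) -> Callable[[str], None]:
    """Build a handler that crashes once on message "b", before or after its side effect."""
    state = {"crashed": False}

    def handler(message: str) -> None:
        fail_now = message == "b" and not state["crashed"]
        if fail_now and not crash_after_effect:
            state["crashed"] = True
            raise RuntimeError("crash before the side effect")
        seen.append(message)  # the side effect of processing a message
        if fail_now and crash_after_effect:
            state["crashed"] = True
            raise RuntimeError("crash after the side effect")

    return handler


def run(commit_first: bool, crash_after_effect: bool) -> List[str]:
    """Consume ["a", "b", "c"] with one crash and one restart; return the side effects."""
    log = InMemoryLog(["a", "b", "c"])
    seen: List[str] = []
    handler = make_handler(seen, crash_after_effect)
    consume(log, handler, commit_first)  # first run ends in the simulated crash
    consume(log, handler, commit_first)  # restart from the committed offset
    return seen
```

Under these assumptions, `run(commit_first=True, crash_after_effect=False)` loses message `"b"`, while `run(commit_first=False, crash_after_effect=True)` processes `"b"` twice. Exactly-once behaviour typically requires more than reordering the commit, for example idempotent handlers or transactions, which this sketch does not attempt to model.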
Curated by Superlinear
- Infrastructure
  - Superlinear - based Machine Learning company.