{"id":14111173,"url":"https://github.com/d18s/awesome-machine-learning-engineering","last_synced_at":"2025-08-01T12:31:34.931Z","repository":{"id":66455144,"uuid":"136335923","full_name":"d18s/awesome-machine-learning-engineering","owner":"d18s","description":"A curated list of articles, papers and tools for managing the building and deploying of machine learning models, aka machine learning engineering.","archived":false,"fork":false,"pushed_at":"2018-09-27T10:47:34.000Z","size":7,"stargazers_count":18,"open_issues_count":0,"forks_count":6,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-05-22T21:04:27.464Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/d18s.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-06-06T13:53:34.000Z","updated_at":"2023-06-22T20:18:54.000Z","dependencies_parsed_at":"2023-02-20T20:00:29.093Z","dependency_job_id":null,"html_url":"https://github.com/d18s/awesome-machine-learning-engineering","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d18s%2Fawesome-machine-learning-engineering","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d18s%2Fawesome-machine-learning-engineering/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d18s%2Fawesome-machine-learning-engineering/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d18s%2Fawesome-machine-lea
rning-engineering/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/d18s","download_url":"https://codeload.github.com/d18s/awesome-machine-learning-engineering/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228375765,"owners_count":17910280,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-14T10:03:10.730Z","updated_at":"2024-12-05T21:30:57.704Z","avatar_url":"https://github.com/d18s.png","language":null,"funding_links":[],"categories":["Other Lists"],"sub_categories":["TeX Lists"],"readme":"# awesome-machine-learning-engineering\nA curated list of articles, papers and tools for managing the building and deploying of machine learning models, aka machine learning engineering.\n\n- [Where to start](#where-to-start)\n- [Data](#data)\n- [Best practice](#best-practice)\n- [Example pipelines](#example-pipelines)\n- [Conference tracks and workshops](#conference-tracks-and-workshops)\n- [Big data on a single machine / on the command line](#big-data-on-a-single-machine--on-the-command-line)\n- [Software](#software)\n    - [Managing building and deploying models](#managing-building-and-deploying-models)\n    - [Managing building models](#managing-building-models)\n    - [Deploying models](#deploying-models)\n    - [Serialising and transpiling models](#serialising-and-transpiling-models)\n    - [Monitoring models](#monitoring-models)\n    - [AWS](#aws)\n    - [Google Cloud](#google-cloud)\n    - [Azure](#azure)\n- [Related awesome lists](#related-awesome-lists)\n\n## 
Where to start\n\n* [A Few Useful Things to Know about Machine Learning](https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf)\n* [Machine Learning glossary](https://developers.google.com/machine-learning/crash-course/glossary)\n\n## Data\n\n* [The Unreasonable Effectiveness of Data](https://ai.google/research/pubs/pub35179)\n* [Revisiting the Unreasonable Effectiveness of Data](https://ai.googleblog.com/2017/07/revisiting-unreasonable-effectiveness.html)\n* [Why you need to improve your training data, and how to do it](https://petewarden.com/2018/05/28/why-you-need-to-improve-your-training-data-and-how-to-do-it/)\n\n## Best practice\n\n* [Rules of Machine Learning: Best Practices for ML Engineering](https://developers.google.com/machine-learning/rules-of-ml/)\n* [What’s your ML test score? A rubric for ML production systems](https://ai.google/research/pubs/pub45742)\n* [Machine Learning: The High Interest Credit Card of Technical Debt](https://ai.google/research/pubs/pub43146)\n* [Introducing the Facebook Field Guide to Machine Learning video series](https://research.fb.com/the-facebook-field-guide-to-machine-learning-video-series/)\n* [Patterns for Research in Machine Learning](http://arkitus.com/patterns-for-research-in-machine-learning/)\n* [Production Data Science](https://github.com/Satalia/production-data-science)\n* [Making Netflix Machine Learning Algorithms Reliable](https://www.slideshare.net/justinbasilico/making-netflix-machine-learning-algorithms-reliable)\n* [Scaling Knowledge at Airbnb](https://medium.com/airbnb-engineering/scaling-knowledge-at-airbnb-875d73eff091)\n\n## Example pipelines\n\n* [Ad Click Prediction: a View from the Trenches](https://ai.google/research/pubs/pub41159)\n* [Learning a Personalized Homepage](https://medium.com/netflix-techblog/learning-a-personalized-homepage-aa8ec670359a)\n* [Distributed Time Travel for Feature 
Generation](https://medium.com/netflix-techblog/distributed-time-travel-for-feature-generation-389cccdd3907)\n\n## Conference tracks and workshops\n\n* [Reliable Machine Learning in the Wild NIPS 2016 workshop](https://sites.google.com/site/wildml2016nips/)\n* [Reliable Machine Learning in the Wild ICML 2017 workshop](https://sites.google.com/site/wildml2017icml/)\n* [KDD 2017 Applied Data Science](http://www.kdd.org/kdd2017/applied-data-science-invited-talks)\n* [KDD 2018 Applied Data Science](http://www.kdd.org/kdd2018/applied-data-science-invited-talks)\n* [ECMLPKDD 2016 Industrial track](http://www.ecmlpkdd2016.org/program.html#accepted-industrial)\n* [ECMLPKDD 2017 Applied Data Science track](http://ecmlpkdd2017.ijs.si/program.html#AppDSTab)\n* [ECMLPKDD 2018](http://www.ecmlpkdd2018.org/accepted-papers-by-track-2/#tab-id-3)\n* [WWW 2018 Industry track](https://www2018.thewebconf.org/program/industry-track/)\n\n## Big data on a single machine / on the command line\n\n* [Command-line Tools can be 235x Faster than your Hadoop Cluster](https://adamdrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html)\n* [Big Data, Small Machine](https://adamdrake.com/big-data-small-machine.html)\n* [Dask](https://github.com/dask/dask)\n* [Unix for poets](https://web.stanford.edu/class/cs124/kwc-unix-for-poets.pdf)\n* [Data Science at the Command Line](https://www.datascienceatthecommandline.com)\n* [Data hacks](https://github.com/bitly/data_hacks) command line utilities\n* [Split command](https://linux.die.net/man/1/split)\n* [Parallel command](https://linux.die.net/man/1/parallel)\n* [Xargs command parallel flag](https://www.gnu.org/software/findutils/manual/html_node/find_html/Controlling-Parallelism.html)\n\n## Software\n\n### Managing building and deploying models\n\n* [kubeflow](https://github.com/kubeflow/kubeflow) Machine Learning Toolkit for Kubernetes (kubeflow)\n* [ModelDB](https://github.com/mitdbg/modeldb) A system to manage machine learning 
models (MIT)\n* [mlflow](https://github.com/databricks/mlflow) Open source platform for the complete machine learning lifecycle (Databricks)\n* [datmo](https://github.com/datmo/datmo) Open source model tracking tool for data scientists\n\n### Managing building models\n\n* [Luigi](https://github.com/spotify/luigi) is a Python module that helps you build complex pipelines of batch jobs. (Spotify)\n* [Airflow](https://github.com/apache/incubator-airflow) is a platform to programmatically author, schedule, and monitor workflows (Airbnb)\n* [Azkaban](https://github.com/azkaban/azkaban) workflow manager (LinkedIn)\n* [Pinball](https://github.com/pinterest/pinball) is a scalable workflow manager (Pinterest)\n\n### Deploying models\n\n* [Serving](https://github.com/tensorflow/serving) A flexible, high-performance serving system for machine learning models (Google)\n* [deepdetect](https://github.com/jolibrain/deepdetect) Deep Learning API and Server in C++11 with Python bindings and support for Caffe, TensorFlow, XGBoost and TSNE (deepdetect)\n* [clipper](https://github.com/ucbrise/clipper) A low-latency prediction-serving system (Berkeley)\n* [MLeap](https://github.com/combust/mleap) Deploy Spark Pipelines to Production (combust.ml)\n* [openscoring](https://github.com/openscoring/openscoring) REST web service for the true real-time scoring (\u003c1 ms) of R, Scikit-Learn and Apache Spark models (openscoring)\n* [mxnet-model-server](https://github.com/awslabs/mxnet-model-server) Model Server for Apache MXNet is a tool for serving neural net models for inference (AWS)\n* [hydro-serving](https://github.com/Hydrospheredata/hydro-serving) ML FaaS - Machine Learning Serving cluster (hydrosphere.io)\n\n### Serialising and transpiling models\n\n* [Predictive Model Markup Language](https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language) (PMML)\n* [jpmml-sklearn](https://github.com/jpmml/jpmml-sklearn) Java library and command-line application for converting Scikit-Learn 
pipelines to PMML\n* [sklearn2pmml](https://github.com/jpmml/sklearn2pmml) Python library for converting Scikit-Learn pipelines to PMML\n* [sklearn-porter](https://github.com/nok/sklearn-porter) Transpile trained scikit-learn estimators to C, Java, JavaScript and others\n\n### Monitoring models\n\n* [Knowledge Repo](https://github.com/airbnb/knowledge-repo) A next-generation curated knowledge sharing platform for data scientists and other technical professions.\n\n### AWS\n\n* [Data Pipeline](https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/what-is-datapipeline.html) \"is a web service that you can use to automate the movement and transformation of data\"\n* [Glue](https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html) \"is a fully managed ETL (extract, transform, and load) service\"\n* [Simple Workflow](https://docs.aws.amazon.com/amazonswf/latest/developerguide/swf-welcome.html) \"makes it easy to build applications that coordinate work across distributed components\"\n* [Batch](https://docs.aws.amazon.com/batch/latest/userguide/what-is-batch.html) \"enables you to run batch computing workloads on the AWS Cloud\"\n* [Machine Learning](https://docs.aws.amazon.com/machine-learning/latest/dg/what-is-amazon-machine-learning.html) \"cloud-based service that makes it easy for developers of all skill levels to use machine learning technology\"\n* [SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html) \"is a fully managed machine learning service\"\n\n### Google Cloud\n\n* [Dataflow](https://cloud.google.com/dataflow/docs/) \"is a unified programming model and a managed service for developing and executing a wide variety of data processing patterns\"\n* [ML Engine](https://cloud.google.com/ml-engine/docs/) \"brings the power and flexibility of TensorFlow, scikit-learn and XGBoost to the cloud\"\n\n### Azure\n\n* [Batch AI](https://docs.microsoft.com/en-us/azure/batch-ai/) \"helps you experiment with your AI models using any 
framework and then train them at scale across GPU and CPU clusters\"\n* [Machine Learning services](https://docs.microsoft.com/en-us/azure/machine-learning/service/) \"enable building, deploying, and managing machine learning and AI models using any Python tools and libraries\"\n* [Machine Learning Studio](https://docs.microsoft.com/en-us/azure/machine-learning/studio/) \"is a collaborative, drag-and-drop tool you can use to build, test, and deploy predictive analytics solutions on your data\"\n\n## Related awesome lists\n\n* [awesome-machine-learning](https://github.com/josephmisiti/awesome-machine-learning)\n* [awesome-etl](https://github.com/pawl/awesome-etl)\n* [awesome-pipeline](https://github.com/pditommaso/awesome-pipeline)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fd18s%2Fawesome-machine-learning-engineering","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fd18s%2Fawesome-machine-learning-engineering","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fd18s%2Fawesome-machine-learning-engineering/lists"}