{"id":13737350,"url":"https://github.com/eugeneyan/ml-design-docs","last_synced_at":"2026-02-11T13:32:41.299Z","repository":{"id":48409543,"uuid":"347150753","full_name":"eugeneyan/ml-design-docs","owner":"eugeneyan","description":"📝  Design doc template \u0026 examples for machine learning systems (requirements, methodology, implementation, etc.)","archived":false,"fork":false,"pushed_at":"2023-03-16T01:09:17.000Z","size":26,"stargazers_count":617,"open_issues_count":2,"forks_count":106,"subscribers_count":14,"default_branch":"main","last_synced_at":"2025-08-29T15:10:50.900Z","etag":null,"topics":["design","design-docs","machine-learning"],"latest_commit_sha":null,"homepage":"https://eugeneyan.com/writing/ml-design-docs/","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eugeneyan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-03-12T17:49:56.000Z","updated_at":"2025-08-20T10:18:07.000Z","dependencies_parsed_at":"2024-01-27T23:40:39.595Z","dependency_job_id":"39a56758-5204-48a9-a917-b6d8ec70d784","html_url":"https://github.com/eugeneyan/ml-design-docs","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/eugeneyan/ml-design-docs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eugeneyan%2Fml-design-docs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eugeneyan%2Fml-design-docs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eugeneyan%2Fml-design-docs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/eugeneyan%2Fml-design-docs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eugeneyan","download_url":"https://codeload.github.com/eugeneyan/ml-design-docs/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eugeneyan%2Fml-design-docs/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29333516,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-11T12:42:24.625Z","status":"ssl_error","status_checked_at":"2026-02-11T12:41:23.344Z","response_time":97,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["design","design-docs","machine-learning"],"created_at":"2024-08-03T03:01:43.315Z","updated_at":"2026-02-11T13:32:41.283Z","avatar_url":"https://github.com/eugeneyan.png","language":null,"funding_links":[],"categories":["Others"],"sub_categories":[],"readme":"# ml-design-doc\n\nA template for design docs for machine learning systems based on this [post](https://eugeneyan.com/writing/ml-design-docs/).\n\nNote: This template is a guideline / checklist and is **not meant to be exhaustive**. The intent of the design doc is to help you think better (about the problem and design) and get feedback. Adopt whichever sections—and add new sections—to meet this goal. 
View other templates and examples [here](#other-templates-examples-etc).\n\n---\n## 1. Overview\n\nA summary of the doc's purpose, problem, solution, and desired outcome, usually in 3-5 sentences.\n\n## 2. Motivation\nWhy the problem is important to solve, and why now.\n\n## 3. Success metrics\nUsually framed as business goals, such as increased customer engagement (e.g., CTR, DAU), revenue, or reduced cost.\n\n## 4. Requirements \u0026 Constraints\nFunctional requirements are those that should be met to ship the project. They should be described in terms of the customer perspective and benefit. (See [this](https://eugeneyan.com/writing/ml-design-docs/#the-why-and-what-of-design-docs) for more details.)\n\nNon-functional/technical requirements are those that define system quality and how the system should be implemented. These include performance (throughput, latency, error rates), cost (infra cost, ops effort), security, data privacy, etc.\n\nConstraints can come in the form of non-functional requirements (e.g., cost below $`x` a month, p99 latency \u003c `y`ms).\n\n### 4.1 What's in-scope \u0026 out-of-scope?\nSome problems are too big to solve all at once. Be clear about what's out of scope.\n\n## 5. Methodology\n\n### 5.1. Problem statement\n\nHow will you frame the problem? For example, fraud detection can be framed as an unsupervised (outlier detection, graph clustering) or supervised problem (e.g., classification).\n\n### 5.2. Data\n\nWhat data will you use to train your model? What input data is needed during serving?\n\n### 5.3. Techniques\n\nWhat machine learning techniques will you use? How will you clean and prepare the data (e.g., excluding outliers) and create features?\n\n### 5.4. Experimentation \u0026 Validation\n\nHow will you validate your approach offline? What offline evaluation metrics will you use?\n\nIf you're A/B testing, how will you assign treatment and control (e.g., customer vs. session-based) and what metrics will you measure? 
What are the success and [guardrail](https://medium.com/airbnb-engineering/designing-experimentation-guardrails-ed6a976ec669) metrics?\n\n### 5.5. Human-in-the-loop\n\nHow will you incorporate human intervention into your ML system (e.g., product/customer exclusion lists)?\n\n## 6. Implementation\n\n### 6.1. High-level design\n\n![](https://upload.wikimedia.org/wikipedia/commons/thumb/2/2e/Data-flow-diagram-example.svg/1280px-Data-flow-diagram-example.svg.png)\n\nStart by providing a big-picture view. [System-context diagrams](https://en.wikipedia.org/wiki/System_context_diagram) and [data-flow diagrams](https://en.wikipedia.org/wiki/Data-flow_diagram) work well.\n\n### 6.2. Infra\n\nHow will you host your system? On-premise, cloud, or hybrid? This will define the rest of this section.\n\n### 6.3. Performance (Throughput, Latency)\n\nHow will your system meet the throughput and latency requirements? Will it scale vertically or horizontally?\n\n### 6.4. Security\n\nHow will your system/application authenticate users and incoming requests? If it's publicly accessible, will it be behind a firewall?\n\n### 6.5. Data privacy\n\nHow will you ensure the privacy of customer data? Will your system be compliant with data retention and deletion policies (e.g., [GDPR](https://gdpr.eu/what-is-gdpr/))?\n\n### 6.6. Monitoring \u0026 Alarms\n\nHow will you log events in your system? What metrics will you monitor and how? Will you have alarms if a metric breaches a threshold or something else goes wrong?\n\n### 6.7. Cost\nHow much will it cost to build and operate your system? Share estimated monthly costs (e.g., EC2 instances, Lambda).\n\n### 6.8. Integration points\n\nHow will your system integrate with upstream data and downstream users?\n\n### 6.9. Risks \u0026 Uncertainties\n\nRisks are the known unknowns; uncertainties are the unknown unknowns. What worries you, and what would you like others to review?\n\n## 7. Appendix\n\n### 7.1. 
Alternatives\n\nWhat alternatives did you consider and exclude? List pros and cons of each alternative and the rationale for your decision.\n\n### 7.2. Experiment Results\n\nShare any results of offline experiments that you conducted.\n\n### 7.3. Performance benchmarks\n\nShare any performance benchmarks you ran (e.g., throughput vs. latency vs. instance size/count).\n\n### 7.4. Milestones \u0026 Timeline\n\nWhat are the key milestones for this system and the estimated timeline?\n\n### 7.5. Glossary\n\nDefine and link to business or technical terms.\n\n### 7.6. References\n\nAdd references that you might have consulted for your methodology.\n\n---\n## Other templates, examples, etc\n- [A Software Design Doc](https://www.industrialempathy.com/posts/design-doc-a-design-doc/) `Google`\n- [Design Docs at Google](https://www.industrialempathy.com/posts/design-docs-at-google/) `Google`\n- [Product Spec of Emoji Reactions on Twitter Messages](https://docs.google.com/document/d/1sUX-sm5qZ474PCQQUpvdi3lvvmWPluqHOyfXz3xKL2M/edit#heading=h.554u12gw2xpd) `Twitter`\n- [Design Docs, Markdown, and Git](https://caitiem.com/2020/03/29/design-docs-markdown-and-git/) `Microsoft`\n- [Technical Decision-Making and Alignment in a Remote Culture](https://multithreaded.stitchfix.com/blog/2020/12/07/remote-decision-making/) `Stitchfix`\n- [Design Documents for Chromium](https://www.chromium.org/developers/design-documents) `Chromium`\n- [PRD Template](https://works.hashicorp.com/articles/prd-template) and [RFC Template](https://works.hashicorp.com/articles/rfc-template) (example RFC: [Manager Charter](https://works.hashicorp.com/articles/manager-charter)) `HashiCorp`\n- [Pitch for To-Do Groups and Group Notifications](https://basecamp.com/shapeup/1.5-chapter-06#examples) `Basecamp`\n- [The Anatomy of a 6-pager](https://writingcooperative.com/the-anatomy-of-an-amazon-6-pager-fc79f31a41c9) and an [example](https://docs.google.com/document/d/1LPh1LWx1z67YFo67DENYUGBaoKk39dtX7rWAeQHXzhg/edit) 
`Amazon`\n- [Writing for Distributed Teams](http://veekaybee.github.io/2021/07/17/p2s/), [How P2 Changed Automattic](https://ma.tt/2009/05/how-p2-changed-automattic/) `Automattic`\n- [Writing Technical Design Docs](https://medium.com/machine-words/writing-technical-design-docs-71f446e42f2e), [Writing Technical Design Docs, Revisited](https://medium.com/machine-words/writing-technical-design-docs-revisited-850d36570ec) `AWS`\n- [How to write a good software design doc](https://www.freecodecamp.org/news/how-to-write-a-good-software-design-document-66fcf019569c/) `Plaid`\n\nContributions [welcome](https://github.com/eugeneyan/ml-design-docs/pulls)!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feugeneyan%2Fml-design-docs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feugeneyan%2Fml-design-docs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feugeneyan%2Fml-design-docs/lists"}