{"id":13935418,"url":"https://github.com/thomasj02/DeepLearningProjectWorkflow","last_synced_at":"2025-07-19T20:32:32.424Z","repository":{"id":72606598,"uuid":"83942939","full_name":"thomasj02/DeepLearningProjectWorkflow","owner":"thomasj02","description":"Machine Learning Workflow, from Andrew Ng's lecture at Deep Learning Summer School 2016","archived":false,"fork":false,"pushed_at":"2017-03-05T19:08:25.000Z","size":2,"stargazers_count":410,"open_issues_count":0,"forks_count":62,"subscribers_count":35,"default_branch":"master","last_synced_at":"2024-08-08T23:20:54.741Z","etag":null,"topics":["deep-learning"],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thomasj02.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2017-03-05T03:52:30.000Z","updated_at":"2024-05-08T02:39:22.000Z","dependencies_parsed_at":"2023-05-12T21:30:21.372Z","dependency_job_id":null,"html_url":"https://github.com/thomasj02/DeepLearningProjectWorkflow","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasj02%2FDeepLearningProjectWorkflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasj02%2FDeepLearningProjectWorkflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasj02%2FDeepLearningProjectWorkflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasj02%2FDeepLearningProjectWorkflow/manifests","owner_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thomasj02","download_url":"https://codeload.github.com/thomasj02/DeepLearningProjectWorkflow/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226676966,"owners_count":17665998,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning"],"created_at":"2024-08-07T23:01:43.267Z","updated_at":"2024-11-27T03:30:34.749Z","avatar_url":"https://github.com/thomasj02.png","language":null,"funding_links":[],"categories":["Others"],"sub_categories":[],"readme":"# Deep Learning Project Workflow\n\nThis document attempts to summarize Andrew Ng's recommended machine learning workflow from his [\"Nuts and Bolts of Applying Deep Learning\"](https://www.youtube.com/watch?v=F1ka6a13S9I\u0026t=4192s) talk at Deep Learning Summer School 2016. Any errors or misinterpretations are my own.\n\n\n# Start Here\n1. Measure Human-level performance on your task ([More](#measuring-human-level-performance))\n2.  Do your training and test data come from the same distribution?\n    * [Yes](#if-your-training-and-test-data-are-from-the-same-distribution)\n    * [No](#if-your-training-and-test-data-are-not-from-the-same-distribution)\n\n***\n\n# Measuring Human Level Performance\nThe real goal of measuring human-level performance is to estimate the [Bayes Error Rate](https://en.wikipedia.org/wiki/Bayes_error_rate). Knowing your Bayes Error Rate helps you figure out if your model is underfitting or overfitting your training data. 
More specifically, it will let us measure 'Bias' (as Ng defines it), which we use later in the workflow.\n***\n# If Your Training and Test Data Are From the Same Distribution\n\n#### 1. Shuffle and split your data into Train / Dev / Test Sets\nNg recommends a Train / Dev / Test split of approximately 70% / 15% / 15%.\n\n#### 2. Measure Your Training Error and Dev Set Error, and Calculate Bias and Variance\nCalculate your bias and variance as:\n* Bias = (Training Set Error) - (Human Error)\n* Variance = (Dev Set Error) - (Training Set Error)\n\n#### 3. Do You Have High Bias? [Fix This First.](#how-to-fix-high-bias)\nAn example of high bias:\n\nError Type | Error Rate\n----|----\nHuman Error | 1%\nTraining Set Error | 5%\nDev Set Error | 6%\n\n[Fix high bias](#how-to-fix-high-bias) before going on to the next step.\n\n#### 4. Do You Have High Variance? [Fix High Variance](#how-to-fix-high-variance).\n\nAn example of high variance:\n\nError Type | Error Rate\n----|----\nHuman Error | 1%\nTraining Set Error | 2%\nDev Set Error | 6%\n\nOnce you [fix your high variance](#how-to-fix-high-variance) then you're done!\n\n***\n# If Your Training and Test Data Are Not From the Same Distribution\n\n#### 1. Split Your Data\n\nIf your train and test data come from different distributions, make sure at least your dev and test sets are from the same distribution. You can do this by taking your test set and using half as dev and half as test. \n\nCarve out a small portion of your training set (call this _Train-Dev_) and split your Test data into _Dev_ and _Test_:\n```\n|---------------------------------|-----------------------|\n|     Train (Distribution 1)      | Test (Distribution 2) |\n|---------------------------------|-----------------------|\n|  Train              | Train-Dev |  Dev      |    Test   |\n|---------------------------------|-----------------------|\n\n```\n\n#### 2. 
Measure Your Errors, and Calculate the Relevant Metrics\n\nCalculate these metrics to help you decide where to focus your efforts:\n\nError Type | Formula\n----|----\nBias | (Training Error) - (Human Error)\nVariance | (Train-Dev Error) - (Training Error)\nTrain/Test Mismatch | (Dev Error) - (Train-Dev Error)\nOverfitting of Dev | (Test Error) - (Dev Error)\n\n#### 3. Do You Have High Bias? [Fix Your High Bias](#how-to-fix-high-bias).\n\nAn example of high bias:\n\nError Type | Error Rate\n----|----\nHuman Error | 1%\nTraining Set Error | 10%\nTrain-Dev Set Error | 10.1%\nDev Set Error | 10.2%\n\n[Fix high bias](#how-to-fix-high-bias) before going on to the next step.\n\n\n#### 4. Do You Have High Variance? [Fix Your High Variance](#how-to-fix-high-variance).\n\nAn example of high variance:\n\nError Type | Error Rate\n----|----\nHuman Error | 1%\nTraining Set Error | 2%\nTrain-Dev Set Error | 10.1%\nDev Set Error | 10.2%\n\n[Fix your high variance](#how-to-fix-high-variance) before going on to the next step.\n\n\n#### 5. Do You Have Train/Test Mismatch? [Fix Your Train/Test Mismatch](#how-to-fix-traintest-mismatch).\n\nAn example of train/test mismatch:\n\nError Type | Error Rate\n----|----\nHuman Error | 1%\nTraining Set Error | 2%\nTrain-Dev Set Error | 2.1%\nDev Set Error | 10%\n\n\n[Fix your train/test mismatch](#how-to-fix-traintest-mismatch) before going on to the next step.\n\n\n#### 6. Are You Overfitting Your Dev Set? 
[Fix Your Overfitting](#how-to-fix-overfitting-of-your-dev-set)\n\nAn example of overfitting your dev set:\n\nError Type | Error Rate\n----|----\nHuman Error | 1%\nTraining Set Error | 2%\nTrain-Dev Set Error | 2.1%\nDev Set Error | 2.2%\nTest Error | 10%\n\nOnce you [fix your dev set overfitting](#how-to-fix-overfitting-of-your-dev-set), you're done!\n\n***\n\n# How to Fix High Bias\nNg suggests these ways for fixing a model with high bias:\n* **Try a bigger model**\n* Try training longer\n* Try a new model architecture (this can be hard)\n\n# How to Fix High Variance\nNg suggests these ways for fixing a model with high variance:\n* **Get more data**\n  * This includes data synthesis and data augmentation\n* Try adding regularization\n* Try early stopping\n* Try a new model architecture (this can be hard)\n\n\n\n# How to Fix Train/Test Mismatch\n\nNg suggests these ways for fixing a model with high train/test mismatch:\n* Try to get more data similar to your test data\n* Try data synthesis and data augmentation\n* Try a new model architecture (this can be hard)\n\n# How to Fix Overfitting of Your Dev Set\n\nNg suggests only one way of fixing dev set overfitting:\n* Get more dev data\n\nPresumably this would include data synthesis and data augmentation as well.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomasj02%2FDeepLearningProjectWorkflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthomasj02%2FDeepLearningProjectWorkflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomasj02%2FDeepLearningProjectWorkflow/lists"}