{"id":13513740,"url":"https://github.com/justmarkham/scikit-learn-videos","last_synced_at":"2025-05-14T14:09:11.063Z","repository":{"id":29916756,"uuid":"33462633","full_name":"justmarkham/scikit-learn-videos","owner":"justmarkham","description":"Jupyter notebooks from the scikit-learn video series","archived":false,"fork":false,"pushed_at":"2024-03-05T19:22:42.000Z","size":1548,"stargazers_count":3700,"open_issues_count":2,"forks_count":2540,"subscribers_count":347,"default_branch":"master","last_synced_at":"2025-04-11T06:14:49.148Z","etag":null,"topics":["data-science","jupyter-notebook","machine-learning","python","scikit-learn","tutorial"],"latest_commit_sha":null,"homepage":"https://courses.dataschool.io/introduction-to-machine-learning-with-scikit-learn","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/justmarkham.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"patreon":"dataschool"}},"created_at":"2015-04-06T02:08:42.000Z","updated_at":"2025-03-31T10:23:37.000Z","dependencies_parsed_at":"2024-09-29T06:18:13.409Z","dependency_job_id":null,"html_url":"https://github.com/justmarkham/scikit-learn-videos","commit_stats":{"total_commits":54,"total_committers":3,"mean_commits":18.0,"dds":0.03703703703703709,"last_synced_commit":"8545c74961398def7724501648fd504dbf061b41"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justmarkham%2Fscikit-learn-videos","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justmarkham%2Fscikit-learn-videos/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justmarkham%2Fscikit-learn-videos/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justmarkham%2Fscikit-learn-videos/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/justmarkham","download_url":"https://codeload.github.com/justmarkham/scikit-learn-videos/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254160226,"owners_count":22024567,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","jupyter-notebook","machine-learning","python","scikit-learn","tutorial"],"created_at":"2024-08-01T05:00:36.892Z","updated_at":"2025-05-14T14:09:06.054Z","avatar_url":"https://github.com/justmarkham.png","language":"Jupyter Notebook","readme":"# Introduction to Machine Learning with scikit-learn\n\nThis video series will teach you how to solve Machine Learning problems using Python's popular scikit-learn library. There are **10 video tutorials** totaling 4.5 hours, each with a corresponding **Jupyter notebook**.\n\nYou can [watch the entire series](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A) on YouTube and [view all of the notebooks](http://nbviewer.jupyter.org/github/justmarkham/scikit-learn-videos/tree/master/) using nbviewer.\n\nThe series is also available as a [free online course](https://courses.dataschool.io/introduction-to-machine-learning-with-scikit-learn) that includes updated content, quizzes, and a certificate of completion.\n\n[![Watch the first tutorial video](images/youtube.png)](https://www.youtube.com/watch?v=elojMnjn4kk\u0026list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A\u0026index=1 \"Watch the first tutorial video\")\n\n**Note:** The notebooks in this repository have been updated to use Python 3.9.1 and scikit-learn 0.23.2. The original notebooks (shown in the video) used Python 2.7 and scikit-learn 0.16, and can be downloaded from the [archive branch](https://github.com/justmarkham/scikit-learn-videos/tree/archive). You can read about how I updated the code in this [blog post](https://www.dataschool.io/how-to-update-your-scikit-learn-code-for-2018/).\n\n## Table of Contents\n\n1. What is Machine Learning, and how does it work? ([video](https://www.youtube.com/watch?v=elojMnjn4kk\u0026list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A\u0026index=1), [notebook](01_machine_learning_intro.ipynb))\n    - What is Machine Learning?\n    - What are the two main categories of Machine Learning?\n    - What are some examples of Machine Learning?\n    - How does Machine Learning \"work\"?\n\n2. Setting up Python for Machine Learning: scikit-learn and Jupyter Notebook ([video](https://www.youtube.com/watch?v=IsXXlYVBt1M\u0026list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A\u0026index=2), [notebook](02_machine_learning_setup.ipynb))\n    - What are the benefits and drawbacks of scikit-learn?\n    - How do I install scikit-learn?\n    - How do I use the Jupyter Notebook?\n    - What are some good resources for learning Python?\n\n3. Getting started in scikit-learn with the famous iris dataset ([video](https://www.youtube.com/watch?v=hd1W4CyPX58\u0026list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A\u0026index=3), [notebook](03_getting_started_with_iris.ipynb))\n    - What is the famous iris dataset, and how does it relate to Machine Learning?\n    - How do we load the iris dataset into scikit-learn?\n    - How do we describe a dataset using Machine Learning terminology?\n    - What are scikit-learn's four key requirements for working with data?\n\n4. Training a Machine Learning model with scikit-learn ([video](https://www.youtube.com/watch?v=RlQuVL6-qe8\u0026list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A\u0026index=4), [notebook](04_model_training.ipynb))\n    - What is the K-nearest neighbors classification model?\n    - What are the four steps for model training and prediction in scikit-learn?\n    - How can I apply this pattern to other Machine Learning models?\n\n5. Comparing Machine Learning models in scikit-learn ([video](https://www.youtube.com/watch?v=0pP4EwWJgIU\u0026list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A\u0026index=5), [notebook](05_model_evaluation.ipynb))\n    - How do I choose which model to use for my supervised learning task?\n    - How do I choose the best tuning parameters for that model?\n    - How do I estimate the likely performance of my model on out-of-sample data?\n\n6. Data science pipeline: pandas, seaborn, scikit-learn ([video](https://www.youtube.com/watch?v=3ZWuPVWq7p4\u0026list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A\u0026index=6), [notebook](06_linear_regression.ipynb))\n    - How do I use the pandas library to read data into Python?\n    - How do I use the seaborn library to visualize data?\n    - What is linear regression, and how does it work?\n    - How do I train and interpret a linear regression model in scikit-learn?\n    - What are some evaluation metrics for regression problems?\n    - How do I choose which features to include in my model?\n\n7. Cross-validation for parameter tuning, model selection, and feature selection ([video](https://www.youtube.com/watch?v=6dbrR-WymjI\u0026list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A\u0026index=7), [notebook](07_cross_validation.ipynb))\n    - What is the drawback of using the train/test split procedure for model evaluation?\n    - How does K-fold cross-validation overcome this limitation?\n    - How can cross-validation be used for selecting tuning parameters, choosing between models, and selecting features?\n    - What are some possible improvements to cross-validation?\n\n8. Efficiently searching for optimal tuning parameters ([video](https://www.youtube.com/watch?v=Gol_qOgRqfA\u0026list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A\u0026index=8), [notebook](08_grid_search.ipynb))\n    - How can K-fold cross-validation be used to search for an optimal tuning parameter?\n    - How can this process be made more efficient?\n    - How do you search for multiple tuning parameters at once?\n    - What do you do with those tuning parameters before making real predictions?\n    - How can the computational expense of this process be reduced?\n\n9. Evaluating a classification model ([video](https://www.youtube.com/watch?v=85dtiMz9tSo\u0026list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A\u0026index=9), [notebook](09_classification_metrics.ipynb))\n    - What is the purpose of model evaluation, and what are some common evaluation procedures?\n    - What is the usage of classification accuracy, and what are its limitations?\n    - How does a confusion matrix describe the performance of a classifier?\n    - What metrics can be computed from a confusion matrix?\n    - How can you adjust classifier performance by changing the classification threshold?\n    - What is the purpose of an ROC curve?\n    - How does Area Under the Curve (AUC) differ from classification accuracy?\n\n10. Building a Machine Learning workflow ([video](https://www.youtube.com/watch?v=irHhDMbw3xo\u0026list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A\u0026index=10), [notebook](10_categorical_features.ipynb))\n    - Why should you use a Pipeline?\n    - How do you encode categorical features with OneHotEncoder?\n    - How do you apply OneHotEncoder to selected columns with ColumnTransformer?\n    - How do you build and cross-validate a Pipeline?\n    - How do you make predictions on new data using a Pipeline?\n    - Why should you use scikit-learn (rather than pandas) for preprocessing?\n\n## Bonus Video\n\nAt the PyCon 2016 conference, I taught a **3-hour tutorial** that builds upon this video series and focuses on **text-based data**. You can watch the [tutorial video](https://www.youtube.com/watch?v=ZiKMIuYidY0\u0026list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A\u0026index=11) on YouTube.\n\nHere are the topics I covered:\n\n1. Model building in scikit-learn (refresher)\n2. Representing text as numerical data\n3. Reading a text-based dataset into pandas\n4. Vectorizing our dataset\n5. Building and evaluating a model\n6. Comparing models\n7. Examining a model for further insight\n8. Practicing this workflow on another dataset\n9. Tuning the vectorizer (discussion)\n\nVisit this [GitHub repository](https://github.com/justmarkham/pycon-2016-tutorial) to access the tutorial notebooks and many other recommended resources.\n","funding_links":["https://patreon.com/dataschool"],"categories":["AI, ML, DL","Jupyter Notebook","Python","Machine learning"],"sub_categories":["General-Purpose Machine Learning"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjustmarkham%2Fscikit-learn-videos","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjustmarkham%2Fscikit-learn-videos","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjustmarkham%2Fscikit-learn-videos/lists"}