https://github.com/justmarkham/scikit-learn-tips

:robot::zap: 50 scikit-learn tips
data-school data-science machine-learning python scikit-learn

Last synced: 3 months ago

Host: GitHub
URL: https://github.com/justmarkham/scikit-learn-tips
Owner: justmarkham
Created: 2020-03-26T13:36:57.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2022-09-05T14:51:34.000Z (almost 3 years ago)
Last Synced: 2025-04-01T05:37:26.903Z (3 months ago)
Topics: data-school, data-science, machine-learning, python, scikit-learn
Language: Jupyter Notebook
Homepage: https://scikit-learn.tips
Size: 282 KB
Stars: 1,729
Watchers: 117
Forks: 435
Open Issues: 1
Metadata Files:
- Readme: README.md

# 🤖⚡ scikit-learn tips

New tips are posted on [LinkedIn](https://www.linkedin.com/in/justmarkham/), [Twitter](https://twitter.com/justmarkham), and [Facebook](https://www.facebook.com/DataScienceSchool/).

👉 [Sign up to receive 2 video tips by email every week!](https://scikit-learn.tips) 👈

## List of all tips

For each tip, the links let you discuss it on **LinkedIn**, view its **Jupyter notebook**, or watch the tip video on **YouTube:**

\# | Description | Links
--- | --- | ---
1 | Use `ColumnTransformer` to apply different preprocessing to different columns |
2 | Seven ways to select columns using `ColumnTransformer` |
3 | What is the difference between "fit" and "transform"? |
4 | Use "fit_transform" on training data, but "transform" (only) on testing/new data |
5 | Four reasons to use scikit-learn (not pandas) for ML preprocessing |
6 | Encode categorical features using `OneHotEncoder` or `OrdinalEncoder` |
7 | Handle unknown categories with `OneHotEncoder` by encoding them as zeros |
8 | Use `Pipeline` to chain together multiple steps |
9 | Add a missing indicator to encode "missingness" as a feature |
10 | Set a "random_state" to make your code reproducible |
11 | Impute missing values using `KNNImputer` or `IterativeImputer` |
12 | What is the difference between `Pipeline` and `make_pipeline`? |
13 | Examine the intermediate steps in a `Pipeline` |
14 | `HistGradientBoostingClassifier` natively supports missing values |
15 | Three reasons not to use `drop='first'` with `OneHotEncoder` |
16 | Use `cross_val_score` and `GridSearchCV` on a `Pipeline` |
17 | Try `RandomizedSearchCV` if `GridSearchCV` is taking too long |
18 | Display `GridSearchCV` or `RandomizedSearchCV` results in a DataFrame |
19 | Important tuning parameters for `LogisticRegression` |
20 | Plot a confusion matrix |
21 | Compare multiple ROC curves in a single plot |
22 | Use the correct methods for each type of `Pipeline` |
23 | Display the intercept and coefficients for a linear model |
24 | Visualize a decision tree two different ways |
25 | Prune a decision tree to avoid overfitting |
26 | Use stratified sampling with `train_test_split` |
27 | Two ways to impute missing values for a categorical feature |
28 | Save a model or `Pipeline` using joblib |
29 | Vectorize two text columns in a `ColumnTransformer` |
30 | Four ways to examine the steps of a `Pipeline` |
31 | Shuffle your dataset when using `cross_val_score` |
32 | Use AUC to evaluate multiclass problems |
33 | Use `FunctionTransformer` to convert functions into transformers |
34 | Add feature selection to a `Pipeline` |
35 | Don't use `.values` when passing a pandas object to scikit-learn |
36 | Most parameters should be passed as keyword arguments |
37 | Create an interactive diagram of a `Pipeline` in Jupyter |
38 | Get the feature names output by a `ColumnTransformer` |
39 | Load a toy dataset into a DataFrame |
40 | Estimators only print parameters that have been changed |
41 | Drop the first category from binary features (only) with `OneHotEncoder` |
42 | Passthrough some columns and drop others in a `ColumnTransformer` |
43 | Use `OrdinalEncoder` instead of `OneHotEncoder` with tree-based models |
44 | Speed up `GridSearchCV` using parallel processing |
45 | Create feature interactions using `PolynomialFeatures` |
46 | Ensemble multiple models using `VotingClassifier` or `VotingRegressor` |
47 | Tune the parameters of a `VotingClassifier` or `VotingRegressor` |
48 | Access part of a `Pipeline` using slicing |
49 | Tune multiple models simultaneously with `GridSearchCV` |
50 | Adapt this pattern to solve many Machine Learning problems |
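
To give a feel for how tips 1, 8, and 16 fit together, here is a minimal sketch that is not taken from the tip notebooks. It applies different preprocessing to different columns with `ColumnTransformer`, chains that with a model in a `Pipeline`, and cross-validates the whole thing. The toy data, the column names, and the step names (`"num"`, `"cat"`, `"preprocess"`, `"model"`) are all invented for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data, made up for this example
df = pd.DataFrame({
    "age": [22, 38, 26, 35, None, 54, 2, 27, 14, 58, 20, 39],
    "embarked": ["S", "C", "S", "S", "Q", "S", "S", "C", "C", "S", "Q", "S"],
    "survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0],
})
X = df[["age", "embarked"]]
y = df["survived"]

# Tip 1: different preprocessing for different columns
preprocessor = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["embarked"]),
])

# Tip 8: chain preprocessing and the model into a single object
pipe = Pipeline([
    ("preprocess", preprocessor),
    ("model", LogisticRegression()),
])

# Tip 16: cross-validate the whole Pipeline, so the preprocessing
# is re-fit on the training portion of each fold
scores = cross_val_score(pipe, X, y, cv=3, scoring="accuracy")
print(scores.mean())
```

Passing the `Pipeline` (rather than already-transformed data) to `cross_val_score` is the design choice that matters here: the imputer and encoder only ever learn from the training folds.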

You can interact with all of these notebooks online using **Binder**.
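
As a small illustration of tips 4 and 7 (a sketch written for this list, not copied from the notebooks), the snippet below fits a `OneHotEncoder` on training data only and then reuses it on new data. The `color` column and its values are hypothetical. With `handle_unknown="ignore"`, a category that was not seen during fitting is encoded as all zeros.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training and testing data
train = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
test = pd.DataFrame({"color": ["green", "purple"]})  # "purple" is unseen

ohe = OneHotEncoder(handle_unknown="ignore")

# fit_transform on training data: learn the categories, then encode
train_encoded = ohe.fit_transform(train).toarray()

# transform (only) on testing data: reuse the learned categories
test_encoded = ohe.transform(test).toarray()

print(ohe.categories_[0])  # ['blue' 'green' 'red']
print(test_encoded)        # "green" -> [0, 1, 0], "purple" -> [0, 0, 0]
```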

**Note:** Some of the tips do not include any code, and can only be viewed on LinkedIn.
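
Tips 10 and 26 can be tried in a few lines. The following is a minimal sketch with made-up arrays, assuming an imbalanced target where 25% of the samples belong to class 1. Passing `stratify=y` keeps that proportion in both splits, and a fixed `random_state` makes the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples, 15 of class 0 and 5 of class 1
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)

# Tip 26: stratify on y so both splits keep the 75/25 class balance
# Tip 10: set random_state so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print(y_train.mean(), y_test.mean())  # 0.25 0.25

# Running the same call again with the same random_state gives the same split
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print(np.array_equal(X_test, X_test2))  # True
```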

## Who creates these tips?

Hi! I'm Kevin Markham, the founder of [Data School](https://www.dataschool.io). I've been teaching data science in Python since 2014. I create these tips because I love using scikit-learn and I want to help others use it more effectively.

## How can I get better at scikit-learn?

I teach three courses:

- **Course 1:** [Introduction to Machine Learning in Python with scikit-learn](https://courses.dataschool.io/introduction-to-machine-learning-with-scikit-learn) (4 hours, free)

- **Course 2:** [Building an Effective Machine Learning Workflow with scikit-learn](https://courses.dataschool.io/building-an-effective-machine-learning-workflow-with-scikit-learn) (8 hours, paid)

- **Course 3:** [Machine Learning with Text in Python](https://www.dataschool.io/learn/) (14 hours, paid)

👉 [Find out which course is right for you!](https://www.dataschool.io/ml-courses/) 👈

## Do you have any other tips?

Yes! In 2019, I posted [100 pandas tricks](https://www.dataschool.io/python-pandas-tips-and-tricks/). I also created a video featuring my [top 25 pandas tricks](https://www.dataschool.io/python-pandas-tricks/).

*© 2020-2021 [Data School](https://www.dataschool.io). All rights reserved.*
