https://github.com/ricardorobledo/next_level_data_science



# Next Level Data Science Notebook

This notebook is based on the Next Level Data Science book from [machinelearningmastery.com](https://machinelearningmastery.com/) and covers advanced data science concepts with a focus on regression, feature engineering, and tree-based models.

## Key Topics Covered

### I Exploring Data with Regression Models
- Supervised learning: classification vs. regression
- Train-test split vs. cross-validation techniques
- Sequential feature selection for housing price prediction
- Managing model complexity with numeric features
- One-hot encoding for categorical data
- Interpreting coefficients in linear regression models
- Polynomial regression for capturing non-linear relationships
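As a rough illustration of several topics in this part (a hold-out split versus cross-validation, one-hot encoding, and reading linear regression coefficients), here is a minimal sketch. It does not reproduce the notebook's code: the data is synthetic, and the column names `area` and `zone` and the price formula are assumptions made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for a housing table (hypothetical columns, not the notebook's data).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "area": rng.uniform(50, 250, n),
    "zone": rng.choice(["A", "B", "C"], n),
})
zone_bonus = df["zone"].map({"A": 0.0, "B": 20.0, "C": 50.0})
price = 2.0 * df["area"] + zone_bonus + rng.normal(0, 5, n)

# One-hot encode the categorical column; drop the first level so the
# dummy columns are not perfectly collinear with the intercept.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(drop="first"), ["zone"])],
    remainder="passthrough",  # the numeric "area" column passes through unchanged
)
model = make_pipeline(preprocess, LinearRegression())

# Option 1: a single train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    df, price, test_size=0.25, random_state=0
)
model.fit(X_train, y_train)
holdout_r2 = model.score(X_test, y_test)

# The passthrough column comes last, so the final coefficient belongs to "area".
area_coef = model.named_steps["linearregression"].coef_[-1]

# Option 2: 5-fold cross-validation, which averages over several splits.
cv_r2 = cross_val_score(model, df, price, cv=5, scoring="r2")

print(f"hold-out R^2: {holdout_r2:.3f}")
print(f"5-fold R^2:   {cv_r2.mean():.3f} (std {cv_r2.std():.3f})")
print(f"estimated price change per unit of area: {area_coef:.2f}")
```

Because the synthetic price was generated with a slope of 2.0 on `area`, the fitted coefficient should land close to that value, which makes the interpretation step easy to check.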

### II Skills for Better Modeling
- Building and using pipelines for data transformation
- Detecting and overcoming perfect multicollinearity using Lasso regression
- Feature scaling and optimizing penalized regression models
- Comparative guide to imputation techniques: simple, iterative, and kNN imputation
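The pipeline, scaling, and imputation topics above can be sketched together in a few lines. This is an illustrative example under stated assumptions, not the notebook's own code: the data is synthetic, roughly 10% of the entries are removed at random, and the Lasso penalty `alpha=0.1` is an arbitrary choice rather than a tuned value.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required before importing IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data: only the first two features drive the target.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)

# Blank out about 10% of the entries at random to simulate missing data.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.10] = np.nan

imputers = {
    "simple (mean)": SimpleImputer(strategy="mean"),
    "iterative": IterativeImputer(random_state=0),
    "kNN": KNNImputer(n_neighbors=5),
}

# Each pipeline imputes, scales, then fits a penalized regression.
# Keeping all steps in one pipeline means they are re-fit inside every
# cross-validation fold, so no information leaks from the held-out fold.
scores = {}
for name, imputer in imputers.items():
    pipe = make_pipeline(imputer, StandardScaler(), Lasso(alpha=0.1))
    scores[name] = cross_val_score(pipe, X_missing, y, cv=5, scoring="r2").mean()

for name, score in scores.items():
    print(f"{name:>14}: mean R^2 = {score:.3f}")
```

Which imputer scores best depends on the data. The features here are independent, so the model-based imputers have little structure to exploit; on a table with correlated columns the ranking may differ.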

### III Tree-Based Models in Data Science
- Overview of tree-based regression models with visualization
- Practical use of ordinal encoding with decision trees
- Ensemble methods: bagging, random forests, and gradient boosting
- Handling missing data and categorical features with XGBoost
- Exploring LightGBM with leaf-wise growth strategies
- Building robust home price prediction systems with CatBoost
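To make the ensemble ideas above concrete, the sketch below compares a single decision tree with bagging, a random forest, and gradient boosting, using ordinal encoding for one categorical column. It relies only on scikit-learn estimators as stand-ins, so XGBoost, LightGBM, and CatBoost are not shown. The data is synthetic, and the `area` and `quality` columns and the category order are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import (
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeRegressor

# Synthetic housing-style data (hypothetical columns).
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "area": rng.uniform(50, 250, n),
    "quality": rng.choice(["poor", "fair", "good"], n),
})
quality_effect = df["quality"].map({"poor": 0.0, "fair": 30.0, "good": 80.0})
price = 2.0 * df["area"] + quality_effect + rng.normal(0, 10, n)

# Ordinal encoding maps the ordered categories to 0, 1, 2, which a tree
# can split on with simple thresholds.
encode = ColumnTransformer(
    [("ordinal", OrdinalEncoder(categories=[["poor", "fair", "good"]]), ["quality"])],
    remainder="passthrough",
)

models = {
    "single tree": DecisionTreeRegressor(max_depth=4, random_state=0),
    "bagging": BaggingRegressor(n_estimators=50, random_state=0),  # bags decision trees by default
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
}

# 5-fold cross-validated R^2 for each model, with encoding inside the pipeline.
tree_scores = {
    name: cross_val_score(
        make_pipeline(encode, estimator), df, price, cv=5, scoring="r2"
    ).mean()
    for name, estimator in models.items()
}

for name, score in tree_scores.items():
    print(f"{name:>18}: mean R^2 = {score:.3f}")
```

The hyperparameters are left near their defaults to keep the comparison short; the resulting numbers are specific to this synthetic data and should not be read as a general ranking of the methods.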

### IV The Data Science Mindset
- Planning data science projects: understanding data and defining goals
- Feature selection and engineering for robust models
- Evaluating and interpreting model performance
- Communicating results through hypothesis testing and data narratives

This notebook is aimed at advancing data science skills in regression and tree-based modeling, with an emphasis on practical implementation and model interpretation.