https://github.com/ricardorobledo/next_level_data_science
- Host: GitHub
- URL: https://github.com/ricardorobledo/next_level_data_science
- Owner: RicardoRobledo
- Created: 2025-07-18T03:47:49.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-07-18T03:49:42.000Z (7 months ago)
- Last Synced: 2025-07-18T07:31:25.239Z (7 months ago)
- Topics: matplotlib, numpy, pandas, python3, scikit-learn
- Language: Jupyter Notebook
- Homepage:
- Size: 975 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Next Level Data Science Notebook
This notebook is based on the Next Level Data Science book from [machinelearningmastery.com](https://machinelearningmastery.com/) and covers advanced data science concepts with a focus on regression, feature engineering, and tree-based models.
## Key Topics Covered
### I Exploring Data with Regression Models
- Supervised learning: classification vs. regression
- Train-test split vs. cross-validation techniques
- Sequential feature selection for housing price prediction
- Managing model complexity with numeric features
- One-hot encoding for categorical data
- Interpreting coefficients in linear regression models
- Polynomial regression for capturing non-linear relationships
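As a rough illustration of several of the topics above (one-hot encoding, linear regression, and a train-test split compared with cross-validation), here is a minimal sketch. It is not taken from the notebook: the data is synthetic, and the column names `area`, `rooms`, and `zone` are hypothetical stand-ins for a housing table.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for a housing table (hypothetical columns, not real data)
df = pd.DataFrame({
    "area": rng.uniform(50, 250, n),
    "rooms": rng.integers(1, 6, n),
    "zone": rng.choice(["A", "B", "C"], n),
})
zone_premium = df["zone"].map({"A": 0.0, "B": 20.0, "C": 50.0})
y = 1.5 * df["area"] + 10.0 * df["rooms"] + zone_premium + rng.normal(0, 5, n)

# One-hot encode the categorical column; numeric columns pass through unchanged.
# drop="first" avoids a redundant dummy column alongside the intercept.
preprocess = ColumnTransformer(
    [("zone", OneHotEncoder(drop="first"), ["zone"])],
    remainder="passthrough",
)
model = make_pipeline(preprocess, LinearRegression())

# Single hold-out split: one estimate that depends on how the split falls
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, random_state=0
)
model.fit(X_train, y_train)
holdout_r2 = model.score(X_test, y_test)

# 5-fold cross-validation: five estimates, so the spread is visible too
cv_r2 = cross_val_score(model, df, y, cv=5, scoring="r2")

print(f"hold-out R^2: {holdout_r2:.3f}")
print(f"5-fold R^2:   {cv_r2.mean():.3f} +/- {cv_r2.std():.3f}")
```

Wrapping the encoder and the model in one pipeline means the encoding is refit inside each cross-validation fold, which keeps the test folds out of the preprocessing step.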
### II Skills for Better Modeling
- Building and using pipelines for data transformation
- Detecting and overcoming perfect multicollinearity using Lasso regression
- Feature scaling and optimizing penalized regression models
- Comparative guide to imputation techniques: simple, iterative, and kNN imputation
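The topics in this part can be sketched together in one small experiment: a pipeline that imputes, scales, and fits a Lasso model, run once per imputation strategy. This is an assumption-laden sketch rather than the notebook's code. The data is synthetic, `x3` is deliberately built as an exact multiple of `x1` to create perfect multicollinearity, and `alpha=0.05` is an arbitrary illustrative value.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 300

# Synthetic features; x3 is an exact multiple of x1 (perfect multicollinearity)
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2.0 * x1
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 - 2.0 * x2 + rng.normal(scale=0.1, size=n)

# Blank out roughly 10% of the entries to simulate missing data
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.10] = np.nan

imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "iterative": IterativeImputer(random_state=0),
    "knn": KNNImputer(n_neighbors=5),
}

scores = {}
for name, imputer in imputers.items():
    # Impute, then scale, then fit the penalized model, all inside one pipeline
    pipe = make_pipeline(imputer, StandardScaler(), Lasso(alpha=0.05))
    scores[name] = cross_val_score(pipe, X_missing, y, cv=5, scoring="r2").mean()
    print(f"{name:>9} imputation: mean 5-fold R^2 = {scores[name]:.3f}")

# Inspect how the L1 penalty spreads weight across the redundant columns
pipe.fit(X_missing, y)
print("Lasso coefficients (x1, x2, x3):", pipe.named_steps["lasso"].coef_)
```

Scaling matters here because the Lasso penalty acts on coefficient magnitudes, so features on larger scales would otherwise be penalized less for the same effect.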
### III Tree-Based Models in Data Science
- Overview of tree-based regression models with visualization
- Practical use of ordinal encoding with decision trees
- Ensemble methods: bagging, random forests, and gradient boosting
- Handling missing data and categorical features with XGBoost
- Exploring LightGBM with leaf-wise growth strategies
- Building robust home price prediction systems with CatBoost
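To make the ensemble comparison concrete, the sketch below ordinal-encodes an ordered category and compares a single decision tree with bagging, a random forest, and gradient boosting. It uses only scikit-learn estimators as stand-ins; XGBoost, LightGBM, and CatBoost are not shown. The data is synthetic, and the `quality` levels and column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import (
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 400

# Hypothetical ordered quality levels, worst to best
quality_levels = ["Po", "Fa", "TA", "Gd", "Ex"]
df = pd.DataFrame({
    "area": rng.uniform(50, 250, n),
    "quality": rng.choice(quality_levels, n),
})
quality_rank = df["quality"].map({q: i for i, q in enumerate(quality_levels)})
y = 1.2 * df["area"] + 25.0 * quality_rank + rng.normal(0, 10, n)

# Ordinal encoding keeps the category order, so a tree can split on "Gd or better"
encode = ColumnTransformer(
    [("quality", OrdinalEncoder(categories=[quality_levels]), ["quality"])],
    remainder="passthrough",
)

models = {
    "single tree": DecisionTreeRegressor(max_depth=4, random_state=0),
    "bagging": BaggingRegressor(DecisionTreeRegressor(), n_estimators=50, random_state=0),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, estimator in models.items():
    pipe = make_pipeline(encode, estimator)
    results[name] = cross_val_score(pipe, df, y, cv=5, scoring="r2").mean()
    print(f"{name:>17}: mean 5-fold R^2 = {results[name]:.3f}")
```

Passing the level order explicitly to `OrdinalEncoder` is the key design choice: without it, the categories would be sorted alphabetically and the integer codes would not reflect the intended ranking.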
### IV The Data Science Mindset
- Planning data science projects: understanding data and defining goals
- Feature selection and engineering for robust models
- Evaluating and interpreting model performance
- Communicating results through hypothesis testing and data narratives
Together, these sections form a path for building data science skills in regression and tree-based modeling, with an emphasis on practical implementation and model interpretation.