
https://github.com/mch-fauzy/data-science

Repository containing portfolio of data science and machine learning projects. Presented in the form of iPython Notebooks

data-analysis data-science data-visualization ipython-notebooks machine-learning natural-language-processing portfolio


# **Data Science Handbook**
Repository containing portfolio of Data Science and Machine Learning projects.

The projects are presented as iPython Notebooks and PDF reports.

## **Notes**

### Fundamentals
| No | Notebook | Description |
|:---:|:---:|:---:|
| 1 | [NumPy Overview](https://colab.research.google.com/drive/1YRWMpcxp5iU-Mb0wYRuLc2D2VOxQMl2X?usp=sharing) | Overview of how to use numpy |
| 2 | [Pandas Overview](https://colab.research.google.com/drive/1HPGrvVt8_PKE5EULP5QfzFPwKJUQaHnl?usp=sharing) | Overview of how to use pandas |
| 3 | [Matplotlib Overview](https://colab.research.google.com/drive/1b9uwlxBeYkqujx9HvmXrC94AmbUaNGuJ?usp=sharing) | Overview of how to use matplotlib for data visualization |
| 4 | [Seaborn Overview](https://colab.research.google.com/drive/16b9piRc-L8xl9LAbnVqQEyEWJFz2nK0M?usp=sharing) | Overview of how to use seaborn for data visualization |

### EDA - Data Preparation and Preprocessing
| No | Notebook | Description |
|:---:|:---:|:---:|
| 1 | [Feature Engineering: Variable Types & Characteristics](https://colab.research.google.com/drive/1EvJm2lAtO4y2HWjYaTjGBgCj5SnZpJ-E?usp=sharing) | Collection of variable types and characteristics, such as missing-data mechanisms (MCAR, MAR, MNAR), cardinality, distributions, linear model assumptions, outliers, and variable magnitude |
| 2 | [Feature Engineering: Univariate Missing Data Imputation](https://colab.research.google.com/drive/1IY3DdzPE5rlJWBfrxkBwGCH_TfrVOzxo?usp=sharing) | Collection of univariate missing data imputation techniques, such as mean, median, and mode imputation, arbitrary value, end of distribution, random sample, and many more |
| 3 | [Feature Engineering: Multivariate Missing Data Imputation](https://colab.research.google.com/drive/13P_R6Bn5n38vbxjSbXeQGef1xe_14jz1?usp=sharing) | KNN and MICE multivariate missing data imputation |
| 4 | [Feature Engineering: Categorical Encoding](https://colab.research.google.com/drive/1xWjH3ZsfDdefdygz6lwL8luK0MeyTTtL?usp=sharing) | Collection of categorical encoding techniques, such as rare label encoding, one-hot encoding, weight of evidence (WoE) encoding, and other encodings that produce a monotonic relationship with the target |
| 5 | [Feature Engineering: Variable Transformation](https://colab.research.google.com/drive/13v0lvNMU9kU-5IzyvYo1XKXERIQmKCEB?usp=sharing) | Collection of variable transformation techniques that make non-Gaussian variables more suitable for linear models, such as the log, Box-Cox, and Yeo-Johnson transformers |
| 6 | [Feature Engineering: Discretization](https://colab.research.google.com/drive/1Lw999xtz6yEkKhjF20RWB4F3K_IiblVR?usp=sharing) | Collection of discretization methods, such as equal width discretization, equal frequency discretization, K-means discretization, and many more |
| 7 | [Feature Selection: Filter Methods](https://colab.research.google.com/drive/1x-jmUbMIcQSXA4TUu1E2musdy9rf669t?usp=sharing) | Collection of feature selection filter methods, such as constant, quasi-constant, duplicated features pair, multi-collinearity, mutual information, ANOVA, and many more |
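
As a rough illustration of two techniques named in the table above (median imputation and equal-width / equal-frequency discretization), here is a minimal sketch using pandas. It is not taken from the linked notebooks; the `age` column and its values are hypothetical toy data chosen only for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not taken from the notebooks.
df = pd.DataFrame({"age": [22.0, 35.0, np.nan, 58.0, 41.0, np.nan, 29.0, 64.0]})

# Univariate imputation: fill missing values with the column median.
median_age = df["age"].median()
df["age_imputed"] = df["age"].fillna(median_age)

# Equal-width discretization: 3 bins that each span the same range of values.
df["age_equal_width"] = pd.cut(df["age_imputed"], bins=3, labels=False)

# Equal-frequency discretization: 4 bins holding roughly the same number of rows
# (ties, such as the repeated imputed median, can make the counts uneven).
df["age_equal_freq"] = pd.qcut(df["age_imputed"], q=4, labels=False)

print(df)
```

In practice the median and the bin edges would be learned on the training split only and then applied to the test split, to avoid leaking information.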

### Modelling and Analysis
| No | Notebook | Report | Dashboard | Description |
|:---:|:---:|:---:|:---:|:---:|
| 1 | [E-Commerce Sales Performance and Customer RFM Behavior Analysis](https://drive.google.com/file/d/1o1xw8RVmyya1EXCEEAY_jp8e8ltm1M7X/view?usp=share_link) | [PDF](https://drive.google.com/file/d/1n8EE7ny4KHZzYr_D2jc0I_ZlZQnOI-Kb/view?usp=share_link) | [Tableau Dashboard Story](https://public.tableau.com/views/PakistanE-CommerceSalesPerformanceandCustomerRFMBehaviour/DashboardStory?:language=en-US&:display_count=n&:origin=viz_share_link) | E-commerce companies want to know their sales performance and customer behavior. The goals of this analysis are to understand customer behavior and to recommend ways to increase sales and customer satisfaction |
| 2 | [Credit Default Risk_Home Credit_Light GBM](https://drive.google.com/file/d/1MYM8XTz-SFDcz27IkZlyuS7wOJ980g1R/view?usp=sharing) | [PDF](https://drive.google.com/file/d/1aU826Opix76xGCgEUkqe7myB2cGGaWSm/view?usp=sharing) |-| Credit Default Risk classification and Debtors Grading with SHAP model explainability using Light GBM|
| 3 | [Book Recommendation System_Content and Item-based Collaborative Filtering](https://colab.research.google.com/drive/11qaT_C3FFN3symuzyTYOE48Cq50BNC_d?usp=sharing) | [PDF](https://drive.google.com/file/d/1gsjt_2edyhbof2Lh6h093KeVp0RP-np_/view?usp=sharing) |-| Build a book recommendation system to help users choose their books based on the books they have purchased|
| 4 | [Article Topic Classification_Kumparan_Light GBM](https://drive.google.com/file/d/1ybXWQpMZVKzLur2fqz35h9Ta7gtyU9kz/view?usp=sharing) | [PDF](https://drive.google.com/file/d/1gYbTRkt3xad5pGgHwz0p4Q4ToFOcoW_l/view?usp=sharing) |-| Build a model to classify article topics based on their content using TF-IDF vectorization|
| 5 | [Airplane Passengers_SARIMA Forecasting](https://drive.google.com/file/d/1hl33KG6d7VRllXVqJCOe1m1wTo049I2v/view?usp=sharing) | - | - | Seasonal forecasting of the number of airplane passengers, evaluated with walk-forward validation |
| 6 | [Sales Advertising_Linear Regression](https://colab.research.google.com/drive/1NXfY9ZrG4B0MeOeiOS2lSR1M3kU_Tdgi?usp=sharing) | - | - | Sales prediction based on advertising amount |
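
For readers unfamiliar with the walk-forward validation mentioned for the forecasting project, the idea can be sketched as follows. This is only an illustration, not the notebook's code: a seasonal-naive forecaster stands in for SARIMA to keep the example dependency-free, and the function names and the toy series are hypothetical.

```python
def seasonal_naive_forecast(history, season_length):
    """Predict the next value as the value observed one season ago.

    This is a simple stand-in for a SARIMA model, used only for illustration.
    """
    return history[-season_length]


def walk_forward_validation(series, n_test, season_length):
    """Forecast one step at a time, adding each observed value to the history."""
    history = list(series[:-n_test])
    test = list(series[-n_test:])
    predictions = []
    for actual in test:
        predictions.append(seasonal_naive_forecast(history, season_length))
        # The true value becomes available before the next forecast is made.
        history.append(actual)
    mae = sum(abs(p - a) for p, a in zip(predictions, test)) / n_test
    return predictions, mae


# Hypothetical toy series with a period of 4 and a slight upward trend.
series = [10, 20, 30, 20, 12, 22, 32, 22, 14, 24, 34, 24]
predictions, mae = walk_forward_validation(series, n_test=4, season_length=4)
print(predictions, mae)
```

With a real model, the forecaster would typically be refitted (or updated) on the growing history at each step, which is what makes the evaluation mimic how the model would be used over time.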