# Case Studies: Data Science Projects

![NumPy](https://img.shields.io/badge/numpy-%23013243.svg?style=for-the-badge&logo=numpy&logoColor=white)
![Pandas](https://img.shields.io/badge/pandas-%23150458.svg?style=for-the-badge&logo=pandas&logoColor=white)
![Plotly](https://img.shields.io/badge/Plotly-%233F4F75.svg?style=for-the-badge&logo=plotly&logoColor=white)
![scikit-learn](https://img.shields.io/badge/scikit--learn-%23F7931E.svg?style=for-the-badge&logo=scikit-learn&logoColor=white)
![SciPy](https://img.shields.io/badge/SciPy-%230C55A5.svg?style=for-the-badge&logo=scipy&logoColor=white)
![Keras](https://img.shields.io/badge/Keras-%23D00000.svg?style=for-the-badge&logo=Keras&logoColor=white)
![TensorFlow](https://img.shields.io/badge/TensorFlow-%23FF6F00.svg?style=for-the-badge&logo=TensorFlow&logoColor=white)
![PyTorch](https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?style=for-the-badge&logo=PyTorch&logoColor=white)

## Table of Contents

- [About](#about);
- [Artificial Neural Network Approach in Churn Analysis using PyTorch](#artificial-neural-network-approach-in-churn-analysis-using-pytorch);
- [House Price Prediction](#house-price-prediction);
- [Healthcare Stroke Using PySpark](#healthcare-stroke-using-pyspark);
- [Counting and Aggregating M&Ms using PySpark](#counting-and-aggregating-mms-using-pyspark).

## About

Welcome to a collection of data science case studies (personal projects). The goal of this repository is to showcase some of the projects I have developed over the years in my career.

## Artificial Neural Network Approach in Churn Analysis using PyTorch

Churn analytics is the process of measuring the rate at which customers leave a company (or stop using a product). In this case study, we use a bank customer dataset to estimate the churn rate.
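
To make the estimated quantity concrete: over a given period, the churn rate is the number of customers who left divided by the total number of customers. The plain-Python sketch below assumes each customer record carries a 0/1 churn flag; the field name `Exited` is an illustrative assumption and may not match the column used in the notebooks, and the records are made up.

```python
from typing import Iterable, Mapping


def churn_rate(customers: Iterable[Mapping[str, int]], flag: str = "Exited") -> float:
    """Return the fraction of customers whose churn flag equals 1.

    `flag` is a hypothetical column name; adjust it to the dataset's schema.
    """
    total = 0
    churned = 0
    for customer in customers:
        total += 1
        if customer[flag] == 1:
            churned += 1
    if total == 0:
        raise ValueError("cannot compute a churn rate for an empty dataset")
    return churned / total


# Toy records (made up for illustration): 2 of 5 customers left.
sample = [{"Exited": 0}, {"Exited": 1}, {"Exited": 0}, {"Exited": 1}, {"Exited": 0}]
print(churn_rate(sample))  # 0.4
```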
For a better understanding of this case study, please check the following [link](https://github.com/neemiasbsilva/case-study-data-science/tree/main/churn_analysis).

- Exploratory Data Analysis - [EDA](https://github.com/neemiasbsilva/case-study-data-science/blob/main/churn_analysis/data_analysis.ipynb);
- [Data Visualization](https://github.com/neemiasbsilva/case-study-data-science/blob/main/churn_analysis/data_analysis.ipynb);
- [Preprocessing](https://github.com/neemiasbsilva/case-study-data-science/blob/main/churn_analysis/preprocessing.ipynb);
- [ANN model](https://github.com/neemiasbsilva/case-study-data-science/blob/main/churn_analysis/ann_pytorch_model.ipynb).

## House Price Prediction

House Price Prediction is a case study based on the book *Hands-On Machine Learning*. Its goal is to create a machine learning model that predicts the price of a particular house. For a better understanding of this case study, please check the following [link](https://github.com/neemiasbsilva/case-study-data-science/tree/main/house_price_prediction).

- Exploratory Data Analysis - [EDA](https://github.com/neemiasbsilva/case-study-data-science/blob/main/house_price_prediction/end_to_end_ml_project_regression.ipynb);
- [Preprocessing](https://github.com/neemiasbsilva/case-study-data-science/blob/main/house_price_prediction/end_to_end_ml_project_regression.ipynb);
- [Linear Regression](https://github.com/neemiasbsilva/case-study-data-science/blob/main/house_price_prediction/end_to_end_ml_project_regression.ipynb);
- [Decision Tree](https://github.com/neemiasbsilva/case-study-data-science/blob/main/house_price_prediction/end_to_end_ml_project_regression.ipynb);
- [Random Forest](https://github.com/neemiasbsilva/case-study-data-science/blob/main/house_price_prediction/end_to_end_ml_project_regression.ipynb);
- [Support Vector Machine](https://github.com/neemiasbsilva/case-study-data-science/blob/main/house_price_prediction/end_to_end_ml_project_regression.ipynb);
- [ANN](https://github.com/neemiasbsilva/case-study-data-science/blob/main/house_price_prediction/end_to_end_ml_project_regression.ipynb).

## Healthcare Stroke Using PySpark

This case study implements a classification model in PySpark to estimate the probability that a patient will or will not have a stroke. To understand more about the features and the case study, please check this [link](https://github.com/neemiasbsilva/case-study-data-science/tree/main/data_analysis_using_pyspark).

- [PySpark Configuration](https://github.com/neemiasbsilva/case-study-data-science/blob/main/data_analysis_using_pyspark/data_analysis.ipynb);
- [Exploratory Data Analysis](https://github.com/neemiasbsilva/case-study-data-science/blob/main/data_analysis_using_pyspark/data_analysis.ipynb);
- [Data Visualization](https://github.com/neemiasbsilva/case-study-data-science/blob/main/data_analysis_using_pyspark/data_analysis.ipynb);
- [Data Preprocessing](https://github.com/neemiasbsilva/case-study-data-science/blob/main/data_analysis_using_pyspark/healthcare_logistic_regression_pyspark.ipynb);
- [Feature Engineering](https://github.com/neemiasbsilva/case-study-data-science/blob/main/data_analysis_using_pyspark/healthcare_logistic_regression_pyspark.ipynb);
- [Model Selection](https://github.com/neemiasbsilva/case-study-data-science/blob/main/data_analysis_using_pyspark/healthcare_logistic_regression_pyspark.ipynb);
- [Evaluation](https://github.com/neemiasbsilva/case-study-data-science/blob/main/data_analysis_using_pyspark/healthcare_logistic_regression_pyspark.ipynb).

## Counting and Aggregating M&Ms using PySpark

This case study uses PySpark to count and aggregate M&Ms.
To check the implementation, please see this [link](https://github.com/neemiasbsilva/case-study-data-science/tree/main/counting-and-aggregating-m%26ms-pyspark).

- [Build a Spark Session](https://github.com/neemiasbsilva/case-study-data-science/blob/main/counting-and-aggregating-m%26ms-pyspark/counting_aggregating_m%26ms.ipynb);
- [Load the M&M Dataset](https://github.com/neemiasbsilva/case-study-data-science/blob/main/counting-and-aggregating-m%26ms-pyspark/counting_aggregating_m%26ms.ipynb);
- [Group by State and Color and Order in Descending Order](https://github.com/neemiasbsilva/case-study-data-science/blob/main/counting-and-aggregating-m%26ms-pyspark/counting_aggregating_m%26ms.ipynb);
- [Aggregate for a Particular State](https://github.com/neemiasbsilva/case-study-data-science/blob/main/counting-and-aggregating-m%26ms-pyspark/counting_aggregating_m%26ms.ipynb).
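
The grouping and per-state steps listed above can be sketched without Spark. The notebook itself uses PySpark; this plain-Python version only mirrors the logic, and it assumes (as a hypothesis, with made-up rows) that each record has `State`, `Color` and `Count` fields and that the totals are sums of `Count`.

```python
from collections import defaultdict

# Each row is (State, Color, Count). This column layout is an assumption,
# and the rows are invented for illustration.
rows = [
    ("TX", "Red", 20),
    ("TX", "Blue", 35),
    ("CA", "Red", 12),
    ("TX", "Red", 10),
    ("CA", "Yellow", 40),
]


def count_by_state_and_color(data):
    """Sum Count per (State, Color) pair and sort by total, largest first."""
    totals = defaultdict(int)
    for state, color, count in data:
        totals[(state, color)] += count
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def count_for_state(data, state):
    """Keep only one state's rows, then apply the same aggregation."""
    return count_by_state_and_color([row for row in data if row[0] == state])


print(count_by_state_and_color(rows))
print(count_for_state(rows, "TX"))  # [(('TX', 'Blue'), 35), (('TX', 'Red'), 30)]
```

In PySpark, the same steps would roughly correspond to `df.groupBy("State", "Color").agg(F.sum("Count").alias("Total")).orderBy("Total", ascending=False)`, with `df.where(df.State == "TX")` applied first for a single state (`F` being `pyspark.sql.functions`). Replacing `F.sum` with `F.count` counts rows per group instead of summing them.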
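
For the House Price Prediction case study above, here is a minimal sketch of the first model in its list, linear regression. It is reduced to a single feature so it can be fitted in closed form without libraries, with slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄, and it is scored with root mean squared error (RMSE). The feature (floor area), the prices and the choice of RMSE are illustrative assumptions, not taken from the notebook.

```python
import math


def fit_simple_linear_regression(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up data: floor area (m^2) versus price (in thousands).
areas = [50, 70, 90, 110]
prices = [150, 190, 230, 270]

intercept, slope = fit_simple_linear_regression(areas, prices)
predictions = [intercept + slope * a for a in areas]
print(f"price = {intercept} + {slope} * area, RMSE = {rmse(prices, predictions)}")
# price = 50.0 + 2.0 * area, RMSE = 0.0
```

The toy data lies exactly on a line, so the RMSE is zero; on real housing data the error would be positive, and the notebook compares it across the other regressors listed above.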
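
For the Healthcare Stroke case study above, whose notebook name indicates a logistic regression, the following dependency-free sketch shows how such a model turns features into a stroke probability: a weighted sum of the features plus a bias is passed through the logistic (sigmoid) function, and a 0.5 threshold gives the predicted class. The two features and all parameter values are hypothetical placeholders, not parameters learned in the notebook.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow in math.exp for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_stroke_probability(features, weights, bias):
    """Probability of the positive class (stroke) under a logistic-regression model."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


# Hypothetical parameters for two already-scaled features (for example age and
# average glucose level); a real model would learn these from training data.
weights = [1.5, 0.8]
bias = -3.0

p = predict_stroke_probability([1.2, 0.5], weights, bias)
label = 1 if p >= 0.5 else 0  # 0.5 decision threshold
print(f"P(stroke) = {p:.3f}, predicted class = {label}")
# P(stroke) = 0.310, predicted class = 0
```

In PySpark MLlib, a fitted `LogisticRegression` model's `transform` adds a `probability` column and a `prediction` column that play the same roles as `p` and `label` here.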