
https://github.com/khuyentran1401/suicide-rates


Host: GitHub
URL: https://github.com/khuyentran1401/suicide-rates
Owner: khuyentran1401
Created: 2019-11-26T13:52:24.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2020-04-06T15:25:42.000Z (almost 5 years ago)
Last Synced: 2024-11-26T11:50:07.668Z (2 months ago)
Topics: data-analysis, data-science, kaggle, machine-learning, python
Language: Jupyter Notebook
Size: 805 KB
Stars: 3
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# About this Project

Identify predictors of suicide rates across countries, years, sexes, generations, and age groups.

# Data

The data comes from the Kaggle dataset [Kaggle Suicide Rates Overview 1985 to 2016](https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016).
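
The following is a minimal loading sketch. It assumes the Kaggle download is a single CSV whose headers include names such as `suicides/100k pop`, `HDI for year`, and `gdp_for_year ($)`, possibly with stray spaces. Check these against your copy. The helpers `load_suicide_data` and `summarize` are hypothetical and not taken from the project notebook. The sketch normalizes the headers to snake_case, then reports each column's type, missing-value count, and number of distinct values as a first look before preprocessing.

```python
import pandas as pd


def load_suicide_data(path: str) -> pd.DataFrame:
    """Hypothetical loader: read the CSV and normalize its column names.

    Assumes Kaggle-style headers such as 'suicides/100k pop' or 'gdp_for_year ($)'.
    """
    df = pd.read_csv(path)
    df.columns = (
        df.columns.str.strip()                                # drop stray spaces around headers
        .str.replace("/", "_per_", regex=False)               # 'suicides/100k' -> 'suicides_per_100k'
        .str.replace(r"[^0-9a-zA-Z]+", "_", regex=True)       # collapse spaces/symbols into '_'
        .str.strip("_")                                       # remove leading/trailing '_'
        .str.lower()
    )
    return df


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, number of missing values, number of distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # NaN is not counted as a distinct value
    })
```

Usage: `print(summarize(load_suicide_data(path)))`, where `path` points to the downloaded CSV.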

# Notebook
The full analysis is in the project notebook, [suicide-rates-1985-2016.ipynb](https://github.com/khuyentran1401/Suicide-rates/blob/master/suicide-rates-1985-2016.ipynb).

# Tools

* NumPy, pandas
* Seaborn
* GeoPandas
* scikit-learn

# Steps

1. Import and explore data
2. Preprocess data
   * Split the data into training and test sets with `sklearn.model_selection.train_test_split`
   * Shuffle and split the data with `sklearn.model_selection.StratifiedShuffleSplit`, which keeps the proportions of a chosen category the same in the training and test sets
   * Merge the map data into the existing data
   * Drop unimportant features
   * Encode categorical data as integers with `sklearn.preprocessing.LabelEncoder`
   * Impute missing values with `sklearn.preprocessing.Imputer` (deprecated in newer scikit-learn versions in favor of `sklearn.impute.SimpleImputer`)
   * Scale the data with `sklearn.preprocessing.StandardScaler`
3. Visualization
   * Count plots to show the distribution of the data
   * Heatmap and pair plot to examine correlations between features
   * GeoPandas map to show the distribution of suicide rates across countries

![image](https://github.com/khuyentran1401/Suicide-rates/blob/master/images/Screenshot%202020-04-06%2010.20.56.png?raw=true)
4. Metrics
   * Mean squared error
   * Cross-validation score
5. Model Training
   * Linear Regression
   * Decision Tree Regressor
   * Random Forest Regressor
   * Grid Search Cross-Validation (`GridSearchCV`)
   * Randomized Search Cross-Validation (`RandomizedSearchCV`)
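
The preprocessing, metric, and model-training steps above can be sketched with a scikit-learn `Pipeline`. The sketch assumes a DataFrame with categorical columns (such as `sex` and `age`), numeric columns in which only the numeric features have gaps, and a numeric target such as the suicide rate per 100k. It is not the notebook's code. The helper names `stratified_split`, `make_preprocessor`, and `compare_models` are hypothetical. `OrdinalEncoder` stands in for `LabelEncoder`, because scikit-learn documents `LabelEncoder` for target labels rather than input features. `SimpleImputer` replaces the deprecated `Imputer`. Because the encoder, imputer, and scaler sit inside the pipeline, `cross_val_score` refits them on each training fold, so no statistics leak from the validation fold.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
from sklearn.tree import DecisionTreeRegressor


def stratified_split(df, strata_col, test_size=0.2, seed=42):
    """Shuffle and split once, keeping the class proportions of `strata_col` in both sets."""
    splitter = StratifiedShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(df, df[strata_col]))
    return df.iloc[train_idx], df.iloc[test_idx]


def make_preprocessor(categorical, numeric):
    """Encode categorical columns as integers; impute, then scale numeric columns."""
    encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
    numeric_steps = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill gaps with the fold's median
        ("scale", StandardScaler()),                   # zero mean, unit variance
    ])
    return ColumnTransformer([
        ("cat", encoder, categorical),
        ("num", numeric_steps, numeric),
    ])


def compare_models(X, y, categorical, numeric, cv=5):
    """Return the mean cross-validated RMSE of each regressor named in the README."""
    models = {
        "linear_regression": LinearRegression(),
        "decision_tree": DecisionTreeRegressor(random_state=42),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
    }
    results = {}
    for name, model in models.items():
        pipe = Pipeline([("prep", make_preprocessor(categorical, numeric)), ("model", model)])
        # Scorers follow "higher is better", so the MSE of each fold comes back negated.
        neg_mse = cross_val_score(pipe, X, y, scoring="neg_mean_squared_error", cv=cv)
        results[name] = float(np.sqrt(-neg_mse).mean())  # mean of per-fold RMSE
    return results
```

Integer codes impose an arbitrary order on categories. Tree models tolerate this better than linear regression, and `OneHotEncoder` is the usual alternative when the linear model matters.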

![image](https://github.com/khuyentran1401/Suicide-rates/blob/master/images/Screenshot%202020-04-06%2010.13.21.png?raw=true)

_Hyperparameter comparison for Randomized Search CV_
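
Here is one way to produce a hyperparameter comparison like the one in the figure. The sketch assumes a random forest and uses illustrative search ranges for `n_estimators`, `max_features`, and `max_depth`. These are not the ranges the notebook used, which are not stated here. `search_forest` is a hypothetical helper. `RandomizedSearchCV` samples `n_iter` settings and scores each with 5-fold cross-validation. The sketch then turns `cv_results_` into a table sorted by RMSE.

```python
import pandas as pd
from scipy.stats import randint
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV


def search_forest(X, y, n_iter=10, seed=42):
    """Sample random-forest hyperparameters and rank them by cross-validated RMSE."""
    param_distributions = {
        "n_estimators": randint(10, 100),             # number of trees, 10 to 99 (illustrative)
        "max_features": randint(1, X.shape[1] + 1),   # features considered per split
        "max_depth": [None, 5, 10, 20],               # None lets trees grow fully
    }
    search = RandomizedSearchCV(
        RandomForestRegressor(random_state=seed),
        param_distributions,
        n_iter=n_iter,
        scoring="neg_mean_squared_error",
        cv=5,
        random_state=seed,
    )
    search.fit(X, y)
    results = pd.DataFrame(search.cv_results_)
    # mean_test_score holds the negated mean MSE; flip the sign and take the root for RMSE.
    results["rmse"] = (-results["mean_test_score"]) ** 0.5
    columns = ["param_n_estimators", "param_max_features", "param_max_depth", "rmse"]
    return search.best_estimator_, results[columns].sort_values("rmse")
```

`GridSearchCV` tries every combination in a grid. The randomized search instead caps the budget at `n_iter` sampled settings, which helps when ranges such as `n_estimators` are wide.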