Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dformoso/sklearn-classification
Data Science Notebook on a Classification Task, using sklearn and Tensorflow.
https://github.com/dformoso/sklearn-classification
classification-task data docker jupyter learning machine machine-learning notebook roc roc-curve science sklearn tensorflow
Last synced: 26 days ago
JSON representation
Data Science Notebook on a Classification Task, using sklearn and Tensorflow.
- Host: GitHub
- URL: https://github.com/dformoso/sklearn-classification
- Owner: dformoso
- License: gpl-3.0
- Created: 2017-08-12T05:05:33.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2021-12-21T05:36:30.000Z (almost 3 years ago)
- Last Synced: 2024-09-29T16:20:57.145Z (about 1 month ago)
- Topics: classification-task, data, docker, jupyter, learning, machine, machine-learning, notebook, roc, roc-curve, science, sklearn, tensorflow
- Language: Jupyter Notebook
- Homepage:
- Size: 10.8 MB
- Stars: 691
- Watchers: 42
- Forks: 232
- Open Issues: 7
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Census Income Dataset Classification
Data Science Notebook on a Classification Task## Objective
In the Jupyter Notebook included in this page, we will using the Census Income Dataset to predict whether an individual's income exceeds $50K/yr based on census data.The Dataset can be found here:
- https://archive.ics.uci.edu/ml/datasets/adultThe Notebook can be found here:
- https://github.com/dformoso/sklearn-classification/blob/master/Data%20Science%20Workbook%20-%20Census%20Income%20Dataset.ipynb## Companion Mindmap/Cheatsheet
This Jupyter Notepad has a companion Mindmap/Cheatsheet that lists most of the Data Science steps that can be found at the following link:
- https://github.com/dformoso/machine-learning-mindmap## Steps
In this Notebook, we'll perform:- Feature Exploration (Uni and Bi-variate)
- Feature Imputation
- Feature Selection
- Feature Encoding
- Feature Ranking
- Machine Learning with sklearn and Tensorflow
- Random Search
- Accuracy, Precision, Recall, and f1 calculations
- ROC Curve## Setup
This Notebook has been designed to be run on top of the Jupyter Tensorflow Docker instance found in the link below:
- https://github.com/jupyter/docker-stacks/tree/master/tensorflow-notebookIf you haven't downloaded Docker at this point, please visit:
- https://www.docker.com/get-dockerThen, open a shell or terminal session and copy/paste the following:
```shell
docker run -itd \
--restart always \
--name jupyter \
--hostname jupyter \
-p 8888:8888 \
-p 6006:6006 \
jupyter/tensorflow-notebook:latest \
start-notebook.sh --NotebookApp.token=''
```Upon running the command, docker will automatically pull the images it needs and get the containers going for us.
Give it a minute or so for Jupyter to start, and head to the following URL: http://localhost:8888
You should now have Jupyter running. If after a minute you can't reach the URL, check that the containers are running correctly and the network has been created by typing:
```shell
### Check the containers are running
docker ps -a
```
## Loading the Notebook
Download it from this link:
- https://github.com/dformoso/sklearn-classification/blob/master/Data%20Science%20Workbook%20-%20Census%20Income%20Dataset.ipynbGo back to:
- http://localhost:8888, load your Notebook into Jupyter and run it. That's it!## Troubleshooting Docker
Here's a few useful commands in case something goes wrong with your docker instance:```shell
# Restart Jupyter Docker Container
docker restart jupyter# Stop Jupyter Docker Container
docker stop jupyter# Remove Jupyter Docker Container
docker rm jupyter
```Feature Exploration (Uni and Bi-variate)
Feature Imputation
Feature Selection
Feature Encoding
Feature Ranking
Machine Learning Training
Random Search
Accuracy, Precision, Recall, and f1 calculations
ROC Curve## Screenshots
### Feature Distribution Analysis
![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/distribution.png)### Feature Cleaning
![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/cleaning.png)### Missing Values is Features
![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/missing.png)### Bivariate Exploration
![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/bivariate1.png)
![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/bivariate2.png)### Feature Correlation
![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/correlation.png)### Feature Importance
![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/importance.png)### Feature PCA
![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/pca.png)### Results from Machine Learning Algorithms
![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/results.png)### ROC for each Algorithm
![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/analysis.png)
## About Me
Twitter:
- https://twitter.com/danielmartinezfLinkedin:
- https://www.linkedin.com/in/danielmartinezformoso/Email:
- [email protected]