https://github.com/dformoso/sklearn-classification
Data Science Notebook on a Classification Task, using sklearn and Tensorflow.
- Host: GitHub
- URL: https://github.com/dformoso/sklearn-classification
- Owner: dformoso
- License: gpl-3.0
- Created: 2017-08-12T05:05:33.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2021-12-21T05:36:30.000Z (about 3 years ago)
- Last Synced: 2025-02-08T04:08:31.329Z (12 days ago)
- Topics: classification-task, data, docker, jupyter, learning, machine, machine-learning, notebook, roc, roc-curve, science, sklearn, tensorflow
- Language: Jupyter Notebook
- Homepage:
- Size: 10.8 MB
- Stars: 692
- Watchers: 42
- Forks: 232
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
# Census Income Dataset Classification
Data Science Notebook on a Classification Task

## Objective
In the Jupyter Notebook included in this repository, we will use the Census Income Dataset to predict whether an individual's income exceeds $50K/yr based on census data.

The Dataset can be found here:
- https://archive.ics.uci.edu/ml/datasets/adult

The Notebook can be found here:
- https://github.com/dformoso/sklearn-classification/blob/master/Data%20Science%20Workbook%20-%20Census%20Income%20Dataset.ipynb

## Companion Mindmap/Cheatsheet
This Jupyter Notebook has a companion Mindmap/Cheatsheet that lists most of the Data Science steps. It can be found at the following link:
- https://github.com/dformoso/machine-learning-mindmap

## Steps
In this Notebook, we'll perform:

- Feature Exploration (Uni and Bi-variate)
- Feature Imputation
- Feature Selection
- Feature Encoding
- Feature Ranking
- Machine Learning with sklearn and Tensorflow
- Random Search
- Accuracy, Precision, Recall, and f1 calculations
- ROC Curve

## Setup
This Notebook has been designed to be run on top of the Jupyter Tensorflow Docker instance found in the link below:
- https://github.com/jupyter/docker-stacks/tree/master/tensorflow-notebook

If you haven't downloaded Docker at this point, please visit:
- https://www.docker.com/get-docker

Then, open a shell or terminal session and copy/paste the following:
```shell
docker run -itd \
--restart always \
--name jupyter \
--hostname jupyter \
-p 8888:8888 \
-p 6006:6006 \
jupyter/tensorflow-notebook:latest \
start-notebook.sh --NotebookApp.token=''
```

Upon running the command, Docker will automatically pull the image it needs and start the container for us.
Give Jupyter a minute or so to start, then head to the following URL: http://localhost:8888
You should now have Jupyter running. If you can't reach the URL after a minute, check that the container is running by typing:
```shell
# Check that the container is running
docker ps -a
```
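If you would rather script the wait than keep refreshing the browser, the sketch below polls the Jupyter URL until something answers. It is only an illustration: `jupyter_is_up` and `wait_for_jupyter` are hypothetical helper names (not part of Jupyter or Docker), and the URL and retry settings are assumptions taken from the `docker run` command above.

```python
import http.client
import time
import urllib.error
import urllib.request

# Assumption: Jupyter is published on the port used in the docker run command.
JUPYTER_URL = "http://localhost:8888"

# Bypass any system proxy settings, since we only talk to the local machine.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def jupyter_is_up(url=JUPYTER_URL, timeout=2.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is running.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, malformed response, and so on.
        return False


def wait_for_jupyter(url=JUPYTER_URL, attempts=30, delay=2.0):
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if jupyter_is_up(url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_jupyter()` returns `True` as soon as the server responds and `False` once the attempts run out, at which point `docker ps -a` is the next thing to check.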
## Loading the Notebook
Download the Notebook from this link:
- https://github.com/dformoso/sklearn-classification/blob/master/Data%20Science%20Workbook%20-%20Census%20Income%20Dataset.ipynb

Then go back to http://localhost:8888, load your Notebook into Jupyter, and run it. That's it!

## Troubleshooting Docker
Here are a few useful commands in case something goes wrong with your Docker instance:

```shell
# Restart Jupyter Docker Container
docker restart jupyter

# Stop Jupyter Docker Container
docker stop jupyter

# Remove Jupyter Docker Container
docker rm jupyter
```

- Feature Exploration (Uni and Bi-variate)
- Feature Imputation
- Feature Selection
- Feature Encoding
- Feature Ranking
- Machine Learning Training
- Random Search
- Accuracy, Precision, Recall, and f1 calculations
- ROC Curve

## Screenshots
### Feature Distribution Analysis
[Screenshot: feature distribution analysis]

### Feature Cleaning
[Screenshot: feature cleaning]

### Missing Values in Features
[Screenshot: missing values in features]

### Bivariate Exploration
[Screenshots: bivariate exploration, two images]

### Feature Correlation
[Screenshot: feature correlation]

### Feature Importance
[Screenshot: feature importance]

### Feature PCA
[Screenshot: feature PCA]

### Results from Machine Learning Algorithms
[Screenshot: results from machine learning algorithms]

### ROC for each Algorithm
[Screenshot: ROC curve for each algorithm]
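For readers who want to see what the accuracy, precision, recall, F1, and ROC calculations listed in the steps involve, here is a dependency-free sketch in plain Python. It is not the notebook's code, which may well compute these differently. The function names and the toy labels and scores are made up for illustration, and it assumes binary labels where 1 marks the positive class (for this dataset, presumably income above $50K).

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
    }


def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs for every threshold."""
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("ROC needs at least one positive and one negative label")
    # Walk through the samples from the highest score to the lowest.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data, invented for this example only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # fixed 0.5 decision threshold

print(classification_metrics(y_true, y_pred))
curve = roc_points(y_true, scores)
print("AUC:", auc_trapezoid(curve))  # 0.75 for this toy data
```

The first four metrics depend on a single decision threshold, while the ROC curve sweeps over all of them, which is why it is useful for comparing algorithms side by side.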
## About Me
Twitter:
- https://twitter.com/danielmartinezf

LinkedIn:
- https://www.linkedin.com/in/danielmartinezformoso/

Email:
- [email protected]