https://github.com/dformoso/sklearn-classification

Data Science Notebook on a Classification Task, using sklearn and Tensorflow.
- Host: GitHub
- URL: https://github.com/dformoso/sklearn-classification
- Owner: dformoso
- License: gpl-3.0
- Created: 2017-08-12T05:05:33.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2021-12-21T05:36:30.000Z (almost 4 years ago)
- Last Synced: 2025-03-28T11:09:01.539Z (7 months ago)
- Topics: classification-task, data, docker, jupyter, learning, machine, machine-learning, notebook, roc, roc-curve, science, sklearn, tensorflow
- Language: Jupyter Notebook
- Homepage:
- Size: 10.8 MB
- Stars: 690
- Watchers: 41
- Forks: 233
- Open Issues: 7
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
 
# Census Income Dataset Classification
Data Science Notebook on a Classification Task
## Objective
In the Jupyter Notebook included in this page, we will use the Census Income Dataset to predict whether an individual's income exceeds $50K/yr based on census data.
The Dataset can be found here:
- https://archive.ics.uci.edu/ml/datasets/adult
The Notebook can be found here:
- https://github.com/dformoso/sklearn-classification/blob/master/Data%20Science%20Workbook%20-%20Census%20Income%20Dataset.ipynb
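Before opening the Notebook, it can help to take a quick look at the raw data from the shell. The sketch below assumes you have downloaded the training file from the UCI page above as `adult.data`, and that it follows the UCI layout: 15 comma-separated fields per row, the income label (`<=50K` or `>50K`) in the last field, a trailing period on labels in the `adult.test` split, and missing values written as `?`. The function names `class_balance` and `count_missing` are hypothetical helpers for illustration; they are not part of the Notebook.

```shell
# Quick look at the raw Census Income (Adult) files before opening the Notebook.
# Assumes the UCI layout: 15 comma-separated fields per row, income label last,
# missing values written as "?". Function names are illustrative only.

# Count rows per income class; prints "label count".
# Usage: class_balance adult.data
class_balance() {
  awk -F', *' '
    NF < 15 { next }             # skip blank lines and the header line of adult.test
    {
      label = $NF
      sub(/\.$/, "", label)      # adult.test labels carry a trailing period, e.g. >50K.
      count[label]++
    }
    END { for (l in count) print l, count[l] }
  ' "$1" | LC_ALL=C sort
}

# Count "?" (missing) entries per column; prints "column-number count".
# Usage: count_missing adult.data
count_missing() {
  awk -F', *' '
    NF < 15 { next }             # skip blank or malformed lines
    {
      for (i = 1; i <= NF; i++)
        if ($i == "?") missing[i]++
    }
    END { for (i in missing) print i, missing[i] }
  ' "$1" | sort -n
}
```

For example, `count_missing adult.data` lists the columns that contain `?`, which are the candidates for the Feature Imputation step, and `class_balance adult.data` shows how unbalanced the two income classes are.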
## Companion Mindmap/Cheatsheet
This Jupyter Notebook has a companion Mindmap/Cheatsheet, available at the following link, that lists most of the Data Science steps:
- https://github.com/dformoso/machine-learning-mindmap
## Steps
In this Notebook, we'll perform:
- Feature Exploration (Uni and Bi-variate)
- Feature Imputation
- Feature Selection
- Feature Encoding
- Feature Ranking
- Machine Learning with sklearn and Tensorflow
- Random Search
- Accuracy, Precision, Recall, and F1 calculations
- ROC Curve
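For reference, the Accuracy, Precision, Recall and F1 figures listed above all derive from the four confusion-matrix counts: accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The sketch below is not how the Notebook computes them; it assumes a hypothetical two-column file of `actual predicted` labels, with `1` meaning income >50K (the positive class) and `0` meaning <=50K, and uses awk only to make the formulas visible. `binary_metrics` is an illustrative name.

```shell
# Compute accuracy, precision, recall and F1 from "actual predicted" pairs.
# Labels are assumed to be 1 (income >50K, the positive class) or 0 (<=50K).
# Usage: binary_metrics predictions.txt
binary_metrics() {
  awk '
    NF < 2 { next }                          # ignore blank lines
    $1 == 1 && $2 == 1 { tp++ }              # true positive
    $1 == 0 && $2 == 1 { fp++ }              # false positive
    $1 == 0 && $2 == 0 { tn++ }              # true negative
    $1 == 1 && $2 == 0 { fn++ }              # false negative
    END {
      total = tp + fp + tn + fn
      # Guard every division so an empty class yields 0 instead of an error.
      accuracy  = total     ? (tp + tn) / total : 0
      precision = (tp + fp) ? tp / (tp + fp)    : 0
      recall    = (tp + fn) ? tp / (tp + fn)    : 0
      f1 = (precision + recall) ? 2 * precision * recall / (precision + recall) : 0
      printf "accuracy=%.3f precision=%.3f recall=%.3f f1=%.3f\n", accuracy, precision, recall, f1
    }
  ' "$1"
}
```

With 3 true positives, 1 false negative, 2 false positives and 4 true negatives, this prints `accuracy=0.700 precision=0.600 recall=0.750 f1=0.667`, which shows why accuracy alone can look better than the model's performance on the >50K class.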
## Setup
This Notebook has been designed to be run on top of the Jupyter Tensorflow Docker instance found in the link below:
- https://github.com/jupyter/docker-stacks/tree/master/tensorflow-notebook
If you haven't installed Docker yet, please visit:
- https://www.docker.com/get-docker
Then, open a shell or terminal session and copy/paste the following:
```shell
docker run -itd \
  --restart always \
  --name jupyter \
  --hostname jupyter \
  -p 8888:8888 \
  -p 6006:6006 \
  jupyter/tensorflow-notebook:latest \
  start-notebook.sh --NotebookApp.token=''
```
Upon running the command, Docker will pull the `jupyter/tensorflow-notebook` image if it isn't already available locally, and then start the container for us.
Give it a minute or so for Jupyter to start, and head to the following URL: http://localhost:8888
You should now have Jupyter running. If after a minute you still can't reach the URL, check that the container is running correctly by typing:
```shell
# Check that the container is running
docker ps -a
```
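Instead of refreshing the browser, you can poll the server from the shell until it answers. This is a minimal bash sketch; `wait_for_jupyter` is a hypothetical helper, it requires `curl`, and its defaults assume the port mapping from the `docker run` command above (`-p 8888:8888`).

```shell
# Poll a URL until it answers or the attempts run out, waiting 1 s between tries.
# Defaults match the docker run command above: port 8888 published on localhost.
# Usage: wait_for_jupyter [url] [attempts]
wait_for_jupyter() {
  local url="${1:-http://localhost:8888}"
  local attempts="${2:-60}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -s silent, -f fail on HTTP errors (4xx/5xx), --max-time caps each try at 5 s
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      echo "Jupyter is answering at $url"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "No answer from $url after $attempts attempts; check the container with: docker ps -a" >&2
  return 1
}
```

Running `wait_for_jupyter` with no arguments waits up to about a minute, which matches the start-up time mentioned above. Redirects (for example to a login page) count as an answer, since `-f` only treats 4xx/5xx responses as failures.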
## Loading the Notebook
Download it from this link:
- https://github.com/dformoso/sklearn-classification/blob/master/Data%20Science%20Workbook%20-%20Census%20Income%20Dataset.ipynb
Go back to:
- http://localhost:8888

Then upload the Notebook into Jupyter and run it. That's it!
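As an alternative to loading the Notebook through the Jupyter web interface, the bash sketch below downloads it and copies it straight into the running container. Assumptions: GitHub serves the raw file when a `github.com/OWNER/REPO/blob/...` page URL is rewritten to `raw.githubusercontent.com/OWNER/REPO/...`, the container is named `jupyter` as in the `docker run` command above, and the notebook directory inside the image is `/home/jovyan/work` (the default user home layout of the Jupyter Docker Stacks images; adjust it if yours differs). `github_raw_url` and `load_notebook` are hypothetical helper names.

```shell
# Turn a GitHub "blob" page URL into the matching raw-file download URL:
# https://github.com/OWNER/REPO/blob/BRANCH/PATH -> https://raw.githubusercontent.com/OWNER/REPO/BRANCH/PATH
github_raw_url() {
  printf '%s\n' "$1" |
    sed -e 's#^https://github.com/\([^/]*\)/\([^/]*\)/blob/#https://raw.githubusercontent.com/\1/\2/#'
}

# Download the Notebook and copy it into the container's work directory.
# Assumes the container name "jupyter" (from the docker run command) and the
# /home/jovyan/work directory of the Jupyter Docker Stacks images.
load_notebook() {
  local page_url="https://github.com/dformoso/sklearn-classification/blob/master/Data%20Science%20Workbook%20-%20Census%20Income%20Dataset.ipynb"
  local file="Data Science Workbook - Census Income Dataset.ipynb"
  # -f fail on HTTP errors, -L follow redirects, -o write to the local file name
  curl -fL -o "$file" "$(github_raw_url "$page_url")" || return 1
  docker cp "$file" jupyter:/home/jovyan/work/
}
```

After `load_notebook` succeeds, the file should appear under `work` in the Jupyter file browser at http://localhost:8888.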
## Troubleshooting Docker
Here are a few useful commands in case something goes wrong with your Docker instance:
```shell
# Restart Jupyter Docker Container
docker restart jupyter
# Stop Jupyter Docker Container
docker stop jupyter
# Remove Jupyter Docker Container
docker rm jupyter
```
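Before restarting or removing the container, it is often worth checking its state and recent output, since Jupyter writes its startup messages and errors to the container log. A minimal bash sketch, assuming the container name `jupyter` from the `docker run` command above; `jupyter_diagnostics` is a hypothetical helper that only wraps the standard `docker inspect` and `docker logs` commands.

```shell
# Print the container's state and its most recent log lines.
# Usage: jupyter_diagnostics [container-name]
jupyter_diagnostics() {
  local name="${1:-jupyter}"
  # State is reported as e.g. running, restarting or exited; stop if the container is unknown.
  docker inspect -f '{{.State.Status}}' "$name" || return 1
  # Last 20 lines written by the container (Jupyter logs its startup messages and errors here).
  docker logs --tail 20 "$name"
}
```

A status of `restarting` together with the `--restart always` flag used above usually means Jupyter keeps exiting at startup, and the log lines should show why.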
## Screenshots
### Feature Distribution Analysis

### Feature Cleaning

### Missing Values in Features

### Bivariate Exploration


### Feature Correlation

### Feature Importance

### Feature PCA

### Results from Machine Learning Algorithms

### ROC for each Algorithm

## About Me
Twitter:
- https://twitter.com/danielmartinezf
LinkedIn:
- https://www.linkedin.com/in/danielmartinezformoso/
Email:
- daniel.martinez.formoso@gmail.com