{"id":13934911,"url":"https://github.com/dformoso/sklearn-classification","last_synced_at":"2025-04-04T12:08:32.574Z","repository":{"id":42187773,"uuid":"100090407","full_name":"dformoso/sklearn-classification","owner":"dformoso","description":"Data Science Notebook on a Classification Task, using sklearn and Tensorflow.","archived":false,"fork":false,"pushed_at":"2021-12-21T05:36:30.000Z","size":11328,"stargazers_count":690,"open_issues_count":7,"forks_count":233,"subscribers_count":41,"default_branch":"master","last_synced_at":"2025-03-28T11:09:01.539Z","etag":null,"topics":["classification-task","data","docker","jupyter","learning","machine","machine-learning","notebook","roc","roc-curve","science","sklearn","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dformoso.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-08-12T05:05:33.000Z","updated_at":"2025-03-13T01:59:48.000Z","dependencies_parsed_at":"2022-08-31T01:50:54.744Z","dependency_job_id":null,"html_url":"https://github.com/dformoso/sklearn-classification","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dformoso%2Fsklearn-classification","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dformoso%2Fsklearn-classification/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dformoso%2Fsklearn-classification/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dformoso%2Fskl
earn-classification/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dformoso","download_url":"https://codeload.github.com/dformoso/sklearn-classification/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247174423,"owners_count":20896078,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification-task","data","docker","jupyter","learning","machine","machine-learning","notebook","roc","roc-curve","science","sklearn","tensorflow"],"created_at":"2024-08-07T23:01:18.538Z","updated_at":"2025-04-04T12:08:32.559Z","avatar_url":"https://github.com/dformoso.png","language":"Jupyter Notebook","readme":"# Census Income Dataset Classification\nData Science Notebook on a Classification Task\n\n## Objective\nIn the Jupyter Notebook included in this repository, we will use the Census Income Dataset to predict whether an individual's income exceeds $50K/yr based on census data.\n\nThe Dataset can be found here:\n- https://archive.ics.uci.edu/ml/datasets/adult\n\nThe Notebook can be found here:\n- https://github.com/dformoso/sklearn-classification/blob/master/Data%20Science%20Workbook%20-%20Census%20Income%20Dataset.ipynb\n\n## Companion Mindmap/Cheatsheet\nThis Jupyter Notebook has a companion Mindmap/Cheatsheet that lists most of the Data Science steps. It can be found at the following link:\n- https://github.com/dformoso/machine-learning-mindmap\n\n## Steps\nIn this Notebook, we'll perform:\n\n- Feature Exploration (Uni and Bi-variate)\n- Feature Imputation\n- Feature Selection\n- Feature 
Encoding\n- Feature Ranking\n- Machine Learning with sklearn and Tensorflow\n- Random Search\n- Accuracy, Precision, Recall, and f1 calculations\n- ROC Curve\n\n## Setup\nThis Notebook has been designed to be run on top of the Jupyter Tensorflow Docker instance found in the link below:\n- https://github.com/jupyter/docker-stacks/tree/master/tensorflow-notebook\n\nIf you haven't downloaded Docker at this point, please visit:\n- https://www.docker.com/get-docker\n\nThen, open a shell or terminal session and copy/paste the following:\n\n```shell\ndocker run -itd \\\n  --restart always \\\n  --name jupyter \\\n  --hostname jupyter \\\n  -p 8888:8888 \\\n  -p 6006:6006 \\\n  jupyter/tensorflow-notebook:latest \\\n  start-notebook.sh --NotebookApp.token=''\n```\n\nUpon running the command, docker will automatically pull the images it needs and get the containers going for us.\n\nGive it a minute or so for Jupyter to start, and head to the following URL: http://localhost:8888\n\nYou should now have Jupyter running. If after a minute you can't reach the URL, check that the containers are running correctly and the network has been created by typing:\n\n```shell\n### Check the containers are running\ndocker ps -a\n```\n## Loading the Notebook\nDownload it from this link:\n- https://github.com/dformoso/sklearn-classification/blob/master/Data%20Science%20Workbook%20-%20Census%20Income%20Dataset.ipynb\n\nGo back to:\n- http://localhost:8888, load your Notebook into Jupyter and run it. 
That's it!\n\n\n## Troubleshooting Docker\nHere are a few useful commands in case something goes wrong with your Docker container:\n\n```shell\n# Restart Jupyter Docker Container\ndocker restart jupyter\n\n# Stop Jupyter Docker Container\ndocker stop jupyter\n\n# Remove Jupyter Docker Container\ndocker rm jupyter\n```\n\n## Screenshots\n\n### Feature Distribution Analysis\n![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/distribution.png)\n\n### Feature Cleaning\n![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/cleaning.png)\n\n### Missing Values in Features\n![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/missing.png)\n\n### Bivariate Exploration\n![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/bivariate1.png)\n![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/bivariate2.png)\n\n### Feature Correlation\n![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/correlation.png)\n\n### Feature Importance\n![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/importance.png)\n\n### Feature PCA\n![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/pca.png)\n\n### Results from Machine Learning Algorithms\n![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/results.png)\n\n### ROC for each Algorithm\n\n![alt text](https://github.com/dformoso/sklearn-classification/blob/master/images/analysis.png)\n\n## About Me\nTwitter:\n- https://twitter.com/danielmartinezf\n\nLinkedIn:\n- https://www.linkedin.com/in/danielmartinezformoso/\n\nEmail:\n- 
daniel.martinez.formoso@gmail.com\n","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdformoso%2Fsklearn-classification","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdformoso%2Fsklearn-classification","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdformoso%2Fsklearn-classification/lists"}