Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jonad/finding_donors
Predicting income with UCI Census Income Dataset using supervised machine learning algorithms
https://github.com/jonad/finding_donors
numpy pandas scikit-learn scikitlearn-machine-learning
Last synced: about 1 month ago
JSON representation
Predicting income with UCI Census Income Dataset using supervised machine learning algorithms
- Host: GitHub
- URL: https://github.com/jonad/finding_donors
- Owner: jonad
- License: mit
- Created: 2017-01-26T23:48:22.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2017-11-28T22:45:14.000Z (about 7 years ago)
- Last Synced: 2024-11-07T06:27:59.544Z (3 months ago)
- Topics: numpy, pandas, scikit-learn, scikitlearn-machine-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 757 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Supervised Learning
## Project: Finding Donors for CharityML### Install
This project requires **Python 2.7** and the following Python libraries installed:
- [NumPy](http://www.numpy.org/)
- [Pandas](http://pandas.pydata.org)
- [matplotlib](http://matplotlib.org/)
- [scikit-learn](http://scikit-learn.org/stable/)You will also need to have software installed to run and execute an [iPython Notebook](http://ipython.org/notebook.html)
We recommend students install [Anaconda](https://www.continuum.io/downloads), a pre-packaged Python distribution that contains all of the necessary libraries and software for this project.
### Run
In a terminal or command window, navigate to the top-level project directory `finding_donors/` (that contains this README) and run one of the following commands:
```bash
ipython notebook finding_donors.ipynb
```
or
```bash
jupyter notebook finding_donors.ipynb
```
### DataThe modified census dataset consists of approximately 32,000 data points, with each datapoint having 13 features. This dataset is a modified version of the dataset published in the paper *"Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid",* by Ron Kohavi. You may find this paper [online](https://www.aaai.org/Papers/KDD/1996/KDD96-033.pdf), with the original dataset hosted on [UCI](https://archive.ics.uci.edu/ml/datasets/Census+Income).
**Features**
- `age`: Age
- `workclass`: Working Class (Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked)
- `education_level`: Level of Education (Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool)
- `education-num`: Number of educational years completed
- `marital-status`: Marital status (Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse)
- `occupation`: Work Occupation (Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces)
- `relationship`: Relationship Status (Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried)
- `race`: Race (White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black)
- `sex`: Sex (Female, Male)
- `capital-gain`: Monetary Capital Gains
- `capital-loss`: Monetary Capital Losses
- `hours-per-week`: Average Hours Per Week Worked
- `native-country`: Native Country (United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands)**Target Variable**
- `income`: Income Class (<=50K, >50K)