
https://github.com/wwbrannon/rpy_example

An example of data cleaning and supervised learning in R and Python, for comparison

At the start of this project, I didn't know how to use scikit-learn or pandas
well at all. Let's fix that.

Here's the setup: I have a dataset from a 1994 Census survey of individuals, with
demographic and economic traits and a binary indicator for whether the individual made
more than $50,000 a year. I've explored and cleaned this dataset and set up a model
bake-off to see how well I can predict the high-income indicator. I'll conduct the
same analysis twice: (a) in R, and (b) in Python with pandas and scikit-learn.

This is much more about me picking up the scikit-learn API than it is about good
predictive performance :-)

## Measures of performance
- inspecting a confusion matrix
- a plot of the ROC curve
- the area under the ROC curve (AUC)
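scikit-learn ships these measures as `sklearn.metrics.confusion_matrix`, `roc_curve` and `roc_auc_score`. As a minimal sketch of what they compute, the plain-Python functions below re-implement the same ideas without any dependencies; they are illustrative stand-ins, not the project's code or scikit-learn's implementation. The toy labels, scores, and the 0.5 cutoff used to turn scores into class predictions are assumptions made up for the example.

```python
def confusion_matrix(y_true, y_pred):
    """Return ((TN, FP), (FN, TP)) for binary labels coded 0/1.

    Same row/column layout as sklearn's [[TN, FP], [FN, TP]]."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return ((tn, fp), (fn, tp))


def roc_curve(y_true, scores):
    """Return (FPR, TPR) lists by sweeping the decision threshold from high to low."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Sort by descending score so each step lowers the threshold.
    pairs = sorted(zip(scores, y_true), reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Tied scores cross the threshold together, producing one ROC point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the piecewise-linear ROC curve, via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(fpr, fpr[1:], tpr, tpr[1:]))


if __name__ == "__main__":
    # Hypothetical toy data: true 0/1 labels and model scores.
    y_true = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 cutoff
    print(confusion_matrix(y_true, y_pred))   # ((2, 0), (1, 1))
    fpr, tpr = roc_curve(y_true, scores)
    print(auc(fpr, tpr))                      # 0.75
```

The confusion matrix depends on one chosen cutoff, while the ROC curve and its AUC summarize performance across every cutoff. That is why the two views complement each other when comparing models in a bake-off.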

## TODO
- the R version should use caret
- write the Python version (pandas / scikit-learn)