https://github.com/wwbrannon/rpy_example
An example of data cleaning and supervised learning in R and Python, for comparison
- Host: GitHub
- URL: https://github.com/wwbrannon/rpy_example
- Owner: wwbrannon
- Created: 2015-04-14T05:07:30.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2015-04-17T12:52:56.000Z (over 9 years ago)
- Last Synced: 2023-08-01T07:22:29.199Z (over 1 year ago)
- Language: R
- Size: 1.34 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
I don't (as of the start of this project) know how to use scikit-learn or pandas
well at all. Let's fix that.

Here's the setup: I have a dataset from a 1994 Census survey of individuals, with
demographic and economic traits, and a binary indicator for whether the individual made
more than $50,000 a year. I've taken this dataset, explored and cleaned it, and
set up a model bake-off to see how well I can predict the high-income indicator.
I'll conduct the same analysis a) in R, b) with pandas / scikit-learn in Python.

This is much more about me picking up the scikit-learn API than it is about good
predictive performance :-)

## Measures of performance:
- inspecting a confusion matrix
- the ROC curve, plotted
- the area under the ROC curve

## TODO:
- the R version should use caret
- the Python version
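
As a possible starting point for the Python version, here is a minimal sketch of the three measures of performance listed earlier (confusion matrix, ROC curve points, area under the ROC curve), written in plain Python so the arithmetic is visible. It is not taken from the repository: the function names and the toy labels and scores are made up for illustration. In practice scikit-learn's `sklearn.metrics` module provides `confusion_matrix`, `roc_curve` and `roc_auc_score` for the same purpose.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> List[List[int]]:
    """Return a 2x2 matrix laid out as [[TN, FP], [FN, TP]] for 0/1 labels."""
    matrix = [[0, 0], [0, 0]]
    for truth, pred in zip(y_true, y_pred):
        matrix[truth][pred] += 1  # row = true label, column = predicted label
    return matrix


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs from sweeping the threshold from high to low.

    Assumes y_true contains at least one 0 and at least one 1.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Highest score first: lowering the threshold admits one example at a time.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example in a run of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve, by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    # Toy data, invented for illustration: 1 = income above $50,000.
    y_true = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]          # hypothetical predicted probabilities
    y_pred = [int(s >= 0.5) for s in scores]  # classify at a 0.5 threshold

    print(confusion_matrix(y_true, y_pred))
    curve = roc_points(y_true, scores)
    print(curve)
    print(auc(curve))
```

The list returned by `roc_points` can be passed straight to a plotting library to draw the curve, with the false positive rate on the x-axis and the true positive rate on the y-axis.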