https://github.com/adamrossnelson/skbuddy
Stata package designed to quickly export feature and target matrices as CSV for use with SciKit Learn.
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/adamrossnelson/skbuddy
- Owner: adamrossnelson
- License: mit
- Created: 2018-03-18T03:42:19.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2018-04-20T20:55:37.000Z (about 7 years ago)
- Last Synced: 2025-01-10T04:54:11.771Z (4 months ago)
- Language: Stata
- Size: 10.7 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Stata Package skbuddy
Stata package designed to quickly export feature and target matrices as CSV for use with SciKit Learn. Designed for Stata-Python side-by-side workflows.

## Usage
This package produces two CSV files. The first, with the suffix `_X`, is intended to be used as a feature matrix with SciKit Learn. The second, with the suffix `_y`, is intended to be used as a target matrix. More information about feature and target matrices is available in the [SciKit Learn documentation](http://scikit-learn.org/stable/documentation.html).
[This Jupyter notebook](https://github.com/adamrossnelson/skbuddy/blob/master/git_demo.ipynb) demonstrates importing `skbuddy` output for use with SciKit Learn.
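
As a rough sketch of reading the two files back in Python, something like the following may work. The helper name `load_skbuddy_output`, the exact `<stem>_X.csv` / `<stem>_y.csv` file names, and the presence of a header row in each file are assumptions made for illustration; check them against the files skbuddy actually writes.

```python
from pathlib import Path

import pandas as pd


def load_skbuddy_output(stem, directory="."):
    """Read a feature/target CSV pair into (X, y).

    Hypothetical helper: assumes the files are named <stem>_X.csv and
    <stem>_y.csv and that each has a header row of variable names.
    """
    directory = Path(directory)
    X = pd.read_csv(directory / f"{stem}_X.csv")
    y = pd.read_csv(directory / f"{stem}_y.csv")

    # Features and targets must describe the same observations.
    if len(X) != len(y):
        raise ValueError(f"row mismatch: X has {len(X)} rows, y has {len(y)}")

    # A single-column target becomes a Series; a multi-column target
    # is returned unchanged as a DataFrame.
    return X, y.squeeze("columns")
```

The returned pair can then be passed to an estimator, for example `clf.fit(X, y)`.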
## Installation
At present, not planning to send to SSC for distribution. Available for install via:
```
net install skbuddy, from(https://raw.githubusercontent.com/adamrossnelson/skbuddy/master)
```

## Alternatives to skbuddy
One alternative to skbuddy is to manually convert Stata `dta` files to `csv` or another format readily accessible in Python. A more direct option is to use `pd.read_stata()`. For example:

```Python
import pandas as pd
import numpy as np
pd.set_option('display.max_rows', 8)

# Load example dta provided by Stata
exfile = pd.read_stata('http://www.stata-press.com/data/r15/auto2.dta')

# Define features and targets
X = exfile[['price','mpg','length']]
y = exfile[['foreign']]

# Use Scikit-Learn to fit a model
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(max_depth=4, criterion='entropy')
clf.fit(X, y)
```
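
The manual conversion route mentioned above could be sketched in Python as follows. This is not skbuddy's implementation: the function name `dta_to_xy_csv` and its parameters are hypothetical, and the output file names simply mirror the `_X` / `_y` suffix convention described in the Usage section.

```python
import pandas as pd


def dta_to_xy_csv(dta_path, features, target, stem):
    """Split a Stata .dta file into <stem>_X.csv and <stem>_y.csv.

    Hypothetical helper for illustration only.
    """
    df = pd.read_stata(dta_path)

    x_path = f"{stem}_X.csv"
    y_path = f"{stem}_y.csv"

    # Write the feature columns and the target column to separate files,
    # each with a header row and without the pandas index.
    df[features].to_csv(x_path, index=False)
    df[[target]].to_csv(y_path, index=False)
    return x_path, y_path
```

A call such as `dta_to_xy_csv('auto2.dta', ['price', 'mpg', 'length'], 'foreign', 'auto2')` would then write `auto2_X.csv` and `auto2_y.csv` to the working directory, assuming a local copy of the file.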