https://github.com/adamrossnelson/skbuddy

Stata package designed to quickly export feature and target matrices as CSV for use with SciKit Learn.

Host: GitHub
URL: https://github.com/adamrossnelson/skbuddy
Owner: adamrossnelson
License: mit
Created: 2018-03-18T03:42:19.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2018-04-20T20:55:37.000Z (about 7 years ago)
Last Synced: 2025-01-10T04:54:11.771Z (4 months ago)
Language: Stata
Size: 10.7 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Stata Package skbuddy

Stata package designed to quickly export feature and target matrices as CSV for use with SciKit Learn. Designed for Stata-Python side-by-side workflows.

## Usage

This package produces two CSV files. The first, with the suffix `_X`, is intended to be used as a feature matrix with SciKit Learn; the second, with the suffix `_y`, is intended to be used as a target matrix. More information about feature and target matrices is available in the [SciKit Learn documentation](http://scikit-learn.org/stable/documentation.html).

[This Jupyter notebook](https://github.com/adamrossnelson/skbuddy/blob/master/git_demo.ipynb) demonstrates importing `skbuddy` output for use with SciKit Learn.
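
As a rough illustration of the Python side of this workflow, the sketch below reads a `_X`/`_y` pair using only the standard library. It is not taken from the notebook above. The helper name `load_skbuddy_pair`, the `demo` file prefix, and the sample values are hypothetical, and the sketch assumes each CSV has a header row, numeric feature columns, and a single target column.

```python
import csv
import os
import tempfile


def load_skbuddy_pair(prefix):
    """Read <prefix>_X.csv and <prefix>_y.csv into (feature_names, X, y).

    Assumption: both files have a header row, every feature column is
    numeric, and the target file has exactly one column.
    """
    with open(prefix + "_X.csv", newline="") as fh:
        rows = list(csv.reader(fh))
    feature_names = rows[0]
    X = [[float(value) for value in row] for row in rows[1:]]

    with open(prefix + "_y.csv", newline="") as fh:
        rows = list(csv.reader(fh))
    y = [row[0] for row in rows[1:]]

    if len(X) != len(y):
        raise ValueError("feature and target files have different row counts")
    return feature_names, X, y


# Demo with made-up values written to a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_prefix = os.path.join(demo_dir, "demo")
with open(demo_prefix + "_X.csv", "w", newline="") as fh:
    csv.writer(fh).writerows([["price", "mpg"], [4099, 22], [4749, 17]])
with open(demo_prefix + "_y.csv", "w", newline="") as fh:
    csv.writer(fh).writerows([["foreign"], ["Domestic"], ["Foreign"]])

names, X, y = load_skbuddy_pair(demo_prefix)
print(names, X, y)
```

The returned `X` (a list of rows) and `y` (a list of labels) can be passed to an estimator's `fit` method. In practice, `pandas.read_csv` is the more common way to load the same files.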

## Installation

At present, there are no plans to submit this package to SSC for distribution. It is available for install via:

```
net install skbuddy, from(https://raw.githubusercontent.com/adamrossnelson/skbuddy/master)
```

## Alternatives to skbuddy

One alternative to skbuddy is to manually convert Stata `dta` files to `csv` or another format that Python reads easily. A more direct option is to load the `dta` file with `pd.read_stata()`. For example:

```Python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Limit how many rows pandas prints when displaying a DataFrame
pd.set_option('display.max_rows', 8)

# Load example dta provided by Stata
exfile = pd.read_stata('http://www.stata-press.com/data/r15/auto2.dta')

# Define features (2-D DataFrame) and target (1-D Series)
X = exfile[['price', 'mpg', 'length']]
y = exfile['foreign']

# Use Scikit-Learn to fit a model
clf = DecisionTreeClassifier(max_depth=4, criterion='entropy')
clf.fit(X, y)
```
