https://github.com/recruit-tech/drctrl
Automatic configuration tool for DataRobot.
- Host: GitHub
- URL: https://github.com/recruit-tech/drctrl
- Owner: recruit-tech
- License: mit
- Created: 2018-04-19T07:05:18.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T01:13:46.000Z (almost 2 years ago)
- Last Synced: 2024-09-27T18:16:50.170Z (about 1 month ago)
- Language: Python
- Homepage:
- Size: 135 KB
- Stars: 6
- Watchers: 10
- Forks: 1
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
# drctrl
drctrl is a tool for automatically configuring DataRobot. It can manage features provided by DataRobot, such as building projects, training, freezing, and prediction.

Python support: 3.6.x and later
## 1. Installation
```bash
$ pip install drctrl
```

## 2. Get started

Set up credentials:
```bash
$ cat << _EOF_ > ~/.config/datarobot/drconfig.yaml
token:
endpoint:
_EOF_

# get all projects
$ drctrl get_projects

# get project detail
$ drctrl get_project

# get the existing project configuration for drctrl
$ drctrl get_project_setting
```

### Build a new project
Download the Boston Housing dataset from the UCI repository ([url](https://archive.ics.uci.edu/ml/machine-learning-databases/housing/Index)).
```bash
# downloading dataset into `./data/raw/`
$ drctrl create_dataset
```

Set up `configure.yml`:
```yaml
environment:
  project_id: # if null, build a new project
  project_name: 'sample_project'
  target_feature: target
  metric: 'RMSE'
  cv_method: 'random'
  validation_type: 'CV'
  validation_params:
    holdout_pct: 20
    #validation_pct: 10
    reps: 3 # number of cross validation folds to use
    seed: 2017 # a seed to use for randomization
  dataset:
    type: file
    path: './data/raw'
    filename: 'boston.csv'
  autopilot: 'manual' # fullauto, quick, manual
  convert_features:
    - {name: RAD, rename_to: RAD_categoricalInt, variable_type: categoricalInt}

fit:
  model_id: # if None, run autopilot
  autopilot: 'fullauto'
  featurelist_name: 'without_feature' # if already exist, current time string will be used
  source_featurelist: 'Raw Features'
  except_features:
    - 'NOX'

predict:
  model_id: # if None, a model will be automatically selected
  input: # prediction target dataset
    type: 'file'
    path: './data/raw/'
    filename: 'boston.csv'
  reasoncode: True
  merge_origin: True
  feature_impact: True
  output: # output format
    type: 'file'
    path: './'
    filename: 'prediction.csv'
```

Run drctrl with the configuration:
```bash
$ drctrl apply configure.yml
```

Details of the commands and options are [here](docs/options.md).
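The `fit` section of `configure.yml` names a `source_featurelist` and a list of `except_features`, which suggests that the new feature list is the source list minus the excluded columns. The sketch below illustrates that reading in plain Python. It is an assumption about the behaviour implied by the configuration, not drctrl's actual implementation, and the helper name `derive_featurelist` and the sample column list are hypothetical.

```python
from typing import Iterable, List


def derive_featurelist(source: Iterable[str], except_features: Iterable[str]) -> List[str]:
    """Return the source features minus the excluded ones, preserving order."""
    excluded = set(except_features)
    return [name for name in source if name not in excluded]


# Illustrative subset of Boston Housing feature names (assumed, not read from boston.csv).
SAMPLE_FEATURES = ["CRIM", "NOX", "RM", "RAD", "TAX"]

if __name__ == "__main__":
    # Mirrors `except_features: ['NOX']` from the configuration above.
    print(derive_featurelist(SAMPLE_FEATURES, ["NOX"]))
```

Keeping the original column order makes the derived list easy to compare with the source list when reviewing a configuration.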
## 3. Commands
```bash
Usage: drctrl [OPTIONS] COMMAND [ARGS]...

Options:
  --credential PATH
  --help             Show this message and exit.

Commands:
  apply                Apply all commands in configuration file
  build                building project on the basis of a configuration file
  create_dataset       download and install boston housing and iris dataset
  fit                  training model on the basis of a configuration file
  frozen               freezing model on the basis of configuration file
  get_project          fetch the project detail
  get_project_setting  dump the project parameter as yaml file
  get_projects         fetch project details
  predict              predicting on the basis of a configuration file
  validate             validate configuration file
```

## 4. I/O format
Several I/O formats are available. For now, the `redshift`, `file`, and `url` types can be specified for the `dataset` parameter in `environment` and for the `input` / `output` parameters in `predict`.
Details are [here](docs/iotype.md).
### file format
```yaml
environment:
  dataset:
    type: file
    path: /path/to/dataset
    filename: dataset.csv
```

or
```yaml
predict:
  input:
    type: redshift
    aws_key_id:
    aws_secret_key:
    bucket:
    key_path:
    dbname:
    host:
    port:
    user:
    password:
    schema:
    table:
  output:
    type: redshift
    aws_key_id:
    aws_secret_key:
    bucket:
    key_path:
    dbname:
    host:
    port:
    user:
    password:
    schema:
    table:
```

and so on.
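The examples above differ only in which keys accompany `type`, so a configuration can be sanity-checked before it is passed to drctrl. The following is a minimal sketch of such a check, assuming that every key shown in the examples is required. That requirement, the helper name `missing_keys`, and the `REQUIRED_KEYS` table are assumptions for illustration and are not part of drctrl. The `url` type is left out because its parameters are not shown here.

```python
from typing import Dict, List

# Keys copied from the `file` and `redshift` examples above.
# Treating all of them as required is an assumption.
REQUIRED_KEYS: Dict[str, List[str]] = {
    "file": ["path", "filename"],
    "redshift": [
        "aws_key_id", "aws_secret_key", "bucket", "key_path", "dbname",
        "host", "port", "user", "password", "schema", "table",
    ],
}


def missing_keys(spec: dict) -> List[str]:
    """Return the keys that are absent or empty for the spec's I/O type."""
    io_type = spec.get("type")
    if io_type not in REQUIRED_KEYS:
        raise ValueError(f"unsupported or unknown I/O type: {io_type!r}")
    return [key for key in REQUIRED_KEYS[io_type] if spec.get(key) in (None, "")]


if __name__ == "__main__":
    spec = {"type": "file", "path": "/path/to/dataset"}
    print(missing_keys(spec))  # the `filename` key is not set in this spec
```

A check like this catches a blank Redshift field, such as an empty `password:`, before any request is made.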
## 5. Template

drctrl supports the [Jinja2](https://github.com/pallets/jinja/tree/master/jinja2) template format. A template configuration file must use the file extension `.yml.tmple`.
In a template file, the expression `env['FILE_PATH']` is replaced by the value of the environment variable `FILE_PATH`.
The following is an example.

```yaml
environment:
  project_id: {{ env.PROJECT_ID }}

predict:
  model_id: {{ env.MODEL_ID }}
  dataset:
    type: file
    path: {{ env['DATASET_PATH'] }}
    filename: {{ env['DATASET_FILE'] }}
  feature_impact: false
  reasoncode: false
  merge_origin: true
```
```
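To show what the substitution amounts to, the sketch below replaces `{{ env.NAME }}` and `{{ env['NAME'] }}` expressions with values from a mapping. It is a simplified, standard-library stand-in built on a regular expression, not Jinja2 itself, and it does not show how drctrl renders templates internally. The function name `render_env` is hypothetical.

```python
import re
from typing import Mapping

# Matches {{ env.NAME }} and {{ env['NAME'] }} (single or double quotes).
_ENV_EXPR = re.compile(
    r"\{\{\s*env(?:\.(\w+)|\[\s*['\"](\w+)['\"]\s*\])\s*\}\}"
)


def render_env(template: str, env: Mapping[str, str]) -> str:
    """Replace env expressions with values from `env`; raise KeyError if a name is unset."""
    def substitute(match):
        # Exactly one of the two groups matches, depending on the syntax used.
        name = match.group(1) or match.group(2)
        return env[name]

    return _ENV_EXPR.sub(substitute, template)


if __name__ == "__main__":
    text = "project_id: {{ env.PROJECT_ID }}\npath: {{ env['DATASET_PATH'] }}\n"
    # An explicit mapping stands in for os.environ to keep the example reproducible.
    print(render_env(text, {"PROJECT_ID": "abc123", "DATASET_PATH": "./data/raw"}))
```

Raising `KeyError` for an unset variable is a design choice in this sketch: it surfaces a missing `PROJECT_ID` or `MODEL_ID` early instead of silently writing an empty value into the configuration.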