
# autoxgboost - Automatic tuning and fitting of [xgboost](https://github.com/dmlc/xgboost).

[![Build Status](https://travis-ci.org/ja-thomas/autoxgboost.svg?branch=master)](https://travis-ci.org/ja-thomas/autoxgboost)
[![Coverage Status](https://coveralls.io/repos/github/ja-thomas/autoxgboost/badge.svg?branch=master)](https://coveralls.io/github/ja-thomas/autoxgboost?branch=master)
[![CRAN Status Badge](http://www.r-pkg.org/badges/version/autoxgboost)](https://CRAN.R-project.org/package=autoxgboost)
[![CRAN Downloads](http://cranlogs.r-pkg.org/badges/autoxgboost)](https://cran.rstudio.com/web/packages/autoxgboost/index.html)

* Install the development version

```r
devtools::install_github("ja-thomas/autoxgboost")
```

# General overview

autoxgboost aims to find an optimal [xgboost](https://github.com/dmlc/xgboost) model automatically, using the machine learning framework [mlr](https://github.com/mlr-org/mlr)
and the Bayesian optimization framework [mlrMBO](https://github.com/mlr-org/mlrMBO).

**Work in progress**!
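A minimal usage sketch is shown below. It assumes the package's main entry point `autoxgboost()` together with the standard mlr and mlrMBO APIs; argument names are illustrative and may differ in the released version:

```r
library(autoxgboost)
library(mlr)
library(mlrMBO)

# Define the learning problem as an mlr classification task.
task = makeClassifTask(data = iris, target = "Species")

# Configure the Bayesian optimizer; one iteration keeps this demo fast.
ctrl = makeMBOControl()
ctrl = setMBOControlTermination(ctrl, iters = 1L)

# Search the xgboost hyperparameter space and fit the best model found.
res = autoxgboost(task, control = ctrl, tune.threshold = FALSE)

# Predict with the fitted model (here on the training task, for brevity).
pred = predict(res, task)
```

In practice you would predict on a separate test task and allow far more MBO iterations than this toy setup does.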

# Benchmark

|**Name** | **Factors**| **Numerics**| **Classes**| **Train instances**| **Test instances**
|-----------------|-------------|--------------|-------------|---------------------|--------------------
|Dexter | 20 000| 0| 2| 420| 180
|GermanCredit | 13| 7| 2| 700| 300
|Dorothea | 100 000| 0| 2| 805| 345
|Yeast | 0| 8| 10| 1 038| 446
|Amazon | 10 000| 0| 49| 1 050| 450
|Secom | 0| 591| 2| 1 096| 471
|Semeion | 256| 0| 10| 1 115| 478
|Car | 6| 0| 4| 1 209| 519
|Madelon | 500| 0| 2| 1 820| 780
|KR-vs-KP | 37| 0| 2| 2 237| 959
|Abalone | 1| 7| 28| 2 923| 1 254
|Wine Quality | 0| 11| 11| 3 425| 1 469
|Waveform | 0| 40| 3| 3 500| 1 500
|Gisette | 5 000| 0| 2| 4 900| 2 100
|Convex | 0| 784| 2| 8 000| 50 000
|Rot. MNIST + BI | 0| 784| 10| 12 000| 50 000

Datasets used for the comparison benchmark of autoxgboost, Auto-WEKA and auto-sklearn.

|**Dataset** | **Baseline**| **autoxgboost**| **Auto-WEKA**| **auto-sklearn**
|-----------------|-----------------------|------------------------|------------------------|------------------------
|Dexter | 52.78| 12.22| 7.22| **5.56**
|GermanCredit | 32.67| 27.67| 28.33| **27.00**
|Dorothea | 6.09| **5.22**| 6.38| 5.51
|Yeast | 68.99| **38.88**| 40.45| 40.67
|Amazon | 99.33| 26.22| 37.56| **16.00**
|Secom | **7.87**| **7.87**| **7.87**| **7.87**
|Semeion | 92.45| 8.38| **5.03**| 5.24
|Car | 29.15| 1.16| 0.58| **0.39**
|Madelon | 50.26| 16.54| 21.15| **12.44**
|KR-vs-KP | 48.96| 1.67| **0.31**| 0.42
|Abalone | 84.04| 73.75| **73.02**| 73.50
|Wine Quality | 55.68| **33.70**| **33.70**| 33.76
|Waveform | 68.80| 15.40| **14.40**| 14.93
|Gisette | 50.71| 2.48| 2.24| **1.62**
|Convex | 50.00| 22.74| 22.05| **17.53**
|Rot. MNIST + BI | 88.88| 47.09| 55.84| **46.92**

Benchmark results are the median percent error over 100 000 bootstrap samples (drawn from 25 runs per method), simulating 4 parallel runs. Bold numbers indicate the best-performing algorithm on each dataset.
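A plausible reading of this protocol, sketched below under the assumption (not taken from the authors' evaluation code) that each bootstrap sample draws 4 of the 25 runs with replacement and keeps the best one:

```r
# Hypothetical reconstruction of the bootstrap described above.
# errors: percent test error of 25 independent runs on one dataset.
simulate_parallel_runs = function(errors, n.parallel = 4L, n.boot = 100000L) {
  # For each bootstrap sample, pick n.parallel runs and keep the best error.
  best = replicate(n.boot, min(sample(errors, n.parallel, replace = TRUE)))
  # Report the median over all bootstrap samples.
  median(best)
}

# Example with made-up error values:
set.seed(1)
errors = runif(25, min = 10, max = 20)
simulate_parallel_runs(errors)
```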

# autoxgboost - How to Cite

The **Automatic Gradient Boosting** framework was presented at the [ICML/IJCAI-ECAI 2018 AutoML Workshop](https://sites.google.com/site/automl2018icml/accepted-papers) ([poster](poster_2018.pdf)).
Please cite our [ICML AutoML workshop paper on arXiv](https://arxiv.org/abs/1807.03873v2).
You can get citation info via `citation("autoxgboost")` or copy the following BibTeX entry:

```bibtex
@inproceedings{autoxgboost,
  title     = {Automatic Gradient Boosting},
  author    = {Thomas, Janek and Coors, Stefan and Bischl, Bernd},
  booktitle = {International Workshop on Automatic Machine Learning at ICML},
  year      = {2018}
}
```