https://github.com/rstudio/sparkxgb

R interface for XGBoost on Spark
apache-spark machine-learning r rstats spark xgboost


Host: GitHub
URL: https://github.com/rstudio/sparkxgb
Owner: rstudio
License: other
Created: 2018-11-21T08:31:04.000Z (over 6 years ago)
Default Branch: main
Last Pushed: 2024-05-01T17:36:04.000Z (about 1 year ago)
Last Synced: 2024-07-31T19:25:34.834Z (12 months ago)
Topics: apache-spark, machine-learning, r, rstats, spark, xgboost
Language: R
Homepage: https://spark.posit.co/packages/sparkxgb/
Size: 184 KB
Stars: 46
Watchers: 6
Forks: 14
Open Issues: 16
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE.md

Awesome Lists containing this project

awesome-sparklyr - sparkxgb: R interface for XGBoost on Spark

README

---
output: github_document
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# sparkxgb

[![R-CMD-check](https://github.com/rstudio/sparkxgb/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/rstudio/sparkxgb/actions/workflows/R-CMD-check.yaml)
[![Spark Tests](https://github.com/rstudio/sparkxgb/actions/workflows/Tests.yaml/badge.svg)](https://github.com/rstudio/sparkxgb/actions/workflows/Tests.yaml)
[![Codecov test coverage](https://codecov.io/gh/rstudio/sparkxgb/branch/main/graph/badge.svg)](https://app.codecov.io/gh/rstudio/sparkxgb?branch=main)
[![CRAN status](https://www.r-pkg.org/badges/version/sparkxgb)](https://CRAN.R-project.org/package=sparkxgb)

## Overview

**sparkxgb** is a [sparklyr](https://spark.posit.co/) extension that provides
an interface to [XGBoost](https://github.com/dmlc/xgboost) on Spark.

## Installation

```r
install.packages("sparkxgb")
```

### Development version

You can install the development version of `sparkxgb` with:

```r
# install.packages("pak")
pak::pak("rstudio/sparkxgb")
```

## Example

**sparkxgb** supports the familiar formula interface for specifying models:

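```{r, message = FALSE}
library(sparkxgb)
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance and copy iris into it.
# sdf_copy_to() replaces dots in column names (Sepal.Length -> Sepal_Length).
sc <- spark_connect(master = "local")
iris_tbl <- sdf_copy_to(sc, iris)

# Fit a three-class classifier using every other column as a feature
xgb_model <- xgboost_classifier(
  iris_tbl,
  Species ~ .,
  num_class = 3,
  num_round = 50,
  max_depth = 4
)

# Score the training data and inspect labels and class probabilities
xgb_model %>%
  ml_predict(iris_tbl) %>%
  select(Species, predicted_label, starts_with("probability_")) %>%
  glimpse()
```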
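The formula interface is not limited to classification, since the package also exports `xgboost_regressor`. The following is a minimal sketch rather than an example from the package authors: it assumes a local Spark installation, uses the built-in `mtcars` data as a stand-in dataset, and reuses the `num_round` and `max_depth` values from the classifier example purely for illustration. The names `mtcars_tbl`, `xgb_reg`, and `preds` are hypothetical.

```r
library(sparkxgb)
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (e.g. via sparklyr::spark_install()).
sc <- spark_connect(master = "local")

# mtcars is an illustrative dataset, not one used in the original README.
mtcars_tbl <- sdf_copy_to(sc, mtcars, overwrite = TRUE)

# Predict miles per gallon from all remaining columns.
xgb_reg <- xgboost_regressor(
  mtcars_tbl,
  mpg ~ .,
  num_round = 50,
  max_depth = 4
)

# Keep the observed value next to the model's prediction.
preds <- xgb_reg %>%
  ml_predict(mtcars_tbl) %>%
  select(mpg, prediction)

glimpse(preds)

# Call spark_disconnect(sc) once you are finished with the connection.
```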

It also provides a Pipelines API, which means you can use an `xgboost_classifier`
or `xgboost_regressor` in a pipeline like any other `Estimator`, and do things like
hyperparameter tuning:

```{r}
# Pipeline: build features from the formula, then fit the classifier
pipeline <- ml_pipeline(sc) %>%
  ft_r_formula(Species ~ .) %>%
  xgboost_classifier(num_class = 3)

# Candidate values to try for the xgboost stage
param_grid <- list(
  xgboost = list(
    max_depth = c(1, 5),
    num_round = c(10, 50)
  )
)

# Cross-validate the whole pipeline over the grid
cv <- ml_cross_validator(
  sc,
  estimator = pipeline,
  evaluator = ml_multiclass_classification_evaluator(
    sc,
    label_col = "label",
    raw_prediction_col = "rawPrediction"
  ),
  estimator_param_maps = param_grid
)

cv_model <- cv %>%
  ml_fit(iris_tbl)

summary(cv_model)

spark_disconnect(sc)
```
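To get a feel for how much work the grid above implies, the candidate combinations can be enumerated in base R without touching Spark. This is only a sketch: `grid`, `n_candidates`, `num_folds`, and `total_fits` are hypothetical names, and the value of three folds is an assumption (it is believed to be the `ml_cross_validator` default, but check the documentation for your sparklyr version).

```r
# Enumerate every combination of the two tuned parameters (base R only).
grid <- expand.grid(
  max_depth = c(1, 5),
  num_round = c(10, 50)
)
print(grid)

# Two values for each of two parameters gives 2 x 2 candidates.
n_candidates <- nrow(grid)

# Assumption: three cross-validation folds.
num_folds <- 3

# Each candidate is fit once per fold during the search.
total_fits <- n_candidates * num_folds
total_fits
```

Because the number of fits grows multiplicatively with each added parameter value, it is worth keeping the grid small when every fit launches a distributed XGBoost job.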
