https://github.com/bgreenwell/mlday18

Material from "Random Forests and Gradient Boosting Machines in R" presented at Machine Learning Day '18

Host: GitHub
URL: https://github.com/bgreenwell/mlday18
Owner: bgreenwell
Created: 2018-02-03T23:02:26.000Z
Default Branch: master
Last Pushed: 2018-09-15T19:15:19.000Z
Last Synced: 2024-08-02T06:03:11.581Z
Topics: decision-trees, gradient-boosting-machine, machine-learning, partial-dependence-plot, r, random-forest, variable-importance-plots
Language: R
Homepage: https://bgreenwell.github.io/MLDay18/MLDay18.html#1
Size: 98.2 MB
Stars: 16
Watchers: 1
Forks: 7
Open Issues: 0
Metadata Files:
- Readme: README.Rmd

---
output:
  md_document:
    variant: markdown_github
---

# MLDay18: Random Forests and Gradient Boosting Machines in R

Slides for **Machine Learning Day '18**. This talk provides an overview of the following topics, as well as some of their implementations in the R programming language:

* [Decision trees](https://en.wikipedia.org/wiki/Decision_tree_learning)

* [Random forests](https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm)

* [Gradient boosting machines](https://projecteuclid.org/euclid.aos/1013203451)

[Launch slides](https://bgreenwell.github.io/MLDay18/MLDay18.html#1)

# Abstract

Good modeling tools should be applicable to both classification and regression, achieve state-of-the-art accuracy, scale well to large data sets, and handle missing values effectively. Ideally, they should also discover automatically which variables are important, how they interact, and whether there are any novel cases or outliers. This presentation discusses two such tools: random forests and gradient boosting machines. The talk covers a brief background of both methodologies (including decision trees), various implementations of each in the R software environment for statistical computing, and the pros and cons of each implementation.
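
As a rough illustration of the two tools named above, the sketch below fits a random forest and a gradient boosting machine to the built-in `mtcars` data and compares them on a small hold-out set. It is not taken from the slides: the choice of the `randomForest` and `gbm` packages, the data set, the seed, and every tuning value are assumptions made for the example, and other R implementations would serve equally well.

```r
# Illustrative sketch only; not the talk's own code.
# Assumes the CRAN packages randomForest and gbm are installed.
library(randomForest)
library(gbm)

set.seed(101)  # arbitrary seed, chosen only for reproducibility

# Hold out 6 of the 32 cars in the built-in mtcars data for testing
test_id <- sample(nrow(mtcars), size = 6)
train <- mtcars[-test_id, ]
test <- mtcars[test_id, ]

# Random forest: many trees grown on bootstrap samples, with a random
# subset of predictors considered at each split
rf_fit <- randomForest(mpg ~ ., data = train, ntree = 500, importance = TRUE)

# Gradient boosting machine: shallow trees added sequentially, each one
# fit to the residuals of the current ensemble (squared-error loss)
gbm_fit <- gbm(mpg ~ ., data = train, distribution = "gaussian",
               n.trees = 500, interaction.depth = 2, shrinkage = 0.05,
               n.minobsinnode = 3, bag.fraction = 0.8)

# Predictions on the hold-out set
rf_pred <- predict(rf_fit, newdata = test)
gbm_pred <- predict(gbm_fit, newdata = test, n.trees = 500)

# Root mean squared error for each model
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
print(c(rf = rmse(test$mpg, rf_pred), gbm = rmse(test$mpg, gbm_pred)))

# Variable importance: permutation-based and node-purity measures for the
# forest, relative influence for the GBM
rf_imp <- importance(rf_fit)
gbm_imp <- summary(gbm_fit, plotit = FALSE)
print(rf_imp)
print(gbm_imp)
```

With only 32 observations, the error estimates here are very noisy; the point is the shape of the workflow (fit, predict, inspect importance), not the numbers it prints.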

```{r img, echo=FALSE, fig.align="center", out.width="80%"}
knitr::include_graphics("docs/figures/MLDay18.jpg")
```