https://github.com/bgreenwell/mlday18
Material from "Random Forests and Gradient Boosting Machines in R" presented at Machine Learning Day '18
- Host: GitHub
- URL: https://github.com/bgreenwell/mlday18
- Owner: bgreenwell
- Created: 2018-02-03T23:02:26.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2018-09-15T19:15:19.000Z (about 6 years ago)
- Last Synced: 2024-08-02T06:03:11.581Z (3 months ago)
- Topics: decision-trees, gradient-boosting-machine, machine-learning, partial-dependence-plot, r, random-forest, variable-importance-plots
- Language: R
- Homepage: https://bgreenwell.github.io/MLDay18/MLDay18.html#1
- Size: 98.2 MB
- Stars: 16
- Watchers: 1
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
README
---
output:
  md_document:
    variant: markdown_github
---

# MLDay18: Random Forests and Gradient Boosting Machines in R
Slides for **Machine Learning Day '18**. This talk provides an overview of the following topics, as well as some of their implementations in the R programming language:
* [Decision trees](https://en.wikipedia.org/wiki/Decision_tree_learning)
* [Random forests](https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm)
* [Gradient boosting machines](https://projecteuclid.org/euclid.aos/1013203451)
[Launch slides](https://bgreenwell.github.io/MLDay18/MLDay18.html#1)
# Abstract
Good modeling tools should be universally applicable in classification and regression, have state-of-the-art accuracy, scale well to large data sets, and handle missing values effectively. Ideally, they would also automatically discover which variables are important, how they interact, and whether there are any novel cases or outliers. In this presentation, we discuss two such modeling tools: random forests and gradient boosting machines. The talk gives a brief background on both methodologies (including decision trees) and surveys various implementations of each in the R software environment for statistical computing, along with the pros and cons of each implementation.
```{r img, echo=FALSE, fig.align="center", out.width="80%"}
knitr::include_graphics("docs/figures/MLDay18.jpg")
```
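
To make the abstract's points about fitting these models and discovering important variables concrete, here is a minimal sketch using two common CRAN implementations, `randomForest` and `gbm`. The choice of these two packages, the built-in `mtcars` data set, and every hyperparameter value below are illustrative assumptions, not settings taken from the slides.

```r
# Illustrative sketch only: the data set (mtcars) and all tuning values are
# assumptions chosen for a small, fast example, not values from the talk.
# Assumes the randomForest and gbm packages are installed from CRAN.
library(randomForest)
library(gbm)

set.seed(101)  # both methods resample the data, so fix the seed

# Random forest: 500 trees grown on bootstrap samples; importance = TRUE
# additionally computes permutation-based variable importance.
rf_fit <- randomForest(mpg ~ ., data = mtcars, ntree = 500, importance = TRUE)
rf_imp <- importance(rf_fit)  # matrix with columns %IncMSE and IncNodePurity

# Gradient boosting machine: shallow trees added sequentially with a small
# learning rate. n.minobsinnode and bag.fraction are changed from their
# defaults because mtcars has only 32 rows.
gbm_fit <- gbm(mpg ~ ., data = mtcars, distribution = "gaussian",
               n.trees = 500, interaction.depth = 2, shrinkage = 0.05,
               n.minobsinnode = 3, bag.fraction = 0.8)

# Relative influence of each predictor, normalized to sum to 100.
gbm_imp <- summary(gbm_fit, n.trees = 500, plotit = FALSE)

print(rf_imp)
print(gbm_imp)
```

Both fits use the same formula interface, so comparing the two importance rankings is a quick way to check whether the methods agree on which predictors matter. On a data set this small the rankings can change with the seed.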