https://github.com/mayer79/statistical_computing_material
Material for the lecture Statistical Computing
- Host: GitHub
- URL: https://github.com/mayer79/statistical_computing_material
- Owner: mayer79
- Created: 2022-11-05T16:30:02.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-09-02T16:17:18.000Z (4 months ago)
- Last Synced: 2024-11-02T04:12:09.429Z (about 2 months ago)
- Topics: data-science, machine-learning, r, statistics
- Language: TeX
- Homepage: https://mayer79.github.io/statistical_computing_material/
- Size: 21.7 MB
- Stars: 4
- Watchers: 2
- Forks: 12
- Open Issues: 0
# Statistical Computing
### Lecture Notes
#### Michael Mayer
## Organization
The lecture has six chapters:
1. [R in Action](https://mayer79.github.io/statistical_computing_material/1_R_in_Action.html)
2. [Statistical Inference](https://mayer79.github.io/statistical_computing_material/2_Statistical_Inference.html)
3. [Linear Models](https://mayer79.github.io/statistical_computing_material/3_Linear_Models.html)
4. [Model Selection and Validation](https://mayer79.github.io/statistical_computing_material/4_Model_Selection_and_Validation.html)
5. [Trees](https://mayer79.github.io/statistical_computing_material/5_Trees.html)
6. [Neural Nets](https://mayer79.github.io/statistical_computing_material/6_Neural_Nets.html)

Chapters 3 to 6 can be summarized as "Statistical ML in Action".
Each chapter will keep us busy for two weeks (3 hours + 1 hour exercises).
## Prerequisites
### Lecture material
Fetch everything by running
```
git clone https://github.com/mayer79/statistical_computing_material.git
```
in your Git console, or by downloading everything as a ZIP file.
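If neither Git nor a browser is at hand, the ZIP download can also be done from R. The following is only a sketch: the helper names `material_zip_url()` and `download_material()` are hypothetical, and it assumes the default branch is `main` and that GitHub serves branch archives under `/archive/refs/heads/<branch>.zip`.

```r
# Hypothetical helper: build the ZIP archive URL of the repository.
# Assumption: the default branch is "main".
material_zip_url <- function(branch = "main") {
  paste0(
    "https://github.com/mayer79/statistical_computing_material",
    "/archive/refs/heads/", branch, ".zip"
  )
}

# Hypothetical helper: download the archive to a temporary file and
# unpack it into dest_dir. Needs an internet connection.
download_material <- function(dest_dir = ".", branch = "main") {
  zip_file <- tempfile(fileext = ".zip")
  download.file(material_zip_url(branch), destfile = zip_file, mode = "wb")
  unzip(zip_file, exdir = dest_dir)
  invisible(dest_dir)
}

# Usage (not run automatically):
# download_material("~/lectures")
```

Unlike `git clone`, this gives a plain copy without version history, so later updates have to be downloaded again.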
### Large data
Download the large dataset "January 2018 - Yellow Taxi Trip Records" from [this page](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page).
Place it in the project subfolder "taxi/".
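A quick way to confirm that the file ended up in the right place is to list the folder from R. This is a minimal sketch: the helper `check_taxi_folder()` is hypothetical, and it makes no assumption about the name or format of the downloaded file.

```r
# Hypothetical helper (not part of the lecture material): make sure the
# "taxi" subfolder exists and return the names of the files it contains.
check_taxi_folder <- function(project_dir = ".") {
  taxi_dir <- file.path(project_dir, "taxi")
  if (!dir.exists(taxi_dir)) {
    dir.create(taxi_dir, recursive = TRUE)
  }
  files <- list.files(taxi_dir)
  if (length(files) == 0) {
    message("No files in '", taxi_dir, "' yet. Download the taxi data first.")
  }
  files
}

# Usage, run from the project root:
# check_taxi_folder()
```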
### Software
We will work with R version >= 4.4 and RStudio.
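The R version and the packages listed in this section can be checked from the R console. The snippet is a sketch rather than an official setup script, and the helper `missing_packages()` is a hypothetical name.

```r
# TRUE if the running R version meets the requirement of the lecture
r_ok <- getRversion() >= "4.4"

# Hypothetical helper: return those package names that are not installed yet
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Usage with the packages of the first two chapters (not run automatically):
# first_chapters <- c("tidyverse", "plotly", "insuranceData", "bench",
#                     "withr", "boot", "coin")
# install.packages(missing_packages(first_chapters))
```

Installing only what is missing avoids reinstalling large packages on every run.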
In the first two chapters, we will need these contributed R packages:
- tidyverse
- plotly
- insuranceData
- bench
- withr
- boot
- coin

For the remaining chapters, we further need:
- h2o (requires Java)
- arrow
- data.table
- FNN
- duckdb
- sparklyr (requires Java)
- rpart.plot
- ranger
- xgboost
- lightgbm
- hstats
- MetricsWeighted
- keras (requires Python, see below)

For the last chapter, we additionally need Python with TensorFlow >= 2.15. You can install it by running the R command `keras::install_keras(version = "release-cpu")`. If the following code works, you are all set. (Some red start-up messages/warnings are okay.)
```
library(tensorflow)
tf$constant("Hello Tensorflow!")
```
## Further Material
### Books
- James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). *An Introduction to Statistical Learning - with Applications in R*. New York: Springer.
- Hastie, T., Tibshirani, R., Friedman, J. (2001). *The Elements of Statistical Learning: Data Mining, Inference, and Prediction*. New York: Springer.
- Wickham, H., Grolemund, G. (2017). *R for Data Science: Import, Tidy, Transform, Visualize, and Model Data*. O'Reilly Media.
- Chollet, F., Allaire, J. J. (2018). *Deep Learning with R*. Manning Publications Co.

### Video by Trevor Hastie
- Hastie Big Data 45': https://www.youtube.com/watch?v=0EWJZIC4JxA
## Copyright
This lecture is distributed under the [Creative Commons Attribution 2.0 license](https://creativecommons.org/licenses/by/2.0/).
## How to cite?
Michael Mayer (2023), *Statistical Computing*, lecture notes, Institute of Mathematical Statistics and Actuarial Science, University of Bern. URL: [https://github.com/mayer79/statistical_computing_material](https://github.com/mayer79/statistical_computing_material)