https://github.com/edrubin/ec524w25

Masters-level applied econometrics course—focusing on prediction—at the University of Oregon (EC424/524 during Winter quarter, 2025). Taught by Ed Rubin and Andrew Dickinson.
course data-science econometrics economics machine-learning ml open-educational-resources prediction university university-of-oregon
Last synced: 29 days ago
Host: GitHub
URL: https://github.com/edrubin/ec524w25
Owner: edrubin
License: mit
Created: 2025-01-06T18:03:55.000Z (9 months ago)
Default Branch: master
Last Pushed: 2025-02-21T23:48:31.000Z (8 months ago)
Last Synced: 2025-02-22T00:26:41.436Z (8 months ago)
Topics: course, data-science, econometrics, economics, machine-learning, ml, open-educational-resources, prediction, university, university-of-oregon
Language: HTML
Homepage:
Size: 77.2 MB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README

          
# EC 524/424, Winter 2025

Welcome to Economics 524 (424): Prediction and machine-learning in econometrics, taught by [Ed Rubin](https://edrub.in) and [Andrew Dickinson](https://ajdickinson.github.io).

## Schedule

**Lecture** Tuesdays and Thursdays, 10:00a–11:20a (Pacific), [101 McKenzie](https://classrooms.uoregon.edu/mckenzie-101)

**Lab** Friday, 2:00p–2:50p (Pacific), [195 Anstett](https://classrooms.uoregon.edu/anstett-195)

**Office hours**

- **Ed Rubin** Tu. 2:30p–3:30p ([PLC 530](https://map.uoregon.edu/b83e556a1))

- **Andrew Dickinson** We. 3p–4p (Zoom)

## Syllabus

[**Syllabus**](https://raw.githack.com/edrubin/EC524W25/master/syllabus/syllabus.pdf)

## Books

### Required books

- [Introduction to Statistical Learning](https://www.statlearning.com/)

- [The Hundred-Page Machine Learning Book](http://themlbook.com/)

- [Data Visualization](https://socviz.co/)

### Suggested books

- [R for Data Science](https://r4ds.had.co.nz/)

- [Introduction to Data Science](https://www.springer.com/us/book/9783319500164) (not available without purchase)

- [The Elements of Statistical Learning](http://web.stanford.edu/~hastie/ElemStatLearn/)

- [Data Science for Public Policy](https://link.springer.com/book/10.1007/978-3-030-71352-2) (ebook available through UO library)

## Lecture notes

**Note:** Links to topics that we have not yet covered lead to older slides. I will update links to the new slides as we work our way through the term/slides.

[**000 - Overview (Why predict?)**](https://raw.githack.com/edrubin/EC524W25/master/lecture/000/slides.html)

1. Why do we have a class on prediction?

2. How is prediction (and how are its tools) different from causal inference?

3. Motivating examples

**Formats** [.html](https://raw.githack.com/edrubin/EC524W25/master/lecture/000/slides.html) | [.pdf](https://github.com/edrubin/EC524W25/blob/master/lecture/000/slides.pdf) | [.rmd](https://github.com/edrubin/EC524W25/blob/master/lecture/000/slides.rmd)

**Readings** Introduction in *ISL*

[**001 - Statistical learning foundations**](https://raw.githack.com/edrubin/EC524W25/master/lecture/001/slides.html)

1. Why do we have a class on prediction?

2. How is prediction (and how are its tools) different from causal inference?

3. Motivating examples

**Formats** [.html](https://raw.githack.com/edrubin/EC524W25/master/lecture/001/slides.html) | [.pdf](https://github.com/edrubin/EC524W25/blob/master/lecture/001/slides.pdf) | [.rmd](https://github.com/edrubin/EC524W25/blob/master/lecture/001/slides.rmd)

**Readings**

- [Prediction Policy Problems](https://www.aeaweb.org/articles?id=10.1257/aer.p20151023) by Kleinberg *et al.* (2015)

- *ISL* Ch1

- *ISL* Start Ch2

**Supplements** [Unsupervised character recognition](https://colah.github.io/posts/2014-10-Visualizing-MNIST/)

[**002 - Model accuracy**](https://raw.githack.com/edrubin/EC524W25/master/lecture/002/slides.html)

1. Model accuracy

1. Loss for regression and classification

1. The variance-bias tradeoff

1. The Bayes classifier

1. KNN

**Formats** [.html](https://raw.githack.com/edrubin/EC524W25/master/lecture/002/slides.html) | [.pdf](https://github.com/edrubin/EC524W25/blob/master/lecture/002/slides.pdf) | [.rmd](https://github.com/edrubin/EC524W25/blob/master/lecture/002/slides.rmd)

**Readings** 

- *ISL* Ch2–Ch3

- *Optional:* *100ML* Preface and Ch1–Ch4

[**003 - Resampling methods**](https://raw.githack.com/edrubin/EC524W25/master/lecture/003/slides.html)

1. Review

1. The validation-set approach

1. Leave-one-out cross validation

1. k-fold cross validation

1. The bootstrap

**Formats** [.html](https://raw.githack.com/edrubin/EC524W25/master/lecture/003/slides.html) | [.pdf](https://github.com/edrubin/EC524W25/blob/master/lecture/003/slides.pdf) | [.rmd](https://github.com/edrubin/EC524W25/blob/master/lecture/003/slides.rmd)

**Readings**

- *ISL* Ch5

- *Optional:* *100ML* Ch5
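
As a rough illustration of the *k*-fold idea from lecture 003, here is a minimal base-R sketch. It is not taken from the slides: the built-in `mtcars` data, the model `mpg ~ wt + hp`, and the helper name `cv_mse()` are all assumptions made for this example.

```r
# Minimal sketch of k-fold cross validation in base R.
# Assumptions: built-in mtcars data; cv_mse() is a hypothetical helper.
cv_mse <- function(formula, data, k = 5, seed = 524) {
  set.seed(seed)
  # Randomly assign each row to one of k folds of (nearly) equal size
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))
  # Name of the outcome variable (left-hand side of the formula)
  outcome <- all.vars(formula)[1]
  fold_mse <- sapply(seq_len(k), function(j) {
    train <- data[fold != j, ]
    test  <- data[fold == j, ]
    fit   <- lm(formula, data = train)
    mean((test[[outcome]] - predict(fit, newdata = test))^2)
  })
  # Average the held-out MSE across the k folds
  mean(fold_mse)
}

cv_mse(mpg ~ wt + hp, data = mtcars, k = 5)
```

Setting `k = nrow(mtcars)` in this sketch holds out one observation at a time, which corresponds to leave-one-out cross validation.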

[**004 - Linear regression strikes back**](https://raw.githack.com/edrubin/EC524W23/master/lecture/004/004-slides.html)

1. Returning to linear regression

1. Model performance and overfit

1. Model selection—best subset and stepwise

1. Selection criteria

**Formats** [.html](https://raw.githack.com/edrubin/EC524W23/master/lecture/004/004-slides.html) | [.pdf](https://github.com/edrubin/EC524W23/blob/master/lecture/004/004-slides.pdf) | [.Rmd](https://github.com/edrubin/EC524W23/blob/master/lecture/004/004-slides.Rmd)

**Readings**

- *ISL* Ch3

- *ISL* Ch6.1

**In between: `tidymodels`-ing**

- [An introduction to preprocessing with `tidymodels`](https://www.kaggle.com/edwardarubin/intro-tidymodels-preprocessing). (Kaggle notebook)

- [An introduction to modeling with `tidymodels`](https://www.kaggle.com/edwardarubin/intro-tidymodels-modeling). (Kaggle notebook)

- [An introduction to resampling, model tuning, and workflows with `tidymodels`](https://www.kaggle.com/edwardarubin/intro-tidymodels-resampling). (Kaggle notebook)

- [Introduction to `tidymodels`: Follow up for Kaggle](https://www.kaggle.com/edwardarubin/intro-tidymodels-split-kaggle)

[**005 - Shrinkage methods**](https://raw.githack.com/edrubin/EC524W25/master/lecture/005/slides.html)

(AKA: Penalized or regularized regression)

1. Ridge regression

1. Lasso

1. Elastic net

**Formats** [.html](https://raw.githack.com/edrubin/EC524W25/master/lecture/005/slides.html) | [.pdf](https://github.com/edrubin/EC524W25/blob/master/lecture/005/slides.pdf) | [.Rmd](https://github.com/edrubin/EC524W25/blob/master/lecture/005/slides.Rmd)

**Readings**

- *ISL* Ch4

- *ISL* Ch6

[**006 - Classification intro**](https://raw.githack.com/edrubin/EC524W25/master/lecture/006/slides.html)

1. Introduction to classification

1. Why not regression?

1. But also: Logistic regression

1. Assessment: Confusion matrix, assessment criteria, ROC, and AUC

**Formats** [.html](https://raw.githack.com/edrubin/EC524W25/master/lecture/006/slides.html) | [.pdf](https://github.com/edrubin/EC524W25/blob/master/lecture/006/slides.pdf) | [.Rmd](https://github.com/edrubin/EC524W25/blob/master/lecture/006/slides.Rmd)

**Readings**

- *ISL* Ch4
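
The assessment tools listed for lecture 006 can be sketched in a few lines of base R. This is only an illustration under stated assumptions: it uses the built-in `mtcars` data, treats `am` as the binary outcome, and applies an arbitrary 0.5 probability threshold. The metrics are computed in-sample, so they would overstate out-of-sample performance.

```r
# Minimal sketch: confusion matrix and a few assessment criteria in base R.
# Assumptions: built-in mtcars data; outcome `am` (1 = manual); 0.5 threshold.
fit   <- glm(am ~ wt, data = mtcars, family = binomial)
p_hat <- predict(fit, type = "response")   # in-sample predicted probabilities
pred  <- factor(as.integer(p_hat > 0.5), levels = c(0, 1))
truth <- factor(mtcars$am, levels = c(0, 1))

# Rows are predictions; columns are true classes
cm <- table(predicted = pred, actual = truth)

accuracy    <- sum(diag(cm)) / sum(cm)
sensitivity <- cm["1", "1"] / sum(cm[, "1"])  # true-positive rate
specificity <- cm["0", "0"] / sum(cm[, "0"])  # true-negative rate
precision   <- cm["1", "1"] / sum(cm["1", ])  # positive predictive value

cm
round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, precision = precision), 3)
```

Sweeping the threshold from 0 to 1 and recording sensitivity against one minus specificity at each value traces out the ROC curve.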

[**007 - Decision trees**](https://raw.githack.com/edrubin/EC524W25/master/lecture/007/slides.html)

1. Introduction to trees

1. Regression trees

1. Classification trees—including the Gini index, entropy, and error rate

**Formats** [.html](https://raw.githack.com/edrubin/EC524W25/master/lecture/007/slides.html) | [.pdf](https://github.com/edrubin/EC524W25/blob/master/lecture/007/slides.pdf) | [.rmd](https://github.com/edrubin/EC524W25/blob/master/lecture/007/slides.rmd)

**Readings**

- *ISL* Ch8.1–Ch8.2
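
The three impurity measures named in lecture 007 can be computed directly from the class shares within a node. The sketch below is an illustration rather than course code; `node_impurity()` is a hypothetical helper, and entropy uses the natural log.

```r
# Minimal sketch: node-impurity measures for a classification tree.
# node_impurity() is a hypothetical helper, not a function from the course.
node_impurity <- function(classes) {
  p <- prop.table(table(classes))  # class shares within the node
  p <- p[p > 0]                    # drop empty classes so log() is defined
  c(
    gini       = sum(p * (1 - p)),
    entropy    = -sum(p * log(p)),
    error_rate = 1 - max(p)
  )
}

node_impurity(c("a", "a", "a", "a"))  # pure node: all three measures are 0
node_impurity(c("a", "a", "b", "b"))  # 50/50 node: gini 0.5, entropy log(2), error rate 0.5
```

All three measures are zero for a pure node and largest when the classes are evenly mixed, which is why a tree chooses the split that reduces them the most.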

[**008 - Ensemble methods**](https://raw.githack.com/edrubin/EC524S24/master/lecture/008/slides.html)

1. Introduction

1. Bagging

1. Random forests

1. Boosting

**Formats** [.html](https://raw.githack.com/edrubin/EC524S24/master/lecture/008/slides.html) | [.pdf](https://github.com/edrubin/EC524S24/blob/master/lecture/008/slides.pdf) | [.rmd](https://github.com/edrubin/EC524S24/blob/master/lecture/008/slides.rmd)

**Readings**

- *ISL* Ch8.2

[**009 - Support vector machines**](https://raw.githack.com/edrubin/EC524S24/master/lecture/009/slides.html)

1. Hyperplanes and classification

2. The maximal margin hyperplane/classifier

3. The support vector classifier

4. Support vector machines

**Formats** [.html](https://raw.githack.com/edrubin/EC524S24/master/lecture/009/slides.html) | [.pdf](https://github.com/edrubin/EC524S24/blob/master/lecture/009/slides.pdf) | [.rmd](https://github.com/edrubin/EC524S24/blob/master/lecture/009/slides.rmd)

**Readings**

- *ISL* Ch9

[**010 - Dimensionality reduction and unsupervised learning**](https://raw.githack.com/edrubin/EC524W25/master/lecture/010/notebook.html)

0. MNIST dataset (machines with vision)

1. *K*-means clustering

2. Principal component analysis (PCA)

3. UMAP

**Formats** [.html](https://raw.githack.com/edrubin/EC524W25/master/lecture/010/notebook.html) | [.qmd](https://github.com/edrubin/EC524W22/blob/master/lecture/010/notebook.qmd)
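
For a quick feel for two of the methods in lecture 010, the following base-R sketch runs *K*-means and PCA. It makes several assumptions: the built-in `iris` measurements stand in for the MNIST pixels used in the notebook, three clusters are chosen arbitrarily, and UMAP is left out because it needs an add-on package.

```r
# Minimal sketch: K-means and PCA in base R.
# Assumption: the built-in iris measurements stand in for the MNIST pixels.
x <- scale(iris[, 1:4])          # standardize the four numeric features

set.seed(524)
km <- kmeans(x, centers = 3, nstart = 20)  # 20 random starts; keep the best
table(cluster = km$cluster, species = iris$Species)

pca <- prcomp(x)                 # x is already centered and scaled
summary(pca)$importance["Proportion of Variance", ]  # variance share per PC
head(pca$x[, 1:2])               # scores on the first two components
```

Neither method sees the species labels; the cross-tabulation only checks afterwards how well the unsupervised clusters line up with them.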

## Projects

Past, present, and future projects.

[**000** Predicting sales price in housing data (Kaggle)](projects/project-000)

*Due:* Friday 31 January 2025 by midnight (before 11:59 PM) Pacific

**Help:** 

- [A simple example/walkthrough](https://www.kaggle.com/edwardarubin/project-000-example)

- [Kaggle notebooks](https://rpubs.com/Clennon/KagNotes) (from Connor Lennon)

[**001** Validation and out-of-sample performance](projects/project-001)

*Due:* Thursday 13 February 2025 by midnight (before 11:59 PM) Pacific

[**002** Penalized regression, logistic regression, and classification](projects/project-002)

*Due:* Saturday 22 February 2025 by midnight (before 11:59 PM) Pacific

[**003** Trees, ensembles, and imputation](projects/project-003)

*Due:* Saturday 01 March 2025 by midnight (before 11:59 PM) Pacific

[Help](projects/project-003/help-003.md)

[**004** Prediction finale](projects/project-004)

*Due:* Wednesday 19 March 2025 by midnight (before 11:59 PM) Pacific

## Class project

[Outline of the project](https://github.com/edrubin/EC524W25/tree/master/projects/class-project)

**Topic due by midnight on 09 February 2025**.

**Final project submission due by 11:59p on 12 March 2025.**

## Final exam

**In-class exam**: *Monday (17 March 2025) at [8:00a–10:00a](https://registrar.uoregon.edu/dates-deadlines/exams)*

*Note:* Previous years had a take-home portion of the final exam. This year, we will only have an in-class exam.

**Prep materials**

Previous take-home exams: [2023](exam/past-home/home-23.md) | [2024](exam/past-home/home-24.md)

Previous in-class exams: [2023](exam/past-class/inclass-23.pdf) | [2024](exam/past-class/inclass-24.pdf)

*Note:* I am not providing keys.

## Lab notes

Approximate/planned topics...

[**000 - Workflow and cleaning**](https://raw.githack.com/edrubin/EC524W22/master/lab/000-cleaning/000-slides.html)

1. General "best practices" for coding

2. Working with RStudio

3. The pipe (`%>%`)

4. Cleaning and Kaggle follow up

**Formats** [.html](https://raw.githack.com/edrubin/EC524W22/master/lab/000-cleaning/000-slides.html) | [.pdf](https://raw.githack.com/edrubin/EC524W22/master/lab/000-cleaning/000-slides.pdf) | [.Rmd](https://raw.githack.com/edrubin/EC524W22/master/lab/000-cleaning/000-slides.Rmd)

[**001 - Workflow and cleaning: An example**](https://raw.githack.com/edrubin/EC524W25/refs/heads/master/lab/001-projects/doc001.html)

Follow these steps to get started on the lab this week.

1. Install Quarto. Follow this [link](https://quarto.org/docs/getting-started/installation.html), download the installer for your operating system, and follow the installation instructions.

2. Download (_and unzip_) the [Housing data](https://github.com/edrubin/EC524W22/raw/master/lab/001-cleaning/data/house-prices-advanced-regression-techniques.zip) and the [Quarto document](https://github.com/edrubin/EC524W25/blob/master/lab/001-projects/doc001.qmd) (download button top right corner of page)

3. Create a project in RStudio in a separate folder

4. Copy/move the data files and the Quarto document to a folder dedicated to this lab

5. Open the Quarto document in RStudio and follow the instructions to get started on this week's lab

**Formats** [.html](https://raw.githack.com/edrubin/EC524W25/refs/heads/master/lab/001-projects/doc001.html) | [.qmd](https://github.com/edrubin/EC524W25/blob/master/lab/001-projects/doc001.qmd)

[**002 - Validation**](https://raw.githack.com/edrubin/EC524W25/refs/heads/master/lab/002-validation/doc002.html)

1. Creating a training and validation data set from your observations dataframe in R

2. Writing a function to iterate over multiple models to test and compare MSEs

**Download**: This [zip](https://github.com/edrubin/EC524W25/raw/master/lab/002-validation/lab002.zip) file

**Formats** [.html](https://raw.githack.com/edrubin/EC524W25/refs/heads/master/lab/002-validation/doc002.html) | [.qmd](https://github.com/edrubin/EC524W25/blob/master/lab/002-validation/doc002.qmd)
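
The two steps in lab 002 could look roughly like the base-R sketch below. It is not the lab document itself: the built-in `mtcars` data, the 80/20 split, the three candidate formulas, and the helper name `val_mse()` are assumptions chosen for illustration.

```r
# Minimal sketch of the validation-set approach in base R.
# Assumptions: built-in mtcars data; an 80/20 split; val_mse() is hypothetical.
set.seed(524)
n         <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.8 * n))
train     <- mtcars[train_idx, ]
valid     <- mtcars[-train_idx, ]

# Fit one model on the training set and return its validation MSE
val_mse <- function(formula) {
  fit <- lm(formula, data = train)
  mean((valid$mpg - predict(fit, newdata = valid))^2)
}

# Candidate models to compare (illustrative choices)
models <- list(
  simple    = mpg ~ wt,
  additive  = mpg ~ wt + hp,
  quadratic = mpg ~ wt + I(wt^2) + hp
)

sort(sapply(models, val_mse))  # smallest validation MSE first
```

With a validation set this small, the ranking can change with the seed, which is one motivation for the resampling methods in lecture 003.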

[**003 - Practice using `tidymodels`**](https://www.kaggle.com/edwardarubin/intro-tidymodels-preprocessing)

1. Cleaning data quickly and efficiently with `tidymodels`

**Formats** [.html](https://www.kaggle.com/edwardarubin/intro-tidymodels-preprocessing)

[**004 - Practice using `tidymodels`**](https://www.kaggle.com/edwardarubin/intro-tidymodels-preprocessing) (continued)

1. [An introduction to preprocessing with `tidymodels`](https://www.kaggle.com/edwardarubin/intro-tidymodels-preprocessing) (refresher from last week) 

2. [An introduction to modeling with `tidymodels`](https://www.kaggle.com/edwardarubin/intro-tidymodels-modeling)

3. [An introduction to resampling, model tuning, and workflows with `tidymodels`](https://www.kaggle.com/edwardarubin/intro-tidymodels-resampling) (will finish up next week)

[**005 - More practice with `tidymodels`**](https://raw.githack.com/edrubin/EC524W25/refs/heads/master/lab/003-tidymodels/doc003.html)

Change an OLS workflow to a Lasso or Ridge regression workflow.

- [Updated version of the lab document with penalized regression](https://raw.githack.com/edrubin/EC524W25/refs/heads/master/lab/003-tidymodels/doc003-update.html)

**Download**: lab project [zip file](https://github.com/edrubin/EC524W25/raw/master/lab/003-tidymodels/lab003.zip)

**Formats** [.html](https://raw.githack.com/edrubin/EC524W25/refs/heads/master/lab/003-tidymodels/doc003.html) | [.qmd](https://github.com/edrubin/EC524W25/blob/master/lab/003-tidymodels/doc003.qmd)
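
The change described for lab 005 amounts to swapping the model specification inside a workflow while the preprocessing stays the same. The sketch below is a hedged illustration, not the lab's code: it assumes the `tidymodels` and `glmnet` packages are installed, uses the built-in `mtcars` data, and fixes `penalty = 0.1` arbitrarily where the lab would presumably tune it.

```r
# Minimal sketch: swap the model in a tidymodels workflow from OLS to lasso.
# Assumptions: tidymodels and glmnet are installed; mtcars data; penalty = 0.1
# is arbitrary (in practice it would be tuned, e.g., via cross validation).
library(tidymodels)

ols_spec <- linear_reg() %>% set_engine("lm")

# mixture = 1 is lasso, mixture = 0 is ridge, values in between are elastic net
lasso_spec <- linear_reg(penalty = 0.1, mixture = 1) %>% set_engine("glmnet")

rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors())  # put predictors on a common scale

ols_wf   <- workflow() %>% add_recipe(rec) %>% add_model(ols_spec)
lasso_wf <- ols_wf %>% update_model(lasso_spec)  # only the model changes

ols_fit   <- fit(ols_wf, data = mtcars)
lasso_fit <- fit(lasso_wf, data = mtcars)

# Coefficients from the fitted lasso model at the chosen penalty
lasso_fit %>% extract_fit_parsnip() %>% tidy()
```

Setting `mixture = 0` in `lasso_spec` would turn the same workflow into a ridge regression.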

**007 - Decision trees**

Setting up decision trees, with and without `tidymodels`.

**Download**: [Quarto document](https://github.com/edrubin/EC524W25/blob/master/lab/004-decision-trees/doc004.qmd)

**Formats** [.html](https://raw.githack.com/edrubin/EC524W25/refs/heads/master/lab/004-decision-trees/doc004.html) | [.qmd](https://github.com/edrubin/EC524W25/blob/master/lab/004-decision-trees/doc004.qmd)

## Prediction in the media

- NPR: [Google's new AI chatbot made a $100 billion mistake in a demo ad](https://www.npr.org/2023/02/09/1155650909/google-chatbot--error-bard-shares)

- NYT: [Disinformation Researchers Raise Alarms About A.I. Chatbots](https://www.nytimes.com/2023/02/08/technology/ai-chatbots-disinformation.html)

- NPR: [She was denied entry to a Rockettes show — then the facial recognition debate ignited](https://www.npr.org/2023/01/21/1150289272/facial-recognition-technology-madison-square-garden-law-new-york)

- LA Times: [Nobody knows how widespread illegal cannabis grows are in California. So we mapped them](https://www.latimes.com/california/story/2022-09-08/how-we-mapped-illegal-cannabis-farms-in-california)

- NYT: [Can A.I. Write Recipes Better Than Humans? We Put It to the Ultimate Test](https://www.nytimes.com/2022/11/04/dining/ai-thanksgiving-menu.html)

- [ChatGPT](https://chat.openai.com/chat)

  - Business Insider: [List of exams ChatGPT has passed](https://www.businessinsider.com/list-here-are-the-exams-chatgpt-has-passed-so-far-2023-1?op=1#-5)

  - NPR: ['Everybody is cheating': Why this teacher has adopted an open ChatGPT policy](https://www.npr.org/2023/01/26/1151499213/chatgpt-ai-education-cheating-classroom-wharton-school)

  - [How Should Schools Respond to ChatGPT?](https://www.nytimes.com/2023/01/24/learning/how-should-schools-respond-to-chatgpt.html)

  - Energy Institute: [Can ChatGPT Save the Planet?](https://energyathaas.wordpress.com/2023/01/23/can-chatgpt-save-the-planet/)

  - MIT Tech Review: [Here’s how Microsoft could use ChatGPT](https://www.technologyreview.com/2023/01/17/1067014/heres-how-microsoft-could-use-chatgpt/)

  - NPR: [This 22-year-old is trying to save us from ChatGPT before it changes writing forever](https://www.npr.org/sections/money/2023/01/17/1149206188/this-22-year-old-is-trying-to-save-us-from-chatgpt-before-it-changes-writing-for)

  - NYT: [How ChatGPT Hijacks Democracy](https://www.nytimes.com/2023/01/15/opinion/ai-chatgpt-lobbying-democracy.html)

  - NYT: [Don’t Ban ChatGPT in Schools. Teach With It.](https://www.nytimes.com/2023/01/12/technology/chatgpt-schools-teachers.html)

  - NYT: [How to Use ChatGPT and Still Be a Good Person](https://www.nytimes.com/2022/12/21/technology/personaltech/how-to-use-chatgpt-ethically.html)

  - NPR: [A new AI chatbot might do your homework for you. But it's still not an A+ student](https://www.npr.org/2022/12/19/1143912956/chatgpt-ai-chatbot-homework-academia)

  - NYT: [The Brilliance and Weirdness of ChatGPT](https://www.nytimes.com/2022/12/05/technology/chatgpt-ai-twitter.html)

- Military applications

  - The Drive: [M1 Abrams Tank Tested With Artificial Intelligence Targeting System](https://www.thedrive.com/the-war-zone/m1-abrams-tank-tested-with-artificial-intelligence-targeting-system)

  - Task and Purpose: [Marines outwitted an AI security camera by hiding in a cardboard box and pretending to be trees](https://taskandpurpose.com/news/marines-ai-paul-scharre/)

  - WP: [The next U.S. battle tank could use AI to identify targets](https://www.washingtonpost.com/technology/2022/10/12/abramsx-ai-hybrid-military-battle-tank/)

## Additional resources

### Jobs

I wrote a very short guide to [finding a job](jobs).

### R

- [UO library resources/workshops](https://researchguides.uoregon.edu/library_workshops)

- [RStudio's recommendations for learning R](https://education.rstudio.com/learn/), plus cheatsheets, books, and tutorials

- [YaRrr! The Pirate’s Guide to R](https://bookdown.org/ndphillips/YaRrr/) (free online)

- [Eugene R Users](https://www.meetup.com/meetup-group-cwPiAlnB/)

### Data Science

- [Happy Git and GitHub for the useR](https://happygitwithr.com/) by Jenny Bryan, the "STAT 545 TAs", and Jim Hester

- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/) by Jake VanderPlas

- [Elements of AI](https://course.elementsofai.com/)

- [Caltech professor Yaser Abu-Mostafa: Lectures about machine learning on YouTube](https://www.youtube.com/user/caltech/search?query=Yaser+Abu-Mostafa)

- From Google:

  - [Machine-learning crash course](https://developers.google.com/machine-learning/crash-course/ml-intro)

  - [Google Cloud training for data and machine learning](https://cloud.google.com/training/data-ml)

  - [General Google education platform](https://ai.google/education/)

### Spatial data

- [Geocomputation with R](https://geocompr.robinlovelace.net) (free online)

- [Spatial Data Science](https://keen-swartz-3146c4.netlify.com) (free online)

- [Applied Spatial Data Analysis with R](https://asdar-book.org)