https://github.com/adgaudio/kaggle
- Host: GitHub
- URL: https://github.com/adgaudio/kaggle
- Owner: adgaudio
- Created: 2012-04-29T02:41:46.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2012-04-30T14:41:32.000Z (over 12 years ago)
- Last Synced: 2024-10-03T11:12:05.064Z (about 1 month ago)
- Language: Python
- Size: 35.9 MB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
kaggle
======

# Data Transformation
## Create dummy variables
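A minimal sketch of this encoding, assuming pandas and a frame with hypothetical `site` and `weekday` columns (the toy rows are made up for illustration). `pandas.get_dummies` replaces each listed column with one 0/1 indicator column per level:

```python
import pandas as pd

# Hypothetical toy rows; the real data would have many more sites and observations.
df = pd.DataFrame({
    "site": [1, 2, 3, 1],
    "weekday": ["Monday", "Tuesday", "Monday", "Sunday"],
    "target": [0.5, 0.7, 0.2, 0.9],
})

# Replace each categorical column with one 0/1 indicator column per level.
dummies = pd.get_dummies(df, columns=["site", "weekday"], dtype=int)
print(dummies.columns.tolist())
# ['target', 'site_1', 'site_2', 'site_3', 'weekday_Monday', 'weekday_Sunday', 'weekday_Tuesday']
```

If the OLS baseline includes an intercept, pass `drop_first=True` (or drop one level per group by hand) so the indicators are not perfectly collinear with the intercept.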
* Site --> a variable for each site.
* Weekday --> a variable for each day.

## Lag timeseries
First, test each variable for autoregression. The variables will almost certainly be autoregressive, but we do not know to what order. We are looking for the lag beyond which older values are no longer statistically significant after controlling for more recent values (i.e., where the partial autocorrelation becomes insignificant).

After we have established that threshold, create a set of lagged variables for each timeseries variable. Each original column will be transformed into X lagged columns, ordered by position within chunk, where X is the threshold lag in hours. This reduces the number of observations available, because the first X rows of each chunk have no complete lag history, but we will likely gain more information than we lose.
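A minimal sketch of both steps, assuming NumPy and pandas. The partial autocorrelation at lag k is estimated by the regression reading of the text above (the coefficient on the oldest lag after controlling for the newer ones), and significance uses the common approximate 95% band of +/- 1.96 / sqrt(n). The column names `chunk_id` and `position_within_chunk` are hypothetical, and the AR(1) series and toy frame are synthetic:

```python
import numpy as np
import pandas as pd


def pacf_ols(x, max_lag):
    """Partial autocorrelations at lags 1..max_lag.

    The lag-k value is the coefficient on x[t-k] when x[t] is regressed on
    x[t-1], ..., x[t-k] plus an intercept.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    values = []
    for k in range(1, max_lag + 1):
        y = x[k:]
        # Column j-1 holds x[t-j], aligned row by row with y = x[t].
        lags = np.column_stack([x[k - j:n - j] for j in range(1, k + 1)])
        design = np.column_stack([np.ones(len(y)), lags])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        values.append(coef[-1])
    return np.array(values)


def choose_lag_threshold(x, max_lag=24):
    """Number of lags to keep: the lags before the first partial autocorrelation
    that falls inside the approximate band +/- 1.96 / sqrt(n)."""
    bound = 1.96 / np.sqrt(len(x))
    for k, value in enumerate(pacf_ols(x, max_lag), start=1):
        if abs(value) < bound:
            return k - 1
    return max_lag


def add_lags(df, columns, n_lags, chunk_col="chunk_id", order_col="position_within_chunk"):
    """Add <col>_lag1 .. <col>_lagN within each chunk and drop rows lacking full history."""
    df = df.sort_values([chunk_col, order_col]).copy()
    lag_cols = []
    for col in columns:
        for k in range(1, n_lags + 1):
            name = f"{col}_lag{k}"
            # Shifting inside groupby never pulls values across a chunk boundary.
            df[name] = df.groupby(chunk_col)[col].shift(k)
            lag_cols.append(name)
    return df.dropna(subset=lag_cols).reset_index(drop=True)


# Synthetic AR(1) series (coefficient 0.8) to exercise the threshold search.
rng = np.random.default_rng(0)
series = np.zeros(2000)
for t in range(1, len(series)):
    series[t] = 0.8 * series[t - 1] + rng.normal()
print(choose_lag_threshold(series, max_lag=5))

# Toy chunked data with hypothetical column names.
toy = pd.DataFrame({
    "chunk_id": [1, 1, 1, 2, 2, 2],
    "position_within_chunk": [1, 2, 3, 1, 2, 3],
    "temp": [10.0, 11.0, 12.0, 20.0, 21.0, 22.0],
})
lagged = add_lags(toy, ["temp"], n_lags=1)
print(lagged[["chunk_id", "temp", "temp_lag1"]])
```

Because the shift is done per chunk, the first `n_lags` rows of every chunk are dropped; in the toy frame that removes one row from each of the two chunks.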
## Linearize
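The log rule in this section can be sketched as follows, assuming pandas and a hypothetical skewness cutoff of 1.0 as the test for "non-normal" (a formal normality test could replace it). Only strictly positive, right-skewed columns are logged, since the log is undefined at zero and below; the column names and readings are made up:

```python
import numpy as np
import pandas as pd


def log_skewed(df, cutoff=1.0):
    """Log-transform strictly positive columns whose sample skewness exceeds cutoff.

    Returns the transformed copy and the names of the columns that were logged.
    The cutoff of 1.0 is a hypothetical rule of thumb for "non-normal".
    """
    out = df.copy()
    changed = []
    for col in out.columns:
        values = out[col]
        if values.skew() > cutoff and (values > 0).all():
            out[col] = np.log(values)
            changed.append(col)
    return out, changed


# Hypothetical weather readings: "solar" has a long right tail, "temp" does not.
weather = pd.DataFrame({
    "solar": [1.0, 2.0, 1.5, 3.0, 50.0, 2.5, 120.0, 1.2],
    "temp": [14.0, 15.5, 13.0, 16.0, 15.0, 14.5, 15.2, 13.8],
})
logged, changed = log_skewed(weather)
print(changed)  # ['solar']
```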
For any non-normal weather variable, take the log.

## Estimate missing values
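The two fill rules in this section can be sketched as follows, assuming pandas, a time-ordered frame with one column per site for a single variable, and that "the two nearest times" means the observations immediately before and after the gap. The `nearest_site` mapping is hypothetical (in practice it would come from site locations):

```python
import numpy as np
import pandas as pd


def fill_missing(wide, nearest_site):
    """Fill gaps in a time-ordered frame with one column per site (single variable).

    nearest_site is a hypothetical mapping from each site to its closest site.
    """
    out = wide.copy()
    for site in out.columns:
        s = out[site]
        prev, nxt = s.shift(1), s.shift(-1)
        # Isolated point: missing, but both neighbouring times are observed.
        isolated = s.isna() & prev.notna() & nxt.notna()
        out.loc[isolated, site] = ((prev + nxt) / 2)[isolated]
    point_filled = out.copy()
    for site in out.columns:
        # Remaining gaps (runs of 2+ values, or gaps at either end):
        # copy the nearest site's value at the same time.
        out[site] = out[site].fillna(point_filled[nearest_site[site]])
    return out


# Hypothetical hourly readings for two sites.
wide = pd.DataFrame({
    "site_a": [1.0, np.nan, 3.0, np.nan, np.nan, 6.0],
    "site_b": [1.5, 2.5, 3.5, 4.5, 5.5, 6.5],
})
filled = fill_missing(wide, {"site_a": "site_b", "site_b": "site_a"})
print(filled["site_a"].tolist())  # [1.0, 2.0, 3.0, 4.5, 5.5, 6.0]
```

A segment stays missing if the nearest site is also missing at the same times; a fallback to the next-nearest site would be a natural extension.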
Fill in isolated missing points by averaging the values at the two nearest times. Fill in missing segments by copying the values from the nearest site.

# Model fitting
## Ordinary Least Squares
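The OLS baseline in this section can be sketched with NumPy's `np.linalg.lstsq`; the small noise-free design matrix below is a made-up stand-in for the full set of original, lagged, and dummy columns, and mean absolute error is an assumed choice of baseline metric:

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept column; returns (intercept, slopes)."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef[0], coef[1:]


def predict(intercept, slopes, X):
    """Linear prediction from a fitted intercept and slope vector."""
    return intercept + np.asarray(X, dtype=float) @ slopes


# Toy design matrix standing in for the original, lagged and dummy columns.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]])
y = 2 + 3 * X[:, 0] - 1 * X[:, 1]  # noise-free, so the fit should be exact
intercept, slopes = fit_ols(X, y)
baseline_mae = np.mean(np.abs(predict(intercept, slopes, X) - y))
print(intercept, slopes, baseline_mae)
```

`np.linalg.lstsq` still returns a (minimum-norm) solution when columns are collinear, for example a full set of dummies plus an intercept, but the individual coefficients are then not unique.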
Form a baseline by fitting OLS on the complete set of variables (including the lagged and dummy variables).

## Dimensionality reduction
# Visualization
Use gapminder.org-style animation (animated bubble charts that step through time).