Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Arturus/kaggle-web-traffic
1st place solution
- Host: GitHub
- URL: https://github.com/Arturus/kaggle-web-traffic
- Owner: Arturus
- License: mit
- Created: 2017-11-17T21:15:59.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2022-10-09T07:10:28.000Z (over 2 years ago)
- Last Synced: 2024-08-01T19:33:48.248Z (6 months ago)
- Topics: cocob, cudnn, kaggle, kaggle-web-traffic, rnn, rnn-encoder-decoder, seq2seq, tensorflow, time-series, timeseries
- Language: Jupyter Notebook
- Size: 8.76 MB
- Stars: 1,819
- Watchers: 75
- Forks: 670
- Open Issues: 13
Metadata Files:
- Readme: Readme.md
- Changelog: Changelog.md
- License: LICENSE
Awesome Lists containing this project
README
# Kaggle Web Traffic Time Series Forecasting
1st place solution

![predictions](images/predictions.png)
Main files:
* `make_features.py` - builds features from source data
* `input_pipe.py` - TF data preprocessing pipeline (assembles features
into training/evaluation tensors, performs some sampling and normalisation)
* `model.py` - the model (an RNN seq2seq encoder-decoder, per the repo topics; see the sketch after this list)
* `trainer.py` - trains the model(s)
* `hparams.py` - hyperparameter sets
* `submission-final.ipynb` - generates predictions for submission
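For orientation only: the actual `model.py` is TF1-era code (the repo topics mention cuDNN GRUs and the COCOB optimizer), but the encoder-decoder idea can be sketched in a few lines of modern Keras. Everything below — sizes, layer choices, the Adam/MAE setup — is an illustrative assumption, not the competition model:

```python
import tensorflow as tf

# Illustrative sizes, not the competition's: 200 days of history,
# a 63-day forecast horizon (matching --add_days=63), 20 features.
HISTORY, HORIZON, N_FEAT, HIDDEN = 200, 63, 20, 128

# Encoder: compress the history window into a fixed-size state.
enc_in = tf.keras.Input(shape=(HISTORY, N_FEAT), name="history")
_, enc_state = tf.keras.layers.GRU(HIDDEN, return_state=True)(enc_in)

# Decoder: unroll HORIZON steps from the encoder state, fed with
# features that are known in advance (day of week, holidays, ...).
dec_in = tf.keras.Input(shape=(HORIZON, N_FEAT), name="future_features")
dec_seq = tf.keras.layers.GRU(HIDDEN, return_sequences=True)(
    dec_in, initial_state=enc_state)
preds = tf.keras.layers.Dense(1)(dec_seq)  # one traffic value per future day

model = tf.keras.Model([enc_in, dec_in], preds)
model.compile(optimizer="adam", loss="mae")  # the real code uses COCOB, not Adam
model.summary()
```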
How to reproduce competition results:
1. Download the input files from https://www.kaggle.com/c/web-traffic-time-series-forecasting/data (`key_2.csv.zip` and `train_2.csv.zip`) and put them into the `data` directory.
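If you'd rather script this step, the official `kaggle` package can fetch the files; this sketch assumes `pip install kaggle`, an API token in `~/.kaggle/kaggle.json`, and that you have accepted the competition rules on the website (the CLI equivalent is `kaggle competitions download -c web-traffic-time-series-forecasting`):

```python
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # reads the token from ~/.kaggle/kaggle.json
# Downloads the competition files (including key_2.csv.zip and
# train_2.csv.zip) into the data/ directory.
api.competition_download_files("web-traffic-time-series-forecasting", path="data")
```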
2. Run `python make_features.py data/vars --add_days=63`. It will extract data and features from the input files and put them into `data/vars` as a TensorFlow checkpoint.
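To sanity-check this step, you can list what was written: assuming `data/vars` holds a standard TensorFlow checkpoint (as the step says), the stock checkpoint utilities can enumerate the stored tensors:

```python
import tensorflow as tf

# Print the name and shape of every tensor saved in the feature checkpoint.
for name, shape in tf.train.list_variables("data/vars"):
    print(name, shape)
```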
3. Run the trainer:
`python trainer.py --name s32 --hparam_set=s32 --n_models=3 --no_eval --no_forward_split --asgd_decay=0.99 --max_steps=11500 --save_from_step=10500`. This command will simultaneously train 3 models on different seeds (within a single TF graph) and save 10 checkpoints from step 10500 to step 11500 to `data/cpt`.
__Note:__ training requires a GPU because of the cuDNN usage; CPU training will not work.
If you have 3 or more GPUs, add the `--multi_gpu` flag to speed up training. You can also try different
hyperparameter sets (described in `hparams.py`): `--hparam_set=definc`,
`--hparam_set=inst81`, etc.
Don't be afraid of the NaN losses displayed during training. This is normal:
training runs in blind mode, without any evaluation of model performance.
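The `--asgd_decay=0.99` flag from step 3 refers to averaged SGD: the general idea is to keep an exponential moving average of the weights alongside the raw SGD iterates and use the smoother average for prediction. A toy numpy sketch of that averaging rule (names and the noise model are illustrative, not the trainer's code):

```python
import numpy as np

def asgd_update(avg, w, decay=0.99):
    """One averaging step: avg <- decay * avg + (1 - decay) * w."""
    return decay * avg + (1.0 - decay) * w

rng = np.random.default_rng(0)
w = np.zeros(3)    # stand-in for the model weights
avg = np.zeros(3)  # their running average
for step in range(11_500):
    w = w + rng.normal(scale=0.1, size=3)  # stand-in for one noisy SGD update
    avg = asgd_update(avg, w, decay=0.99)
print("raw weights:     ", w)
print("averaged weights:", avg)  # trails the raw iterates, but more smoothly
```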
4. Run `submission-final.ipynb` in a standard Jupyter notebook environment and
execute all cells. Prediction will take some time, because it has to
load and evaluate 30 different sets of model weights. At the end,
you'll get a `submission.csv.gz` file in the `data` directory.

See also the [detailed model description](how_it_works.md).
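Those 30 sets of weights are the 3 models from step 3 times 10 checkpoints each, and the final forecast is an ensemble over them. A toy sketch of that kind of checkpoint ensembling (the shapes and the plain arithmetic mean are assumptions, not the notebook's exact logic):

```python
import numpy as np

# Hypothetical: per_checkpoint[i] holds the (n_pages, 63) forecast
# produced by the model restored from checkpoint i.
rng = np.random.default_rng(0)
per_checkpoint = [rng.random((5, 63)) for _ in range(30)]  # stand-in forecasts

ensemble = np.mean(per_checkpoint, axis=0)  # average across the 30 checkpoints
print(ensemble.shape)  # (5, 63): one 63-day forecast per page
```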