https://github.com/umstek/dengai

Solution for DengAI Competition by DrivenData (CS4642 Data Mining and Information Retrieval, CS4622 Machine Learning - assignments)
data-science dengai drivendata

Host: GitHub
URL: https://github.com/umstek/dengai
Owner: umstek
Created: 2018-05-27T01:40:07.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2024-04-03T00:00:42.000Z (over 1 year ago)
Last Synced: 2025-09-01T21:48:19.329Z (about 2 months ago)
Topics: data-science, dengai, drivendata
Language: Jupyter Notebook
Homepage:
Size: 22.3 MB
Stars: 2
Watchers: 3
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: Readme.md

README

# DengAI

## Reports and Presentations
### [Presentation](https://github.com/umstek/DengAI/blob/master/DengAI.pdf) for CS4622 (Machine Learning)

### [Report](https://github.com/umstek/DengAI/blob/master/Machine%20Learning%20Report%20-%20Group%2030.pdf) for CS4622 (Machine Learning)

### [Report](https://github.com/umstek/DengAI/blob/master/Data%20Mining%20Report%20-%20Group%2030.pdf) for CS4642 (Data Mining and Information Retrieval)

## Results
Current best result: mean absolute error (MAE) of 19.3798, rank 89 as of July 27, 2018.
See [Generated files](https://github.com/umstek/DengAI/releases/tag/v1) for a complete list of intermediate generated files and submissions.
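
For readers unfamiliar with the metric, the following is a minimal sketch of how MAE is computed, assuming the standard definition (the average absolute difference between observed and predicted values). The function name and the case counts are made up for illustration and are not taken from the repository.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up weekly case counts, for illustration only.
actual_cases = [4, 5, 4, 3, 6]
predicted_cases = [3, 5, 6, 3, 10]

mae = mean_absolute_error(actual_cases, predicted_cases)
print(mae)
```

A lower value is better: it means the predicted weekly counts are, on average, closer to the observed ones.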

## Directory contents
+ The `.` root directory contains the data files downloaded from _drivendata_ and some milestone submissions.
+ The `deprecated` folder contains the first approaches to the problem, using _MATLAB Regression Learner_ and _Orange3_ (with minimal preprocessing), and the resulting `.csv` files.
+ The `Neural Networks` folder contains the first approaches to the problem using deep neural networks with _Keras_ and _TensorFlow_.
+ `Negative Binominal Regression` contains the DengAI benchmark model, built in _Jupyter Notebook_ with _sklearn_, _statsmodels_, etc.
+ `Interactive Python 1` contains the approaches that do general preprocessing with _Jupyter Notebook_, _pandas_, _sklearn_, _statsmodels_, and _seaborn_, and use various models for prediction.
+ `Interactive Python 2` contains a pipeline that processes the files in several stages using _Jupyter Notebook_, _pandas_, _sklearn_, _statsmodels_, _seaborn_, and _R_'s STL (time series decomposition), called through the _rpy2_ bridge. This pipeline handles preprocessing, visualization, analysis, automatic feature selection, best-model selection, etc. The best-performing model is a predictor that decomposes the time series and fits a linear regression model.
+ `Orange` folder contains an Orange3 pipeline that tests cross-validated errors of various learners with preprocessing, feature engineering etc.
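
The decompose-then-regress idea behind the best-performing model can be sketched as follows. This is not the repository's pipeline: it replaces _R_'s STL with a much simpler per-position seasonal mean, uses a single hypothetical feature, and all function names and numbers are invented for illustration.

```python
from statistics import mean


def seasonal_profile(values, period):
    """Mean of the series at each position within the seasonal cycle."""
    buckets = [[] for _ in range(period)]
    for i, v in enumerate(values):
        buckets[i % period].append(v)
    return [mean(b) for b in buckets]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_predictor(cases, feature, period):
    """Split cases into a seasonal part plus a remainder, then regress
    the remainder on the feature. Returns a prediction function."""
    profile = seasonal_profile(cases, period)
    remainder = [c - profile[i % period] for i, c in enumerate(cases)]
    intercept, slope = fit_line(feature, remainder)

    def predict(index, feature_value):
        # Seasonal component plus the regression estimate of the remainder.
        return profile[index % period] + intercept + slope * feature_value

    return predict


# Toy data (assumption): a two-step seasonal cycle of 10 and 20 cases,
# plus 2 extra cases per unit of a hypothetical climate feature.
feature = [1, 1, 3, 3, 1, 1, 3, 3]
cases = [12, 22, 16, 26, 12, 22, 16, 26]

predict = fit_predictor(cases, feature, period=2)
next_week = predict(8, 3)   # index 8 continues the series
print(next_week)
```

A real STL decomposition also separates a smooth trend and uses local regression for the seasonal component, so this sketch only shows the overall shape of the approach: remove the repeating pattern first, then let a linear model explain what is left.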
