https://github.com/louisguitton/titanic-kaggle

R code for the "Hello World" Kaggle competition about the Titanic dataset

This repository contains the code I used to participate in the "Hello World" Kaggle competition about the Titanic dataset:
https://www.kaggle.com/c/titanic

I chose to start with R.

Decision Trees & Feature Engineering
====
As advised by Kaggle, I first went through the DataCamp machine learning tutorial:
https://www.datacamp.com/courses/kaggle-tutorial-on-machine-learing-the-sinking-of-the-titanic
It introduces:
- decision trees with rpart
- feature engineering
- overfitting
- random forests

The simple decision tree with feature engineering was my best entry so far.
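rpart grows a classification tree greedily: at each node it tries candidate splits and keeps the one that most reduces node impurity, measured with the Gini index by default. The sketch below illustrates that idea in Python rather than the repository's R. It is not rpart's implementation: for simplicity each category of a feature gets its own branch, whereas rpart only makes binary splits. The four passengers in the example are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum over classes of p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gain(rows, feature, target):
    """Impurity decrease from splitting `rows` on a categorical `feature`.

    Simplification: every distinct value gets its own child node (rpart itself
    only makes binary splits). Gain = parent impurity minus the size-weighted
    impurity of the children.
    """
    children = {}
    for row in rows:
        children.setdefault(row[feature], []).append(row[target])
    n = len(rows)
    weighted = sum(len(ys) / n * gini(ys) for ys in children.values())
    return gini([row[target] for row in rows]) - weighted


def best_split(rows, features, target):
    """Greedily pick the feature whose split reduces impurity the most."""
    return max(features, key=lambda f: split_gain(rows, f, target))


if __name__ == "__main__":
    # Four made-up passengers, for illustration only (not real Titanic rows).
    toy = [
        {"Sex": "female", "Pclass": 1, "Survived": 1},
        {"Sex": "female", "Pclass": 3, "Survived": 1},
        {"Sex": "male", "Pclass": 1, "Survived": 0},
        {"Sex": "male", "Pclass": 3, "Survived": 0},
    ]
    print(best_split(toy, ["Sex", "Pclass"], "Survived"))  # prints Sex
```

On this toy data, splitting on `Sex` separates the classes perfectly, with a gain of 0.5. `Pclass` leaves each child as mixed as the parent, so its gain is 0. A real tree repeats this search recursively inside each child until a stopping rule, such as rpart's complexity parameter, halts it.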

Generalized Linear Models & Complete Approach with R
====
To improve my score, I then went through this tutorial:
https://github.com/wehrley/wehrley.github.io/blob/master/SOUPTONUTS.md
It introduces:
- advanced feature engineering
- logistic regression
- adaptive boosting
- random forest
- support vector machines
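A common feature-engineering step on this dataset is to pull each passenger's title (Mr, Mrs, Miss, Master, ...) out of the `Name` column and pool the rare titles into one level. The sketch below shows this in Python rather than the repository's R. The regular expression, the `TITLE_GROUPS` mapping and the "Rare" bucket are assumptions made for illustration. They are not taken from the tutorial, and tutorials group these titles differently.

```python
import re

# Titanic passenger names look like "Braund, Mr. Owen Harris":
# surname, comma, a title ending in a period, then the given names.
TITLE_RE = re.compile(r",\s*([^.]+)\.")

# Hypothetical grouping: French forms folded into their English equivalents,
# every other uncommon title pooled into a single "Rare" level.
TITLE_GROUPS = {"Mlle": "Miss", "Ms": "Miss", "Mme": "Mrs"}
COMMON_TITLES = {"Mr", "Mrs", "Miss", "Master"}


def extract_title(name: str) -> str:
    """Return a grouped title for one passenger name, or "Unknown" if none is found."""
    match = TITLE_RE.search(name)
    if match is None:
        return "Unknown"
    title = match.group(1).strip()
    title = TITLE_GROUPS.get(title, title)
    return title if title in COMMON_TITLES else "Rare"


if __name__ == "__main__":
    # Names in the format used by the Kaggle train.csv file.
    for name in [
        "Braund, Mr. Owen Harris",
        "Aubart, Mme. Leontine Pauline",
        "Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)",
    ]:
        print(name, "->", extract_title(name))
```

The resulting title can then serve as a categorical predictor in any of the models listed above. Pooling rare levels keeps tiny categories from producing unstable coefficients or splits.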

Conditional Forest
====
At that point I had learnt what I was looking for, but I still wanted to see what the next step was.
This script shows the difference between a random forest (rForest) and a conditional inference forest (cForest):
https://www.kaggle.com/uioreanu/titanic/randomforest-cforest-method