https://github.com/guptaachin/titanic-data-analysis

This is analysis and modelling of the famous Titanic Data Set from Kaggle.

dataanalytics datamining datascience machinelearning numpy pandas python scikit-learn structuredpyramidanalysisplan tableau tableau-desktop


# Titanic-data-analysis
This repository analyzes and models the famous Titanic data set from Kaggle.

I tried to follow best practices throughout: a structured analysis plan, exploratory analysis, data cleaning, and modelling with scikit-learn.

Quick links:
1. [Tableau storyboard](https://public.tableau.com/profile/gauscian#!/vizhome/tab-wkb/TitanicDataSetAnalysis?publish=yes)
2. [Jupyter notebook](https://github.com/gauscian/Titanic-data-analysis/blob/master/jupyter-nb.ipynb)
3. [Cleaning code](https://github.com/gauscian/Titanic-data-analysis/blob/master/cleaning_helper.py)
4. [SPAP (Structured Pyramid Analysis Plan)](https://github.com/gauscian/Titanic-data-analysis/blob/master/%5BSPAP%5D%20Titanic%20Data%20Set.png)


Please feel free to fork and contribute.


My takeaways from this project:
1. It reinforced the basic strategy for working through a data science project.
2. Exhaustive analysis matters.
3. Use the intuition and understanding gained during analysis to shape the data. This is crucial because the transformed data should still be representative of the real data set.
4. Use that same intuition to choose suitable machine learning algorithms, and apply them with scikit-learn Pipelines.
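
To make the last two takeaways concrete, here is a minimal sketch of a scikit-learn Pipeline that cleans and models Titanic-style data in one object. It is not the code from the notebook or `cleaning_helper.py`. The toy rows are made up for illustration, the column names are assumed to match the Kaggle CSV, and the imputation strategies and `LogisticRegression` are placeholder choices rather than the ones this project settled on.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows invented for illustration only; column names are assumed
# to follow the Kaggle Titanic training CSV.
toy = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 2, 3, 1],
    "Sex": ["male", "female", "female", "female",
            "male", "male", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, np.nan, 54.0, 2.0, np.nan],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 51.86, 21.07, 30.00],
    "Embarked": ["S", "C", "S", "S", np.nan, "S", "S", "C"],
    "Survived": [0, 1, 1, 1, 0, 0, 1, 0],
})

numeric_cols = ["Pclass", "Age", "Fare"]
categorical_cols = ["Sex", "Embarked"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill gaps with the most common value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# Route each group of columns to its own preprocessing steps.
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

# Cleaning and the classifier live in a single estimator, so the same
# transformations are applied at fit time and at predict time.
model = Pipeline([
    ("preprocess", preprocess),
    ("classify", LogisticRegression(max_iter=1000)),
])

X = toy.drop(columns="Survived")
y = toy["Survived"]
model.fit(X, y)
predictions = model.predict(X)
```

Keeping the imputers inside the pipeline means that they learn their fill values only from the data passed to `fit`, which helps the shaped data stay representative when the pipeline is later evaluated on held-out rows.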