https://github.com/guptaachin/titanic-data-analysis
This is analysis and modelling of the famous Titanic Data Set from Kaggle.
- Host: GitHub
- URL: https://github.com/guptaachin/titanic-data-analysis
- Owner: guptaachin
- Created: 2018-09-09T10:25:19.000Z
- Default Branch: master
- Last Pushed: 2018-09-28T03:44:18.000Z
- Last Synced: 2025-01-14T06:46:08.527Z
- Topics: dataanalytics, datamining, datascience, machinelearning, numpy, pandas, python, scikit-learn, structuredpyramidanalysisplan, tableau, tableau-desktop
- Language: Jupyter Notebook
- Size: 873 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Titanic-data-analysis
This is analysis and modelling of the famous Titanic Data Set from Kaggle.
In this repository I used best practices to analyze and model the classic Titanic data set.
Quick Links:
1. [Tableau storyboard](https://public.tableau.com/profile/gauscian#!/vizhome/tab-wkb/TitanicDataSetAnalysis?publish=yes)
2. [Jupyter notebook](https://github.com/gauscian/Titanic-data-analysis/blob/master/jupyter-nb.ipynb)
3. [Cleaning code](https://github.com/gauscian/Titanic-data-analysis/blob/master/cleaning_helper.py)
4. [SPAP (Structured Pyramid Analysis Plan)](https://github.com/gauscian/Titanic-data-analysis/blob/master/%5BSPAP%5D%20Titanic%20Data%20Set.png)
Please feel free to fork and contribute.
My takeaways from this project:
1. Reinforcing the basic workflow of a data science project.
2. The importance of carrying out an exhaustive analysis.
3. Use the intuition and understanding gained during analysis to mold the data. This is crucial because the molded data should still be representative of the real data set.
4. Use that intuition to choose the most suitable machine learning algorithms, and apply them with scikit-learn Pipelines.
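Points 3 and 4 can be sketched together as a single scikit-learn `Pipeline`, where the data-molding steps (imputation, encoding) and the model live in one object that is fit in one call. This is a minimal illustration, not the notebook's actual code: the column names follow Kaggle's Titanic `train.csv`, but the eight rows are made-up toy values, and the median/most-frequent imputation and logistic regression classifier are assumed choices for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy rows: column names match Kaggle's Titanic train.csv,
# but the values are invented for illustration.
df = pd.DataFrame({
    "Pclass":   [1, 3, 2, 3, 1, 3, 2, 3],
    "Sex":      ["female", "male", "female", "male", "male", "female", "male", "female"],
    "Age":      [38.0, np.nan, 27.0, 22.0, 54.0, np.nan, 35.0, 4.0],
    "Fare":     [71.28, 7.25, 13.0, 8.05, 51.86, 7.92, 26.0, 16.7],
    "Embarked": ["C", "S", "S", np.nan, "S", "Q", "S", "S"],
    "Survived": [1, 0, 1, 0, 0, 1, 0, 1],
})

numeric = ["Age", "Fare"]
categorical = ["Pclass", "Sex", "Embarked"]

# "Molding" the data: each choice here (e.g. median for Age) is an
# assumption you would justify from the exploratory analysis.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

# Preprocessing and model chained so they are fit and applied together.
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])

X = df.drop(columns="Survived")
y = df["Survived"]
model.fit(X, y)
predictions = model.predict(X)
```

Keeping the preprocessing inside the pipeline means the same imputation values learned on training data are reused on new data, and the whole chain can be passed to tools such as cross-validation without leaking information between folds.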