https://github.com/arnabushna24/titanic-disaster-analysis

Titanic - Machine Learning from Disaster

data-analysis data-visualization python statistical-analysis

# Titanic Disaster Analysis

## Overview
This project analyzes the [Titanic - Machine Learning from Disaster](https://www.kaggle.com/c/titanic/data) dataset and draws insights from it using statistical analysis. It covers the following steps:

* Retrieving the data from the target location.
* Handling missing values and outliers (if any).
* Performing data visualization.
* Performing basic statistical analyses.

## Data Retrieval
The `Titanic - Machine Learning from Disaster` dataset is available on Kaggle. It contains three (3) `.csv` files: `gender_submission.csv`, `test.csv`, and `train.csv`. Only `train.csv` was used for this project. It contains twelve (12) columns: `PassengerId`, `Survived`, `Pclass`, `Name`, `Sex`, `Age`, `SibSp`, `Parch`, `Ticket`, `Fare`, `Cabin`, and `Embarked`. The file was loaded into a `pandas` dataframe for further analysis.
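The loading step can be sketched as follows. This is a minimal illustration, not the notebook's actual code: `load_titanic` is a hypothetical helper, and the two in-memory rows are made up so the snippet runs without the Kaggle download.

```python
import io

import pandas as pd

# Column names listed above for train.csv.
EXPECTED_COLUMNS = [
    "PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
    "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked",
]


def load_titanic(source):
    """Read the training data from a path or file-like object (hypothetical helper)."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return df


# Made-up stand-in rows for illustration only; not real dataset records.
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    "1,0,3,Passenger A,male,22,1,0,T1,7.25,,S\n"
    "2,1,1,Passenger B,female,38,1,0,T2,71.28,C85,C\n"
)
df = load_titanic(sample_csv)
print(df.shape)
```

With the real file downloaded from Kaggle, the call would presumably be `load_titanic("train.csv")`, assuming the file sits in the working directory.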

## Data Cleaning and Manipulation
The `isnull` function was used to find missing values in the dataset. There were 177 missing `Age` values, 687 missing `Cabin` values, and 2 missing `Embarked` values. Missing `Age` values were imputed with the column median, whereas the `Cabin` and `Embarked` columns were handled using the `notnull` function and the mode, respectively. Outliers were then identified and capped. No duplicated records were found.
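A rough sketch of these cleaning steps is shown below. Several details are assumptions, because the text above does not specify them: the IQR rule with a factor of 1.5 for capping, the choice of `Fare` as the capped column, and the `HasCabin` flag as the result of applying `notnull` to `Cabin`. The helper names and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def cap_outliers_iqr(series, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed capping rule)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


def clean(df):
    out = df.copy()
    # Impute missing ages with the column median.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # Fill missing embarkation ports with the most frequent value (mode).
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode().iloc[0])
    # Assumption: Cabin is reduced to a 0/1 "has a cabin record" flag.
    out["HasCabin"] = out["Cabin"].notnull().astype(int)
    # Assumption: only Fare is capped in this sketch.
    out["Fare"] = cap_outliers_iqr(out["Fare"])
    return out


# Toy data for illustration only.
raw = pd.DataFrame({
    "Age": [22.0, np.nan, 30.0, 40.0],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "S", None, "C"],
    "Fare": [7.25, 8.05, 10.5, 512.0],
})

print(raw.isnull().sum())          # missing values per column before cleaning
cleaned = clean(raw)
print(cleaned.isnull().sum())      # Age and Embarked no longer have gaps
print(cleaned.duplicated().sum())  # number of fully duplicated rows
```

Clipping keeps every row and only pulls extreme values back to the fence, which is one common reading of "capped"; dropping the rows instead would be a different design choice.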

## Data Visualizations



* Fig. 1: Distribution of Passengers by Gender
* Fig. 2: Age Distribution Histogram
* Fig. 3: Survival Rate by Gender
* Fig. 4: Survival Rate by Class
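
The numbers behind the survival-rate figures can be computed with a group-wise mean of the 0/1 `Survived` column. The sketch below is illustrative: `survival_rate` is a hypothetical helper and the six-row frame is toy data, not the real dataset.

```python
import pandas as pd


def survival_rate(df, by):
    """Share of survivors per group: the mean of the 0/1 Survived column."""
    return df.groupby(by)["Survived"].mean()


# Toy data for illustration only.
toy = pd.DataFrame({
    "Sex": ["male", "male", "female", "female", "male", "female"],
    "Pclass": [3, 1, 1, 3, 3, 2],
    "Survived": [0, 1, 1, 1, 0, 0],
})

by_gender = survival_rate(toy, "Sex")
by_class = survival_rate(toy, "Pclass")
print(by_gender)
print(by_class)
```

Assuming `matplotlib` is installed, `by_gender.plot(kind="bar")` would turn either series into a bar chart similar in spirit to the figures listed above.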

## Statistical Analysis

Table 1: Mean, Median and Mode of 'Fare' and 'Age' Columns

| Column | Mean | Median | Mode |
| --- | --- | --- | --- |
| Fare | 32.2042 | 14.4542 | 8.05 |
| Age | 29.3616 | 28.0 | 28.0 |
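
A summary like Table 1 can be produced with the built-in `pandas` reductions. The sketch below uses a hypothetical helper, `central_tendency`, and toy values; when a column has several modes, it reports the smallest one, which is an assumption about how ties should be handled.

```python
import pandas as pd


def central_tendency(df, columns):
    """Return a table with the mean, median and (first) mode of each column."""
    rows = {}
    for col in columns:
        rows[col] = {
            "Mean": df[col].mean(),
            "Median": df[col].median(),
            # mode() returns all modes sorted ascending; take the first.
            "Mode": df[col].mode().iloc[0],
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy data for illustration only.
toy = pd.DataFrame({
    "Fare": [8.05, 8.05, 14.0, 30.0],
    "Age": [28.0, 28.0, 35.0, 21.0],
})

summary = central_tendency(toy, ["Fare", "Age"])
print(summary.round(4))
```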

Table 2: Gender-wise Survival Rate (t-test)

| Test Component | Result |
| --- | --- |
| Null hypothesis | No significant difference in survival rates between males and females |
| Significance level (α) | 0.05 |
| T-statistic | -18.672 |
| P-value | 2.28 × 10⁻⁶¹ |
| Decision | Reject the null hypothesis |
| Interpretation | There is a significant difference in survival rates between males and females on the Titanic |
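
A two-sample t-test of this kind can be sketched with `scipy.stats.ttest_ind`. The README does not say which variant was used, so the choice of Welch's unequal-variance test (`equal_var=False`) and the male-minus-female ordering are assumptions; `gender_survival_ttest` is a hypothetical helper and the data are toy values.

```python
import pandas as pd
from scipy import stats


def gender_survival_ttest(df, alpha=0.05):
    """Compare male and female survival with a two-sample t-test."""
    male = df.loc[df["Sex"] == "male", "Survived"]
    female = df.loc[df["Sex"] == "female", "Survived"]
    # Assumption: Welch's t-test, which does not require equal variances.
    result = stats.ttest_ind(male, female, equal_var=False)
    return {
        "t_statistic": float(result.statistic),
        "p_value": float(result.pvalue),
        "reject_null": bool(result.pvalue < alpha),
    }


# Toy data for illustration only: 2 of 10 males and 8 of 10 females survive.
toy = pd.DataFrame({
    "Sex": ["male"] * 10 + ["female"] * 10,
    "Survived": [1, 1] + [0] * 8 + [1] * 8 + [0, 0],
})

outcome = gender_survival_ttest(toy)
print(outcome)
```

A negative t-statistic here means the first group (males) has the lower mean survival rate; the null hypothesis is rejected when the p-value falls below α.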

## Build from Source
Instructions are provided in the `.ipynb` file.

If you have any queries, contact me: arnabnushna24@gmail.com