https://github.com/ahsankhizar5/titanic-eda-visualization

Exploratory Data Analysis and Visualization on the Titanic Dataset using Python, Pandas, Matplotlib, and Seaborn to uncover survival patterns.

data-analysis data-science data-visualization eda kaggle machine-learning matplotlib pandas python seaborn titanic-dataset


# 🚢 Titanic Dataset - EDA & Visualization

A complete **Exploratory Data Analysis (EDA)** of the Titanic dataset in Python. The project cleans the data, explores key insights, visualizes patterns, and summarizes the findings to identify the factors associated with survival.

---

## 📁 Dataset

- Source: [Kaggle - Titanic Dataset](https://www.kaggle.com/c/titanic/data)
- Filename: `TiTanic_Dataset.csv`

---

## 🔍 Objective

Perform EDA and generate visual insights to answer:
- Which passengers were most likely to survive?
- Were there patterns in class, gender, age, or fare?
- Which variables are correlated with one another?

---

## 📌 Features Explored

- Passenger Class (Pclass)
- Sex
- Age
- Fare
- Survival
- Siblings/Spouses aboard (SibSp) and Parents/Children aboard (Parch)
- Embarked

---

## 📊 Visualizations

Saved in the `Graphs/` folder:
- 📦 Bar charts for categorical data (Sex, Pclass)
- 📈 Histograms for distributions (Age, Fare)
- 🌡️ Correlation heatmap
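The three plot types above could be produced along these lines. This is a minimal sketch, not the project's actual plotting code: it uses plain Matplotlib only (the project also uses Seaborn), assumes the Kaggle column names, and the function name `save_basic_plots` and the output file names are hypothetical. The heatmap covers only numeric columns, because Pearson correlation is undefined for text columns such as `Sex`.

```python
import os

import matplotlib
matplotlib.use("Agg")  # render straight to files; no display needed
import matplotlib.pyplot as plt
import pandas as pd


def save_basic_plots(df: pd.DataFrame, out_dir: str = "Graphs") -> list:
    """Save bar charts, histograms, and a correlation heatmap; return the file paths.

    Hypothetical helper: column names assume the Kaggle Titanic schema.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []

    # Bar charts: passenger counts for each categorical column
    fig, axes = plt.subplots(1, 2, figsize=(8, 3))
    for ax, col in zip(axes, ["Sex", "Pclass"]):
        counts = df[col].value_counts().sort_index()
        ax.bar(counts.index.astype(str).tolist(), counts.to_numpy())
        ax.set_title(f"Passengers by {col}")
    fig.tight_layout()
    paths.append(os.path.join(out_dir, "bar_sex_pclass.png"))
    fig.savefig(paths[-1])
    plt.close(fig)

    # Histograms: distributions of the continuous columns (missing values skipped)
    fig, axes = plt.subplots(1, 2, figsize=(8, 3))
    for ax, col in zip(axes, ["Age", "Fare"]):
        ax.hist(df[col].dropna(), bins=20)
        ax.set_title(f"{col} distribution")
    fig.tight_layout()
    paths.append(os.path.join(out_dir, "hist_age_fare.png"))
    fig.savefig(paths[-1])
    plt.close(fig)

    # Heatmap of Pearson correlations between numeric columns only
    corr = df.select_dtypes(include="number").corr()
    fig, ax = plt.subplots(figsize=(5, 4))
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    labels = corr.columns.tolist()
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels, rotation=45, ha="right")
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    fig.colorbar(im, ax=ax)
    fig.tight_layout()
    paths.append(os.path.join(out_dir, "correlation_heatmap.png"))
    fig.savefig(paths[-1])
    plt.close(fig)

    return paths
```

For example, `save_basic_plots(clean_df)` would write three PNG files into `Graphs/`, where `clean_df` stands for the cleaned DataFrame.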

---

## 🧹 Cleaning & Processing

- Handled missing values:
  - `Age`: filled with the median
  - `Embarked`: filled with the mode (most frequent port)
  - `Cabin`: dropped (too sparse)
- Removed duplicate rows
- Detected outliers in `Fare` using the interquartile range (IQR) rule
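The cleaning steps above might look like this in pandas. It is a minimal sketch, not the code from `eda_titanic.py`: it assumes the standard Kaggle column names, the helpers `clean_titanic` and `iqr_outlier_mask` are hypothetical, and the IQR rule uses the conventional 1.5 × IQR fences (values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR count as outliers).

```python
import pandas as pd


def clean_titanic(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the listed cleaning steps to a copy of the raw data (hypothetical helper)."""
    out = df.copy()
    # Age: fill missing values with the median age
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # Embarked: fill missing values with the most frequent port (the mode)
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # Cabin: drop the whole column because most of its values are missing
    out = out.drop(columns=["Cabin"])
    # Remove exact duplicate rows and renumber the index
    return out.drop_duplicates().reset_index(drop=True)


def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


# Tiny hand-made sample (not real data); the project reads TiTanic_Dataset.csv instead
raw = pd.DataFrame({
    "Age": [22.0, None, 26.0, 35.0, 35.0],
    "Embarked": ["S", "C", None, "S", "S"],
    "Cabin": [None, "C85", None, None, None],
    "Fare": [7.25, 71.28, 7.92, 8.05, 8.05],
})
clean = clean_titanic(raw)
print(clean)
print(clean[iqr_outlier_mask(clean["Fare"])])
```

The mask only flags outliers; the list above says they were detected, not removed, so the sketch leaves them in the data.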

---

## 💡 Key Insights

- The majority of passengers were **male** and traveled in **3rd class**
- **Females had higher survival rates** than males
- **Younger passengers** made up a large share of those aboard
- Strong positive correlation between **SibSp** and **Parch** (both count family members aboard)
- **Fare** had significant outliers, flagged by the IQR rule

Full findings are documented in [`TiTanic_EDA_Summery.docx`](./TiTanic_EDA_Summery.docx).
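The survival-rate and correlation insights can be checked with a few pandas calls. This is a minimal sketch, assuming the cleaned data uses the Kaggle column names (`Survived` coded 0/1, `Sex`, `Pclass`, `SibSp`, `Parch`); the six sample rows are invented for illustration, so their numbers say nothing about the real passengers.

```python
import pandas as pd

# Invented sample rows (not real passenger data), using Kaggle column names
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Pclass": [3, 1, 3, 1, 2, 3],
    "SibSp": [1, 1, 0, 0, 1, 0],
    "Parch": [0, 1, 0, 0, 2, 0],
})

# Survival rate = mean of the 0/1 Survived column within each group
rate_by_sex = df.groupby("Sex")["Survived"].mean()
rate_by_class = df.groupby("Pclass")["Survived"].mean()

# Survival rate broken down by class (rows) and sex (columns)
rate_by_class_sex = df.pivot_table(
    values="Survived", index="Pclass", columns="Sex", aggfunc="mean"
)

# Pearson correlation between the two family-count columns
family_corr = df["SibSp"].corr(df["Parch"])

print(rate_by_sex, rate_by_class, rate_by_class_sex, sep="\n\n")
print(f"corr(SibSp, Parch) = {family_corr:.2f}")
```

Because `Survived` is coded 0/1, a group mean is directly the fraction of that group who survived, so no separate counting step is needed.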

---

## ▶️ Run It Yourself

```bash
python eda_titanic.py
```

---

## ⚙️ Tools Used

- **Python**
- **Pandas**
- **Matplotlib** & **Seaborn**
- **NumPy**

---

## 👨‍💻 Author

**Ahsan Khizar**
[GitHub](https://github.com/ahsankhizar5) | [LinkedIn](https://linkedin.com/in/ahsankhizar5)

---

> 💡 *"Models may predict prices, but code quality predicts trust."* - Ahsan Khizar