https://github.com/nkamilla/titanic-eda

# Titanic-EDA
Exploratory Data Analysis of the Titanic dataset using Python (Pandas, NumPy, Matplotlib). Includes data cleaning, visualizations, correlations, and key business insights.

Exploratory Data Analysis (EDA) of the classic Titanic dataset. The goal is to clean the data, explore patterns, and present clear visualizations and business-style insights.

## Setup

```bash
# (Recommended) create a virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate

# install dependencies
pip install -r requirements.txt
```

## Data

Place the dataset at `data/titanic.csv`. You can download a compatible version from Kaggle ("Titanic - Machine Learning from Disaster") and save `train.csv` as `data/titanic.csv`. Kaggle's `test.csv` has no `Survived` column, so a combined train/test file will have missing survival labels for the test rows.

Required columns (common in public versions): `Survived`, `Pclass`, `Sex`, `Age`, `SibSp`, `Parch`, `Fare`, `Embarked`.

> Tip: If your file is named `train.csv`, update the notebook path or simply rename it to `titanic.csv`.
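To catch a misplaced or incomplete file early, the loading step could validate the path and the required columns before any analysis runs. This is a minimal sketch, not the notebook's actual code; the `load_titanic` helper and the `REQUIRED_COLUMNS` constant are hypothetical names introduced here for illustration.

```python
from pathlib import Path

import pandas as pd

# Columns this README lists as required.
REQUIRED_COLUMNS = ["Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]


def load_titanic(path="data/titanic.csv"):
    """Read the CSV and fail early if the file or a required column is missing.

    Hypothetical helper for illustration; the notebook may load the data differently.
    """
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"Expected the dataset at {path}")
    df = pd.read_csv(path)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    return df
```

Calling `load_titanic()` with no arguments reads `data/titanic.csv`; pass another path if you kept the name `train.csv`.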

## Analysis

Open the notebook and run all cells:

```bash
jupyter notebook notebooks/01_titanic_eda.ipynb
```

What you'll find inside:
- Data loading & basic sanity checks
- Cleaning: missing values, data types, simple outlier handling
- 6-8 visualizations (class distribution, survival by sex/class/embarked, age/fare distributions, correlations)
- Compact takeaways and "What I found" section
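
As an illustration of the first item, a basic sanity check might summarize data types and missing values per column and count duplicate rows. This is a sketch under assumptions: `sanity_report` is a hypothetical helper, and the rows below are made-up toy data, not values from the real dataset.

```python
import pandas as pd


def sanity_report(df):
    """Return one row per column: dtype, missing count, and missing share in percent."""
    report = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
    })
    # Columns with the most missing values come first.
    return report.sort_values("n_missing", ascending=False)


# Toy rows for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, None, 26.0, None],
    "Embarked": ["S", "C", None, "S"],
})

print(sanity_report(toy))
print("duplicate rows:", toy.duplicated().sum())
```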

## Results

Exported figures (PNG) are saved to the `results/` folder automatically when you run the notebook. You can embed the best 2-3 charts in your README for recruiters.
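
One way the export step could work is a small helper that creates `results/` on demand and writes each figure with `savefig`. This is a sketch, not the notebook's actual code: the `save_bar_chart` function, the figure size, and the DPI value are assumptions chosen for illustration.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

RESULTS_DIR = Path("results")


def save_bar_chart(series, title, filename, results_dir=RESULTS_DIR):
    """Draw a bar chart from a Series, write it as a PNG, and return the file path."""
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)  # create results/ if it does not exist yet
    fig, ax = plt.subplots(figsize=(6, 4))
    series.plot(kind="bar", ax=ax)
    ax.set_title(title)
    ax.set_ylabel(series.name or "")
    fig.tight_layout()
    out_path = results_dir / filename
    fig.savefig(out_path, dpi=150)
    plt.close(fig)  # release the figure so repeated exports do not accumulate in memory
    return out_path
```

For example, `save_bar_chart(df["Pclass"].value_counts().sort_index(), "Passengers by class", "class_distribution.png")` would write `results/class_distribution.png`, assuming `df` holds the loaded dataset.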

## Next Steps

- Add a simple baseline model (e.g., logistic regression) in a new notebook `02_baseline_model.ipynb`
- Compare metrics and add feature importance
- Create a short executive summary in the README (top 3-5 insights)

## Business Insights

Survival differed markedly by sex and class: female passengers and those in higher classes had higher survival rates.
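
A comparison like this can be computed with a group-by on the 0/1 `Survived` column, since the mean of that column is the survival rate of each group. The sketch below uses made-up toy rows purely to show the mechanics; the actual rates come from running the notebook on `data/titanic.csv`.

```python
import pandas as pd

# Toy rows for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "Survived": [1, 1, 0, 0, 1, 0, 0, 1],
    "Sex": ["female", "female", "female", "male", "male", "male", "male", "female"],
    "Pclass": [1, 2, 3, 1, 1, 3, 3, 3],
})

# The mean of a 0/1 column is the share of survivors in each group.
by_sex = toy.groupby("Sex")["Survived"].mean()

# Sex x class table of survival rates; empty combinations show up as NaN.
by_sex_class = toy.pivot_table(index="Sex", columns="Pclass", values="Survived", aggfunc="mean")

# Group sizes, so rates based on very few passengers are not over-read.
counts = toy.groupby(["Sex", "Pclass"]).size()

print(by_sex)
print(by_sex_class)
print(counts)
```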

Age and fare distributions are skewed; for modeling they may require transformation or winsorization.
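
Two common options for a right-skewed column such as `Fare` are a log transform and winsorization (clipping extreme values to chosen quantiles). The sketch below shows both; the `winsorize` helper, the 5%/95% cut-offs, and the toy values are assumptions for illustration rather than the notebook's settings.

```python
import numpy as np
import pandas as pd


def winsorize(series, lower=0.01, upper=0.99):
    """Clip values to the given lower/upper quantiles (the cut-offs are a modeling choice)."""
    lo, hi = series.quantile([lower, upper])
    return series.clip(lower=lo, upper=hi)


# Toy values for illustration only; one large value mimics a long right tail.
fare = pd.Series([7.25, 7.9, 8.05, 13.0, 26.0, 30.0, 71.3, 512.3], name="Fare")

fare_log = np.log1p(fare)                # log(1 + x) stays defined for zero fares
fare_wins = winsorize(fare, 0.05, 0.95)  # pull both tails in to the 5th/95th percentiles

print("skew raw:", fare.skew())
print("skew log1p:", fare_log.skew())
print("max raw vs winsorized:", fare.max(), fare_wins.max())
```

The log transform changes the scale of every value, while winsorization only alters the tails, so which one fits depends on the model you plan to use.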

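Missing values (e.g., `Age`, occasionally `Embarked`) need imputation, for example the median for `Age` and the most frequent value for `Embarked`; dropping incomplete rows instead would shrink the sample and could bias the results.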
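
A simple imputation along these lines might fill `Age` with the median and `Embarked` with the most frequent value. This is a minimal sketch: `impute_simple` and the `Age_was_missing` flag column are hypothetical names, the rows are toy data, and the code assumes each column has at least one non-missing value.

```python
import pandas as pd


def impute_simple(df):
    """Fill Age with its median and Embarked with its most frequent value.

    Works on a copy so the raw data stays untouched.
    """
    out = df.copy()
    out["Age_was_missing"] = out["Age"].isna()  # keep a flag so imputed rows stay identifiable
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode().iloc[0])
    return out


# Toy rows for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "Age": [22.0, None, 38.0, 26.0, None],
    "Embarked": ["S", "C", None, "S", "Q"],
})

clean = impute_simple(toy)
print(clean)
```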

Passenger class (Pclass) partly reflects socio-economic status and is a strong predictor of survival.

For decision-makers: segmentation by gender, class, and embarkation port clearly highlights groups at higher risk.