https://github.com/nkamilla/titanic-eda

Exploratory Data Analysis of the Titanic dataset using Python (Pandas, NumPy, Matplotlib). Includes data cleaning, visualizations, correlations, and key business insights.

Host: GitHub
URL: https://github.com/nkamilla/titanic-eda
Owner: nkamilla
License: mit
Created: 2025-09-11T10:47:38.000Z (about 1 month ago)
Default Branch: main
Last Pushed: 2025-09-11T11:13:58.000Z (about 1 month ago)
Last Synced: 2025-09-11T13:32:26.886Z (about 1 month ago)
Topics: data-analysis, eda, jupyter-notebook, matplotlib, numpy, pandas, python, titanic-dataset
Language: Jupyter Notebook
Homepage:
Size: 50.8 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Titanic EDA

Exploratory Data Analysis (EDA) of the classic Titanic dataset. The goal is to clean the data, explore patterns, and present clear visualizations and business-style insights.

## Setup

```bash
# (Recommended) create a virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# install dependencies
pip install -r requirements.txt
```

## Data

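Place the dataset at `data/titanic.csv`. You can download a compatible version from Kaggle ("Titanic - Machine Learning from Disaster") and save either `train.csv` or a combined train/test file as `data/titanic.csv`. Note that Kaggle's `test.csv` has no `Survived` column, so rows taken from it will have missing survival labels.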

Required columns (common in public versions): `Survived`, `Pclass`, `Sex`, `Age`, `SibSp`, `Parch`, `Fare`, `Embarked`.
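
Before running the notebook, it can help to confirm that your file actually contains these columns. The following is a minimal sketch, not code from the notebook: the helper name `missing_required_columns` is hypothetical, and a tiny in-memory CSV with made-up values stands in for `data/titanic.csv`.

```python
import io

import pandas as pd

# Column names taken from the list above.
REQUIRED_COLUMNS = [
    "Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked",
]


def missing_required_columns(df: pd.DataFrame) -> list[str]:
    """Return the required columns that are absent from df (hypothetical helper)."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]


# Toy CSV with made-up values; in practice use pd.read_csv("data/titanic.csv").
toy_csv = io.StringIO(
    "Survived,Pclass,Sex,Age,SibSp,Parch,Fare\n"
    "1,1,female,30,0,0,80.0\n"
    "0,3,male,,1,0,7.5\n"
)
df = pd.read_csv(toy_csv)

missing = missing_required_columns(df)
print(missing)  # ['Embarked']
```

An empty list means the file has everything the analysis expects.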

> Tip: If your file is named `train.csv`, update the notebook path or simply rename it to `titanic.csv`.

## Analysis

Open the notebook and run all cells:

```bash
jupyter notebook notebooks/01_titanic_eda.ipynb
```

What you'll find inside:

- Data loading & basic sanity checks

- Cleaning: missing values, data types, and simple outlier handling

- 6-8 visualizations (class distribution; survival by sex, class, and embarkation port; age and fare distributions; correlations)

- Compact takeaways and a "What I found" section
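
As a rough illustration of the sanity-check and correlation steps listed above, here is a short sketch. It is not the notebook's actual code, and the rows are made-up toy values rather than real passengers.

```python
import numpy as np
import pandas as pd

# Toy rows with made-up values, standing in for the real dataset.
df = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 0],
    "Pclass": [1, 3, 2, 3, 1, 3],
    "Sex": ["female", "male", "female", "male", "female", "male"],
    "Age": [29.0, np.nan, 35.0, 22.0, np.nan, 40.0],
    "Fare": [80.0, 7.5, 26.0, 8.0, 120.0, 7.9],
})

# Sanity checks: shape, dtypes, and missing values per column.
print(df.shape)
print(df.dtypes)
missing_counts = df.isna().sum()
print(missing_counts[missing_counts > 0])

# Pairwise Pearson correlations, restricted to numeric columns.
corr = df.select_dtypes(include="number").corr()
print(corr["Survived"].sort_values())
```

Restricting to numeric columns avoids errors from text columns such as `Sex`; encode those first if you want them in the correlation matrix.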

## Results

Exported figures (PNG) will be saved to the `results/` folder automatically when you run the notebook. You can attach the best 2-3 charts in your README for recruiters.
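
For reference, exporting a chart to `results/` could look like the sketch below. The file name `survival_by_sex.png` and the toy data are assumptions for illustration; the notebook may use different names and plots.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

RESULTS_DIR = Path("results")
RESULTS_DIR.mkdir(parents=True, exist_ok=True)  # create the folder if needed

# Toy rows with made-up values, standing in for the real dataset.
df = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male"],
    "Survived": [1, 1, 0, 1, 0],
})
rates = df.groupby("Sex")["Survived"].mean()

fig, ax = plt.subplots(figsize=(5, 4))
ax.bar(rates.index, rates.values)
ax.set_xlabel("Sex")
ax.set_ylabel("Survival rate")
ax.set_title("Survival rate by sex (toy data)")
fig.tight_layout()

# Hypothetical output file name.
out_path = RESULTS_DIR / "survival_by_sex.png"
fig.savefig(out_path, dpi=150)
plt.close(fig)  # free the figure once it is written to disk
```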

## Next Steps

- Add a simple baseline model (e.g., logistic regression) in a new notebook `02_baseline_model.ipynb`

- Compare metrics and add feature importance

- Create a short executive summary in the README (top 3-5 insights)

## Business Insights

Survival differed markedly by gender and class: female passengers and those in higher classes had higher survival rates.

Age and fare distributions are skewed; for modeling they may require transformation or winsorization.
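
A minimal sketch of both options, using made-up fare values. The 5th/95th percentile cutoffs are an assumption chosen for illustration, not a setting taken from the notebook.

```python
import numpy as np
import pandas as pd

# Toy, right-skewed fares (made-up values).
fare = pd.Series([7.25, 7.9, 8.05, 13.0, 26.0, 31.0, 71.3, 512.3], name="Fare")

# Option 1: log transform. log1p also handles zero fares, since log1p(0) == 0.
fare_log = np.log1p(fare)

# Option 2: winsorization, i.e. clipping values to chosen percentiles.
# The 5%/95% cutoffs here are an illustrative assumption.
lower, upper = fare.quantile([0.05, 0.95])
fare_wins = fare.clip(lower=lower, upper=upper)

# Compare skewness before and after each treatment.
print(fare.skew(), fare_log.skew(), fare_wins.skew())
```

The log transform changes the scale of every value, while winsorization only alters the tails, so the choice depends on whether the model or chart needs the original units.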

Missing values (e.g., `Age`, occasionally `Embarked`) need explicit handling, such as simple imputation, because silently dropping those rows can bias the results.
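
One common form of simple imputation is to fill `Age` with the median and `Embarked` with the most frequent value. The sketch below assumes that approach, which the README does not specify, and uses made-up rows. The `Age_missing` flag column is a hypothetical addition.

```python
import numpy as np
import pandas as pd

# Toy rows with made-up values and deliberate gaps.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 54.0, np.nan],
    "Embarked": ["S", "C", None, "S", "Q"],
})

# Keep a flag so later analysis can tell imputed ages from observed ones.
df["Age_missing"] = df["Age"].isna()

# Numeric column: fill with the median, which is robust to skew.
age_median = df["Age"].median()
df["Age"] = df["Age"].fillna(age_median)

# Categorical column: fill with the most frequent value.
embarked_mode = df["Embarked"].mode().iloc[0]
df["Embarked"] = df["Embarked"].fillna(embarked_mode)

print(df)
```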

Passenger class (Pclass) partly reflects socio-economic status and is a strong predictor of survival.

For decision-makers: segmentation by gender, class, and embarkation port clearly highlights groups at higher risk.
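
A segmentation table of this kind can be produced with a single `groupby`. The sketch below uses made-up rows and covers only sex and class; the output column names `survival_rate` and `passengers` are illustrative choices.

```python
import pandas as pd

# Toy rows with made-up values, standing in for the real dataset.
df = pd.DataFrame({
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1],
    "Sex": ["female", "female", "female", "male", "male", "male", "male", "female"],
    "Pclass": [1, 2, 3, 1, 2, 3, 3, 3],
})

# Survival rate and group size per (Sex, Pclass) segment, lowest rate first.
segments = (
    df.groupby(["Sex", "Pclass"])["Survived"]
    .agg(survival_rate="mean", passengers="size")
    .reset_index()
    .sort_values("survival_rate")
)
print(segments)
```

Reporting the group size next to each rate helps readers judge how much weight a small segment deserves.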
