https://github.com/nkamilla/titanic-eda
Exploratory Data Analysis of the Titanic dataset using Python (Pandas, NumPy, Matplotlib). Includes data cleaning, visualizations, correlations, and key business insights.
data-analysis eda jupyter-notebook matplotlib numpy pandas python titanic-dataset
- Host: GitHub
- URL: https://github.com/nkamilla/titanic-eda
- Owner: nkamilla
- License: mit
- Created: 2025-09-11T10:47:38.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2025-09-11T11:13:58.000Z (about 1 month ago)
- Last Synced: 2025-09-11T13:32:26.886Z (about 1 month ago)
- Topics: data-analysis, eda, jupyter-notebook, matplotlib, numpy, pandas, python, titanic-dataset
- Language: Jupyter Notebook
- Homepage:
- Size: 50.8 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Titanic-EDA
Exploratory Data Analysis of the Titanic dataset using Python (Pandas, NumPy, Matplotlib). Includes data cleaning, visualizations, correlations, and key business insights.
# Titanic EDA

Exploratory Data Analysis (EDA) of the classic Titanic dataset. The goal is to clean the data, explore patterns, and present clear visualizations and business-style insights.
## Setup
```bash
# (Recommended) create a virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# install dependencies
pip install -r requirements.txt
```

## Data
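Place the dataset at `data/titanic.csv`. You can download a compatible version from Kaggle ("Titanic - Machine Learning from Disaster") and save either `train.csv` or a combined train/test file as `data/titanic.csv`. Note that Kaggle's `test.csv` does not include the `Survived` column, so rows taken from it carry no survival label.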
Required columns (common in public versions): `Survived`, `Pclass`, `Sex`, `Age`, `SibSp`, `Parch`, `Fare`, `Embarked`.
> Tip: If your file is named `train.csv`, update the notebook path or simply rename it to `titanic.csv`.
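
A minimal sketch of the loading step with an early column check, assuming the file sits at `data/titanic.csv`. The helper name `load_titanic` and the constant `REQUIRED_COLUMNS` are hypothetical and not taken from the notebook, which may load the data differently.

```python
from pathlib import Path

import pandas as pd

# Columns this README lists as required.
REQUIRED_COLUMNS = [
    "Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked",
]


def load_titanic(path="data/titanic.csv"):
    """Load the CSV and fail early if a required column is missing.

    Hypothetical helper for illustration only.
    """
    df = pd.read_csv(Path(path))
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"{path} is missing required columns: {missing}")
    return df
```

Calling `load_titanic()` with no arguments reads the default path; pass `"data/train.csv"` instead if you kept Kaggle's original file name.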
## Analysis
Open the notebook and run all cells:
```bash
jupyter notebook notebooks/01_titanic_eda.ipynb
```

What you'll find inside:
- Data loading & basic sanity checks
- Cleaning: missing values, types, outliers (simple)
- 6-8 visualizations (class distribution, survival by sex/class/embarked, age/fare distributions, correlations)
- Compact takeaways and "What I found" section

## Results
Exported figures (PNG) will be saved to the `results/` folder automatically when you run the notebook. You can attach the best 2-3 charts in your README for recruiters.
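
As a rough illustration of how a chart might end up in `results/`, the sketch below plots the survival rate per passenger class and writes it as a PNG. The function name `save_survival_by_class` and the file name `survival_by_class.png` are assumptions made for this example; the notebook's actual export code may differ.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd


def save_survival_by_class(df, out_dir="results"):
    """Plot mean survival per passenger class and save it as a PNG.

    Hypothetical helper; returns the path of the written file.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create results/ if needed

    # The mean of a 0/1 column is the share of survivors in each class.
    rates = df.groupby("Pclass")["Survived"].mean()

    fig, ax = plt.subplots(figsize=(6, 4))
    rates.plot(kind="bar", ax=ax)
    ax.set_xlabel("Passenger class")
    ax.set_ylabel("Survival rate")
    ax.set_title("Survival rate by class")
    fig.tight_layout()

    target = out_path / "survival_by_class.png"
    fig.savefig(target, dpi=150)
    plt.close(fig)  # release the figure so repeated runs do not pile up
    return target
```

The same pattern (build the figure, call `fig.savefig`, then close it) should carry over to the other charts.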
## Next Steps
- Add a simple baseline model (e.g., logistic regression) in a new notebook `02_baseline_model.ipynb`
- Compare metrics and add feature importance
- Create a short executive summary in the README (top 3-5 insights)
## Business Insights

- Survival differed significantly by gender and class: female passengers and those in higher classes had higher survival rates.
- Age and fare distributions are skewed; for modeling they may require transformation or winsorization.
- Missing values (e.g., Age, occasionally Embarked) require simple imputation to avoid bias.
- Passenger class (Pclass) partly reflects socio-economic status and is a strong predictor of survival.
- For decision-makers: segmentation by gender, class, and embarkation port clearly highlights groups at higher risk.
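
The insights above can be checked with a few lines of pandas. The following is a sketch under stated assumptions, not the notebook's own code: the helper names are hypothetical, median/mode imputation is one simple choice among several, and the 1st/99th percentile limits for winsorization are an arbitrary default.

```python
import pandas as pd


def survival_rates(df, by):
    """Survival rate per group: the mean of the 0/1 `Survived` column."""
    return df.groupby(by)["Survived"].mean()


def impute_simple(df):
    """Fill missing Age with the median and missing Embarked with the mode."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode().iloc[0])
    return out


def winsorize(series, lower=0.01, upper=0.99):
    """Clip values to the given quantiles to limit the pull of extreme values."""
    low, high = series.quantile([lower, upper])
    return series.clip(lower=low, upper=high)
```

For example, `survival_rates(df, ["Sex", "Pclass"])` gives the segment-level view described in the last bullet, and `winsorize(df["Fare"])` caps the long right tail of fares. A log transform such as `numpy.log1p` applied to `Fare` is a common alternative to clipping.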