My code for the Titanic - Machine Learning from Disaster competition on Kaggle
- Host: GitHub
- URL: https://github.com/johannaschmidle/titanicsurvival
- Owner: johannaschmidle
- Created: 2024-08-08T19:52:15.000Z
- Default Branch: main
- Last Pushed: 2024-08-13T15:31:29.000Z
- Last Synced: 2025-01-11T01:59:37.588Z
- Language: Jupyter Notebook
- Size: 756 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Titanic Survival Predictor
This repository contains my code for my [competition submission](https://www.kaggle.com/code/johannaschmidle7/titanic-survival-predictor) to the [Titanic - Machine Learning from Disaster competition](https://www.kaggle.com/competitions/titanic/overview) on Kaggle.
## Motivation
**Goal:** build a machine learning model to predict whether a passenger survived the sinking of the Titanic.

For each instance in the test set, you must predict a 0 or 1 value for the target variable (`Survived`).

## Metric
Submissions are evaluated on **accuracy**: the percentage of passengers whose survival you predict correctly.
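As a quick illustration of the metric (not from the original notebook; `y_true` and `y_pred` are made-up placeholder arrays), accuracy can be computed with scikit-learn:

```python
from sklearn.metrics import accuracy_score

# Placeholder labels for illustration only -- not competition data.
y_true = [0, 1, 1, 0, 1]  # actual survival outcomes
y_pred = [0, 1, 0, 0, 1]  # model predictions
print(accuracy_score(y_true, y_pred))  # 4 of 5 correct -> 0.8
```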
## Steps
1. **EDA**
2. **Feature Engineering** (a sketch follows this list)
   - **Family Size:** Larger families might have different survival rates than solo travelers.
   - **Person's Title:** (e.g., Ms, Mr) Titles can provide insight into age, gender, and social status, all of which might affect survival chances.
   - **Cabin Deck:** The deck could correlate with proximity to lifeboats and thus with survival rates.
   - **Cabin Assigned:** Passengers without an assigned cabin might have different survival probabilities than those with recorded cabin details.
   - **Age Group:** Different age groups might have had different survival probabilities.
   - **Fare Price Groups:** Binning fares into groups can capture non-linear relationships between fare and survival.
   - **Name Length:** Especially in the early 1900s, a longer name could signal social importance, which may have affected survival chances.
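A minimal sketch of how these features could be derived with pandas, assuming the standard Kaggle Titanic columns (`Name`, `SibSp`, `Parch`, `Cabin`, `Age`, `Fare`); the bin edges and group labels are illustrative choices, not necessarily the ones used in the notebook:

```python
import pandas as pd

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Family size: siblings/spouses + parents/children + the passenger themselves
    df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
    # Title: the token between the comma and the period in "Last, Title. First"
    df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False)
    # Cabin deck: first letter of the cabin code (NaN stays NaN)
    df["CabinDeck"] = df["Cabin"].str[0]
    # Whether any cabin was recorded at all
    df["CabinAssigned"] = df["Cabin"].notna().astype(int)
    # Age groups and fare-price quartiles to capture non-linear effects
    df["AgeGroup"] = pd.cut(df["Age"], bins=[0, 12, 18, 35, 60, 100],
                            labels=["child", "teen", "young", "middle", "senior"])
    df["FareGroup"] = pd.qcut(df["Fare"], q=4,
                              labels=["low", "mid", "high", "top"])
    # Name length as a rough proxy for social prominence
    df["NameLength"] = df["Name"].str.len()
    return df
```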
3. **Preprocessing** (see the sketch after this sub-list)
   1. Dealing With Nulls
   2. Split the Data
   3. Create Pipelines + Transform Columns
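A sketch of these three steps, building on the feature-engineering function above; the `train_test_split` call, the imputation strategies, and the exact column lists are assumptions. Note that each model's Correct + Incorrect counts below sum to 179, consistent with roughly a 20% validation split of the 891-row training set:

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

train = engineer_features(pd.read_csv("train.csv"))
X, y = train.drop(columns=["Survived"]), train["Survived"]

# 2. Split the data: an 80/20 split of 891 rows leaves 179 validation samples.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 1. + 3. Nulls are imputed inside per-type pipelines, and a column
# transformer applies each pipeline to its matching columns.
numeric_pipe = Pipeline([("impute", SimpleImputer(strategy="median"))])
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = make_column_transformer(
    (numeric_pipe, ["Age", "Fare", "FamilySize", "NameLength"]),
    (categorical_pipe, ["Sex", "Embarked", "Title", "CabinDeck",
                        "AgeGroup", "FareGroup"]),
)
```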
4. **Visualize and Understand Data** (a sample figure sketch follows)
   - Histogram
   - KDE
   - Pie Chart
   - Heatmap
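One way to lay out those four plot types with seaborn and matplotlib; the specific columns plotted are illustrative guesses, not taken from the notebook:

```python
import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(2, 2, figsize=(12, 8))
# Histogram: age distribution
sns.histplot(train["Age"].dropna(), bins=30, ax=axes[0, 0])
# KDE: fare density, split by survival
sns.kdeplot(data=train, x="Fare", hue="Survived", ax=axes[0, 1])
# Pie chart: overall survival share
train["Survived"].value_counts().plot.pie(autopct="%1.1f%%", ax=axes[1, 0])
# Heatmap: correlations between numeric columns
sns.heatmap(train.select_dtypes("number").corr(), annot=True, fmt=".2f",
            ax=axes[1, 1])
plt.tight_layout()
plt.show()
```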
5. **Define Models** (sketched below)
   - I created five models:
     - Model 1: Random Forest Classifier
     - Model 2: Logistic Regression
     - Model 3: K-Nearest Neighbours
     - Model 4: XGBoost
     - Model 5: AdaBoost (Adaptive Boosting)
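A sketch of defining the five models and tuning each one with `GridSearchCV` on top of the preprocessing pipeline from the earlier sketch; the hyperparameter grids here are illustrative placeholders, not the notebook's actual search spaces:

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

# Candidate models paired with illustrative hyperparameter grids.
models = {
    "random_forest": (RandomForestClassifier(random_state=42),
                      {"model__n_estimators": [100, 300]}),
    "logistic_regression": (LogisticRegression(max_iter=1000),
                            {"model__C": [0.1, 1, 10]}),
    "knn": (KNeighborsClassifier(), {"model__n_neighbors": [3, 5, 7, 9]}),
    "xgboost": (XGBClassifier(eval_metric="logloss"),
                {"model__max_depth": [3, 5]}),
    "adaboost": (AdaBoostClassifier(random_state=42),
                 {"model__n_estimators": [50, 100]}),
}

for name, (estimator, grid) in models.items():
    pipe = Pipeline([("prep", preprocessor), ("model", estimator)])
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    print(f"{name}: best CV accuracy = {search.best_score_:.3f}")
```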
6. **Create Competition Submission**
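The submission file Kaggle expects has exactly two columns, `PassengerId` and `Survived`. A minimal sketch, with `best_model` standing in for whichever fitted pipeline scored best:

```python
import pandas as pd

test = engineer_features(pd.read_csv("test.csv"))

# Refit the best pipeline on all training data, then predict on the test set.
best_model.fit(X, y)
submission = pd.DataFrame({
    "PassengerId": test["PassengerId"],
    "Survived": best_model.predict(test),
})
submission.to_csv("submission.csv", index=False)
```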
## Result of Model Evaluations
### Model 1: Random Forest Classifier
- Best Score: 0.834
- Correct: 138
- Incorrect: 41
### Model 2: Logistic Regression
- Best Score: 0.795
- Correct: 141
- Incorrect: 38
### Model 3: K-Nearest Neighbours
- Best Score: 0.829
- Correct: 136
- Incorrect: 43
### Model 4: XGBoost
- Best Score: 0.803
- Correct: 137
- Incorrect: 42
### Model 5: AdaBoost
- Best Score: 0.819
- Correct: 137
- Incorrect: 42

## Competition Scores (Best to Worst)
- Model 1: Random Forest Classifier - **0.78229**
- Model 3: K-Nearest Neighbours - **0.77990**
- Model 5: AdaBoost - **0.77751**
- Model 2: Logistic Regression - **0.76794**
- Model 4: XGBoost - **0.76555**
## Data
The dataset used in this project is publicly available on Kaggle: [https://www.kaggle.com/competitions/titanic/data](https://www.kaggle.com/competitions/titanic/data)

## Technologies
Python
- pandas, numpy, matplotlib, seaborn
- sklearn (OrdinalEncoder, OneHotEncoder, SimpleImputer, make_column_transformer, ColumnTransformer, Pipeline, LogisticRegression, DecisionTreeClassifier, KNeighborsClassifier, RandomForestClassifier, AdaBoostClassifier, cross_val_score, GridSearchCV, ConfusionMatrixDisplay)
- XGBoost
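Put together, the stack above corresponds to an import block along these lines (the module paths are the standard scikit-learn locations; this is a reconstruction, not copied from the notebook):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.metrics import ConfusionMatrixDisplay
from xgboost import XGBClassifier
```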