https://github.com/busradeveci/titanic-randomforest-v1

# Titanic - Random Forest (v1)

This repository contains my solution to the classic Kaggle competition: **Titanic - Machine Learning from Disaster**. The goal is to predict which passengers survived the Titanic shipwreck using a classification model.

---

## 📊 Overview

- **Competition**: [Titanic - Machine Learning from Disaster](https://www.kaggle.com/competitions/titanic)
- **Model**: Random Forest Classifier
- **Public Score**: `0.76076`
- **Best Score**: `0.76076` (Version 1)

---

## ๐Ÿ“ Dataset

The dataset includes passenger details such as age, gender, ticket class, number of siblings/spouses aboard, and fare. These features were used to build the model.

---

## 🧹 Data Preprocessing

The following preprocessing steps were applied:

- Dropped unnecessary columns: `PassengerId`, `Name`, `Ticket`, `Cabin`
- Filled missing values:
  - `Age`: filled with the median
  - `Embarked`: filled with the mode (`'S'`)
  - `Fare`: filled with the median (test set only)
- Converted categorical variables:
  - `Sex`: binary mapping
  - `Embarked`: one-hot encoding
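A minimal sketch of these preprocessing steps with pandas. The tiny inline DataFrame stands in for Kaggle's `train.csv`; the exact code in the notebook may differ.

```python
import pandas as pd

# Stand-in for Kaggle's train.csv (same column names, a few rows).
train = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Name": ["A", "B", "C"],
    "Ticket": ["t1", "t2", "t3"],
    "Cabin": [None, "C85", None],
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 38.0, None],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Fare": [7.25, 71.28, 7.92],
    "Embarked": ["S", "C", None],
})

# Drop columns that do not feed the model.
train = train.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"])

# Fill missing values: median for Age, mode ('S') for Embarked.
train["Age"] = train["Age"].fillna(train["Age"].median())
train["Embarked"] = train["Embarked"].fillna("S")

# Encode categoricals: binary map for Sex, one-hot for Embarked.
# (The 0/1 direction here is an assumption; any consistent mapping works.)
train["Sex"] = train["Sex"].map({"male": 0, "female": 1})
train = pd.get_dummies(train, columns=["Embarked"], prefix="Embarked")
```

The same `fillna` and encoding steps are applied to the test set, using the training-set statistics where possible to avoid leakage.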

---

## 🤖 Model

- **Algorithm**: `RandomForestClassifier` from `sklearn.ensemble`
- **Training-Validation Split**: 80% training / 20% validation
- **Selected Features**:
  - `Pclass`
  - `Sex`
  - `Age`
  - `SibSp`
  - `Parch`
  - `Fare`
  - One-hot encoded `Embarked`

The model was trained and evaluated using basic performance metrics.
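The training setup can be sketched as follows. Random stand-in data replaces the real features here (the notebook loads the Kaggle CSVs), and the hyperparameters shown are scikit-learn defaults, not necessarily those used in the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Stand-in matrix with seven feature columns, mirroring
# Pclass, Sex, Age, SibSp, Parch, Fare, and one-hot Embarked.
X = rng.random((200, 7))
y = (X[:, 1] + 0.3 * rng.random(200) > 0.6).astype(int)

# 80% training / 20% validation split, as in the README.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Basic performance metric: accuracy on the held-out 20%.
val_acc = accuracy_score(y_val, model.predict(X_val))
```

Fixing `random_state` makes both the split and the forest reproducible between runs.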

---

## 📈 Results

- Achieved a public Kaggle score of **0.76076**
- This was the first version of the model and performed well on the leaderboard.

---

## 🚀 Next Steps

Planned improvements and experiments:

- Try other models (e.g., Logistic Regression, XGBoost)
- Perform hyperparameter tuning using GridSearchCV
- Use feature importance to select or engineer better features
- Consider using cross-validation for more reliable evaluation
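The tuning and cross-validation ideas above could look roughly like this; the parameter grid and synthetic data are illustrative, not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic 7-feature data standing in for the Titanic features.
X, y = make_classification(n_samples=200, n_features=7, random_state=0)

# An example grid; real tuning would explore wider ranges.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [3, 5, None],
}

# GridSearchCV combines hyperparameter search with 5-fold
# cross-validation, giving a more reliable estimate than a
# single 80/20 split.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_
# Feature importances from the refit best model can guide
# feature selection or engineering.
importances = search.best_estimator_.feature_importances_
```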

---

## 🔗 Resources

- 📓 Kaggle Notebook: [Titanic - Random Forest v1](https://www.kaggle.com/code/busradeveci/titanic-randomforest-v1)
- 🏆 Competition Page: [Kaggle Titanic](https://www.kaggle.com/competitions/titanic)

---

## ๐Ÿง‘โ€๐Ÿ’ป Author
Kaggle: [kaggle.com/busradeveci](https://www.kaggle.com/busradeveci)