https://github.com/busradeveci/titanic-randomforest-v1

# Titanic - Random Forest (v1)

This repository contains my solution to the classic Kaggle competition: **Titanic - Machine Learning from Disaster**. The goal is to predict which passengers survived the Titanic shipwreck using a classification model.

---

## 📊 Overview

- **Competition**: [Titanic - Machine Learning from Disaster](https://www.kaggle.com/competitions/titanic)
- **Model**: Random Forest Classifier
- **Public Score**: `0.76076`
- **Best Score**: `0.76076` (Version 1)

---

## ๐Ÿ“ Dataset

The dataset includes passenger details such as age, gender, ticket class, number of siblings/spouses aboard, and fare. These features were used to build the model.

---

## 🧹 Data Preprocessing

The following preprocessing steps were applied:

- Dropped unnecessary columns: `PassengerId`, `Name`, `Ticket`, `Cabin`
- Filled missing values:
  - `Age`: filled with the median
  - `Embarked`: filled with the mode (`'S'`)
  - `Fare`: filled with the median (test set only)
- Converted categorical variables:
  - `Sex`: binary mapping
  - `Embarked`: one-hot encoding
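A minimal sketch of these preprocessing steps with pandas. The tiny inline DataFrame stands in for Kaggle's `train.csv`; the exact code in the notebook may differ.

```python
import pandas as pd

# Stand-in for Kaggle's train.csv (same column names, a few rows).
train = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Name": ["A", "B", "C"],
    "Ticket": ["t1", "t2", "t3"],
    "Cabin": [None, "C85", None],
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 38.0, None],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Fare": [7.25, 71.28, 7.92],
    "Embarked": ["S", "C", None],
})

# Drop columns that do not feed the model.
train = train.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"])

# Fill missing values: median for Age, mode ('S') for Embarked.
train["Age"] = train["Age"].fillna(train["Age"].median())
train["Embarked"] = train["Embarked"].fillna("S")

# Encode categoricals: binary map for Sex, one-hot for Embarked.
# (The 0/1 direction here is an assumption; any consistent mapping works.)
train["Sex"] = train["Sex"].map({"male": 0, "female": 1})
train = pd.get_dummies(train, columns=["Embarked"], prefix="Embarked")
```

The same `fillna` and encoding steps are applied to the test set, using the training-set statistics where possible to avoid leakage.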

---

## 🤖 Model

- **Algorithm**: `RandomForestClassifier` from `sklearn.ensemble`
- **Training-Validation Split**: 80% training / 20% validation
- **Selected Features**:
  - `Pclass`
  - `Sex`
  - `Age`
  - `SibSp`
  - `Parch`
  - `Fare`
  - One-hot encoded `Embarked`

The model was trained and evaluated using basic performance metrics.
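The training setup can be sketched as follows. Random stand-in data replaces the real features here (the notebook loads the Kaggle CSVs), and the hyperparameters shown are scikit-learn defaults, not necessarily those used in the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Stand-in matrix with seven feature columns, mirroring
# Pclass, Sex, Age, SibSp, Parch, Fare, and one-hot Embarked.
X = rng.random((200, 7))
y = (X[:, 1] + 0.3 * rng.random(200) > 0.6).astype(int)

# 80% training / 20% validation split, as in the README.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Basic performance metric: accuracy on the held-out 20%.
val_acc = accuracy_score(y_val, model.predict(X_val))
```

Fixing `random_state` makes both the split and the forest reproducible between runs.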

---

## 📈 Results

- Achieved a public Kaggle score of **0.76076**
- This was the first version of the model and performed well on the leaderboard.

---

## 🚀 Next Steps

Planned improvements and experiments:

- Try other models (e.g., Logistic Regression, XGBoost)
- Perform hyperparameter tuning using GridSearchCV
- Use feature importance to select or engineer better features
- Consider using cross-validation for more reliable evaluation
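The tuning and cross-validation ideas above could look roughly like this; the parameter grid and synthetic data are illustrative, not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic 7-feature data standing in for the Titanic features.
X, y = make_classification(n_samples=200, n_features=7, random_state=0)

# An example grid; real tuning would explore wider ranges.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [3, 5, None],
}

# GridSearchCV combines hyperparameter search with 5-fold
# cross-validation, giving a more reliable estimate than a
# single 80/20 split.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_
# Feature importances from the refit best model can guide
# feature selection or engineering.
importances = search.best_estimator_.feature_importances_
```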

---

## 🔗 Resources

- 📓 Kaggle Notebook: [Titanic - Random Forest v1](https://www.kaggle.com/code/busradeveci/titanic-randomforest-v1)
- 🏆 Competition Page: [Kaggle Titanic](https://www.kaggle.com/competitions/titanic)

---

## ๐Ÿง‘โ€๐Ÿ’ป Author
Kaggle: [kaggle.com/busradeveci](https://www.kaggle.com/busradeveci)