https://github.com/busradeveci/titanic-randomforest-v1
Titanic survival prediction using Random Forest classifier as part of Kaggle's beginner-friendly competition.
- Host: GitHub
- URL: https://github.com/busradeveci/titanic-randomforest-v1
- Owner: busradeveci
- Created: 2025-05-13T15:00:23.000Z (5 months ago)
- Default Branch: master
- Last Pushed: 2025-05-13T15:12:44.000Z (5 months ago)
- Last Synced: 2025-05-13T16:32:36.095Z (5 months ago)
- Topics: beginner-project, classification, data-science, kaggle, machine-learning, python, random-forest, titanic-dataset
- Language: Jupyter Notebook
- Homepage:
- Size: 9.77 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
# Titanic - Random Forest (v1)
This repository contains my solution to the classic Kaggle competition: **Titanic - Machine Learning from Disaster**. The goal is to predict which passengers survived the Titanic shipwreck using a classification model.
---
## Overview
- **Competition**: [Titanic - Machine Learning from Disaster](https://www.kaggle.com/competitions/titanic)
- **Model**: Random Forest Classifier
- **Public Score**: `0.76076`
- **Best Score**: `0.76076` (Version 1)

---
## Dataset
The dataset includes passenger details such as age, gender, ticket class, number of siblings/spouses aboard, and fare. These features were used to build the model.
---
## Data Preprocessing
The following preprocessing steps were applied:
- Dropped unnecessary columns: `PassengerId`, `Name`, `Ticket`, `Cabin`
- Filled missing values:
  - `Age`: Filled with median
  - `Embarked`: Filled with mode (`'S'`)
  - `Fare`: Filled with median (only in test set)
- Converted categorical variables:
  - `Sex`: Binary mapping
  - `Embarked`: One-Hot Encoding

---
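
The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `preprocess`, the `male -> 0` / `female -> 1` mapping, the fixed `Embarked` category list, and the tiny `sample` frame are all assumptions made for the example. The medians are computed on whichever frame is passed in, which may differ from what the notebook does.

```python
import pandas as pd

# Hypothetical helper; the name and signature are assumptions for this sketch.
def preprocess(df: pd.DataFrame, embarked_categories=("C", "Q", "S")) -> pd.DataFrame:
    # Drop identifier-like and sparse columns (ignored if already absent).
    df = df.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"], errors="ignore")

    # Fill missing numeric values with the column median.
    df["Age"] = df["Age"].fillna(df["Age"].median())
    df["Fare"] = df["Fare"].fillna(df["Fare"].median())

    # Fill missing ports with 'S', the mode reported in the README.
    df["Embarked"] = df["Embarked"].fillna("S")

    # Binary mapping for Sex (the exact 0/1 assignment is an assumption).
    df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

    # Fixing the categories keeps the one-hot columns identical across train and test.
    df["Embarked"] = pd.Categorical(df["Embarked"], categories=list(embarked_categories))
    return pd.get_dummies(df, columns=["Embarked"], prefix="Embarked")


# Made-up rows in the Titanic column layout, only to exercise the function.
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Name": ["A", "B", "C", "D"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 0],
    "Ticket": ["t1", "t2", "t3", "t4"],
    "Fare": [7.25, 71.28, 7.92, None],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", None, "Q"],
})
clean = preprocess(sample)
print(clean.columns.tolist())
```

Declaring the `Embarked` categories up front is a design choice for the sketch: it guarantees `Embarked_C`, `Embarked_Q`, and `Embarked_S` exist even if one port is missing from a given file.
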
## Model
- **Algorithm**: `RandomForestClassifier` from `sklearn.ensemble`
- **Training-Validation Split**: 80% training / 20% validation
- **Selected Features**:
  - `Pclass`
  - `Sex`
  - `Age`
  - `SibSp`
  - `Parch`
  - `Fare`
  - One-hot encoded `Embarked`

The model was trained and evaluated using basic performance metrics.

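
A setup like the one described above (80/20 split, `RandomForestClassifier`, the listed features) might look like the sketch below. The hyperparameters (`n_estimators=100`, `random_state=42`), the choice of accuracy as the metric, the helper name `train_random_forest`, and the one-hot column names are assumptions, since the README does not state them. The `demo` frame is randomly generated stand-in data, not the Titanic dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Column names for the one-hot encoded port are assumed for this sketch.
FEATURES = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare",
            "Embarked_C", "Embarked_Q", "Embarked_S"]


def train_random_forest(df: pd.DataFrame, random_state: int = 42):
    """Hypothetical helper: fit a forest on an 80/20 split, return model and validation accuracy."""
    X = df[FEATURES]
    y = df["Survived"]
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    model.fit(X_train, y_train)
    val_accuracy = accuracy_score(y_val, model.predict(X_val))
    return model, val_accuracy


# Synthetic stand-in data so the sketch runs without the Kaggle files.
rng = np.random.default_rng(0)
n = 200
port = rng.integers(0, 3, size=n)
demo = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "Sex": rng.integers(0, 2, size=n),
    "Age": rng.uniform(1, 80, size=n),
    "SibSp": rng.integers(0, 4, size=n),
    "Parch": rng.integers(0, 3, size=n),
    "Fare": rng.uniform(5, 100, size=n),
    "Embarked_C": port == 0,
    "Embarked_Q": port == 1,
    "Embarked_S": port == 2,
})
demo["Survived"] = demo["Sex"]  # toy label, only to give the model something to learn

model, val_accuracy = train_random_forest(demo)
print(f"validation accuracy: {val_accuracy:.3f}")
```

With the real data, `demo` would be replaced by the preprocessed training frame; the validation accuracy from a single split is only a rough guide and will not necessarily match the public leaderboard score.
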
---
## Results
- Achieved a public Kaggle score of **0.76076**
- This is the first version of the model and serves as a baseline for later iterations.

---
## Next Steps
Planned improvements and experiments:
- Try other models (e.g., Logistic Regression, XGBoost)
- Perform hyperparameter tuning using GridSearchCV
- Use feature importance to select or engineer better features
- Consider using cross-validation for more reliable evaluation

---
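
As a rough idea of how the cross-validation, `GridSearchCV`, and feature-importance items above could fit together, here is a small sketch. The parameter grid, the 5-fold setting, the helper name `tune_random_forest`, and the synthetic data from `make_classification` are all assumptions chosen for illustration; none of this has been run on the Titanic data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score


def tune_random_forest(X, y, random_state: int = 42):
    """Hypothetical helper: baseline 5-fold CV scores plus a small grid search."""
    baseline = RandomForestClassifier(random_state=random_state)
    baseline_scores = cross_val_score(baseline, X, y, cv=5, scoring="accuracy")

    # A deliberately small grid; real experiments would likely search more values.
    param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
    search = GridSearchCV(
        RandomForestClassifier(random_state=random_state),
        param_grid,
        cv=5,
        scoring="accuracy",
    )
    search.fit(X, y)
    return baseline_scores, search


# Synthetic stand-in with 9 features, matching the number of model inputs.
X, y = make_classification(n_samples=200, n_features=9, n_informative=4, random_state=0)
baseline_scores, search = tune_random_forest(X, y)

print("baseline CV accuracy:", baseline_scores.mean())
print("best params:", search.best_params_)

# Rank features by importance in the refit best estimator.
importances = search.best_estimator_.feature_importances_
print("features by importance:", np.argsort(importances)[::-1])
```

Averaging over five folds gives a steadier estimate than a single 80/20 split, which is the motivation behind the last item in the list.
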
## Resources
- Kaggle Notebook: [Titanic - Random Forest v1](https://www.kaggle.com/code/busradeveci/titanic-randomforest-v1)
- Competition Page: [Kaggle Titanic](https://www.kaggle.com/competitions/titanic)

---
## Author
Kaggle: [kaggle.com/busradeveci](https://www.kaggle.com/busradeveci)