Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/fabriciocarraro/kaggle-titanic_survivor_prediction
Desafio do Titanic do Kaggle
https://github.com/fabriciocarraro/kaggle-titanic_survivor_prediction
Last synced: about 2 months ago
JSON representation
Desafio do Titanic do Kaggle
- Host: GitHub
- URL: https://github.com/fabriciocarraro/kaggle-titanic_survivor_prediction
- Owner: fabriciocarraro
- Created: 2024-03-03T21:37:58.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-08-28T23:17:27.000Z (4 months ago)
- Last Synced: 2024-08-29T00:29:32.028Z (4 months ago)
- Language: Jupyter Notebook
- Size: 295 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: readme.md
Awesome Lists containing this project
README
# Kaggle Challenge - Titanic Survival Prediction
This project uses machine learning to predict passenger survival on the Titanic. It demonstrates data preprocessing, exploratory data analysis, and the application of a Random Forest Classifier to make predictions.
## Table of Contents
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Usage](#usage)
- [Features](#features)
- [Data Preprocessing](#data-preprocessing)
- [Exploratory Data Analysis](#exploratory-data-analysis)
- [Model Training and Evaluation](#model-training-and-evaluation)
- [Making Predictions](#making-predictions)## Prerequisites
- Python 3.x
- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn## Installation
1. Clone this repository:
```
git clone https://github.com/yourusername/titanic-survival-prediction.git
```
2. Navigate to the project directory:
```
cd titanic-survival-prediction
```
3. Install the required packages:
```
pip install -r requirements.txt
```## Usage
1. Ensure you have the Titanic dataset files (`train.csv` and `test.csv`) in the project directory.
2. Run the main script:
```
python titanic_prediction.py
```## Features
- Data cleaning and preprocessing
- Exploratory data analysis with visualizations
- Feature engineering
- Model training using Random Forest Classifier
- Cross-validation for model evaluation
- Prediction on test data## Data Preprocessing
- Handling missing values in 'Age' and 'Embarked' columns
- Creating new features: 'Sex_binary', 'AgeGroup', 'Fare_level', 'Title', 'Fam_size'
- Encoding categorical variables## Exploratory Data Analysis
The script includes several visualization functions to explore the relationships between features and survival rates:
- `bar_chart_stacked()`: Creates stacked bar charts for categorical features
- `bar_chart_compare()`: Compares survival rates across different feature combinations
- `plot_distribution()`: Visualizes the distribution of numerical features## Model Training and Evaluation
- Uses Random Forest Classifier with 500 estimators and max depth of 5
- Performs cross-validation using RepeatedKFold with 2 splits and 10 repeats
- Calculates and displays accuracy scores for each fold and the mean accuracy## Making Predictions
The trained model is used to make predictions on the test dataset. The results are saved in a CSV file named 'Titanic_submission2.csv'.
---
Feel free to contribute to this project by opening issues or submitting pull requests!