https://github.com/asma-hachaichi/titanic-dataset-analyst

Exploratory and predictive analysis of the Titanic dataset using Python. This project encompasses data cleaning, visualization, feature engineering, and machine learning to predict passenger survival.

Host: GitHub
URL: https://github.com/asma-hachaichi/titanic-dataset-analyst
Owner: asma-hachaichi
Created: 2024-02-02T18:28:31.000Z (over 1 year ago)
Default Branch: master
Last Pushed: 2024-02-28T16:25:29.000Z (over 1 year ago)
Last Synced: 2025-02-06T09:36:00.210Z (4 months ago)
Size: 535 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Titanic Dataset Analysis

## Overview

This project explores the well-known Titanic dataset to understand who survived the sinking of the Titanic and to predict the survival of individual passengers. Using Python and its data science libraries, the analysis covers data cleaning, exploratory data analysis (EDA), feature engineering, and predictive modeling.

## Dataset

The dataset contains passenger information from the Titanic, including demographic data, ticket details, and survival status. Key columns include:

- `Survived`: Survival (0 = No, 1 = Yes)
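- `Pclass`: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
- `Name`: Passenger name
- `Sex`: Passenger sex
- `Age`: Age in years
- `SibSp`: Number of siblings/spouses aboard
- `Parch`: Number of parents/children aboard
- `Ticket`: Ticket number
- `Fare`: Passenger fare
- `Cabin`: Cabin number
- `Embarked`: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)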
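The columns above lend themselves to a few common cleaning and feature-engineering steps. The sketch below is one illustrative approach, not necessarily what the notebook does. It assumes missing `Age` values are filled with the median, missing `Embarked` values with the most frequent port, and `Cabin` is reduced to a has-cabin flag; it also derives two features, `FamilySize` (from `SibSp` and `Parch`) and `Title` (parsed from `Name`). The function name `clean_and_engineer` and the sample rows are hypothetical.

```python
import pandas as pd


def clean_and_engineer(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a Titanic-style DataFrame with derived features.

    Hypothetical helper: the imputation choices are assumptions,
    not necessarily the ones made in the project's notebook.
    """
    out = df.copy()

    # Fill missing ages with the median age (less sensitive to outliers than the mean).
    out["Age"] = out["Age"].fillna(out["Age"].median())

    # Fill missing ports of embarkation with the most frequent port.
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode().iloc[0])

    # Cabin is often missing, so keep only whether a cabin was recorded (1) or not (0).
    out["HasCabin"] = out["Cabin"].notna().astype(int)

    # Family size aboard: siblings/spouses + parents/children + the passenger.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1

    # Title such as "Mr", "Mrs", "Miss", taken from names of the form "Surname, Title. Given names".
    out["Title"] = out["Name"].str.extract(r",\s*([^\.]+)\.", expand=False).str.strip()

    return out


if __name__ == "__main__":
    # Made-up rows in the dataset's column layout, for illustration only.
    sample = pd.DataFrame(
        {
            "Name": ["Doe, Mr. John", "Roe, Mrs. Jane (Ann)", "Poe, Miss. Emma"],
            "Age": [30.0, None, 10.0],
            "SibSp": [1, 1, 0],
            "Parch": [0, 2, 1],
            "Cabin": [None, "C85", None],
            "Embarked": ["S", None, "S"],
        }
    )
    print(clean_and_engineer(sample)[["Age", "Embarked", "HasCabin", "FamilySize", "Title"]])
```

In practice, the median and mode would be computed on `train.csv` only and reused for `test.csv`, so that no information from the test data leaks into training.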

## Files

- `Titanic-Dataset-Analysis.ipynb`: Jupyter notebook containing the full analysis, from data preprocessing to model training and evaluation.
- `train.csv`: Dataset used for training.
- `test.csv`: Dataset used for testing.
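Reading the two files with pandas could look like the sketch below. It assumes both CSVs sit in one directory (by default the directory the code is run from) and that `train.csv` contains the `Survived` target column; the helper name `load_titanic` is hypothetical and not part of the repository.

```python
from pathlib import Path

import pandas as pd


def load_titanic(data_dir: str = ".") -> tuple[pd.DataFrame, pd.DataFrame]:
    """Read train.csv and test.csv from data_dir (hypothetical helper)."""
    base = Path(data_dir)
    train = pd.read_csv(base / "train.csv")
    test = pd.read_csv(base / "test.csv")

    # The training split must contain the target column to fit a model.
    if "Survived" not in train.columns:
        raise ValueError("train.csv has no 'Survived' column")

    return train, test
```

Usage: from the repository root, `train_df, test_df = load_titanic()` loads both splits, and `train_df.isna().sum()` then gives a per-column count of missing values, a typical first step in data cleaning.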

## Tools and Libraries

- **Python:** Programming language used for the analysis.
- **Pandas:** Data manipulation and analysis.
- **NumPy:** Numerical computing.
- **Matplotlib & Seaborn:** Data visualization.
- **Scikit-learn:** Machine learning library used for data preprocessing and model training.

## Key Findings and Insights

- **Survival Rate Analysis:** Initial exploratory data analysis examined survival rates by gender, passenger class, and age.
- **Feature Importance:** Feature engineering showed that certain features, notably gender and passenger class, strongly affected survival chances.
- **Model Performance:** Models were trained and evaluated on their prediction accuracy.
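The survival-rate breakdown and accuracy evaluation described above can be sketched with pandas `groupby`. This is an illustration, not the notebook's code or results: the sample rows are made up, and the "model" is a deliberately simple gender baseline (predict survival for women only) of the kind a trained scikit-learn model would be compared against. The function names are hypothetical.

```python
import pandas as pd


def survival_rates(df: pd.DataFrame, by) -> pd.Series:
    """Survival rate per group; `by` is a column name or a list of column names.

    Because Survived is 0/1, its mean within a group is that group's survival rate.
    """
    return df.groupby(by)["Survived"].mean()


def gender_baseline_accuracy(df: pd.DataFrame) -> float:
    """Accuracy of predicting Survived = 1 for females and 0 for males."""
    predictions = (df["Sex"] == "female").astype(int)
    return float((predictions == df["Survived"]).mean())


if __name__ == "__main__":
    # Made-up rows for illustration; they are not real passenger records.
    sample = pd.DataFrame(
        {
            "Sex": ["male", "female", "female", "male", "female", "male"],
            "Pclass": [3, 1, 3, 1, 2, 3],
            "Survived": [0, 1, 0, 1, 1, 0],
        }
    )
    print(survival_rates(sample, "Sex"))
    print(survival_rates(sample, ["Sex", "Pclass"]))
    print(gender_baseline_accuracy(sample))
```

Reporting a trained model's accuracy next to this baseline makes it clear whether the model learns anything beyond the gender split.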

## How to Run

Ensure Python and the required packages are installed. To install the dependencies, uncomment and run the first code cell of the notebook.
Then open the `Titanic-Dataset-Analysis.ipynb` notebook in a Jupyter environment to view and run the analysis.
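Before running the notebook, a short check like the one below can confirm that the libraries are importable. It is a hypothetical helper, not part of the repository; the package list mirrors the Tools and Libraries section. Note that some import names differ from the names used to install them with pip (for example, scikit-learn is installed as `scikit-learn` but imported as `sklearn`).

```python
import importlib.util

# Import names of the libraries listed under "Tools and Libraries".
REQUIRED = ("pandas", "numpy", "matplotlib", "seaborn", "sklearn")


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Any package it reports can be installed with pip, for example `pip install scikit-learn` for `sklearn`.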

## Conclusion

This project highlights the power of data science in uncovering hidden patterns and making predictions. The analysis of the Titanic dataset not only provides historical insights but also demonstrates various data science techniques, from data cleaning and EDA to predictive modeling.
