https://github.com/yousuf1733/titanic-dataset-analysis
Exploratory data analysis of the Titanic dataset, uncovering insights on passenger survival rates based on gender, age, and class. Includes data cleaning, visualization, and findings.
exploratory-data-analysis hadoop-mapreduce linear-discriminant-analysis machine-learning plotly-python powerquery predictive-analytics python random-forest scipy sql sqlite statistical-analysis titanic-dataset
Last synced: 20 days ago
- Host: GitHub
- URL: https://github.com/yousuf1733/titanic-dataset-analysis
- Owner: Yousuf1733
- Created: 2025-04-22T10:33:32.000Z (21 days ago)
- Default Branch: main
- Last Pushed: 2025-04-23T14:41:24.000Z (20 days ago)
- Last Synced: 2025-04-23T15:16:41.763Z (20 days ago)
- Topics: exploratory-data-analysis, hadoop-mapreduce, linear-discriminant-analysis, machine-learning, plotly-python, powerquery, predictive-analytics, python, random-forest, scipy, sql, sqlite, statistical-analysis, titanic-dataset
- Language: Jupyter Notebook
- Size: 71.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Titanic Dataset Analysis 🚢🔍

Welcome to the **Titanic Dataset Analysis** repository! This project dives into the Titanic dataset, revealing insights about passenger survival rates based on gender, age, and class. Through exploratory data analysis, we clean the data, visualize trends, and present our findings.
## Table of Contents
- [Introduction](#introduction)
- [Dataset Overview](#dataset-overview)
- [Installation](#installation)
- [Usage](#usage)
- [Data Cleaning](#data-cleaning)
- [Data Visualization](#data-visualization)
- [Findings](#findings)
- [Contributing](#contributing)
- [License](#license)
- [Contact](#contact)
- [Releases](#releases)
## Introduction
The Titanic dataset is a well-known dataset describing passenger demographics and survival outcomes from the sinking of the RMS Titanic in 1912. By analyzing this data, we can uncover patterns that help explain which factors influenced survival.
## Dataset Overview
The dataset contains various features, including:
- **PassengerId**: Unique identifier for each passenger.
- **Survived**: Survival status (0 = No, 1 = Yes).
- **Pclass**: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd).
- **Name**: Name of the passenger.
- **Sex**: Gender of the passenger.
- **Age**: Age of the passenger.
- **SibSp**: Number of siblings/spouses aboard.
- **Parch**: Number of parents/children aboard.
- **Ticket**: Ticket number.
- **Fare**: Fare paid for the ticket.
- **Cabin**: Cabin number.
- **Embarked**: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).
## Installation
To get started with this analysis, you need to clone the repository and install the required libraries.
1. Clone the repository:
```bash
git clone https://github.com/Yousuf1733/Titanic-Dataset-Analysis.git
```
2. Navigate to the project directory:
```bash
cd Titanic-Dataset-Analysis
```
3. Install the required libraries:
```bash
pip install -r requirements.txt
```
## Usage
After installing the necessary libraries, you can run the analysis script to explore the dataset.
```bash
python titanic_analysis.py
```
This will execute the analysis and generate visualizations based on the dataset.
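For a quick interactive look at the data, before or instead of running the script, a short pandas sketch such as the following may help. The helper names `load_titanic` and `summarize` are hypothetical, and the CSV location is left to the caller because this README does not name the data file.

```python
import pandas as pd

# Column names as listed in the Dataset Overview section.
EXPECTED_COLUMNS = [
    "PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
    "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked",
]


def load_titanic(source):
    """Read the CSV and fail early if an expected column is missing.

    `source` is a path or file-like object; its name and location are
    assumptions, since the README does not state them.
    """
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


def summarize(df):
    """Return the dtype and the number of missing values for each column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
    })
```

Calling `summarize(load_titanic(path))` gives a one-row-per-column table that shows which columns need attention before cleaning.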
## Data Cleaning
Data cleaning is a crucial step in any analysis. In this project, we address missing values, incorrect data types, and duplicate entries.
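The steps listed under this heading could be sketched in pandas roughly as follows. The function name `clean_titanic` and the order of operations are assumptions, not taken from the project's code.

```python
import pandas as pd


def clean_titanic(df):
    """Apply the cleaning steps described in this section to a copy of df."""
    out = df.copy()
    # Impute missing ages with the median of the known ages.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # Treat survival status and ticket class as categorical values.
    out = out.astype({"Survived": "category", "Pclass": "category"})
    # Drop rows with no port of embarkation, then exact duplicate rows.
    out = out.dropna(subset=["Embarked"]).drop_duplicates()
    return out.reset_index(drop=True)
```

Working on a copy keeps the raw DataFrame available for comparison. Note that the median here is computed before any rows are dropped; computing it afterwards would be an equally reasonable choice.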
### Steps taken:
- **Handling Missing Values**: We impute missing ages with the median age and drop rows with missing embarked values.
- **Correcting Data Types**: We convert the 'Survived' and 'Pclass' columns to categorical data types for better analysis.
- **Removing Duplicates**: We check for and remove any duplicate entries in the dataset.
## Data Visualization
Visualizing the data helps to identify trends and patterns. We use libraries like Matplotlib and Seaborn to create various plots.
### Key Visualizations:
- **Survival Rate by Gender**: A bar chart showing the survival rates for male and female passengers.
- **Age Distribution**: A histogram illustrating the age distribution of passengers.
- **Survival Rate by Class**: A stacked bar chart showing survival rates across different ticket classes.

Here are some examples of the visualizations created:
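As a rough sketch, the three plots listed above could be produced along these lines. This uses pandas plotting on top of Matplotlib rather than Seaborn, and the function name `plot_overview` is hypothetical. It also assumes `Survived` is still numeric (0/1) rather than categorical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_overview(df):
    """Draw the three key plots side by side and return the figure."""
    fig, (ax_sex, ax_age, ax_class) = plt.subplots(1, 3, figsize=(15, 4))

    # Survival rate by gender: mean of the 0/1 Survived column per group.
    df.groupby("Sex")["Survived"].mean().plot.bar(ax=ax_sex)
    ax_sex.set_title("Survival Rate by Gender")
    ax_sex.set_ylabel("Survival rate")

    # Age distribution: histogram of the known ages.
    df["Age"].dropna().plot.hist(bins=20, ax=ax_age)
    ax_age.set_title("Age Distribution")
    ax_age.set_xlabel("Age")

    # Survival by class: stacked share of each outcome within a class.
    shares = pd.crosstab(df["Pclass"], df["Survived"], normalize="index")
    shares.plot.bar(stacked=True, ax=ax_class)
    ax_class.set_title("Survival Rate by Class")
    ax_class.set_ylabel("Share of passengers")

    fig.tight_layout()
    return fig
```

The returned figure can be shown with `plt.show()` or written to disk with `fig.savefig(...)`.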



## Findings
Through our analysis, we discovered several key insights:
1. **Gender Impact**: Females had a significantly higher survival rate compared to males.
2. **Age Factor**: Younger passengers had a better chance of survival.
3. **Class Influence**: Passengers in 1st class had the highest survival rates, while those in 3rd class had the lowest.

These findings highlight the importance of demographic factors in survival during the Titanic disaster.
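Group-wise rates of the kind behind these findings can be computed with a pandas `groupby`. The sketch below is illustrative only: the helper names and the age-band edges are assumptions, and it expects `Survived` as numeric 0/1.

```python
import pandas as pd


def survival_rate_by(df, column):
    """Share of survivors within each value of `column`."""
    return df.groupby(column, observed=True)["Survived"].mean()


def survival_rate_by_age_band(df, edges=(0, 12, 18, 40, 60, 120)):
    """Survival rate per age band; the band edges are an arbitrary choice.

    Bands are left-closed, e.g. [0, 12). Rows with a missing age fall
    into no band and are left out of the result.
    """
    bands = pd.cut(df["Age"], bins=list(edges), right=False)
    return df.groupby(bands, observed=True)["Survived"].mean()
```

For example, `survival_rate_by(df, "Sex")` and `survival_rate_by(df, "Pclass")` return one rate per group, which can be compared directly against the statements above.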
## Contributing
We welcome contributions to improve this project. If you would like to contribute, please follow these steps:
1. Fork the repository.
2. Create a new branch for your feature or bug fix.
3. Make your changes and commit them.
4. Push to your branch.
5. Create a pull request.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
## Contact
For questions or feedback, please reach out to me:
- **Email**: [email protected]
- **GitHub**: [Yousuf1733](https://github.com/Yousuf1733)
## Releases
To access the latest releases, please visit [Releases](https://github.com/Yousuf1733/Titanic-Dataset-Analysis/releases). You can download the latest version and execute the scripts for your analysis.
We hope you find this analysis helpful and insightful. Happy analyzing!