https://github.com/keneandita/exploratory-data-analysis-eda-
Explore EDA on 5 datasets: Titanic 🚢, Heart Disease ❤️, Wine Quality 🍷, Car Price 🚗, and NBA Players 🏀. Includes data cleaning, preprocessing, and visualizations to uncover insights. Perfect for beginners to learn data analysis with Pandas, Matplotlib, and Seaborn! 🎨📈
data-analysis data-visualization eda matplotlib pandas python seaborn sklearn
- Host: GitHub
- URL: https://github.com/keneandita/exploratory-data-analysis-eda-
- Owner: KeneanDita
- Created: 2025-03-09T18:51:00.000Z
- Default Branch: main
- Last Pushed: 2025-03-15T14:31:21.000Z
- Last Synced: 2025-03-15T14:33:59.192Z
- Topics: data-analysis, data-visualization, eda, matplotlib, pandas, python, seaborn, sklearn
- Language: Jupyter Notebook
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# EDA on Multiple Datasets
This repository contains exploratory data analysis (EDA) for five widely used datasets: Titanic, Heart Disease, Wine Quality, Car Price, and NBA Players. The primary objective is to clean, preprocess, and visualize each dataset to uncover insights and prepare the data for downstream machine learning tasks.
## Table of Contents
- [EDA on Multiple Datasets](#eda-on-multiple-datasets)
- [Table of Contents](#table-of-contents)
- [Project Overview](#project-overview)
- [Key Features](#key-features)
- [Requirements](#requirements)
- [Getting Started](#getting-started)
- [Exploratory Data Analysis](#exploratory-data-analysis)
- [Results](#results)
- [Contributing](#contributing)
## Project Overview
Exploratory data analysis (EDA) is a critical step in the data science workflow. It helps to uncover patterns, detect anomalies, and discover relationships in the data. This project covers EDA for the following datasets:
- **Titanic Dataset** - Passenger survival data from the Titanic disaster.
- **Heart Disease Dataset** - Medical records indicating the presence of heart disease.
- **Wine Quality Dataset** - Chemical and sensory data of red and white wines.
- **Car Price Dataset** - Information on car features and their prices.
- **NBA Players Dataset** - Statistical data for NBA players.
## Key Features
- Data Cleaning and Preprocessing
- Data Visualization with Matplotlib and Seaborn
- Statistical Analysis and Insights
- Outlier Detection and Handling
- Correlation Analysis
- Feature Engineering
- Model Training
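The outlier-handling and correlation steps can be sketched with pandas alone. This is a minimal illustration that assumes the common 1.5 × IQR rule and caps values at the fences instead of dropping rows. The notebooks may use a different method. The helper names `iqr_bounds` and `clip_outliers` are hypothetical, and the car-style columns and values are made up for the example.

```python
import numpy as np
import pandas as pd


def iqr_bounds(s: pd.Series, k: float = 1.5) -> tuple:
    """Return (lower, upper) fences using the k * IQR rule (assumed k = 1.5)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Cap every numeric column at its IQR fences rather than dropping rows."""
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        lo, hi = iqr_bounds(out[col], k)
        out[col] = out[col].clip(lower=lo, upper=hi)
    return out


# Hypothetical car-price-style data; not taken from the repository's datasets.
cars = pd.DataFrame({
    "horsepower": [70, 85, 95, 110, 120, 130, 150, 400],
    "price": [8000, 9500, 11000, 13000, 14500, 16000, 19000, 95000],
})

# Flag rows whose price falls outside the IQR fences.
lo, hi = iqr_bounds(cars["price"])
outliers = cars[(cars["price"] < lo) | (cars["price"] > hi)]
print(outliers)

# Cap outliers, then compute the Pearson correlation matrix (pandas' default method).
clean = clip_outliers(cars)
print(clean.corr().round(2))
```

Capping keeps the row count stable, which matters for small datasets. Dropping the flagged rows (`cars.drop(outliers.index)`) is the simpler alternative when the extreme values are known to be data-entry errors.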
## Requirements
- Python 3.10+
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
- Jupyter Notebook (optional for interactive analysis)
Install the required packages using:
```bash
pip install -r requirements.txt
```
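After installing, you can confirm that the environment matches the list above by querying the installed distributions. This is a minimal sketch that uses only the standard library. It assumes the usual PyPI distribution names (`scikit-learn` rather than the import name `sklearn`) and takes the 3.10 minimum from the list above. Jupyter is left out because it is listed as optional. The helper name `check_environment` is hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional

# PyPI distribution names assumed for the packages listed under Requirements.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "scikit-learn"]


def check_environment(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    if sys.version_info < (3, 10):
        print(f"Python 3.10+ required, found {sys.version.split()[0]}")
    for name, ver in check_environment(REQUIRED).items():
        print(f"{name:15} {ver or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` can be installed individually with `pip install <name>`.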
## Getting Started
Clone the repository and navigate to the project directory:
```bash
git clone https://github.com/KeneanDita/Exploratory-Data-Analysis-EDA-
cd Exploratory-Data-Analysis-EDA-
```
## Exploratory Data Analysis
Each dataset has a separate notebook that includes:
- Data loading and inspection
- Data cleaning and preprocessing
- Univariate and bivariate analysis
- Visualization and pattern discovery
- Key findings and observations
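The notebook steps above follow a common pattern. The sketch below walks a small, made-up Titanic-style frame through inspection, cleaning, and univariate and bivariate summaries using pandas. The column names mirror the public Titanic data, but the values are invented. The imputation choices (median for `age`, mode for `sex`) are illustrative assumptions, not necessarily what each notebook does. Plots built with Matplotlib or Seaborn would typically visualize these same summaries.

```python
import numpy as np
import pandas as pd

# Small made-up sample with Titanic-style columns; values are illustrative only.
df = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0, 0, 1],
    "pclass": [3, 1, 2, 3, 1, 3, 2, 1],
    "sex": ["male", "female", "female", "male", "female", "male", "male", None],
    "age": [22.0, 38.0, np.nan, 35.0, 27.0, np.nan, 54.0, 4.0],
    "fare": [7.25, 71.28, 13.0, 8.05, 53.1, 8.46, 26.0, 30.0],
})

# 1. Inspection: dtypes, non-null counts, and missing values per column.
df.info()
missing = df.isna().sum()
print(missing)

# 2. Cleaning (assumed strategy): median for numeric gaps, mode for categorical gaps.
clean = df.copy()
clean["age"] = clean["age"].fillna(clean["age"].median())
clean["sex"] = clean["sex"].fillna(clean["sex"].mode()[0])

# 3. Univariate analysis: summary of a numeric column and counts of a categorical one.
print(clean["age"].describe())
print(clean["pclass"].value_counts().sort_index())

# 4. Bivariate analysis: survival rate by class, and survival share within each sex.
survival_by_class = clean.groupby("pclass")["survived"].mean()
survival_by_sex = pd.crosstab(clean["sex"], clean["survived"], normalize="index")
print(survival_by_class)
print(survival_by_sex)
```

Median imputation is a common default for skewed numeric columns such as age. Whether it is appropriate depends on how much data is missing and why.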
## Results
The insights from each analysis are documented in the corresponding PDF files in the `Steps to follow for each dataset/` directory.
## Contributing
Contributions are welcome! Please fork the repository and submit a pull request for any improvements.