Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/adadalshabab/advanced-eda
The project aims to provide a comprehensive understanding of the dataset related to heart disease and build predictive models to classify whether a patient is likely to have heart disease based on various attributes.
https://github.com/adadalshabab/advanced-eda
datanalysis datanalytics datavisualization dataviz eda machine-learning matplotlib
Last synced: about 1 month ago
JSON representation
The project aims to provide a comprehensive understanding of the dataset related to heart disease and build predictive models to classify whether a patient is likely to have heart disease based on various attributes.
- Host: GitHub
- URL: https://github.com/adadalshabab/advanced-eda
- Owner: AdadAlShabab
- Created: 2024-03-30T18:29:02.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-03-30T18:31:46.000Z (9 months ago)
- Last Synced: 2024-03-30T19:36:02.386Z (9 months ago)
- Topics: datanalysis, datanalytics, datavisualization, dataviz, eda, machine-learning, matplotlib
- Language: Jupyter Notebook
- Homepage:
- Size: 1.11 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Advanced EDA-Heart Disease Prediction
This repository contains an advanced exploratory data analysis (EDA) and heart disease prediction project. The project aims to provide a comprehensive understanding of the dataset related to heart disease and build predictive models to classify whether a patient is likely to have heart disease based on various attributes.
## Dataset
The dataset used in this project is sourced from https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset , containing a collection of clinical parameters for individuals, including demographic information, medical test results, and lifestyle factors. It comprises both numerical and categorical features, which are crucial in understanding the factors influencing heart disease.
## Project Structure
The project is structured as follows:
- **data:** This directory contains the dataset used for analysis and modeling.
- **notebooks:** This directory contains Jupyter notebooks outlining the exploratory data analysis process, feature engineering, model development, and evaluation.
- **src:** This directory contains any source code used in the project, such as utility functions or custom modules.
- **reports:** This directory contains any reports generated from the analysis, including visualizations, summary statistics, and model evaluation metrics.## Exploratory Data Analysis (EDA)
The EDA process involves thorough examination and visualization of the dataset to gain insights into the relationships between variables, identify patterns, anomalies, and potential predictive features related to heart disease. The notebooks in the `notebooks` directory detail the steps taken in the EDA process.
Key components of the EDA process include:
- Descriptive statistics
- Data cleaning and preprocessing
- Univariate and multivariate analysis
- Visualization of distributions, correlations, and trends
- Identification of outliers and missing values
- Feature engineering and selection## Heart Disease Prediction
Following the EDA, predictive models are developed to classify whether an individual is likely to have heart disease based on the available features. Various machine learning algorithms such as logistic regression, decision trees, random forests, support vector machines, or neural networks may be explored and evaluated for their predictive performance.
The prediction task involves:
- Data splitting into training and testing sets
- Model training using different algorithms
- Hyperparameter tuning for optimizing model performance
- Model evaluation using appropriate metrics such as accuracy, precision, recall, F1-score, and ROC-AUC curve## Getting Started
To get started with this project, follow these steps:
1. Clone this repository to your local machine.
2. Navigate to the `notebooks` directory to explore the Jupyter notebooks containing the analysis and modeling process.
3. Ensure you have the necessary dependencies installed. You can install them via pip using the `requirements.txt` file.```bash
pip install -r requirements.txt
```4. Run the notebooks sequentially to replicate the analysis and understand the steps involved.
## Requirements
The project requires Python 3.11 along with various libraries such as pandas, numpy, matplotlib, seaborn, scikit-learn, etc. These dependencies are listed in the `requirements.txt` file.
## Contributing
Contributions to this project are welcome. If you find any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request.
---
Feel free to customize this README according to your project's specifics and preferences.