https://github.com/vitornegromonte/eda_stroke

Exploratory data analysis in the stroke prediction dataset

Host: GitHub
URL: https://github.com/vitornegromonte/eda_stroke
Owner: vitornegromonte
Created: 2023-04-29T11:43:42.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2023-05-14T19:20:33.000Z (about 3 years ago)
Last Synced: 2025-03-09T22:41:23.489Z (over 1 year ago)
Topics: data-analysis, data-science, exploratory-data-analysis, kaggle-dataset, visualization
Language: Jupyter Notebook
Homepage:
Size: 2.7 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Stroke Prediction Dataset - Exploratory Data Analysis

## 👋 | About
This repository contains a project on Exploratory Data Analysis (EDA) applied to the [Stroke Prediction Dataset](https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset). The goal is to investigate and understand the patterns and characteristics of the data.

## 🔬 | Project description

The project uses R and Python to analyze the dataset. R handles the univariate (unidimensional) analysis of the qualitative variables, drawing on its wide range of packages for data visualization and analysis. The main packages used in this phase are:
- tidyverse
- esquisse
- ggthemes
- data.table
- outliers
- BHH2
- latex2exp
- moments
- modeest

Python handles the univariate (unidimensional) analysis of the quantitative variables and the bivariate (bidimensional) analysis. Both rely on Pandas, NumPy, Matplotlib, and Seaborn, which cover data manipulation, statistical functions, and visualization.
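As a rough illustration of what these two kinds of analysis compute, the sketch below summarizes one quantitative variable on its own and then compares it across the two values of the target. It is not taken from the repository's notebooks: it uses only the Python standard library so that it runs without any installation, the rows are hypothetical sample values, and the column names (`age`, `avg_glucose_level`, `stroke`) are assumed to match the Kaggle dataset.

```python
import statistics
from collections import defaultdict

# Hypothetical sample rows for illustration only; column names are assumed
# to match the Kaggle dataset, and the values are not real records.
rows = [
    {"age": 67, "avg_glucose_level": 228.69, "stroke": 1},
    {"age": 61, "avg_glucose_level": 202.21, "stroke": 1},
    {"age": 80, "avg_glucose_level": 105.92, "stroke": 1},
    {"age": 49, "avg_glucose_level": 171.23, "stroke": 0},
    {"age": 23, "avg_glucose_level": 88.10, "stroke": 0},
    {"age": 35, "avg_glucose_level": 95.40, "stroke": 0},
]


def univariate_summary(values):
    """Return basic descriptive statistics for one quantitative variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def mean_by_group(records, value_key, group_key):
    """Bivariate view: mean of a quantitative variable within each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {group: statistics.mean(vals) for group, vals in groups.items()}


age_summary = univariate_summary([row["age"] for row in rows])
age_by_stroke = mean_by_group(rows, "age", "stroke")

print("Age summary:", age_summary)
print("Mean age by stroke outcome:", age_by_stroke)
```

With Pandas, the same two steps would typically be written with `DataFrame.describe()` and `DataFrame.groupby(...).mean()` on the loaded CSV.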

![Programming Languages Diagram](lang_diagram.svg)

**Note:** The project requires **R**, **Python**, and **Jupyter** to be installed on your system, along with the necessary [packages](R/packages.R) listed above. To install them, follow the links below:

[![R](https://img.shields.io/badge/version_4.3_or_higher-gray.svg?style=for-the-badge&logo=r&logoColor=white)](https://cran.r-project.org/mirrors.html)
[![Python](https://img.shields.io/badge/version_3.10_or_higher-gray?style=for-the-badge&logo=python&logoColor=blue)](https://www.python.org/downloads/)
[![Jupyter Notebook](https://img.shields.io/badge/jupyter_notebooks-gray.svg?style=for-the-badge&logo=jupyter&logoColor=orange)](https://jupyter.org/install)
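One way to confirm the Python side of these requirements before opening the notebooks is a small check like the one below. It is only a sketch: the minimum version comes from the badge above, the module names are assumed to be the usual import names of the libraries mentioned in this README, and `check_environment` is a hypothetical helper, not part of the repository.

```python
import sys
from importlib.util import find_spec

# Minimum Python version taken from the badge in this README.
REQUIRED_PYTHON = (3, 10)
# Assumed import names for the libraries mentioned in this README.
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib", "seaborn"]


def check_environment(modules=REQUIRED_MODULES, min_version=REQUIRED_PYTHON):
    """Return (python_ok, missing) for the running interpreter.

    python_ok is True when the interpreter is at least min_version;
    missing lists the modules that cannot be found on the import path.
    """
    python_ok = sys.version_info[:2] >= tuple(min_version)
    missing = [name for name in modules if find_spec(name) is None]
    return python_ok, missing


python_ok, missing = check_environment()
print("Python version OK:", python_ok)
print("Missing modules:", missing if missing else "none")
```

Any module reported as missing can be installed with `pip install <name>` before running the notebooks.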

## 🌱 | Getting Started
1. Clone the repository:
```shell
git clone https://github.com/vitornegromonte/EDA_stroke.git
```

2. Navigate to the project directory:
```shell
cd EDA_stroke
```

3. Before running the R scripts, execute the file `R/packages.R` to install all the required packages.
