https://github.com/vitornegromonte/eda_stroke
Exploratory data analysis in the stroke prediction dataset
data-analysis data-science exploratory-data-analysis kaggle-dataset visualization
- Host: GitHub
- URL: https://github.com/vitornegromonte/eda_stroke
- Owner: vitornegromonte
- Created: 2023-04-29T11:43:42.000Z
- Default Branch: main
- Last Pushed: 2023-05-14T19:20:33.000Z
- Last Synced: 2025-03-09T22:41:23.489Z
- Topics: data-analysis, data-science, exploratory-data-analysis, kaggle-dataset, visualization
- Language: Jupyter Notebook
- Homepage:
- Size: 2.7 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Stroke Prediction Dataset - Exploratory Data Analysis
## 👋 | About
This repository contains an Exploratory Data Analysis (EDA) of the [Stroke Prediction Dataset](https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset). The goal is to investigate the patterns and characteristics of the data.
## 🔬 | Project description
The project combines R and Python. R is used for the univariate (unidimensional) analysis of qualitative variables, taking advantage of its packages for data visualization and statistical analysis. The main packages used in this phase are:
- tidyverse
- esquisse
- ggthemes
- data.table
- outliers
- BHH2
- latex2exp
- moments
- modeest
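The project carries out this phase in R with the packages above. For readers who prefer to follow along in Python, a rough pandas equivalent of the basic step (absolute, relative, and cumulative frequencies plus the mode of one qualitative variable) might look like the sketch below. The `frequency_table` helper and the toy `smoking_status` values are hypothetical; they are not taken from the repository or from the real data.

```python
import pandas as pd


def frequency_table(series: pd.Series) -> pd.DataFrame:
    """Absolute, relative and cumulative frequencies of a qualitative variable."""
    counts = series.value_counts(dropna=False)           # absolute frequency per category
    table = pd.DataFrame({
        "count": counts,
        "proportion": counts / counts.sum(),             # relative frequency
    })
    table["cumulative"] = table["proportion"].cumsum()   # cumulative relative frequency
    return table


# Hypothetical toy sample using the Kaggle column name `smoking_status`;
# the values are illustrative only, not drawn from the dataset.
smoking = pd.Series(
    ["never smoked", "smokes", "never smoked", "Unknown", "formerly smoked", "never smoked"],
    name="smoking_status",
)

table = frequency_table(smoking)
print(table)
print("mode:", smoking.mode().tolist())   # most frequent category (all of them if tied)
```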
For the univariate analysis of quantitative variables and for the bivariate analysis, the project uses Python, relying mainly on Pandas and NumPy for data manipulation and statistical summaries, and on Matplotlib and Seaborn for visualization.
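As an illustration of what these two phases typically involve, here is a minimal pandas sketch; it is not the repository's notebook code. It summarizes one quantitative variable (location, spread, skewness, excess kurtosis) and flags outliers with Tukey's 1.5 × IQR fences, one common convention that the project may or may not use. It then computes stroke rates per category with a row-normalized crosstab and a Pearson correlation matrix. Column names follow the Kaggle dataset (`age`, `avg_glucose_level`, `bmi`, `hypertension`, `stroke`), but the eight rows are made up, and `describe_numeric` and `stroke_rate_by` are hypothetical helpers.

```python
import numpy as np
import pandas as pd


def describe_numeric(series: pd.Series) -> pd.Series:
    """Location, spread and shape of a quantitative variable, plus Tukey outlier fences."""
    s = series.dropna()                               # ignore missing values (e.g. absent BMI)
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr     # Tukey's fences
    return pd.Series({
        "mean": s.mean(),
        "median": s.median(),
        "std": s.std(),                               # sample standard deviation (ddof=1)
        "skewness": s.skew(),                         # bias-corrected sample skewness
        "excess_kurtosis": s.kurt(),                  # Fisher definition: normal -> 0
        "lower_fence": lower,
        "upper_fence": upper,
        "n_outliers": int(((s < lower) | (s > upper)).sum()),
    })


def stroke_rate_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Row-normalized crosstab: share of stroke = 0 / 1 within each category of `column`."""
    return pd.crosstab(df[column], df["stroke"], normalize="index")


# Made-up rows using Kaggle column names; not real patient data.
df = pd.DataFrame({
    "age": [25, 40, 58, 63, 70, 81, 45, 33],
    "avg_glucose_level": [85.0, 95.5, 110.2, 150.3, 210.0, 190.4, 100.1, 88.8],
    "bmi": [22.1, 27.5, np.nan, 30.2, 28.9, 26.0, 31.4, 24.3],
    "hypertension": [0, 0, 1, 1, 1, 0, 0, 0],
    "stroke": [0, 0, 0, 1, 1, 1, 0, 0],
})

bmi_summary = describe_numeric(df["bmi"])
rates = stroke_rate_by(df, "hypertension")
# Pearson correlations; pandas drops missing values pair by pair.
correlations = df[["age", "avg_glucose_level", "bmi"]].corr()

print(bmi_summary, rates, correlations, sep="\n\n")
```

Normalizing the crosstab by row (`normalize="index"`) turns counts into within-group proportions, so categories of different sizes can be compared directly.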

**Note:** The project requires **R**, **Python**, and **Jupyter** to be installed on your system, along with the [packages](R/packages.R) listed above. Installation instructions are available from:
- [R (CRAN mirrors)](https://cran.r-project.org/mirrors.html)
- [Python](https://www.python.org/downloads/)
- [Jupyter](https://jupyter.org/install)
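Once Python is installed, one quick way to confirm that the Python libraries named in the project description are importable is a check based on `importlib.util.find_spec`. This is a minimal sketch, not a script shipped with the repository; the `REQUIRED` list and the `missing_packages` helper are hypothetical, and the R side is covered separately by `R/packages.R`.

```python
import importlib.util

# Python libraries named in the project description (hypothetical list for this check).
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the names in `names` that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
        print("Install them, e.g. with: pip install " + " ".join(missing))
    else:
        print("All required Python libraries are available.")
```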
## 🌱 | Getting Started
1. Clone the repository:
```shell
git clone https://github.com/vitornegromonte/EDA_stroke.git
```
2. Navigate to the project directory:
```shell
cd EDA_stroke
```
3. To run the R scripts, first execute `R/packages.R` (for example, with `Rscript R/packages.R`) to install all the required packages.
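With the dependencies in place, a first look at the data in Python could start with something like the following sketch. It assumes the Kaggle CSV keeps its original name, `healthcare-dataset-stroke-data.csv`, and that missing BMI values are stored as the string `N/A`; where the file lives in this repository is not stated, so adjust `DATA_PATH` as needed. `load_stroke_data` and `overview` are hypothetical helpers, and the inline rows use a subset of the Kaggle columns with invented values so the snippet runs without the real file.

```python
import io

import pandas as pd

# Assumption: original Kaggle file name; point this at the CSV in your checkout.
DATA_PATH = "healthcare-dataset-stroke-data.csv"


def load_stroke_data(source=DATA_PATH) -> pd.DataFrame:
    """Read the stroke CSV from a path or file-like object.

    "N/A" is already one of pandas' default missing-value markers; it is
    listed explicitly only to make the assumption about BMI visible.
    """
    return pd.read_csv(source, na_values=["N/A"])


def overview(df: pd.DataFrame) -> dict:
    """First-pass checks before any univariate or bivariate analysis."""
    return {
        "shape": df.shape,
        "missing_per_column": df.isna().sum().to_dict(),
        "stroke_share": df["stroke"].value_counts(normalize=True).to_dict(),  # class balance
    }


# Invented rows with a subset of the Kaggle columns, so the sketch runs offline.
sample = io.StringIO(
    "id,gender,age,bmi,stroke\n"
    "1,Male,67,36.6,1\n"
    "2,Female,61,N/A,1\n"
    "3,Female,30,22.4,0\n"
)

info = overview(load_stroke_data(sample))
print(info)
```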