https://github.com/linconavila/correlation-regression-analysis-research-project
- Host: GitHub
- URL: https://github.com/linconavila/correlation-regression-analysis-research-project
- Owner: LinconAvila
- License: other
- Created: 2025-01-13T23:49:20.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-14T01:32:05.000Z (over 1 year ago)
- Last Synced: 2025-02-14T02:36:04.459Z (over 1 year ago)
- Topics: correlation-analysis, r, regression-analysis, statistics
- Language: R
- Homepage:
- Size: 15.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Correlation and Regression Analysis Research Project
This repository contains the code and documentation for a research project that uses correlation and regression analysis to study the relationship between the Human Development Index (HDI) and Life Expectancy. The analysis is performed in R and combines descriptive analysis, hypothesis testing, and regression modeling.
## Project Overview
This project is divided into two main stages:
1. **Exploratory Data Analysis (EDA)**:
   - Analysis of HDI and Life Expectancy for the 50 highest-ranked countries globally.
   - Univariate analysis: Includes measures like mean, median, standard deviation, and skewness for individual variables.
   - Bivariate analysis: Examines relationships between HDI and Life Expectancy using scatterplots and correlation coefficients.
2. **Regression Analysis**:
   - Construction and evaluation of a linear regression model to describe the relationship between HDI (independent variable) and Life Expectancy (dependent variable).
   - Assessment of model assumptions, including residual independence, normality, and homoscedasticity.
   - Statistical tests to evaluate the significance of regression coefficients and overall model fit.
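The univariate and bivariate steps of stage 1 could be sketched in base R roughly as follows. This is a minimal sketch, not the code in `script.r`: the data are simulated with arbitrary parameters, and the names `hdi`, `life_exp`, `skewness`, and `describe` are assumptions made for illustration. Plotting is left out to keep the example dependency-free.

```r
# Simulated stand-in data (NOT the project's data): 50 hypothetical countries.
set.seed(42)
hdi <- runif(50, min = 0.85, max = 0.96)
life_exp <- 40 + 45 * hdi + rnorm(50, sd = 1.5)  # arbitrary parameters

# Moment-based sample skewness (base R has no built-in skewness function).
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Univariate summary for one numeric variable.
describe <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), skewness = skewness(x))
}

univariate <- rbind(hdi = describe(hdi), life_exp = describe(life_exp))
print(round(univariate, 3))

# Bivariate: Pearson correlation with its significance test.
ct <- cor.test(hdi, life_exp, method = "pearson")
print(ct$estimate)  # sample correlation coefficient r
print(ct$p.value)   # p-value for H0: true correlation is zero
```

With the real data, the two vectors would come from the columns of `database.txt` rather than from the random number generator.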
## Repository Contents
### Code
- **`script.r`**: Contains code for data preprocessing, exploratory data analysis, and visualization.
- **`script2.r`**: Implements linear regression modeling, statistical testing (ANOVA, Shapiro-Wilk, Ljung-Box), and model diagnostics.
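The tests named for `script2.r` all have base R counterparts, so a workflow of this kind might look like the sketch below. It is not the project's script: the data are simulated, and the object names (`dat`, `fit`, `res`, `sw`, `lb`) are assumptions for illustration only.

```r
# Simulated stand-in data (NOT the project's data).
set.seed(1)
dat <- data.frame(hdi = runif(50, min = 0.85, max = 0.96))
dat$life_exp <- 40 + 45 * dat$hdi + rnorm(50, sd = 1.5)  # arbitrary parameters

# Simple linear regression: life expectancy explained by HDI.
fit <- lm(life_exp ~ hdi, data = dat)

# Coefficient table: estimates, standard errors, t statistics, p-values.
print(summary(fit)$coefficients)

# ANOVA table: F test for the overall fit of the model.
print(anova(fit))

# Residual diagnostics.
res <- residuals(fit)
sw <- shapiro.test(res)                  # normality of residuals
lb <- Box.test(res, type = "Ljung-Box")  # independence (lag-1 autocorrelation)
print(sw$p.value)
print(lb$p.value)

# Homoscedasticity is not checked here; lmtest::bptest(fit) is one option,
# though the README does not state which test the project uses.
```

The Ljung-Box test looks at autocorrelation in the order the rows appear, so on cross-sectional data like this its result depends on how the countries are sorted.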
### Database
- **`database.txt`**: HDI and Life Expectancy data for the top 50 countries globally, sourced from:
   - **Human Development Index (HDI)**: Extracted from the UNDP Human Development Report 2021/2022 (published in 2022). The index measures development across health, education, and living standards.
   - **Life Expectancy**: Collected from the CIA World Factbook 2022. It is the average number of years a newborn is expected to live.
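The layout of `database.txt` is not described here, so the following is only a sketch of how a whitespace-delimited text file with a header row could be loaded and sanity-checked in R. The column names (`Country`, `HDI`, `LifeExpectancy`) and the three placeholder rows are hypothetical, and the real file may need different `read.table` arguments.

```r
# Write a tiny placeholder file so the example is self-contained.
# Column names and values are hypothetical, not taken from database.txt.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Country HDI LifeExpectancy",
  "Country_A 0.95 83.1",
  "Country_B 0.93 82.0",
  "Country_C 0.90 79.4"
), tmp)

# Read a whitespace-delimited table whose first line holds the column names.
dat <- read.table(tmp, header = TRUE, stringsAsFactors = FALSE)
str(dat)

# Basic sanity checks before any analysis.
stopifnot(is.numeric(dat$HDI), is.numeric(dat$LifeExpectancy))
stopifnot(!anyNA(dat))
```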
### Research Papers and Slides
- Link to Folder: [Here](https://drive.google.com/drive/folders/1Fd6u5p-lweRe2Og5dW7qOepyhS0gsz8Y?usp=sharing)
This folder contains research papers and a slide presentation (in Portuguese) summarizing the key findings of the study.
## Key Findings
1. **Correlation Analysis**:
   - Pearson's correlation coefficient `r = 0.668` indicates a moderate positive relationship between HDI and Life Expectancy.
   - Hypothesis testing confirmed the statistical significance of this relationship (`p-value < 0.05`).
2. **Regression Model**:
   - Linear regression equation: `Life Expectancy = 38.315 + 46.950 * HDI`.
   - `R² = 0.4465`: About 44.65% of the variation in Life Expectancy is explained by HDI.
   - Residual analysis showed no significant violations of model assumptions.
3. **Model Limitations**:
   - The model explains only 44.65% of the variance, suggesting other factors influence Life Expectancy.
   - Future analyses could incorporate additional socioeconomic and environmental variables.
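To make the reported numbers concrete, the fitted equation can be evaluated directly. The sketch below uses only the coefficients and `r` quoted above. The HDI value of 0.90 is an arbitrary illustration, and `predict_life_exp` is a hypothetical helper name.

```r
# Coefficients as reported in the Key Findings.
intercept <- 38.315
slope <- 46.950

# Fitted line: predicted life expectancy (years) for a given HDI.
predict_life_exp <- function(hdi) intercept + slope * hdi

# Illustrative HDI of 0.90: 38.315 + 46.950 * 0.90 = about 80.57 years.
print(predict_life_exp(0.90))

# Slope interpretation: a 0.01 increase in HDI corresponds to about
# 0.47 additional years of predicted life expectancy.
print(slope * 0.01)

# In simple linear regression R^2 equals r^2. The rounded r = 0.668 gives
# about 0.4462, consistent with the reported 0.4465 up to rounding of r.
r <- 0.668
print(r^2)
```

Because the model was fitted on the 50 highest-ranked countries only, predictions for HDI values outside that range are extrapolations and should be treated with caution.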
## Requirements
- **R version**: `>= 4.0.0`
- **R libraries**: `ggplot2`, `dplyr`, `car`, `lmtest`
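One way to check these requirements before running the scripts is sketched below. It only reports what is missing and installs nothing. The variable names are assumptions for illustration.

```r
# Packages listed under Requirements.
required <- c("ggplot2", "dplyr", "car", "lmtest")

# Compare against the packages installed in the current library paths.
not_installed <- setdiff(required, rownames(installed.packages()))

if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
  message("They can be installed with install.packages().")
} else {
  message("All required packages are installed.")
}

# Check the minimum R version stated above.
r_ok <- getRversion() >= "4.0.0"
message("R version ", as.character(getRversion()),
        if (r_ok) " meets" else " does not meet", " the >= 4.0.0 requirement.")
```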
## Usage
1. Clone the repository:
```bash
git clone https://github.com/LinconAvila/Correlation-Regression-Analysis-Research-Project.git
cd Correlation-Regression-Analysis-Research-Project
```
2. Open the scripts (`script.r` and `script2.r`) in RStudio.
3. Run the scripts in order (`script.r`, then `script2.r`) to reproduce the analyses and generate the visualizations.
## References
- CIA World Factbook (2022): Life Expectancy data. [https://www.cia.gov/the-world-factbook/about/archives/2022/field/life-expectancy-at-birth/country-comparison](https://www.cia.gov/the-world-factbook/about/archives/2022/field/life-expectancy-at-birth/country-comparison)
- UNDP (2022): Human Development Index. [https://hdr.undp.org/system/files/documents/global-report-document/hdr2021-22overviewen.pdf](https://hdr.undp.org/system/files/documents/global-report-document/hdr2021-22overviewen.pdf)
- R Documentation: [https://www.rdocumentation.org/](https://www.rdocumentation.org/)
## Author
Lincon Avila de Souza
Fundação Universidade Federal do Rio Grande (FURG)
2024-2025