https://github.com/poglolopez/nesarc_research
Analyzing the relationship between Social Anxiety Disorder (SAD) and family history of behavioral problems using NESARC data. Includes statistical hypothesis testing (ANOVA, Chi-Square, Pearson Correlation, Moderation Analysis). Developed as part of the Data Analysis and Interpretation Specialization from Wesleyan University (Coursera).
anova chi-square coursera-assignment data-analysis hypothesis-testing mental-health moderation-analysis nesarc pandas pearson-correlation python social-anxiety statistical-analysis
- Host: GitHub
- URL: https://github.com/poglolopez/nesarc_research
- Owner: PogloLopez
- License: mit
- Created: 2025-02-25T01:47:58.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-25T03:04:22.000Z (over 1 year ago)
- Last Synced: 2025-03-05T13:42:28.918Z (over 1 year ago)
- Topics: anova, chi-square, coursera-assignment, data-analysis, hypothesis-testing, mental-health, moderation-analysis, nesarc, pandas, pearson-correlation, python, social-anxiety, statistical-analysis
- Language: Jupyter Notebook
- Homepage:
- Size: 427 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# NESARC Research: Social Anxiety Disorder (SAD) & Family Behavioral History
*A data-driven exploration of the relationship between SAD severity and family history of behavioral problems.*
## Overview
This project was developed as part of the **Data Analysis and Interpretation Specialization** from **Wesleyan University** on **Coursera**. It explores the relationship between **Social Anxiety Disorder (SAD)** and a family history of behavioral problems using data from the **National Epidemiologic Survey on Alcohol and Related Conditions (NESARC)**. The analysis applies **statistical hypothesis testing**, including **ANOVA, Chi-Square test, Pearson Correlation, and moderation analysis (Gender as a moderator)** to assess these relationships.
## Dataset
- **Source**: [National Epidemiologic Survey on Alcohol and Related Conditions (NESARC)](https://www.nesarc.org)
- **File**: `source/NESARC Dataset.csv`
- **Size**: 252.61 MB *(tracked with [Git LFS](https://git-lfs.github.com/))*
- **Key Variables**:
| Variable | Type | Description |
|---------------------------|---------------|-----------------------------------------------------------------------------|
| `SAD_score` | Numerical | Composite score derived from SAD-related survey responses. |
| `SAD_spectrum`            | Categorical   | **Low (≤2)**, **Medium (2-5)**, **High (>5)** severity categories.          |
| `behavior_problems_count` | Numerical | Number of relatives with behavioral problems. |
| `relatives_with_problems` | Binary (Y/N)  | Presence of ≥1 relative with behavioral problems.                           |
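
Because the CSV is about 250 MB, it can help to inspect it before loading everything into memory. The sketch below is only an illustration: `preview_csv` and `count_rows` are hypothetical helper names that do not come from the notebook, and the only detail taken from this README is the file path.

```python
import pandas as pd

DATA_PATH = "source/NESARC Dataset.csv"  # path as listed in this README


def preview_csv(path, nrows=5):
    """Read only the first `nrows` rows so a large CSV can be inspected cheaply."""
    return pd.read_csv(path, nrows=nrows)


def count_rows(path, chunksize=100_000):
    """Count data rows by streaming the file in chunks instead of loading it whole."""
    return sum(len(chunk) for chunk in pd.read_csv(path, chunksize=chunksize))
```

For example, `preview_csv(DATA_PATH).columns` lists the column names without reading the full file.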
## Objectives
1. **Primary**: Determine if family history of behavioral problems correlates with higher SAD severity.
2. **Secondary**: Assess if **gender** moderates this relationship.
## Methodology
### 1. Data Preprocessing
- **Cleaning**: Removed missing values and standardized variables.
- **Feature Engineering**:
  - Created `SAD_score` from symptom-related survey responses.
  - Binned `SAD_score` into **Low/Medium/High** categories.
  - Derived `relatives_with_problems` from `behavior_problems_count`.
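
The binning and the binary flag described above could be sketched with pandas as follows. This is a minimal illustration on toy values, not the notebook's actual code. It assumes the thresholds from the Key Variables table and treats **Medium** as scores above 2 and up to 5, which is an assumption about how the boundary at 2 is handled.

```python
import numpy as np
import pandas as pd

# Toy values for illustration only; these are not NESARC data.
df = pd.DataFrame({
    "SAD_score": [0, 2, 3, 5, 6, 9],
    "behavior_problems_count": [0, 1, 0, 3, 2, 0],
})

# Right-closed bins: Low = (-inf, 2], Medium = (2, 5], High = (5, inf).
# The Medium boundaries are an assumption based on the README table.
df["SAD_spectrum"] = pd.cut(
    df["SAD_score"],
    bins=[-np.inf, 2, 5, np.inf],
    labels=["Low", "Medium", "High"],
)

# Flag respondents with at least one relative with behavioral problems.
df["relatives_with_problems"] = df["behavior_problems_count"] >= 1

print(df)
```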
### 2. Statistical Analysis
#### **ANOVA: SAD Spectrum vs. Behavior Problems Count**
A one-way ANOVA revealed significant differences in behavior problems across SAD severity groups:
| Source | Sum Sq | df | F | p-value |
|-----------------|----------|-----|--------|---------------|
| SAD_spectrum | 125.78 | 2 | 22.24 | **2.48e-10** |
| Residual        | 10919.14 | 3862| -      | -             |
**Post-hoc Tukey's HSD**: All group pairs showed significant differences (p < 0.05).
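
A one-way ANOVA followed by Tukey's HSD can be sketched with SciPy as shown here. The README does not say which library the notebook uses, so `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later) are assumptions, and the three groups are toy values rather than NESARC data.

```python
from scipy.stats import f_oneway, tukey_hsd

# Toy behavior-problem counts per SAD severity group (illustration only).
low = [1, 2, 3]
medium = [2, 3, 4]
high = [5, 6, 7]

# One-way ANOVA: do the group means differ?
anova = f_oneway(low, medium, high)
print(f"F = {anova.statistic:.2f}, p = {anova.pvalue:.4f}")

# Tukey's HSD: which pairs of groups differ?
# tukey.pvalue[i, j] is the adjusted p-value for group i vs. group j.
tukey = tukey_hsd(low, medium, high)
print(tukey)
```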
#### **Chi-Square Test: Relatives With Problems vs. SAD Spectrum**
- **χ² = 34.56** *(p = 3.13e-08)*
- **Cramér's V = 0.095** (small effect size)
*Conclusion*: Significant association between family history and SAD severity.
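
The reported effect size can be checked from the chi-square statistic. The sketch below makes two assumptions that the README does not state: the contingency table is 2×3 (binary family history by three severity levels), and the sample size is 3865, inferred from the ANOVA residual degrees of freedom (3862) plus three groups. The toy table is for illustration only.

```python
import math

import numpy as np
from scipy.stats import chi2, chi2_contingency


def cramers_v(chi2_stat, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    return math.sqrt(chi2_stat / (n * min(n_rows - 1, n_cols - 1)))


# Toy 2x3 table (rows: relatives with problems yes/no; columns: Low/Medium/High).
observed = np.array([[30, 20, 10],
                     [20, 30, 40]])
stat, p_value, dof, expected = chi2_contingency(observed)
v_toy = cramers_v(stat, observed.sum(), *observed.shape)

# Check against the reported values (n = 3865 is an assumption, see above).
p_reported = chi2.sf(34.56, df=2)
v_reported = cramers_v(34.56, 3865, 2, 3)
print(f"p = {p_reported:.2e}, V = {v_reported:.3f}")
```

Under these assumptions, the recomputed p-value and Cramér's V agree with the figures listed above.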
#### **Pearson Correlation: SAD Score vs. Behavior Problems Count**
- **r = 0.08** *(p = 3.82e-07)*
- **r² = 0.0067** (0.67% variance explained)
*Conclusion*: Weak but statistically significant correlation.
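
As a sketch of how such a correlation is typically computed, the snippet below uses `scipy.stats.pearsonr` on toy values (not NESARC data); the choice of SciPy is an assumption. It also shows why the two reported figures are compatible: r² = 0.0067 implies |r| ≈ 0.082, which rounds to the reported 0.08.

```python
import math

from scipy.stats import pearsonr

# Toy values for illustration only.
sad_score = [1, 2, 3, 4, 5]
behavior_problems_count = [2, 4, 5, 4, 5]

r, p_value = pearsonr(sad_score, behavior_problems_count)
r_squared = r ** 2  # share of variance explained
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}, p = {p_value:.3f}")

# Consistency check on the reported figures: sqrt(0.0067) rounds to 0.08.
r_from_reported_r2 = math.sqrt(0.0067)
print(f"|r| implied by reported r^2: {r_from_reported_r2:.4f}")
```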
#### **Moderation Analysis: Gender as a Moderator**
An ANOVA with interaction terms tested whether gender moderates the SAD-behavior relationship.
**Key Findings**:
- Interaction term **p = 0.187**: gender does not significantly moderate the relationship.
- Main effects of `SAD_spectrum` remain significant.
## Repository Structure
```
NESARC_research/
├── source/                 # Raw data (tracked via Git LFS)
│   └── NESARC Dataset.csv
├── .gitattributes          # Git LFS configuration
├── DMV.ipynb               # Jupyter Notebook (full analysis)
├── LICENSE                 # MIT License
├── README.md               # Project documentation
└── requirements.txt        # Python dependencies
```
## Installation & Usage
### 1. Clone the Repository
```bash
git clone https://github.com/PogloLopez/nesarc_research.git
cd nesarc_research
```
### 2. Install Git LFS & Download Data
```bash
git lfs install # Set up Git LFS
git lfs pull # Download dataset
```
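
If Git LFS was not set up before cloning, the CSV in `source/` may be a small text pointer instead of the real data. A quick check might look like the sketch below. It relies on the fact that LFS pointer files begin with a `version https://git-lfs.github.com/spec/v1` line; `is_lfs_pointer` is a hypothetical helper name, not part of this repository.

```python
from pathlib import Path

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file is a Git LFS pointer rather than real content."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


data_path = Path("source") / "NESARC Dataset.csv"
if data_path.exists():
    if is_lfs_pointer(data_path):
        print("Dataset is still an LFS pointer; run `git lfs pull`.")
    else:
        size_mb = data_path.stat().st_size / 1e6
        print(f"Dataset looks downloaded ({size_mb:.1f} MB).")
```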
### 3. Set Up a Virtual Environment
```bash
python -m venv .venv
source .venv/bin/activate        # macOS/Linux
# .\.venv\Scripts\activate       # Windows: run this line instead
```
### 4. Install Dependencies
```bash
pip install -r requirements.txt
```
### 5. Run the Analysis
Launch Jupyter Notebook:
```bash
jupyter notebook DMV.ipynb
```
## License
This project is licensed under the **MIT License**. See [LICENSE](LICENSE) for details.
---
*For questions or collaborations, contact [Pablo López](mailto:poglolopez@gmail.com) or connect on [LinkedIn](https://linkedin.com/in/pablo-a-lopez-s).*