https://github.com/gurpreet0022/unveiling-pcos

Data Driven approach to get insights about PCOS

analysis eda insights matplotlib numpy pandas python3 scipy-stats seaborn visualisation


# PCOS Data Analysis - Exploratory Data Analysis (EDA)

## 📌 Project Overview
Polycystic Ovary Syndrome (PCOS) is a prevalent hormonal disorder affecting women worldwide. This project focuses on analyzing PCOS data to uncover key insights related to symptoms, lifestyle factors, and potential associations.

## 📊 Objectives
- Explore the prevalence of PCOS in different demographics.
- Identify common symptoms and their associations.
- Examine the impact of BMI, lifestyle, and mental health on PCOS.
- Use statistical methods to find meaningful correlations.

## About PCOS
PCOS is a hormonal disorder that often causes enlarged ovaries containing many small follicles (commonly called cysts). Symptoms include irregular periods, elevated androgen levels, weight gain, and insulin resistance. Understanding PCOS is crucial for early diagnosis and lifestyle management.

## 📂 Dataset Overview
- **Type:** Categorical-heavy dataset (mostly Yes/No values)
- **Key Features:**
  - **Symptoms:** Menstrual irregularity, hormonal imbalance, hyperandrogenism, hirsutism, etc.
  - **Lifestyle Factors:** Diet, exercise, and sleep habits
  - **Health Metrics:** BMI, mental health status, family history

## 🛠️ Tools & Technologies Used
- Python (Pandas, NumPy, SciPy, Seaborn, Matplotlib)
- Jupyter Notebook for analysis
- Feature Engineering & Normalization
- Cramér's V for measuring association between categorical variables
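Cramér's V is built from Pearson's chi-squared statistic on a contingency table: `V = sqrt(chi2 / (n * (min(r, k) - 1)))`, where `n` is the number of observations and `r`, `k` are the numbers of row and column categories. The snippet below is a minimal sketch using only pandas and NumPy, not the project's own code; the example column names (`PCOS`, `Hormonal_Imbalance`) and data are hypothetical. It computes the uncorrected chi-squared statistic, which is what `scipy.stats.chi2_contingency(table, correction=False)` returns.

```python
import numpy as np
import pandas as pd


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V between two categorical Series: 0 = no association, 1 = perfect."""
    # Observed counts for every combination of categories
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts if x and y were independent: row total * column total / n
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    # Pearson chi-squared statistic (no Yates continuity correction)
    chi2 = ((observed - expected) ** 2 / expected).sum()
    r, k = observed.shape
    denom = n * (min(r, k) - 1)
    # A variable with a single observed category carries no association information
    return float(np.sqrt(chi2 / denom)) if denom > 0 else 0.0


# Toy example with hypothetical column names and made-up values
df = pd.DataFrame({
    "PCOS": ["Yes", "Yes", "No", "No", "No", "Yes"],
    "Hormonal_Imbalance": ["Yes", "Yes", "No", "No", "Yes", "Yes"],
})
print(round(cramers_v(df["PCOS"], df["Hormonal_Imbalance"]), 3))  # 0.707
```

Unlike a Pearson correlation coefficient, V has no sign and works for any number of categories, which suits a dataset dominated by Yes/No answers.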

## 🔬 Methodology
### 1️⃣ Data Preprocessing
- Label encoding categorical values
- Feature engineering (Sleep Score, Diet Score, Exercise Score, Healthy Lifestyle Score)
- Normalization of numerical values
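The README does not say how the lifestyle scores are defined, so the following is only a sketch of how these three preprocessing steps could look in pandas. Every column name (`Healthy_Diet`, `Junk_Food`, `Exercise_Days`, `Sleep_Hours`) and every score formula here is a hypothetical assumption, not the project's actual definition.

```python
import pandas as pd

YES_NO = {"Yes": 1, "No": 0}


def min_max(s: pd.Series) -> pd.Series:
    """Scale a numeric Series to [0, 1]; a constant Series maps to all zeros."""
    rng = s.max() - s.min()
    return (s - s.min()) / rng if rng else s * 0.0


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # 1. Label encoding: any non-numeric column holding only "Yes"/"No" becomes 1/0
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]) and set(out[col].dropna()) <= set(YES_NO):
            out[col] = out[col].map(YES_NO)

    # 2. Feature engineering (hypothetical score definitions, each in [0, 1])
    out["Diet_Score"] = (out["Healthy_Diet"] + (1 - out["Junk_Food"])) / 2
    out["Exercise_Score"] = min_max(out["Exercise_Days"])
    out["Sleep_Score"] = min_max(out["Sleep_Hours"])
    out["Healthy_Lifestyle_Score"] = out[["Diet_Score", "Exercise_Score", "Sleep_Score"]].mean(axis=1)

    # 3. Normalization of a remaining numerical value (BMI) to [0, 1]
    out["BMI_Norm"] = min_max(out["BMI"])
    return out


# Made-up rows using the hypothetical schema
raw = pd.DataFrame({
    "PCOS": ["Yes", "No", "No"],
    "Healthy_Diet": ["No", "Yes", "Yes"],
    "Junk_Food": ["Yes", "No", "Yes"],
    "Exercise_Days": [0, 5, 3],
    "Sleep_Hours": [5, 8, 7],
    "BMI": [31.0, 22.0, 25.0],
})
print(preprocess(raw)[["PCOS", "Diet_Score", "Healthy_Lifestyle_Score", "BMI_Norm"]])
```

Min-max scaling puts each score on a common [0, 1] range so they can be averaged into a single lifestyle score; z-score standardization would be an equally reasonable choice.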

### 2️⃣ Exploratory Data Analysis (EDA)
- **PCOS Prevalence:** 22% of women in the dataset have PCOS
- **Common Symptoms:** Menstrual Irregularity, Hormonal Imbalance, and Hirsutism
- **BMI & PCOS:** Higher BMI observed in PCOS cases, but no direct age correlation
- **Lifestyle Impact:** Women without PCOS tend to have healthier habits
- **Childhood Trauma:** Possible association with PCOS cases
- **Cramér's V Analysis:** Strong association of PCOS with hormonal imbalance, hyperandrogenism, and mental health
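The prevalence, symptom, and BMI findings above come down to a few pandas aggregations on label-encoded data. This is a hedged sketch on a tiny made-up table, so its numbers do not reproduce the 22% figure or any other result of the project; the column names are hypothetical.

```python
import pandas as pd

# Tiny made-up sample, already label-encoded (1 = Yes, 0 = No); column names are hypothetical
df = pd.DataFrame({
    "PCOS":                   [1, 1, 0, 0, 0],
    "Menstrual_Irregularity": [1, 1, 0, 1, 0],
    "Hormonal_Imbalance":     [1, 0, 0, 0, 0],
    "Hirsutism":              [0, 0, 0, 0, 1],
    "BMI":                    [30.0, 28.0, 22.0, 24.0, 23.0],
})
SYMPTOMS = ["Menstrual_Irregularity", "Hormonal_Imbalance", "Hirsutism"]

# Prevalence: the mean of a 0/1 column is the share of 1s
prevalence = df["PCOS"].mean()

# Share of PCOS cases reporting each symptom, most common first
symptom_rates = df.loc[df["PCOS"] == 1, SYMPTOMS].mean().sort_values(ascending=False)

# BMI summary per group (index 0 = no PCOS, 1 = PCOS)
bmi_by_group = df.groupby("PCOS")["BMI"].agg(["mean", "median"])

print(f"PCOS prevalence: {prevalence:.0%}")  # PCOS prevalence: 40%
print(symptom_rates)
print(bmi_by_group)
```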

## 📌 Key Insights
- ✅ PCOS is most common in the **20-25 age group** and among **unmarried women**
- ✅ Lifestyle factors such as **diet, exercise, and sleep quality** may influence PCOS risk
- ✅ **Mental health and childhood trauma** may be potential risk factors
- ✅ **Statistical associations** indicate strong links between PCOS and hormonal disorders
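A breakdown such as "most common in the 20-25 age group" can be produced by binning age and grouping on the bins. The sketch below reports both the number of PCOS cases and the within-group rate, since a group can lead on one and not the other; the bin edges, the `Age`/`Marital` column names, and the data are hypothetical.

```python
import pandas as pd

# Made-up sample with hypothetical columns (PCOS: 1 = Yes, 0 = No)
df = pd.DataFrame({
    "Age":     [22, 24, 23, 31, 19, 27],
    "Marital": ["Unmarried", "Unmarried", "Married", "Married", "Unmarried", "Married"],
    "PCOS":    [1, 1, 0, 0, 1, 0],
})

# Right-closed bins: (15, 20], (20, 25], ... labelled "15-20", "20-25", ...
df["Age_Group"] = pd.cut(
    df["Age"],
    bins=[15, 20, 25, 30, 35, 45],
    labels=["15-20", "20-25", "25-30", "30-35", "35-45"],
)


def pcos_by_group(data: pd.DataFrame, col: str) -> pd.DataFrame:
    """PCOS case count and within-group rate for each category of `col`."""
    grouped = data.groupby(col, observed=True)["PCOS"]  # observed=True drops empty bins
    table = pd.DataFrame({"cases": grouped.sum(), "rate": grouped.mean()})
    return table.sort_values("cases", ascending=False)


print(pcos_by_group(df, "Age_Group"))
print(pcos_by_group(df, "Marital"))
```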

## 🚧 Challenges Faced
- The predominance of categorical features made predictive modeling difficult
- Quantifying lifestyle factors required complex feature engineering
- Effective visualizations were needed to communicate the insights clearly

## Conclusion
This analysis provides valuable insights into PCOS prevalence and related factors. Due to the categorical nature of the data, the project was concluded at the **EDA stage** rather than proceeding to predictive modeling.

## 📎 Next Steps
- Extend analysis with additional datasets to improve generalizability
- Explore time-series data for tracking PCOS symptoms over time
- Investigate possible interventions based on lifestyle factors

💡 **Open for feedback and collaboration! Let's discuss more about PCOS and data-driven insights in healthcare.**