https://github.com/gurpreet0022/unveiling-pcos
Data Driven approach to get insights about PCOS
analysis eda insights matplotlib numpy pandas python3 scipy-stats seaborn visualisation
- Host: GitHub
- URL: https://github.com/gurpreet0022/unveiling-pcos
- Owner: Gurpreet0022
- Created: 2025-02-04T13:19:29.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-04T13:36:58.000Z (about 1 year ago)
- Last Synced: 2025-02-04T14:32:07.263Z (about 1 year ago)
- Topics: analysis, eda, insights, matplotlib, numpy, pandas, python3, scipy-stats, seaborn, visualisation
- Language: Jupyter Notebook
- Homepage:
- Size: 738 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# PCOS Data Analysis - Exploratory Data Analysis (EDA)
## Project Overview
Polycystic Ovary Syndrome (PCOS) is a prevalent hormonal disorder affecting women worldwide. This project focuses on analyzing PCOS data to uncover key insights related to symptoms, lifestyle factors, and potential associations.
## Objectives
- Explore the prevalence of PCOS in different demographics.
- Identify common symptoms and their associations.
- Examine the impact of BMI, lifestyle, and mental health on PCOS.
- Use statistical methods to find meaningful correlations.
## About PCOS
PCOS is a hormonal disorder that can cause enlarged ovaries with small cysts. Symptoms include irregular periods, excess androgen levels, weight gain, and insulin resistance. Understanding PCOS is crucial for early diagnosis and lifestyle management.
## Dataset Overview
- **Type:** Categorical-heavy dataset (mostly Yes/No values)
- **Key Features:**
  - **Symptoms:** Menstrual irregularity, hormonal imbalance, hyperandrogenism, hirsutism, etc.
  - **Lifestyle Factors:** Diet, exercise, and sleep habits
  - **Health Metrics:** BMI, mental health status, family history
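A quick way to see how categorical-heavy a table like this is would be to find the Yes/No columns and check the share of "Yes" in each. The sketch below uses a tiny made-up frame; the column names (`PCOS`, `Hirsutism`, `BMI`) and the helper `yes_no_columns` are illustrative assumptions, not the project's actual schema or code.

```python
import pandas as pd

# Hypothetical toy data: column names and values are assumptions for
# illustration, not the real dataset used in this project.
df = pd.DataFrame({
    "PCOS": ["Yes", "No", "No", "Yes", "No"],
    "Hirsutism": ["Yes", "No", "No", "No", "No"],
    "BMI": [27.5, 21.0, 23.4, 30.1, 19.8],
})


def yes_no_columns(frame):
    """Return names of columns whose non-null values are only 'Yes'/'No'."""
    cols = []
    for col in frame.columns:
        values = set(frame[col].dropna().unique())
        # Skip all-null columns (empty set) and anything with other values.
        if values and values <= {"Yes", "No"}:
            cols.append(col)
    return cols


binary_cols = yes_no_columns(df)

# Fraction of "Yes" answers per binary column.
share_yes = (df[binary_cols] == "Yes").mean()
print(share_yes)
```

The same `share_yes` series gives a first look at how common each symptom or habit is before any encoding is done.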
## Tools & Technologies Used
- Python (Pandas, NumPy, Seaborn, Matplotlib)
- Jupyter Notebook for analysis
- Feature Engineering & Normalization
- Cramér's V for association analysis between categorical variables
## Methodology
### 1️⃣ Data Preprocessing
- Label encoding categorical values
- Feature engineering (Sleep Score, Diet Score, Exercise Score, Healthy Lifestyle Score)
- Normalization of numerical values
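The three preprocessing steps above could look roughly like the sketch below. The README does not say how the scores are defined, so the column names, the category mappings, and the "sum of three binary scores" rule for the Healthy Lifestyle Score are all assumptions made for illustration. Min-max scaling is likewise just one common choice of normalization.

```python
import pandas as pd

# Hypothetical raw answers; names and categories are assumptions.
df = pd.DataFrame({
    "Sleep_Quality": ["Good", "Poor", "Good", "Poor"],
    "Healthy_Diet": ["Yes", "No", "Yes", "Yes"],
    "Regular_Exercise": ["No", "No", "Yes", "Yes"],
    "BMI": [22.0, 31.0, 25.0, 28.0],
})

# Label encoding with explicit mappings, so the meaning of 0/1 is visible.
encoded = pd.DataFrame({
    "Sleep Score": df["Sleep_Quality"].map({"Poor": 0, "Good": 1}),
    "Diet Score": df["Healthy_Diet"].map({"No": 0, "Yes": 1}),
    "Exercise Score": df["Regular_Exercise"].map({"No": 0, "Yes": 1}),
})

# Assumed composite: the sum of the three component scores (0 to 3).
encoded["Healthy Lifestyle Score"] = encoded.sum(axis=1)


def min_max(series):
    """Scale a numeric Series to [0, 1]; a constant Series maps to 0."""
    span = series.max() - series.min()
    if span == 0:
        return series * 0.0
    return (series - series.min()) / span


encoded["BMI_norm"] = min_max(df["BMI"])
print(encoded)
```

Explicit mappings are used instead of an automatic encoder so that an unexpected category shows up as `NaN` rather than silently receiving a code.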
### 2️⃣ Exploratory Data Analysis (EDA)
- **PCOS Prevalence:** 22% of women in the dataset have PCOS
- **Common Symptoms:** Menstrual Irregularity, Hormonal Imbalance, and Hirsutism
- **BMI & PCOS:** Higher BMI observed in PCOS cases, but no direct age correlation
- **Lifestyle Impact:** Women without PCOS tend to have healthier habits
- **Childhood Trauma:** Possible association with PCOS cases
- **Cramér's V Analysis:** Strong association between PCOS and hormonal imbalance, hyperandrogenism, and mental health
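For reference, Cramér's V for two categorical variables is $V = \sqrt{\chi^2 / (n \cdot \min(r-1,\, c-1))}$, where $\chi^2$ is the Pearson chi-square statistic of the $r \times c$ contingency table and $n$ is the number of observations. Below is a minimal sketch of that formula without bias correction; the function name `cramers_v` and the toy Series are assumptions, and the notebook may compute it differently (for example via `scipy.stats.chi2_contingency`).

```python
import numpy as np
import pandas as pd


def cramers_v(x, y):
    """Cramér's V for two categorical Series (0 = none, 1 = perfect)."""
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    min_dim = min(observed.shape) - 1
    # Undefined when there is no data or one variable has a single level.
    if n == 0 or min_dim == 0:
        return float("nan")
    # Expected counts under independence: row total * column total / n.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * min_dim)))


# Hypothetical toy answers, not values from the project's dataset.
pcos = pd.Series(["Yes", "Yes", "No", "No"])
same_pattern = pd.Series(["Yes", "Yes", "No", "No"])
unrelated = pd.Series(["Yes", "No", "Yes", "No"])

v_perfect = cramers_v(pcos, same_pattern)
v_none = cramers_v(pcos, unrelated)
print(v_perfect, v_none)
```

Because V is symmetric and bounded between 0 and 1, it describes the strength of an association but not its direction, which is worth keeping in mind when reading the results above.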
## Key Insights
- ✅ PCOS is most common in the **20-25 age group** and among **unmarried women**
- ✅ Lifestyle factors such as **diet, exercise, and sleep quality** may influence PCOS risk
- ✅ **Mental health and childhood trauma** may be potential risk factors
- ✅ **Statistical associations** (Cramér's V) indicate strong links between PCOS and hormonal disorders
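A statement like "most common in a group" can be read two ways: the group holds the largest share of all PCOS cases, or the group has the highest PCOS rate among its own members. The sketch below, on made-up data with assumed column names `Age_Group` and `PCOS`, shows that the two readings can point to different groups. It does not claim which reading the notebook uses.

```python
import pandas as pd

# Hypothetical toy data: six women aged 20-25 (two with PCOS) and
# two women aged 26-30 (one with PCOS).
df = pd.DataFrame({
    "Age_Group": ["20-25"] * 6 + ["26-30"] * 2,
    "PCOS": ["Yes", "Yes", "No", "No", "No", "No", "Yes", "No"],
})

is_pcos = df["PCOS"].eq("Yes")

# Reading 1: share of all PCOS cases that fall in each age group.
share_of_cases = df.loc[is_pcos, "Age_Group"].value_counts(normalize=True)

# Reading 2: PCOS rate within each age group.
rate_within_group = is_pcos.groupby(df["Age_Group"]).mean()

print(share_of_cases)
print(rate_within_group)
```

In this toy example the 20-25 group contains most of the cases simply because it is the largest group, while the 26-30 group has the higher rate.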
## Challenges Faced
- High categorical dominance made predictive modeling difficult
- Complex feature engineering required to quantify lifestyle factors
- Needed effective visualizations to communicate insights better
## Conclusion
This analysis provides valuable insights into PCOS prevalence and related factors. Due to the categorical nature of the data, the project was concluded at the **EDA stage** rather than proceeding to predictive modeling.
## Next Steps
- Extend analysis with additional datasets to improve generalizability
- Explore time-series data for tracking PCOS symptoms over time
- Investigate possible interventions based on lifestyle factors
**Open for feedback and collaboration! Let's discuss more about PCOS and data-driven insights in healthcare.**