https://github.com/grascya/sleep-health_-lifestyle-dataset
Classifier to predict the presence of a sleep disorder based on the other columns in the dataset.
- Host: GitHub
- URL: https://github.com/grascya/sleep-health_-lifestyle-dataset
- Owner: grascya
- Created: 2023-10-30T14:19:28.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-12-10T19:08:32.000Z (11 months ago)
- Last Synced: 2025-01-21T01:14:09.235Z (10 months ago)
- Topics: data-visualization, exploratory-data-analysis, joblib, machine-learning-algorithms, pickle, python, statistical-analysis
- Language: Jupyter Notebook
- Homepage:
- Size: 5.2 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Sleep Health and Lifestyle
This synthetic dataset contains sleep and cardiovascular metrics and lifestyle factors for close to 400 fictional people.
The workspace is set up with one CSV file, `data.csv`, with the following columns:
- `Person ID`
- `Gender`
- `Age`
- `Occupation`
- `Sleep Duration`: Average number of hours of sleep per day
- `Quality of Sleep`: A subjective rating on a 1-10 scale
- `Physical Activity Level`: Average number of minutes the person engages in physical activity daily
- `Stress Level`: A subjective rating on a 1-10 scale
- `BMI Category`
- `Blood Pressure`: Indicated as systolic pressure over diastolic pressure
- `Heart Rate`: In beats per minute
- `Daily Steps`
- `Sleep Disorder`: One of `None`, `Insomnia` or `Sleep Apnea`
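The `Blood Pressure` column stores both readings in a single string, so it usually has to be split into numeric columns before modeling. Below is a minimal sketch, assuming values formatted like `"126/83"`. The helper name `split_blood_pressure` and the sample rows are hypothetical, not taken from the repository:

```python
import pandas as pd


def split_blood_pressure(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: split 'Blood Pressure' ('systolic/diastolic') into two int columns."""
    out = df.copy()
    # expand=True returns a DataFrame with one column per part of the split
    parts = out["Blood Pressure"].str.split("/", expand=True)
    out["Systolic"] = parts[0].astype(int)
    out["Diastolic"] = parts[1].astype(int)
    # The original string column is no longer needed once it has been split
    return out.drop(columns="Blood Pressure")


# Hypothetical sample rows, for illustration only
sample = pd.DataFrame({"Person ID": [1, 2], "Blood Pressure": ["126/83", "140/95"]})
sample = split_blood_pressure(sample)
print(sample)
```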
**Background**: You work for a health insurance company and are tasked with identifying whether a potential client is likely to have a sleep disorder. The company wants to use this information to determine the premium the client should pay.
**Objective**: Construct a classifier to predict the presence of a sleep disorder based on the other columns in the dataset.
**Methods Used**: Exploratory Data Analysis, Inferential Statistics, Data Visualization, Machine Learning, Predictive Modeling.
**Type of Problem**: Multi-class Classification Task.
**Languages, libraries, and technologies used**: Python, pandas, Matplotlib, Seaborn, NumPy, SciPy, scikit-learn, joblib
## KEY INSIGHTS:
To start this project, I checked that the data was clean and matched the description in the data dictionary, cleaned the values that did not, and then validated the full dataset.
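The README does not list the exact checks that were run. As a hedged sketch, validation against the data dictionary could look like the code below; the function `clean_and_validate` and the three-row CSV are hypothetical. One pitfall worth guarding against: depending on the pandas version, `read_csv` may parse the literal label `None` in `Sleep Disorder` as a missing value, so the sketch restores it as an explicit class label.

```python
import io

import pandas as pd

# Allowed labels and rating ranges, taken from the data dictionary
SLEEP_DISORDER_LABELS = {"None", "Insomnia", "Sleep Apnea"}
RATING_COLUMNS = ["Quality of Sleep", "Stress Level"]  # both rated on a 1-10 scale


def clean_and_validate(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: restore the 'None' label, then check the data dictionary rules."""
    out = df.copy()
    # Some pandas versions read the string "None" as NaN; turn it back into a class label
    out["Sleep Disorder"] = out["Sleep Disorder"].fillna("None")

    problems = []
    if out["Person ID"].duplicated().any():
        problems.append("duplicate Person ID values")
    unknown = set(out["Sleep Disorder"]) - SLEEP_DISORDER_LABELS
    if unknown:
        problems.append(f"unexpected Sleep Disorder labels: {sorted(unknown)}")
    for col in RATING_COLUMNS:
        if not out[col].between(1, 10).all():
            problems.append(f"{col} outside the 1-10 scale")
    if problems:
        raise ValueError("; ".join(problems))
    return out


# Hypothetical CSV content, for illustration only
csv_text = """Person ID,Quality of Sleep,Stress Level,Sleep Disorder
1,6,6,None
2,4,8,Sleep Apnea
3,7,5,Insomnia
"""
df = clean_and_validate(pd.read_csv(io.StringIO(csv_text)))
print(df["Sleep Disorder"].tolist())
```

Raising a single `ValueError` that lists every problem found makes it easy to fix all issues in one pass rather than one at a time.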
Once the data was clean, I carried out an exploratory data analysis, followed by statistical tests, which revealed that:
- Those whose occupation is Accountant, Doctor, Engineer, or Lawyer are less likely to have a sleep disorder, nurses have a high chance of sleep apnea, and salespersons and teachers are more likely to have insomnia.
- Overweight people have a high chance of suffering from a sleep disorder, and people with ideal or normal blood pressure are less likely to have one.
- People aged 50 to 60 have low stress levels and a sleep quality of around 9, but are susceptible to sleep apnea.
- Men and women aged 42 to 45 are very likely to have insomnia, and women aged 50 and those above 55 have a very high chance of having sleep apnea.
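The README does not name the statistical tests that were used. For pairs of categorical columns such as `BMI Category` and `Sleep Disorder`, a chi-square test of independence is one common choice; the sketch below assumes that test. The helper `chi_square_association` and the twelve-row table are hypothetical, and with counts this small the p-value is only illustrative.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def chi_square_association(df: pd.DataFrame, feature: str, target: str = "Sleep Disorder"):
    """Hypothetical helper: chi-square test of independence between a categorical feature and the target."""
    # Contingency table: rows are feature categories, columns are target classes
    table = pd.crosstab(df[feature], df[target])
    chi2, p_value, dof, _expected = chi2_contingency(table)
    return chi2, p_value, dof


# Hypothetical data, for illustration only
df = pd.DataFrame({
    "BMI Category": ["Normal"] * 6 + ["Overweight"] * 6,
    "Sleep Disorder": ["None"] * 5 + ["Insomnia"] + ["Sleep Apnea"] * 3 + ["Insomnia"] * 2 + ["None"],
})
chi2, p_value, dof = chi_square_association(df, "BMI Category")
print(f"chi2={chi2:.2f}, p={p_value:.3f}, dof={dof}")
```

A small p-value suggests the feature and the sleep-disorder class are not independent; it does not say which categories drive the association, which is what the crosstab itself shows.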
After that, I preprocessed the data and created a baseline model (a `LogisticRegression`) and a comparison model (a decision tree), then fitted and evaluated both. With an accuracy of 89%, the baseline model performs better.
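Below is a sketch of the baseline-versus-comparison setup, assuming scikit-learn pipelines with one-hot encoding for categorical columns and standard scaling for numeric ones. The feature subset, the helper `build_pipeline`, and the randomly generated rows (with a made-up labelling rule) are hypothetical, so the accuracies it prints say nothing about the 89% reported above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

CATEGORICAL = ["Gender", "Occupation", "BMI Category"]
NUMERIC = ["Age", "Sleep Duration", "Stress Level", "Heart Rate"]


def build_pipeline(model) -> Pipeline:
    """Hypothetical helper: one-hot encode categoricals, scale numerics, then fit `model`."""
    preprocess = ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("num", StandardScaler(), NUMERIC),
    ])
    return Pipeline([("preprocess", preprocess), ("model", model)])


# Hypothetical synthetic data standing in for data.csv
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "Gender": rng.choice(["Male", "Female"], n),
    "Occupation": rng.choice(["Nurse", "Doctor", "Teacher"], n),
    "BMI Category": rng.choice(["Normal", "Overweight", "Obese"], n),
    "Age": rng.integers(27, 60, n),
    "Sleep Duration": rng.uniform(5.5, 8.5, n),
    "Stress Level": rng.integers(3, 9, n),
    "Heart Rate": rng.integers(60, 90, n),
})
# Made-up labelling rule so the three-class target has some structure to learn
df["Sleep Disorder"] = np.where(
    df["BMI Category"] != "Normal", "Sleep Apnea",
    np.where(df["Stress Level"] >= 7, "Insomnia", "None"),
)

X = df.drop(columns="Sleep Disorder")
y = df["Sleep Disorder"]
# Stratify so each class keeps its proportion in the train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

scores = {}
for name, model in {
    "baseline: LogisticRegression": LogisticRegression(max_iter=1000),
    "comparison: DecisionTreeClassifier": DecisionTreeClassifier(random_state=42),
}.items():
    pipe = build_pipeline(model).fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, pipe.predict(X_test))
print(scores)
```

Wrapping preprocessing and model in one `Pipeline` means the encoder and scaler are fitted on the training split only, and the whole object can later be saved and reloaded as a unit.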
I plotted the importance of each variable to see which variables contributed most to the model's predictions, and saved the model as a pickle file using joblib.
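The README does not say how variable importance was measured. One model-agnostic option is scikit-learn's `permutation_importance`, shown below together with saving and reloading the fitted pipeline via `joblib.dump` and `joblib.load`. The data, the labelling rule, and the file name `sleep_disorder_model.pkl` are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric-only data standing in for the preprocessed features
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "Stress Level": rng.integers(3, 9, n),
    "Sleep Duration": rng.uniform(5.5, 8.5, n),
    "Daily Steps": rng.integers(3000, 10000, n),
})
# Made-up rule: the label depends only on Stress Level
y = np.where(X["Stress Level"] >= 7, "Insomnia", "None")

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

# Permutation importance: mean drop in accuracy when one column is shuffled
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
importances = pd.Series(result.importances_mean, index=X.columns).sort_values(ascending=False)
print(importances)
# importances.plot.barh() would draw the bar chart (requires Matplotlib)

# Persist the fitted pipeline with joblib and load it back
path = os.path.join(tempfile.mkdtemp(), "sleep_disorder_model.pkl")
joblib.dump(model, path)
restored = joblib.load(path)
```

Because the made-up label depends only on `Stress Level`, that column should top the ranking; on real data the ordering is whatever the fitted model relies on.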
Dataset Source: [Kaggle](https://www.kaggle.com/datasets/uom190346a/sleep-health-and-lifestyle-dataset/)