https://github.com/grascya/sleep-health_-lifestyle-dataset

Classifier to predict the presence of a sleep disorder based on the other columns in the dataset.

data-visualization exploratory-data-analysis joblib machine-learning-algorithms pickle python statistical-analysis

# Sleep Health and Lifestyle

This synthetic dataset contains sleep metrics, cardiovascular metrics, and lifestyle factors for close to 400 fictional people.

The workspace is set up with one CSV file, `data.csv`, with the following columns:

- `Person ID`
- `Gender`
- `Age`
- `Occupation`
- `Sleep Duration`: Average number of hours of sleep per day
- `Quality of Sleep`: A subjective rating on a 1-10 scale
- `Physical Activity Level`: Average number of minutes the person engages in physical activity daily
- `Stress Level`: A subjective rating on a 1-10 scale
- `BMI Category`
- `Blood Pressure`: Indicated as systolic pressure over diastolic pressure
- `Heart Rate`: In beats per minute
- `Daily Steps`
- `Sleep Disorder`: One of `None`, `Insomnia` or `Sleep Apnea`
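
As a rough illustration of how these columns might be loaded, the sketch below reads a two-row stand-in for `data.csv` (the values are made up, and only a few columns are included) and splits `Blood Pressure` into numeric parts. The column names `Systolic` and `Diastolic` are hypothetical and not part of the dataset. Depending on the pandas version, the label `None` can be parsed as a missing value, so the sketch disables the default NA handling.

```python
import io

import pandas as pd

# Two made-up rows standing in for data.csv (illustrative values only).
sample_csv = io.StringIO(
    "Person ID,Gender,Age,Blood Pressure,Sleep Disorder\n"
    "1,Male,27,126/83,None\n"
    "2,Female,45,140/95,Sleep Apnea\n"
)

# keep_default_na=False keeps the label "None" as a string instead of NaN.
df = pd.read_csv(sample_csv, keep_default_na=False)

# Split "systolic/diastolic" into two integer columns (hypothetical names).
bp = df["Blood Pressure"].str.split("/", expand=True).astype(int)
df["Systolic"] = bp[0]
df["Diastolic"] = bp[1]

print(df[["Blood Pressure", "Systolic", "Diastolic", "Sleep Disorder"]])
```

With the real file, `sample_csv` would be replaced by the path `"data.csv"`.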

**Background**: You work for a health insurance company and are tasked with identifying whether a potential client is likely to have a sleep disorder. The company wants to use this information to set the client's premium.

**Objective**: Construct a classifier to predict the presence of a sleep disorder based on the other columns in the dataset.

**Methods Used**: Exploratory Data Analysis, Inferential Statistics, Data Visualization, Machine Learning, Predictive Modeling.

**Type of Problem**: Multi-class Classification Task.

**Languages, libraries, and technologies used**: Python, pandas, Matplotlib, Seaborn, NumPy, SciPy, scikit-learn, joblib

## Key Insights
To start this project, I checked that the data was clean and matched the description in the data dictionary. I fixed the entries that were not clean and then validated the full dataset.

Once the data was clean, I carried out an exploratory data analysis, followed by statistical tests, which revealed that:
- Accountants, doctors, engineers, and lawyers are less likely to have a sleep disorder; nurses have a high chance of sleep apnea; and salespersons and teachers are more likely to have insomnia.
- Overweight people are highly likely to have a sleep disorder, while people with ideal or normal blood pressure are less likely to have one.
- People between the ages of 50 and 60 have low stress levels and a sleep quality of around 9, but are susceptible to sleep apnea.
- Men and women aged between 42 and 45 are very likely to have insomnia, and women aged 50 or above 55 have a very high chance of having sleep apnea.
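
The README does not say which statistical tests were run. One common choice for relating two categorical columns, such as `Occupation` and `Sleep Disorder`, is a chi-square test of independence; the sketch below applies it to a tiny made-up table, so it is an assumption about the approach rather than a reproduction of the analysis.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up toy data; the real analysis would use the columns from data.csv.
toy = pd.DataFrame({
    "Occupation": ["Nurse"] * 6 + ["Engineer"] * 6,
    "Sleep Disorder": ["Sleep Apnea"] * 5 + ["None"]
                      + ["None"] * 5 + ["Sleep Apnea"],
})

# Contingency table of counts: rows are occupations, columns are disorders.
table = pd.crosstab(toy["Occupation"], toy["Sleep Disorder"])

# Chi-square test of independence between the two categorical columns.
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi2={chi2:.3f}, p={p_value:.4f}, dof={dof}")
```

A small p-value suggests the two columns are not independent. With counts as small as in this toy table the chi-square approximation is unreliable, so the numbers here only show the mechanics.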

After that, I preprocessed the data and built two models: a logistic regression as the baseline and a decision tree for comparison. I fitted and evaluated both models. With an accuracy of 89%, the baseline model performs better.
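
The exact preprocessing steps are not described here, so the following is only a minimal sketch of how such a comparison could look with scikit-learn, assuming scaling for numeric columns and one-hot encoding for categorical ones. The data is randomly generated, which means the printed accuracies are meaningless and unrelated to the 89% reported above. The helper `build_pipeline` is a hypothetical name.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Random stand-in data using a few of the dataset's column names.
rng = np.random.default_rng(0)
n = 120
toy = pd.DataFrame({
    "Age": rng.integers(27, 60, size=n),
    "Sleep Duration": rng.uniform(5.5, 8.5, size=n).round(1),
    "BMI Category": rng.choice(["Normal", "Overweight", "Obese"], size=n),
    "Sleep Disorder": rng.choice(["None", "Insomnia", "Sleep Apnea"], size=n),
})

X = toy.drop(columns="Sleep Disorder")
y = toy["Sleep Disorder"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def build_pipeline(estimator):
    """Hypothetical helper: assumed preprocessing followed by the estimator."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), ["Age", "Sleep Duration"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["BMI Category"]),
    ])
    return Pipeline([("preprocess", preprocess), ("model", estimator)])


models = {
    "logistic_regression": build_pipeline(LogisticRegression(max_iter=1000)),
    "decision_tree": build_pipeline(DecisionTreeClassifier(random_state=0)),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, pipe in models.items():
    pipe.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, pipe.predict(X_test))

print(scores)
```

Wrapping the preprocessing and the estimator in one pipeline keeps the scaler and encoder fitted on the training split only, so no information from the test split leaks into the comparison.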

I plotted the importance of each variable to see which variables contributed the most to the model's predictions. Finally, I saved the model as a pickle file using joblib.
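
How the variable importances were computed is not stated. As one model-agnostic option, the sketch below uses scikit-learn's permutation importance on made-up numeric data, then saves and reloads the fitted model with joblib. The feature names and the file name `model.joblib` are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

# Made-up data: only the first column determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
feature_names = ["feature_a", "feature_b", "feature_c"]  # hypothetical names

model = LogisticRegression(max_iter=1000).fit(X, y)

# Permutation importance: drop in score when one column is shuffled.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
ranking = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")

# Persist the fitted model with joblib and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print("Reloaded model matches:", same_predictions)
```

The `ranking` list could be passed to a Matplotlib or Seaborn bar plot to visualize the importances.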

Dataset Source: [Kaggle](https://www.kaggle.com/datasets/uom190346a/sleep-health-and-lifestyle-dataset/)