https://github.com/ndleah/health-analysis
This case study is contained within the Serious SQL course by Danny Ma
- Host: GitHub
- URL: https://github.com/ndleah/health-analysis
- Owner: ndleah
- Created: 2021-05-26T07:56:43.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2021-09-14T18:42:31.000Z (about 4 years ago)
- Last Synced: 2025-01-12T16:30:58.815Z (10 months ago)
- Topics: data-analysis, data-exploration, data-with-danny, database, serious-sql, sql
- Language: SQL
- Homepage: https://www.datawithdanny.com/
- Size: 1.05 MB
- Stars: 15
- Watchers: 3
- Forks: 10
- Open Issues: 0
# Health Analytics Case Study 
> This case study is part of the [Serious SQL](https://www.datawithdanny.com) course by [Danny Ma](https://www.linkedin.com/in/datawithdanny/)
## 📕 **Table of contents**
* 🛠️ [Overview](#️-overview)
* 🚀 [Solutions](#-solutions)
* 💻 [Key Highlights](#-key-highlight)
## 🛠️ Overview
In the **Health Analytics Mini Case Study**, I queried the data to answer the following questions:
1. How many `unique users` exist in the logs dataset?
2. How many total `measurements` do we have `per user on average`?
3. What about the `median` number of measurements per user?
4. How many users have `3 or more` measurements?
5. How many users have `1,000 or more` measurements?
6. How many users have logged `blood glucose` measurements?
7. How many users have `at least 2 types` of measurements?
8. How many users have all 3 measures - `blood glucose, weight and blood pressure`?
9. What are the `median systolic/diastolic` **blood pressure** values?
---
## 🚀 Solutions

### **How many unique users exist in the logs dataset?**
```sql
SELECT COUNT (DISTINCT id)
FROM health.user_logs;
```
|count |
|----------------------------------------|
|554 |

**`Note:` For questions 2-8, I created a temporary table:**

**Step 1:** First, I ran a `DROP TABLE IF EXISTS` statement to clear out any previously created table with the same name:
```sql
DROP TABLE IF EXISTS user_measure_count;
```
**Step 2:** Next, I created a new **temporary table** using the results of the query below:
```sql
CREATE TEMP TABLE user_measure_count AS
SELECT
  id,
  COUNT(*) AS measure_count,                  -- total log rows per user
  COUNT(DISTINCT measure) AS unique_measures  -- distinct measure types per user
FROM health.user_logs
GROUP BY id;
```
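Before relying on the temporary table, it can help to look at a few of its rows. The query below is a minimal sketch rather than part of the original solution; it assumes the `user_measure_count` table from Step 2 exists in the current session, and the `LIMIT 10` is an arbitrary choice.

```sql
-- Preview the most active users in the temporary table
SELECT
  id,
  measure_count,
  unique_measures
FROM user_measure_count
ORDER BY measure_count DESC
LIMIT 10;
```

Temporary tables are dropped when the session ends, so Steps 1 and 2 need to be re-run in any new session before this query will work.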

### **How many total measurements do we have per user on average?**
```sql
SELECT
ROUND (AVG(measure_count), 2) AS mean_value
FROM user_measure_count;
```
|mean_value |
|----------------------------------------|
|79.23 |
---

### **What about the median number of measurements per user?**
```sql
SELECT
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY measure_count) AS median_value
FROM user_measure_count;
```
|median_value |
|----------------------------------------|
|2 |
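
The mean (79.23) is far above the median (2), which suggests that a small number of users account for most of the measurements. One way to look at that skew is to compute several percentiles at once. This is a sketch under the assumption that the `user_measure_count` table above is available; the chosen fractions are illustrative only.

```sql
-- PERCENTILE_CONT also accepts an array of fractions
-- and returns one value per fraction
SELECT
  PERCENTILE_CONT(ARRAY[0.25, 0.5, 0.75, 0.95])
    WITHIN GROUP (ORDER BY measure_count) AS percentile_values
FROM user_measure_count;
```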

### **How many users have 3 or more measurements?**
```sql
SELECT COUNT(*)
FROM user_measure_count
WHERE measure_count >= 3;
```
|count |
|----------------------------------------|
|209 |

### **How many users have 1,000 or more measurements?**
```sql
SELECT COUNT(*)
FROM user_measure_count
WHERE measure_count >= 1000;
```
|count |
|----------------------------------------|
|5 |
---

### **How many users have logged blood glucose measurements?**
```sql
SELECT
COUNT(DISTINCT id)
FROM health.user_logs
WHERE measure = 'blood_glucose';
```
|count |
|----------------------------------------|
|325 |
---

### **How many users have at least 2 types of measurements?**
```sql
SELECT
COUNT(*)
FROM user_measure_count
WHERE unique_measures >= 2;
```
|count |
|----------------------------------------|
|204 |
---

### **How many users have all 3 measures - blood glucose, weight and blood pressure?**
```sql
SELECT
COUNT(*)
FROM user_measure_count
WHERE unique_measures = 3;
```
|count |
|----------------------------------------|
|50 |
---
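
The `unique_measures = 3` filter answers the question only if the dataset contains exactly these three measure types. A more explicit variant names the measures directly. This is a sketch, not the original solution: `'blood_glucose'` and `'blood_pressure'` appear in the queries in this README, while the literal `'weight'` is an assumption about how that measure is stored.

```sql
-- Count users who logged every one of the three named measures
SELECT COUNT(*)
FROM (
  SELECT id
  FROM health.user_logs
  -- 'weight' is an assumed label; adjust to match the data
  WHERE measure IN ('blood_glucose', 'weight', 'blood_pressure')
  GROUP BY id
  HAVING COUNT(DISTINCT measure) = 3
) AS users_with_all_three;
```

If both queries return the same count, the shorter version based on the temporary table is sufficient.
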

### **What are the median systolic/diastolic blood pressure values?**
```sql
SELECT
PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY systolic) AS median_systolic,
PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY diastolic) AS median_diastolic
FROM health.user_logs
WHERE measure = 'blood_pressure';
```
|median_systolic|median_diastolic|
|---------------|----------------|
|126 |79 |
---
## 💻 Key Highlight
> **Initial thoughts:**
Although this is a short assignment that covers basic SQL syntax, I ran into problems several times while solving it. Working through them gave me a better understanding of data exploration with SQL, from theory to real-life application.
The main areas covered in this case study include:
* **Sorting Values**
* **Inspect Row Counts**
* **Duplicates & Record Frequency Review**
* **Summary Statistics** `(MEAN, MEDIAN)`
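
As an illustration of the record frequency review listed above, the query below counts rows per measure type and shows each type's share of the total. It is a minimal sketch that uses only the `measure` column seen in the solutions; the `frequency` and `percentage` aliases are names chosen for this example.

```sql
-- Row count and percentage share for each measure type
SELECT
  measure,
  COUNT(*) AS frequency,
  ROUND(100 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS percentage
FROM health.user_logs
GROUP BY measure
ORDER BY frequency DESC;
```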