https://github.com/melinteflxrin/softserve-bigdata-project

End-to-end data warehousing project integrating APIs, ETL workflows, and PostgreSQL for analytics and reporting.
https://github.com/melinteflxrin/softserve-bigdata-project
analytics api bigdata data datawarehousing externalapi pipeline postgres postgresql python warehouse
Last synced: 5 months ago
JSON representation
End-to-end data warehousing project integrating APIs, ETL workflows, and PostgreSQL for analytics and reporting.
Host: GitHub
URL: https://github.com/melinteflxrin/softserve-bigdata-project
Owner: melinteflxrin
License: mit
Created: 2025-05-12T07:22:45.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-07-18T14:36:23.000Z (12 months ago)
Last Synced: 2025-07-18T19:04:00.029Z (12 months ago)
Topics: analytics, api, bigdata, data, datawarehousing, externalapi, pipeline, postgres, postgresql, python, warehouse
Language: Python
Homepage:
Size: 1.62 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

README

          # Data Warehousing Project:
 Health, Fitness & Nutrition Analytics

This project is a comprehensive data warehousing solution designed for a hypothetical health-tech startup.


It demonstrates best practices in ETL, data governance, privacy, and analytics with interactive dashboards and role-based access control.

---

## Table of contents

1. [Description](#1-description)

2. [Business Requirements & Goals](#2-business-requirements--goals)

3. [Reports, Dashboards & KPIs](#3-reports-dashboards--kpis)

    - [Dashboard Example Output](#dashboard-example-output)

4. [Data Warehouse Design, Tables & Sources](#4-data-warehouse-design-tables--sources)


   4.1 [APIs and Data Sources](#41-apis-and-data-sources)


   4.2 [ETL Process](#42-etl-process)


   4.3 [Schemas](#43-schemas)


      - [Raw Schema](#raw-schema)


      - [Staging Schema](#staging-schema)


      - [Trusted Schema](#trusted-schema)


5. [Database Administration & Data Governance](#5-database-administration--data-governance)

    - [5.1 Database Administration](#51-database-administration)


    - [5.2 Data Governance](#52-data-governance)


6. [Graphical User Interface (GUI)](#6-graphical-user-interface-gui)

    - [GUI Example](#gui-example)

---

## 1. Description

This repository showcases an end-to-end data warehouse solution designed for a fictional health-tech startup.


The scenario centers on a mobile application that helps users monitor their health and wellness by tracking activity, sleep, and nutrition data, with a data engineer responsible for building and maintaining the pipelines that enable secure, reliable analysis of this information.

The project demonstrates how to:

- Integrate and process data from multiple sources (including synthetic data and public APIs).

- Implement robust ETL pipelines for cleaning, validating, and transforming raw data into analytics-ready tables.

- Enforce strong data governance, privacy, and security practices, including the separation of PII and non-PII data.

- Provide interactive dashboards and key performance indicators (KPIs) for user activity, sleep, nutrition, and goal achievement.

- Apply role-based access control to protect sensitive data.

> **Note:**  

> This project is for educational and demonstration purposes only. The app and its data are entirely fictional and intended to showcase best practices in data engineering, warehousing, and analytics.

---

## 2. Business Requirements & Goals

### Business Requirements 

- Integrate and process user health data (activity, sleep, nutrition) from multiple sources.

- Ensure data privacy by separating and protecting PII and non-PII data.

- Maintain high data quality and secure, role-based access.

- Support analytics and reporting with trusted, well-structured data.

### Core Business Goals 

- Provide actionable insights on user health and goal achievement.

- Demonstrate best practices in data governance and lifecycle management.

- Enable interactive dashboards and KPIs for key health metrics.

---

## 3. Reports, Dashboards & KPIs

> **Note:**  

> Before generating dashboards, you must first extract, transform, and load the data by running the following scripts in order:  

> 1. [`src/extract/healthapp.py`](src/extract/healthapp.py)  

> 2. [`src/transform/transform.py`](src/transform/transform.py)  

> 3. [`src/load/load.py`](src/load/load.py)

Interactive dashboards can be generated by running the provided script: 

```bash

python src/dashboard/dashboard.py

```

This script ([`src/dashboard/dashboard.py`](src/dashboard/dashboard.py)) automatically creates and visualizes dashboards for your key health and fitness metrics using the [trusted views](sql/business_view/).

**Dashboards and KPIs included:**

- **Goal Achievement:**  

  - % of users who have achieved their goals  

    - [`trusted.vw_pct_users_achieved_goals`](sql/business_view/create_view_pct_users_achieved_goals.sql): Calculates the percentage of user goals (across all goal types) that have been achieved.

- **Nutrition:**  

  - User's favourite food and typical meal time  

    - [`trusted.vw_user_favourite_food`](sql/business_view/create_view_user_favourite_food.sql): Shows each user's most frequently consumed food and the meal time they usually eat it.

  - Average macronutrient distribution per user  

    - [`trusted.vw_user_avg_macros`](sql/business_view/create_view_user_avg_macros.sql): Displays the average intake of calories, carbs, protein, and fat per user.

- **Activity:**  

  - Daily average calories burned per user  

    - [`trusted.vw_user_daily_avg_calories_burned`](sql/business_view/create_view_avg_calories_burned.sql): Reports the average number of calories burned per user per day.

- **All dashboards:**  

  - Displayed together for easy comparison

Dashboard Example Output

![Dashboard Example](assets/dashboard-example-screenshot.png)  

*Dashboard example shown for 10 generated users across 7 days*

---

## 4. Data Warehouse Design, Tables & Sources

### 4.1 APIs and Data Sources

- **HealthApp**: Synthetic data is generated using the [`healthapp.py`](src/extract/healthapp.py) script.  

  - To generate raw data, run the following command:

    ```bash

    python src/extract/healthapp.py

    ```

  - This script creates the necessary schemas and tables in the `raw` schema (if they do not already exist) and populates them with raw user data (see [`raw.user_data`](sql/tables/raw/create_raw_user_data_table.sql)), for activity, sleep, nutrition, and goals.  

- [**USDA API**](https://www.ers.usda.gov/developer/data-apis/): Nutritional data for food items is fetched dynamically via API calls within the [`healthapp.py`](src/extract/healthapp.py) script.  

---

## 4.2 ETL Process

Show ETL Pipeline Diagram

![ETL Pipeline](assets/etl-pipeline-diagram.png)  

*Overview of the ETL pipeline: Extract, Transform, Load.*




The ETL pipeline consists of the following stages, each with a dedicated script:

### 1. **Extract**

- **Purpose:** Generate and collect raw data from synthetic sources and public APIs.

- **Code:** [`src/extract/healthapp.py`](src/extract/healthapp.py)

  - Generates synthetic user/activity/nutrition data and loads it into the `raw` schema.

  - **Tables created in `raw` schema:**

    - [`raw.user_data`](sql/tables/raw/create_raw_user_data_table.sql)

    - [`raw.nutrition_log`](sql/tables/raw/create_raw_nutrition_log_table.sql)

- **Supporting code:**

  - [`src/extract/search_foods_api.py`](src/extract/search_foods_api.py): Fetches nutrition data from the USDA API.

  - [`src/extract/load_data_from_csv.py`](src/extract/load_data_from_csv.py): Loads food and activity types from CSV files.

- **How to run:**

  ```sh

  python src/extract/healthapp.py

  ```

Show Sample Raw User Data

![Sample Raw Data](assets/sample-raw-user-data.png)  

*Example of the `raw.user_data` table as generated by the extract script.*

Show Sample Raw Nutrition Log

![Sample Nutrition Log](assets/sample-raw-nutrition-log.png)  

*Example of the `raw.nutrition_log` table as generated by the extract script.*

---

### 2. **Transform**

- **Purpose:** Clean, validate, and transform raw data into analytics-ready staging tables.

- **Code:** [`src/transform/transform.py`](src/transform/transform.py)

  - Reads from the `raw` schema, processes the data, and loads it into the `staging` schema.

  - **Tables created in `staging` schema:**

    - [`staging.dim_user_profile`](sql/tables/staging/create_dim_user_profile_table.sql)

    - [`staging.dim_food_item`](sql/tables/staging/create_dim_food_item_table.sql)

    - [`staging.fact_activity_log`](sql/tables/staging/create_fact_activity_log_table.sql)

    - [`staging.fact_sleep_log`](sql/tables/staging/create_fact_sleep_log_table.sql)

    - [`staging.fact_nutrition_log`](sql/tables/staging/create_fact_nutrition_log_table.sql)

    - [`staging.fact_goals_log`](sql/tables/staging/create_fact_goals_log_table.sql)

- **How to run:**

  ```sh

  python src/transform/transform.py

  ```

Show Sample Staging User Profile

![Sample Staging User Profile](assets/sample-staging-user-profle.png)  

*Example of the `staging.dim_user_profile` table after transformation.*

Show Sample Staging Nutrition Log

![Sample Staging Nutrition Log](assets/sample-staging-nutrition-log.png)  

*Example of the `staging.fact_nutrition_log` table after transformation.*

---

### 3. **Load**

- **Purpose:** Load fully processed data from staging into the trusted schema for analytics and reporting.

- **Code:** [`src/load/load.py`](src/load/load.py)

  - Moves data from the `staging` schema into the `trusted` schema, enforcing all business and data quality rules.

  - **Tables created in `trusted` schema:**

    - [`trusted.nutrition_data`](sql/tables/trusted/create_trusted_nutrition_data_table.sql)

    - [`trusted.sleep_data`](sql/tables/trusted/create_trusted_sleep_data_table.sql)

    - [`trusted.activity_data`](sql/tables/trusted/create_trusted_activity_data_table.sql)

    - [`trusted.goals_data`](sql/tables/trusted/create_trusted_goals_data_table.sql)

- **How to run:**

  ```sh

  python src/load/load.py

  ```

> **Note:**  

> Each script will automatically create the required tables in its schema if they do not already exist.

Show Sample Trusted Nutrition

![Sample Trusted Nutrition Data](assets/sample-trusted-nutrition-data.png)  

*Example of the `trusted.nutrition_data` table after loading.*

Show Sample Trusted Goals

![Sample Trusted Goals Data](assets/sample-trusted-goals-data.png)  

*Example of the `trusted.goals_data` table after loading.*

---

### 4.3 Schemas

The data warehouse is organized into three schemas: **raw**, **staging**, and **trusted**. Each schema serves a specific purpose in the ETL pipeline:

#### Raw Schema

- **Purpose**: The initial storage for raw, unprocessed data directly extracted from the sources.  

#### `raw.user_data`

| Column Name       | Data Type     | Description                                      |

|-------------------|--------------|--------------------------------------------------|

| `record_id`       | `SERIAL PRIMARY KEY`  | Unique identifier for each record.      |

| `user_id`         | `INTEGER`    | Unique identifier for the user.                 |

| `name`            | `VARCHAR(100)` | User's name.                                   |

| `age`             | `INTEGER`    | User's age.                                     |

| `weight_kg`       | `NUMERIC(4,1)` | User's weight in kilograms.                    |

| `height_cm`       | `NUMERIC(4,1)` | User's height in centimeters.                  |

| `gender`          | `VARCHAR(10)` | User's gender.                                  |

| `calorie_goal`    | `INTEGER`    | Daily calorie goal for the user.                |

| `macro_goal`      | `JSON`       | JSON object containing macro goals (carbs, protein, fat). |

| `activity_start`  | `TIMESTAMP`  | Start time of the activity.                     |

| `activity_type`   | `VARCHAR(50)` | Type of activity (e.g., walking, running).      |

| `steps`           | `INTEGER`    | Daily step count                                |

| `heart_rate`      | `INTEGER`    | Heart rate during the activity.                 |

| `calories_burned` | `INTEGER`    | Calories burned during the day.                 |

| `sleep_start`     | `TIMESTAMP`  | Start time of sleep.                            |

| `sleep_end`       | `TIMESTAMP`  | End time of sleep.                              |

| `sleep_quality_score` | `INTEGER` | Quality score of sleep.                         |

| `goal_type`       | `VARCHAR(50)` | Type of goal (e.g., calories burned, steps taken). |

| `goal_target`     | `INTEGER`    | Target value for the goal.                      |

| `created_at`      | `TIMESTAMP`  | Timestamp when the record was created.          |

#### `raw.nutrition_log`

| Column Name       | Data Type     | Description                                      |

|-------------------|--------------|--------------------------------------------------|

| `nutrition_id`    | `SERIAL PRIMARY KEY` | Unique identifier for the nutrition record.  |

| `user_id`         | `INTEGER`     | Unique identifier for the user.                 |

| `date`            | `DATE`       | Date of the nutrition log.                      |

| `food_item`       | `VARCHAR(255)` | Name of the food item.                         |

| `meal_type`       | `VARCHAR(100)` | Type of meal (e.g., breakfast, lunch).         |

| `calories_per_100g` | `INTEGER` | Calories per 100 grams of the food item.     |

| `carbs_per_100g`  | `INTEGER` | Carbohydrates per 100 grams of the food item. |

| `protein_per_100g` | `INTEGER` | Protein per 100 grams of the food item.       |

| `fat_per_100g`    | `INTEGER` | Fat per 100 grams of the food item.           |

---

#### Staging Schema

- **Purpose**: Stores cleaned and transformed data, ready for further processing.  

Show Staging Schema Star Diagram

![Staging Star Diagram](assets/staging-star-diagram-mermaid.png)  

*Star diagram for the staging schema, illustrating fact and dimension tables and their relationships.*

#### `staging.dim_user_profile`

| Column Name       | Data Type     | Description                                      |

|-------------------|--------------|--------------------------------------------------|

| `user_id`         | `BIGINT PRIMARY KEY`     | Unique identifier for the user.      |

| `name`            | `VARCHAR(255)` | User's name.                                   |

| `age`             | `INTEGER`    | User's age.                                     |

| `weight_kg`       | `DECIMAL(4,1)` | User's weight in kilograms.                    |

| `height_cm`       | `DECIMAL(4,1)` | User's height in centimeters.                  |

| `gender`          | `VARCHAR(50)` | User's gender.                                  |

| `calorie_goal`    | `INTEGER`    | Daily calorie goal for the user.                |

| `carbs_goal`      | `INTEGER`    | Daily carbohydrate goal for the user.           |

| `protein_goal`    | `INTEGER`    | Daily protein goal for the user.                |

| `fat_goal`        | `INTEGER`    | Daily fat goal for the user.                    |

#### `staging.dim_food_item`

| Column Name       | Data Type     | Description                                      |

|-------------------|--------------|--------------------------------------------------|

| `food_item_id`    | `BIGINT PRIMARY KEY`     | Unique identifier for the food item.        |

| `food_item`       | `VARCHAR(255)` | Name of the food item.                         |

| `calories_per_100g` | `DECIMAL(4,0)` | Calories per 100 grams of the food item.     |

| `carbs_per_100g`  | `DECIMAL(3,0)` | Carbohydrates per 100 grams of the food item. |

| `protein_per_100g` | `DECIMAL(3,0)` | Protein per 100 grams of the food item.       |

| `fat_per_100g`    | `DECIMAL(3,0)` | Fat per 100 grams of the food item.           |

#### `staging.fact_activity_log`

| Column Name       | Data Type     | Description                                      |

|-------------------|--------------|--------------------------------------------------|

| `activity_id`     | `BIGINT PRIMARY KEY`     | Unique identifier for the activity record. |

| `user_id`         | `BIGINT`     | Unique identifier for the user.                 |

| `timestamp`       | `TIMESTAMP`  | Timestamp of the activity.                      |

| `activity_type`   | `VARCHAR(100)` | Type of activity (e.g., walking, running).     |

| `steps`           | `INTEGER`    | Number of steps taken during the activity.      |

| `heart_rate`      | `INTEGER`    | Heart rate during the activity.                 |

| `calories_burned` | `INTEGER`    | Calories burned during the activity.            |

#### `staging.fact_sleep_log`

| Column Name       | Data Type     | Description                                      |

|-------------------|--------------|--------------------------------------------------|

| `sleep_id`        | `BIGINT PRIMARY KEY`     | Unique identifier for the sleep record. |

| `user_id`         | `BIGINT`     | Unique identifier for the user.                 |

| `date`            | `DATE`       | Date of the sleep record.                       |

| `sleep_start`     | `TIMESTAMP`  | Start time of sleep.                            |

| `sleep_end`       | `TIMESTAMP`  | End time of sleep.                              |

| `sleep_duration_hours` | `DECIMAL(5,1)` | Duration of sleep in hours.                  |

| `sleep_quality_score` | `INTEGER` | Quality score of sleep.                         |

#### `staging.fact_nutrition_log`

| Column Name       | Data Type     | Description                                      |

|-------------------|--------------|--------------------------------------------------|

| `nutrition_id`    | `BIGINT PRIMARY KEY`     | Unique identifier for the nutrition record. |

| `user_id`         | `BIGINT`     | Unique identifier for the user.                 |

| `date`            | `DATE`       | Date of the nutrition log.                      |

| `food_item_id`    | `BIGINT`     | Foreign key to the food item dimension.         |

| `meal_type`       | `VARCHAR(100)` | Type of meal (e.g., breakfast, lunch).         |

#### `staging.fact_goals_log`

| Column Name       | Data Type     | Description                                      |

|-------------------|--------------|--------------------------------------------------|

| `goal_id`         | `BIGINT PRIMARY KEY`     | Unique identifier for the goal record.|

| `user_id`         | `BIGINT`     | Unique identifier for the user.                 |

| `date`            | `DATE`       | Date of the goal record.                        |

| `goal_type`       | `VARCHAR(100)` | Type of goal (e.g., calories burned, steps taken). |

| `target_value`    | `INTEGER`    | Target value for the goal.                      |

| `actual_value`    | `INTEGER`    | Actual value achieved for the goal.             |

| `status`          | `VARCHAR(50)` | Status of the goal (e.g., achieved, not achieved). |

--- 

#### Trusted Schema

- **Purpose**: Stores the final, fully processed data that is ready for analytics and reporting.  

Show Trusted Schema Star Diagram

![Trusted Star Diagram](assets/trusted-star-diagram-mermaid.png)  

*Star diagram for the trusted schema, illustrating the relationships between tables.*

#### `trusted.nutrition_data`

| Column Name         | Data Type        | Description                                      |

|---------------------|-----------------|--------------------------------------------------|

| `nutrition_id`      | `BIGINT PRIMARY KEY` | Unique identifier for the nutrition record. |

| `user_id`           | `BIGINT`        | Unique identifier for the user (foreign key to PII table). |

| `date`              | `DATE`          | Date of the nutrition log.                       |

| `food_item`         | `VARCHAR(255)`  | Name of the food item.                           |

| `meal_type`         | `VARCHAR(100)`  | Type of meal (e.g., breakfast, lunch).           |

| `calories_per_100g` | `DECIMAL(4,0)`  | Calories per 100 grams of the food item.         |

| `carbs_per_100g`    | `DECIMAL(3,0)`  | Carbohydrates per 100 grams of the food item.    |

| `protein_per_100g`  | `DECIMAL(3,0)`  | Protein per 100 grams of the food item.          |

| `fat_per_100g`      | `DECIMAL(3,0)`  | Fat per 100 grams of the food item.              |

#### `trusted.activity_data`

| Column Name         | Data Type        | Description                                      |

|---------------------|-----------------|--------------------------------------------------|

| `activity_id`       | `BIGINT PRIMARY KEY` | Unique identifier for the activity record.   |

| `user_id`           | `BIGINT`        | Unique identifier for the user (foreign key to PII table). |

| `timestamp`         | `TIMESTAMP`     | Timestamp of the activity.                       |

| `activity_type`     | `VARCHAR(100)`  | Type of activity (e.g., walking, running).       |

| `steps`             | `INT`           | Number of steps taken during the activity.       |

| `heart_rate`        | `INT`           | Heart rate during the activity.                  |

| `calories_burned`   | `INT`           | Calories burned during the activity.             |

#### `trusted.sleep_data`

| Column Name             | Data Type        | Description                                      |

|-------------------------|-----------------|--------------------------------------------------|

| `sleep_id`              | `BIGINT PRIMARY KEY` | Unique identifier for the sleep record.      |

| `user_id`               | `BIGINT`        | Unique identifier for the user (foreign key to PII table). |

| `date`                  | `DATE`          | Date of the sleep record.                        |

| `sleep_start`           | `TIMESTAMP`     | Start time of sleep.                             |

| `sleep_end`             | `TIMESTAMP`     | End time of sleep.                               |

| `sleep_duration_hours`  | `DECIMAL(5,1)`  | Duration of sleep in hours.                      |

| `sleep_quality_score`   | `INTEGER`       | Quality score of sleep.                          |

#### `trusted.goals_data`

| Column Name         | Data Type        | Description                                      |

|---------------------|-----------------|--------------------------------------------------|

| `goal_id`           | `BIGINT PRIMARY KEY` | Unique identifier for the goal record.        |

| `user_id`           | `BIGINT`        | Unique identifier for the user (foreign key to PII table). |

| `date`              | `DATE`          | Date of the goal record.                         |

| `goal_type`         | `VARCHAR(100)`  | Type of goal (e.g., calories burned, steps).     |

| `target_value`      | `INT`           | Target value for the goal.                       |

| `actual_value`      | `INT`           | Actual value achieved for the goal.              |

| `status`            | `VARCHAR(50)`   | Status of the goal (e.g., achieved, not achieved, pending). |

#### `trusted.user_profile`

| Column Name         | Data Type        | Description                                      |

|---------------------|-----------------|--------------------------------------------------|

| `user_id`           | `BIGINT PRIMARY KEY` | Unique identifier for the user.              |

| `name`              | `VARCHAR(255)`  | User's name (PII, access restricted).            |

| `age`               | `INT`           | User's age (PII, access restricted).             |

| `gender`            | `VARCHAR(50)`   | User's gender (PII, access restricted).          |

> **Note:**  

> All analytics and reporting are performed using the trusted fact tables, which only reference users by `user_id`.  

> PII is only accessible to authorized roles.

--- 

## 5. Database Administration & Data Governance

### 5.1 Database Administration

- **Create DBA Roles:**  

  - Use the scripts in [`sql/roles/`](sql/roles/) to create the `dba_role` and grant access to the `trusted` schema and its views:

    - [`grant_usage_trusted_schema.sql`](sql/roles/grant_usage_trusted_schema.sql)

    - [`grant_select_trusted_views.sql`](sql/roles/grant_select_trusted_views.sql)

    - [`grant_access_pii_data.sql`](sql/roles/grant_access_pii_data.sql) (restricts PII access to DBAs)

    - [`grant_access_non_pii_data.sql`](sql/roles/grant_access_non_pii_data.sql) (grants analytics access to non-PII data)

    - [`grant_access_archive_user_data_pii.sql`](sql/roles/grant_access_archive_user_data_pii.sql) (restricts archive access to DBAs)

- **Optimize DB Performance:**  

  - Use [`explain_query_execution.sql`](sql/roles/explain_query_execution.sql) to analyze and optimize query performance.

- **Schema Organization:**  

  - **Schema Organization:**  

  - All scripts for table creation, data insertion, archiving, and deletion are organized by data sensitivity in [`sql/data/`](sql/data/).

---

### 5.2 Data Governance

- **Data Privacy & PII Handling:**

  - PII data (e.g., user names, ages) is stored only in `trusted.user_data_pii` and managed via scripts in [`sql/data/pii/`](sql/data/pii/).

  - Non-PII data is stored in `trusted.user_data_non_pii` (see [`sql/data/non_pii/`](sql/data/non_pii/)), where user identifiers are hashed and ages are grouped.

  - Access to PII tables is strictly limited to users with the `dba_role`.

- **Data Lifecycle Management:**

  - Old or inactive records are archived using scripts in [`sql/data/archive/`](sql/data/archive/).

  - Archived records are deleted from the main tables only after successful archival, ensuring no data loss.

- **Data Quality Assurance:**

  - All trusted tables enforce strong data quality constraints (`NOT NULL`, `CHECK`, valid value lists) at the schema level.

  - ETL scripts filter out incomplete or invalid records before loading into trusted tables.

- **Security & Access Control:**

  - Role-based access is enforced at the schema and table level.

  - PII data is never exposed to analytics or reporting users.

- **Documentation:**

  - All scripts and policies are documented in this README for transparency and auditability.

---

---

## 6. Graphical User Interface (GUI)

A simple GUI is included to make running the ETL pipeline and generating dashboards more user-friendly.

### How it Works

- The GUI is built with [Tkinter](https://docs.python.org/3/library/tkinter.html).

- Launch it with:

  ```sh

  python src/interface/main.py

  ```

- The interface will prompt you to enter:

  - Number of Users

  - Number of Days

  - Database Host, Port, User, Password, and Name

  - USDA API Key

When you click **"Run Pipeline"**, the GUI passes your input as command-line arguments and environment variables to the ETL scripts. The scripts are then run in sequence: extract, transform, load, and dashboard generation.

**File location:**  

[`src/interface/main.py`](src/interface/main.py)

**Important:**  

- If you use the GUI, your input for number of users and days will be used for that run.

- If you run the extract script directly (e.g., `python src/extract/healthapp.py`), it will use the default global variables (`NO_USERS` and `NO_DAYS`) defined in the script.

> **Note:**  

> For larger numbers of users or days, the pipeline will take longer to complete. 


> This is because the USDA API is called to retrieve food information for every user and day, which can be time-consuming.

**Example GUI window:**  

Show Example GUI window

![GUI Example](assets/gui-example.png)  

*The GUI for running the ETL pipeline and dashboards with custom parameters.*

---
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/melinteflxrin/softserve-bigdata-project

Awesome Lists containing this project

README