https://github.com/dde-labs/zrch-data-pipeline
:ballot_box: ZRCH Assignment for create Data Pipeline
airflow assignment-problem duckdb py39
- Host: GitHub
- URL: https://github.com/dde-labs/zrch-data-pipeline
- Owner: dde-labs
- License: mit
- Created: 2024-07-05T09:20:19.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-08T14:01:12.000Z (almost 2 years ago)
- Last Synced: 2025-03-13T06:17:22.156Z (about 1 year ago)
- Topics: airflow, assignment-problem, duckdb, py39
- Language: Python
- Homepage:
- Size: 20.5 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# ZRCH: Data Pipeline
**ZRCH** assignment: create a data pipeline.
> You are tasked with designing a data pipeline for a fictional e-commerce company,
> "ShopSmart". The company wants to analyze customer behavior, sales performance,
> and inventory management. You need to create a solution that ingests, processes,
> and stores data from various sources.
>
> **Questions**:
> - What are the top 2 best-selling products?
> - What is the average order value per customer?
> - What is the total revenue generated per product category?

> [!WARNING]
> This project does not include test cases because I have not had time to learn
> how to write them.
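
The three questions above map onto three aggregate queries. The sketch below is only an illustration: it uses the standard-library `sqlite3` module as a stand-in for the MySQL warehouse, and the `products` and `order_items` tables, their columns, and the sample rows are all hypothetical, not the assignment's real data. It also assumes "best-selling" means most units sold.

```python
import sqlite3

# Hypothetical schema and sample rows; the real ShopSmart tables may differ.
SCHEMA = """
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE order_items (
    order_id INTEGER, customer_id INTEGER, product_id INTEGER,
    quantity INTEGER, unit_price REAL
);
"""
PRODUCTS = [
    (1, "Laptop", "Electronics"),
    (2, "Mouse", "Electronics"),
    (3, "Desk", "Furniture"),
]
ORDER_ITEMS = [
    (100, 1, 1, 1, 1000.0),
    (100, 1, 2, 2, 20.0),
    (101, 1, 2, 1, 20.0),
    (102, 2, 3, 2, 300.0),
    (103, 2, 2, 3, 20.0),
]


def build_db():
    """Create an in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", PRODUCTS)
    conn.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?, ?)", ORDER_ITEMS)
    return conn


def top_products(conn, limit=2):
    """Best-selling products, ranked by total units sold."""
    return conn.execute(
        """
        SELECT p.name, SUM(o.quantity) AS units
        FROM order_items AS o
        JOIN products AS p ON p.product_id = o.product_id
        GROUP BY p.product_id, p.name
        ORDER BY units DESC, p.name
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


def avg_order_value(conn):
    """Average order total per customer (order total = sum of its line items)."""
    return conn.execute(
        """
        SELECT customer_id, AVG(order_total) AS avg_order_value
        FROM (
            SELECT order_id, customer_id,
                   SUM(quantity * unit_price) AS order_total
            FROM order_items
            GROUP BY order_id, customer_id
        ) AS per_order
        GROUP BY customer_id
        ORDER BY customer_id
        """
    ).fetchall()


def revenue_by_category(conn):
    """Total revenue per product category."""
    return conn.execute(
        """
        SELECT p.category, SUM(o.quantity * o.unit_price) AS revenue
        FROM order_items AS o
        JOIN products AS p ON p.product_id = o.product_id
        GROUP BY p.category
        ORDER BY revenue DESC
        """
    ).fetchall()
```

With the sample rows, `top_products(build_db())` returns `[("Mouse", 6), ("Desk", 2)]`. The queries use only standard SQL aggregates, so they should carry over to MySQL with little change, though that has not been verified against this repository's schema.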
## Architecture
```text
LocalFile --> MySQL --> Log
```
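The three stages in the diagram can be sketched roughly as follows. This is a minimal illustration rather than the repository's actual DAG code: it assumes a CSV source file, uses `sqlite3` in place of the MySQL warehouse, and the `sales` table, its columns, and all function names are hypothetical.

```python
import csv
import logging
import sqlite3
from pathlib import Path

logger = logging.getLogger("pipeline")


def extract(path):
    """Stage 1 (LocalFile): read all rows from a CSV file as dicts."""
    with Path(path).open(newline="") as f:
        return list(csv.DictReader(f))


def load(conn, rows):
    """Stage 2 (warehouse): insert rows into a hypothetical `sales` table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales (product, amount) VALUES (?, ?)",
        [(row["product"], float(row["amount"])) for row in rows],
    )
    conn.commit()
    return len(rows)


def report(conn):
    """Stage 3 (Log): compute a summary, write it to the log, and return it."""
    count, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM sales"
    ).fetchone()
    logger.info("loaded rows=%d total_amount=%.2f", count, total)
    return count, total


def run_pipeline(path, conn):
    """Run the three stages in order: file -> warehouse -> log."""
    load(conn, extract(path))
    return report(conn)
```

In an orchestrated setup each function would typically become its own task, so a failed load can be retried without re-reading the source file.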
> [!NOTE]
> This is my first assignment that requires an end-to-end data pipeline, from
> source to serving analytic reports, so I will try to orchestrate it with
> **Airflow**.
## Getting Started
First, create a Python virtual environment and install `uv`:
```shell
python -m venv venv
# Windows path; on Linux/macOS use: source venv/bin/activate
./venv/Scripts/activate
python -m pip install -U pip
pip install uv
```
Then, set up the connection file at `./secrets/connections.yaml`:
```yaml
file_local:
  conn_type: file
  description: Local Data on Docker
  extra_dejson:
    path: "/opt/airflow/data"
warehouse:
  conn_type: mysql
  description: Local MySQL on Docker
  host: mysql
  schema: ...
  login: ...
  password: ...
  port: 3306
```
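Airflow can also express a connection as a single URI of the form `conn-type://login:password@host:port/schema`. As a rough sketch, the `warehouse` entry above would correspond to a URI built like this. The helper name `mysql_uri` and the credential values are placeholders for illustration only; the real `schema`, `login`, and `password` are elided in this README and are not known here.

```python
from urllib.parse import quote


def mysql_uri(conn: dict) -> str:
    """Build a `mysql://` URI from one entry of the connections file."""
    # Percent-encode credentials so characters like '@' or '/' stay unambiguous.
    login = quote(conn["login"], safe="")
    password = quote(conn["password"], safe="")
    return (
        f"mysql://{login}:{password}"
        f"@{conn['host']}:{conn['port']}/{conn['schema']}"
    )


# Placeholder values, NOT the project's real credentials.
example = {
    "conn_type": "mysql",
    "host": "mysql",
    "schema": "example_db",
    "login": "example_user",
    "password": "p@ss/word",
    "port": 3306,
}

print(mysql_uri(example))
# mysql://example_user:p%40ss%2Fword@mysql:3306/example_db
```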
Next, provision all services with Docker Compose:
```shell
docker compose -f ./.container/docker-compose.warehouse.yml --env-file .env up -d
docker compose -f ./.container/docker-compose.yml --env-file .env up -d
```
> [!NOTE]
> To tear down the Docker Compose services:
> ```shell
> docker compose -f ./.container/docker-compose.yml --env-file .env down -v
> docker compose -f ./.container/docker-compose.warehouse.yml --env-file .env down
> ```
Log in to Airflow at `localhost:8080` as the root user and start observing the DAGs.
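
Before logging in, it can help to confirm that the webserver is up. The sketch below assumes an Airflow 2.x webserver, which exposes a `/health` endpoint that returns a JSON status for each component. The helper names are hypothetical, and only the `metadatabase` and `scheduler` entries are checked.

```python
import json
from urllib.request import urlopen

# Components that must be healthy before DAGs can be scheduled.
REQUIRED = ("metadatabase", "scheduler")


def unhealthy_components(payload: dict) -> list:
    """Return the required components whose status is not 'healthy'."""
    return [
        name
        for name in REQUIRED
        if payload.get(name, {}).get("status") != "healthy"
    ]


def check_airflow(base_url: str = "http://localhost:8080") -> list:
    """Fetch the health endpoint and list any unhealthy components."""
    with urlopen(f"{base_url}/health", timeout=10) as resp:
        return unhealthy_components(json.load(resp))
```

An empty list from `check_airflow()` suggests the services started correctly. A connection error usually means the containers are still starting.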