https://github.com/bchaoss/trash-wheel-pipeline

dbt data pipeline for analyzing trash wheel collection data

analytics-engineering dbt duckdb elt motherduck sql tidytuesday


# trash-wheel-analysis
A data pipeline for analyzing trash wheel collection data.

Data Source: [TidyTuesday | Trash Wheel Collection Data](https://github.com/rfordatascience/tidytuesday/blob/main/data/2024/2024-03-05/readme.md)

[![DBT](https://img.shields.io/badge/DBT-orange?style=for-the-badge&logo=dbt)](https://www.getdbt.com/)
[![DuckDB](https://img.shields.io/badge/DuckDB-yellow?style=for-the-badge&logo=duckdb)](https://duckdb.org/)
[![MotherDuck](https://img.shields.io/badge/MotherDuck-green?style=for-the-badge&logo=motherduck)](https://www.motherduck.com/)


### Data Stack

| Tool | Purpose |
| :--- | :--- |
| [dbt](https://www.getdbt.com/) | Define the data transformation pipeline (models, tests, and documentation) in SQL |
| [DuckDB](https://duckdb.org/) | In-process analytical database engine |
| [MotherDuck](https://www.motherduck.com/) | Cloud-hosted DuckDB for deployment (free plan available) |
| [Evidence](https://github.com/evidence-dev/evidence?tab=readme-ov-file) | BI tool for building reports with SQL and Markdown |
| [GitHub Actions](https://docs.github.com/en/actions/get-started/understand-github-actions) | CI/CD: runs the pipeline and deploys the dbt docs to [GitHub Pages](https://docs.github.com/en/pages/getting-started-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site) |

### Structure


```
.
├── dbt_pipeline/
│   ├── macros/
│   ├── models/
│   │   ├── ingest/
│   │   ├── staging/
│   │   └── mart/
│   ├── seeds/
│   ├── dbt_project.yml
│   └── profiles.yml
│
├── evidence_BI/
│
├── .devcontainer/
├── .github/workflows
└── requirements.txt
```

### Docs & DAG

**[dbt docs](https://bchaoss.github.io/trash-wheel-pipeline/pipeline/)**: browse the details of each SQL model and how the models depend on each other.

*Figure: dbt DAG (model lineage graph)*


### Getting Started

0\. Clone the Repo

1\. Create a MotherDuck account and set the environment variable `MOTHERDUCK_TOKEN` to your MotherDuck access token
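
In a bash or zsh session, the variable can be exported as follows. The token value is a placeholder for the one from your MotherDuck account settings, and the check afterwards is an optional convenience, not part of the project itself.

```bash
# Placeholder value (assumption): replace it with your own MotherDuck access token.
# `export` makes the variable visible to dbt and any other process started from this shell.
export MOTHERDUCK_TOKEN="<your-motherduck-token>"

# Optional sanity check before running dbt
if [ -n "${MOTHERDUCK_TOKEN:-}" ]; then
  echo "MOTHERDUCK_TOKEN is set"
else
  echo "MOTHERDUCK_TOKEN is missing" >&2
fi
```

The export only lasts for the current shell session; to keep it, add the same `export` line to your shell profile (for example `~/.bashrc`), or, in a Codespace, store the token as a Codespaces secret.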

2\. Set up the environment

| Option | Notes |
| :--- | :--- |
| **Local machine** | Requires Python 3; install the dependencies with `pip install -r requirements.txt` |
| **GitHub Codespaces** | The environment is configured from the `.devcontainer/` definition |
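
On a local machine, a virtual environment keeps dbt and its dependencies separate from the system Python. The sketch below is one way to do it, assuming a POSIX shell and that you run it from the repository root; the `.venv` directory name is a common convention picked here, not something the repository prescribes.

```bash
# Run from the repository root, where requirements.txt lives
python3 -m venv .venv        # create an isolated Python environment in .venv/
. .venv/bin/activate         # activate it for the current shell
pip install -r requirements.txt

# Afterwards, check that the dbt CLI is on the PATH
dbt --version
```

`dbt --version` should report the installed dbt version along with any adapter plugins; if the command is not found, the virtual environment is probably not activated in the current shell.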

3\. Run the dbt Pipeline

```bash
cd dbt_pipeline

dbt debug # Verify connection

dbt build # Run the full pipeline (ingest -> staging -> mart) and tests
```


### TODO
- [x] test and build the dbt pipeline
- [ ] mart analysis table
- [ ] incremental models (ingest or refresh)
- [x] GitHub Actions