https://github.com/datacody/dbt-jaffle-shop

A hands-on project built to deepen understanding of dbt modeling, testing, and documentation. Based on the Jaffle Shop dataset, the project showcases best practices in transforming and validating source data for business analytics using the modern data stack.

analytics bigquery data-eng data-modeling dbt etl-pipeline sql transformation


## 🥪 dbt-jaffle-shop

A Modern Data Stack Project for End-to-End Analytics Engineering

This repository is a customized implementation of the [dbt-labs/jaffle-shop](https://github.com/dbt-labs/jaffle-shop) project, tailored to demonstrate proficiency in data modeling, testing, documentation, and orchestration using dbt. It serves as a comprehensive example of building a robust analytics pipeline from raw data ingestion to curated data marts.

## 🚀 Project Overview
- **Source Data**: Simulated e-commerce datasets representing customers, orders, and payments.
- **Transformation Layers**:
  - **Staging Models**: Clean and standardize raw data.
  - **Intermediate Models**: Join and enrich data across multiple sources.
  - **Mart Models**: Provide business-ready datasets for analytics and reporting.
- **Testing**: Data quality tests, including `unique`, `not_null`, and referential integrity checks.
- **Documentation**: Auto-generated documentation with detailed model and column descriptions.
- **Orchestration**: Configured dbt Cloud jobs for scheduled runs and testing.

## 🧰 Tech Stack
- **dbt Core**: SQL-based data transformation framework.
- **dbt Cloud**: Hosted environment for development and job orchestration.
- **Data Warehouse**: BigQuery.
- **Version Control**: GitHub for source code management.
- **CI/CD**: [Optional: mention if integrated].

## πŸ—ΊοΈ Project Structure

dbt-jaffle-shop/
├── analyses/
├── docs/
├── models/
│   ├── staging/
│   ├── intermediate/
│   └── marts/
├── seeds/
├── tests/
├── macros/
├── snapshots/
├── dbt_project.yml
├── LICENSE
└── README.md
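
To illustrate how the three model folders above can map to layer-specific behavior, here is a minimal `dbt_project.yml` sketch. The project name, profile name, and the materialization chosen for each layer are assumptions based on common dbt conventions, not settings taken from this repository.

```yaml
# dbt_project.yml (illustrative sketch; names and materializations are assumptions)
name: jaffle_shop          # hypothetical project name
version: "1.0.0"
config-version: 2
profile: jaffle_shop       # hypothetical profile pointing at BigQuery

# Directory settings matching the layout shown above
model-paths: ["models"]
analysis-paths: ["analyses"]
seed-paths: ["seeds"]
test-paths: ["tests"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

models:
  jaffle_shop:             # must match the project name above
    staging:
      +materialized: view        # lightweight cleanup of raw data
    intermediate:
      +materialized: ephemeral   # joins reused by marts, not stored
    marts:
      +materialized: table       # business-ready tables for reporting
```

Configuring materializations per folder keeps individual model files free of repeated `config()` blocks; a single model can still override its layer default when needed.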

## 🧪 Testing & Quality Assurance

Implemented comprehensive testing strategies to ensure data reliability:
- **Schema Tests**: Enforced constraints like `unique` and `not_null`.
- **Custom Tests**: Developed bespoke tests for business logic validation.
- **Data Freshness**: Configured freshness checks on source data.
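
The schema tests and freshness checks listed above are typically declared in a YAML properties file. The following is a minimal sketch, not the repository's actual configuration: the source name, dataset, `_loaded_at` column, model names, and freshness thresholds are all hypothetical.

```yaml
# models/staging/_staging.yml (illustrative sketch; all names are hypothetical)
version: 2

sources:
  - name: jaffle_shop_raw          # hypothetical source name
    schema: raw                    # hypothetical BigQuery dataset
    loaded_at_field: _loaded_at    # hypothetical load-timestamp column
    freshness:
      warn_after: {count: 12, period: hour}    # assumed threshold
      error_after: {count: 24, period: hour}   # assumed threshold
    tables:
      - name: orders

models:
  - name: stg_orders               # hypothetical staging model
    columns:
      - name: order_id
        tests:                     # newer dbt versions also accept `data_tests:`
          - unique
          - not_null
      - name: customer_id
        tests:
          - not_null
          # Referential integrity: every order must point at a known customer
          - relationships:
              to: ref('stg_customers')
              field: customer_id
```

With a file like this in place, `dbt test` runs the column tests and `dbt source freshness` evaluates the freshness thresholds.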

## 📊 Documentation & Lineage
- **Auto-Generated Docs**: Utilized `dbt docs generate` for creating interactive documentation.
- **Data Lineage**: Visualized model dependencies and data flow.
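
The model and column descriptions shown in the generated docs come from `description` fields in YAML properties files. A minimal sketch follows, assuming a hypothetical `customers` mart model; the model name, column names, and wording are illustrative only.

```yaml
# models/marts/_marts.yml (illustrative sketch; model and columns are hypothetical)
version: 2

models:
  - name: customers
    description: One row per customer, enriched with order history.
    columns:
      - name: customer_id
        description: Primary key of the customer.
      - name: first_order_date     # hypothetical column
        description: Date of the customer's earliest order.
```

Running `dbt docs generate` compiles these descriptions into the documentation site, and `dbt docs serve` hosts it locally for browsing alongside the lineage graph.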

### 📸 Documentation screenshot
![Documentation Site](docs/assets/docs.png)

### 📸 DAG visualization
![Lineage Graph](docs/assets/dag.png)

## πŸ—“οΈ Scheduling & Orchestration

Configured dbt Cloud jobs with the following characteristics:
- **Environment**: Production
- **Schedule**: Daily runs at 9 AM AEST
- **Commands**:
  - `dbt seed`
  - `dbt run`
  - `dbt test`

## 🧩 Key Features
- Modular and scalable model architecture.
- Adherence to dbt best practices and naming conventions.
- Comprehensive testing and documentation.
- Automated workflows for continuous integration and deployment.

## 📈 Future Enhancements
- Integration with BI tools like Looker or Tableau.
- Implementation of snapshots for slowly changing dimensions.
- Expansion of test coverage with more complex scenarios.

## 📄 License

This project is licensed under the [MIT License](LICENSE).