An open API service indexing awesome lists of open source software.

https://github.com/matt-strautmann/dbt-bigquery-ecommerce-quickstart

Welcome to the dbt-BigQuery Quickstart Project! πŸŽ‰ This repository is designed as a hands-on guide to help you build a modern data stack leveraging powerful tools like Airbyte for ingestion, dbt for transformation, and BigQuery for storage and analytics.
https://github.com/matt-strautmann/dbt-bigquery-ecommerce-quickstart

bigquery dbt e-commerce quickstarts

Last synced: 10 months ago
JSON representation

Welcome to the dbt-BigQuery Quickstart Project! πŸŽ‰ This repository is designed as a hands-on guide to help you build a modern data stack leveraging powerful tools like Airbyte for ingestion, dbt for transformation, and BigQuery for storage and analytics.

Awesome Lists containing this project

README

          

# πŸš€ Getting Started with dbt and Airbyte

Welcome to your modern data stack template! This project demonstrates how to build a scalable data warehouse using:
- **Airbyte** for data ingestion
- **dbt** for transformation
- **BigQuery** for storage and computing

## Prerequisites

1. **VSCode** installed ([Download here](https://code.visualstudio.com/download))
2. **Turntable.so extension** installed in VSCode ([Install here](https://marketplace.visualstudio.com/items?itemName=Turntable.turntable))
3. **Python** installed (3.8 or higher)
4. **Google Cloud account** with BigQuery enabled
5. **Airbyte** instance set up with sources configured

## πŸ—οΈ Project Structure
```
πŸ“‚ models/
β”œβ”€β”€ πŸ“ staging/ # πŸ› οΈ Raw data standardization
β”‚ β”œβ”€β”€ πŸ“ stg_stripe/ # πŸ’³ Payment processing
β”‚ β”‚ β”œβ”€β”€ πŸ“ base/ # πŸ“œ Raw JSON parsing
β”‚ β”‚ β”‚ └── πŸ“„ base_stripe__customers.sql
β”‚ β”‚ β”œβ”€β”€ πŸ“„ stg_stripe__customers.sql
β”‚ β”‚ └── πŸ“„ _stripe_sources.yml
β”‚ β”‚
β”‚ β”œβ”€β”€ πŸ“ stg_hubspot/ # πŸ“ˆ Marketing automation
β”‚ β”‚ β”œβ”€β”€ πŸ“ base/
β”‚ β”‚ └── πŸ“„ _hubspot_sources.yml
β”‚ β”‚
β”‚ └── πŸ“ stg_shopify/ # πŸ›’ E-commerce platform
β”‚ β”œβ”€β”€ πŸ“ base/
β”‚ └── πŸ“„ _shopify_sources.yml
β”‚
β”œβ”€β”€ πŸ“ intermediate/ # πŸ” Business logic layer
β”‚ β”œβ”€β”€ πŸ“ finance/
β”‚ β”œβ”€β”€ πŸ“ marketing/
β”‚ └── πŸ“ sales/
β”‚
└── πŸ“ marts/ # πŸ“Š Business-specific models
β”œβ”€β”€ πŸ“ core/ # πŸ”‘ Core business entities
β”œβ”€β”€ πŸ“ finance/ # πŸ’° Finance-specific models
β”œβ”€β”€ πŸ“ marketing/ # πŸ“£ Marketing-specific models
└── πŸ“ sales/ # πŸ›’ Sales-specific models
```

## πŸ”„ Setup Instructions

1. **Clone the Repository**:
```bash
git clone https://github.com/yourusername/dbt-bigquery-quickstart-project.git
cd dbt-bigquery-quickstart-project
```

2. **Set Up Environment Variables**:
```bash
cp .env.example .env
# Edit .env with your configurations
```

3. **Configure dbt Profile**:
```bash
cp profiles.yml.example ~/.dbt/profiles.yml
# Edit profiles.yml with your BigQuery details
```

4. **Install Dependencies**:
```bash
pip install dbt-core dbt-bigquery
dbt deps
```

## πŸ”Œ Airbyte Integration

This template is designed to work with Airbyte's BigQuery destination. Key points:

1. **Raw Data Structure**:
- Airbyte creates tables with prefix `_airbyte_raw_`
- Data is stored in JSON format in `_airbyte_data` column
- Each record has `_airbyte_emitted_at` timestamp

2. **Base Models**:
```sql
-- Example: models/staging/stg_stripe/base/base_stripe__customers.sql
select
JSON_EXTRACT_SCALAR(_airbyte_data, '$.id') as customer_id,
JSON_EXTRACT_SCALAR(_airbyte_data, '$.email') as email,
_airbyte_emitted_at as ingested_at
from {{ source('stripe', '_airbyte_raw_customers') }}
```

3. **Source Configuration**:
```yaml
# Example: models/staging/stg_stripe/_stripe_sources.yml
version: 2
sources:
- name: stripe
database: "{{ env_var('DBT_PROJECT_ID') }}"
schema: "{{ env_var('AIRBYTE_SCHEMA', 'raw') }}"
loader: airbyte
loaded_at_field: _airbyte_emitted_at
tables:
- name: _airbyte_raw_customers
```

## 🏭 Development Workflow

1. **Set Up Airbyte Source**:
- Configure source in Airbyte UI
- Set destination to BigQuery
- Note the destination schema

2. **Update Environment Variables**:
```bash
DBT_PROJECT_ID=your-project-id
AIRBYTE_SCHEMA=raw
DBT_STAGING_SCHEMA=staging
```

3. **Create Base Models**:
- Parse JSON data from Airbyte
- Use `JSON_EXTRACT_SCALAR` for BigQuery
- Add basic data type conversions

4. **Create Staging Models**:
- Add business logic and cleaning
- Implement standard naming
- Add data quality tests

5. **Build Marts**:
- Combine data from multiple sources
- Create business-specific views
- Optimize for analysis

## πŸ“Š Data Quality

1. **Source Freshness**:
```yaml
sources:
- name: stripe
freshness:
warn_after: {count: 12, period: hour}
error_after: {count: 24, period: hour}
```

2. **Data Tests**:
```yaml
models:
- name: stg_stripe__customers
columns:
- name: customer_id
tests:
- unique
- not_null
```

## πŸ” Monitoring

1. **Airbyte Sync Status**:
- Check Airbyte UI for sync status
- Monitor `_airbyte_emitted_at` for freshness

2. **dbt Run Status**:
- Use `dbt source freshness`
- Check model test results

## πŸ“š Learning Resources

1. **Airbyte Resources**:
- [Airbyte Docs](https://docs.airbyte.io/)
- [BigQuery Setup Guide](https://docs.airbyte.io/integrations/destinations/bigquery)

2. **dbt Resources**:
- [dbt Docs](https://docs.getdbt.com/)
- [BigQuery Specific Guides](https://docs.getdbt.com/reference/resource-configs/bigquery-configs)

3. **Community**:
- [dbt Slack](https://community.getdbt.com/)
- [Airbyte Slack](https://slack.airbyte.io/)

## πŸ†˜ Need Help?

1. Check error messages in Turntable.so
2. Review Airbyte logs for sync issues
3. Visit [dbt Discourse](https://discourse.getdbt.com/)
4. Create an issue in this repository

## 🀝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request

## πŸ“ License

MIT License - see [LICENSE](LICENSE) file

## BigQuery Setup Guide

For detailed instructions on setting up your BigQuery connection, including OAuth authentication and testing, see our [BigQuery Setup Guide](docs/bigquery_setup.md).

---
# ⭐ Credits & Connect

## πŸš€ About This Repository
This repository is maintained by [Matt Strautmann](https://www.linkedin.com/in/mattstrautmann), an experienced **is working closely with Founder/CEOs to use your Data to improve your bottom line. Period.** Let me help you **trust your data. know your customer. improve your bottom line.**

### Why Star This Repository?
Starring this repository helps me understand which tools, templates, and projects bring the most value to the community. Your support motivates me to keep producing high-quality content and maintain these resources for everyone!

## 🌟 Support This Project
If this repository has helped you:
1. Give it a ⭐ to show your appreciation!
2. Share it with others who might find it useful.

## 🀝 Connect with Me
I’d love to hear how you’re using this repository or discuss how I can help with your next project. Let’s connect:
- **LinkedIn**: [Matt Strautmann](https://www.linkedin.com/in/mattstrautmann)
- **GitHub**: [Matt Strautmann](https://github.com/matt-strautmann)

---