https://github.com/matt-strautmann/dbt-bigquery-ecommerce-quickstart
Welcome to the dbt-BigQuery Quickstart Project! π This repository is designed as a hands-on guide to help you build a modern data stack leveraging powerful tools like Airbyte for ingestion, dbt for transformation, and BigQuery for storage and analytics.
https://github.com/matt-strautmann/dbt-bigquery-ecommerce-quickstart
bigquery dbt e-commerce quickstarts
Last synced: 10 months ago
JSON representation
Welcome to the dbt-BigQuery Quickstart Project! π This repository is designed as a hands-on guide to help you build a modern data stack leveraging powerful tools like Airbyte for ingestion, dbt for transformation, and BigQuery for storage and analytics.
- Host: GitHub
- URL: https://github.com/matt-strautmann/dbt-bigquery-ecommerce-quickstart
- Owner: matt-strautmann
- Created: 2024-12-30T17:03:19.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-16T15:11:43.000Z (over 1 year ago)
- Last Synced: 2025-03-17T23:49:40.620Z (about 1 year ago)
- Topics: bigquery, dbt, e-commerce, quickstarts
- Homepage:
- Size: 46.9 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# π Getting Started with dbt and Airbyte
Welcome to your modern data stack template! This project demonstrates how to build a scalable data warehouse using:
- **Airbyte** for data ingestion
- **dbt** for transformation
- **BigQuery** for storage and computing
## Prerequisites
1. **VSCode** installed ([Download here](https://code.visualstudio.com/download))
2. **Turntable.so extension** installed in VSCode ([Install here](https://marketplace.visualstudio.com/items?itemName=Turntable.turntable))
3. **Python** installed (3.8 or higher)
4. **Google Cloud account** with BigQuery enabled
5. **Airbyte** instance set up with sources configured
## ποΈ Project Structure
```
π models/
βββ π staging/ # π οΈ Raw data standardization
β βββ π stg_stripe/ # π³ Payment processing
β β βββ π base/ # π Raw JSON parsing
β β β βββ π base_stripe__customers.sql
β β βββ π stg_stripe__customers.sql
β β βββ π _stripe_sources.yml
β β
β βββ π stg_hubspot/ # π Marketing automation
β β βββ π base/
β β βββ π _hubspot_sources.yml
β β
β βββ π stg_shopify/ # π E-commerce platform
β βββ π base/
β βββ π _shopify_sources.yml
β
βββ π intermediate/ # π Business logic layer
β βββ π finance/
β βββ π marketing/
β βββ π sales/
β
βββ π marts/ # π Business-specific models
βββ π core/ # π Core business entities
βββ π finance/ # π° Finance-specific models
βββ π marketing/ # π£ Marketing-specific models
βββ π sales/ # π Sales-specific models
```
## π Setup Instructions
1. **Clone the Repository**:
```bash
git clone https://github.com/yourusername/dbt-bigquery-quickstart-project.git
cd dbt-bigquery-quickstart-project
```
2. **Set Up Environment Variables**:
```bash
cp .env.example .env
# Edit .env with your configurations
```
3. **Configure dbt Profile**:
```bash
cp profiles.yml.example ~/.dbt/profiles.yml
# Edit profiles.yml with your BigQuery details
```
4. **Install Dependencies**:
```bash
pip install dbt-core dbt-bigquery
dbt deps
```
## π Airbyte Integration
This template is designed to work with Airbyte's BigQuery destination. Key points:
1. **Raw Data Structure**:
- Airbyte creates tables with prefix `_airbyte_raw_`
- Data is stored in JSON format in `_airbyte_data` column
- Each record has `_airbyte_emitted_at` timestamp
2. **Base Models**:
```sql
-- Example: models/staging/stg_stripe/base/base_stripe__customers.sql
select
JSON_EXTRACT_SCALAR(_airbyte_data, '$.id') as customer_id,
JSON_EXTRACT_SCALAR(_airbyte_data, '$.email') as email,
_airbyte_emitted_at as ingested_at
from {{ source('stripe', '_airbyte_raw_customers') }}
```
3. **Source Configuration**:
```yaml
# Example: models/staging/stg_stripe/_stripe_sources.yml
version: 2
sources:
- name: stripe
database: "{{ env_var('DBT_PROJECT_ID') }}"
schema: "{{ env_var('AIRBYTE_SCHEMA', 'raw') }}"
loader: airbyte
loaded_at_field: _airbyte_emitted_at
tables:
- name: _airbyte_raw_customers
```
## π Development Workflow
1. **Set Up Airbyte Source**:
- Configure source in Airbyte UI
- Set destination to BigQuery
- Note the destination schema
2. **Update Environment Variables**:
```bash
DBT_PROJECT_ID=your-project-id
AIRBYTE_SCHEMA=raw
DBT_STAGING_SCHEMA=staging
```
3. **Create Base Models**:
- Parse JSON data from Airbyte
- Use `JSON_EXTRACT_SCALAR` for BigQuery
- Add basic data type conversions
4. **Create Staging Models**:
- Add business logic and cleaning
- Implement standard naming
- Add data quality tests
5. **Build Marts**:
- Combine data from multiple sources
- Create business-specific views
- Optimize for analysis
## π Data Quality
1. **Source Freshness**:
```yaml
sources:
- name: stripe
freshness:
warn_after: {count: 12, period: hour}
error_after: {count: 24, period: hour}
```
2. **Data Tests**:
```yaml
models:
- name: stg_stripe__customers
columns:
- name: customer_id
tests:
- unique
- not_null
```
## π Monitoring
1. **Airbyte Sync Status**:
- Check Airbyte UI for sync status
- Monitor `_airbyte_emitted_at` for freshness
2. **dbt Run Status**:
- Use `dbt source freshness`
- Check model test results
## π Learning Resources
1. **Airbyte Resources**:
- [Airbyte Docs](https://docs.airbyte.io/)
- [BigQuery Setup Guide](https://docs.airbyte.io/integrations/destinations/bigquery)
2. **dbt Resources**:
- [dbt Docs](https://docs.getdbt.com/)
- [BigQuery Specific Guides](https://docs.getdbt.com/reference/resource-configs/bigquery-configs)
3. **Community**:
- [dbt Slack](https://community.getdbt.com/)
- [Airbyte Slack](https://slack.airbyte.io/)
## π Need Help?
1. Check error messages in Turntable.so
2. Review Airbyte logs for sync issues
3. Visit [dbt Discourse](https://discourse.getdbt.com/)
4. Create an issue in this repository
## π€ Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request
## π License
MIT License - see [LICENSE](LICENSE) file
## BigQuery Setup Guide
For detailed instructions on setting up your BigQuery connection, including OAuth authentication and testing, see our [BigQuery Setup Guide](docs/bigquery_setup.md).
---
# β Credits & Connect
## π About This Repository
This repository is maintained by [Matt Strautmann](https://www.linkedin.com/in/mattstrautmann), an experienced **is working closely with Founder/CEOs to use your Data to improve your bottom line. Period.** Let me help you **trust your data. know your customer. improve your bottom line.**
### Why Star This Repository?
Starring this repository helps me understand which tools, templates, and projects bring the most value to the community. Your support motivates me to keep producing high-quality content and maintain these resources for everyone!
## π Support This Project
If this repository has helped you:
1. Give it a β to show your appreciation!
2. Share it with others who might find it useful.
## π€ Connect with Me
Iβd love to hear how youβre using this repository or discuss how I can help with your next project. Letβs connect:
- **LinkedIn**: [Matt Strautmann](https://www.linkedin.com/in/mattstrautmann)
- **GitHub**: [Matt Strautmann](https://github.com/matt-strautmann)
---