{"id":51418760,"url":"https://github.com/alopezmoreira1989/dbt_project","last_synced_at":"2026-07-04T22:30:32.662Z","repository":{"id":352836910,"uuid":"1216167329","full_name":"alopezmoreira1989/dbt_project","owner":"alopezmoreira1989","description":"End-to-end analytics engineering project built on top of Snowflake's TPCH_SF1 sample dataset. It transforms raw OLTP tables (customers, orders, lineitems, suppliers, parts) into a clean, tested, business-ready star schema using dbt Fusion.  ","archived":false,"fork":false,"pushed_at":"2026-05-22T08:26:22.000Z","size":3661,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-22T15:39:16.830Z","etag":null,"topics":["dbt","sql"],"latest_commit_sha":null,"homepage":"","language":"Makefile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alopezmoreira1989.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-20T16:27:45.000Z","updated_at":"2026-05-22T08:26:26.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/alopezmoreira1989/dbt_project","commit_stats":null,"previous_names":["alejandrolmoreira-lgtm/analytics","alopezmoreira1989/dbt_project"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/alopezmoreira1989/dbt_project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alopezmoreira1989%2Fdbt_project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alopezmoreira1989%2Fdbt_project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alopezmoreira1989%2Fdbt_project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alopezmoreira1989%2Fdbt_project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alopezmoreira1989","download_url":"https://codeload.github.com/alopezmoreira1989/dbt_project/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alopezmoreira1989%2Fdbt_project/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35138074,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-04T02:00:05.987Z","response_time":113,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dbt","sql"],"created_at":"2026-07-04T22:30:31.898Z","updated_at":"2026-07-04T22:30:32.629Z","avatar_url":"https://github.com/alopezmoreira1989.png","language":"Makefile","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Analytics — dbt project on TPC-H (Snowflake)\n\nEnd-to-end analytics engineering project built on top of Snowflake's `TPCH_SF1` sample dataset. It transforms raw OLTP tables (customers, orders, lineitems, suppliers, parts) into a clean, tested, business-ready star schema using **dbt Fusion**.\n\nThe goal of the project is to demonstrate a realistic, production-style dbt workflow: layered models, documented sources, generic tests, environment separation (dev / prod), and version control via GitHub.\n\n---\n\n## Overview\n\nThis project simulates a production-ready data transformation layer using dbt on Snowflake.\n\nThe goal is to transform raw data into clean, well-structured analytical models following best practices in modern data stacks.\n\n---\n\n## Why dbt?\n\ndbt allows transforming data using software engineering best practices:\n\n- Version control\n- Testing\n- Documentation\n- Modular design\n\nThis makes data pipelines more robust and production-ready.\n\n---\n\n## Architecture\n\nThe project is structured into three main layers:\n\n- **Staging**: Cleans and standardizes raw data sources\n- **Intermediate**: Applies business logic and transformations\n- **Marts**: Final analytical tables optimized for reporting and analysis\n\nThis layered approach ensures modularity, scalability, and maintainability.\n\n\n---\n\n## Stack\n\n| Layer | Tool |\n|---|---|\n| Warehouse | Snowflake (`SNOWFLAKE_SAMPLE_DATA.TPCH_SF1`) |\n| Transformation | dbt Fusion 2.0 |\n| IDE | VS Code + dbt extension |\n| Version control | Git / GitHub |\n| Orchestration | dbt Cloud (Development + Production environments) |\n\n---\n\n## Source data\n\nSnowflake's open `TPCH_SF1` schema — a TPC benchmark dataset modeling a wholesale supplier:\n\n| Table | Rows |\n|---|---|\n| CUSTOMER | 150K |\n| LINEITEM | 6.0M |\n| NATION | 25 |\n| ORDERS | 1.5M |\n| PART | 200K |\n| PARTSUPP | 800K |\n| REGION | 5 |\n| SUPPLIER | 10K |\n\nSources are declared in `models/s1_staging/_scr_tpch.yml`.\n\n---\n\n## Project architecture\n\nThree-layer model structure, each with its own materialization strategy defined in `dbt_project.yml`:\n\n```\nmodels/\n├── s1_staging/        → materialized as table\n├── s2_intermediate/   → materialized as view\n└── s3_marts/          → materialized as table\n```\n\n### `s1_staging` — clean \u0026 rename\n\nOne staging model per source table. Light transformations only: column renaming (`l_orderkey` → `order_key`), type casting, no joins, no business logic. This is the only layer that reads from `source()`.\n\n### `s2_intermediate` — enrich \u0026 combine\n\n`int_lineitem_enriched` joins lineitems with orders, customers, nations, regions, suppliers, parts and partsupp, then derives the core business metrics (net revenue, discount amount, total cost, profit, price variance, expected revenue) at the line-item grain.\n\n### `s3_marts` — dimensional model\n\nStar schema ready for BI:\n\n- **Fact table:** `fact_lineitem` (grain: one row per order line)\n- **Dimensions:** `dim_customer`, `dim_supplier`, `dim_part`, `dim_part_supplier`, `dim_location`, `dim_date`\n\n### Lineage\n\nThe DAG below is generated from `dbt build` — every staging model feeds the intermediate layer (or marts directly), and the enriched intermediate feeds the fact table.\n\n![Lineage](docs/lineage.png)\n\n---\n\n## Business metrics\n\nDefined once in `int_lineitem_enriched` and inherited by `fact_lineitem`:\n\n| Metric | Formula |\n|---|---|\n| `gross_revenue` | `line_extended_price` |\n| `net_revenue` | `line_extended_price * (1 - line_discount)` |\n| `discount_amount` | `line_extended_price * line_discount` |\n| `expected_revenue` | `part_retail_price * line_quantity` |\n| `price_variance` | `(part_retail_price * line_quantity) - line_extended_price` |\n| `total_cost` | `line_quantity * supply_cost` |\n| `profit` | `net_revenue - total_cost` |\n\n---\n\n## Tests \u0026 documentation\n\nEvery model has a corresponding YAML with column descriptions and generic tests. Currently **40 tests** across the project:\n\n- `unique` and `not_null` on every primary key (staging, intermediate, marts)\n- `not_null` on every business metric in the intermediate and fact layers\n- `relationships` tests on every foreign key in the fact table (→ `dim_customer`, `dim_part`, `dim_supplier`)\n- `relationships` tests on the staging FKs (`stg_orders.cust_key`, `stg_lineitem.order_key`, etc.)\n\nRun them with:\n\n```bash\ndbt test\n```\n\n---\n\n## Project layout\n\n```\nproject/\n├── analyses/                  # ad-hoc SQL (not materialized)\n│   ├── customer_value.sql\n│   ├── pricing_analysis.sql\n│   ├── profitability.sql\n│   ├── revenue_trend.sql\n│   └── supplier_performance.sql\n├── models/\n│   ├── s1_staging/\n│   │   ├── _scr_tpch.yml      # source declarations\n│   │   ├── _stg_tpch.yml      # staging model tests + docs\n│   │   ├── _tpch_docs.md      # shared doc blocks\n│   │   └── stg_*.sql          # 8 staging models\n│   ├── s2_intermediate/\n│   │   ├── _int_tpch.yml\n│   │   └── int_lineitem_enriched.sql\n│   └── s3_marts/\n│       ├── _mart_tpch.yml\n│       ├── fact_lineitem.sql\n│       └── dim_*.sql          # 6 dimensions\n├── dbt_project.yml\n├── packages.yml\n└── profiles.yml\n```\n\n---\n\n## Environments\n\nTwo environments are configured in dbt Cloud, both running on dbt Fusion:\n\n- **Development (`DEV`)** — used for local development from VS Code, writes to a personal dev schema.\n- **Production (`PROD`)** — built from the `main` branch, writes to `analytics_prod`.\n\nThe dbt profile is `mi_proyecto_dbt`, defined in `profiles.yml`.\n\n---\n\n## Running the project\n\nPrerequisites: dbt Fusion installed, Snowflake credentials configured in `~/.dbt/profiles.yml`.\n\n```bash\n# Install dependencies\ndbt deps\n\n# Run all models\ndbt run\n\n# Run all tests\ndbt test\n\n# Run + test in one shot\ndbt build\n\n# Generate \u0026 serve docs\ndbt docs generate\ndbt docs serve\n```\n\nLast successful build: **16 models, 40 tests, 0 failures.**\n\n---\n\n## Repository\n\n[github.com/alopezmoreira1989/dbt_project](https://github.com/alopezmoreira1989/dbt_project)\n\n---\n\n## Author\n\n**Alejandro López Moreira** — Analytics Engineer\n\nBuilt as a portfolio project to practice end-to-end analytics engineering with dbt Fusion and Snowflake.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falopezmoreira1989%2Fdbt_project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falopezmoreira1989%2Fdbt_project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falopezmoreira1989%2Fdbt_project/lists"}