https://github.com/sinclairn101/dbt-nz-banking-loans
A personal dbt project transforming a synthetic New Zealand banking loans dataset into a star schema with staging, dimension, fact, and reporting models.
https://github.com/sinclairn101/dbt-nz-banking-loans
analytics business-intelligence dbt portfolio-project postgres sql
Last synced: about 1 month ago
JSON representation
A personal dbt project transforming a synthetic New Zealand banking loans dataset into a star schema with staging, dimension, fact, and reporting models.
- Host: GitHub
- URL: https://github.com/sinclairn101/dbt-nz-banking-loans
- Owner: SinclairN101
- Created: 2025-09-21T22:07:03.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-09-21T22:23:55.000Z (9 months ago)
- Last Synced: 2025-09-21T23:31:25.074Z (9 months ago)
- Topics: analytics, business-intelligence, dbt, portfolio-project, postgres, sql
- Language: YAML
- Homepage:
- Size: 1.58 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# 🏦 NZ Banking Loan & Fraud Risk — dbt Project
This is a portfolio project where I practice **SQL** and **dbt** using a dataset in the lending space.
The raw data comes from Kaggle: [**Synthetic NZ Banking Loan and Fraud Risk Dataset**](https://www.kaggle.com/datasets/sandycandy7/synthetic-nz-banking-loan-and-fraud-risk-dataset).
I transformed the dataset into a layered analytics warehouse with clearly defined staging, dimension, fact, and reporting models. The goal was to replicate how real-world analytics teams structure data for reliable analysis and decision-making.
### 🎯 Goals
- Build a dbt project starting from raw seeded data
- Design and implement a simple star schema
- Write clear SQL transformations to create clean, analysis-ready tables
- Apply dbt tests and documentation to ensure quality and transparency
- Strengthen my end-to-end data & analytics skills
### 🧱 Project Structure
- **Seed (Raw Data)**
The Kaggle CSV is loaded into the warehouse with `dbt seed`. This acts as the upstream input for staging.
- **Staging (`stg_`)**
Cleans and standardizes the seeded table:
- Renames columns to clear, readable names
- Casts 0/1 flags into true/false booleans
- Normalizes categories (e.g. `application_type`)
- Converts interest rates into percentages with proper numeric casting
- **Dimension (`dim_`)**
Descriptive fields you group by or filter on.
Example: `dim_loan_applications` holds borrower demographics, credit grade, loan purpose, and dates.
- **Fact (`fct_`)**
Numeric outcomes and risk metrics you aggregate.
Example: `fct_loan_performance` stores loan amounts, credit scores, utilization ratios, inquiries, and fraud/default flags.
- **Reporting (`rpt_`)**
Pre-aggregated metrics that answer business questions.
Example: `rpt_loan_summary` shows total loans, averages, default counts, and default rates.
### 📊 Example Metrics from `rpt_loan_summary`
- Total loans: ~5,000
- Total loan amount: ~95M
- Average loan amount: ~18.9K
- Average interest rate: ~19.9%
- Average debt-to-income ratio: ~0.18
- Average funding days: ~7.5
- Total defaults: 48
- Default rate: ~0.96%
### 📚 Documentation and Testing
- Each model has a YAML file with column-level descriptions.
- Documentation is written in `docs.md` blocks for clear, human-readable explanations.
- Tests include `unique` and `not_null` checks on keys like `loan_id`.
- I used `dbt docs generate` and `dbt docs serve` to build an interactive documentation site with lineage graphs and column-level details.
### 🔍 Analysis & Dashboard
To showcase the value of the transformed data, I built a Metabase dashboard directly on top of the dbt models.
This dashboard provides both a high-level snapshot of the loan portfolio and deeper risk insights.
### 📊 Portfolio Snapshot
The top row shows key metrics at a glance:
- **Total Loans**
- **Total Loan Amount**
- **Average Loan Size**
- **Average Credit Score**
- **Default Rate (%)**
- **Average Funding Days**
These KPIs summarize the scale and health of the lending portfolio.
### 🧩 Portfolio Mix
- **Loans by Purpose**
- **Loans by Region**
- **Loans by Credit Grade**
### ⚖️ Risk & Performance
- **Defaults by Credit Grade**
- **Defaults Over Time vs Total Loans**
- **Average Interest Rate by Grade**
### 🚀 Operational Insights
- **Funding Speed Distribution**
- **Top 10 Borrowers by Loan Amount**
### 📸 Dashboard