{"id":23011578,"url":"https://github.com/mrzzy/providence","last_synced_at":"2025-08-26T03:21:10.338Z","repository":{"id":148378643,"uuid":"614174869","full_name":"mrzzy/providence","owner":"mrzzy","description":"Personal Finance Data Pipeline \u0026 Dashboard","archived":false,"fork":false,"pushed_at":"2025-06-18T23:05:44.000Z","size":9287,"stargazers_count":11,"open_issues_count":20,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-19T00:19:33.195Z","etag":null,"topics":["automation","azure","dashboard","data-engineering","data-pipeline","data-visualization","dbt","duckdb","pandas","prefect","sql","superset","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mrzzy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-03-15T03:29:07.000Z","updated_at":"2025-06-16T19:50:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"17c10c48-f38a-4c70-8d24-16ea4164f4f3","html_url":"https://github.com/mrzzy/providence","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mrzzy/providence","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrzzy%2Fprovidence","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrzzy%2Fprovidence/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrzzy%2Fprovidence/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/mrzzy%2Fprovidence/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mrzzy","download_url":"https://codeload.github.com/mrzzy/providence/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrzzy%2Fprovidence/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260659187,"owners_count":23043457,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","azure","dashboard","data-engineering","data-pipeline","data-visualization","dbt","duckdb","pandas","prefect","sql","superset","web-scraping"],"created_at":"2024-12-15T10:09:43.329Z","updated_at":"2025-07-09T13:09:19.446Z","avatar_url":"https://github.com/mrzzy.png","language":"Python","readme":"![Providence Banner](assets/banner.png)\n\n# Providence\n\nPersonal Finance Data Pipeline \u0026 Dashboard.\n\n## Features\n\nProvidence aims to make personal finance less tedious with automation \u0026 less opaque with visualisation:\n\n- **Data Pipeline** Extract Load Transform (ELT) pipeline to scrape data from various sources:\n  - **YNAB** extracts accounting transactions from JSON budget data from YNAB API\n  - **SimplyGo** extracts public transport trip data from SimplyGo API\n  - **UOB** extracts bank statement transactions from Excel export.\n- **Data Model** DBT Dimensional model integrates data from disparate sources together for analysis.\n- **Finance Dashboard** Superset dashboard presents easy-to-digest metrics on financial health.\n\n## Architecture\n\n```mermaid\n---\ntitle: Providence V2\n---\n\nflowchart 
LR\n    subgraph p[Prefect]\n      direction LR\n      ynab((YNAB)) \u0026  uob((UOB)) \u0026 simplygo((SimplyGo)) --\u003e|sinks| b2\n      subgraph b2[B2 Bucket]\n          direction LR\n          raw[Raw: JSON, Excel] --\u003e|transform| tfms[[Transforms on ACI]] --\u003e staging[Staging: parquet]\n      end\n      staging --\u003e|load| dbt[[DBT on ACI]] --\u003e dw[(MotherDuck\n DuckDB)]\n      dw --\u003e|transform| dbt\n      dw --\u003e|visualise| viz(((Superset)))\n   end\n```\n\nThe V2 architecture redesign focuses on lowering the Total Cost of Ownership (TCO)\nby relying on Serverless Compute and free-tier Managed Services:\n\n- **Compute** Azure Container Instances (ACI)\n- **Data Lake** Backblaze B2\n- **Data Warehouse** MotherDuck DuckDB\n- **Orchestration** Prefect\n- **Visualisation** Apache Superset\n\n## Data Model\n\nThe Kimball / Dimensional data model of fact \u0026 dimension tables is designed for easy analytic querying:\n\n```mermaid\n---\ntitle: Providence Data Model\n---\nerDiagram\n    dim_date {\n        date id PK\n        date date\n        int day_of_month\n        int day_of_week\n        int day_of_year\n        int week_of_month\n        int week_of_year\n        string weekday_name\n        string weekday_short\n        int month_of_year\n        string month_name\n        string month_short\n        int quarter\n        date month_year\n        int year\n        boolean is_weekend\n    }\n\n    fact_public_transport_trip_leg {\n        string id PK\n        timestamp traveled_on\n        int travel_date_id FK\n        decimal cost_sgd\n        string source\n        string destination\n        string transport_mode\n        string bank_card_id FK\n        string account_id FK\n        string billing_ref\n        boolean is_billed\n        timestamp updated_at\n    }\n\n    dim_bank_card {\n        string id PK\n        string name\n        timestamp updated_at\n    }\n\n    fact_public_transport_trip_leg }| -- || dim_account: \"billed 
to\"\n    fact_public_transport_trip_leg }| -- || dim_bank_card: \"billed on\"\n\n    fact_accounting_transaction {\n        string id PK\n        string super_id FK\n        decimal amount\n        string description\n        string clearing_status\n        boolean is_approved\n        boolean is_deleted\n        int budget_id FK\n        int account_id FK\n        int payee_id FK\n        int transfer_account_id FK\n        int date_id FK\n        timestamp updated_at\n    }\n\n    dim_budget {\n        string id PK\n        string name\n        timestamp modified_at\n        string currency_code\n        string currency_symbol\n        timestamp updated_at\n    }\n\n    dim_account {\n        string id PK\n        string name\n        boolean is_closed\n        boolean is_deleted\n        boolean is_liquid\n        string budget_type\n        string vendor\n        string vendor_id\n        string vendor_type\n        timestamp updated_at\n    }\n\n    dim_payee {\n        string id PK\n        boolean is_deleted\n        string transfer_account_id FK\n        timestamp updated_at\n    }\n\n    dim_budget_category {\n        string id PK\n        string category_id\n        string name\n        string budget_id\n        string category_group_id\n        string category_group\n        string goal_type\n        decimal goal_amount\n        date goal_due\n        boolean is_deleted\n        timestamp updated_at\n        timestamp effective_at\n        timestamp expired_at\n    }\n\n    fact_accounting_transaction }|--|| fact_accounting_transaction: \"parent\"\n    fact_accounting_transaction }|--|| dim_budget: \"uses\"\n    fact_accounting_transaction }|--|| dim_budget_category: \"classified as\"\n    fact_accounting_transaction }|--|| dim_account: \"on\"\n    fact_accounting_transaction }|--|| dim_account: \"transfer to\"\n    fact_accounting_transaction }|--|| dim_payee: \"paid to\"\n\n    fact_monthly_budget {\n        string id PK\n        int month_date_id 
FK\n        int budget_id FK\n        int category_id FK\n        decimal amount\n        timestamp updated_at\n    }\n\n    fact_monthly_budget }|--|| dim_budget: \"allocated to\"\n    fact_monthly_budget }|--|| dim_budget_category: \"classified as\"\n\n    fact_vendor_transaction {\n        string id PK\n        string description\n        decimal amount\n        int date_id FK\n        int account_id FK\n        timestamp updated_at\n    }\n\n    fact_vendor_transaction }|--|| dim_account: \"on\"\n\n    fact_bank_statement {\n        string id PK\n        int begin_date_id FK\n        int end_date_id FK\n        int account_id FK\n        decimal balance\n        timestamp updated_at\n    }\n    fact_bank_statement }|--|| dim_account: \"on\"\n```\n\nSee [DBT Docs](https://mrzzy.github.io/providence/#!/overview) for more details.\n\n## License\n\nMIT.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmrzzy%2Fprovidence","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmrzzy%2Fprovidence","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmrzzy%2Fprovidence/lists"}