{"id":20736797,"url":"https://github.com/lu-sketch/google-big-query-sql---credit-risk-analysis","last_synced_at":"2026-05-25T13:40:25.254Z","repository":{"id":263191029,"uuid":"889630983","full_name":"lu-sketch/Google-Big-Query-SQL---Credit-Risk-Analysis","owner":"lu-sketch","description":"Big Query SQL Credit Risk Analysis","archived":false,"fork":false,"pushed_at":"2024-11-18T13:46:40.000Z","size":310,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-11T11:29:34.238Z","etag":null,"topics":["big-data","bigquery","credit-risk","sql"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lu-sketch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-16T20:18:04.000Z","updated_at":"2024-11-18T13:46:43.000Z","dependencies_parsed_at":"2024-11-16T21:24:59.582Z","dependency_job_id":"bf514de2-b3c7-4d35-b994-a7e51e8f91d6","html_url":"https://github.com/lu-sketch/Google-Big-Query-SQL---Credit-Risk-Analysis","commit_stats":null,"previous_names":["lu-sketch/big-query-sql---credit-risk-analysis"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lu-sketch/Google-Big-Query-SQL---Credit-Risk-Analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lu-sketch%2FGoogle-Big-Query-SQL---Credit-Risk-Analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lu-sketch%2FGoogle-Big-Query-SQL---Credit-Risk-Analysis/tags","releases_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lu-sketch%2FGoogle-Big-Query-SQL---Credit-Risk-Analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lu-sketch%2FGoogle-Big-Query-SQL---Credit-Risk-Analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lu-sketch","download_url":"https://codeload.github.com/lu-sketch/Google-Big-Query-SQL---Credit-Risk-Analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lu-sketch%2FGoogle-Big-Query-SQL---Credit-Risk-Analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":27763648,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-16T02:00:10.477Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","bigquery","credit-risk","sql"],"created_at":"2024-11-17T06:11:49.600Z","updated_at":"2025-12-16T11:16:20.977Z","avatar_url":"https://github.com/lu-sketch.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Credit Card Default Analysis using SQL on BigQuery\r\n\r\n## Technologies Used\r\n- ![Google BigQuery](https://img.shields.io/badge/Database-Google%20BigQuery-blue?logo=google-cloud\u0026logoColor=white)\r\n- [Google BigQuery](https://cloud.google.com/bigquery)\r\n\r\n\r\n## Project Overview\r\nThis 
project analyses credit card default patterns using SQL, focusing on demographic factors and payment behaviour to identify trends associated with higher risks of default. The analysis is conducted on the [Credit Card Default dataset](https://console.cloud.google.com/marketplace/product/bigquery-public-data/ml_datasets) available on **Google BigQuery.**\r\n\r\nThrough various SQL queries, this project explores client characteristics, calculates default rates across different demographics, and examines payment histories.\r\n\r\n\r\n## Dataset Information\r\nThe dataset used in this project is hosted in Google BigQuery:\r\n- **Table ID**: `bigquery-public-data.ml_datasets.credit_card_default`\r\n- **Rows**: 2,965\r\n- **Columns**: 24\r\n\r\n### Key Columns\r\n| Column                     | Description                                                                                   |\r\n|----------------------------|-----------------------------------------------------------------------------------------------|\r\n| `id`                       | Unique identifier for each client                                                             |\r\n| `limit_balance`            | Credit limit in NT dollars                                                                    |\r\n| `sex`                      | Gender (1 = Male, 2 = Female)                                                                 |\r\n| `education_level`          | Education level (1 = Graduate, 2 = University, 3 = High School, etc.)                        
|\r\n| `marital_status`           | Marital status (1 = Married, 2 = Single, 3 = Other)                                          |\r\n| `age`                      | Client's age                                                                                  |\r\n| `pay_0` to `pay_6`         | Payment delay status for the last six months (-1 = pay duly, 1-9 = months of delay)          |\r\n| `bill_amt_1` to `bill_amt_6` | Bill statement amount over the last six months in NT dollars                             |\r\n| `pay_amt_1` to `pay_amt_6` | Previous payment amounts over the last six months in NT dollars                              |\r\n| `default_payment_next_month` | Default payment indicator for next month (1 = Yes, 0 = No)                               |\r\n\r\n## Analysis and Key Queries\r\nThe analysis progresses from basic SQL queries to more advanced data processing techniques. Below are some highlights of the analysis:\r\n\r\n1. **Demographic Distribution**:\r\n   - Counts of clients by gender, marital status, and education level to understand the composition of the client base.\r\n   \r\n2. **Credit Limit and Age Analysis**:\r\n   - Calculation of average credit limits by age and demographic factors to explore lending patterns.\r\n\r\n3. **Default Rate by Demographic**:\r\n   - Queries calculating default rates across education levels, marital status, and gender to assess risk factors.\r\n\r\n4. 
**Payment Delay Analysis**:\r\n   - Analysis of average payment delays and outstanding bill amounts to gauge payment behaviours that indicate risk.\r\n\r\n### Example Query: Default Rate by Education Level and Marital Status\r\n```sql\r\nSELECT\r\n    CASE \r\n        WHEN education_level = '1' THEN 'Graduate School'\r\n        WHEN education_level = '2' THEN 'University'\r\n        WHEN education_level = '3' THEN 'High School'\r\n        WHEN education_level = '4' THEN 'Other'\r\n        WHEN education_level IN ('5', '6') THEN 'Unknown'\r\n        ELSE 'Unspecified'\r\n    END AS education_level_label,\r\n    \r\n    CASE \r\n        WHEN marital_status = '1' THEN 'Married'\r\n        WHEN marital_status = '2' THEN 'Single'\r\n        WHEN marital_status = '3' THEN 'Other'\r\n        ELSE 'Unknown'\r\n    END AS marital_status_label,\r\n    \r\n    AVG(CAST(default_payment_next_month AS FLOAT64)) AS default_rate\r\nFROM `bigquery-public-data.ml_datasets.credit_card_default`\r\nGROUP BY education_level_label, marital_status_label\r\nORDER BY default_rate DESC;\r\n```\r\n\r\n## Project Structure\r\n- **SQL Queries**: All SQL queries are organized by their purpose and complexity.\r\n- **Insights**: Insights generated from each query, with interpretations and visualizations (if applicable).\r\n- **README.md**: Project documentation.\r\n\r\n## Running the Analysis\r\n1. **Prerequisites**:\r\n   - Access to Google BigQuery.\r\n   - Familiarity with SQL syntax and basic SQL operations.\r\n\r\n2. **Steps**:\r\n   - Run queries directly on BigQuery against the dataset `bigquery-public-data.ml_datasets.credit_card_default`.\r\n   - Review each query’s results to interpret findings on credit card default patterns.\r\n\r\n## Conclusion\r\nThis project provides insights into the demographic and behavioural factors associated with credit card default risk. 
The analysis highlights potential indicators that could guide banks in assessing client risk profiles and designing strategies to mitigate defaults.\r\n\r\n## Future Improvements\r\n- **Additional Features**: Incorporate time-series analysis for payment behaviour trends.\r\n- **Machine Learning**: Extend analysis to predict default risk using BigQuery ML.\r\n\r\n## Project Documentation\r\nFor a detailed report showcasing SQL queries, visualizations, \r\nand insights from this analysis, please see the **Notion Report**.\r\n\u003ca href=\"https://mountain-dungeon-fa8.notion.site/SQL-CREDIT-RISK-ANALYSIS-14056e4be2ae80a5bd13c9274d93d529\" style=\"text-decoration: none;\"\u003e\r\n    \u003cbutton style=\"background-color: blue; color: white; padding: 10px 20px; border: none; border-radius: 5px; cursor: pointer;\"\u003e\r\n        REPORT\r\n    \u003c/button\u003e\r\n\u003c/a\u003e\r\n\r\n\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flu-sketch%2Fgoogle-big-query-sql---credit-risk-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flu-sketch%2Fgoogle-big-query-sql---credit-risk-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flu-sketch%2Fgoogle-big-query-sql---credit-risk-analysis/lists"}