{"id":20855783,"url":"https://github.com/jovicdev97/Financial-Loan-DataScience-Notebook","last_synced_at":"2025-03-12T13:41:10.195Z","repository":{"id":260703029,"uuid":"876587665","full_name":"jovicdev97/financial-data-analytics","owner":"jovicdev97","description":"using numpy and pandas to analyze a synthetic loan dataset with python","archived":false,"fork":false,"pushed_at":"2025-01-04T17:19:27.000Z","size":13180,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-04T18:25:47.568Z","etag":null,"topics":["data-analysis","matlabplot","numpy","pandas","plotting","python","seaborn"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jovicdev97.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-22T08:18:28.000Z","updated_at":"2025-01-04T17:19:30.000Z","dependencies_parsed_at":"2024-11-02T00:17:02.569Z","dependency_job_id":"48e08eaf-6b70-4ec3-8fbc-7bf08202a072","html_url":"https://github.com/jovicdev97/financial-data-analytics","commit_stats":null,"previous_names":["jovicdev97/financial-data-analytics"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jovicdev97%2Ffinancial-data-analytics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jovicdev97%2Ffinancial-data-analytics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jovicdev97%2Ffinancial-data-anal
ytics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jovicdev97%2Ffinancial-data-analytics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jovicdev97","download_url":"https://codeload.github.com/jovicdev97/financial-data-analytics/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":234609499,"owners_count":18859869,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","matlabplot","numpy","pandas","plotting","python","seaborn"],"created_at":"2024-11-18T04:25:27.064Z","updated_at":"2025-03-12T13:41:09.831Z","avatar_url":"https://github.com/jovicdev97.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"### Source\n- Dataset: Synthetic Loan Dataset\n- Platform: Kaggle\n- Link: [financial-risk-for-loan-approval](https://www.kaggle.com/datasets/lorenzozoppelletto/financial-risk-for-loan-approval)\n- Type: Synthetic/Generated Data\n- Records: 20,000\n- Features: 36 columns\n\n### Synthetic Data\n- Protecting individual privacy\n- Avoiding ethical concerns related to financial data\n- Allowing open sharing and collaboration\n- Maintaining realistic data patterns while eliminating sensitive information\n\n### Dataset Features\n- Application details (date, loan amount, duration)\n- Personal information (age, employment status, education)\n- Financial metrics (annual income, credit score, interest rates)\n- Risk assessment (risk score, loan approval status)\n\n## Analysis Features\n\n### 1. 
Data Loading and Initial Exploration\n- Loading dataset using Pandas\n- Basic data examination with head() function\n- Data cleaning and preprocessing\n\n### 2. Array Operations with NumPy\n- Creating and manipulating different data type arrays\n- Filtering operations\n- Statistical calculations\n\n### 3. Financial Analysis\n- Debt-to-Income ratio calculations\n- Monthly payment analysis\n- Interest rate examination\n- Credit risk assessment\n\n### 4. Data Visualization\n- Line chart: Interest rates over time\n- Bar chart: Distribution of employment status\n- Histogram: Annual income distribution\n- Box plot: Interest rates by education level\n- Scatter plot: Credit score vs interest rate correlation\n\n## Key Insights\n- Interest rates remain relatively stable over the analyzed time period\n- Most loan applicants are employed\n- Majority of applicants have annual income under $100,000\n- Higher credit scores correlate with lower interest rates\n- Education level shows minimal impact on base interest rates\n\n## Technical Requirements\n### Dependencies\n- Python 3.x\n- NumPy \u003e= 1.19.2\n- Pandas \u003e= 1.2.0\n- Matplotlib \u003e= 3.3.2\n- optional: (Seaborn \u003e= 0.11.0)\n\n### Hardware Requirements\n- Minimum 4GB RAM\n- 1GB free disk space\n\n## Installation\n```bash\n# Create virtual environment (optional but strongly (!) 
recommended)\npython -m venv env\nsource env/bin/activate  # On Windows: env\\Scripts\\activate\n\n# Install required packages\npip install numpy pandas matplotlib seaborn\n\n# Clone repository\ngit clone https://github.com/jovicdev97/loan-analysis.git\ncd loan-analysis\n```\n\n## Usage\n1. Clone this repository.\n2. Place the Loan.csv dataset in the project directory.\n3. Run the Jupyter notebook:\n\n```bash\njupyter notebook loan_analysis.ipynb\n```\n\n## Project Structure\n```\nloan-analysis/\n│\n├── data/\n│   └── Loan.csv\n│\n├── notebooks/\n│   └── loan_analysis.ipynb\n│\n├── README.md\n└── requirements.txt\n```\n\n## Data\nThe data comes from Kaggle: https://www.kaggle.com/datasets/lorenzozoppelletto/financial-risk-for-loan-approval\n\nThe original author provides the following Python snippet, which generates the data used in this project:\n\n```python\nimport pandas as pd\nimport numpy as np\nfrom scipy import stats\nfrom datetime import datetime, timedelta\n\n# Number of samples\nnum_samples = 2000\n\n# Seed for reproducibility\nnp.random.seed(42)\n\ndef generate_correlated_features(num_samples):\n    # Generate base features\n    age = np.random.normal(40, 12, num_samples).clip(18, 80).astype(int)\n    experience = (age - 18 - np.random.normal(4, 2, num_samples).clip(0)).clip(0).astype(int)\n    education_level = np.random.choice(['High School', 'Associate', 'Bachelor', 'Master', 'Doctorate'], num_samples, p=[0.3, 0.2, 0.3, 0.15, 0.05])\n    \n    # Education affects income and credit score\n    edu_impact = {'High School': 0, 'Associate': 0.1, 'Bachelor': 0.2, 'Master': 0.3, 'Doctorate': 0.4}\n    edu_factor = np.array([edu_impact[level] for level in education_level])\n    \n    # Generate correlated income, credit score, and employment status\n    base_income = np.random.lognormal(10.5, 0.6, num_samples) * (1 + edu_factor) * (1 + experience / 100)\n    income_noise = np.random.normal(0, 0.1, num_samples)\n    annual_income = (base_income * (1 + income_noise)).clip(15000, 
300000).astype(int)\n    \n    credit_score_base = 300 + 300 * stats.beta.rvs(5, 1.5, size=num_samples)\n    credit_score = (credit_score_base + edu_factor * 100 + experience * 1.5 + income_noise * 100).clip(300, 850).astype(int)\n    \n    employment_status_probs = np.column_stack([\n        0.9 - edu_factor * 0.3,  # Employed\n        0.05 + edu_factor * 0.2,  # Self-Employed\n        0.05 + edu_factor * 0.1   # Unemployed\n    ])\n    employment_status = np.array(['Employed', 'Self-Employed', 'Unemployed'])[np.argmax(np.random.random(num_samples)[:, np.newaxis] \u003c employment_status_probs.cumsum(axis=1), axis=1)]\n    \n    return age, experience, education_level, annual_income, credit_score, employment_status\n\ndef generate_time_based_features(num_samples):\n    start_date = datetime(2018, 1, 1)\n    dates = [start_date + timedelta(days=i) for i in range(num_samples)]\n    return dates\n\nage, experience, education_level, annual_income, credit_score, employment_status = generate_correlated_features(num_samples)\napplication_dates = generate_time_based_features(num_samples)\n\ndata = {\n    'ApplicationDate': application_dates,\n    'Age': age,\n    'AnnualIncome': annual_income,\n    'CreditScore': credit_score,\n    'EmploymentStatus': employment_status,\n    'EducationLevel': education_level,\n    'Experience': experience,\n    'LoanAmount': np.random.lognormal(10, 0.5, num_samples).astype(int),\n    'LoanDuration': np.random.choice([12, 24, 36, 48, 60, 72, 84, 96, 108, 120], num_samples, p=[0.05, 0.1, 0.2, 0.2, 0.2, 0.1, 0.05, 0.05, 0.025, 0.025]),\n    'MaritalStatus': np.random.choice(['Single', 'Married', 'Divorced', 'Widowed'], num_samples, p=[0.3, 0.5, 0.15, 0.05]),\n    'NumberOfDependents': np.random.choice([0, 1, 2, 3, 4, 5], num_samples, p=[0.3, 0.25, 0.2, 0.15, 0.07, 0.03]),\n    'HomeOwnershipStatus': np.random.choice(['Own', 'Rent', 'Mortgage', 'Other'], num_samples, p=[0.2, 0.3, 0.4, 0.1]),\n    'MonthlyDebtPayments': np.random.lognormal(6, 
0.5, num_samples).astype(int),\n    'CreditCardUtilizationRate': np.random.beta(2, 5, num_samples),\n    'NumberOfOpenCreditLines': np.random.poisson(3, num_samples).clip(0, 15).astype(int),\n    'NumberOfCreditInquiries': np.random.poisson(1, num_samples).clip(0, 10).astype(int),\n    'DebtToIncomeRatio': np.random.beta(2, 5, num_samples),\n    'BankruptcyHistory': np.random.choice([0, 1], num_samples, p=[0.95, 0.05]),\n    'LoanPurpose': np.random.choice(['Home', 'Auto', 'Education', 'Debt Consolidation', 'Other'], num_samples, p=[0.3, 0.2, 0.15, 0.25, 0.1]),\n    'PreviousLoanDefaults': np.random.choice([0, 1], num_samples, p=[0.9, 0.1]),\n    'PaymentHistory': np.random.poisson(24, num_samples).clip(0, 60).astype(int),\n    'LengthOfCreditHistory': np.random.randint(1, 30, num_samples),\n    'SavingsAccountBalance': np.random.lognormal(8, 1, num_samples).astype(int),\n    'CheckingAccountBalance': np.random.lognormal(7, 1, num_samples).astype(int),\n    'TotalAssets': np.random.lognormal(11, 1, num_samples).astype(int),\n    'TotalLiabilities': np.random.lognormal(10, 1, num_samples).astype(int),\n    'MonthlyIncome': annual_income / 12,\n    'UtilityBillsPaymentHistory': np.random.beta(8, 2, num_samples),\n    'JobTenure': np.random.poisson(5, num_samples).clip(0, 40).astype(int),\n}\n\n# Create DataFrame\ndf = pd.DataFrame(data)\n\n# Ensure TotalAssets is always greater than or equal to the sum of SavingsAccountBalance and CheckingAccountBalance\ndf['TotalAssets'] = np.maximum(df['TotalAssets'], df['SavingsAccountBalance'] + df['CheckingAccountBalance'])\n\n# Add more complex derived features\nmin_net_worth = 1000  # Set a minimum net worth\ndf['NetWorth'] = np.maximum(df['TotalAssets'] - df['TotalLiabilities'], min_net_worth)\n\n# More realistic interest rate based on credit score, loan amount, and loan duration\ndf['BaseInterestRate'] = 0.03 + (850 - df['CreditScore']) / 2000 + df['LoanAmount'] / 1000000 + df['LoanDuration'] / 1200\ndf['InterestRate'] = 
df['BaseInterestRate'] * (1 + np.random.normal(0, 0.1, num_samples)).clip(0.8, 1.2)\n\ndf['MonthlyLoanPayment'] = (df['LoanAmount'] * (df['InterestRate']/12)) / (1 - (1 + df['InterestRate']/12)**(-df['LoanDuration']))\ndf['TotalDebtToIncomeRatio'] = (df['MonthlyDebtPayments'] + df['MonthlyLoanPayment']) / df['MonthlyIncome']\n\n# Create a more complex loan approval rule\ndef loan_approval_rule(row):\n    score = 0\n    score += (row['CreditScore'] - 600) / 250  # Credit score factor\n    score += (100000 - row['AnnualIncome']) / 100000  # Income factor\n    score += (row['TotalDebtToIncomeRatio'] - 0.4) * 2  # DTI factor\n    score += (row['LoanAmount'] - 10000) / 90000  # Loan amount factor\n    score += (row['InterestRate'] - 0.05) * 10  # Interest rate factor\n    score += 0.5 if row['BankruptcyHistory'] == 1 else 0  # Bankruptcy penalty\n    score += 0.3 if row['PreviousLoanDefaults'] == 1 else 0  # Previous default penalty\n    score += 0.2 if row['EmploymentStatus'] == 'Unemployed' else 0  # Employment status factor\n    score -= 0.1 if row['HomeOwnershipStatus'] in ['Own', 'Mortgage'] else 0  # Home ownership factor\n    score -= row['PaymentHistory'] / 120  # Payment history factor\n    score -= row['LengthOfCreditHistory'] / 60  # Length of credit history factor\n    score -= row['NetWorth'] / 500000  # Net worth factor\n    \n    # Age factor (slight preference for middle-aged applicants)\n    score += abs(row['Age'] - 40) / 100\n    \n    # Experience factor\n    score -= row['Experience'] / 200\n    \n    # Education factor\n    edu_score = {'High School': 0.2, 'Associate': 0.1, 'Bachelor': 0, 'Master': -0.1, 'Doctorate': -0.2}\n    score += edu_score[row['EducationLevel']]\n    \n    # Seasonal factor (higher approval rates in spring/summer)\n    month = row['ApplicationDate'].month\n    score -= 0.1 if 3 \u003c= month \u003c= 8 else 0\n    \n    # Random factor to add some unpredictability\n    score += np.random.normal(0, 0.1)\n    \n    return 1 if 
score \u003c 1 else 0  # Adjust this threshold to change overall approval rate\n\ndf['LoanApproved'] = df.apply(loan_approval_rule, axis=1)\n\n# Add some noise and outliers\nnoise_mask = np.random.choice([True, False], num_samples, p=[0.01, 0.99])\ndf.loc[noise_mask, 'AnnualIncome'] = (df.loc[noise_mask, 'AnnualIncome'] * np.random.uniform(1.5, 2.0, noise_mask.sum())).astype(int)\n\nlow_net_worth_mask = df['NetWorth'] == min_net_worth\ndf.loc[low_net_worth_mask, 'NetWorth'] += np.random.randint(0, 10000, size=low_net_worth_mask.sum())\n\n# Print some statistics\nprint(f\"Loan Approval Rate: {df['LoanApproved'].mean():.2%}\")\nprint(f\"Average Credit Score: {df['CreditScore'].mean():.0f}\")\nprint(f\"Average Annual Income: ${df['AnnualIncome'].mean():.0f}\")\nprint(f\"Average Loan Amount: ${df['LoanAmount'].mean():.0f}\")\nprint(f\"Average Total Debt-to-Income Ratio: {df['TotalDebtToIncomeRatio'].mean():.2f}\")\nprint(f\"Average Interest Rate: {df['InterestRate'].mean():.2%}\")\n\ndef assign_credit_score_risk(credit_score):\n    if credit_score \u003e= 750: return 1\n    elif 700 \u003c= credit_score \u003c 750: return 2\n    elif 650 \u003c= credit_score \u003c 700: return 3\n    elif 600 \u003c= credit_score \u003c 650: return 4\n    else: return 5\n\ndef assign_dti_risk(dti):\n    if dti \u003c 0.20: return 1\n    elif 0.20 \u003c= dti \u003c 0.30: return 2\n    elif 0.30 \u003c= dti \u003c 0.40: return 3\n    elif 0.40 \u003c= dti \u003c 0.50: return 4\n    else: return 5\n\ndef assign_payment_history_risk(payment_history):\n    if payment_history \u003e= 99: return 1\n    elif 97 \u003c= payment_history \u003c 99: return 2\n    elif 95 \u003c= payment_history \u003c 97: return 3\n    elif 90 \u003c= payment_history \u003c 95: return 4\n    else: return 5\n\ndef assign_bankruptcy_risk(bankruptcy_history):\n    return 5 if bankruptcy_history else 1\n\ndef assign_previous_defaults_risk(previous_defaults):\n    if previous_defaults == 0: return 1\n    elif 
previous_defaults == 1: return 3\n    else: return 5\n\ndef assign_utilization_risk(utilization):\n    if utilization \u003c 0.20: return 1\n    elif 0.20 \u003c= utilization \u003c 0.40: return 2\n    elif 0.40 \u003c= utilization \u003c 0.60: return 3\n    elif 0.60 \u003c= utilization \u003c 0.80: return 4\n    else: return 5\n\ndef assign_credit_history_risk(length_of_history):\n    if length_of_history \u003e= 10: return 1\n    elif 7 \u003c= length_of_history \u003c 10: return 2\n    elif 5 \u003c= length_of_history \u003c 7: return 3\n    elif 3 \u003c= length_of_history \u003c 5: return 4\n    else: return 5\n\ndef assign_income_risk(annual_income):\n    if annual_income \u003e= 120000: return 1\n    elif 80000 \u003c= annual_income \u003c 120000: return 2\n    elif 50000 \u003c= annual_income \u003c 80000: return 3\n    elif 30000 \u003c= annual_income \u003c 50000: return 4\n    else: return 5\n\ndef assign_employment_risk(employment_status):\n    if employment_status == 'Employed': return 1\n    elif employment_status == 'Self-employed': return 2\n    elif employment_status == 'Part-time': return 3\n    else: return 4  # Unemployed or other\n\ndef assign_net_worth_risk(net_worth):\n    if net_worth \u003e= 500000: return 1\n    elif 250000 \u003c= net_worth \u003c 500000: return 2\n    elif 100000 \u003c= net_worth \u003c 250000: return 3\n    elif 50000 \u003c= net_worth \u003c 100000: return 4\n    else: return 5\n\n# Refined overall risk calculation\ndef calculate_overall_risk(row):\n    base_score = (\n        assign_credit_score_risk(row['CreditScore']) * 3 +\n        assign_dti_risk(row['DebtToIncomeRatio']) * 2 +\n        assign_payment_history_risk(row['PaymentHistory']) * 2 +\n        assign_bankruptcy_risk(row['BankruptcyHistory']) * 3 +\n        assign_previous_defaults_risk(row['PreviousLoanDefaults']) * 3 +\n        assign_utilization_risk(row['CreditCardUtilizationRate']) +\n        assign_credit_history_risk(row['LengthOfCreditHistory']) 
+\n        assign_income_risk(row['AnnualIncome']) +\n        assign_employment_risk(row['EmploymentStatus']) +\n        assign_net_worth_risk(row['NetWorth']) * 2\n    )\n    \n    # Adjust score based on loan approval status\n    if row['LoanApproved'] == 1:  # Assuming 1 means approved\n        base_score *= 0.8  # Reduce risk score for approved loans\n    \n    return base_score\n\n# Apply the refined risk calculation\ndf['RiskScore'] = df.apply(calculate_overall_risk, axis=1)\n\n# Save to CSV\ndf.to_csv('focused_synthetic_loan_data.csv', index=False)\nprint(\"\\nFocused synthetic data saved to 'focused_synthetic_loan_data.csv'\")\n\n# Display final feature count\nprint(f\"\\nTotal number of features (including label): {len(df.columns)}\")\nprint(\"\\nFeatures:\")\nfor column in df.columns:\n    print(f\"- {column}\")\n```\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjovicdev97%2FFinancial-Loan-DataScience-Notebook","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjovicdev97%2FFinancial-Loan-DataScience-Notebook","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjovicdev97%2FFinancial-Loan-DataScience-Notebook/lists"}