https://github.com/girish119628/loan_approval_model
- Host: GitHub
- URL: https://github.com/girish119628/loan_approval_model
- Owner: girish119628
- Created: 2025-04-12T09:20:20.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2025-04-12T09:25:42.000Z (2 months ago)
- Last Synced: 2025-04-12T10:30:06.586Z (2 months ago)
- Size: 4.88 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# **Loan_Approval_Model**
This project presents an end-to-end machine learning pipeline for predicting loan approval using Python. It covers data cleaning and feature engineering, exploratory data analysis, training and comparing several classification models, and hyperparameter tuning.
# [Team_Explanation_Video](https://drive.google.com/file/d/1zZj66fAu6XQGJF4Jxu7QfcZMNtZHZ-C5/view?usp=sharing)
# [Self_Explanation_Video](https://drive.google.com/file/d/1Ju4os0bsRvV5sI3jsAS-ykEPjuFYhbT2/view?usp=drive_link)
# 1. Data Cleaning & Preprocessing
✔️ Removal of unwanted spaces in column names and values (str.strip()) is excellent.
✔️ Checked for duplicates and nulls.
✔️ Created a new meaningful feature (total_assets) and dropped redundant columns.
✔️ Categorical and numerical columns identified clearly.
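As a minimal sketch of those cleaning steps (the column names below are hypothetical stand-ins for the real dataset):

```python
import pandas as pd

# Hypothetical sample; the real dataset's columns may differ.
df = pd.DataFrame({
    " loan_id ": [1, 2],
    "residential_assets_value": [100, 200],
    "commercial_assets_value": [50, 75],
    "education": [" Graduate ", "Not Graduate"],
})

# Strip stray whitespace from column names and from string values.
df.columns = df.columns.str.strip()
df = df.apply(lambda s: s.str.strip() if s.dtype == "object" else s)

# Combine the asset columns into one feature, then drop the originals.
df["total_assets"] = df["residential_assets_value"] + df["commercial_assets_value"]
df = df.drop(columns=["residential_assets_value", "commercial_assets_value"])

print(df.columns.tolist())  # ['loan_id', 'education', 'total_assets']
```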
# Suggestions:
* You could include a missing value imputation step using SimpleImputer for both numeric and categorical data (even if you don't have many nulls, it shows good practice).
* Log-transform or use PowerTransformer for highly skewed features if needed.
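A sketch of both suggestions combined into one preprocessing step; the column names and sample values here are invented for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PowerTransformer

# Hypothetical columns; adjust to the actual dataset.
num_cols = ["income_annum", "loan_amount"]
cat_cols = ["education"]

df = pd.DataFrame({
    "income_annum": [50000.0, np.nan, 82000.0],
    "loan_amount": [120000.0, 90000.0, np.nan],
    "education": ["Graduate", np.nan, "Not Graduate"],
})

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("power", PowerTransformer()),   # tames highly skewed features
    ]), num_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), cat_cols),
])

X = preprocess.fit_transform(df)
print(X.shape)  # 2 transformed numeric columns + 2 one-hot columns
```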
# 2. Exploratory Data Analysis (EDA)
✔️ Used count plots, scatter plots, and box plots to uncover patterns.
✔️ Outlier detection via the IQR method is excellent.
βοΈ Correlation heatmap is included.
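For reference, the IQR rule flags values beyond 1.5 × IQR from the quartiles; a toy example with made-up values:

```python
import pandas as pd

# Toy stand-in for a numeric loan column; 500 is an obvious outlier.
s = pd.Series([100, 110, 105, 98, 102, 500])

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = s[(s < lower) | (s > upper)]
print(outliers.tolist())  # [500]
```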
# Suggestions:
* For categorical variables, consider using groupby() with .mean() or .value_counts(normalize=True) to understand their relation with the target variable (loan_status).
* Use pairplot or violin plots for deeper distribution insights (optional).
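For the first suggestion: with a 0/1-encoded target, the per-category mean of loan_status is exactly the approval rate for that category. The data below is invented for illustration:

```python
import pandas as pd

# Invented sample: approval rate by education level.
df = pd.DataFrame({
    "education": ["Graduate", "Graduate", "Not Graduate", "Not Graduate"],
    "loan_status": [1, 1, 0, 1],  # 1 = Approved, 0 = Rejected
})

# Mean of a 0/1 target per category = approval rate for that category.
rate = df.groupby("education")["loan_status"].mean()
print(rate)
```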
# 3. Model Training
✔️ Used a variety of classification models: Logistic Regression, Decision Tree, Random Forest, SVM, and Gradient Boosting.
✔️ Built a reusable pipeline with preprocessing and modeling.
βοΈ Evaluated using Accuracy, Precision, Recall, and F1-Score.
βοΈ Created a visual comparison across models (bar plots).
βοΈ Plotted confusion matrices for all models.
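A minimal sketch of such a pipeline plus the four metrics, using synthetic data in place of the loan dataset and one model as a representative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Preprocessing and model bundled so both fit only on training data.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_tr, y_tr)
pred = pipe.predict(X_te)

for name, fn in [("accuracy", accuracy_score), ("precision", precision_score),
                 ("recall", recall_score), ("f1", f1_score)]:
    print(f"{name}: {fn(y_te, pred):.3f}")
```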
# Suggestions:
* Consider using StratifiedKFold cross-validation with cross_val_score for robust model evaluation instead of a single train-test split.
* You might also try XGBoost or LightGBM for better performance in real-world cases.
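The suggested cross-validation could look like this; synthetic data again stands in for the real features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Stratified folds preserve the class ratio in every split,
# which matters when approvals and rejections are imbalanced.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0),
                         X, y, cv=cv, scoring="f1")

print("F1 per fold:", scores.round(3))
print(f"Mean: {scores.mean():.3f}, std: {scores.std():.3f}")
```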
# 4. Hyperparameter Tuning
✔️ Set up param_grids for multiple models using GridSearchCV.
# Suggestions:
* You might want to isolate and run the GridSearch for the best-performing model (e.g., RandomForest or GradientBoosting).
* Save the best model using joblib or pickle for deployment or reuse.
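A sketch of both suggestions together, on synthetic data and with an illustrative param_grid (the real grid should match whatever the notebook already defines):

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=8, random_state=1)

# Illustrative grid for the best-performing model.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=1),
                      param_grid, cv=3, scoring="f1")
search.fit(X, y)
print(search.best_params_)

# Persist the tuned model for deployment or reuse.
joblib.dump(search.best_estimator_, "best_loan_model.joblib")
model = joblib.load("best_loan_model.joblib")
```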
# Project Summary: add a Markdown cell at the top with:
* Problem Statement
* Business Objective
* Dataset Overview
* Key Steps (EDA, Feature Engg, Modeling)
* Final Conclusion (which model performed best and why)
# Conclusion/Insights Section:
* Final choice of model
* Business interpretation (e.g., which factors most affect loan approval?)
# Team members:
* Komal Yadav
* Girish Kumar
* Komal Gupta