https://github.com/lovnishverma/loanprediction
A Flask-based web application that predicts loan approval decisions using machine learning. The application uses a Random Forest Classifier to determine loan approval probability based on applicant demographics, employment status, credit history, and property location.
- Host: GitHub
- URL: https://github.com/lovnishverma/loanprediction
- Owner: lovnishverma
- Created: 2023-10-24T20:27:21.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2025-07-11T14:29:18.000Z (12 months ago)
- Last Synced: 2025-08-30T15:59:34.314Z (10 months ago)
- Topics: machinelearning-python
- Language: HTML
- Homepage: https://loanprediction-wpd2.onrender.com/
- Size: 621 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Loan Approval Prediction Web Application
A Flask-based web application that predicts loan approval decisions using machine learning. A Random Forest Classifier returns a binary approve/reject prediction based on applicant demographics, employment status, credit history, and property location.
## Features
- **Web Interface**: Simple form for inputting loan application details
- **Machine Learning**: Random Forest Classifier for binary loan approval prediction
- **Real-time Decisions**: Instant loan approval/rejection predictions
- **Financial Focus**: Designed for banking and financial institutions
- **Categorical Analysis**: Handles encoded categorical variables efficiently
## Technologies Used
- **Backend**: Python, Flask
- **Machine Learning**: scikit-learn (Random Forest Classifier)
- **Data Processing**: pandas
- **Frontend**: HTML templates (Jinja2)
- **Dataset**: Loan approval historical data (CSV format)
## Project Structure
```
loanprediction/
│
├── app.py # Main Flask application
├── loan.csv # Historical loan dataset
├── templates/
│ └── loan.html # Main page with form and results
├── static/ # CSS, JS, images (if any)
└── README.md # Project documentation
```
## Installation & Setup
### Prerequisites
- Python 3.7 or higher
- pip (Python package manager)
### Step 1: Clone the Repository
```bash
git clone https://github.com/lovnishverma/loanprediction.git
cd loanprediction
```
### Step 2: Install Dependencies
```bash
pip install flask pandas scikit-learn
```
### Step 3: Prepare the Dataset
Ensure `loan.csv` is in the root directory with the following columns:
- `gender`: Gender (encoded as 0/1)
- `married`: Marital status (encoded as 0/1)
- `education`: Education level (encoded as 0/1)
- `self_employed`: Employment type (encoded as 0/1)
- `credit_history`: Credit history (encoded as 0/1)
- `property_area`: Property location (encoded as 0/1/2)
- `loan_status`: Target variable - Loan approval (0=Rejected, 1=Approved)
### Step 4: Run the Application
```bash
python app.py
```
The application will start on `http://localhost:5000`
## Usage
1. **Access Application**: Navigate to `http://localhost:5000`
2. **Fill Loan Application**: Complete the form with applicant details
3. **Submit**: Click the predict button to get an instant loan decision
4. **View Result**: See the approval (1) or rejection (0) prediction
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Main page with loan application form |
| `/predict` | POST | Process loan application and return decision |
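
As a rough illustration of calling `/predict` from a script, the sketch below builds a form-encoded POST request with the standard library. The form field names are an assumption (they mirror the `loan.csv` column names); the real names are defined in `templates/loan.html` and may differ. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:5000"  # default address from "Run the Application"

# Assumed field names (mirroring the loan.csv columns); check loan.html for the real ones.
form_fields = {
    "gender": 1,
    "married": 1,
    "education": 1,
    "self_employed": 0,
    "credit_history": 1,
    "property_area": 2,
}


def build_predict_request(fields, base_url=BASE_URL):
    """Build (but do not send) a form-encoded POST request for /predict."""
    body = urlencode(fields).encode("utf-8")
    return Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_predict_request(form_fields)
print(request.get_method(), request.full_url)
```

With the app running, the request could be sent with `urllib.request.urlopen(request)`; the response is the rendered HTML page rather than JSON, since the app is described as template-based.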
## Model Details
- **Algorithm**: Random Forest Classifier
- **Features**: Gender, Marital Status, Education, Employment Type, Credit History, Property Area
- **Target**: Loan Status (Binary Classification: 0=Rejected, 1=Approved)
- **Training**: Model retrains on entire dataset for each prediction
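
The training and prediction flow described above might look roughly like the following. This is a minimal sketch, not the code in `app.py`: the helper names `train_model` and `predict_one`, the hyperparameters, and the four toy rows are all assumptions made for illustration. It trains once and reuses the fitted model, rather than refitting per request.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["gender", "married", "education", "self_employed",
            "credit_history", "property_area"]
TARGET = "loan_status"


def train_model(df, random_state=0):
    """Fit a Random Forest on the six encoded feature columns."""
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    model.fit(df[FEATURES], df[TARGET])
    return model


def predict_one(model, applicant):
    """Return 0 (rejected) or 1 (approved) for one applicant dict."""
    row = pd.DataFrame([applicant], columns=FEATURES)
    return int(model.predict(row)[0])


# Made-up rows for illustration only; the app reads loan.csv instead.
toy = pd.DataFrame(
    [[1, 0, 1, 0, 1, 2, 1],
     [0, 1, 1, 1, 1, 1, 1],
     [1, 1, 0, 0, 0, 0, 0],
     [0, 0, 0, 1, 0, 0, 0]],
    columns=FEATURES + [TARGET],
)

model = train_model(toy)  # train once, e.g. at application start-up
decision = predict_one(model, {
    "gender": 1, "married": 1, "education": 1,
    "self_employed": 0, "credit_history": 1, "property_area": 2,
})
print(decision)
```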
## Input Parameters & Encoding
### Required Fields
| Parameter | Type | Encoding | Description |
|-----------|------|----------|-------------|
| **Gender** | Integer | 0=Female, 1=Male | Applicant's gender |
| **Married** | Integer | 0=No, 1=Yes | Marital status |
| **Education** | Integer | 0=Not Graduate, 1=Graduate | Education qualification |
| **Self Employed** | Integer | 0=No, 1=Yes | Employment type |
| **Credit History** | Integer | 0=Poor/No History, 1=Good | Credit track record |
| **Property Area** | Integer | 0=Rural, 1=Semiurban, 2=Urban | Property location type |
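
The encoding table above can be expressed as a small lookup, which is handy when the raw data uses labels rather than integers. This is a sketch under the assumption that the field keys match the `loan.csv` column names; `encode_applicant` is a hypothetical helper, not part of the application.

```python
# Label-to-code tables copied from the encoding table above.
ENCODINGS = {
    "gender": {"Female": 0, "Male": 1},
    "married": {"No": 0, "Yes": 1},
    "education": {"Not Graduate": 0, "Graduate": 1},
    "self_employed": {"No": 0, "Yes": 1},
    "credit_history": {"Poor/No History": 0, "Good": 1},
    "property_area": {"Rural": 0, "Semiurban": 1, "Urban": 2},
}


def encode_applicant(labels):
    """Convert human-readable answers into the integer codes the model expects."""
    encoded = {}
    for field, mapping in ENCODINGS.items():
        value = labels[field]
        if value not in mapping:
            raise ValueError(
                f"{field}: expected one of {sorted(mapping)}, got {value!r}"
            )
        encoded[field] = mapping[value]
    return encoded


applicant = encode_applicant({
    "gender": "Female",
    "married": "Yes",
    "education": "Graduate",
    "self_employed": "Yes",
    "credit_history": "Good",
    "property_area": "Semiurban",
})
print(applicant)
```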
### Form Input Examples
**High Approval Probability Profile:**
- Gender: 1 (Male), Married: 1 (Yes), Education: 1 (Graduate)
- Self Employed: 0 (No), Credit History: 1 (Good), Property Area: 2 (Urban)
**Medium Approval Probability Profile:**
- Gender: 0 (Female), Married: 1 (Yes), Education: 1 (Graduate)
- Self Employed: 1 (Yes), Credit History: 1 (Good), Property Area: 1 (Semiurban)
**Lower Approval Probability Profile:**
- Gender: 0 (Female), Married: 0 (No), Education: 0 (Not Graduate)
- Self Employed: 1 (Yes), Credit History: 0 (Poor), Property Area: 0 (Rural)
## Prediction Results
- **1**: Loan Approved ✅
- **0**: Loan Rejected ❌
## Key Factors Influencing Approval
Based on typical loan approval patterns, not on importances measured from this model:
1. **Credit History** (Most Important)
- Good credit history significantly increases approval chances
- Poor/no credit history is a major risk factor
2. **Education Level**
- Graduates typically have higher approval rates
- Indicates stable income potential
3. **Marital Status**
- Married applicants often considered more stable
- May indicate dual income households
4. **Property Area**
- Urban properties may have higher approval rates
- Better infrastructure and resale value
5. **Employment Type**
- Salaried employees often preferred over self-employed
- More predictable income streams
6. **Gender**
- Should ideally have minimal impact in fair lending
- May reflect historical biases in data
## Dataset Requirements
The `loan.csv` file should contain encoded categorical variables:
```csv
gender,married,education,self_employed,credit_history,property_area,loan_status
1,0,1,0,1,2,1
0,1,1,1,1,1,1
1,1,0,0,0,0,0
...
```
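
Before training, it can help to check that a file follows this layout. The sketch below uses only the standard library; `validate_loan_csv` is a hypothetical helper, and the allowed values are taken from the encodings documented in this README.

```python
import csv
import io

EXPECTED_COLUMNS = ["gender", "married", "education", "self_employed",
                    "credit_history", "property_area", "loan_status"]
ALLOWED = {column: {0, 1} for column in EXPECTED_COLUMNS}
ALLOWED["property_area"] = {0, 1, 2}


def validate_loan_csv(handle):
    """Return a list of problems found in a loan.csv-style file (empty if none)."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column in EXPECTED_COLUMNS:
            raw = (row[column] or "").strip()
            try:
                ok = int(raw) in ALLOWED[column]
            except ValueError:
                ok = False
            if not ok:
                problems.append(
                    f"line {line_no}: {column}={raw!r} not in {sorted(ALLOWED[column])}"
                )
    return problems


sample = io.StringIO(
    "gender,married,education,self_employed,credit_history,property_area,loan_status\n"
    "1,0,1,0,1,2,1\n"
    "0,1,1,1,1,1,1\n"
    "1,1,0,0,0,0,0\n"
)
result = validate_loan_csv(sample)
print(result)
```

For a file on disk, the same function can be called as `validate_loan_csv(open("loan.csv", newline=""))`.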
## Model Performance Considerations
### Advantages of Random Forest
- Works well with integer-encoded categorical variables (scikit-learn requires numeric input)
- Reduces overfitting compared to single decision trees
- Provides feature importance rankings
- Robust to outliers
### Potential Issues
- Model retrains on every prediction (inefficient)
- No model validation or accuracy metrics displayed
- May suffer from data imbalance if loan approvals are skewed
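
A quick way to see whether approvals are skewed is to look at the share of each `loan_status` value. The sketch below uses made-up labels; `class_balance` is a hypothetical helper.

```python
from collections import Counter


def class_balance(labels):
    """Return the share of each label value, keyed by label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in sorted(counts.items())}


# Made-up labels for illustration; in practice pass the loan_status column.
labels = [1, 1, 1, 0, 1, 1, 0, 1]
balance = class_balance(labels)
print(balance)  # {0: 0.25, 1: 0.75}
```

If one class dominates, one option to consider is scikit-learn's `class_weight="balanced"` setting on `RandomForestClassifier`.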
## Improvement Suggestions
### 1. Technical Enhancements
```python
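# Add model persistence: load a pre-trained model instead of retraining per request
import joblib
import pandas as pd
model = joblib.load('loan_model.pkl')  # assumes the model was saved earlier with joblib.dump
# Example applicant, using the feature columns from loan.csv
columns = ['gender', 'married', 'education', 'self_employed', 'credit_history', 'property_area']
new_data = pd.DataFrame([[1, 1, 1, 0, 1, 2]], columns=columns)
# Add prediction probability: confidence of the predicted class, in percent
probability = model.predict_proba(new_data)[0]
confidence = max(probability) * 100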
```
### 2. Data Quality
- Add input validation and error handling
- Include numerical features (income, loan amount, etc.)
- Handle missing values appropriately
- Address class imbalance in training data
### 3. User Experience
- Show prediction confidence/probability
- Explain key factors affecting decision
- Add form validation with helpful messages
- Include loan amount and income fields
### 4. Model Improvement
- Cross-validation for better accuracy assessment
- Feature importance visualization
- Side-by-side comparison of different algorithms on the same validation split
- Regular model retraining with new data
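
Cross-validation and feature importance can both be obtained from scikit-learn with a few lines. The sketch below runs on synthetic data generated for illustration only (approval is made to follow `credit_history` with 10% noise), so the numbers say nothing about the real `loan.csv`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

FEATURES = ["gender", "married", "education", "self_employed",
            "credit_history", "property_area"]

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
n = 300
binary_columns = [rng.integers(0, 2, n) for _ in range(5)]
X = np.column_stack(binary_columns + [rng.integers(0, 3, n)])
flip = rng.random(n) < 0.1                      # 10% label noise
y = np.where(flip, 1 - X[:, 4], X[:, 4])        # column 4 is credit_history

model = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validated accuracy (stratified folds for classifiers)
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.2f}")

# Feature importances from a model fitted on all rows
model.fit(X, y)
ranking = sorted(zip(FEATURES, model.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```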
### 5. Business Features
- Integration with credit scoring APIs
- Audit trail for compliance
- Batch processing for multiple applications
- Risk assessment categories
## Compliance & Ethics
### Important Considerations
- **Fair Lending**: Ensure model doesn't discriminate based on protected characteristics
- **Regulatory Compliance**: Adhere to applicable data-protection and credit regulations (e.g., GDPR, Fair Credit Reporting Act)
- **Bias Detection**: Regular audits for algorithmic bias
- **Explainability**: Provide clear reasons for loan decisions
### Recommended Practices
- Regular bias testing across demographic groups
- Human oversight for edge cases
- Clear appeals process for rejected applications
- Documentation of model decisions for audits
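
One simple form of bias testing is to compare approval rates across groups. The sketch below computes the rate per group and the largest gap between groups (often called the demographic parity difference). It is only one of several possible fairness checks; the records are made up and the helper names are hypothetical.

```python
from collections import defaultdict


def approval_rate_by_group(records, group_key, prediction_key="prediction"):
    """Return the share of approved (1) predictions for each group value."""
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        approvals[group] += record[prediction_key]
    return {group: approvals[group] / totals[group] for group in sorted(totals)}


def parity_gap(rates):
    """Largest difference in approval rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Made-up predictions for illustration only.
records = [
    {"gender": 0, "prediction": 1},
    {"gender": 0, "prediction": 0},
    {"gender": 0, "prediction": 0},
    {"gender": 0, "prediction": 1},
    {"gender": 1, "prediction": 1},
    {"gender": 1, "prediction": 1},
    {"gender": 1, "prediction": 1},
    {"gender": 1, "prediction": 0},
]
rates = approval_rate_by_group(records, "gender")
print(rates)              # {0: 0.5, 1: 0.75}
print(parity_gap(rates))  # 0.25
```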
## Troubleshooting
### Common Issues
1. **Dependencies Missing**:
```bash
pip install flask pandas scikit-learn
```
2. **CSV Format Issues**:
- Ensure all values are properly encoded as integers
- Check for missing values or incorrect column names
3. **Model Training Errors**:
- Verify dataset has sufficient samples
- Check for data type consistency
4. **Prediction Errors**:
- Validate all form inputs are integers
- Ensure property_area values are 0, 1, or 2
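
The input checks listed above could be sketched as a single function that parses raw form strings and collects errors per field. The field names are assumed to match the `loan.csv` columns, and `parse_form` is a hypothetical helper, not part of `app.py`.

```python
# Allowed integer codes per field, taken from the encoding table in this README.
ALLOWED_VALUES = {
    "gender": {0, 1},
    "married": {0, 1},
    "education": {0, 1},
    "self_employed": {0, 1},
    "credit_history": {0, 1},
    "property_area": {0, 1, 2},
}


def parse_form(form):
    """Return (values, errors) for a mapping of raw form strings."""
    values, errors = {}, {}
    for field, allowed in ALLOWED_VALUES.items():
        raw = form.get(field, "")
        try:
            number = int(str(raw).strip())
        except ValueError:
            errors[field] = f"must be an integer, got {raw!r}"
            continue
        if number not in allowed:
            errors[field] = f"must be one of {sorted(allowed)}, got {number}"
        else:
            values[field] = number
    return values, errors


values, errors = parse_form({
    "gender": "1", "married": "0", "education": "1",
    "self_employed": "0", "credit_history": "yes", "property_area": "3",
})
print(values)
print(errors)
```

Because the function only relies on `.get`, it should also accept Flask's `request.form` directly.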
## Security Considerations
- Input validation to prevent injection attacks
- Rate limiting for API endpoints
- Secure handling of sensitive financial data
- Audit logging for all predictions
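
Rate limiting can be approximated with a sliding window per client. The class below is a minimal in-memory sketch (the name `SlidingWindowLimiter` and the limits are assumptions); it does not share state across processes, so a multi-worker deployment would likely need a shared store or a dedicated library instead.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def allow(self, client_id):
        now = self.clock()
        recent = self.hits[client_id]
        # Drop timestamps that have fallen out of the window
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3, window=60)
results = [limiter.allow("203.0.113.7") for _ in range(4)]
print(results)  # [True, True, True, False]
```

In a Flask view, the client id could be `request.remote_addr`, with a 429 response returned when `allow` is false.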
## Deployment Notes
For production deployment:
- Use environment variables for configuration
- Implement proper error handling and logging
- Add authentication and authorization
- Use HTTPS for secure data transmission
- Consider containerization with Docker
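
Configuration through environment variables might look like the sketch below. The variable names (`LOAN_APP_HOST`, `LOAN_APP_PORT`, `LOAN_APP_DEBUG`, `LOAN_APP_DATASET`) are hypothetical and not read by the current application.

```python
import os


def load_config(environ=os.environ):
    """Read settings from environment variables, falling back to development defaults."""
    return {
        "host": environ.get("LOAN_APP_HOST", "127.0.0.1"),
        "port": int(environ.get("LOAN_APP_PORT", "5000")),
        "debug": environ.get("LOAN_APP_DEBUG", "false").lower() in {"1", "true", "yes"},
        "dataset_path": environ.get("LOAN_APP_DATASET", "loan.csv"),
    }


# Passing a plain dict makes the function easy to try without touching the real environment.
config = load_config({"LOAN_APP_PORT": "8000", "LOAN_APP_DEBUG": "true"})
print(config)
```

The result could then be passed to Flask as `app.run(host=config["host"], port=config["port"], debug=config["debug"])`.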
## Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/enhancement`)
3. Commit changes (`git commit -am 'Add feature'`)
4. Push to branch (`git push origin feature/enhancement`)
5. Open a Pull Request
## License
This project is open source and available under the [MIT License](LICENSE).
## Disclaimer
This application is for educational and demonstration purposes. For production use in financial services:
- Ensure compliance with local banking regulations
- Implement comprehensive bias testing
- Add proper security measures
- Include human oversight in decision process
- Regular model validation and updates
## Contact
For questions, support, or business inquiries, please open an issue in the repository.
---
**Note**: Loan approval decisions should never rely solely on automated systems. Always include human review and comply with applicable financial regulations.