
# Loan Approval Prediction Web Application

A Flask-based web application that predicts loan approval decisions using machine learning. The application uses a Random Forest Classifier to determine loan approval probability based on applicant demographics, employment status, credit history, and property location.

## Features

- **Web Interface**: Simple form for inputting loan application details
- **Machine Learning**: Random Forest Classifier for binary loan approval prediction
- **Real-time Decisions**: Instant loan approval/rejection predictions
- **Financial Focus**: Designed for banking and financial institutions
- **Categorical Analysis**: Handles encoded categorical variables efficiently

## Technologies Used

- **Backend**: Python, Flask
- **Machine Learning**: scikit-learn (Random Forest Classifier)
- **Data Processing**: pandas
- **Frontend**: HTML templates (Jinja2)
- **Dataset**: Loan approval historical data (CSV format)

## Project Structure

```
loanprediction/
├── app.py              # Main Flask application
├── loan.csv            # Historical loan dataset
├── templates/
│   └── loan.html       # Main page with form and results
├── static/             # CSS, JS, images (if any)
└── README.md           # Project documentation
```

## Installation & Setup

### Prerequisites

- Python 3.7 or higher
- pip (Python package manager)

### Step 1: Clone the Repository

```bash
git clone https://github.com/lovnishverma/loanprediction.git
cd loanprediction
```

### Step 2: Install Dependencies

```bash
pip install flask pandas scikit-learn
```

### Step 3: Prepare the Dataset

Ensure `loan.csv` is in the root directory with the following columns:
- `gender`: Gender (encoded as 0/1)
- `married`: Marital status (encoded as 0/1)
- `education`: Education level (encoded as 0/1)
- `self_employed`: Employment type (encoded as 0/1)
- `credit_history`: Credit history (encoded as 0/1)
- `property_area`: Property location (encoded as 0/1/2)
- `loan_status`: Target variable - Loan approval (0=Rejected, 1=Approved)
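
A quick way to confirm the file matches this layout is to check the header and the allowed codes before training. The following is a minimal sketch using only the standard library; the function name `check_loan_csv` and the inline sample are illustrative and not part of the repository.

```python
import csv
import io

# Column names and allowed codes, taken from the list above
ALLOWED_VALUES = {
    "gender": {0, 1},
    "married": {0, 1},
    "education": {0, 1},
    "self_employed": {0, 1},
    "credit_history": {0, 1},
    "property_area": {0, 1, 2},
    "loan_status": {0, 1},
}


def check_loan_csv(text):
    """Return a list of readable problems found in CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(ALLOWED_VALUES) - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    # Data rows start on line 2 because line 1 is the header
    for line_no, row in enumerate(reader, start=2):
        for column, allowed in ALLOWED_VALUES.items():
            raw = (row[column] or "").strip()
            try:
                ok = int(raw) in allowed
            except ValueError:
                ok = False
            if not ok:
                problems.append(f"line {line_no}: {column}={raw!r} not in {sorted(allowed)}")
    return problems


sample = (
    "gender,married,education,self_employed,credit_history,property_area,loan_status\n"
    "1,0,1,0,1,2,1\n"
    "0,1,1,1,1,3,1\n"  # property_area=3 is outside 0/1/2
)
print(check_loan_csv(sample))  # ["line 3: property_area='3' not in [0, 1, 2]"]
```

To check the real dataset, pass the file contents instead of `sample`, for example `check_loan_csv(open("loan.csv").read())`.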

### Step 4: Run the Application

```bash
python app.py
```

The application will start on `http://localhost:5000`.

## Usage

1. **Access Application**: Navigate to `http://localhost:5000`
2. **Fill Loan Application**: Complete the form with applicant details
3. **Submit**: Click predict to get instant loan decision
4. **View Result**: See approval (1) or rejection (0) prediction
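
The same steps can be scripted instead of using the browser. The sketch below only builds the form POST that the browser would send; the field names are assumed to match the CSV column names and should be checked against `templates/loan.html`.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form field names; check templates/loan.html for the real ones
applicant = {
    "gender": 1,
    "married": 1,
    "education": 1,
    "self_employed": 0,
    "credit_history": 1,
    "property_area": 2,
}

# HTML forms post application/x-www-form-urlencoded bodies
body = urlencode(applicant).encode("ascii")
request = Request(
    "http://localhost:5000/predict",
    data=body,
    method="POST",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(request.get_method(), request.full_url)
print(body.decode("ascii"))
# With the app running, send it with urllib.request.urlopen(request)
```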

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Main page with loan application form |
| `/predict` | POST | Process loan application and return decision |
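
For orientation, the two routes could be wired up roughly as follows. This is a sketch rather than the repository's `app.py`: it renders an inline template with `render_template_string` instead of `templates/loan.html`, the field names are assumed, and `predict_loan` is a placeholder rule standing in for the trained classifier.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Assumed form field names, in the order the model expects its features
FIELDS = ["gender", "married", "education",
          "self_employed", "credit_history", "property_area"]

# Inline stand-in for templates/loan.html so the sketch is self-contained
PAGE = """
<form method="post" action="/predict">
  {% for name in fields %}<input name="{{ name }}" placeholder="{{ name }}">{% endfor %}
  <button type="submit">Predict</button>
</form>
{% if result is not none %}<p>Prediction: {{ result }}</p>{% endif %}
"""


def predict_loan(values):
    """Placeholder rule standing in for the trained classifier."""
    return 1 if values[FIELDS.index("credit_history")] == 1 else 0


@app.route("/", methods=["GET"])
def index():
    # Show the empty form
    return render_template_string(PAGE, fields=FIELDS, result=None)


@app.route("/predict", methods=["POST"])
def predict():
    # A missing field yields a 400 response; a non-integer value raises ValueError
    values = [int(request.form[name]) for name in FIELDS]
    return render_template_string(PAGE, fields=FIELDS, result=predict_loan(values))
```

Calling `app.run()` at the end of the file would serve it on Flask's default port 5000.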

## Model Details

- **Algorithm**: Random Forest Classifier
- **Features**: Gender, Marital Status, Education, Employment Type, Credit History, Property Area
- **Target**: Loan Status (Binary Classification: 0=Rejected, 1=Approved)
- **Training**: Model retrains on entire dataset for each prediction
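
Because the training data does not change between requests, the fit can be done once when the module loads and reused for every prediction. The sketch below illustrates that idea with a tiny inline DataFrame standing in for `loan.csv`; the helper name `predict_status` and the hyperparameters are assumptions, not taken from the repository.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["gender", "married", "education",
            "self_employed", "credit_history", "property_area"]

# Tiny inline stand-in for loan.csv; the app would use pd.read_csv("loan.csv")
data = pd.DataFrame(
    [
        [1, 0, 1, 0, 1, 2, 1],
        [0, 1, 1, 1, 1, 1, 1],
        [1, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0, 0],
        [1, 1, 1, 0, 1, 1, 1],
        [0, 0, 1, 1, 0, 2, 0],
    ],
    columns=FEATURES + ["loan_status"],
)

# Fit once at import time instead of inside the request handler
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(data[FEATURES], data["loan_status"])


def predict_status(applicant):
    """Predict 0 (rejected) or 1 (approved) using the already-fitted model."""
    row = pd.DataFrame([applicant], columns=FEATURES)
    return int(model.predict(row)[0])


print(predict_status({"gender": 1, "married": 1, "education": 1,
                      "self_employed": 0, "credit_history": 1, "property_area": 2}))
```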

## Input Parameters & Encoding

### Required Fields

| Parameter | Type | Encoding | Description |
|-----------|------|----------|-------------|
| **Gender** | Integer | 0=Female, 1=Male | Applicant's gender |
| **Married** | Integer | 0=No, 1=Yes | Marital status |
| **Education** | Integer | 0=Not Graduate, 1=Graduate | Education qualification |
| **Self Employed** | Integer | 0=No, 1=Yes | Employment type |
| **Credit History** | Integer | 0=Poor/No History, 1=Good | Credit track record |
| **Property Area** | Integer | 0=Rural, 1=Semiurban, 2=Urban | Property location type |
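
If the form collects readable labels rather than raw codes, the table above translates directly into a lookup. A minimal sketch, with `encode_applicant` as an illustrative helper name:

```python
# Label-to-code mappings copied from the table above
ENCODINGS = {
    "gender": {"Female": 0, "Male": 1},
    "married": {"No": 0, "Yes": 1},
    "education": {"Not Graduate": 0, "Graduate": 1},
    "self_employed": {"No": 0, "Yes": 1},
    "credit_history": {"Poor/No History": 0, "Good": 1},
    "property_area": {"Rural": 0, "Semiurban": 1, "Urban": 2},
}


def encode_applicant(labels):
    """Convert readable labels into the integer feature vector, in table order."""
    return [ENCODINGS[field][labels[field]] for field in ENCODINGS]


applicant = {
    "gender": "Male",
    "married": "Yes",
    "education": "Graduate",
    "self_employed": "No",
    "credit_history": "Good",
    "property_area": "Urban",
}
print(encode_applicant(applicant))  # [1, 1, 1, 0, 1, 2]
```

An unknown label raises `KeyError`, which makes encoding mistakes visible instead of silently producing a wrong feature value.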

### Form Input Examples

**High Approval Probability Profile:**
- Gender: 1 (Male), Married: 1 (Yes), Education: 1 (Graduate)
- Self Employed: 0 (No), Credit History: 1 (Good), Property Area: 2 (Urban)

**Medium Approval Probability Profile:**
- Gender: 0 (Female), Married: 1 (Yes), Education: 1 (Graduate)
- Self Employed: 1 (Yes), Credit History: 1 (Good), Property Area: 1 (Semiurban)

**Lower Approval Probability Profile:**
- Gender: 0 (Female), Married: 0 (No), Education: 0 (Not Graduate)
- Self Employed: 1 (Yes), Credit History: 0 (Poor), Property Area: 0 (Rural)

## Prediction Results

- **1**: Loan Approved ✅
- **0**: Loan Rejected ❌

## Key Factors Influencing Approval

Based on typical loan approval patterns:

1. **Credit History** (Most Important)
   - Good credit history significantly increases approval chances
   - Poor/no credit history is a major risk factor

2. **Education Level**
   - Graduates typically have higher approval rates
   - Indicates stable income potential

3. **Marital Status**
   - Married applicants often considered more stable
   - May indicate dual income households

4. **Property Area**
   - Urban properties may have higher approval rates
   - Better infrastructure and resale value

5. **Employment Type**
   - Salaried employees often preferred over self-employed
   - More predictable income streams

6. **Gender**
   - Should ideally have minimal impact in fair lending
   - May reflect historical biases in data

## Dataset Requirements

The `loan.csv` file should contain encoded categorical variables:

```csv
gender,married,education,self_employed,credit_history,property_area,loan_status
1,0,1,0,1,2,1
0,1,1,1,1,1,1
1,1,0,0,0,0,0
...
```

## Model Performance Considerations

### Advantages of Random Forest
- Works directly with integer-encoded categorical variables, with no feature scaling required
- Reduces overfitting compared to single decision trees
- Provides feature importance rankings
- Robust to outliers

### Potential Issues
- Model retrains on every prediction (inefficient)
- No model validation or accuracy metrics displayed
- May suffer from data imbalance if loan approvals are skewed
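
Class imbalance is easy to check before training by looking at the share of each `loan_status` value. A small sketch with made-up labels; `class_balance` is an illustrative helper:

```python
from collections import Counter


def class_balance(labels):
    """Return the fraction of rows belonging to each class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}


# Made-up loan_status column: 8 approvals and 2 rejections
loan_status = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
balance = class_balance(loan_status)
print(balance)  # {0: 0.2, 1: 0.8}
```

If one class dominates, one option in scikit-learn is passing `class_weight="balanced"` to `RandomForestClassifier`.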

## Improvement Suggestions

### 1. Technical Enhancements
```python
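# Add model persistence: load a model that was trained and saved ahead of time
import joblib
import pandas as pd

model = joblib.load('loan_model.pkl')  # Load pre-trained model

# Example applicant; column names must match those used during training
new_data = pd.DataFrame([{'gender': 1, 'married': 1, 'education': 1,
                          'self_employed': 0, 'credit_history': 1, 'property_area': 2}])

# Add prediction probability
probability = model.predict_proba(new_data)[0]  # One probability per class
confidence = max(probability) * 100  # Probability of the predicted class, in percent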
```

### 2. Data Quality
- Add input validation and error handling
- Include numerical features (income, loan amount, etc.)
- Handle missing values appropriately
- Address class imbalance in training data

### 3. User Experience
- Show prediction confidence/probability
- Explain key factors affecting decision
- Add form validation with helpful messages
- Include loan amount and income fields

### 4. Model Improvement
- Cross-validation for better accuracy assessment
- Feature importance visualization
- A/B testing for different algorithms
- Regular model retraining with new data
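
Cross-validation and feature importances are both available in scikit-learn. The sketch below runs them on synthetic data generated only so the example is self-contained; the toy labelling rule (approval follows credit history, with some noise) is an assumption and says nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

FEATURES = ["gender", "married", "education",
            "self_employed", "credit_history", "property_area"]

# Synthetic stand-in for loan.csv, generated only so the sketch runs on its own
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, len(FEATURES)))
X[:, FEATURES.index("property_area")] = rng.integers(0, 3, size=200)

# Assumed toy rule: approval follows credit_history, with about 10% of labels flipped
credit = X[:, FEATURES.index("credit_history")]
flip = rng.random(200) < 0.1
y = np.where(flip, 1 - credit, credit)

# 5-fold cross-validated accuracy instead of a single fit on all rows
scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0), X, y, cv=5, scoring="accuracy"
)
print(f"accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")

# Feature importances from a model fitted on all rows, highest first
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
ranked = sorted(zip(FEATURES, model.feature_importances_), key=lambda pair: pair[1], reverse=True)
for name, importance in ranked:
    print(f"{name}: {importance:.2f}")
```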

### 5. Business Features
- Integration with credit scoring APIs
- Audit trail for compliance
- Batch processing for multiple applications
- Risk assessment categories

## Compliance & Ethics

### Important Considerations
- **Fair Lending**: Ensure model doesn't discriminate based on protected characteristics
- **Regulatory Compliance**: Adhere to applicable data protection and credit regulations (e.g., GDPR, Fair Credit Reporting Act)
- **Bias Detection**: Regular audits for algorithmic bias
- **Explainability**: Provide clear reasons for loan decisions

### Recommended Practices
- Regular bias testing across demographic groups
- Human oversight for edge cases
- Clear appeals process for rejected applications
- Documentation of model decisions for audits
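
One simple form of bias testing is comparing approval rates between groups. The sketch below computes the rate per group and the gap between them on made-up predictions; `approval_rates` and the record layout are illustrative, and a real audit would use more than this single metric.

```python
from collections import defaultdict


def approval_rates(records, group_key):
    """Return the share of approved predictions for each value of group_key."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        approved[group] += record["prediction"]  # prediction is 0 or 1
    return {group: approved[group] / totals[group] for group in totals}


# Made-up predictions for illustration only
records = [
    {"gender": 0, "prediction": 1},
    {"gender": 0, "prediction": 0},
    {"gender": 0, "prediction": 0},
    {"gender": 0, "prediction": 1},
    {"gender": 1, "prediction": 1},
    {"gender": 1, "prediction": 1},
    {"gender": 1, "prediction": 1},
    {"gender": 1, "prediction": 0},
]
rates = approval_rates(records, "gender")
gap = max(rates.values()) - min(rates.values())
print(rates)  # {0: 0.5, 1: 0.75}
print(gap)    # 0.25
```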

## Troubleshooting

### Common Issues

1. **Dependencies Missing**:
   ```bash
   pip install flask pandas scikit-learn
   ```

2. **CSV Format Issues**:
   - Ensure all values are properly encoded as integers
   - Check for missing values or incorrect column names

3. **Model Training Errors**:
   - Verify dataset has sufficient samples
   - Check for data type consistency

4. **Prediction Errors**:
   - Validate all form inputs are integers
   - Ensure property_area values are 0, 1, or 2
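
The prediction errors above can be caught before the model is called by validating the submitted form. A minimal sketch, assuming field names that match the CSV columns; `parse_form` is a hypothetical helper and works with any mapping that has `.get`, including a plain dict.

```python
# Allowed integer codes per form field (field names are assumed)
ALLOWED = {
    "gender": {0, 1},
    "married": {0, 1},
    "education": {0, 1},
    "self_employed": {0, 1},
    "credit_history": {0, 1},
    "property_area": {0, 1, 2},
}


def parse_form(form):
    """Return (values, errors): parsed integers and a list of readable problems."""
    values, errors = {}, []
    for field, allowed in ALLOWED.items():
        raw = form.get(field, "")
        try:
            number = int(raw)
        except (TypeError, ValueError):
            errors.append(f"{field}: expected an integer, got {raw!r}")
            continue
        if number not in allowed:
            errors.append(f"{field}: {number} is not one of {sorted(allowed)}")
        else:
            values[field] = number
    return values, errors


submitted = {"gender": "1", "married": "0", "education": "1",
             "self_employed": "0", "credit_history": "1", "property_area": "3"}
values, errors = parse_form(submitted)
print(errors)  # ['property_area: 3 is not one of [0, 1, 2]']
```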

## Security Considerations

- Input validation to prevent injection attacks
- Rate limiting for API endpoints
- Secure handling of sensitive financial data
- Audit logging for all predictions
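
Audit logging can start as one structured log line per prediction. The sketch below uses the standard `logging` and `json` modules; the logger name, record fields, and helper names are assumptions, and which applicant fields may be stored is a policy decision this example does not settle.

```python
import json
import logging
from datetime import datetime, timezone

audit_logger = logging.getLogger("loan_audit")
audit_logger.setLevel(logging.INFO)
audit_logger.addHandler(logging.StreamHandler())  # swap for a FileHandler in practice


def audit_record(features, prediction):
    """Build one JSON line describing a prediction."""
    return json.dumps(
        {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "features": features,
            "prediction": prediction,
        },
        sort_keys=True,
    )


def log_prediction(features, prediction):
    """Write the audit line to the logger and return it."""
    line = audit_record(features, prediction)
    audit_logger.info(line)
    return line


log_prediction({"credit_history": 1, "property_area": 2}, 1)
```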

## Deployment Notes

For production deployment:
- Use environment variables for configuration
- Implement proper error handling and logging
- Add authentication and authorization
- Use HTTPS for secure data transmission
- Consider containerization with Docker
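
Reading configuration from environment variables might look like the sketch below. The variable names (`LOAN_APP_HOST`, `LOAN_APP_PORT`, `LOAN_APP_DEBUG`, `LOAN_APP_DATASET`) are hypothetical and not defined by the repository.

```python
import os


def load_config(env=os.environ):
    """Read settings from environment variables, falling back to local defaults."""
    return {
        "host": env.get("LOAN_APP_HOST", "127.0.0.1"),
        "port": int(env.get("LOAN_APP_PORT", "5000")),
        "debug": env.get("LOAN_APP_DEBUG", "0") == "1",  # keep debug off unless asked
        "dataset_path": env.get("LOAN_APP_DATASET", "loan.csv"),
    }


# Passing a dict makes the function easy to try without touching the real environment
config = load_config({"LOAN_APP_PORT": "8080"})
print(config)
```

The resulting values could then be passed to Flask, for example `app.run(host=config["host"], port=config["port"], debug=config["debug"])`.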

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/enhancement`)
3. Commit changes (`git commit -am 'Add feature'`)
4. Push to branch (`git push origin feature/enhancement`)
5. Open a Pull Request

## License

This project is open source and available under the [MIT License](LICENSE).

## Disclaimer

This application is for educational and demonstration purposes. For production use in financial services:
- Ensure compliance with local banking regulations
- Implement comprehensive bias testing
- Add proper security measures
- Include human oversight in decision process
- Validate and update the model regularly

## Contact

For questions, support, or business inquiries, please open an issue in the repository.

---

**Note**: Loan approval decisions should never rely solely on automated systems. Always include human review and comply with applicable financial regulations.