Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dataquestio/loan-prediction
Predict which loans will be foreclosed on.
https://github.com/dataquestio/loan-prediction
Last synced: about 8 hours ago
JSON representation
Predict which loans will be foreclosed on.
- Host: GitHub
- URL: https://github.com/dataquestio/loan-prediction
- Owner: dataquestio
- Created: 2016-06-29T00:07:26.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2023-05-10T09:22:52.000Z (over 1 year ago)
- Last Synced: 2024-11-05T19:34:12.932Z (7 days ago)
- Language: Python
- Homepage:
- Size: 4.88 KB
- Stars: 216
- Watchers: 22
- Forks: 141
- Open Issues: 5
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
Loan Prediction
-----------------------Predict whether or not loans acquired by Fannie Mae will go into foreclosure. Fannie Mae acquires loans from other lenders as a way of inducing them to lend more. Fannie Mae releases data on the loans it has acquired and their performance afterwards [here](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html).
Installation
----------------------### Download the data
* Clone this repo to your computer.
* Get into the folder using `cd loan-prediction`.
* Run `mkdir data`.
* Switch into the `data` directory using `cd data`.
* Download the data files from Fannie Mae into the `data` directory.
* You can find the data [here](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html).
* You'll need to register with Fannie Mae to download the data.
* It's recommended to download all the data from 2012 Q1 to present.
* Extract all of the `.zip` files you downloaded.
* On OSX, you can run `find ./ -name \*.zip -exec unzip {} \;`.
* At the end, you should have a bunch of text files called `Acquisition_YQX.txt`, and `Performance_YQX.txt`, where `Y` is a year, and `X` is a number from `1` to `4`.
* Remove all the zip files by running `rm *.zip`.
* Switch back into the `loan-prediction` directory using `cd ..`.### Install the requirements
* Install the requirements using `pip install -r requirements.txt`.
* Make sure you use Python 3.
* You may want to use a virtual environment for this.Usage
-----------------------* Run `mkdir processed` to create a directory for our processed datasets.
* Run `python assemble.py` to combine the `Acquisition` and `Performance` datasets.
* This will create `Acquisition.txt` and `Performance.txt` in the `processed` folder.
* Run `python annotate.py`.
* This will create training data from `Acquisition.txt` and `Performance.txt`.
* It will add a file called `train.csv` to the `processed` folder.
* Run `python predict.py`.
* This will run cross validation across the training set, and print the accuracy score.Extending this
-------------------------If you want to extend this work, here are a few places to start:
* Generate more features in `annotate.py`.
* Switch algorithms in `predict.py`.
* Add in a way to make predictions on future data.
* Try seeing if you can predict if a bank should have issued the loan.
* Remove any columns from `train` that the bank wouldn't have known at the time of issuing the loan.
* Some columns are known when Fannie Mae bought the loan, but not before
* Make predictions.
* Explore seeing if you can predict columns other than `foreclosure_status`.
* Can you predict how much the property will be worth at sale time?
* Explore the nuances between performance updates.
* Can you predict how many times the borrower will be late on payments?
* Can you map out the typical loan lifecycle?