https://github.com/jihoonerd/1985_auto_imports_database

Prediction Model of Loss Payment Ratio of Motors, using 1985 Auto Import Database

Host: GitHub
URL: https://github.com/jihoonerd/1985_auto_imports_database
Owner: jihoonerd
License: mit
Created: 2017-08-04T06:52:51.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2020-06-23T09:34:26.000Z (over 5 years ago)
Last Synced: 2025-01-15T17:32:07.992Z (about 1 year ago)
Topics: ensemble-learning, machine-learning, regression, stacking
Language: Python
Size: 2.62 MB
Stars: 1
Watchers: 3
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# 1985 Auto Imports Database

Prediction Model of Loss Payment Ratio of Motors, using 1985 Auto Imports Database

## Overview

The objective of this project is to train a prediction model that infers the normalized loss ratio of automobiles. The project has four stages: (1) project setup, which prepares the data for processing; (2) exploratory data analysis, which visualizes the data; (3) implementation of the prediction model; and (4) recording and visualization of the model's performance.

### Data Set Information:

>This data set consists of three types of entities: (a) the specification of an auto in terms of various characteristics, (b) its assigned insurance risk rating, (c) its normalized losses in use as compared to other cars. The second rating corresponds to the degree to which the auto is more risky than its price indicates. Cars are initially assigned a risk factor symbol associated with its price. Then, if it is more risky (or less), this symbol is adjusted by moving it up (or down) the scale. Actuarians call this process "symboling". A value of +3 indicates that the auto is risky, -3 that it is probably pretty safe. The third factor is the relative average loss payment per insured vehicle year. This value is normalized for all autos within a particular size classification (two-door small, station wagons, sports/speciality, etc...), and represents the average loss per car per year. Note: Several of the attributes in the database could be used as a "class" attribute.

### Dataset Size:

* Number of Instances: 205

* Number of Attributes: 26 total

    * 15 continuous

    * 1 integer

    * 10 nominal

### Instruction

* Problem name: Automobile

* URL: https://archive.ics.uci.edu/ml/datasets/Automobile

* Target variable: normalized-losses

* Problem type: regression

* Data format: csv (missing header)

* Missing values: denoted by question marks (‘?’). Skip data samples with missing values in the target.

* Features to ignore: ‘symboling’
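
The loading rules above can be sketched with pandas roughly as follows. This is a minimal illustration, not the repository's own loader: the function name `load_automobile` is hypothetical, the column names are an assumption taken from the attribute list in the UCI documentation (the file itself has no header), and the two sample rows are only illustrative rows in the file's format.

```python
import io

import pandas as pd

# Column names are NOT in the file (it has no header); this list is an
# assumption based on the attribute order in the UCI documentation.
COLUMNS = [
    "symboling", "normalized-losses", "make", "fuel-type", "aspiration",
    "num-of-doors", "body-style", "drive-wheels", "engine-location",
    "wheel-base", "length", "width", "height", "curb-weight", "engine-type",
    "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke",
    "compression-ratio", "horsepower", "peak-rpm", "city-mpg", "highway-mpg",
    "price",
]
TARGET = "normalized-losses"


def load_automobile(source):
    """Hypothetical helper: read the headerless CSV and apply the rules above."""
    # '?' marks a missing value, so let pandas turn it into NaN.
    df = pd.read_csv(source, header=None, names=COLUMNS, na_values="?")
    # Skip samples whose target is missing.
    df = df.dropna(subset=[TARGET])
    # 'symboling' is listed as a feature to ignore.
    df = df.drop("symboling", axis=1)
    return df.reset_index(drop=True)


# Two illustrative rows in the file's format; the first has a missing target.
SAMPLE = (
    "3,?,alfa-romero,gas,std,two,convertible,rwd,front,88.60,168.80,64.10,"
    "48.80,2548,dohc,four,130,mpfi,3.47,2.68,9.00,111,5000,21,27,13495\n"
    "2,164,audi,gas,std,four,sedan,fwd,front,99.80,176.60,66.20,54.30,2337,"
    "ohc,four,109,mpfi,3.19,3.40,10.00,102,5500,24,30,13950\n"
)

df = load_automobile(io.StringIO(SAMPLE))
print(df[[TARGET, "make", "price"]])
```

Rows with missing values in feature columns are kept here; how those are imputed or dropped is a separate modelling decision that this sketch does not make.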

## Workflow



### Stacking Layer
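
As a rough sketch of how a two-layer stacking setup of this kind can be built (not necessarily how this repository implements it; the PDF listed under Documentation describes the actual design), the example below trains several base regressors, collects their out-of-fold predictions, and feeds them to a blender. The base models and the two blenders are named after entries in the performance tables, but XGBoost and the neural network are left out to keep the example within scikit-learn. The data is synthetic, and `RANDOM_STATE` and all hyperparameters are assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_predict, train_test_split

RANDOM_STATE = 12  # assumption; the repository reads its seed from ./code/config.py

# Synthetic stand-in data; the project itself uses the Automobile data set.
X, y = make_regression(n_samples=160, n_features=10, noise=15.0,
                       random_state=RANDOM_STATE)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=RANDOM_STATE)

# Layer 1: base models (hyperparameters are illustrative only).
base_models = {
    "Ridge": Ridge(alpha=1.0),
    "Elastic Net": ElasticNet(alpha=0.1),
    "Random Forest": RandomForestRegressor(n_estimators=30,
                                           random_state=RANDOM_STATE),
    "Extra Tree": ExtraTreesRegressor(n_estimators=30,
                                      random_state=RANDOM_STATE),
}

cv = KFold(n_splits=5, shuffle=True, random_state=RANDOM_STATE)

# Out-of-fold predictions: each training sample is predicted by a model
# that did not see it, so the blender is not trained on leaked targets.
train_meta = np.column_stack([
    cross_val_predict(model, X_train, y_train, cv=cv)
    for model in base_models.values()
])

# Refit every base model on the full training set to build test features.
test_meta = np.column_stack([
    model.fit(X_train, y_train).predict(X_test)
    for model in base_models.values()
])

# Layer 2: two blenders, a plain average and a linear regression.
average_pred = test_meta.mean(axis=1)
blender = LinearRegression().fit(train_meta, y_train)
blended_pred = blender.predict(test_meta)

results = {
    "Average": mean_squared_error(y_test, average_pred),
    "Linear Regression": mean_squared_error(y_test, blended_pred),
}
for name, mse in results.items():
    print("{:<18} MSE={:.4f} RMSE={:.4f}".format(name, mse, np.sqrt(mse)))
```

Using out-of-fold predictions as the blender's training features is the key design choice: fitting the blender on in-sample predictions would reward whichever base model overfits the most.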



## Documentation

See `./documents/1985_auto_imports_database.pdf` for documentation.

## Dependencies

* Python: 3.5.3

* TensorFlow: 1.2.1

* Keras: 2.0.6

* Scikit-Learn: 0.18.2

* Numpy: 1.13.1

* Pandas: 0.20.2

* XGBoost: 0.6

## Performance

* The random state can be changed in `./code/config.py`.

* Performance is recorded automatically in the `./log/` directory.

* Results may differ from those listed below, even with the same random state, when an algorithm uses a stochastic method.
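
For reference, the MSE and RMSE columns in the tables below are related by RMSE = sqrt(MSE). The following sketch shows how the two metrics can be computed and appended to a log file using only the standard library. The helper `record_performance` and the file name `performance.csv` are hypothetical; the repository's actual log format under `./log/` may differ.

```python
import csv
import math
import tempfile
from pathlib import Path


def mse(y_true, y_pred):
    """Mean squared error of two equally long sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def record_performance(log_dir, random_state, model, y_true, y_pred):
    """Hypothetical logger: append one result row to <log_dir>/performance.csv."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / "performance.csv"  # assumed file name
    is_new = not log_file.exists()
    row = [random_state, model,
           round(mse(y_true, y_pred), 4), round(rmse(y_true, y_pred), 4)]
    with log_file.open("a", newline="") as fh:
        writer = csv.writer(fh)
        if is_new:
            # Write the header only once, when the file is first created.
            writer.writerow(["random_state", "model", "mse", "rmse"])
        writer.writerow(row)
    return row


# Toy values for illustration only, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    row = record_performance(tmp, 12, "Ridge", [100, 120, 140], [110, 115, 150])
    print(row)
```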

### Performance Figure (random_state = 1)



### Single Model

#### Case1

|Random State|Model|MSE|RMSE|
|---|---|---|---|
|12|Ridge|313.0767|17.6940|
|12|Elastic Net|357.9729|18.9202|
|12|Random Forest|875.4044|29.5872|
|12|Extra Tree|740.1835|27.2063|
|12|XGBoost|817.1096|28.5851|
|12|Neural Network|223.5412|14.9513|

#### Case2

|Random State|Model|MSE|RMSE|
|---|---|---|---|
|18|Ridge|415.5719|20.3856|
|18|Elastic Net|426.7968|20.6591|
|18|Random Forest|844.6947|29.0636|
|18|Extra Tree|685.9701|26.1910|
|18|XGBoost|1001.2611|31.6427|
|18|Neural Network|495.6706|22.2637|

#### Case3

|Random State|Model|MSE|RMSE|
|---|---|---|---|
|72|Ridge|263.9897|16.2478|
|72|Elastic Net|270.0922|16.4345|
|72|Random Forest|631.3527|25.1267|
|72|Extra Tree|667.5146|25.8363|
|72|XGBoost|649.5410|25.4861|
|72|Neural Network|446.9455|21.1411|

#### Case4

|Random State|Model|MSE|RMSE|
|---|---|---|---|
|86|Ridge|383.7710|19.5901|
|86|Elastic Net|344.4253|18.5587|
|86|Random Forest|852.8695|29.2039|
|86|Extra Tree|877.0165|29.6145|
|86|XGBoost|512.2365|22.6326|
|86|Neural Network|524.5276|22.9026|

#### Case5

|Random State|Model|MSE|RMSE|
|---|---|---|---|
|109|Ridge|290.9115|17.0561|
|109|Elastic Net|241.7871|15.5495|
|109|Random Forest|534.7114|23.1238|
|109|Extra Tree|533.6554|23.1010|
|109|XGBoost|396.3412|19.9083|
|109|Neural Network|445.1902|21.0995|

### Stacking

#### Case1

|Random State|Blender|MSE|RMSE|
|---|---|---|---|
|12|Average|553.5240|23.5271|
|12|Linear Regression|386.4739|19.6589|
|12|Neural Network|391.7991|19.7939|

#### Case2

|Random State|Blender|MSE|RMSE|
|---|---|---|---|
|18|Average|622.8775|24.9575|
|18|Linear Regression|562.9997|23.7276|
|18|Neural Network|544.7601|23.3401|

#### Case3

|Random State|Blender|MSE|RMSE|
|---|---|---|---|
|72|Average|297.4342|17.2463|
|72|Linear Regression|309.2991|17.5869|
|72|Neural Network|330.5245|18.1803|

#### Case4

|Random State|Blender|MSE|RMSE|
|---|---|---|---|
|86|Average|451.7685|21.2548|
|86|Linear Regression|428.1003|20.6906|
|86|Neural Network|419.6526|20.4854|

#### Case5

|Random State|Blender|MSE|RMSE|
|---|---|---|---|
|109|Average|231.3396|15.2099|
|109|Linear Regression|215.0494|14.6646|
|109|Neural Network|317.1345|17.8083|

## Demo

Run the demo with:

```bash
$ python main.py
```
