Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/danielstankw/car_classification_and_regression

Regression & classification models using car data. Involves EDA, feature engineering, pipelines and more.

Topics: classification, learning, machine, regression, scikit-learn, tensorflow, trim


# Problem Statement
The goal is to build one or more models capable of predicting the vehicle trim and the dealer listing price, given the other twenty-six variables provided.

# Models in the dataset
The dataset consists of two car models:
- Cadillac XT5 (*left*)
- Jeep Cherokee (*right*)


*(Images: Cadillac XT5 on the left, Jeep Cherokee on the right.)*

## Data Structure
The data consists of `Test_Dataset.csv` and `Training_DataSet.csv`.

The structure of `Test_Dataset.csv` is as follows:

| Parameter | Type | Description |
|----------------------|---------|------------------------------------------------------------------------------------------------------------------|
| ListingID | int64 | Unique key that identifies each listing |
| SellerCity | object | Seller city |
| SellerIsPriv | bool | Boolean that indicates whether the listing is from a private seller |
| SellerListSrc | object | Seller listing source identifier |
| SellerName | object | Seller name |
| SellerRating | float64 | Seller rating (continuous over [0,5] with 5 being a favorable rating) |
| SellerRevCnt | int64 | Seller review count |
| SellerState | object | Seller state |
| SellerZip | float64 | Seller zip code |
| VehCertified | bool | Boolean that indicates if the listing has a manufacturer certification (generally indicates extended warranty) |
| VehColorExt | object | Vehicle exterior color |
| VehColorInt | object | Vehicle interior color |
| VehDriveTrain | object | Vehicle drivetrain (rear/front/all wheel drive) |
| VehEngine | object | Vehicle engine (generally includes displacement size, whether it is turbocharged, sometimes includes fuel type) |
| VehFeats | object | Vehicle features as listed by the seller in a semi-structured list format |
| VehFuel | object | Vehicle fuel type |
| VehHistory | object | Vehicle ownership history in a semi-structured format |
| VehListdays | float64 | Duration (in days) the vehicle listing has been active |
| VehMake | object | Vehicle make (manufacturer) |
| VehMileage | float64 | Vehicle mileage |
| VehModel | object | Vehicle model |
| VehPriceLabel | object | A classification label applied by the listing site |
| VehSellerNotes | object | Unstructured text the seller has entered |
| VehSellerStockNum | object | Vehicle seller stock number |
| VehTransmission | object | Vehicle transmission type |
| VehYear | int64 | Vehicle model year (not necessarily the year it was manufactured) |
| Vehicle_Trim | object | Vehicle trim, dependent variable to be predicted. |
| Dealer_Listing_Price | float64 | Vehicle listing price, dependent variable to be predicted. |
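Since the table holds both targets alongside the inputs, the first step is to separate them. The sketch below assumes pandas and uses a hypothetical helper `split_features_targets`; dropping rows with a missing target is an assumption, not something the README states.

```python
import pandas as pd

# Target columns, named as in the data-structure table.
TARGETS = ["Vehicle_Trim", "Dealer_Listing_Price"]


def split_features_targets(df: pd.DataFrame):
    """Return (X, y_trim, y_price) from a training frame.

    Assumption: rows missing either target are dropped, because they
    cannot be used for supervised training.
    """
    labelled = df.dropna(subset=TARGETS)
    X = labelled.drop(columns=TARGETS)          # the remaining input variables
    y_trim = labelled["Vehicle_Trim"]           # classification target
    y_price = labelled["Dealer_Listing_Price"]  # regression target
    return X, y_trim, y_price
```

With the real file this would be called as `X, y_trim, y_price = split_features_targets(pd.read_csv("Training_DataSet.csv"))`, giving one classification target and one regression target that share the same inputs.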

# Vehicle Trims
Although there are only two car models, each comes in a variety of trims. To facilitate learning, classes with little representation in the dataset were dropped or merged with similar classes. The decisions on which trims to merge were based on a literature survey.
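The merge-then-drop step can be sketched with pandas as follows. The function name `consolidate_trims`, the `min_count` threshold and the example merge pair are hypothetical; the actual merges used in this project came from the literature survey and are not listed here.

```python
import pandas as pd


def consolidate_trims(trims: pd.Series, merge_map: dict, min_count: int) -> pd.Series:
    """Merge similar trims, then blank out classes that remain too rare.

    merge_map: {original_trim: target_trim}, e.g. chosen from a literature survey.
    min_count: hypothetical threshold below which a class is dropped.
    Rare classes become NaN so the caller can drop those rows.
    """
    merged = trims.replace(merge_map)               # exact-value relabelling
    counts = merged.value_counts()
    rare = counts[counts < min_count].index          # classes still under-represented
    return merged.where(~merged.isin(rare))          # keep common classes, NaN otherwise


# Illustrative only: the merge pair below is made up for the example.
trims = pd.Series(["Limited", "Limited", "Latitude", "Latitude Plus", "Trailhawk"])
result = consolidate_trims(trims, {"Latitude Plus": "Latitude"}, min_count=2)
print(result.tolist())  # ['Limited', 'Limited', 'Latitude', 'Latitude', nan]
```

Merging before counting matters: a trim that is rare on its own can survive once it is folded into a similar, better-represented class.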



# Price Distribution


*(Figure: distribution of dealer listing prices.)*

# Training
Before training, the data was heavily preprocessed using a pipeline with appropriate transforms:
- Missing values were imputed
- New features were created
- Categorical features were one-hot encoded
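The three preprocessing steps above map naturally onto a scikit-learn `Pipeline`. This is a sketch under stated assumptions: the column subsets, the imputation strategies and the engineered `VehIsTurbo` flag (parsed from the `VehEngine` text, which the table says often mentions turbocharging) are illustrative choices, not the repository's actual transforms.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder

# Illustrative column choice (assumption): a subset of the table's columns.
NUMERIC = ["SellerRating", "SellerRevCnt", "VehListdays", "VehMileage", "VehYear", "VehIsTurbo"]
CATEGORICAL = ["SellerState", "VehColorExt", "VehDriveTrain", "VehFuel",
               "VehTransmission", "VehPriceLabel"]


def add_features(X: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical engineered feature: turbo flag parsed from the VehEngine text."""
    X = X.copy()
    X["VehIsTurbo"] = (
        X["VehEngine"].fillna("").str.contains("turbo", case=False).astype(int)
    )
    return X


preprocess = Pipeline([
    # Create new features from the raw columns.
    ("features", FunctionTransformer(add_features)),
    ("columns", ColumnTransformer(
        [
            # Impute missing numeric values with the column median.
            ("num", SimpleImputer(strategy="median"), NUMERIC),
            # Impute missing categories with the most frequent value, then one-hot encode.
            ("cat", Pipeline([
                ("impute", SimpleImputer(strategy="most_frequent")),
                ("onehot", OneHotEncoder(handle_unknown="ignore")),
            ]), CATEGORICAL),
        ],
        sparse_threshold=0.0,  # always return a dense array, for simplicity
    )),
])
```

Columns not listed in `NUMERIC` or `CATEGORICAL` (for example the free-text `VehSellerNotes`) are dropped by the `ColumnTransformer` default, and `handle_unknown="ignore"` keeps unseen categories in the test set from raising errors.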



The baseline model without hyperparameter tuning produced the following results:


*(Figures: baseline model results.)*
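A baseline of this kind can be reproduced roughly as below: fit one untuned classifier for the trim and one untuned regressor for the price on a hold-out split. The model family (random forests), the 80/20 split and the metrics (accuracy, mean absolute error) are assumptions for illustration; the README does not say which models or metrics produced the figures.

```python
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def evaluate_baselines(preprocess, X, y_trim, y_price, seed=0):
    """Fit an untuned trim classifier and price regressor; score them on a hold-out split.

    The random forests, the 80/20 split and the metrics are assumptions,
    not necessarily the project's choices.
    """
    X_tr, X_te, trim_tr, trim_te, price_tr, price_te = train_test_split(
        X, y_trim, y_price, test_size=0.2, random_state=seed
    )
    # Each model gets its own copy of the preprocessing steps and default hyperparameters.
    clf = make_pipeline(clone(preprocess), RandomForestClassifier(random_state=seed))
    reg = make_pipeline(clone(preprocess), RandomForestRegressor(random_state=seed))
    clf.fit(X_tr, trim_tr)
    reg.fit(X_tr, price_tr)
    return {
        "trim_accuracy": accuracy_score(trim_te, clf.predict(X_te)),
        "price_mae": mean_absolute_error(price_te, reg.predict(X_te)),
    }
```

`preprocess` can be any unfitted scikit-learn transformer, such as a `ColumnTransformer` implementing the imputation and one-hot encoding steps from the Training section. Cloning it gives each model an independent, unfitted copy, so fitting the classifier never leaks state into the regressor.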

# Hyperparameter Tuning