https://github.com/anas436/cars-price-prediction-using-python



ibm-watson jupyter-notebook linear-regression matplotlib multiple-linear-regression numpy pandas pipline polynomial-regression python3 seaborn sklearn sklearn-metrics standardscaler


# Cars-Price-Prediction-Using-Python

Decision Making: Determining a Good Model Fit

Now that we have visualized the different models, and generated the R-squared and MSE values for the fits, how do we determine a good model fit?


  • What is a good R-squared value?

When comparing models, the model with the higher R-squared value is a better fit for the data.


  • What is a good MSE?

When comparing models, the model with the smallest MSE value is a better fit for the data.
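As a minimal sketch of what these two metrics measure, both can be computed directly from actual and predicted values. The function names and the sample prices below are hypothetical and only for illustration; in scikit-learn, `sklearn.metrics.mean_squared_error` and `sklearn.metrics.r2_score` compute the same quantities.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: share of the variance in y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical prices and predictions, not taken from the car dataset.
actual = [10000.0, 15000.0, 20000.0, 25000.0]
predicted = [11000.0, 14000.0, 21000.0, 24000.0]

mse_value = mse(actual, predicted)
r2_value = r_squared(actual, predicted)
print(f"MSE: {mse_value:.3g}, R-squared: {r2_value:.3f}")
```

An R-squared close to 1 and an MSE close to 0 both indicate predictions that track the actual values closely.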

Let's take a look at the values for the different models.


Simple Linear Regression: Using Highway-mpg as a Predictor Variable of Price.


  • R-squared: 0.49659118843391759

  • MSE: 3.16 x 10^7

Multiple Linear Regression: Using Horsepower, Curb-weight, Engine-size, and Highway-mpg as Predictor Variables of Price.


  • R-squared: 0.80896354913783497

  • MSE: 1.2 x 10^7

Polynomial Fit: Using Highway-mpg as a Predictor Variable of Price.


  • R-squared: 0.6741946663906514

  • MSE: 2.05 x 10^7
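The comparison of these values can be sketched in a few lines. The dictionary layout and the model labels are assumptions made for this example; the numbers are the ones reported above.

```python
# R-squared and MSE values as reported in this README.
results = {
    "SLR (highway-mpg)": {"r2": 0.49659118843391759, "mse": 3.16e7},
    "MLR (4 predictors)": {"r2": 0.80896354913783497, "mse": 1.2e7},
    "Polynomial (highway-mpg)": {"r2": 0.6741946663906514, "mse": 2.05e7},
}

# Higher R-squared is better; lower MSE is better.
best_by_r2 = max(results, key=lambda name: results[name]["r2"])
best_by_mse = min(results, key=lambda name: results[name]["mse"])

# Print the models from lowest to highest MSE.
for name, scores in sorted(results.items(), key=lambda kv: kv[1]["mse"]):
    print(f"{name:26s} R-squared={scores['r2']:.3f}  MSE={scores['mse']:.2e}")

print("Highest R-squared:", best_by_r2)
print("Lowest MSE:", best_by_mse)
```

Both criteria point to the same model here, which is not guaranteed in general.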

Simple Linear Regression Model (SLR) vs. Multiple Linear Regression Model (MLR)

Adding more predictor variables usually improves a model's predictions, but not always. You may not have enough data, you may run into numerical problems, or many of the variables may be uninformative and act as noise. For this reason, always check both the MSE and the R-squared.

To compare the MLR and SLR models, we look at the R-squared and the MSE together to draw a conclusion about which model fits better.
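One common way to account for the number of variables is the adjusted R-squared, which penalizes each additional predictor. This is a sketch only: the README does not use this metric, and it does not state the number of rows in the dataset, so `N_SAMPLES` below is a hypothetical value.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R-squared for the number of predictors used in the model."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


N_SAMPLES = 200  # hypothetical row count; the README does not state it

# R-squared values as reported in this README.
adj_slr = adjusted_r_squared(0.49659118843391759, N_SAMPLES, n_predictors=1)
adj_mlr = adjusted_r_squared(0.80896354913783497, N_SAMPLES, n_predictors=4)

print(f"SLR adjusted R-squared: {adj_slr:.3f}")
print(f"MLR adjusted R-squared: {adj_mlr:.3f}")
```

If extra variables only add noise, the adjusted value drops noticeably below the plain R-squared.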



  • MSE: The MSE of SLR is 3.16 x 10^7, while the MSE of MLR is 1.2 x 10^7. The MSE of MLR is much smaller.


  • R-squared: There is also a large difference between the two models. The R-squared for the SLR (~0.497) is much smaller than the R-squared for the MLR (~0.809).

Together, the R-squared and the MSE show that MLR is the better fit in this case.
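To put the size of the gap in concrete terms, the reported values can be turned into a relative MSE reduction and a root mean squared error (RMSE), which is expressed in the same units as price. The variable names are hypothetical, and the MSE inputs are the rounded figures quoted above, so the results are approximate.

```python
import math

# Values as reported in this README (MSE figures are rounded).
slr = {"r2": 0.49659118843391759, "mse": 3.16e7}
mlr = {"r2": 0.80896354913783497, "mse": 1.2e7}

# Fraction by which MLR lowers the MSE relative to SLR.
mse_reduction = (slr["mse"] - mlr["mse"]) / slr["mse"]

# Absolute gain in R-squared.
r2_gain = mlr["r2"] - slr["r2"]

# RMSE is the square root of the MSE, in the same units as the target (price).
rmse_slr = math.sqrt(slr["mse"])
rmse_mlr = math.sqrt(mlr["mse"])

print(f"MSE reduction: {mse_reduction:.1%}")
print(f"R-squared gain: {r2_gain:.3f}")
print(f"RMSE: SLR {rmse_slr:.0f}, MLR {rmse_mlr:.0f}")
```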

Simple Linear Regression Model (SLR) vs. Polynomial Fit



  • MSE: The Polynomial Fit brought down the MSE: its MSE (2.05 x 10^7) is smaller than that of the SLR (3.16 x 10^7).


  • R-squared: The R-squared for the Polynomial Fit (~0.674) is larger than the R-squared for the SLR (~0.497), so the Polynomial Fit also raised the R-squared considerably.


Since the Polynomial Fit resulted in a lower MSE and a higher R-squared, we can conclude that it fits better than the simple linear regression for predicting "price" with "highway-mpg" as the predictor variable.
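The same comparison can be sketched with NumPy on synthetic data. Everything in this example is an assumption made for illustration: the generated `highway_mpg` and `price` arrays stand in for the real dataset, the helper `fit_and_score` is hypothetical, and the polynomial degree of 3 is a choice for this sketch, since the README does not state the degree it used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: price falls off non-linearly with highway-mpg.
highway_mpg = np.linspace(15, 50, 60)
price = 400000.0 / highway_mpg + rng.normal(0, 1500, size=highway_mpg.size)


def fit_and_score(x, y, degree):
    """Least-squares polynomial fit; returns (MSE, R-squared) on the same data."""
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    mse = np.mean((y - y_hat) ** 2)
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return mse, r2


mse_linear, r2_linear = fit_and_score(highway_mpg, price, degree=1)
mse_poly, r2_poly = fit_and_score(highway_mpg, price, degree=3)

print(f"Linear:     MSE={mse_linear:.3g}  R-squared={r2_linear:.3f}")
print(f"Polynomial: MSE={mse_poly:.3g}  R-squared={r2_poly:.3f}")
```

Because a straight line is a special case of a higher-degree polynomial, the polynomial fit cannot score worse than the line on the data it was fitted to. The size of the improvement is what indicates whether the relationship is really non-linear.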

Multiple Linear Regression (MLR) vs. Polynomial Fit



  • MSE: The MSE for the MLR (1.2 x 10^7) is smaller than the MSE for the Polynomial Fit (2.05 x 10^7).


  • R-squared: The R-squared for the MLR (~0.809) is also much larger than for the Polynomial Fit (~0.674).

Conclusion

Comparing these three models, we conclude that the MLR model is the best of the three for predicting price from our dataset. This result makes sense: the dataset has 27 variables in total, and more than one of them is a potential predictor of the final car price.