Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/praadnya/car_resale_prediction
Car Price Analysis and Prediction using XGBoost involves preprocessing, exploratory data analysis, and model building to predict car prices based on various features, with hyperparameter tuning and model evaluation using metrics like MAE, MSE, RMSE, and R-squared.
https://github.com/praadnya/car_resale_prediction
evaluation-metrics visualization xgboost
Last synced: 19 days ago
JSON representation
Car Price Analysis and Prediction using XGBoost involves preprocessing, exploratory data analysis, and model building to predict car prices based on various features, with hyperparameter tuning and model evaluation using metrics like MAE, MSE, RMSE, and R-squared.
- Host: GitHub
- URL: https://github.com/praadnya/car_resale_prediction
- Owner: Praadnya
- Created: 2024-08-20T05:04:30.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-20T05:23:33.000Z (5 months ago)
- Last Synced: 2024-11-10T09:09:30.451Z (3 months ago)
- Topics: evaluation-metrics, visualization, xgboost
- Language: Jupyter Notebook
- Homepage:
- Size: 384 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Car Price Analysis and Prediction Using XGBoost
## Overview
This project aims to analyze and predict car prices using various machine learning techniques. The dataset contains information about cars, including their maker, model year, distance covered, city, fuel type, and more. The goal is to train a model that can accurately predict the price of a car based on these features.
## Table of Contents
- [Project Description](#project-description)
- [Dataset](#dataset)
- [Data Preprocessing](#data-preprocessing)
- [Exploratory Data Analysis](#exploratory-data-analysis)
- [Model Building](#model-building)
- [Hyperparameter Tuning](#hyperparameter-tuning)
- [Model Evaluation](#model-evaluation)
- [Visualization](#visualization)
- [Requirements](#requirements)## Project Description
The project involves the following steps:
1. **Data Preprocessing:** Clean the dataset by handling missing values, converting categorical variables to numerical values, and formatting the data.
2. **Exploratory Data Analysis (EDA):** Perform various visualizations to understand the data distribution and relationships between different variables.
3. **Model Building:** Use the XGBoost Regressor to predict car prices. The model is trained on 80% of the dataset, with 20% reserved for testing.
4. **Hyperparameter Tuning:** Utilize GridSearchCV to find the best hyperparameters for the XGBoost model.
5. **Model Evaluation:** Assess the performance of the model using metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R²).## Dataset
The dataset `Cars.csv` contains the following columns:
- `maker`: The car manufacturer.
- `model_name`: The model of the car.
- `model_year`: The year the car model was released.
- `distance_covered (km)`: The total distance the car has covered.
- `city`: The city where the car is being sold.
- `fuel_type`: The type of fuel the car uses (e.g., Petrol, Diesel).
- `pre_owner`: The number of previous owners.
- `price (₹)`: The price of the car in Indian Rupees.## Data Preprocessing
Data preprocessing steps include:
1. Handling missing values by checking for null values.
2. Converting the `model_year` column to datetime format.
3. Cleaning the `price (₹)` column by removing the currency symbol and converting the values to float.
4. Label encoding the categorical variables for model training.## Exploratory Data Analysis
The following analyses and visualizations were performed:
- **Count of Cars by Model Year:** A bar chart showing the number of cars for each model year.
- **Count of Cars by Maker:** A bar chart displaying the number of cars from each manufacturer.
- **Percentage Share of Car Makers:** A pie chart representing the market share of different car makers.
- **Top 10 Car Models by Count:** A bar chart showing the most common car models in the dataset.
- **Distribution of Distance Covered:** A histogram displaying the distribution of the distance covered by cars.
- **Box Plot of Distance Covered:** A box plot showing the spread of distances covered by cars.
- **Count of Cars by Fuel Type:** A bar chart displaying the count of cars by fuel type.
- **Count of Cars by Number of Previous Owners:** A bar chart showing the number of cars by the number of previous owners.
- **Distribution of Car Prices:** A histogram showing the distribution of car prices.## Model Building
The XGBoost Regressor was used to predict the car prices. The model was trained on 80% of the data, and the remaining 20% was used for testing.
## Hyperparameter Tuning
GridSearchCV was employed to fine-tune the hyperparameters of the XGBoost model. The parameters tuned include:
- `n_estimators`
- `learning_rate`
- `max_depth`
- `subsample`
- `colsample_bytree`## Model Evaluation
The performance of the model was evaluated using the following metrics:
- **Mean Absolute Error (MAE):** Measures the average magnitude of the errors in a set of predictions.
- **Mean Squared Error (MSE):** Measures the average of the squares of the errors.
- **Root Mean Squared Error (RMSE):** The square root of the average of squared differences between prediction and actual observation.
- **R-squared (R²):** Indicates the proportion of the variance in the dependent variable that is predictable from the independent variables.## Visualization
The project also includes a visualization that compares the actual vs. predicted car prices using a line plot.
## Requirements
The following Python libraries are required to run the project:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
- xgboost