https://github.com/yuvrajsaraogi/car-price-prediction-with-machine-learning

A car's price depends on many factors, such as the brand's reputation, the car's features, its horsepower, and its mileage. Car price prediction is a popular applied problem in machine learning. If you want to learn how to train a car price prediction model, this project is for you.

car-price-prediction-with-machine-learning data data-science deep-learning deep-neural-networks engineer github learning machine-learning mini-project natural-language-processing prediction predictive-modeling project python3 sql


# 🚗 Car Price Prediction with Machine Learning

![Python](https://img.shields.io/badge/Python-3.8-blue?style=for-the-badge&logo=python)
![Machine Learning](https://img.shields.io/badge/Machine%20Learning-Scikit--Learn-orange?style=for-the-badge&logo=scikitlearn)
![Jupyter Notebook](https://img.shields.io/badge/Notebook-Jupyter-informational?style=for-the-badge&logo=jupyter)

## 📌 Project Overview
This project aims to **predict the selling price of used cars** from features such as the car's **age, kilometers driven, fuel type, transmission, and number of previous owners**. Accurate **Machine Learning models** can help car buyers and sellers make informed pricing decisions.

🚀 **Key Features:**
✔️ Data Preprocessing (Handling categorical & numerical data)
✔️ Exploratory Data Analysis (EDA)
✔️ Feature Engineering & Selection
✔️ Model Training & Evaluation

---

## 📂 Dataset Overview
The dataset contains **301 entries** with the following **9 columns** (8 input features plus the target, `Selling_Price`):

| Feature | Description |
|---------|------------|
| `Car_Name` | Name of the car (string) |
| `Year` | Manufacturing year (integer) |
| `Selling_Price` | Price at which the car is being sold (Target variable) |
| `Present_Price` | Price of the car when it was new |
| `Driven_kms` | Kilometers driven |
| `Fuel_Type` | Type of fuel (Petrol, Diesel, CNG) |
| `Selling_type` | Seller type (Dealer or Individual) |
| `Transmission` | Manual or Automatic |
| `Owner` | Number of previous owners |
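
The overview mentions the car's age as a feature, while the dataset stores the manufacturing `Year`. A minimal sketch of deriving an age column is shown below. The `Car_Age` name, the `REFERENCE_YEAR` value, and the toy rows are assumptions made for illustration and are not taken from the project's notebook.

```python
import pandas as pd

# Toy rows that reuse column names from the table above; the values are made up.
df = pd.DataFrame({
    "Car_Name": ["car_a", "car_b", "car_c"],
    "Year": [2014, 2013, 2017],
    "Driven_kms": [27000, 43000, 6900],
})

# Assumed reference year: ideally the year in which the data was collected.
REFERENCE_YEAR = 2020

# Age in years is often easier for a model to use than the raw year.
df["Car_Age"] = REFERENCE_YEAR - df["Year"]

print(df[["Year", "Car_Age"]])
```

Whether to keep `Year`, `Car_Age`, or both is a modeling choice; the two carry the same information, so keeping only one avoids a redundant column.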

📌 **Insights from EDA:**
✅ Selling price is **right-skewed** (most cars are lower-priced).
✅ **Present Price** has the highest correlation with **Selling Price**.
✅ **Fuel Type:** Petrol cars dominate, followed by Diesel.
✅ **Transmission Type:** Manual cars are more common than automatic.
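
Checks like the ones above can be reproduced with a few pandas calls. The sketch below runs on synthetic data generated only for illustration (it is not the project's dataset), so the numbers it prints say nothing about the real cars. It only shows which calls measure skewness, correlation with the target, and category counts.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; the column names follow the dataset table, the values are random.
rng = np.random.default_rng(0)
n = 300
present = rng.lognormal(mean=1.5, sigma=0.8, size=n)
df = pd.DataFrame({
    "Present_Price": present,
    "Driven_kms": rng.integers(500, 150_000, size=n),
    "Selling_Price": present * rng.uniform(0.4, 0.8, size=n),
    "Fuel_Type": rng.choice(["Petrol", "Diesel", "CNG"], size=n, p=[0.75, 0.23, 0.02]),
})

# A positive skewness value indicates a right-skewed distribution.
skewness = df["Selling_Price"].skew()

# Pearson correlation of every numeric column with the target, strongest first.
correlations = (
    df.select_dtypes("number")
    .corr()["Selling_Price"]
    .drop("Selling_Price")
    .sort_values(ascending=False)
)

# Frequency of each fuel type.
fuel_counts = df["Fuel_Type"].value_counts()

print(f"Skewness of Selling_Price: {skewness:.2f}")
print(correlations)
print(fuel_counts)
```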

---

## 🔧 Data Preprocessing
✔️ One-hot encoding for categorical features.
✔️ Feature scaling for numerical values.
✔️ Dropped irrelevant features like `Car_Name`.
✔️ Splitting dataset into **80% Training** and **20% Testing**.

```python
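# Splitting data into train and test sets
# X holds the input features and y holds the target (Selling_Price)
from sklearn.model_selection import train_test_split

# test_size=0.2 keeps 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)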
```
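
The preprocessing steps listed above can be combined in several ways. The sketch below is one possible arrangement using scikit-learn's `ColumnTransformer`, and it is not necessarily how the notebook does it. The toy rows are made up, and the `numeric_cols` and `categorical_cols` groupings are assumptions based on the dataset table. Fitting the transformer on the training split only keeps test-set statistics out of the scaler.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows with the same columns as the dataset table.
df = pd.DataFrame({
    "Car_Name": [f"car_{i}" for i in range(10)],
    "Year": [2014, 2013, 2017, 2011, 2018, 2015, 2016, 2012, 2010, 2019],
    "Selling_Price": [3.4, 4.8, 7.3, 2.9, 4.6, 9.3, 6.8, 1.1, 0.5, 8.0],
    "Present_Price": [5.6, 9.5, 9.9, 4.2, 6.9, 9.8, 8.1, 1.5, 0.8, 9.4],
    "Driven_kms": [27000, 43000, 6900, 5200, 42450, 2071, 18796, 33429, 60000, 3000],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "Petrol", "Diesel",
                  "Diesel", "Petrol", "Petrol", "CNG", "Petrol"],
    "Selling_type": ["Dealer", "Dealer", "Dealer", "Individual", "Dealer",
                     "Dealer", "Dealer", "Individual", "Individual", "Dealer"],
    "Transmission": ["Manual", "Manual", "Manual", "Manual", "Automatic",
                     "Manual", "Manual", "Manual", "Manual", "Automatic"],
    "Owner": [0, 0, 0, 1, 0, 0, 0, 1, 3, 0],
})

# Drop the identifier column and separate the target from the inputs.
X = df.drop(columns=["Car_Name", "Selling_Price"])
y = df["Selling_Price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

numeric_cols = ["Year", "Present_Price", "Driven_kms", "Owner"]
categorical_cols = ["Fuel_Type", "Selling_type", "Transmission"]

# Scale numeric columns and one-hot encode categorical ones.
# sparse_threshold=0 forces a dense array; handle_unknown="ignore" tolerates
# categories that appear only in the test split.
preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,
)

# Fit on the training split only, then apply the same transform to the test split.
X_train_prepared = preprocessor.fit_transform(X_train)
X_test_prepared = preprocessor.transform(X_test)

print(X_train_prepared.shape, X_test_prepared.shape)
```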

---

## 🤖 Model Training
We experimented with different models:
✅ **Linear Regression**
✅ **Random Forest Regressor**
✅ **Decision Tree**
✅ **XGBoost**

📊 **Performance Metrics Used:**
- **R² Score** (proportion of the variance in selling price that the model explains)
- **Mean Absolute Error (MAE)** (average absolute difference between predicted and actual price)
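
A comparison of this kind can be sketched as a loop over a dictionary of models, scoring each on the held-out split with `r2_score` and `mean_absolute_error`. The data below is synthetic and generated only for illustration, so the printed scores are unrelated to the results reported in this README. XGBoost is left out to keep the example dependency-free; assuming the `xgboost` package is installed, an `xgboost.XGBRegressor()` entry could be added to the dictionary in the same way.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: price decays with age and scales with the present price.
rng = np.random.default_rng(42)
n = 300
age = rng.integers(1, 15, size=n)
present_price = rng.uniform(3, 20, size=n)
X = np.column_stack([age, present_price])
y = present_price * 0.9 ** age + rng.normal(0, 0.3, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    # Train on the training split, evaluate on the untouched test split.
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, predictions),
        "mae": mean_absolute_error(y_test, predictions),
    }
    print(f"{name}: R2={results[name]['r2']:.3f}, MAE={results[name]['mae']:.3f}")
```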

---

## 📈 Results & Findings
| Model | R² Score (Test) | MAE (Test) |
|--------|-------------|-------------|
| Linear Regression | 0.86 | 1.2 Lakhs |
| Random Forest | 0.92 | 0.9 Lakhs |
| Decision Tree | 0.88 | 1.1 Lakhs |
| XGBoost | 0.94 | 0.8 Lakhs |

📌 **Best Model:** **XGBoost**, with a test **R² score of 0.94** and the lowest MAE (0.8 Lakhs) 🎯

---

## 🚀 How to Run the Project
### 1️⃣ Install Dependencies
```bash
pip install pandas numpy matplotlib seaborn scikit-learn xgboost
```

### 2️⃣ Run Jupyter Notebook
```bash
jupyter notebook
```
Open `Car Price Prediction with Machine Learning.ipynb` and run all cells.

---

## 📌 Future Improvements
🔹 Improve feature selection & engineering.
🔹 Try Deep Learning models.
🔹 Build a web app using **Flask / Streamlit** for real-time predictions.

---

## 💡 Conclusion
This project predicts used car prices with a good fit on the held-out test set using machine learning techniques. The **XGBoost model** gave the best results, with a test **R² score of 0.94**.

---

## 🤝 Connect With Me
💻 [GitHub](https://github.com/yuvrajsaraogi) | 🌐 [LinkedIn](https://linkedin.com/in/yuvraj-saraogi) | ✉️ [Email]([email protected])