https://github.com/ankit21111/carpredict
This project predicts car prices using machine learning models, including Simple and Multiple Linear Regression. It covers data acquisition, feature selection, and refinement techniques such as Ridge Regression. Multiple Linear Regression was the strongest base model (R² of 0.81), and a Ridge model tuned with grid search reached an R² score of 0.84. Check out the full analysis in the repository!
- Host: GitHub
- URL: https://github.com/ankit21111/carpredict
- Owner: ANKIT21111
- Created: 2024-11-17T14:21:13.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-18T14:49:42.000Z (over 1 year ago)
- Last Synced: 2025-03-13T18:12:35.542Z (over 1 year ago)
- Topics: data-analysis, data-visualization, matplotlib, numpy, pandas, pyhton, scipy, seaborn, sklearn
- Language: Jupyter Notebook
- Homepage:
- Size: 894 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 🚗 CarPredict: Advanced Car Price Estimation
An end-to-end machine learning project that predicts automobile prices from a dataset of car characteristics (`auto.csv`). It walks through the full data science lifecycle, from raw data acquisition to a tuned regression model.
---
## 🎯 Project Objective
The goal is to develop an accurate estimation model for used car prices based on features like engine size, horsepower, curb weight, and fuel efficiency. This assists both buyers and sellers in determining fair market value.
**Key Achievement:** Developed a refined Ridge regression model achieving an **R² score of 0.84**, up from 0.64 for the simple linear regression baseline.
---
## 🛠️ Tech Stack & Skills
- **Data Manipulation**: `Pandas`, `NumPy`
- **Visualization**: `Matplotlib`, `Seaborn` (Regression plots, box plots, heatmaps)
- **Statistical Analysis**: `SciPy` (Pearson correlation, ANOVA)
- **Machine Learning**: `Scikit-Learn` (Linear Regression, Polynomial Regression, Ridge Regression, Grid Search)
- **Preprocessing**: `StandardScaler`, `LabelEncoder`, `One-Hot Encoding`
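The two categorical encoders listed above behave quite differently. The minimal sketch below contrasts them on a tiny hypothetical frame; the column names `fuel-type` and `aspiration` and their values are illustrative assumptions, not necessarily the notebook's exact columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical columns, for illustration only
df = pd.DataFrame({
    "fuel-type": ["gas", "diesel", "gas"],
    "aspiration": ["std", "turbo", "std"],
})

# One-hot encoding: one 0/1 indicator column per category
fuel_dummies = pd.get_dummies(df["fuel-type"], prefix="fuel", dtype=int)

# Label encoding: each category becomes an integer (classes sorted alphabetically)
df["aspiration-code"] = LabelEncoder().fit_transform(df["aspiration"])

df = pd.concat([df, fuel_dummies], axis=1)
print(df)
```

Label encoding imposes an arbitrary numeric order on the categories, so for nominal features fed to a linear model, one-hot encoding is usually the safer choice.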
---
## 🚀 Project Workflow
### 1. Data Wrangling & Cleaning
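- Handled missing values (encoded as '?') by substituting the column mean for numerical features and the most frequent value for categorical features.
- Converted columns to their correct data types (e.g., numerical columns that were read in as text).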
- Normalized quantitative features to ensure uniform scaling.
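The cleaning steps above can be sketched with pandas as follows. This is a minimal illustration on a hypothetical four-row frame, not the notebook's code: the column names are assumptions, and dividing by the column maximum is just one common way to normalize (the notebook may use a different scheme).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few columns of auto.csv; '?' marks missing entries
df = pd.DataFrame({
    "horsepower": ["111", "?", "154", "102"],
    "num-of-doors": ["two", "four", "?", "four"],
})

# 1. Turn the '?' placeholder into a real missing value
df = df.replace("?", np.nan)

# 2. Numerical feature: fix the data type, then fill gaps with the column mean
df["horsepower"] = df["horsepower"].astype(float)
df["horsepower"] = df["horsepower"].fillna(df["horsepower"].mean())

# 3. Categorical feature: fill gaps with the most frequent value (the mode)
df["num-of-doors"] = df["num-of-doors"].fillna(df["num-of-doors"].mode()[0])

# 4. Simple feature scaling: divide by the column maximum so values fall in (0, 1]
df["horsepower"] = df["horsepower"] / df["horsepower"].max()
print(df)
```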
### 2. Exploratory Data Analysis (EDA)
- Analyzed **Continuous Variables** using regression plots to visualize linear relationships with price.
- Analyzed **Categorical Variables** using box plots to evaluate their predictive power.
- Conducted **Pearson Correlation** and **ANOVA** to identify critical features.
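Both tests are available in `scipy.stats`. The sketch below assumes hypothetical `engine-size`, `drive-wheels` and `price` columns filled with made-up values; it only illustrates how the statistics are computed, not the notebook's actual results.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data, for illustration only
df = pd.DataFrame({
    "engine-size": [130, 152, 109, 136, 181, 97, 122, 164],
    "drive-wheels": ["rwd", "rwd", "fwd", "fwd", "rwd", "fwd", "fwd", "rwd"],
    "price": [13495, 16500, 13950, 17450, 23875, 7295, 8845, 20970],
})

# Pearson correlation: strength of the linear relationship, plus a p-value
r, p = stats.pearsonr(df["engine-size"], df["price"])
print(f"engine-size vs price: r = {r:.3f}, p = {p:.4f}")

# One-way ANOVA: does mean price differ across the drive-wheels groups?
groups = [g["price"].to_numpy() for _, g in df.groupby("drive-wheels")]
f_stat, p_anova = stats.f_oneway(*groups)
print(f"drive-wheels ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}")
```

A correlation close to ±1 with a small p-value marks a continuous variable as a strong linear predictor; a large F statistic with a small p-value suggests a categorical variable separates prices well.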
### 3. Model Development
Iteratively built and evaluated multiple regression architectures:
- **Simple Linear Regression (SLR)**: Baseline model.
- **Multiple Linear Regression (MLR)**: Incorporated multiple influential features.
- **Polynomial Regression**: Captured non-linear trends.
- **Data Pipelines**: Streamlined scaling and transformations.
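The three model families above map directly onto scikit-learn estimators. This minimal sketch fits them on synthetic data; the three features (standing in for engine size, curb weight and horsepower), their ranges and the price formula are all hypothetical assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-ins for engine size, curb weight and horsepower (hypothetical data)
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(60, 330, n),     # engine size
    rng.uniform(1500, 4100, n),  # curb weight
    rng.uniform(48, 290, n),     # horsepower
])
price = 40 * X[:, 0] + 5 * X[:, 1] + 60 * X[:, 2] + rng.normal(0, 2000, n)

models = {
    # SLR: a single predictor (engine size only)
    "SLR": (LinearRegression(), X[:, [0]]),
    # MLR: all three predictors
    "MLR": (LinearRegression(), X),
    # Polynomial regression as a pipeline: scale -> degree-2 terms -> linear fit
    "Poly": (Pipeline([
        ("scale", StandardScaler()),
        ("poly", PolynomialFeatures(degree=2, include_bias=False)),
        ("model", LinearRegression()),
    ]), X),
}

scores = {}
for name, (model, features) in models.items():
    model.fit(features, price)
    pred = model.predict(features)
    scores[name] = r2_score(price, pred)
    print(f"{name}: R2 = {scores[name]:.3f}, MSE = {mean_squared_error(price, pred):.3g}")
```

These scores are computed on the training data, so the more flexible polynomial model can never score lower than MLR here; judging whether that extra flexibility generalizes requires held-out data or cross-validation.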
### 4. Model Evaluation & Refinement
- Utilized **Cross-Validation** to ensure model generalizability.
- Performed **Ridge Regression** to mitigate overfitting.
- Optimized hyperparameters using **Grid Search** (tuning `alpha` for Ridge).
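A minimal sketch of this refinement loop with scikit-learn, on synthetic data. The feature set, the degree-2 polynomial terms fed to Ridge, the 4-fold split and the `alpha` grid are assumptions chosen for illustration; the notebook's actual settings may differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical synthetic data: three scaled features and a noisy price
rng = np.random.default_rng(1)
n = 200
X = rng.uniform(0, 1, size=(n, 3))
y = 10000 + 8000 * X[:, 0] + 5000 * X[:, 1] + 6000 * X[:, 2] + rng.normal(0, 1500, n)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Cross-validation: R² on each of 4 folds of the training set
cv_r2 = cross_val_score(LinearRegression(), X_train, y_train, cv=4, scoring="r2")
print(f"CV R2 per fold: {np.round(cv_r2, 3)}, mean: {cv_r2.mean():.3f}")

# Ridge on standardized degree-2 features; alpha sets the strength of the L2 penalty
pipe = Pipeline([
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("scale", StandardScaler()),
    ("ridge", Ridge()),
])

# Grid search over alpha, scored by cross-validated R²
grid = GridSearchCV(pipe, {"ridge__alpha": [0.001, 0.1, 1, 10, 100, 1000]}, cv=4, scoring="r2")
grid.fit(X_train, y_train)
print("best alpha:", grid.best_params_["ridge__alpha"])
print(f"test R2: {grid.best_estimator_.score(X_test, y_test):.3f}")
```

Scaling after the polynomial expansion keeps every term on a comparable scale, so the Ridge penalty shrinks all coefficients evenly instead of penalizing large-valued terms less.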
---
## 📊 Performance Summary
| Model Type | R² Score | Mean Squared Error (MSE) |
|----------------------------|----------|--------------------------|
| Simple Linear Regression   | 0.6418   | 2.25 × 10⁷               |
| Polynomial Regression      | 0.6754   | 2.04 × 10⁷               |
| Multiple Linear Regression | 0.8119   | 1.20 × 10⁷               |
| **Refined Ridge Model**    | **0.8400** | Not reported           |
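For reference, the two metrics in the table follow the standard definitions, where $y_i$ is the observed price of car $i$, $\hat{y}_i$ the predicted price, $\bar{y}$ the mean observed price, and $n$ the number of cars evaluated:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```

Read this way, an R² of 0.84 means the model accounts for roughly 84% of the variance in price on the data it was scored on, and the MLR's MSE of 1.20 × 10⁷ corresponds to a root mean squared error of about √(1.20 × 10⁷) ≈ 3,464 in the dataset's price units.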
---
## 📁 Repository Structure
```text
├── OLD CAR PRICE DATASET ANALYSIS.ipynb # Detailed Data Wrangling & EDA
├── MODEL DEVELOPMENT AND EVALUATION.ipynb # Model Building & Hyperparameter Tuning
├── auto.csv # Raw Dataset
├── subsimple/ # Compressed project files
└── README.md # Project Documentation
```
---
## 📥 Installation & Usage
1. Clone the repository:
```bash
git clone https://github.com/ANKIT21111/CarPredict.git
```
2. Install dependencies:
```bash
pip install pandas numpy matplotlib seaborn scikit-learn scipy jupyter
```
3. Run the Jupyter Notebooks:
```bash
jupyter notebook
```
---
## 💡 Key Insights
- **Engine Size**, **Curb Weight**, and **Horsepower** emerged as the strongest predictors of car price.
- Non-linear relationships were successfully captured using Polynomial transformations, but **Multiple Linear Regression** provided the best balance of complexity and accuracy.
- Tuning the Ridge regularization strength (`alpha`) with grid search was essential in reducing variance and reaching the final R² of 0.84.
---
## 👨‍💻 Author
**Ankit Abhishek**
Data Engineering Professional | Machine Learning Enthusiast
[LinkedIn](https://www.linkedin.com/in/ankitabhishekdataengineering/) | [Portfolio](https://ankitabhishek.com)
---
*If you find this project useful, feel free to ⭐ the repository!*