https://github.com/alam025/car-price-predictor-using-ml

Advanced machine learning system for automobile price prediction using Linear and Lasso regression with comprehensive data visualization

# 🚗 Car Price Prediction System

### 🎯 *Advanced Machine Learning System for Automobile Price Prediction*

---

## 📊 **Project Overview**

### 🚀 **Performance Metrics**
- **Linear Regression R² (training):** `0.87+`
- **Lasso Regression R² (training):** `0.85+`
- **Model Accuracy:** `High Precision`
- **Prediction Speed:** `Real-time`

### 🎯 **Key Statistics**
- **Algorithm Types:** `Linear & Lasso Regression`
- **Feature Engineering:** `Categorical Encoding`
- **Data Visualization:** `Matplotlib & Seaborn`
- **Model Comparison:** `Performance Analysis`

---

## ✨ **Key Features**

| 🤖 **Dual Algorithm Approach** | 📊 **Data Visualization** | 🔧 **Feature Engineering** |
|:---:|:---:|:---:|
| Linear & Lasso Regression models | Beautiful scatter plots & charts | Smart categorical data encoding |
| **📈 Performance Analysis** | **🎯 Price Prediction** | **🚀 Real-time Processing** |
| R² score comparison between models | Accurate automobile pricing | Optimized for fast predictions |

---

## 🔬 **Dataset Information**

```yaml
📁 Dataset Details:
├── 📊 Car Features: Multi-dimensional analysis
├── 🔢 Variables: Year, Fuel_Type, Seller_Type, Transmission, etc.
├── 🎯 Target: Selling_Price (Continuous variable)
├── 🧹 Data Quality: Clean dataset with no missing values
└── 📈 Encoding: Categorical variables converted to numerical
```
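
The properties listed above (no missing values, a numeric target, categorical columns that still need encoding) can be checked in a few lines. The sketch below is only an illustration: `summarize_dataset` is a hypothetical helper, and the sample rows are made up rather than taken from `car_data.csv`.

```python
import pandas as pd


def summarize_dataset(df: pd.DataFrame, target: str = "Selling_Price") -> dict:
    """Return basic quality checks for a car-price table (hypothetical helper)."""
    return {
        "rows": len(df),
        # Total number of missing cells across all columns.
        "missing_values": int(df.isnull().sum().sum()),
        # Non-numeric columns are the ones that need encoding before modeling.
        "categorical_columns": df.select_dtypes(exclude="number").columns.tolist(),
        "target_is_numeric": bool(pd.api.types.is_numeric_dtype(df[target])),
    }


# Made-up sample rows, used only to exercise the helper.
sample = pd.DataFrame({
    "Year": [2014, 2017, 2011],
    "Fuel_Type": ["Petrol", "Diesel", "CNG"],
    "Seller_Type": ["Dealer", "Individual", "Dealer"],
    "Transmission": ["Manual", "Automatic", "Manual"],
    "Selling_Price": [3.35, 7.25, 2.85],
})

summary = summarize_dataset(sample)
print(summary)
```

Running the same helper on the real dataset would be one way to confirm the "no missing values" claim before training.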

### 📈 **Model Performance Comparison**

| Algorithm | Training R² | Testing R² | Visualization | Best For |
|-----------|-------------|------------|---------------|----------|
| **Linear Regression** | 0.87+ | 0.85+ | `████████████████████` | General prediction |
| **Lasso Regression** | 0.85+ | 0.83+ | `███████████████████▌` | Feature selection |
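
A comparison like the one in the table can be produced by scoring both models on the same split. The following is a minimal sketch on synthetic data, so its numbers will not match the table; `compare_models` is a hypothetical helper, and the split settings mirror the `test_size=0.1, random_state=2` used in this README's usage example.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import train_test_split


def compare_models(X, y, test_size=0.1, random_state=2):
    """Fit both regressors on one split and collect train/test R² (hypothetical helper)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    candidates = {
        "Linear Regression": LinearRegression(),
        "Lasso Regression": Lasso(),  # default alpha=1.0
    }
    results = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        results[name] = {
            "train_r2": model.score(X_train, y_train),
            "test_r2": model.score(X_test, y_test),
        }
    return results


# Synthetic stand-in for the encoded car features (not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X @ np.array([4.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=300)

results = compare_models(X, y)
for name, scores in results.items():
    print(f"{name}: train R² = {scores['train_r2']:.3f}, test R² = {scores['test_r2']:.3f}")
```

On training data, ordinary least squares can never score below Lasso, because it minimizes exactly the squared error that R² measures. The testing column is therefore the more informative one.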

---

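## 🛠️ **Technology Stack**

- **Language:** Python
- **Data handling:** Pandas
- **Modeling:** Scikit-learn
- **Visualization:** Matplotlib, Seaborn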

---

## 📁 **Project Architecture**

```
🏗️ car-price-prediction/
│
├── 📄 README.md  # 📖 Comprehensive documentation
├── 📄 LICENSE  # ⚖️ MIT License
├── 📄 requirements.txt  # 📦 Python dependencies
├── 📄 .gitignore  # 🚫 Git ignore rules
├── 📄 CONTRIBUTING.md  # 🤝 Contribution guidelines
│
├── 📂 src/  # 💻 Source code
│   ├── 🐍 car_price_prediction.py  # 🎯 Main prediction script
│   └── 📂 utils/  # 🛠️ Utility functions
│       ├── 📄 __init__.py
│       ├── 🔧 data_preprocessing.py  # 📊 Data preprocessing
│       ├── 📈 model_training.py  # 🤖 Model training
│       └── 📊 visualization.py  # 📈 Data visualization
│
├── 📂 data/  # 💾 Dataset directory
│   ├── 📊 car_data.csv  # 🎯 Main dataset
│   └── 📂 processed/  # ✨ Processed datasets
│
├── 📂 notebooks/  # 📓 Jupyter notebooks
│   ├── 🔍 exploratory_analysis.ipynb  # 📊 Data exploration
│   ├── 📈 model_comparison.ipynb  # 🥇 Model comparison
│   └── 📊 data_visualization.ipynb  # 📈 Advanced visualizations
│
├── 📂 models/  # 🤖 Trained models
│   ├── 💾 linear_regression_model.pkl  # 🎯 Linear model
│   └── 💾 lasso_regression_model.pkl  # 🎯 Lasso model
│
├── 📂 tests/  # 🧪 Unit tests
│   ├── 📄 __init__.py
│   ├── 🧪 test_preprocessing.py  # ✅ Test preprocessing
│   ├── 🧪 test_models.py  # ✅ Test models
│   └── 🧪 test_visualization.py  # ✅ Test visualizations
│
├── 📂 plots/  # 📊 Generated visualizations
│   ├── 📈 training_predictions.png  # 🎯 Training results
│   ├── 📈 testing_predictions.png  # 🎯 Testing results
│   └── 📊 model_comparison.png  # 🥇 Performance comparison
│
└── 📂 docs/  # 📚 Documentation
    ├── 📖 CONTRIBUTING.md  # 🤝 Contribution guidelines
    ├── 📋 API.md  # 🔗 API documentation
    └── 📊 MODEL_PERFORMANCE.md  # 📈 Model analysis
```

---

## 🚀 **Quick Start**

### 🔧 **Installation**

```bash
# 📥 Clone the repository
git clone https://github.com/alam025/car-price-prediction.git
cd car-price-prediction

# 📦 Install dependencies
pip install -r requirements.txt

# 🚀 Run the price prediction system
python src/car_price_prediction.py
```

### 💻 **Usage Example**

```python
# 🎯 Car price prediction
import pandas as pd
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import train_test_split

# 📊 Load and preprocess data
car_data = pd.read_csv("data/car_data.csv")

# 🔧 Feature engineering - Encode categorical variables
car_data.replace({'Fuel_Type': {'Petrol': 0, 'Diesel': 1, 'CNG': 2}}, inplace=True)
car_data.replace({'Seller_Type': {'Dealer': 0, 'Individual': 1}}, inplace=True)
car_data.replace({'Transmission': {'Manual': 0, 'Automatic': 1}}, inplace=True)

# 🎯 Prepare features and target
X = car_data.drop(['Car_Name', 'Selling_Price'], axis=1)
y = car_data['Selling_Price']

# 🔄 Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=2)

# 🤖 Train models
linear_model = LinearRegression()
lasso_model = Lasso()

linear_model.fit(X_train, y_train)
lasso_model.fit(X_train, y_train)

# 🎯 Make predictions
linear_pred = linear_model.predict(X_test)
lasso_pred = lasso_model.predict(X_test)

print(f"Linear Regression R²: {linear_model.score(X_test, y_test):.3f}")
print(f"Lasso Regression R²: {lasso_model.score(X_test, y_test):.3f}")
```
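
The project tree lists `.pkl` files under `models/`, but the usage example stops after scoring. One plausible way to produce and reload such files is Python's standard `pickle` module, sketched below. `save_model` and `load_model` are hypothetical helpers, not functions from this repository, and the training data is made up.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LinearRegression


def save_model(model, path):
    """Serialize a fitted estimator to disk (hypothetical helper)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load an estimator saved by save_model. Only unpickle files you trust."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Made-up training data standing in for the encoded car features.
X = np.array([[2012.0, 0], [2014.0, 1], [2016.0, 0], [2018.0, 1]])
y = np.array([2.0, 4.5, 5.0, 8.0])
model = LinearRegression().fit(X, y)

# A temporary directory keeps the example from touching the real models/ folder.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "models" / "linear_regression_model.pkl"
    save_model(model, model_path)
    restored = load_model(model_path)

same_predictions = bool(np.allclose(model.predict(X), restored.predict(X)))
print("Restored model reproduces predictions:", same_predictions)
```

Pickled estimators are generally tied to the scikit-learn version that wrote them, so pinning that version in `requirements.txt` helps keep saved models loadable.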

---

## 🧮 **Algorithm Details**

### 🔬 **Machine Learning Pipeline**

```mermaid
graph TD
    A[📊 Load Car Dataset] --> B[🔍 Data Exploration]
    B --> C[🧹 Data Cleaning]
    C --> D[🔧 Feature Engineering]
    D --> E[📊 Categorical Encoding]
    E --> F[🔄 Train-Test Split]
    F --> G[🤖 Linear Regression]
    F --> H[🤖 Lasso Regression]
    G --> I[📈 Model Evaluation]
    H --> I
    I --> J[📊 Visualization]
    J --> K[🎯 Price Prediction]
```

### 🎯 **Technical Implementation**

| Component | Description | Implementation |
|-----------|-------------|----------------|
| **📊 Data Loading** | CSV file processing | `pd.read_csv()` |
| **🔍 Data Exploration** | Statistical analysis | `.info()`, `.describe()` |
| **🔧 Encoding** | Categorical to numerical | `.replace()` method |
| **🔄 Data Splitting** | Train-test separation | `train_test_split()` |
| **🤖 Linear Model** | Standard regression | `LinearRegression()` |
| **🤖 Lasso Model** | Regularized regression | `Lasso()` |
| **📊 Evaluation** | R² score analysis | `r2_score()` |
| **📈 Visualization** | Scatter plot analysis | `matplotlib.pyplot` |
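
For readers unfamiliar with the evaluation metric, R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sketch below computes it by hand and checks the result against scikit-learn's `r2_score`. `r2_manual` is a hypothetical helper and the numbers are illustrative, not results from this project.

```python
import numpy as np
from sklearn.metrics import r2_score


def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # variation around the mean
    return 1.0 - ss_res / ss_tot


# Illustrative prices and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]

manual = r2_manual(y_true, y_pred)
library = r2_score(y_true, y_pred)
print(f"manual R² = {manual:.3f}, sklearn R² = {library:.3f}")  # both 0.925
```

A value of 1.0 means perfect predictions, and 0.0 means the model does no better than always predicting the mean price.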

---

## 📊 **Feature Engineering**

### 🔧 **Categorical Variable Encoding**

| Feature | Original Values | Encoded Values | Encoding Type |
|---------|----------------|----------------|---------------|
| **Fuel_Type** | Petrol, Diesel, CNG | 0, 1, 2 | Label Encoding |
| **Seller_Type** | Dealer, Individual | 0, 1 | Binary Encoding |
| **Transmission** | Manual, Automatic | 0, 1 | Binary Encoding |
### 📈 **Model Performance Analysis**

```python
# 📊 Performance Comparison
models = {
    'Linear Regression': {
        'Training R²': 0.87,
        'Testing R²': 0.85,
        'Advantages': 'Simple, interpretable',
        'Best Use': 'General price prediction'
    },
    'Lasso Regression': {
        'Training R²': 0.85,
        'Testing R²': 0.83,
        'Advantages': 'Feature selection, regularization',
        'Best Use': 'Preventing overfitting'
    }
}
```
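
The "feature selection" advantage listed for Lasso comes from its L1 penalty, which can shrink the coefficients of uninformative features to exactly zero. The sketch below illustrates this on synthetic data where only the first of five features affects the target. The data and the `alpha=0.5` setting are assumptions chosen for the demonstration, not values from this project.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data: only feature 0 drives the target; features 1-4 are pure noise.
rng = np.random.default_rng(42)
n_samples = 1000
X = rng.normal(size=(n_samples, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=n_samples)

linear = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)  # alpha controls the strength of the L1 penalty

# Count coefficients that Lasso set to exactly zero.
n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))

print("Linear coefficients:", np.round(linear.coef_, 3))
print("Lasso coefficients: ", np.round(lasso.coef_, 3))
print("Coefficients zeroed by Lasso:", n_zero_lasso)
```

The same penalty also shrinks the informative coefficient toward zero. That is one reason Lasso's R² can sit slightly below plain linear regression, as in the comparison above.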

---

## 📈 **Data Visualizations**

### 🎨 **Generated Plots**

| Visualization | Purpose | Insights |
|---------------|---------|----------|
| **🔍 Actual vs Predicted (Training)** | Model performance on training data | Training accuracy assessment |
| **🎯 Actual vs Predicted (Testing)** | Model generalization ability | Testing accuracy evaluation |
| **📊 Residual Analysis** | Error distribution patterns | Model bias detection |
| **📈 Feature Importance** | Variable significance | Feature selection guidance |
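
The actual-vs-predicted and residual plots from the table could be drawn along the following lines. This is a sketch, not the repository's `visualization.py`: `plot_actual_vs_predicted` is a hypothetical function, the prices are made up, and the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line for interactive use

import matplotlib.pyplot as plt
import numpy as np


def plot_actual_vs_predicted(y_true, y_pred, title):
    """Draw an actual-vs-predicted scatter next to a residual plot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    fig, (ax_fit, ax_res) = plt.subplots(1, 2, figsize=(10, 4))

    # Left panel: points on the dashed diagonal are perfect predictions.
    ax_fit.scatter(y_true, y_pred, alpha=0.7)
    limits = [min(y_true.min(), y_pred.min()), max(y_true.max(), y_pred.max())]
    ax_fit.plot(limits, limits, linestyle="--", color="gray")
    ax_fit.set_xlabel("Actual price")
    ax_fit.set_ylabel("Predicted price")
    ax_fit.set_title(title)

    # Right panel: residuals should scatter evenly around zero if the model is unbiased.
    ax_res.scatter(y_pred, residuals, alpha=0.7)
    ax_res.axhline(0.0, linestyle="--", color="gray")
    ax_res.set_xlabel("Predicted price")
    ax_res.set_ylabel("Residual (actual - predicted)")
    ax_res.set_title("Residual analysis")

    fig.tight_layout()
    return fig


# Made-up prices for illustration.
fig = plot_actual_vs_predicted(
    [2.0, 3.5, 5.0, 7.5, 9.0], [2.4, 3.1, 5.2, 7.0, 9.6], "Actual vs Predicted (Testing)"
)
```

Calling `fig.savefig("plots/testing_predictions.png")` would write a file matching the name shown in the project tree, provided the `plots/` directory exists.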

---

## 🔮 **Future Enhancements**

| 🎯 **Planned Features** | 📅 **Timeline** | 🚀 **Priority** |
|:----------------------:|:---------------:|:---------------:|
| 🌲 **Random Forest Implementation** | Q2 2025 | 🔴 High |
| 🚀 **XGBoost Integration** | Q2 2025 | 🔴 High |
| 🧠 **Neural Network Models** | Q3 2025 | 🟡 Medium |
| 🔗 **REST API Development** | Q3 2025 | 🟡 Medium |
| 📱 **Web Interface** | Q4 2025 | 🟢 Low |
| 📊 **Advanced Visualizations** | Q4 2025 | 🟢 Low |
---

## 👨‍💻 **About the Developer**

### **💼 Modassir Alam**
*🎯 Machine Learning Engineer & Data Scientist*

*🚀 Passionate about creating innovative AI solutions for the automotive industry and price prediction systems. Specializes in regression analysis, feature engineering, and predictive modeling.*

[![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/alammodassir/)
[![GitHub](https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/alam025)
[![Email](https://img.shields.io/badge/Email-D14836?style=for-the-badge&logo=gmail&logoColor=white)](mailto:alammodassir025@gmail.com)

---

## 🤝 **Contributing**

### 🌟 **We Welcome Contributions!**

### 📋 **How to Contribute**

1. **🍴 Fork** the repository
2. **🌿 Create** a feature branch (`git checkout -b feature/AmazingFeature`)
3. **💾 Commit** your changes (`git commit -m 'Add some AmazingFeature'`)
4. **📤 Push** to the branch (`git push origin feature/AmazingFeature`)
5. **🔄 Open** a Pull Request

### 🎯 **Areas for Contribution**

- 🐛 **Bug fixes and improvements**
- ✨ **New algorithm implementations**
- 📚 **Documentation enhancements**
- 🧪 **Test coverage expansion**
- 📊 **Advanced visualizations**
- 🔧 **Feature engineering techniques**

---

## 📄 **License**

This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.

---

## 🙏 **Acknowledgments**

### 🎖️ **Special Thanks**

| 🏆 **Category** | 🎯 **Recognition** |
|:---------------:|:------------------:|
| 📊 **Dataset** | Automotive industry data providers |
| 🛠️ **Libraries** | Scikit-learn, Pandas, Matplotlib, Seaborn |
| 💡 **Inspiration** | Automotive pricing research and market analysis |
| 🌟 **Community** | Open source contributors and ML enthusiasts |

---

## 📈 **Project Statistics**

![GitHub stars](https://img.shields.io/github/stars/alam025/car-price-prediction?style=for-the-badge&logo=github)
![GitHub forks](https://img.shields.io/github/forks/alam025/car-price-prediction?style=for-the-badge&logo=github)
![GitHub issues](https://img.shields.io/github/issues/alam025/car-price-prediction?style=for-the-badge&logo=github)
![GitHub license](https://img.shields.io/github/license/alam025/car-price-prediction?style=for-the-badge)

### ⭐ **Star this repository if it helped you!** ⭐

**💖 Made with passion by [Modassir Alam](https://github.com/alam025) 💖**

---

*🚗 Ready to predict car prices with machine learning? Let's drive into the future! 🚗*