https://github.com/nahom32/regularization-assignment
This repository is for a machine learning assignment on regularization.
- Host: GitHub
- URL: https://github.com/nahom32/regularization-assignment
- Owner: Nahom32
- Created: 2025-01-01T07:58:56.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-01-18T05:50:20.000Z (5 months ago)
- Last Synced: 2025-01-18T06:25:45.825Z (5 months ago)
- Topics: classification, machine-learning, regression, regularization
- Language: Jupyter Notebook
- Homepage: https://decision-tree-regularizer.streamlit.app/
- Size: 1.79 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
README
# Regularization via Noise Injection in Classification
## **Introduction**
This project addresses the issue of overfitting in classification models by implementing a regularization technique called **noise injection**. Overfitting occurs when a model memorizes patterns and noise specific to the training data, leading to poor generalization on unseen data. Using the **Star dataset**, a Decision Tree classifier is deliberately allowed to overfit, and regularization is then applied by injecting Gaussian noise into the training data. The project compares the overfitted and regularized models using accuracy metrics and visualizations.
---
## **Objective**
The primary goal of this project is to:
1. Train a Decision Tree classifier to demonstrate overfitting.
2. Regularize the overfitted model using noise injection.
3. Compare the performance of the overfitted and regularized models using training and testing accuracy metrics.
4. Visualize the effects of regularization using decision boundaries, accuracy charts, and other evaluation metrics.

---
## **Dataset**
The dataset used in this project is the **Star dataset**, which contains features and classifications of various types of stars. The target variable for classification is **Star Category**, and irrelevant columns such as "Star Type," "Spectral Class," and "Star Color" were excluded from the feature set.
### Dataset Structure
- **Features:** Numerical columns representing properties of stars.
- **Target:** Star Category (e.g., Brown Dwarf, Red Dwarf, White Dwarf, Main Sequence, Supergiant, Hypergiant); a loading sketch follows this list.
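As a quick illustration of this structure, the snippet below loads the CSV and inspects the feature types and class counts. The file name `star_dataset.csv` and the exact column labels are assumptions and may differ from the actual dataset file.

```python
# Minimal sketch: load the Star dataset and inspect its structure.
# "star_dataset.csv" and the column label "Star Category" are placeholders;
# adjust them to match the actual file.
import pandas as pd

df = pd.read_csv("star_dataset.csv")

print(df.dtypes)                             # numerical features plus the categorical columns
print(df["Star Category"].value_counts())    # the six star categories listed above
```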
---

## **Methodology**
### 1. **Data Preprocessing:**
- Selected relevant features and excluded unnecessary columns.
- Split the data into training and testing sets (80% training, 20% testing); a sketch of this step follows.
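A minimal sketch of this preprocessing step, assuming the column names quoted above and a hypothetical CSV file named `star_dataset.csv`; the random seed and stratified split are illustrative choices, not taken from the repository.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("star_dataset.csv")                       # hypothetical file name
excluded = ["Star Type", "Spectral Class", "Star Color"]   # columns dropped per the README

X = df.drop(columns=excluded + ["Star Category"])          # numerical star properties
y = df["Star Category"]                                     # classification target

# 80% training / 20% testing split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```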
### 2. **Overfitted Model:**
- Trained a Decision Tree classifier with no constraints on depth (`max_depth=None`).
- Recorded training and testing accuracies to demonstrate overfitting (see the sketch below).
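A sketch of the unconstrained tree, continuing from the preprocessing snippet above; the random seed is an arbitrary choice.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# An unconstrained tree can grow until it memorizes the training data.
overfit_tree = DecisionTreeClassifier(max_depth=None, random_state=42)
overfit_tree.fit(X_train, y_train)

train_acc = accuracy_score(y_train, overfit_tree.predict(X_train))
test_acc = accuracy_score(y_test, overfit_tree.predict(X_test))
print(f"Overfitted tree  train: {train_acc:.3f}  test: {test_acc:.3f}")
```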
### 3. **Regularized Model:**
- Applied Gaussian noise to the training data (mean=0, standard deviation=0.1).
- Retrained the Decision Tree classifier on the noisy dataset.
- Evaluated the regularized model's performance on training and testing data; a sketch follows.
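A sketch of the noise-injection step, continuing from the snippets above. Whether training accuracy is measured on the noisy or the clean training set is a design choice; the noisy set is used here.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)                 # seed is an illustrative choice
# Gaussian noise with mean 0 and standard deviation 0.1, added to the training features only
X_train_noisy = X_train + rng.normal(loc=0.0, scale=0.1, size=X_train.shape)

regularized_tree = DecisionTreeClassifier(max_depth=None, random_state=42)
regularized_tree.fit(X_train_noisy, y_train)

train_acc_reg = accuracy_score(y_train, regularized_tree.predict(X_train_noisy))
test_acc_reg = accuracy_score(y_test, regularized_tree.predict(X_test))   # test set stays clean
print(f"Regularized tree  train: {train_acc_reg:.3f}  test: {test_acc_reg:.3f}")
```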
### 4. **Evaluation:**
- Compared the training and testing accuracies of the overfitted and regularized models.
- Visualized the performance using bar charts and decision boundaries (see the bar-chart sketch below).
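A sketch of the accuracy comparison chart, reusing the four accuracy values computed in the snippets above; the plot styling is illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ["Overfitted", "Regularized"]
train_scores = [train_acc, train_acc_reg]
test_scores = [test_acc, test_acc_reg]

x = np.arange(len(labels))
width = 0.35
plt.bar(x - width / 2, train_scores, width, label="Train accuracy")
plt.bar(x + width / 2, test_scores, width, label="Test accuracy")
plt.xticks(x, labels)
plt.ylabel("Accuracy")
plt.title("Overfitted vs. regularized Decision Tree")
plt.legend()
plt.show()
```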
---

## **Results**
### Key Observations:
1. **Overfitted Model:**
- Achieved near-perfect accuracy on the training data.
- Performed poorly on testing data, indicating a lack of generalization.
2. **Regularized Model:**
- Training accuracy decreased slightly due to noise injection.
- Testing accuracy improved significantly, showcasing better generalization.

### Visualizations:
- **Bar Charts:** Highlighted the reduced gap between training and testing accuracies after regularization.
- **Decision Boundaries:** Demonstrated smoother, more generalized boundaries in the regularized model compared to the overfitted model; a plotting sketch follows.
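For reference, a decision boundary can only be drawn over two features, so a plot like the one described above typically projects the data onto a feature pair. The sketch below trains a helper tree on the first two feature columns from the earlier snippets; the repository's app may use different features or a different projection.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier

feat_a, feat_b = X.columns[:2]                        # assumed feature pair
X2 = X_train[[feat_a, feat_b]].to_numpy()
tree2d = DecisionTreeClassifier(random_state=42).fit(X2, y_train)

# Evaluate the tree on a grid covering the two selected features.
xx, yy = np.meshgrid(
    np.linspace(X2[:, 0].min(), X2[:, 0].max(), 200),
    np.linspace(X2[:, 1].min(), X2[:, 1].max(), 200),
)
labels_grid = tree2d.predict(np.c_[xx.ravel(), yy.ravel()])

# Map class labels to integers so contourf can colour the regions.
label_to_int = {c: i for i, c in enumerate(tree2d.classes_)}
Z = np.vectorize(label_to_int.get)(labels_grid).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X2[:, 0], X2[:, 1], c=[label_to_int[c] for c in y_train],
            edgecolor="k", s=20)
plt.xlabel(feat_a)
plt.ylabel(feat_b)
plt.title("Decision boundary over a two-feature projection")
plt.show()
```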
---

## **Usage**
### Requirements:
- Python 3.x
- Libraries:
- `pandas`
- `numpy`
- `matplotlib`
- `seaborn`
- `sklearn`
- `streamlit`

### Running the Code:
1. Clone the repository.
2. Install the required libraries using:
```bash
pip install -r requirements.txt
```
3. Run the main script:
```bash
streamlit run app.py
```
4. View the visualizations and accuracy results in the Streamlit app that opens in the browser.

---
## **Conclusion**
Noise injection is an effective regularization technique for reducing overfitting in high-capacity models like Decision Trees. By introducing small perturbations to the training data, the model is encouraged to generalize better and focus on broader trends. This project demonstrates the utility of noise injection in improving the generalization of an overfitted model while maintaining reasonable accuracy.