https://github.com/r-mahesh45/svm-classification-models-for-salary-data-and-forest-fire-size
This project uses SVM to classify salary categories and forest fire sizes. GridSearchCV is applied for hyperparameter tuning, achieving high accuracy on both datasets.
- Host: GitHub
- URL: https://github.com/r-mahesh45/svm-classification-models-for-salary-data-and-forest-fire-size
- Owner: R-Mahesh45
- Created: 2024-03-07T10:43:34.000Z
- Default Branch: main
- Last Pushed: 2025-01-04T13:14:10.000Z
- Last Synced: 2025-01-30T07:16:11.612Z
- Topics: classification, extract-transform-load, machine-learning-algorithms, python3, svm
- Language: Jupyter Notebook
- Homepage:
- Size: 2.32 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# **SVM Classification Models for Salary Data and Forest Fire Size**
This project demonstrates the application of Support Vector Machines (SVM) for two distinct classification problems:
1. Predicting salary categories (`<=50K` or `>50K`) based on demographic and work-related features.
2. Classifying forest fire size (`Small` or `Large`) based on meteorological and environmental factors.

## **Table of Contents**
1. [Overview](#overview)
2. [Datasets](#datasets)
3. [Modeling Approach](#modeling-approach)
4. [Results](#results)
5. [Prerequisites](#prerequisites)
6. [Installation](#installation)
7. [Usage](#usage)
8. [Project Structure](#project-structure)
9. [License](#license)

## **Overview**
This project applies SVM, a supervised learning algorithm that separates classes with a maximum-margin decision boundary, to two binary classification problems. GridSearchCV and RandomizedSearchCV are used to tune the SVM hyperparameters.

- **Salary Prediction Model**: Uses demographic and work-related features to classify individuals into salary categories.
- **Forest Fire Size Classification Model**: Classifies the burned area of a fire as `Small` or `Large` from meteorological conditions.

## **Datasets**
### Salary Dataset
- Features:
  - **age**: Age of the individual.
  - **workclass**: Type of work classification.
  - **education**: Educational level of the individual.
  - **maritalstatus**: Marital status of the individual.
  - Other features: `occupation`, `relationship`, `race`, `sex`, `capitalgain`, `capitalloss`, `hoursperweek`, `native`.
- Target: **Salary** (`<=50K` or `>50K`).

### Forest Fire Dataset
- Features:
  - **FFMC**, **DMC**, **DC**, **ISI**: Fire Weather Index (FWI) system indices (Fine Fuel Moisture Code, Duff Moisture Code, Drought Code, Initial Spread Index).
  - **temp**: Temperature in Celsius.
  - **RH**: Relative humidity (%).
  - Other features: `wind`, `rain`, `month`, `day`.
- Target: **Size_Categorie** (`Small` or `Large`).

## **Modeling Approach**
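The three steps listed below can be wired into one scikit-learn `Pipeline`, so that the scaler is refitted on the training folds only during the hyperparameter search. This is a sketch under stated assumptions, not the repository's code: the column names, the one-hot encoding of text columns (features such as `workclass` are strings, and `SVC` needs numeric input), the parameter grids, the helper name `build_search`, and the synthetic data are all hypothetical.

```python
# Sketch only: column names, encoding choice, parameter grids and the
# synthetic data are assumptions, not taken from the repository.
import numpy as np
import pandas as pd
from scipy.stats import loguniform
from sklearn.compose import ColumnTransformer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC


def build_search(numeric_cols, categorical_cols, randomized=False):
    """Return a grid or randomized search over a preprocessing + SVC pipeline."""
    preprocess = ColumnTransformer([
        # Standardize numeric columns so no feature dominates kernel distances.
        ("num", StandardScaler(), numeric_cols),
        # One-hot encode text columns, since SVC only accepts numeric input.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    pipe = Pipeline([("prep", preprocess), ("svc", SVC())])
    kernels = ["linear", "poly", "rbf"]
    if randomized:
        # Sample C and gamma log-uniformly: a fixed number of candidates.
        dists = {
            "svc__kernel": kernels,
            "svc__C": loguniform(1e-2, 1e2),
            "svc__gamma": loguniform(1e-3, 1e0),
        }
        return RandomizedSearchCV(pipe, dists, n_iter=10, cv=3, random_state=0)
    # Exhaustive search over every kernel/C combination (9 candidates).
    grid = {"svc__kernel": kernels, "svc__C": [0.1, 1, 10]}
    return GridSearchCV(pipe, grid, cv=3)


# Hypothetical stand-in for the salary data: two numeric and one text column.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "hoursperweek": rng.integers(10, 60, n),
    "workclass": rng.choice(["Private", "Self-emp", "Gov"], n),
})
y = np.where(X["age"] + X["hoursperweek"] > 80, ">50K", "<=50K")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
search = build_search(["age", "hoursperweek"], ["workclass"])
search.fit(X_train, y_train)

# Evaluate on the held-out split with the three metrics listed below.
pred = search.predict(X_test)
acc = accuracy_score(y_test, pred)
cm = confusion_matrix(y_test, pred)
print(search.best_params_)
print(acc)
print(cm)
print(classification_report(y_test, pred))
```

Passing `randomized=True` switches to `RandomizedSearchCV`, which evaluates 10 sampled candidates instead of every grid point. For the real datasets, the two column lists would hold the numeric and text feature names from the Datasets section.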
1. **Data Preprocessing**:
   - Standardization using `StandardScaler`.
   - Train-test split using `train_test_split`.
2. **SVM Classifier**:
   - Kernel Options: Linear, Polynomial, Radial Basis Function (RBF).
   - Hyperparameter Optimization:
     - **GridSearchCV** for exhaustive search.
     - **RandomizedSearchCV** for faster exploration.
3. **Evaluation Metrics**:
   - **Accuracy Score**.
   - **Confusion Matrix**.
   - **Classification Report**.

## **Results**
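Each accuracy figure in this section can be recomputed from its confusion matrix as the trace divided by the total count, and the same matrix yields per-class recall and precision. The helper `summarize` below is a hypothetical illustration; it assumes scikit-learn's layout (rows are true classes, columns are predicted classes), and since the results do not state which row belongs to which label, the class order is also an assumption.

```python
import numpy as np


def summarize(cm):
    """Return accuracy, per-class recall and per-class precision of a confusion matrix.

    Assumes scikit-learn's layout: rows are true classes, columns are predictions.
    """
    cm = np.asarray(cm, dtype=float)
    accuracy = np.trace(cm) / cm.sum()        # correct predictions / all predictions
    recall = np.diag(cm) / cm.sum(axis=1)     # per true class (row totals)
    precision = np.diag(cm) / cm.sum(axis=0)  # per predicted class (column totals)
    return accuracy, recall, precision


# Confusion matrices as reported under Results.
salary_cm = [[9726, 78], [2155, 947]]
fire_cm = [[25, 11], [2, 118]]

for name, cm in [("salary", salary_cm), ("forest fire", fire_cm)]:
    acc, rec, prec = summarize(cm)
    print(f"{name}: accuracy={acc:.4f} recall={rec.round(3)} precision={prec.round(3)}")
```

Both accuracies match the reported figures (10673/12906 ≈ 0.827 and 143/156 ≈ 0.9167). If the salary rows are ordered `<=50K`, `>50K` (the sorted label order scikit-learn would use), the second row gives a `>50K` recall of 947/3102 ≈ 0.31, so the headline accuracy is dominated by the majority `<=50K` class.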
- **Salary Prediction Model**:
  - Accuracy: **82.7%**.
  - Confusion Matrix:
    ```
    [[9726,   78],
     [2155,  947]]
    ```
- **Forest Fire Size Classification Model**:
  - Accuracy: **91.67%**.
  - Confusion Matrix:
    ```
    [[ 25,  11],
     [  2, 118]]
    ```

## **Prerequisites**
- Python 3.7 or above.
- Libraries:
  - `numpy`
  - `pandas`
  - `scikit-learn`

## **Installation**
1. Clone the repository:
   ```bash
   git clone https://github.com/R-Mahesh45/svm-classification-models-for-salary-data-and-forest-fire-size.git
   cd svm-classification-models-for-salary-data-and-forest-fire-size
   ```
2. Install the required libraries:
   ```bash
   pip install -r requirements.txt
   ```

## **Usage**
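For step 1 below, a minimal pandas sketch that loads one CSV file and separates the features from the target. The helper name `load_xy` is hypothetical; the file paths follow the Project Structure section and the target names follow the Datasets section (`Salary`, `Size_Categorie`), but whether those match the exact CSV headers is an assumption, so the helper fails with the available column names if the target is missing.

```python
from pathlib import Path

import pandas as pd


def load_xy(csv_path, target):
    """Read a CSV file and split it into features X and target y."""
    df = pd.read_csv(csv_path)
    if target not in df.columns:
        # Fail early and list the actual headers instead of a bare KeyError.
        raise KeyError(f"{target!r} not found in {Path(csv_path).name}: {list(df.columns)}")
    X = df.drop(columns=[target])
    y = df[target]
    return X, y
```

For example, `X, y = load_xy("data/salary_data.csv", target="Salary")`; the returned `X` and `y` can then be passed to `train_test_split`.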
1. Load the dataset (`data/salary_data.csv` or `data/forest_fire_data.csv`).
2. Run the Jupyter notebook or Python script for salary prediction or forest fire size classification.
3. Evaluate model performance using the accuracy score and confusion matrix.

## **Project Structure**
```plaintext
├── data/
│   ├── salary_data.csv
│   └── forest_fire_data.csv
├── notebooks/
│   ├── salary_classification.ipynb
│   └── forest_fire_classification.ipynb
├── scripts/
│   ├── salary_svm.py
│   └── forest_fire_svm.py
├── requirements.txt
└── README.md
```