https://github.com/kalana99/190530h_ml_mini_project
A Mini project to critically evaluate various classifier models along with feature engineering techniques.
classification-algorithm feature-engineering feature-extraction feature-selection hyperparameter-tuning kaggle-competition machine-learning pca
- Host: GitHub
- URL: https://github.com/kalana99/190530h_ml_mini_project
- Owner: Kalana99
- Created: 2023-09-23T17:12:52.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-10-16T11:24:08.000Z (about 1 year ago)
- Last Synced: 2023-10-17T03:23:19.270Z (about 1 year ago)
- Topics: classification-algorithm, feature-engineering, feature-extraction, feature-selection, hyperparameter-tuning, kaggle-competition, machine-learning, pca
- Language: Jupyter Notebook
- Homepage:
- Size: 36.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# 190530H_ML_Mini_Project
## Overview
Dataset: The features are derived from the AudioMNIST dataset. Further details about the dataset are available at the link provided in the repository.

This project has two phases:
1. Individual task - Classification model development and Kaggle competition for each assigned layer.
2. Group submission - 6-page research paper

The notebooks in this repository provide classification models for **Layer 09** and **Layer 10**.
## Data Pre-processing and Feature Engineering Steps
- Load data sets from the Google Drive.
- Identify labels with missing values.
- Drop the entries with missing values and scale the features for each label to handle outliers.
- Create dictionaries for each label using Train, Valid, and Test data sets.
- Apply feature selection using **SelectKBest** and **f_classif**.
- Conduct principal component analysis (PCA).

## Training the Final Model
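Before any model is trained, the feature-engineering steps listed earlier (scaling, **SelectKBest** with **f_classif**, then PCA) might look roughly like the sketch below. It is not taken from the notebooks: the synthetic data, the choice of `RobustScaler`, `k=20`, and the 95% retained-variance threshold are all assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in for one label's features (hypothetical shapes, not the project data).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 40))
y_train = rng.integers(0, 4, size=200)
X_valid = rng.normal(size=(50, 40))

# Assumed settings: the scaler, k, and the variance threshold are illustrative choices.
preprocess = Pipeline([
    ("scale", RobustScaler()),                            # median/IQR scaling, less sensitive to outliers
    ("select", SelectKBest(score_func=f_classif, k=20)),  # keep the 20 highest-scoring features
    ("pca", PCA(n_components=0.95, svd_solver="full")),   # keep components explaining 95% of variance
])

# Fit on the training split only, then reuse the fitted transforms on the validation split.
X_train_reduced = preprocess.fit_transform(X_train, y_train)
X_valid_reduced = preprocess.transform(X_valid)

print(X_train_reduced.shape, X_valid_reduced.shape)
```

Wrapping the steps in a single `Pipeline` keeps the validation and test data from influencing the fitted scaler, selector, or PCA projection.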
- **SVM**, **K-NN**, and **Random Forest** are considered possible candidates for the classifier model. Each of these models is tested with pre-processed data. Based on the results, SVM stands out as the most suitable model.
- **Hyper-parameter Tuning** is conducted using a **Random Grid Search** for an SVM instance. The resulting best estimator is used to predict the labels using the Test dataset.
- The result is written into a CSV file in the required format.

*The above operation sequence is carried out for each label separately by changing the label in the **“Assigning Label”** cell.*
```Python
# @title **Assigning Label**
# Select the label to model from the per-label dictionaries.
train_label = 'label_X'

x_train_df = x_train[train_label].copy()
y_train_df = y_train[train_label].copy()

x_valid_df = x_valid[train_label].copy()
y_valid_df = y_valid[train_label].copy()

x_test_df = x_test[train_label].copy()
```

Replace **X** with **1**, **2**, **3**, or **4**, depending on the label to be predicted.
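The candidate comparison and randomized hyper-parameter search described under *Training the Final Model* could be sketched as follows. This is a minimal illustration on synthetic data, not the notebooks' code: the dataset, the default model settings, and the parameter grid are assumptions, and scikit-learn's `RandomizedSearchCV` is used here as one plausible reading of "Random Grid Search".

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in data (hypothetical; the project uses AudioMNIST-derived features).
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Compare the three candidate classifiers on the validation split.
candidates = {
    "svm": SVC(),
    "knn": KNeighborsClassifier(),
    "random_forest": RandomForestClassifier(random_state=0),
}
valid_scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    valid_scores[name] = model.score(X_valid, y_valid)  # validation accuracy
print(valid_scores)

# Randomly sample 10 of the 32 combinations in an assumed SVM parameter grid.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": ["scale", 0.001, 0.01, 0.1],
    "kernel": ["rbf", "linear"],
}
search = RandomizedSearchCV(SVC(), param_grid, n_iter=10, cv=3, random_state=0)
search.fit(X_train, y_train)

# The refitted best estimator is what would then predict the held-out labels.
best_svm = search.best_estimator_
valid_predictions = best_svm.predict(X_valid)
print(search.best_params_, best_svm.score(X_valid, y_valid))
```

Writing the predictions to CSV is left out because the required submission format is not specified in this README.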