{"id":28669765,"url":"https://github.com/adeboyeml/multiclass-classification-takehome-assignment","last_synced_at":"2025-06-13T17:30:40.846Z","repository":{"id":294506476,"uuid":"251659120","full_name":"AdeboyeML/MultiClass-Classification-Takehome-Assignment","owner":"AdeboyeML","description":"This project was geared towards performing a multiclass classification task on an imbalanced  multiclass data with large amount of columns thats also needs to be reduce ","archived":false,"fork":false,"pushed_at":"2020-06-23T09:42:46.000Z","size":2297,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-20T19:11:37.651Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AdeboyeML.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-03-31T16:12:27.000Z","updated_at":"2020-06-23T09:42:48.000Z","dependencies_parsed_at":"2025-05-20T19:22:02.815Z","dependency_job_id":null,"html_url":"https://github.com/AdeboyeML/MultiClass-Classification-Takehome-Assignment","commit_stats":null,"previous_names":["adeboyeml/multiclass-classification-takehome-assignment"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AdeboyeML/MultiClass-Classification-Takehome-Assignment","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdeboyeML%2FMultiClass-Classification-Takehome-Assignment","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/AdeboyeML%2FMultiClass-Classification-Takehome-Assignment/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdeboyeML%2FMultiClass-Classification-Takehome-Assignment/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdeboyeML%2FMultiClass-Classification-Takehome-Assignment/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AdeboyeML","download_url":"https://codeload.github.com/AdeboyeML/MultiClass-Classification-Takehome-Assignment/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdeboyeML%2FMultiClass-Classification-Takehome-Assignment/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259688032,"owners_count":22896311,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-13T17:30:24.002Z","updated_at":"2025-06-13T17:30:40.837Z","avatar_url":"https://github.com/AdeboyeML.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MultiClass-Classification-Project\n\nThis assignment performs a multiclass classification task on an imbalanced multiclass dataset whose large number of columns also needs to be reduced.\n\n### Python libraries such as Seaborn, Matplotlib, Pandas, NumPy and scikit-learn were used throughout this project.\n\n## SUMMARY OF BUILDING A MULTICLASS CLASSIFIER\nThe aim of this task was to perform multiclass classification on a dataset that comprises both 
categorical and numeric columns and is also imbalanced.\n\nThe following steps were carried out to accomplish the task:\n\n•\t**Data Exploration** – statistical summaries of the training data were computed, and the class and data distributions were checked. \n\n•\t**Data pre-processing** – this prepared the training data for dimensionality reduction. The categorical columns were one-hot encoded into binary columns, and all independent features were then standardized to have a mean of zero and a standard deviation of one.\n\n•\t**Dimensionality Reduction** – Principal Component Analysis (PCA) was used to reduce the dimensionality of the training dataset while preserving as much of its original structure and relationships as possible, so that machine learning models can still learn from the reduced data and make accurate predictions.\n\n•\t**Creation of a Baseline Model** – this model served as the basis of comparison for the results of the optimized models.\n\n•\t**Implementation of Oversampling Method and Machine Learning Algorithm:**\n•\t**Approach 1 – Oversampling of the imbalanced dataset (Borderline-SMOTE) and Implementation of Random Forest Classifier (RDF)** – Borderline-SMOTE was used to oversample the minority class in the data; an exhaustive grid search was then run on RDF to find the best hyperparameters for this multiclass classification task.\n\n•\t**Approach 2 – Implementation of Cost Sensitive RDF** – an exhaustive grid search was run on a weighted RDF to find the best hyperparameters for this multiclass classification task, but the outcome was no different from the baseline model.\n\n•\t**Approach 3 – Oversampling of the imbalanced dataset (Borderline-SMOTE) and Implementation of Linear Discriminant Analysis (LDA)** – after an exhaustive grid search, the combination of these two gave better evaluation performance than the baseline on the training dataset.\n\n•\t**Prediction on Test Data** – 
The best model, Borderline-SMOTE + Random Forest, was used to make predictions on the test data.\n•\t**Save the Test Data in a CSV format** – The test data was saved to a CSV file.\n•\t***Evaluation metrics used were recall, precision, f1-score, accuracy and confusion matrix.***\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadeboyeml%2Fmulticlass-classification-takehome-assignment","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadeboyeml%2Fmulticlass-classification-takehome-assignment","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadeboyeml%2Fmulticlass-classification-takehome-assignment/lists"}