{"id":25450125,"url":"https://github.com/rajnandinithopte/machine-learning_knn-classification","last_synced_at":"2025-11-02T03:30:26.062Z","repository":{"id":275547292,"uuid":"926407541","full_name":"rajnandinithopte/Machine-Learning_KNN-Classification","owner":"rajnandinithopte","description":"KNN classification project for predicting abnormalities in a vertebral column dataset using various distance metrics, EDA, hyperparameter tuning, learning curves, and weighted voting to optimize performance.","archived":false,"fork":false,"pushed_at":"2025-02-03T20:59:42.000Z","size":1285,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-03T21:35:04.134Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rajnandinithopte.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-03T07:40:37.000Z","updated_at":"2025-02-03T20:59:45.000Z","dependencies_parsed_at":"2025-02-03T21:35:06.719Z","dependency_job_id":null,"html_url":"https://github.com/rajnandinithopte/Machine-Learning_KNN-Classification","commit_stats":null,"previous_names":["rajnandinithopte/machine-learning_knn-classification-on-vertebral-column-data","rajnandinithopte/machine-learning_knn-classification"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rajnandinithopte%2FMachine-Learning_KNN-Classification","tags_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/rajnandinithopte%2FMachine-Learning_KNN-Classification/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rajnandinithopte%2FMachine-Learning_KNN-Classification/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rajnandinithopte%2FMachine-Learning_KNN-Classification/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rajnandinithopte","download_url":"https://codeload.github.com/rajnandinithopte/Machine-Learning_KNN-Classification/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239366377,"owners_count":19626685,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-17T21:22:14.699Z","updated_at":"2025-11-02T03:30:26.030Z","avatar_url":"https://github.com/rajnandinithopte.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Machine Learning: KNN Classification on Vertebral Column Data\n\n## 🔷 Overview\nThis project applies **K-Nearest Neighbors (KNN) classification** to the **Vertebral Column Data Set**, a biomedical dataset that categorizes spinal conditions into **normal** and **abnormal** classes. 
The implementation involves **data preprocessing, exploratory analysis, model training, evaluation, and experimentation** with different distance metrics and voting methods.\n\n---\n\n## 🔷 Dataset Description\nThe **Vertebral Column Data Set** is obtained from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Vertebral+Column). It consists of **310 samples** with **6 biomechanical attributes** extracted from radiographic images of the spine.\n\n### 🔶 Features in the Dataset\n- Pelvic incidence\n- Pelvic tilt\n- Lumbar lordosis angle\n- Sacral slope\n- Pelvic radius\n- Grade of spondylolisthesis\n\nEach row represents a **patient's spinal measurements**, and the **target variable** classifies the patient as:\n- **Normal (0)**\n- **Abnormal (1)** (includes conditions like herniated discs and spondylolisthesis)\n\n---\n\n## 🔷 Libraries Used\nTo execute this project, the following Python libraries were used:\n\n- **pandas** → Data manipulation and preprocessing\n- **numpy** → Numerical computations\n- **matplotlib** → Data visualization\n- **seaborn** → Statistical data visualization\n- **scikit-learn** → Machine learning algorithms (KNN classifier, distance metrics, model evaluation)\n\n## 🔷 Steps Taken to Accomplish the Project\n\n### 🔶 1. Data Preprocessing and Exploratory Data Analysis (EDA)\n- Converted categorical class labels into binary labels (Normal=0, Abnormal=1).\n- Conducted scatterplot analysis to visualize relationships between independent variables.\n- Created boxplots to identify outliers and distribution differences across the two classes.\n- Split the dataset into training (first 70 rows of Class 0 and first 140 rows of Class 1) and test sets (remaining data).\n\n### 🔶 2. 
K-Nearest Neighbors (KNN) Classification\n- Implemented KNN with Euclidean distance using either a custom algorithm or scikit-learn’s implementation.\n- Evaluated test errors for different values of k within `{208, 205, ..., 7, 4, 1}`.\n- Selected the optimal k by plotting training and test errors vs. k.\n- Computed performance metrics for the optimal k:\n  - Confusion Matrix\n  - True Positive Rate (Recall)\n  - True Negative Rate\n  - Precision\n  - F1-Score\n\n### 🔶 3. Learning Curve Analysis\n- Investigated the effect of training size on KNN performance.\n- Trained the model using different training set sizes `N = {10, 20, 30, …, 210}`.\n- Selected the optimal k dynamically for each training size.\n- Plotted the learning curve (test error vs. training set size) to analyze model generalization.\n\n### 🔶 4. Experimentation with Different Distance Metrics\n- Replaced Euclidean distance with alternative distance measures:\n  - Minkowski Distance `(p = 1 → Manhattan, log₁₀(p) ∈ {0.1, 0.2, …, 1}, p → ∞ → Chebyshev)`\n  - Mahalanobis Distance (accounting for feature correlations)\n- Compared test errors across distance metrics and summarized results in a table.\n\n### 🔶 5. Weighted Voting in KNN\n- Implemented distance-weighted voting, where closer neighbors contribute more to the decision.\n- Compared performance with Euclidean, Manhattan, and Chebyshev distances.\n- Identified the best test error with `k ∈ {1, 6, 11, …, 196}`.\n\n### 🔶 6. 
Final Evaluation\n- Reported the lowest training error achieved across all experiments.\n- Summarized findings on the best k-value, distance metric, and voting method.\n\n---\n\n## 📌 **Note**\nThis repository contains a **Jupyter Notebook** detailing each step, along with **results and visualizations**.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frajnandinithopte%2Fmachine-learning_knn-classification","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frajnandinithopte%2Fmachine-learning_knn-classification","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frajnandinithopte%2Fmachine-learning_knn-classification/lists"}