{"id":26869802,"url":"https://github.com/yixin0829/multi-label-wine-quality-classification","last_synced_at":"2026-05-08T01:36:32.394Z","repository":{"id":104117110,"uuid":"295185854","full_name":"yixin0829/multi-label-wine-quality-classification","owner":"yixin0829","description":"Multi-label wine classification ML project trained using Kaggle wine quality dataset :bar_chart:","archived":false,"fork":false,"pushed_at":"2021-01-17T01:41:32.000Z","size":26119,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-31T15:28:32.763Z","etag":null,"topics":["analysis","classification-algorithm","data-science","exploratory-data-analysis","machine-learning","python","sklearn"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yixin0829.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-09-13T15:58:27.000Z","updated_at":"2021-03-01T21:27:09.000Z","dependencies_parsed_at":"2023-03-14T03:45:33.300Z","dependency_job_id":null,"html_url":"https://github.com/yixin0829/multi-label-wine-quality-classification","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/yixin0829/multi-label-wine-quality-classification","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yixin0829%2Fmulti-label-wine-quality-classification","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yixin0829%2Fmulti-label-
wine-quality-classification/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yixin0829%2Fmulti-label-wine-quality-classification/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yixin0829%2Fmulti-label-wine-quality-classification/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yixin0829","download_url":"https://codeload.github.com/yixin0829/multi-label-wine-quality-classification/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yixin0829%2Fmulti-label-wine-quality-classification/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32763514,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-07T02:14:30.463Z","status":"ssl_error","status_checked_at":"2026-05-07T02:14:29.405Z","response_time":62,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analysis","classification-algorithm","data-science","exploratory-data-analysis","machine-learning","python","sklearn"],"created_at":"2025-03-31T06:18:46.232Z","updated_at":"2026-05-08T01:36:32.389Z","avatar_url":"https://github.com/yixin0829.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# multi_label_wine_quality_classification:bar_chart:\n\nThis is a 
project where I practiced training several multi-label wine quality classifiers with the one-vs-all method.\n\nThe workflow includes EDA (exploratory data analysis and data visualization), data preprocessing (feature selection with a chi-square test, oversampling minority classes with synthetic data, and feature scaling), and training different classification models (logistic regression, linear support vector machine (SVM), kernel SVM, and K-NN).\n\n**Feel free to open the .ipynb notebook for the detailed analysis.**\n\n\n## EDA\n\nThe dataset is extremely skewed: minority classes (i.e. wine quality scores) such as '3' and '8' account for less than 1% of the total population. We can see this by plotting a histogram of the 'quality' column. \n![quality_count](https://user-images.githubusercontent.com/56566212/103471654-5d1b8200-4d48-11eb-819b-b04e8a6fd0be.png)\n\nA heatmap gives a clearer view of the correlations between features:\n![corr_heat](https://user-images.githubusercontent.com/56566212/103471877-91dd0880-4d4b-11eb-9e69-e867528b231e.png)\n\nThe plot below further visualizes the relationship between the features and wine quality. Notice that features like \"pH\", \"chlorides\", and \"residual sugar\" have almost no impact on classifying the quality of the wine.\n![feature_bar](https://user-images.githubusercontent.com/56566212/103471883-9dc8ca80-4d4b-11eb-9361-922268523d58.png)\n\n## Preprocessing\n* Feature selection using chi-square test\n* Drop irrelevant features\n* Split dataset\n* Apply SMOTE to oversample minority-class data by generating synthetic training samples using K-NN. Note that the testing data is not oversampled.\n* Feature scaling\n\n## Result\n\nBecause the dataset is skewed, the F1-score is used as the performance metric. After applying the synthetic minority oversampling technique (SMOTE), the K-NN model shows a notable increase in its weighted-average F1-score, from 0.52 to 0.67. The accuracy also rose from 51% to 65%. 
The other models (logistic regression, linear SVM, and kernel SVM) did not improve as expected.\n\n### Logistic Regression\n![log](https://user-images.githubusercontent.com/56566212/103495351-25701100-4e00-11eb-823c-f9929e8286f4.png)\n\n### Linear SVM \u0026 Kernel SVM\n![svm](https://user-images.githubusercontent.com/56566212/103495369-36b91d80-4e00-11eb-8617-d77d1cca36df.png)\n\n### K-NN (Rapid Prototype)\n![knn](https://user-images.githubusercontent.com/56566212/103495373-3caefe80-4e00-11eb-9a1e-d55a8a272745.png)\n\n### K-NN (Final)\n![knn2](https://user-images.githubusercontent.com/56566212/103495431-5ea88100-4e00-11eb-8b9f-71ecb084bcf2.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyixin0829%2Fmulti-label-wine-quality-classification","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyixin0829%2Fmulti-label-wine-quality-classification","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyixin0829%2Fmulti-label-wine-quality-classification/lists"}