{"id":25235693,"url":"https://github.com/kmock930/drug-consumption-machine-learning-analysis","last_synced_at":"2025-07-29T10:08:53.531Z","repository":{"id":267305378,"uuid":"872754831","full_name":"kmock930/Drug-Consumption-Machine-Learning-analysis","owner":"kmock930","description":"This project contains the code and reports for the course CSI5155 at the University of Ottawa (taught by Professor Herna Viktor).","archived":false,"fork":false,"pushed_at":"2024-12-09T15:02:54.000Z","size":109624,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-04-05T18:12:53.202Z","etag":null,"topics":["area-under-curve","bagging","boosting","decision-tree","ensemble-model","gradient-boosting","knn","machine-learning","ml-evaluation","ml-pipeline","mlp","random-forest","receiver-operating-characteristic","semi-supervised-learning","shap-analysis","supervised-learning","svm","unsupervised-learning","xai"],"latest_commit_sha":null,"homepage":"","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kmock930.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-15T02:41:54.000Z","updated_at":"2025-02-27T03:21:02.000Z","dependencies_parsed_at":"2024-12-09T16:36:52.612Z","dependency_job_id":null,"html_url":"https://github.com/kmock930/Drug-Consumption-Machine-Learning-analysis","commit_stats":null,"previous_names":["kmock930/drug-consumption-machine-learning-analysis"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/kmock930/Drug-Consumption-Machine-Learning-analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kmock930%2FDrug-Consumption-Machine-Learning-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kmock930%2FDrug-Consumption-Machine-Learning-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kmock930%2FDrug-Consumption-Machine-Learning-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kmock930%2FDrug-Consumption-Machine-Learning-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kmock930","download_url":"https://codeload.github.com/kmock930/Drug-Consumption-Machine-Learning-analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kmock930%2FDrug-Consumption-Machine-Learning-analysis/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267668687,"owners_count":24124966,"icon_
url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["area-under-curve","bagging","boosting","decision-tree","ensemble-model","gradient-boosting","knn","machine-learning","ml-evaluation","ml-pipeline","mlp","random-forest","receiver-operating-characteristic","semi-supervised-learning","shap-analysis","supervised-learning","svm","unsupervised-learning","xai"],"created_at":"2025-02-11T14:58:51.372Z","updated_at":"2025-07-29T10:08:53.480Z","avatar_url":"https://github.com/kmock930.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Problem Statement\n## Aims\n1. Convert the multi-class problems into binary classification tasks.\n2. Predict whether a person is a consumer of chocolate and, separately, of magic mushrooms.\n3. Choose the best and worst classifiers for each dataset.\n4. Explain the AI models scientifically, in a way that is convincing to non-technical audiences.\n5. 
Implement models using semi-supervised learning.\n## Preview\n**Comparing a pipeline of 6 classifiers on 2 datasets**\n![AUC comparison of the classifiers on the Chocolate dataset](AUC_diff_choc.png)\n![AUC comparison of the classifiers on the Mushrooms dataset](AUC_diff_mush.png)\n**Explainable AI**\n![Explainable AI results](XAI.png)\n**Semi-Supervised Learning**\n![Comparison of semi-supervised learning methods](semi-supervised-learning-comparison.png)\n# Dataset: Drug Consumption Analysis Dataset\nThe dataset is available from the UCI Machine Learning Repository: https://archive.ics.uci.edu/dataset/373/drug+consumption+quantified\n## Description of the Dataset\n- Contains a row identifier, 12 features describing each respondent, and 18 classification problems, one for each of 18 different drugs.\n- For each drug, the label indicates whether a person has 'never used', 'used over a decade ago', 'used in the last decade', 'used in the last year', 'used in the last month', 'used in the last week', or 'used in the last day'.\n# Implementation Details\n- Split the dataset and perform feature engineering.\n- Perform supervised learning using a pipeline of 6 classifiers.\n- Identify potential issues in the dataset and in the classifiers themselves.\n- Report evaluation results with plots and metrics.\n- Summarize the analysis in a report.\n- Assess whether certain classifiers make trustworthy predictions by calculating SHAP values and visualizing them.\n- Prepare labelled and unlabelled data, then implement and compare several semi-supervised learning algorithms built on the gradient boosting classifier from Assignment 1.\n# Project Structure\n- Reports in `.pdf` format are located at the root level.\n- The report for the project is inside the project folder: `Project - Semi Supervised Learning/KeycodeExplaination.pdf`.\n- Expand the folder at the root level to view the code.\n- The analysis is split across 9 notebooks:\n  1. Modelling - `CSI5155 Assignment 1 Modelling Part- Kelvin Mock 300453668.ipynb`\n  2. 
Evaluation - `CSI5155 Assignment 1 Evaluation Part - Kelvin Mock 300453668.ipynb`\n  3. Calculation of SHAP values - `CSI5155 Assignment 2 - Kelvin Mock 300453668.ipynb`\n  4. Visualization of SHAP values - `CSI5155 Assignment 2 Plots - Kelvin Mock 300453668.ipynb`\n  5. Baseline model (gradient boosting classifier) - `CSI5155 Project - baseline.ipynb`\n  6. Self-training applied to the baseline model - `CSI5155 Project - Self Training.ipynb`\n  7. Co-training applied to the baseline model - `CSI5155 Project - Co Training.ipynb`\n  8. Semi-Boost applied to the baseline model - `CSI5155 Project - Semi Boost.ipynb`\n  9. Label spreading applied to the baseline model - `CSI5155 Project - Label Spreading.ipynb`\n- Models are serialized to several `.pkl` files at different phases of the workflow to keep the code maintainable.\n- The training and test sets are also serialized to `.pkl` files.\n- The `choc` directory contains the serialized files for the Chocolate dataset (derived from the original dataset).\n- The `mushrooms` directory contains the serialized files for the Mushrooms dataset (also derived from the original dataset).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkmock930%2Fdrug-consumption-machine-learning-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkmock930%2Fdrug-consumption-machine-learning-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkmock930%2Fdrug-consumption-machine-learning-analysis/lists"}