{"id":28215700,"url":"https://github.com/ankhoa1212/cpsc-483-final-project","last_synced_at":"2026-04-29T01:03:24.043Z","repository":{"id":289814454,"uuid":"972446268","full_name":"ankhoa1212/CPSC-483-Final-Project","owner":"ankhoa1212","description":"This the final project for CPSC 483 (Introduction to Machine Learning). We developed a machine learning model for classifying medical images.","archived":false,"fork":false,"pushed_at":"2025-05-17T06:12:48.000Z","size":10725,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-10T18:49:20.485Z","etag":null,"topics":["autokeras","convolutional-neural-networks","exploratory-data-analysis","image-classification","machine-learning","mnist-dataset","python3","pytorch","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ankhoa1212.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-25T05:10:23.000Z","updated_at":"2025-05-17T06:12:51.000Z","dependencies_parsed_at":null,"dependency_job_id":"970cde82-66e2-4b96-b6aa-30c816e11106","html_url":"https://github.com/ankhoa1212/CPSC-483-Final-Project","commit_stats":null,"previous_names":["ankhoa1212/cpsc-483-final-project"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ankhoa1212/CPSC-483-Final-Project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankhoa1212%2FCPSC-483-Final-Project","tags_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/ankhoa1212%2FCPSC-483-Final-Project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankhoa1212%2FCPSC-483-Final-Project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankhoa1212%2FCPSC-483-Final-Project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ankhoa1212","download_url":"https://codeload.github.com/ankhoa1212/CPSC-483-Final-Project/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankhoa1212%2FCPSC-483-Final-Project/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32405904,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-28T19:38:08.556Z","status":"ssl_error","status_checked_at":"2026-04-28T19:37:55.688Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autokeras","convolutional-neural-networks","exploratory-data-analysis","image-classification","machine-learning","mnist-dataset","python3","pytorch","tensorflow"],"created_at":"2025-05-17T22:11:32.698Z","updated_at":"2026-04-29T01:03:24.027Z","avatar_url":"https://github.com/ankhoa1212.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CPSC-483-Final-Project\r\n## Problem Statement\r\nIn general, machine learning can be applied in a wide 
variety of fields, especially when it comes to enhancing human health in environments such as clinical settings, medical research, healthcare, etc. In these areas, it is especially important to be able to analyze large amounts of data quickly and accurately. As such, training a machine learning model to quickly garner insights from medical images can help reduce human error, streamline the process of analysis, and improve patient outcomes.\r\n\r\n## Table of Contents\r\n1. [Dataset](#medmnist-dataset)\r\n2. [Data Analysis](#data-analysis)\r\n3. [Convolutional Neural Network](#convolutional-neural-network)\r\n4. [Autokeras](#autokeras)\r\n5. [Evaluation](#evaluation)\r\n6. [Conclusion](#conclusion)\r\n7. [Resources](#resources)\r\n\r\n## MedMNIST Dataset\r\nThe [MedMNIST dataset](https://github.com/MedMNIST) was used because it is a standardized, diverse dataset that can be adapted for machine learning use cases. For this project, the BreastMNIST dataset was used because it has the fewest data points, which makes training much quicker.\r\n\r\n## Data Analysis\r\nTo better understand the dataset, exploratory data analysis was conducted. Data was visualized and inspected to confirm data quality.\r\n\r\n[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ankhoa1212/CPSC-483-Final-Project/blob/main/data_analysis_and_modeling.ipynb)\r\n\r\n## Convolutional Neural Network\r\nAs a proof of concept, a convolutional neural network was developed. 
This particular model was chosen since convolutional layers are typically well-suited for image classification tasks.\r\n\r\n[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ankhoa1212/CPSC-483-Final-Project/blob/main/cpsc483_final_project_cnn.ipynb)\r\n\r\n## Autokeras\r\nThe CNN showed promising results, so an automated machine learning library was used to search for an architecture that performs well on the data.\r\nThe script for training and evaluating with Autokeras was adapted from [this code](https://github.com/MedMNIST/experiments/blob/main/MedMNIST2D/train_and_eval_autokeras.py).\r\n\r\n1. Install Python 3.11 ([pyenv](https://github.com/pyenv/pyenv) can be installed and used for multiple Python versions)\r\n2. Create a [Python Virtual Environment](https://docs.python.org/3/library/venv.html) (recommended)\r\n3. Install requirements (can install [TensorFlow with GPU acceleration](https://www.tensorflow.org/install/pip))\r\n```\r\npip install -r requirements.txt\r\n```\r\n4. 
Run the Autokeras script\r\n```\r\npython3 train_and_eval_autokeras.py --input_root medmnist\r\n```\r\n\r\n## Evaluation\r\nBoth the CNN and Autokeras implementations were evaluated on two metrics:\r\n- Area under the curve (AUC): generally, the greater this value, the better the model\r\n- Accuracy: measures how often the model correctly predicts the outcome\r\n\r\n### CNN Results\r\nTrain: AUC: 0.850 Accuracy: 0.736\r\n\r\nTest: AUC: 0.771 Accuracy: 0.731\r\n\r\n### Autokeras Results\r\nTrain: AUC: 0.872 Accuracy: 0.844\r\n\r\nTest: AUC: 0.965 Accuracy: 0.926\r\n\r\nValidation: AUC: 0.953 Accuracy: 0.902\r\n\r\nA total of 3 models were run and their results averaged to make the reported results more reliable.\r\nThe architecture of the best model is as follows:\r\n\r\n![breastmnist_autokeras_model3_arch_visualization](https://github.com/user-attachments/assets/949f15f4-cf9e-465c-8c6d-ad0fb0e97101)\r\n\r\nModel architectures were visualized using [this script](https://github.com/ankhoa1212/CPSC-483-Final-Project/blob/main/visualize_model.py).\r\n\r\n## Conclusion\r\nWe developed a machine learning model with an AUC of 0.956 and an accuracy of 0.936, which exceeded the original BreastMNIST 2D [benchmark results](https://medmnist.com/) (AUC: 0.871, accuracy: 0.831).\r\n\r\n### Takeaways\r\nFor image classification problems, convolutional layers can be effective at extracting features from images. Managing multiple libraries and dependencies in Python can be complicated, so installing them in an isolated environment can help avoid conflicts between dependencies. As a result, Google Colab can be useful for getting started quickly on a machine learning project, but it has the downside of usage limits, especially for problems that require a lot of computation. 
To set up and use certain code libraries, it is extremely useful to read existing documentation.\r\n\r\n## Resources\r\n- https://www.nature.com/articles/s41597-022-01721-8\r\n- https://github.com/MedMNIST\r\n- https://autokeras.com/\r\n- https://www.tensorflow.org/install/pip\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankhoa1212%2Fcpsc-483-final-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fankhoa1212%2Fcpsc-483-final-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankhoa1212%2Fcpsc-483-final-project/lists"}