{"id":25846310,"url":"https://github.com/oskar-j/medical_classifier","last_synced_at":"2026-02-09T22:33:33.821Z","repository":{"id":71666269,"uuid":"136973492","full_name":"oskar-j/medical_classifier","owner":"oskar-j","description":"Full stack machine learning - a scikit model in action","archived":false,"fork":false,"pushed_at":"2019-09-01T18:27:41.000Z","size":1837,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-01T09:30:35.777Z","etag":null,"topics":["classification","data-science","machine-learning","scikit-model"],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oskar-j.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-06-11T19:46:20.000Z","updated_at":"2020-03-21T17:47:52.000Z","dependencies_parsed_at":"2023-03-05T20:45:37.895Z","dependency_job_id":null,"html_url":"https://github.com/oskar-j/medical_classifier","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/oskar-j/medical_classifier","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oskar-j%2Fmedical_classifier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oskar-j%2Fmedical_classifier/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oskar-j%2Fmedical_classifier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oskar-j%2Fmedical_classifier/manifests"
,"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oskar-j","download_url":"https://codeload.github.com/oskar-j/medical_classifier/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oskar-j%2Fmedical_classifier/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271747023,"owners_count":24813604,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-23T02:00:09.327Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","data-science","machine-learning","scikit-model"],"created_at":"2025-03-01T09:29:14.420Z","updated_at":"2026-02-09T22:33:33.814Z","avatar_url":"https://github.com/oskar-j.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Medical classifier\n\nFull stack machine learning - a scikit model in action\n\n# Introduction \n\n# Description\n\n## Data\n\n## Exploratory analysis\n\n![T-SNE on the labeled data](https://raw.githubusercontent.com/oskar-j/medical_classifier/master/static/t-sne.png)\n\nAn attempt to cluster the data using t-SNE algorithm (`n_components=3`, `perplexity=50`, `n_iter=300`).\n\n## Solution (model)\n\n### Success metric\n\nIt's better to use the Recall metric. 
Recall (R) is defined as the number of true positives (T_p) \nover the number of true positives plus the number of false negatives (F_n), so it penalizes false \nnegatives, which is what we want here.\n\nThanks to that we'll recognize sick patients (1), even if it sometimes means notifying \na healthy patient (0) that they might be sick.\n\n### Best model\n\n### Summary\n\nIn short:\n1. It's possible to build a model which predicts health with 0.965667 recall and a 0.973480 F1 score\n2. The best-performing machine learning algorithm is the Multi-layer Perceptron classifier\n3. It's possible to tune its hyperparameters, but that gives only a 0.1% better F1 score\n4. The model is robust, as shown by cross-validation and by running on a bigger test sample\n5. The dataset needs some feature engineering though\n6. We may play around more with dimensionality reduction by using methods alternative to PCA\n7. It's not obvious which score function to choose, although the harmonic mean of precision and recall (the F1 score) should be enough\n8. There may be a slight risk of over-fitting, but I'd need more of your data to verify this","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foskar-j%2Fmedical_classifier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foskar-j%2Fmedical_classifier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foskar-j%2Fmedical_classifier/lists"}