# MRD Detection in Flow Cytometry using VAE and GMM

This repository presents two complementary approaches for detecting **Minimal Residual Disease (MRD)** in flow cytometry data:

- [`VAE/`](./VAE/): Deep learning-based anomaly detection using **Variational Autoencoders**
- [`GMM/`](./GMM/): Probabilistic modeling with **Gaussian Mixture Models**
---

## Dataset Overview

We work with flow cytometry data collected from **12 patients**:

- **Healthy Patients (P1–P6)**:
  - ~27 million cells
  - Used for model training
- **Unhealthy Patients (P7–P12)**:
  - ~20 million cells
  - Used for evaluation and MRD prediction

Each cell is represented by **14 features**. The models learn the distribution of healthy cells and detect anomalous cells in new patient data, which may indicate MRD.

---

## Objective

Identify anomalous (i.e., likely cancerous) cells using only **unsupervised learning** methods trained on healthy patient data. The percentage of a patient's cells flagged as anomalous serves as the MRD estimate (%).

---

## Methods Used

### 1. Variational Autoencoder (VAE)

- Learns latent representations via probabilistic encoding/decoding
- Detects anomalies based on **reconstruction error (MSE)**
- Explores latent dimensions of 2 and 4 and several β values
- Uses **Leave-One-Patient-Out (LOPO)** validation with **progressive fine-tuning**
- Produces per-cell MSE scores → used to estimate MRD

📎 [Explore VAE Approach](./VAE/README.md)  
📎 [VAE Best Model](./VAE/vae_4dim_part2.ipynb)
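The loss and scoring arithmetic behind the VAE bullets can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the repository's implementation: `x_hat`, `mu`, and `logvar` stand for a trained VAE's decoder and encoder outputs, the helper names are hypothetical, the noisy-copy "reconstructions" are synthetic stand-ins for a trained model, and the 98.5th-percentile threshold is an assumed choice because this README does not state the VAE threshold.

```python
import numpy as np


def reconstruction_scores(x, x_hat):
    """Per-cell anomaly score: mean squared reconstruction error over the features."""
    return np.mean((x - x_hat) ** 2, axis=1)


def beta_vae_loss(x, x_hat, mu, logvar, beta=1.0):
    """Per-cell beta-VAE objective: reconstruction MSE + beta * KL(q(z|x) || N(0, I)).

    mu and logvar are encoder outputs of shape (n_cells, latent_dim),
    describing a diagonal Gaussian posterior (e.g. latent_dim = 2 or 4).
    """
    recon = reconstruction_scores(x, x_hat)
    # Closed-form KL divergence between N(mu, diag(exp(logvar))) and N(0, I).
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)
    return recon + beta * kl, recon, kl


def estimate_mrd(patient_scores, healthy_scores, percentile=98.5):
    """Flag cells whose error exceeds a percentile of healthy errors; return (MRD %, threshold).

    percentile=98.5 is a hypothetical choice, not taken from the repository.
    """
    threshold = np.percentile(healthy_scores, percentile)
    return 100.0 * np.mean(patient_scores > threshold), threshold


# Synthetic usage: noisy copies stand in for a trained VAE's reconstructions.
rng = np.random.default_rng(0)
n_cells, n_features = 10_000, 14
healthy = rng.normal(size=(n_cells, n_features))
healthy_hat = healthy + rng.normal(scale=0.1, size=healthy.shape)
patient = rng.normal(size=(n_cells, n_features))
patient_hat = patient + rng.normal(scale=0.1, size=patient.shape)
patient_hat[:200] += 1.0  # 200 hypothetical cells the model reconstructs poorly

mrd, thr = estimate_mrd(
    reconstruction_scores(patient, patient_hat),
    reconstruction_scores(healthy, healthy_hat),
)
print(f"threshold={thr:.4f}  estimated MRD={mrd:.2f}%")
```

The `beta` argument weights the KL term, which is the trade-off explored by varying β: larger values pull the latent codes toward the standard normal prior at some cost in reconstruction fidelity.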
---

### 2. Gaussian Mixture Model (GMM)

- Trains a mixture of Gaussians on healthy cell data
- Evaluates the likelihood of each new cell under the model
- Flags low-likelihood cells as anomalies
- Tests component counts of 4, 6, and 16
- Compares `full` vs. `tied` covariance structures
- Final threshold: **1.5th percentile of the healthy cells' likelihood scores**

📎 [Explore GMM Approach](./GMM/README.md)  
📎 [GMM Best Model](./GMM/GMM_s_complete_tied_4.ipynb)
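The likelihood-threshold rule in the list above can be sketched with scikit-learn's `GaussianMixture`, whose `score_samples` method returns each cell's log-likelihood. This is a minimal sketch, not the repository's notebook: the data are synthetic and hypothetical, and `n_components=4` with `covariance_type="tied"` is picked only as one of the listed configurations; the actual preprocessing and settings may differ.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n_features = 14

# Synthetic stand-in data (hypothetical): healthy cells form four populations;
# the "patient" sample reuses them and adds 200 cells far from every population.
centers = rng.normal(scale=3.0, size=(4, n_features))
healthy = np.vstack([c + rng.normal(size=(2_500, n_features)) for c in centers])
patient = np.vstack(
    [c + rng.normal(size=(2_450, n_features)) for c in centers]
    + [rng.normal(loc=8.0, size=(200, n_features))]
)

# Fit the mixture on healthy cells only (4 components, one shared covariance).
gmm = GaussianMixture(n_components=4, covariance_type="tied", random_state=0)
gmm.fit(healthy)

# score_samples returns each cell's log-likelihood under the fitted mixture.
healthy_ll = gmm.score_samples(healthy)
threshold = np.percentile(healthy_ll, 1.5)  # 1.5th percentile of healthy scores

# Cells less likely than the threshold are flagged; their share is the MRD estimate.
flagged = gmm.score_samples(patient) < threshold
mrd_percent = 100.0 * flagged.mean()
print(f"log-likelihood threshold={threshold:.2f}  estimated MRD={mrd_percent:.2f}%")
```

Switching `covariance_type` to `"full"` fits a separate covariance matrix per component instead of one shared matrix. `GaussianMixture.bic` is one possible criterion for comparing such configurations, though the README does not say how the final model was chosen.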
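---

## Evaluation Metrics

- **Mean Squared Error (MSE)** between predicted and actual MRD %
- **Mean Absolute Error (MAE)** between predicted and actual MRD %
- **MRD Estimation** for each patient based on anomaly scores

Both models' MRD estimates closely approximate the expert-annotated MRD values.

---

## Main libraries

- `pytorch`
- `scikit-learn`
- `numpy`, `pandas`
- `matplotlib`, `seaborn`
- `joblib`

## References

- [PyTorch VAE Tutorial](https://pytorch.org/tutorials/beginner/vae.html)
- [Uncovering Anomalies with Variational Autoencoders – Towards Data Science](https://towardsdatascience.com/uncovering-anomalies-with-variational-autoencoders-vae-a-deep-dive-into-the-world-of-1b2bce47e2e9/)
- [Hands-On Anomaly Detection with Variational Autoencoders – Medium](https://medium.com/data-science/hands-on-anomaly-detection-with-variational-autoencoders-d4044672acd5)
- [scikit-learn: Gaussian Mixture Models](https://scikit-learn.org/stable/modules/mixture.html)
- [Understanding Gaussian Mixture Models – Number Analytics Blog](https://www.numberanalytics.com/blog/understanding-gaussian-mixture-models-data-analysis)
- [PMC Article on GMM & MRD](https://pmc.ncbi.nlm.nih.gov/articles/PMC11659572/)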