{"id":20644629,"url":"https://github.com/epistasislab/syntwin","last_synced_at":"2026-03-16T11:01:18.419Z","repository":{"id":243952533,"uuid":"697019545","full_name":"EpistasisLab/SynTwin","owner":"EpistasisLab","description":"SynTwin: A graph-based approach for predicting clinical outcomes using digital twins derived from synthetic patients","archived":false,"fork":false,"pushed_at":"2025-05-29T21:50:23.000Z","size":1112,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-07-24T02:22:06.217Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EpistasisLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-09-26T22:19:57.000Z","updated_at":"2024-10-07T09:54:03.000Z","dependencies_parsed_at":"2024-06-12T07:00:05.651Z","dependency_job_id":"c4f198fe-d96d-4d42-b181-ae091e4fe707","html_url":"https://github.com/EpistasisLab/SynTwin","commit_stats":null,"previous_names":["epistasislab/syntwin"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/EpistasisLab/SynTwin","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EpistasisLab%2FSynTwin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EpistasisLab%2FSynTwin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EpistasisLab%2FSynTwin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/EpistasisLab%2FSynTwin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EpistasisLab","download_url":"https://codeload.github.com/EpistasisLab/SynTwin/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EpistasisLab%2FSynTwin/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273100031,"owners_count":25045697,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-01T02:00:09.058Z","response_time":120,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-16T16:17:04.467Z","updated_at":"2026-03-16T11:01:18.322Z","avatar_url":"https://github.com/EpistasisLab.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SynTwin\nSynTwin: A graph-based approach for predicting clinical outcomes using digital twins derived from synthetic patients\n\nMethodology for generating and using digital twins for clinical outcome prediction. 
An approach that combines synthetic data and network science to create digital twins for precision medicine.\n\n## Contents\n- [synthetic_algorithms_comparison](https://github.com/EpistasisLab/SynTwin/tree/main/synthetic_algorithms_comparison)\n  - [step1_encoding_sampling](https://github.com/EpistasisLab/SynTwin/blob/main/synthetic_algorithms_comparison/step1_encoding_sampling.ipynb)\n  - [step2_synthetic_algorithms](https://github.com/EpistasisLab/SynTwin/tree/main/synthetic_algorithms_comparison/step2_synthetic_algorithms)\n  - [step3_synthetic_algorithms_comparision](https://github.com/EpistasisLab/SynTwin/blob/main/synthetic_algorithms_comparison/step3_synthetic_algorithms_comparision.ipynb)\n\n- [SynTwin](https://github.com/EpistasisLab/SynTwin/tree/main/SynTwin)\n  - [step1_data_cleaning_sampling](https://github.com/EpistasisLab/SynTwin/blob/main/SynTwin/step1_data_cleaning_sampling.ipynb)\n  - [step2_mpom_synthetic_dataset](https://github.com/EpistasisLab/SynTwin/blob/main/SynTwin/step2_mpom_synthetic_dataset.ipynb)\n  - [step3_data_preprocessing](https://github.com/EpistasisLab/SynTwin/blob/main/SynTwin/step3_data_preprocessing.ipynb)\n  - [step4a_calc_distance_matrices_categorical](https://github.com/EpistasisLab/SynTwin/blob/main/SynTwin/step4a_calc_distance_matrices_categorical.ipynb)\n  - [step4b_cdist_gower](https://github.com/EpistasisLab/SynTwin/blob/main/SynTwin/step4b_cdist_gower.py)\n  - [step5a_percolation_threshold](https://github.com/EpistasisLab/SynTwin/blob/main/SynTwin/step5a_percolation_threshold.ipynb)\n  - [step5b_percolation_threshold_calculation](https://github.com/EpistasisLab/SynTwin/blob/main/SynTwin/step5b_percolation_threshold_calculation.ipynb)\n  - [step6a_get_resolution](https://github.com/EpistasisLab/SynTwin/blob/main/SynTwin/step6a_get_resolution.py)\n  - [step6b_resolution_summarization](https://github.com/EpistasisLab/SynTwin/blob/main/SynTwin/step6b_resolution_summarization.ipynb)\n  - 
[step7_vital_prediction](https://github.com/EpistasisLab/SynTwin/blob/main/SynTwin/step7_vital_prediction.ipynb)\n\nFor this study we used the population-based cancer registry of the Surveillance, Epidemiology, and End Results ([SEER](https://seer.cancer.gov)) program of the National Cancer Institute (USA). We chose it for its large sample size and because access requires only a simple registration with an email address, which supports reproducibility.\n\nFollow the steps in [SynTwin](https://github.com/EpistasisLab/SynTwin/tree/main/SynTwin) to reproduce the work from the paper. The `step2_mpom_synthetic_dataset` step can be replaced with whichever synthetic data generation algorithm works best for your data. We evaluated three synthetic data generation algorithms: categorical latent Gaussian process (CLGP), mixture of product of multinomials (MPoM), and multi-categorical medical generative adversarial network (MC-MedGAN), using code from [SYNDATA](https://github.com/LLNL/SYNDATA) and [multi-categorical-gans](https://github.com/rcamino/multi-categorical-gans). See [synthetic_algorithms_comparison](https://github.com/EpistasisLab/SynTwin/tree/main/synthetic_algorithms_comparison) for details.\n\n## Reference\nMoore JH, Li X, Chang J-H, Tatonetti NP, Theodorescu D, Chen Y, Asselbergs F, Venkatesan M, Wang Z. Pacific Symposium on Biocomputing, in press (2024).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepistasislab%2Fsyntwin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fepistasislab%2Fsyntwin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepistasislab%2Fsyntwin/lists"}