{"id":19066176,"url":"https://github.com/epfml/phantomedicus","last_synced_at":"2025-10-16T13:41:26.754Z","repository":{"id":39882319,"uuid":"468061997","full_name":"epfml/phantomedicus","owner":"epfml","description":"MedSurge: medical survey generator","archived":false,"fork":false,"pushed_at":"2022-06-25T07:51:11.000Z","size":17197,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-06-26T23:59:04.565Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/epfml.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-03-09T19:15:50.000Z","updated_at":"2022-03-31T22:09:54.000Z","dependencies_parsed_at":"2022-09-19T14:31:49.632Z","dependency_job_id":null,"html_url":"https://github.com/epfml/phantomedicus","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/epfml/phantomedicus","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2Fphantomedicus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2Fphantomedicus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2Fphantomedicus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2Fphantomedicus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/epfml","download_url":"https://codeload.github.com/epfml/phantomedicus/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/epfml%2Fphantomedicus/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274589553,"owners_count":25312971,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-11T02:00:13.660Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T00:55:02.428Z","updated_at":"2025-10-16T13:41:21.705Z","avatar_url":"https://github.com/epfml.png","language":"Jupyter Notebook","readme":"# PhantoMedicus - Medical Survey Generator\nPhantomedicus is an early-stage framework for simulating patients and consultations. Two methods are currently supported:\n- Manually assigned probabilities\n- Data driven probabilities\n\nEither method can be selected with the `--bayes` command-line argument: `python main.py --bayes manual_probs` generates a simulator from manually designated probabilities, an example of which can be found in `metadata.json`, and `python main.py --bayes data_driven_probs` uses an existing dataset to derive the probabilistic interdependencies between base attributes, diseases, and symptoms. To create the environment, run `conda env create -f environment.yml`. 
\n\n##  Bayesian Network Structure\nThe graph dependencies can be broadly summarized as base features influencing the likelihood of certain diseases, which in turn influence a patient's symptoms.\nThe approach for defining the structure and corresponding probabilities is outlined below.\n\n## Manual Probabilities\nThe metadata structure which is currently used is a dictionary of the following form:\n\n```\nmetadata_dict = {\n    \"disease_list\": considered_diseases,\n    \"symptom_list\": considered_symptoms,\n    \"node_states\": {\n        \"patient_attributes\": base_features_state_dict,\n        \"diseases\": disease_state_dict,\n        \"symptoms\": symptom_state_dict,\n    },\n    \"patient_attribute_disease_probs\": base_feature_disease_prob_dict,\n    \"disease_symptom_probs\": disease_symptom_prob_dict,\n    \"doctors\": doctors,\n}\n```\n\n- `disease_list` contains the list of diseases that you wish to include in your model, all prefixed by `disease` e.g. `disease_pneumonia`\n- `symptom_list` contains the list of symptoms that you wish to include in your model, all prefixed by `symptom` e.g. `symptom_pneumonia`\n- `node_states` contains descriptive features for the random variables (nodes) in the graph. Note that these vary between the patient attributes and symptoms/ diseases as we do not assign marginal probabilities to the symptoms/ diseases. For this we need to define a structure of probabilistic dependencies as outlined below. This has three subdictionaries:\n    - `patient_attributes` - here we have 4 key-value pairs:\n        - `dtype` i.e. the datatype, can be one of `binary`, `categorical`, or `continuous`\n        - `state_name` i.e. the names the random variable may assume\n        - `vals` i.e. the values assumed for each of the states (often just the state names themselves)  \n        - `prob` i.e. 
the probability of sampling any one of these states\n    - `diseases` - here we have 2 key-value pairs\n       - `dtype` as described above\n       - `state_name` as described above\n    - `symptoms` - here we also have 2 key-value pairs\n       - `dtype` as described above\n       - `state_name` as described above\n- `patient_attribute_disease_probs` - here, for each patient attribute we define a subdictionary. Each subdictionary contains \n  the diseases which are influenced by that patient attribute (i.e. edges in the Bayesian network), alongside the probability of each disease for \n  each possible state of the patient attribute. For instance, if we have a patient attribute `base_country` for which \n  4 possible states (i.e. countries) are assigned, we may define the subdictionary corresponding to `base_country` as follows:\n  ```\n    \"base_country\": {\n            \"disease_urti\": [0.07, 0.04, 0.05, 0.04], \n            \"disease_bronchiolitis\": [0.07, 0.04, 0.05, 0.04], \n            \"disease_bronchitis\": [0.07, 0.04, 0.05, 0.04],\n            \"disease_pneumonia\": [0.07, 0.04, 0.05, 0.04], \n            \"disease_asthma\": [0.07, 0.04, 0.05, 0.04], \n            \"disease_tb\": [0.07, 0.04, 0.05, 0.04], \n            \"disease_covid\": [0.07, 0.04, 0.05, 0.04], \n            \"disease_malaria\": [0.07, 0.04, 0.05, 0.04], \n            \"disease_dengue\": [0.07, 0.04, 0.05, 0.04], \n            \"disease_diarrhea\": [0.07, 0.04, 0.05, 0.04], \n            \"disease_ebola\": [0.07, 0.04, 0.05, 0.04], \n            \"disease_severe\": [0.07, 0.04, 0.05, 0.04]\n        },\n  ```\n- `disease_symptom_probs` is much the same as `patient_attribute_disease_probs` except we now define the associated probabilities\n  of symptoms based on diseases.\n- `doctors` contains a subdictionary with the following fields:\n    - `doctor_types` - the list of doctor type names, which can be found in `config.py`\n    - `country` 
contains a further subdictionary with all the countries you are simulating. For each country we assign a probability\n      distribution of the doctor profiles, as well as doctor specific parameters for each doctor (serves to simulate differences \n      in doctors across different regions)\n      \nA comprehensive example of the above can be found in `metadata.json`, which is a metadata file with manually assigned probabilities.\n    \n## Data Driven\nThe data driven approach makes use of the same metadata structure as above, the only difference being that now the probabilities are \nderived from a dataset. The procedure can be found in `generate_prob_dict.py`. Note that if another dataset is used, it will \nrequire some modifications to pick the specific patient attributes/ diseases/ symptoms of interest.\n\n##  Doctor Profiles for Consultations\nThe defined doctor profiles can be found in `src/doctor.py`. Note that the doctor profiles are used in `main.py` when simulating \npatients and conducting consultations.\n\n## Repository Structure\n- `src/doctor.py` contains the defined doctor profiles\n- `src/patient_simulator.py` contains the `PatientSimulator` class which defines the Bayesian network structure and aggregates the probabilities \n  using the metadata described above\n- `src/utils.py` contains utility functions for manipulating patient data and for the doctor profiles\n- `config.py` contains some configuration parameters for the simulation and paths for reading/outputting data\n- `generate_prob_dict.py` - contains the code for generating the metadata based on the raw data\n- `main.py` contains the entire procedure for simulating batches of patients and their consultations and outputs the consultations\nin a `pkl` file 
\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepfml%2Fphantomedicus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fepfml%2Fphantomedicus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepfml%2Fphantomedicus/lists"}