{"id":22731087,"url":"https://github.com/genentech/gpmhc","last_synced_at":"2025-03-30T01:29:50.756Z","repository":{"id":219806019,"uuid":"704241346","full_name":"Genentech/gpmhc","owner":"Genentech","description":null,"archived":false,"fork":false,"pushed_at":"2023-10-12T21:35:56.000Z","size":5652,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-05T03:27:55.172Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Genentech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-12T20:57:39.000Z","updated_at":"2024-10-01T11:57:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"f7ed765c-990a-402b-9f39-3234b1ad595f","html_url":"https://github.com/Genentech/gpmhc","commit_stats":null,"previous_names":["genentech/gpmhc"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Genentech%2Fgpmhc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Genentech%2Fgpmhc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Genentech%2Fgpmhc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Genentech%2Fgpmhc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Genentech","download_url":"https://codeload.github.com/Genentech/gpmhc/tar.gz/refs/heads/master","host":{"name":"GitHub","url"
:"https://github.com","kind":"github","repositories_count":246264480,"owners_count":20749473,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-10T19:19:23.218Z","updated_at":"2025-03-30T01:29:50.737Z","avatar_url":"https://github.com/Genentech.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"This is the repo for performing inference using the baseline graph-pmhc model introduced in https://www.biorxiv.org/content/10.1101/2023.01.19.524779v1\r\n\r\n# Inference Instructions\r\nFirst, clone the repo and open a terminal in the repo folder.\r\nNext, create your virtual environment:\r\n```\r\nmamba env create -f environment.yml\r\n```\r\n[!NOTE] Conda was not properly installing libraries and dependencies, so please use mamba (I used mamba 1.4.1). You will likely have to choose different torch/dgl packages to support your version of CUDA; I used 11.8. If you do so, delete the existing versions from environment.yml and install them according to the instructions on their websites after creating the environment. To ensure that the code functions properly, use the same versions of DGL/torch used here (1.1.0 and 1.13.1). Sorry, this library doesn't support CPU inference.\r\n\r\nThen, download the model file (link to be created) and put it in the models/baseline_model directory.\r\nNext, download the CSV you'd like to run inference on (link to be created) or supply your own (ensure that the CSV formatting matches the template).\r\n\r\nYou are now ready for inference. 
In a terminal, launch the infer file:\r\n```\r\npython ./gpmhc/infer.py\r\n```\r\nTo perform inference on a file other than the template, add the --csv keyword:\r\n```\r\npython ./gpmhc/infer.py --csv 'path/to/file.csv'\r\n```\r\nKeywords for using different model parameters and model architectures are not currently supported.\r\n\r\n# Fun Facts\r\nThe model_json is how most arguments, settings, and hyperparameters are handled in this repo. Because only inference is supported, these arguments are not to be changed and are set by the provided model_json. Regardless, you may be curious about what the different keys in the provided model_json correspond to.\r\n\r\nNumber of epochs\r\n```\r\n\"epochs\": 30, \r\n```\r\nRandom seed\r\n```\r\n\"seed\": 2906,\r\n``` \r\nThis is a list of adjacency matrices, one each for DR, DP, and DQ. Each sublist represents a position on the binding core, and each index represents a residue location in the pseudosequence\r\n```\r\n\"mhc_adj\": [[[0, 4, 5, 6, 7, 8, 9, 10, 38, 39, 40, 41, 42], [4, 10, 36, 37, 38, 39], [1, 3, 4, 10, 11, 12, 13, 37, 39], [1, 4, 13, 23, 24, 25, 34, 35, 37], [13, 33, 34], [2, 13, 14, 15, 17, 22, 23, 26, 34], [14, 17, 25, 26, 28, 31, 32, 34], [14, 16, 17, 30, 31], [17, 18, 19, 20, 21, 27, 29, 30, 31]], [[0, 3, 4, 5, 6, 7, 8, 9, 39, 40, 41, 42], [0, 9, 37, 38, 39, 40], [0, 2, 9, 10, 11, 12, 38, 40], [0, 12, 23, 24, 25, 34, 35, 36, 38], [12, 13, 23, 25, 34, 35], [1, 12, 13, 14, 16, 21, 22, 23, 25, 26, 27, 35], [13, 16, 25, 27, 29, 32, 33, 35], [13, 15, 16, 31, 32, 33], [13, 16, 17, 18, 19, 20, 28, 30, 31, 32]], [[4, 5, 6, 7, 8, 29, 30, 31], [0, 4, 7, 8, 27, 28, 29, 30], [0, 1, 3, 4, 8, 10, 27, 28, 29, 30], [0, 1, 2, 3, 10, 17, 18, 19, 26, 28], [9, 10, 17, 25, 26], [2, 9, 10, 11, 13, 16, 17, 20], [11, 12, 13, 20, 23, 24, 25], [11, 12, 13, 22, 23], [12, 13, 14, 15, 21, 22, 23]]], \r\n```\r\nThese are the lengths of the pseudosequences for each gene\r\n```\r\n\"mhc_lens\": [43, 43, 32],\r\n```\r\nThe loss 
options select which loss function is used\r\n```\r\n\"loss_options\": {\"loss_func\": \"MaskedBCEWithLogitsLoss\"}\r\n```\r\nDataloader options contain several subset options (this would make more sense in the context of the training code)\r\n```\r\n\"dataloader_options\": {\"csv_to_df\":\r\n``` \r\nSchema options give the indices from the full MHC chain that are used in the pseudosequence for each gene, the maximum pseudosequence length, whether or not netmhcpan's pseudosequence is used (it isn't self-consistent and so required some hard coding), and the amount of padding. Currently, netmhcpan's pseudosequence and padding are not supported in this library\r\n```\r\n{\"option\": \"schema_options\": [[6, 8, 10, 21, 23, 30, 31, 42, 51, 52, 53, 57, 58, 61, 64, 65, 67, 68, 71, 72, 75], [8, 10, 12, 25, 27, 29, 36, 46, 56, 59, 60, 66, 69, 70, 73, 76, 77, 80, 81, 84, 85, 88], [8, 10, 21, 23, 30, 31, 42, 51, 52, 53, 57, 58, 61, 64, 65, 67, 68, 71, 72, 75], [8, 10, 11, 12, 23, 25, 26, 27, 34, 44, 54, 57, 58, 64, 67, 68, 71, 74, 75, 78, 79, 82, 83], [10, 11, 13, 24, 26, 34, 54, 55, 56, 63, 64, 67, 70, 71, 74, 78], [11, 13, 15, 30, 32, 59, 62, 63, 69, 72, 76, 79, 80, 83, 84, 87], 43, \"normal\", 0]}},\r\n```\r\nmodel_hyper_opts defines the hyperparameters used when generating the model; the following were used in the baseline model.\r\n```\r\n\"model_hyper_opts\": {\"batch_size\": 64, \"lr\": 0.0005, \"node_feat_size\": 64, \"graph_feat_size\": 128, \"edge_feat_size\": 3, \"gnn_layers\": 2, \"gnn_dropout\": 0.1, \"timesteps\": 2, \"rnn_dropout\": 0.2, \"classifier_dropout\": 0.4, \"bc_pad\": 0, \"readout\": \"recurrent\", \"posenc\": 1, \"time_steps\": 2}}\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgenentech%2Fgpmhc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgenentech%2Fgpmhc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgenentech%2Fgpmhc/lists"}