{"id":19669910,"url":"https://github.com/dohlee/spliceai-pytorch","last_synced_at":"2025-07-05T03:33:08.301Z","repository":{"id":65919785,"uuid":"601970375","full_name":"dohlee/spliceai-pytorch","owner":"dohlee","description":"Implementation of SpliceAI, Illumina's deep neural network to predict variant effects on splicing, in PyTorch.","archived":false,"fork":false,"pushed_at":"2023-03-03T00:23:21.000Z","size":154,"stargazers_count":13,"open_issues_count":1,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-23T13:13:33.415Z","etag":null,"topics":["artificial-intelligence","bioinformatics","biology","computational-biology","deep-learning","deep-neural-networks","reproduction","reproduction-code","splicing","variant-effect-prediction"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dohlee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-02-15T08:10:59.000Z","updated_at":"2025-03-09T08:59:41.000Z","dependencies_parsed_at":"2023-03-14T13:00:31.146Z","dependency_job_id":null,"html_url":"https://github.com/dohlee/spliceai-pytorch","commit_stats":{"total_commits":32,"total_committers":1,"mean_commits":32.0,"dds":0.0,"last_synced_commit":"8e4a2bc2a943774fc78feb5c7926fc9a330efa62"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dohlee%2Fspliceai-pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dohlee%2Fspliceai-pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/dohlee%2Fspliceai-pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dohlee%2Fspliceai-pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dohlee","download_url":"https://codeload.github.com/dohlee/spliceai-pytorch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251410252,"owners_count":21584987,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","bioinformatics","biology","computational-biology","deep-learning","deep-neural-networks","reproduction","reproduction-code","splicing","variant-effect-prediction"],"created_at":"2024-11-11T17:03:03.398Z","updated_at":"2025-04-29T00:31:11.722Z","avatar_url":"https://github.com/dohlee.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# spliceai-pytorch\n\n![model](img/banner.png)\n\nImplementation of SpliceAI, Illumina's deep neural network to predict variant effects on splicing, in PyTorch. You can find Illumina's official implementation [here](https://github.com/Illumina/SpliceAI).\n\n## Installation\n\n```bash\npip install spliceai-pytorch\n```\n\n## Usage\n```python\nimport torch\nfrom spliceai_pytorch import SpliceAI\n\nmodel_80nt = SpliceAI.from_preconfigured('80nt')\n# model_400nt = SpliceAI.from_preconfigured('400nt')\n# model_2k = SpliceAI.from_preconfigured('2k')\n# model_10k = SpliceAI.from_preconfigured('10k')\n\n# ... 
training ...\n\nx = torch.randn([1, 4, 80 + 5000])  # Predicts Donor/Acceptor probs only for core 5000nt region.\n\nprobs = model_80nt(x)  # (1, 5000, 3)\n```\n\n## Generating train/test sets\n\nFirst, download the 'SpliceAI train code' directory from [here](https://basespace.illumina.com/s/5u6ThOblecrh) and unzip it into the `spliceai_train_code` directory.\nAlso, download the human reference genome (version hg19) to the `spliceai_train_code/reference` directory.\n\nThen, move into `spliceai_train_code/Canonical` and run the following commands to generate the train/test sets.\n\n```bash\n# Before running `grab_sequence.sh`,\n# make sure that the variable CL_max is configured properly in `constants.py` (80, 400, 2000 or 10000)\nchmod 755 grab_sequence.sh\n./grab_sequence.sh\n\n# Requires Python 2.7, with numpy, h5py, scikit-learn installed\npython create_datafile.py train all  # ~4 minutes, creates datafile_train_all.h5 (27G)\npython create_datafile.py test 0     # ~1 minute, creates datafile_test_0.h5 (2.4G)\n\npython create_dataset.py train all   # ~11 minutes, creates dataset_train_all.h5 (5.4G)\npython create_dataset.py test 0      # ~1 minute, creates dataset_test_0.h5 (0.5G)\n```\n\n## Training\n```shell\n# --model accepts 80nt, 400nt, 2k, or 10k\n$ python -m spliceai_pytorch.train --model 80nt \\\n  --train-h5 spliceai_train_code/Canonical/dataset_train_all.h5 \\\n  --test-h5 spliceai_train_code/Canonical/dataset_test_0.h5 \\\n  --use-wandb  # Optional, for logging.\n```\n\n## Reproduction status (wip)\n\nCurrently reproducing Figure 1E. Results are shown below, and you can view [model training logs here (W\u0026B)](https://wandb.ai/dohlee/spliceai-pytorch/reports/SpliceAI-reproduction-Single-model---VmlldzozNjAyNTE5?accessToken=mfmsivay143tqauivt18mxvuna3j1s7ff54c6lg749hjuf11r8xnsllj3ecs1okm).\n\nNOTE: Target results are from an ensemble of 5 models, while reproduced results are from a single model.\n\n|Model|Top-k acc. (target)|PR-AUC (target)|Top-k acc. 
(reproduced)|PR-AUC (reproduced)|\n|-----|:-----------------:|:-------------:|:---------------------:|:-----------------:|\n|SpliceAI-80nt|0.57|0.60|0.54355|0.56435|\n|SpliceAI-400nt|0.90|0.95|0.87265|0.93160|\n|SpliceAI-2k|0.93|0.97|0.9083|0.9541|\n|SpliceAI-10k|0.95|0.98|0.9286|0.96475|\n\n## Citation\n```bibtex\n@article{jaganathan2019predicting,\n  title={Predicting splicing from primary sequence with deep learning},\n  author={Jaganathan, Kishore and Panagiotopoulou, Sofia Kyriazopoulou and McRae, Jeremy F and Darbandi, Siavash Fazel and Knowles, David and Li, Yang I and Kosmicki, Jack A and Arbelaez, Juan and Cui, Wenwu and Schwartz, Grace B and others},\n  journal={Cell},\n  volume={176},\n  number={3},\n  pages={535--548},\n  year={2019},\n  publisher={Elsevier}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdohlee%2Fspliceai-pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdohlee%2Fspliceai-pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdohlee%2Fspliceai-pytorch/lists"}