{"id":13958594,"url":"https://github.com/uw-ipd/RoseTTAFold2NA","last_synced_at":"2025-07-21T00:31:18.905Z","repository":{"id":59081479,"uuid":"534357046","full_name":"uw-ipd/RoseTTAFold2NA","owner":"uw-ipd","description":"RoseTTAFold2 protein/nucleic acid complex prediction","archived":false,"fork":false,"pushed_at":"2024-06-03T19:47:43.000Z","size":1139,"stargazers_count":335,"open_issues_count":76,"forks_count":77,"subscribers_count":16,"default_branch":"main","last_synced_at":"2024-11-28T02:34:51.086Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/uw-ipd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-08T19:06:59.000Z","updated_at":"2024-11-26T20:31:35.000Z","dependencies_parsed_at":"2024-11-28T02:32:23.820Z","dependency_job_id":"bba62d58-35ca-4b01-ae7a-f05320083cfb","html_url":"https://github.com/uw-ipd/RoseTTAFold2NA","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/uw-ipd/RoseTTAFold2NA","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uw-ipd%2FRoseTTAFold2NA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uw-ipd%2FRoseTTAFold2NA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uw-ipd%2FRoseTTAFold2NA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uw-ipd%2FRoseTTAFold2NA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owne
rs/uw-ipd","download_url":"https://codeload.github.com/uw-ipd/RoseTTAFold2NA/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uw-ipd%2FRoseTTAFold2NA/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266221260,"owners_count":23894965,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-08T13:01:45.724Z","updated_at":"2025-07-21T00:31:15.710Z","avatar_url":"https://github.com/uw-ipd.png","language":"Python","funding_links":[],"categories":["Protein structure"],"sub_categories":["Web services_Other"],"readme":"# RF2NA\nGitHub repo for RoseTTAFold2 with nucleic acids\n\n**New: April 13, 2023 v0.2**\n* Updated weights (https://files.ipd.uw.edu/dimaio/RF2NA_apr23.tgz) for better prediction of homodimer:DNA interactions and better DNA-specific sequence recognition\n* Bug fixes in the MSA generation pipeline\n* Support for paired protein/RNA MSAs\n\n## Installation\n\n1. Clone the package\n```\ngit clone https://github.com/uw-ipd/RoseTTAFold2NA.git\ncd RoseTTAFold2NA\n```\n\n2. Create the conda environment\nAll external dependencies are listed in `RF2na-linux.yml`.\n```\n# create conda environment for RoseTTAFold2NA\nconda env create -f RF2na-linux.yml\n```\nYou also need to install NVIDIA's SE(3)-Transformer (**please install it from the SE3Transformer directory in this repo**).\n```\nconda activate RF2NA\ncd SE3Transformer\npip install --no-cache-dir -r requirements.txt\npython setup.py install\ncd ..\n```\n\n3. 
Download pre-trained weights under network directory\n```\ncd network\nwget https://files.ipd.uw.edu/dimaio/RF2NA_apr23.tgz\ntar xvfz RF2NA_apr23.tgz\nls weights/ # it should contain a 1.1GB weights file\ncd ..\n```\n\n4. Download sequence and structure databases\n```\n# uniref30 [46G]\nwget http://wwwuser.gwdg.de/~compbiol/uniclust/2020_06/UniRef30_2020_06_hhsuite.tar.gz\nmkdir -p UniRef30_2020_06\ntar xfz UniRef30_2020_06_hhsuite.tar.gz -C ./UniRef30_2020_06\n\n# BFD [272G]\nwget https://bfd.mmseqs.com/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz\nmkdir -p bfd\ntar xfz bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz -C ./bfd\n\n# structure templates (including *_a3m.ffdata, *_a3m.ffindex)\nwget https://files.ipd.uw.edu/pub/RoseTTAFold/pdb100_2021Mar03.tar.gz\ntar xfz pdb100_2021Mar03.tar.gz\n\n# RNA databases\nmkdir -p RNA\ncd RNA\n\n# Rfam [300M]\nwget ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/Rfam.full_region.gz\nwget ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/Rfam.cm.gz\ngunzip Rfam.cm.gz\ncmpress Rfam.cm\n\n# RNAcentral [12G]\nwget ftp://ftp.ebi.ac.uk/pub/databases/RNAcentral/current_release/rfam/rfam_annotations.tsv.gz\nwget ftp://ftp.ebi.ac.uk/pub/databases/RNAcentral/current_release/id_mapping/id_mapping.tsv.gz\nwget ftp://ftp.ebi.ac.uk/pub/databases/RNAcentral/current_release/sequences/rnacentral_species_specific_ids.fasta.gz\n../input_prep/reprocess_rnac.pl id_mapping.tsv.gz rfam_annotations.tsv.gz   # ~8 minutes\ngunzip -c rnacentral_species_specific_ids.fasta.gz | makeblastdb -in - -dbtype nucl  -parse_seqids -out rnacentral.fasta -title \"RNACentral\"\n\n# nt [151G]\nupdate_blastdb.pl --decompress nt\ncd ..\n```\n\n## Usage\n```\nconda activate RF2NA\ncd example\n# run Protein/RNA prediction\n../run_RF2NA.sh rna_pred rna_binding_protein.fa R:RNA.fa\n# run Protein/DNA prediction\n../run_RF2NA.sh dna_pred dna_binding_protein.fa D:DNA.fa\n```\n### Inputs\n* The first argument to the script is the output 
folder\n* The remaining arguments are FASTA files for the individual chains in the structure. Use the tags `P:xxx.fa`, `R:xxx.fa`, `D:xxx.fa`, and `S:xxx.fa` to specify protein, RNA, double-stranded DNA, and single-stranded DNA, respectively. Use the tag `PR:xxx.fa` to specify paired protein/RNA. Each chain goes in a separate file; the `D` tag automatically generates the DNA strand complementary to the input strand.\n\n### Expected outputs\n* Outputs are written to the folder provided as the first argument (`rna_pred` and `dna_pred` in the examples above).\n* Model outputs are placed in a subfolder, `models` (e.g., `dna_pred/models`)\n* You will get a predicted structure with the estimated per-residue LDDT in the B-factor column (`models/model_00.pdb`)\n* You will get a numpy `.npz` file (`models/model_00.npz`). This can be read with `numpy.load` and contains three tables (L = complex length):\n   - dist (L x L x 37) - the predicted distogram\n   - lddt (L) - the predicted per-residue LDDT\n   - pae (L x L) - the predicted error for each residue pair\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuw-ipd%2FRoseTTAFold2NA","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fuw-ipd%2FRoseTTAFold2NA","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuw-ipd%2FRoseTTAFold2NA/lists"}