{"id":23515075,"url":"https://github.com/rcedgar/reseek","last_synced_at":"2025-04-19T14:53:38.563Z","repository":{"id":241024406,"uuid":"804064047","full_name":"rcedgar/reseek","owner":"rcedgar","description":"Protein structure alignment and search algorithm","archived":false,"fork":false,"pushed_at":"2025-03-22T22:11:44.000Z","size":25163,"stargazers_count":59,"open_issues_count":4,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-22T23:19:57.867Z","etag":null,"topics":["bioinformatics","bioinformatics-algorithms","bioinformatics-tool","computational-biology","protein-sequences","protein-structure","search-algorithm"],"latest_commit_sha":null,"homepage":"https://drive5.com/reseek","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rcedgar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-21T22:14:10.000Z","updated_at":"2025-03-21T13:11:59.000Z","dependencies_parsed_at":"2024-05-22T01:42:39.499Z","dependency_job_id":"1ace7288-1926-4d3a-b994-5c6e14953738","html_url":"https://github.com/rcedgar/reseek","commit_stats":null,"previous_names":["rcedgar/reseek"],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rcedgar%2Freseek","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rcedgar%2Freseek/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rcedgar%2Freseek/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/rcedgar%2Freseek/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rcedgar","download_url":"https://codeload.github.com/rcedgar/reseek/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249718494,"owners_count":21315083,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","bioinformatics-algorithms","bioinformatics-tool","computational-biology","protein-sequences","protein-structure","search-algorithm"],"created_at":"2024-12-25T14:12:51.153Z","updated_at":"2025-04-19T14:53:38.550Z","avatar_url":"https://github.com/rcedgar.png","language":"C++","readme":"![Reseek](http://drive5.com/images/reseek_logo.jpg)\n\n![Reseek](https://drive5.com/images/reseek_v2.3_fixed.jpg)\n\n[Reseek](http://drive5.com/reseek) is a protein structure search and alignment algorithm which improves sensitivity in protein homolog detection\ncompared to state-of-the-art methods including DALI, TM-align and Foldseek with similar speed to Foldseek.\n\nReseek is based on sequence alignment where each residue in the protein backbone is represented by a \nletter in a novel “mega-alphabet” of 85,899,345,920 (∼10\u003csup\u003e11\u003c/sup\u003e) distinct states.\n\nMethod sensitivity was measured on the SCOP40 benchmark using superfamily as the truth standard, focusing\non the regime with false-positive error rates \u003c10 per query, corresponding to E\u003c10 for an ideal E-value.\n\n[\u003cimg src=\"https://drive5.com/reseek/youtube_snip.gif\" 
width=\"150\"\u003e](https://www.youtube.com/watch?v=BzIgqdm9xDs)\n![Reseek](https://drive5.com/images/reseek_readme.jpg)\n\n### Command line\n\u003cpre\u003e\n  -search        # Alignment (e.g. DB search, pairwise, all-vs-all)\n  -convert       # Convert file formats (e.g. create DB)\n  -alignpair     # Pair-wise alignment and superposition\n\nSearch against database\n    reseek -search STRUCTS -db STRUCTS -output hits.txt\n                 # STRUCTS specifies structure(s), see below\n\nRecommended format for large database is .bca, e.g.\n    reseek -convert /data/PDB_mirror/ -bca PDB.bca\n\nAlign and superpose two structures\n    reseek -alignpair 1XYZ.pdb -input2 2ABC.pdb\n           -aln FILE     # Sequence alignment (text)\n           -output FILE  # Rotated 1XYZ (PDB format)\n\nAll-vs-all alignment\n    reseek -search STRUCTS -output hits.txt\n\nOutput options for -search\n   -aln FILE     # Alignments in human-readable format\n   -output FILE  # Hits in tabbed text format\n   -columns name1+name2+name3...\n                 # Output columns, names are\n                 #   query   Query label\n                 #   target  Target label\n                 #   qlo     Start of alignment in query\n                 #   qhi     End of alignment in query\n                 #   tlo     Start of alignment in target\n                 #   thi     End of alignment in target\n                 #   ql      Query length\n                 #   tl      Target length\n                 #   pctid   Percent identity of alignment\n                 #   cigar   CIGAR string\n                 #   evalue  You can guess this one\n                 #   aq      AQ (aln. 
qual., 0 to 1, \u003e0.5 suggests homology)\n                 #   qrow    Aligned query sequence with gaps (local)\n                 #   trow    Aligned target sequence with gaps (local)\n                 #   qrowg   Aligned query sequence with gaps (global)\n                 #   trowg   Aligned target sequence with gaps (global)\n                 #   std     query+target+qlo+qhi+ql+tlo+thi+tl+pctid+evalue\n                 # default aq+query+target+evalue\n\nSearch and alignment options\n  -fast, -sensitive or -verysensitive     # Required\n  -evalue E      # Max E-value (default 10 unless -verysensitive)\n  -omega X       # Omega accelerator (floating-point)\n  -minu U        # K-mer accelerator (integer)\n  -gapopen X     # Gap-open penalty (floating-point \u003e= 0)\n  -gapext X      # Gap-extend penalty (floating-point \u003e= 0)\n  -dbsize D      # DB size (nr. chains) for E-value (default actual size)\n\nConvert between file formats\n    reseek -convert STRUCTS [one or more output options]\n           -cal FILENAME    # .cal format, text with a.a. 
and C-alpha x,y,z\n           -bca FILENAME    # .bca format, binary .cal, recommended for DBs\n           -fasta FILENAME  # FASTA format\n\nCreate input for Muscle-3D multiple structure alignment:\n    reseek -pdb2mega STRUCTS -output structs.mega\n\nSTRUCTS argument is one of:\n   NAME.cif or NAME.mmcif     # PDBx/mmCIF file\n   NAME.pdb                   # Legacy format PDB file\n   NAME.cal                   # C-alpha tabbed text format with chain(s)\n   NAME.bca                   # Binary C-alpha, recommended for larger DBs\n   NAME.files                 # Text file with one STRUCT per line,\n                              #   may be filename, directory or .files\n   DIRECTORYNAME              # Directory (and its sub-directories) is searched\n                              #   for known file types including .pdb, .files etc.\n\nOther options:\n   -log FILENAME              # Log file with errors, warnings, time and memory.\n   -threads N                 # Number of threads, default number of CPU cores.\n\u003c/pre\u003e\n\n#### Build from source on Linux x86\n\u003cpre\u003e\ncd src/; chmod +x build_linux_x86.bash ; ./build_linux_x86.bash\n\u003c/pre\u003e\n\n#### Build from source on OSX x86\n\u003cpre\u003e\ncd src/ ; chmod +x build_osx_x86.bash ; ./build_osx_x86.bash\n\u003c/pre\u003e\n\n#### Build from source on Windows\nLoad `reseek.vcxproj` into Microsoft Visual Studio and use the Build command.\n\n#### Static link warning\nDon't worry about a warning something like this, it's expected:\n\u003cpre\u003e\nwarning: Using 'dlopen' in statically linked applications requires\n  at runtime the shared libraries from the glibc version used for linking\n\u003c/pre\u003e\n### More documentation\n\n[https://drive5.com/reseek](https://drive5.com/reseek)\n\n### Reference\n\nEdgar, Robert C. 
(2024) \"Sequence alignment using large protein structure alphabets improves sensitivity to remote homologs\" [https://www.biorxiv.org/content/10.1101/2024.05.24.595840v2](https://www.biorxiv.org/content/10.1101/2024.05.24.595840v2)\n\n\n### SCOP40 benchmark code and results\n\nhttps://github.com/rcedgar/reseek_bench\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frcedgar%2Freseek","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frcedgar%2Freseek","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frcedgar%2Freseek/lists"}