{"id":27164865,"url":"https://github.com/steineggerlab/foldseek","last_synced_at":"2025-10-05T22:42:09.449Z","repository":{"id":37711997,"uuid":"166769317","full_name":"steineggerlab/foldseek","owner":"steineggerlab","description":"Foldseek enables fast and sensitive comparisons of large structure sets.","archived":false,"fork":false,"pushed_at":"2025-09-03T08:13:17.000Z","size":35593,"stargazers_count":1046,"open_issues_count":141,"forks_count":132,"subscribers_count":20,"default_branch":"master","last_synced_at":"2025-09-03T10:13:42.020Z","etag":null,"topics":["alignments","bioinformatics","clustering","protein-structure"],"latest_commit_sha":null,"homepage":"https://foldseek.com","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/steineggerlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2019-01-21T07:39:25.000Z","updated_at":"2025-09-03T07:49:15.000Z","dependencies_parsed_at":"2023-02-17T17:31:27.485Z","dependency_job_id":"455d3b86-d324-434c-a7d6-ef4aa564b6a2","html_url":"https://github.com/steineggerlab/foldseek","commit_stats":null,"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/steineggerlab/foldseek","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/steineggerlab%2Ffoldseek","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/steineggerlab%2Ffoldseek/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/steineggerlab%2Ffoldseek/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/steineggerlab%2Ffoldseek/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/steineggerlab","download_url":"https://codeload.github.com/steineggerlab/foldseek/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/steineggerlab%2Ffoldseek/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278532360,"owners_count":26002345,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alignments","bioinformatics","clustering","protein-structure"],"created_at":"2025-04-09T02:41:01.077Z","updated_at":"2025-10-05T22:42:09.438Z","avatar_url":"https://github.com/steineggerlab.png","language":"C","funding_links":[],"categories":["AI for Science","🔬 Functional Annotation Tools","Phylogenetics","🔬 Domain-Specific Applications"],"sub_categories":["Protein","Structure-Based Annotation","Software","🧬 Biology \u0026 Medicine"],"readme":"\n# Foldseek \nFoldseek enables fast and sensitive comparisons of large protein structure sets, supporting monomer and multimer searches, as well as clustering. 
It runs on CPU, supports GPU acceleration for faster searches, and optionally allows ultra-fast and sensitive comparisons directly from protein sequence inputs using a language model, bypassing the need for structures.\n\u003cp align=\"center\"\u003e\u003cimg src=\"https://github.com/steineggerlab/foldseek/blob/master/.github/foldseek.png\" height=\"250\"/\u003e\u003c/p\u003e\n\n## Publications\n[van Kempen M, Kim S, Tumescheit C, Mirdita M, Lee J, Gilchrist CLM, Söding J, and Steinegger M. Fast and accurate protein structure search with Foldseek. Nature Biotechnology, doi:10.1038/s41587-023-01773-0 (2023)](https://www.nature.com/articles/s41587-023-01773-0)\n\n[Barrio-Hernandez I, Yeo J, Jänes J, Mirdita M, Gilchrist CLM, Wein T, Varadi M, Velankar S, Beltrao P and Steinegger M. Clustering predicted structures at the scale of the known protein universe. Nature, doi:10.1038/s41586-023-06510-w (2023)](https://www.nature.com/articles/s41586-023-06510-w)\n\n[Kim W, Mirdita M, Levy Karin E, Gilchrist CLM, Schweke H, Söding J, Levy E, and Steinegger M. Rapid and sensitive protein complex alignment with Foldseek-Multimer. Nature Methods, doi:10.1038/s41592-025-02593-7 (2025)](https://www.nature.com/articles/s41592-025-02593-7)\n\n[Kallenborn F, Chacon A, Hundt C, Sirelkhatim H, Didi K, Cha S, Dallago C, Mirdita M, Schmidt B, Steinegger M: GPU-accelerated homology search with MMseqs2. 
bioRxiv, doi: 10.1101/2024.11.13.623350 (2024)](https://www.biorxiv.org/content/10.1101/2024.11.13.623350v1)\n\n[![BioConda Install](https://img.shields.io/conda/dn/bioconda/foldseek.svg?style=flag\u0026label=BioConda%20install)](https://anaconda.org/bioconda/foldseek) \n[![Github All Releases](https://img.shields.io/github/downloads/steineggerlab/foldseek/total.svg)](https://github.com/steineggerlab/foldseek/releases/latest) \n[![Biocontainer Pulls](https://img.shields.io/endpoint?url=https%3A%2F%2Fmmseqs.com%2Fbiocontainer.php%3Fcontainer%3Dfoldseek)](https://biocontainers.pro/#/tools/foldseek) \n[![Build Status](https://dev.azure.com/themartinsteinegger/foldseek/_apis/build/status/steineggerlab.foldseek?branchName=master)](https://dev.azure.com/themartinsteinegger/foldseek/_build/latest?definitionId=2\u0026branchName=master)\n\n# Table of Contents\n\n- [Foldseek](#foldseek)\n  - [Publications](#publications)\n- [Table of Contents](#table-of-contents)\n  - [Webserver](#webserver)\n  - [Installation](#installation)\n  - [Memory requirements](#memory-requirements)\n  - [Tutorial Video](#tutorial-video)\n  - [Documentation](#documentation)\n  - [Quick start](#quick-start)\n    - [Search](#search)\n      - [Output Search](#output-search)\n        - [Tab-separated](#tab-separated)\n        - [Superpositioned Cα only PDB files](#superpositioned-cα-only-pdb-files)\n        - [Interactive HTML](#interactive-html)\n      - [Important search parameters](#important-search-parameters)\n      - [Alignment Mode](#alignment-mode)\n    - [Databases](#databases)\n      - [Create custom databases and indexes](#create-custom-databases-and-indexes)\n      - [Create custom database from protein sequence (FASTA)](#create-custom-database-from-protein-sequence-fasta)\n      - [Pad database for fast GPU search](#pad-database-for-fast-gpu-search)\n    - [Cluster](#cluster)\n      - [Output Cluster](#output-cluster)\n        - [Tab-separated cluster](#tab-separated-cluster)\n        - 
[Representative fasta](#representative-fasta)\n        - [All member fasta](#all-member-fasta)\n      - [Important cluster parameters](#important-cluster-parameters)\n    - [Multimersearch](#multimersearch)\n      - [Using Multimersearch](#using-multimersearch)\n      - [Multimer Search Output](#multimer-search-output)\n        - [Tab-separated-complex](#tab-separated-complex)\n        - [Complex Report](#complex-report)\n    - [Multimercluster](#multimercluster)\n      - [Output MultimerCluster](#output-multimercluster)\n        - [Tab-separated multimercluster](#tab-separated-multimercluster)\n        - [Representative multimer fasta](#representative-multimer-fasta)\n        - [Filtered search result](#filtered-search-result)\n      - [Important multimer cluster parameters](#important-multimer-cluster-parameters)\n  - [Main Modules](#main-modules)\n  - [Examples](#examples)\n    - [Faster Search with GPU Acceleration](#faster-search-with-gpu-acceleration)\n    - [Fast Structure Search from FASTA input](#fast-structure-search-from-fasta-input)\n    - [Rescore aligments using TMscore](#rescore-aligments-using-tmscore)\n    - [Query centered multiple sequence alignment](#query-centered-multiple-sequence-alignment)\n\n## Webserver \nSearch your protein structures against the [AlphaFoldDB](https://alphafold.ebi.ac.uk/) and [PDB](https://www.rcsb.org/) in seconds using the Foldseek webserver ([code](https://github.com/soedinglab/mmseqs2-app)): [search.foldseek.com](https://search.foldseek.com) 🚀\n\n## Installation\n```\n# Linux AVX2 build (check using: cat /proc/cpuinfo | grep avx2)\nwget https://mmseqs.com/foldseek/foldseek-linux-avx2.tar.gz; tar xvzf foldseek-linux-avx2.tar.gz; export PATH=$(pwd)/foldseek/bin/:$PATH\n\n# Linux ARM64 build\nwget https://mmseqs.com/foldseek/foldseek-linux-arm64.tar.gz; tar xvzf foldseek-linux-arm64.tar.gz; export PATH=$(pwd)/foldseek/bin/:$PATH\n\n# Linux AVX2 \u0026 GPU build (req. 
glibc \u003e= 2.17 and nvidia driver \u003e=525.60.13)\nwget https://mmseqs.com/foldseek/foldseek-linux-gpu.tar.gz; tar xvfz foldseek-linux-gpu.tar.gz; export PATH=$(pwd)/foldseek/bin/:$PATH\n\n# MacOS\nwget https://mmseqs.com/foldseek/foldseek-osx-universal.tar.gz; tar xvzf foldseek-osx-universal.tar.gz; export PATH=$(pwd)/foldseek/bin/:$PATH\n\n# Conda installer (Linux and macOS)\nconda install -c conda-forge -c bioconda foldseek\n```\nOther precompiled binaries are available at [https://mmseqs.com/foldseek](https://mmseqs.com/foldseek).\n\n\u003e [!NOTE]\n\u003e We recently added support for GPU-accelerated protein sequence and profile searches. Full speed requires an NVIDIA GPU of the Ampere generation or newer; Turing-generation GPUs also work, at reduced speed. The Bioconda and precompiled binaries will not work on older GPU generations (e.g. Volta or Pascal).\n\n## Memory requirements \nFor optimal software performance, consider three options based on your RAM and search requirements:\n\n1. **With Cα info (default).** \n   Estimate the required RAM as `(6 bytes Cα + 1 3Di byte + 1 AA byte) * (database residues)`. The 54M AFDB50 entries require 151GB.\n\n2. **Without Cα info.** \n   Disabling structure bits with `--sort-by-structure-bits 0` reduces the RAM requirement to 35GB. This alters hit rankings and final scores, but not E-values. Structure bits are mostly relevant for hit ranking for E-value \u003e 10^-1.\n\n3. **Single query searches.** \n   Use `--prefilter-mode 1`, which isn't memory-limited and computes all optimal ungapped alignments. This option optimally utilizes foldseek's multithreading capabilities for single queries and supports GPU acceleration.\n\n## Tutorial Video\nA Foldseek tutorial covering the webserver and command-line usage is available [here](https://www.youtube.com/watch?v=k5Rbi22TtOA). 
\u003ca href=\"https://www.youtube.com/watch?v=k5Rbi22TtOA\"\u003e\u003cimg src=\"https://img.shields.io/youtube/views/k5Rbi22TtOA?style=social\"\u003e\u003c/a\u003e\n\n## Documentation\nMany of Foldseek's modules (subprograms) rely on MMseqs2. For more information about these modules, refer to the [MMseqs2 wiki](https://github.com/soedinglab/MMseqs2/wiki). For documentation specific to Foldseek, check out the Foldseek wiki [here](https://github.com/steineggerlab/foldseek/wiki).\n\n## Quick start\n\n### Search\nThe `easy-search` module queries one or more single-chain proteins, provided either as protein structures in PDB/mmCIF format (flat or gzipped) or as protein sequences in [fasta](#create-custom-database-from-protein-sequence-fasta) format, against a target database, folder or individual single-chain protein structures (for multi-chain proteins see [multimersearch](#multimersearch)). The default alignment information output is a [tab-separated file](#tab-separated) but Foldseek also supports [Superposed Cα PDBs](#superpositioned-cα-only-pdb-files) and [HTML](#interactive-html).\n\n    foldseek easy-search example/d1asha_ example/ aln tmpFolder\n    \n#### Output Search\n##### Tab-separated\n  \nThe default output fields are: `query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits` but they can be customized with the `--format-output` option; e.g., `--format-output \"query,target,qaln,taln\"` returns the query and target accessions and the pairwise alignments in tab-separated format. 
Many different output columns are available:\n\n| Code | Description |\n| --- | --- |\n|query | Query sequence identifier |\n|target | Target sequence identifier |\n|qca        | Cα coordinates of the query |\n|tca        | Cα coordinates of the target |\n|alntmscore | TM-score of the alignment | \n|qtmscore   | TM-score normalized by the query length |\n|ttmscore   | TM-score normalized by the target length |\n|u          | Rotation matrix (computed by TM-score) |\n|t          | Translation vector (computed by TM-score) |\n|lddt       | Average LDDT of the alignment |\n|lddtfull   | LDDT per aligned position |\n|prob       | Estimated probability for query and target to be homologous (e.g. being within the same SCOPe superfamily) |\n\nCheck out the [MMseqs2 documentation for additional output format codes](https://github.com/soedinglab/MMseqs2/wiki#custom-alignment-format-with-convertalis).\n\n##### Superpositioned Cα only PDB files\nFoldseek's `--format-mode 5` generates PDB files with all target Cα atoms superimposed onto the query structure based on the aligned coordinates. \nIt writes a separate PDB file for each pairwise alignment, so be careful when using this option for large searches. 
\n\n##### Interactive HTML\nLocally run Foldseek can generate an HTML search result, similar to the one produced by the [webserver](https://search.foldseek.com) by specifying `--format-mode 3`\n\n```\nfoldseek easy-search example/d1asha_ example/ result.html tmp --format-mode 3\n```\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"./.github/results.png\" height=\"400\"/\u003e\u003c/p\u003e\n\n#### Important search parameters\n\n| Option              | Category    | Description                                                                                                                            |\n|---------------------|-------------|----------------------------------------------------------------------------------------------------------------------------------------|\n| -s                  | Sensitivity | Adjust sensitivity to speed trade-off; lower is faster, higher more sensitive (fast: 7.5, default: 9.5)                                |\n| --num-iterations    | Sensitivity | Enables iterative search to find more distantly related hits (Default: off). 
Recommended: `--num-iterations 0` (optimized version)                     |\n| --exhaustive-search | Sensitivity | Skips prefilter and performs an all-vs-all alignment (more sensitive but much slower)                                                  |\n| --max-seqs          | Sensitivity | Adjust the number of prefilter hits handed to alignment; increasing it can lead to more hits (default: 1000)                           |\n| -e                  | Sensitivity | List matches below this E-value (range 0.0-inf, default: 0.001); increasing it reports more distant structures                         |\n| --cluster-search     | Sensitivity   | For clustered databases like AFDB50 or CATH50, trigger a cluster search: 0: search only representatives (fast), 1: align and report also all members of a cluster (default: 0)                                                           |\n| --alignment-type    | Alignment   | 0: 3Di Gotoh-Smith-Waterman (local, not recommended), 1: TMalign (global, slow), 2: 3Di+AA Gotoh-Smith-Waterman (local, default)       |\n| -c                  | Alignment   | List matches above this fraction of aligned (covered) residues (see --cov-mode) (default: 0.0); higher coverage = more global alignment |\n| --cov-mode          | Alignment   | 0: coverage of query and target, 1: coverage of target, 2: coverage of query                                                           |\n| --gpu               | Performance | Enables fast GPU-accelerated ungapped prefilter (`--prefilter-mode 1`) (default: off), ignores `-s`. Use `--gpu 1` to enable.          |\n\n#### Alignment Mode\nBy default, Foldseek uses its local 3Di+AA structural alignment, but it also supports realigning hits using the global TMalign as well as rescoring alignments using TMscore. \n\n    foldseek easy-search example/d1asha_ example/ aln tmp --alignment-type 1\n\nIf alignment type is set to tmalign (`--alignment-type 1`), the results will be sorted by the TMscore normalized by query length. 
The TMscore is used for reporting two fields: the e-value=(qTMscore+tTMscore)/2 and the score=(qTMscore*100). All output fields (e.g., pident, fident, and alnlen) are calculated based on the TMalign alignment.\n\n### Databases \nThe `databases` command downloads pre-generated databases like PDB or AlphaFoldDB.\n    \n    # pdb  \n    foldseek databases PDB pdb tmp \n    # alphafold db\n    foldseek databases Alphafold/Proteome afdb tmp \n\nWe currently support the following databases: \n```\n  Name                   \tType     \tTaxonomy\tUrl\n- Alphafold/UniProt   \tAminoacid\t     yes\thttps://alphafold.ebi.ac.uk/\n- Alphafold/UniProt50 \tAminoacid\t     yes\thttps://alphafold.ebi.ac.uk/\n- Alphafold/Proteome  \tAminoacid\t     yes\thttps://alphafold.ebi.ac.uk/\n- Alphafold/Swiss-Prot\tAminoacid\t     yes\thttps://alphafold.ebi.ac.uk/\n- ESMAtlas30          \tAminoacid\t       -\thttps://esmatlas.com\n- PDB                 \tAminoacid\t     yes\thttps://www.rcsb.org\n```\n\n#### Create custom databases and indexes\nThe target database can be pre-processed by `createdb`. This is useful when searching multiple times against the same set of target structures. \n \n    foldseek createdb example/ targetDB\n    foldseek createindex targetDB tmp  #OPTIONAL generates and stores the index on disk\n    foldseek easy-search example/d1asha_ targetDB aln.m8 tmpFolder\n\n#### Create custom database from protein sequence (FASTA)\nCreate a structural database from FASTA files using the [ProstT5](https://academic.oup.com/nargab/article/6/4/lqae150/7901286) protein language model. 
It runs by default on CPU and is about 400-4000x faster than predicting structures with [ColabFold](https://github.com/sokrypton/ColabFold).\nHowever, this database will contain only the predicted 3Di structural sequences without additional structural details.\nAs a result, it supports monomer search and clustering, but does not enable features requiring Cα information, such as `--alignment-type 1`, TM-score or LDDT output.\n```\nfoldseek databases ProstT5 weights tmp\nfoldseek createdb db.fasta db --prostt5-model weights\n```\n\nAccelerate inference by one to two orders of magnitude using GPU(s) (`--gpu 1`):\n\n```\nfoldseek createdb db.fasta db --prostt5-model weights --gpu 1\n```\n- Use the `CUDA_VISIBLE_DEVICES` variable to select the GPU device(s).\n  - `CUDA_VISIBLE_DEVICES=0` to use GPU 0.\n  - `CUDA_VISIBLE_DEVICES=0,1` to use GPUs 0 and 1.\n \n#### Pad database for fast GPU search\nGPU searches require the database to be reformatted, with padding added to each sequence using the `makepaddedseqdb` command. The padded database can be used for both CPU and GPU searches.\n```\n# Prepare the database for GPU search\nfoldseek makepaddedseqdb db db_pad\n# Perform GPU search\nfoldseek search db db_pad result_dir --gpu 1\n```\n\n### Cluster\nThe `easy-cluster` algorithm is designed for structural clustering by assigning structures to a representative protein structure using structural alignment. It accepts input either as protein structures in PDB/mmCIF format or as protein sequences in [fasta](#create-custom-database-from-protein-sequence-fasta) format, with support for both flat and gzipped files. By default, easy-cluster generates three output files with the following suffixes: (1) `_clu.tsv`, (2) `_repseq.fasta`, and (3) `_allseq.fasta`. 
The first file (1) is a [tab-separated](#tab-separated-cluster) file describing the mapping from representative to member, while the second file (2) contains only [representative sequences](#representative-fasta), and the third file (3) includes all [cluster member sequences](#all-member-fasta).\n\n    foldseek easy-cluster example/ res tmp -c 0.9 \n    \n#### Output Cluster\n##### Tab-separated cluster\nThe provided format represents protein structure clustering in a tab-separated, two-column layout (representative and member). Each line denotes a cluster-representative and cluster-member relationship, signifying that the member shares significant structural similarity with the representative, and thus belongs to the same cluster.\n```\nQ0KJ32\tQ0KJ32\nQ0KJ32\tC0W539\nQ0KJ32\tD6KVP9\nE3HQM9\tE3HQM9\nE3HQM9\tF0YHT8\n```\n\n##### Representative fasta\nThe `_repseq.fasta` contains all representative protein sequences of the clustering.\n```\n\u003eQ0KJ32\nMAGA....R\n\u003eE3HQM9\nMCAT...Q\n```\n\n##### All member fasta\nIn the `_allseq.fasta` file all sequences of the cluster are present. A new cluster is marked by two identical name lines of the representative sequence, where the first line stands for the cluster and the second is the name line of the first cluster sequence. 
It is followed by the fasta formatted sequences of all its members.\n\n```\n\u003eQ0KJ32\t\n\u003eQ0KJ32\nMAGA....R\n\u003eC0W539\nMVGA....R\n\u003eD6KVP9\nMVGA....R\n\u003eE3HQM9\t\n\u003eE3HQM9\nMCAT...Q\n\u003eQ223C0\nMCAR...Q\n```\n\n#### Important cluster parameters\n\n| Option                   | Category    | Description                                                                                                                                 |\n|--------------------------|-------------|---------------------------------------------------------------------------------------------------------------------------------------------|\n| -e                       | Sensitivity | List matches below this E-value (range 0.0-inf, default: 0.001); increasing it reports more distant structures                              |\n| --alignment-type         | Alignment   | 0: 3Di Gotoh-Smith-Waterman (local, not recommended), 1: TMalign (global, slow), 2: 3Di+AA Gotoh-Smith-Waterman (local, default)            |\n| -c                       | Alignment   | List matches above this fraction of aligned (covered) residues (see --cov-mode) (default: 0.0); higher coverage = more global alignment     |\n| --cov-mode               | Alignment   | 0: coverage of query and target, 1: coverage of target, 2: coverage of query                                                                |\n| --min-seq-id             | Alignment   | the minimum sequence identity to be clustered                                                                                               |\n| --tmscore-threshold      | Alignment   | accept alignments with an alignment TMscore \u003e thr                                                                                           |\n| --tmscore-threshold-mode | Alignment   | normalize TMscore by 0: alignment, 1: representative, 2: member length                                                                      |\n| --lddt-threshold         | Alignment   | 
accept alignments with an alignment LDDT score \u003e thr                                                                                        |\n\n### Multimersearch\nThe `easy-multimersearch` module is designed for querying one or more protein complex (multi-chain) structures (supported input formats: PDB/mmCIF, flat or gzipped) against a target database of protein complex structures. It reports the similarity metrics between the complexes (e.g., the TMscore).\n\n#### Using Multimersearch\nThe examples below use files from the `example` directory, which is included when you clone the Foldseek repo. \nIf you use the precompiled version of the software, you can download the files directly: [1tim.pdb.gz](https://github.com/steineggerlab/foldseek/raw/master/example/1tim.pdb.gz) and [8tim.pdb.gz](https://github.com/steineggerlab/foldseek/raw/master/example/8tim.pdb.gz).\n\nFor a pairwise alignment of complexes using `easy-multimersearch`, run the following command:\n```\nfoldseek easy-multimersearch example/1tim.pdb.gz example/8tim.pdb.gz result tmpFolder\n```\nFoldseek `easy-multimersearch` can also be used for searching one or more query complexes against a target database: \n```\nfoldseek databases PDB pdb tmp \nfoldseek easy-multimersearch example/1tim.pdb.gz pdb result tmpFolder\n```\n\n#### Multimer Search Output\n##### Tab-separated-complex\nBy default, `easy-multimersearch` reports the output alignment in a tab-separated file.\nThe default output fields are: `query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,complexassignid` but they can be customized with the `--format-output` option; e.g., `--format-output \"query,target,complexqtmscore,complexttmscore,complexassignid\"` alters the output to show specific scores and identifiers.\n\n| Code | Description |\n| --- | --- |\n| **Commons** |\n|query | Query sequence identifier |\n|target | Target sequence identifier |\n| **Only for scorecomplex** 
|\n|complexqtmscore| TM-score of Complex alignment normalized by the query length |\n|complexttmscore| TM-score of Complex alignment normalized by the target length |\n|complexu       | Rotation matrix of Complex alignment (computed by TM-score) |\n|complext       | Translation vector of Complex alignment (computed by TM-score) |\n|complexassignid| Index of Complex alignment |\n\n**Example Output:**\n```\n1tim.pdb.gz_A   8tim.pdb.gz_A   0.967   247 8   0   1   247 1   247 5.412E-43   1527    0\n1tim.pdb.gz_B   8tim.pdb.gz_B   0.967   247 8   0   1   247 1   247 1.050E-43   1551    0\n```\n\n##### Complex Report\n`easy-multimersearch` also generates a report (suffixed `_report`), which provides a summary of the inter-complex chain matching, including identifiers, chains, TMscores, rotation matrices, translation vectors, and assignment IDs. The report includes the following fields:\n| Column | Description |\n| --- | --- |\n| 1 | Identifier of the query complex |\n| 2 | Identifier of the target complex |\n| 3 | Comma separated matched chains in the query complex |\n| 4 | Comma separated matched chains in the target complex |\n| 5 | TM score normalized by query length [0-1] |\n| 6 | TM score normalized by target length [0-1] |\n| 7 | Comma separated nine rotation matrix (U) values |\n| 8 | Comma separated three translation vector (T) values |\n| 9 | Complex alignment ID |\n\n**Example Output:**\n```\n1tim.pdb.gz 8tim.pdb.gz A,B A,B 0.98941 0.98941 0.999983,0.000332,0.005813,-0.000373,0.999976,0.006884,-0.005811,-0.006886,0.999959 0.298992,0.060047,0.565875  0\n```\n\n### Multimercluster\nThe `easy-multimercluster` module is designed for multimer-level structural clustering (supported input formats: PDB/mmCIF, flat or gzipped). By default, easy-multimercluster generates three output files with the following suffixes: (1) `_cluster.tsv`, (2) `_rep_seq.fasta` and (3) `_cluster_report`.  
The first file (1) is a [tab-separated](#tab-separated-multimercluster) file describing the mapping from representative multimer to member, while the second file (2) contains only [representative sequences](#representative-multimer-fasta). The third file (3) is also a [tab-separated](#filtered-search-result) file describing filtered alignments.\n\nMake sure chain names in PDB/mmCIF files do not contain underscores (`_`).\n\n    foldseek easy-multimercluster example/ clu tmp --multimer-tm-threshold 0.65 --chain-tm-threshold 0.5 --interface-lddt-threshold 0.65\n\n#### Output MultimerCluster\n##### Tab-separated multimercluster\n```\n5o002\t   5o002\n194l2\t   194l2\n194l2\t   193l2\n10mh121\t 10mh121\n10mh121\t 10mh114\n10mh121\t 10mh119\n```\n##### Representative multimer fasta\n```\n#5o002\n\u003e5o002_A\nSHGK...R\n\u003e5o002_B\nSHGK...R\n#194l2\n\u003e194l2_A0\nKVFG...L\n\u003e194l2_A6\nKVFG...L\n#10mh121\n...\n```\n##### Filtered search result\nThe `_cluster_report` contains `qcoverage, tcoverage, multimer qTm, multimer tTm, interface lddt, ustring, tstring` of alignments after filtering and before clustering. 
\n```\n5o0f2\t5o0f2\t1.000\t1.000\t1.000\t1.000\t1.000\t1.000,0.000,0.000,0.000,1.000,0.000,0.000,0.000,1.000\t0.000,0.000,0.000\n5o0f2\t5o0d2\t1.000\t1.000\t0.999\t0.992\t1.000\t0.999,0.000,-0.000,-0.000,0.999,-0.000,0.000,0.000,0.999\t-0.004,-0.001,0.084\n5o0f2\t5o082\t1.000\t0.990\t0.978\t0.962\t0.921\t0.999,-0.025,-0.002,0.025,0.999,-0.001,0.002,0.001,0.999\t-0.039,0.000,-0.253\n```\nThe query and target coverages here represent the sum of the coverages of all aligned chains, divided by the total query and target multimer length respectively.\n\n#### Important multimer cluster parameters\n\n| Option                     | Category    | Description                                                                                                                             |\n|----------------------------|-------------|-----------------------------------------------------------------------------------------------------------------------------------------|\n| -e                         | Sensitivity | List matches below this E-value (range 0.0-inf, default: 0.001); increasing it reports more distant structures                          |\n| --alignment-type           | Alignment   | 0: 3Di Gotoh-Smith-Waterman (local, not recommended), 1: TMalign (global, slow), 2: 3Di+AA Gotoh-Smith-Waterman (local, default)        |\n| -c                         | Alignment   | List matches above this fraction of aligned (covered) residues (see --cov-mode) (default: 0.0); higher coverage = more global alignment |\n| --cov-mode                 | Alignment   | 0: coverage of query and target (cluster multimers only with same chain numbers), 1: coverage of target, 2: coverage of query           |\n| --multimer-tm-threshold    | Alignment   | accept alignments with multimer alignment TMscore \u003e thr                                                                                 |\n| --chain-tm-threshold       | Alignment   | accept alignments if every single chain TMscore \u003e thr  
                                                                                 |\n| --interface-lddt-threshold | Alignment   | accept alignments with an interface LDDT score \u003e thr                                                                                    |\n\n## Main Modules\n- `easy-search`       fast protein structure search  \n- `easy-cluster`      fast protein structure clustering  \n- `easy-multimersearch`       fast protein multimer-level structure search  \n- `easy-multimercluster`       fast protein multimer-level structure clustering  \n- `createdb`          create a database from protein structures (PDB,mmCIF, mmJSON)\n- `databases`         download pre-assembled databases\n\n## Examples\n#### Faster Search with GPU Acceleration\nFoldseek's prefilter on a 4090 GPU is four times faster than a 64-core CPU. To use GPU-based ungapped alignment for faster prefiltering, ensure you have a CUDA-enabled GPU and specify the `--gpu` option:\n```\nfoldseek easy-search example/d1asha_ example/ aln tmp --gpu 1 --prefilter-mode 1\n```\n- Use the `CUDA_VISIBLE_DEVICES` variable to select the GPU device(s).\n  - `CUDA_VISIBLE_DEVICES=0` to use GPU 0.\n  - `CUDA_VISIBLE_DEVICES=0,1` to use GPUs 0 and 1.\n\n#### Fast structure search from FASTA input\nProtein sequences can be directly searched without requiring existing protein structures by using [ProstT5](https://academic.oup.com/nargab/article/6/4/lqae150/7901286), which is approximately 400–4000x faster than predicting structures with ColabFold.\nRead more [here](#create-custom-database-from-protein-sequence-fasta).\n```\nfoldseek databases ProstT5 weights tmp\nfoldseek databases PDB pdb tmp\nfoldseek easy-search QUERY.fasta pdb res.m8 tmp --prostt5-model weights\n```\nThe translation with ProstT5 can be accelerated by using GPU(s) (`--gpu 1`) and multiple GPUs can be used by setting the `CUDA_VISIBLE_DEVICES` variable.\n\n\n### Rescore aligments using TMscore\nThe easiest way to get the alignment TMscore 
normalized by min(alnLen,qLen,targetLen) as well as a rotation matrix is through the following command:\n```\nfoldseek easy-search example/ example/ aln tmp --format-output query,target,alntmscore,u,t\n```\n\nAlternatively, it is possible to compute TMscores for an existing alignment result (e.g., 3Di+AA) using the following commands: \n```\nfoldseek createdb example/ targetDB\nfoldseek createdb example/ queryDB\nfoldseek search queryDB targetDB aln tmpFolder -a\nfoldseek aln2tmscore queryDB targetDB aln aln_tmscore\nfoldseek createtsv queryDB targetDB aln_tmscore aln_tmscore.tsv\n```\n\nOutput format of `aln_tmscore.tsv`: query and target identifiers, TMscore, translation vector (3 values) and rotation matrix (3x3 values)\n\n### Query centered multiple sequence alignment \nFoldseek can output multiple sequence alignments in a3m format using the following commands. \nTo convert a3m to FASTA format, use the [reformat.pl](https://raw.githubusercontent.com/soedinglab/hh-suite/master/scripts/reformat.pl) script (`reformat.pl in.a3m out.fas`).\n\n```\nfoldseek createdb example/ targetDB\nfoldseek createdb example/ queryDB\nfoldseek search queryDB targetDB aln tmpFolder -a\nfoldseek result2msa queryDB targetDB aln msa --msa-format-mode 6\nfoldseek unpackdb msa msa_output --unpack-suffix a3m --unpack-name-mode 0\n```\nFor a non-query centered multiple sequence alignment please check out [Foldmason](https://github.com/steineggerlab/foldmason).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsteineggerlab%2Ffoldseek","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsteineggerlab%2Ffoldseek","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsteineggerlab%2Ffoldseek/lists"}