{"id":27164860,"url":"https://github.com/steineggerlab/foldmason","last_synced_at":"2025-10-06T11:37:12.440Z","repository":{"id":210646740,"uuid":"724003173","full_name":"steineggerlab/foldmason","owner":"steineggerlab","description":"Multiple Protein Structure Alignment at Scale with FoldMason","archived":false,"fork":false,"pushed_at":"2025-03-28T09:26:24.000Z","size":33246,"stargazers_count":164,"open_issues_count":15,"forks_count":16,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-03-28T10:32:08.555Z","etag":null,"topics":["bioinformatics","msa","protein-structure"],"latest_commit_sha":null,"homepage":"https://search.foldseek.com/foldmason","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/steineggerlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-27T07:40:31.000Z","updated_at":"2025-03-27T21:07:51.000Z","dependencies_parsed_at":null,"dependency_job_id":"5276ddec-794d-451c-9bd1-fb272c7f5d17","html_url":"https://github.com/steineggerlab/foldmason","commit_stats":null,"previous_names":["steineggerlab/foldmason"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/steineggerlab%2Ffoldmason","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/steineggerlab%2Ffoldmason/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/steineggerlab%2Ffoldmason/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/steineggerlab%2Ffoldmas
on/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/steineggerlab","download_url":"https://codeload.github.com/steineggerlab/foldmason/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247968268,"owners_count":21025795,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","msa","protein-structure"],"created_at":"2025-04-09T02:40:59.613Z","updated_at":"2025-10-06T11:37:12.374Z","avatar_url":"https://github.com/steineggerlab.png","language":"C","funding_links":[],"categories":["Phylogenetics"],"sub_categories":["Software"],"readme":"# FoldMason\nFoldMason is a software tool for constructing accurate multiple alignments from large sets of protein structures.\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"https://github.com/steineggerlab/foldmason/blob/main/.github/foldmason_logo.png\" height=\"250\" /\u003e\u003c/p\u003e\n\n## Publications\n[Gilchrist CLM, Mirdita M, and Steinegger M. Multiple Protein Structure Alignment at Scale with FoldMason. 
bioRxiv, doi:10.1101/2024.08.01.606130 (2024)](https://www.biorxiv.org/content/10.1101/2024.08.01.606130v1)\n\n[![build workflow](https://github.com/steineggerlab/foldmason/actions/workflows/build.yml/badge.svg)](https://github.com/steineggerlab/foldmason/actions/workflows/build.yml)\n\n# Table of Contents\n\n- [FoldMason](#foldmason)\n- [Webserver](#webserver)\n- [Installation](#installation)\n- [Documentation](#documentation)\n- [Quick Start](#quick-start)\n  - [Multiple alignment](#multiple-alignment)\n    - [Output](#output)\n    - [Important Parameters](#important-parameters)\n  - [Databases](#databases)\n    - [Create Custom Databases and Indexes](#create-custom-databases-and-indexes)\n- [Main Modules](#main-modules)\n- [Examples](#examples)\n\n## Webserver \nAlign your protein structures quickly using our [FoldMason webserver](https://search.foldseek.com/foldmason).\n\n## Installation\n```\n# Linux AVX2 build (check using: cat /proc/cpuinfo | grep avx2)\nwget https://mmseqs.com/foldmason/foldmason-linux-avx2.tar.gz; tar xvzf foldmason-linux-avx2.tar.gz; export PATH=$(pwd)/foldmason/bin/:$PATH\n\n# Linux SSE2 build (check using: cat /proc/cpuinfo | grep sse2)\nwget https://mmseqs.com/foldmason/foldmason-linux-sse2.tar.gz; tar xvzf foldmason-linux-sse2.tar.gz; export PATH=$(pwd)/foldmason/bin/:$PATH\n\n# Linux ARM64 build\nwget https://mmseqs.com/foldmason/foldmason-linux-arm64.tar.gz; tar xvzf foldmason-linux-arm64.tar.gz; export PATH=$(pwd)/foldmason/bin/:$PATH\n\n# MacOS\nwget https://mmseqs.com/foldmason/foldmason-osx-universal.tar.gz; tar xvzf foldmason-osx-universal.tar.gz; export PATH=$(pwd)/foldmason/bin/:$PATH\n\n# Conda installer (Linux and macOS)\nconda install -c conda-forge -c bioconda foldmason\n```\nOther precompiled binaries for ARM64 and SSE2 are available at [https://mmseqs.com/foldmason](https://mmseqs.com/foldmason).\n\n\u003c!-- ## Documentation\nMany of Foldseek's modules (subprograms) rely on MMseqs2. 
For more information about these modules, refer to the [MMseqs2 wiki](https://github.com/soedinglab/MMseqs2/wiki). For documentation specific to Foldseek, checkout the Foldseek wiki [here](https://github.com/steineggerlab/foldseek/wiki).\n --\u003e\n\n## Quick start\n\n### Multiple alignment\nThe `easy-msa` module allows you to align multiple query structures in PDB/mmCIF format (flat or gzipped). By default it outputs the alignment as a [FASTA-format file](#fasta-alignment) as well as an interactive [HTML](#interactive-html) output.\n\n```\nfoldmason easy-msa \u003cPDB/mmCIF files\u003e result.fasta tmpFolder --report-mode 1\n```\n\nTo generate the example output on the webserver:\n```\nfoldmason easy-msa ./lib/foldseek/example/d* example.fasta tmpFolder --report-mode 1\n```\n \n#### Output\n##### FASTA alignment\nFoldMason generates alignments in FASTA format, with both amino acid and 3Di alphabets (`_aa.fa` and `_3di.fa` suffixes, respectively).\n\n##### Interactive HTML\nFoldMason generates an HTML MSA visualisation when using `easy-msa` with `--report-mode 1`. 
The following will produce `result_aa.fa`, `result_3di.fa`, and `result.html`.\n\n```\nfoldmason easy-msa \u003cPDB/mmCIF files\u003e result tmpFolder --report-mode 1\n```\n\nInternally, this happens using the `msa2lddtreport` module.\n\n```\nfoldmason msa2lddtreport myDb result.fa result.html\n```\n\nAdditionally, you can generate a JSON data file which can be loaded into the webserver (`--report-mode 2` for `easy-msa`).\n\n```\nfoldmason msa2lddtjson myDb result.fa result.json\n```\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"./.github/html.gif\" height=\"400\"/\u003e\u003c/p\u003e\n\n#### Important parameters\n\n| Option             | Category  | Description                                                                                               |\n|--------------------|-----------|-----------------------------------------------------------------------------------------------------------|\n| `--gap-open`       | Alignment | Gap opening penalty (default: 10)\n| `--gap-extend`     | Alignment | Gap extension penalty (default: 1)\n| `--refine-iters`   | Alignment | Number of refinement iterations to run after initial alignment (default: 0)\n| `--output-mode`    | Alignment | 0: Amino acids, 1: 3Di alphabet (default: 0)\n| `--pair-threshold` | Scoring   | Maximum proportion of gaps in column threshold for LDDT calculation (default: 0.0)\n\n#### Create custom databases and indexes\nThe structure database can be pre-processed by `createdb`. Doing this makes sense if the inputs will be aligned multiple times. 
\n \n```\nfoldmason createdb example/ structureDB\n```\n\n## Main Modules\n- `easy-msa`          multiple alignment workflow from structure files\n- `structuremsa`      multiple alignment from structure database\n- `msa2lddt`          calculate structure-based score (LDDT) of an MSA \n- `refinemsa`         iterative MSA refinement\n\n## Examples\n### Basic MSA workflow\nThe easiest way to use FoldMason is through the `easy-msa` workflow, like so:\n\n```\nfoldmason easy-msa \u003cPDB/mmCIF files\u003e result tmpFolder\n```\n\nBy default, `easy-msa` produces multiple alignments in FASTA format (`result_aa.fa` and `result_3di.fa` for amino acid and 3Di alphabets, respectively),\nas well as a Newick format tree (`result.nw`). This is equivalent to the following sequence of commands:\n\n```\nfoldmason createdb \u003cPDB/mmCIF files\u003e myDb\nfoldmason structuremsa myDb result\n```\n\n`easy-msa` can also compute the average LDDT score of the alignment and generate the interactive HTML visualisation by specifying\n`--report-mode 1`, like so:\n\n```\nfoldmason easy-msa \u003cPDB/mmCIF files\u003e result tmpFolder --report-mode 1\n```\n\nThis is the equivalent of the following sequence of commands:\n\n```\nfoldmason createdb \u003cPDB/mmCIF files\u003e myDb\nfoldmason structuremsa myDb result\nfoldmason msa2lddtreport myDb result_aa.fa result.html --guide-tree result.nw\n```\n\nNote: the generated guide tree is passed to `msa2lddtreport` to display it inside the HTML report.\n\n### Aligning large data sets\nFoldMason can use the clustering capabilities of Foldseek to pre-cluster input structures before alignment by specifying `--precluster`,\nallowing for alignments of large sets of proteins.\n\n```\nfoldmason easy-msa \u003cPDB/mmCIF files\u003e result tmpFolder --precluster\n```\n\n### Computing LDDT of an externally created MSA\nThe `msa2lddt` module computes an average [Local Distance Difference Test (LDDT) score](https://doi.org/10.1093/bioinformatics/btt473)\nover the 
length of an MSA. This can be done automatically in the `easy-msa` workflow by specifying `--report-mode 1`, but\n`msa2lddt` can be called separately to compute the LDDT of any given alignment, so long as the structures in the MSA are present in the given structure database:\n\n```\nfoldmason msa2lddt myDb result.fa\nfoldmason msa2lddtreport myDb result.fa result.html\n```\n\nAverage MSA LDDT is calculated by computing per-column LDDT scores of every pair of sequences in the MSA, and then averaging them over the length of the MSA.\nSequence comparison order is determined by database keys, thus scores for MSAs produced by different tools are comparable if specifying the same structure database\nin `msa2lddt`.\n\n### MSA Refinement\nThe `refinemsa` module refines an MSA by iteratively splitting and re-aligning it, saving a resulting MSA when an increase in average LDDT is detected.\n\n```\nfoldmason refinemsa myDb result.fasta refined.fasta --refine-iters 1000\n```\n\nRefinement can be run automatically in the `easy-msa` workflow by specifying the `--refine-iters` argument.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsteineggerlab%2Ffoldmason","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsteineggerlab%2Ffoldmason","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsteineggerlab%2Ffoldmason/lists"}