{"id":37231304,"url":"https://github.com/ratschlab/metagraph","last_synced_at":"2026-01-15T03:42:52.178Z","repository":{"id":37236819,"uuid":"65392849","full_name":"ratschlab/metagraph","owner":"ratschlab","description":"Scalable annotated de Bruijn graphs for DNA indexing, alignment, and assembly","archived":false,"fork":false,"pushed_at":"2026-01-12T12:08:33.000Z","size":78795,"stargazers_count":215,"open_issues_count":22,"forks_count":26,"subscribers_count":16,"default_branch":"master","last_synced_at":"2026-01-12T19:35:42.562Z","etag":null,"topics":["alignment","assembly","dna","graph","metagenomics","search"],"latest_commit_sha":null,"homepage":"http://metagraph.ethz.ch","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ratschlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":"COPYRIGHT","agents":null,"dco":null,"cla":null}},"created_at":"2016-08-10T15:13:50.000Z","updated_at":"2026-01-12T12:05:00.000Z","dependencies_parsed_at":"2023-02-19T20:30:30.522Z","dependency_job_id":"368c0b40-0999-454a-826b-6615f4daa923","html_url":"https://github.com/ratschlab/metagraph","commit_stats":null,"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"purl":"pkg:github/ratschlab/metagraph","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ratschlab%2Fmetagraph","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ratschlab%2Fmetagraph/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/ratschlab%2Fmetagraph/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ratschlab%2Fmetagraph/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ratschlab","download_url":"https://codeload.github.com/ratschlab/metagraph/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ratschlab%2Fmetagraph/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28442321,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-15T00:55:22.719Z","status":"online","status_checked_at":"2026-01-15T02:00:08.019Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alignment","assembly","dna","graph","metagenomics","search"],"created_at":"2026-01-15T03:42:51.621Z","updated_at":"2026-01-15T03:42:52.170Z","avatar_url":"https://github.com/ratschlab.png","language":"C++","readme":"# Metagenome Graph Project\n\n[![GitHub release (latest by date)](https://img.shields.io/github/v/release/ratschlab/metagraph)](https://github.com/ratschlab/metagraph/releases)\n[![bioconda downloads](https://img.shields.io/conda/dn/bioconda/metagraph?color=blue)](https://bioconda.github.io/recipes/metagraph/README.html)\n[![install with conda](https://img.shields.io/badge/install%20with-conda-brightgreen.svg?style=flat)](#conda)\n[![install with 
docker](https://img.shields.io/badge/install%20with-docker-brightgreen)](#docker)\n[![install from source](https://img.shields.io/badge/install%20from-source-lightgrey)](#install-from-sources)\n[![documentation](https://img.shields.io/badge/-online%20docs-grey)](https://metagraph.ethz.ch/static/docs/index.html)\n\nMetaGraph is a tool for scalable construction of annotated genome graphs and sequence-to-graph alignment.\n\nThe default index representations in MetaGraph are extremely scalable and support building graphs with trillions of nodes and millions of annotation labels.\nAt the same time, the provided workflows and their careful implementation, combined with low-level optimizations of the core data structures, enable exceptional query and alignment performance.\n\n#### Main features:\n* Large-scale indexing of sequences\n* [Python API](https://metagraph.ethz.ch/static/docs/api.html) for querying in the server mode\n* Encoding [**k-mer counts**](https://metagraph.ethz.ch/static/docs/quick_start.html#index-k-mer-counts) (e.g., expression values) and [**k-mer coordinates**](https://metagraph.ethz.ch/static/docs/quick_start.html#index-k-mer-coordinates) in source sequences (e.g., for lossless encoding of genomes)\n* **Sequence alignment** against very large annotated graphs (sub-k seeding allows using arbitrarily short seeds)\n* Scalable cleaning of very large de Bruijn graphs (to remove sequencing errors)\n* Support for custom alphabets (e.g., {A,C,G,T,N} or amino acids)\n* Algorithms for [differential assembly](https://metagraph.ethz.ch/static/docs/sequence_assembly.html#differential-assembly)\n\n#### Design choices in MetaGraph:\n* Use of succinct data structures and efficient representation schemes for extremely high scalability\n* Algorithmic choices that work efficiently with succinct data structures (e.g., always prefer batched operations)\n* Modular support of different graph and annotation representations\n* Use of generic and extensible interfaces to 
support adding custom index representations / algorithms with little code overhead.\n\n## Documentation\nOnline documentation is available at https://metagraph.ethz.ch/static/docs/index.html. Offline sources are [here](metagraph/docs/source).\n\n## Install\n\n### Conda\n\nInstall the [latest release](https://github.com/ratschlab/metagraph/releases/latest) on Linux or Mac OS X with Anaconda:\n\n```\nconda install -c bioconda -c conda-forge metagraph\n```\n\n### Docker\n\nIf Docker is available on the system, get started immediately with\n\n```\ndocker pull ghcr.io/ratschlab/metagraph:master\ndocker run -v ${HOME}:/mnt ghcr.io/ratschlab/metagraph:master \\\n    metagraph build -v -k 10 -o /mnt/transcripts_1000 /mnt/transcripts_1000.fa\n```\nand replace `${HOME}` with a directory on the host system to map it under `/mnt` in the container.\n\nBy default, `metagraph` runs the binary compiled for the `DNA` alphabet {A,C,G,T}.\nTo run the binary compiled for the `DNA5` or `Protein` alphabet, replace `metagraph` with `metagraph_DNA5` or `metagraph_Protein`, respectively, e.g.:\n```\ndocker run -v ${HOME}:/mnt ghcr.io/ratschlab/metagraph:master \\\n    metagraph_Protein build -v -k 10 -o /mnt/graph /mnt/protein.fa\n```\n\n
The following command (or a similar one) shows which directory is mounted in the container:\n```\ndocker run -v ${HOME}:/mnt ghcr.io/ratschlab/metagraph:master ls /mnt\n```\n\nFor more complex workflows, consider running Docker in interactive mode:\n```\n$ docker run -it --entrypoint /bin/bash -v ${HOME}:/mnt ghcr.io/ratschlab/metagraph:master\n\nroot@5c42291cc9cf:/# ls /mnt/\nroot@5c42291cc9cf:/# metagraph --version\n```\n\nAll versions of the container image are listed [here](https://github.com/ratschlab/metagraph/pkgs/container/metagraph).\n\n### Install From Sources\n\nTo compile from source (e.g., for builds with a custom alphabet or other configurations), see the [online documentation](https://metagraph.ethz.ch/static/docs/installation.html#install-from-source).\n\n\n## Typical workflow\n1. Build a de Bruijn graph from FASTA files, FASTQ files, or [KMC k-mer counters](https://github.com/refresh-bio/KMC/):\\\n`./metagraph build`\n2. Annotate the graph using the column-compressed annotation:\\\n`./metagraph annotate`\n3. Transform the built annotation to a different annotation scheme:\\\n`./metagraph transform_anno`\n4. 
Query annotated graph\\\n`./metagraph query`\n\n### Example\n```\nDATA=\"../tests/data/transcripts_1000.fa\"\n\n./metagraph build -k 12 -o transcripts_1000 $DATA\n\n./metagraph annotate -i transcripts_1000.dbg --anno-filename -o transcripts_1000 $DATA\n\n./metagraph query -i transcripts_1000.dbg -a transcripts_1000.column.annodbg $DATA\n\n./metagraph stats -a transcripts_1000.column.annodbg transcripts_1000.dbg\n```\n\n### Print usage\n`./metagraph`\n\n### Build graph\n\n* #### Simple build\n```bash\n./metagraph build -v --parallel 30 -k 20 --mem-cap-gb 10 \\\n                        -o \u003cGRAPH_DIR\u003e/graph \u003cDATA_DIR\u003e/*.fasta.gz \\\n2\u003e\u00261 | tee \u003cLOG_DIR\u003e/log.txt\n```\n\n* #### Build with disk swap (use to limit the RAM usage)\n```bash\n./metagraph build -v --parallel 30 -k 20 --mem-cap-gb 10 --disk-swap \u003cGRAPH_DIR\u003e \\\n                        -o \u003cGRAPH_DIR\u003e/graph \u003cDATA_DIR\u003e/*.fasta.gz \\\n2\u003e\u00261 | tee \u003cLOG_DIR\u003e/log.txt\n```\n\n#### Build from k-mers filtered with KMC\n```bash\nK=20\n./KMC/kmc -ci5 -t4 -k$K -m5 -fm \u003cFILE\u003e.fasta.gz \u003cFILE\u003e.cutoff_5 ./KMC\n./metagraph build -v -p 4 -k $K --mem-cap-gb 10 -o graph \u003cFILE\u003e.cutoff_5.kmc_pre\n```\n\n### Annotate graph\n```bash\n./metagraph annotate -v --anno-type row --fasta-anno \\\n                           -i primates.dbg \\\n                           -o primates \\\n                           ~/fasta_zurich/refs_chimpanzee_primates.fa\n```\n\n### Convert annotation to Multi-BRWT\n1) Cluster columns\n```bash\n./metagraph transform_anno -v --linkage --greedy \\\n                           -o linkage.txt \\\n                           --subsample R \\\n                           -p NCORES \\\n                           primates.column.annodbg\n```\nRequires `N*R/8 + 6*N^2` bytes of RAM, where `N` is the number of columns and `R` is the number of rows subsampled.\n\n2) Construct Multi-BRWT\n```bash\n./metagraph 
transform_anno -v -p NCORES --anno-type brwt \\\n                           --linkage-file linkage.txt \\\n                           -o primates \\\n                           --parallel-nodes V \\\n                           primates.column.annodbg\n```\nRequires `M*V/8 + Size(BRWT)` bytes of RAM, where `M` is the number of rows in the annotation and `V` is the number of nodes merged concurrently.\n\n### Query graph\n```bash\n./metagraph query -v -i \u003cGRAPH_DIR\u003e/graph.dbg \\\n                        -a \u003cGRAPH_DIR\u003e/annotation.column.annodbg \\\n                        --min-kmers-fraction-label 0.8 --labels-delimiter \", \" \\\n                        query_seq.fa\n```\n\n### Align to graph\n```bash\n./metagraph align -v -i \u003cGRAPH_DIR\u003e/graph.dbg query_seq.fa\n```\n\n### Assemble sequences\n```bash\n./metagraph assemble -v \u003cGRAPH_DIR\u003e/graph.dbg \\\n                        -o assembled.fa \\\n                        --unitigs\n```\n\n### Assemble differential sequences\n```bash\n./metagraph assemble -v \u003cGRAPH_DIR\u003e/graph.dbg \\\n                        --unitigs \\\n                        -a \u003cGRAPH_DIR\u003e/annotation.column.annodbg \\\n                        --diff-assembly-rules diff_assembly_rules.json \\\n                        -o diff_assembled.fa\n```\n\nSee [`metagraph/tests/data/example.diff.json`](metagraph/tests/data/example.diff.json) and [`metagraph/tests/data/example_simple.diff.json`](metagraph/tests/data/example_simple.diff.json) for sample files.\n\n### Get stats\nStats for graph\n```bash\n./metagraph stats graph.dbg\n```\nStats for annotation\n```bash\n./metagraph stats -a annotation.column.annodbg\n```\nStats for both\n```bash\n./metagraph stats -a annotation.column.annodbg graph.dbg\n```\n\n## Developer Notes\n\n### Build a docker container\n\nSimply run `docker build .`\n\n### Makefile\n\nThe `Makefile` in the top-level source directory can be used 
to build and test `metagraph` more conveniently. The following\narguments are supported:\n* `env`: environment in which to compile/run (`\"\"`: on the host, `docker`: in a docker container)\n* `alphabet`: compile metagraph for a certain alphabet (e.g. `DNA` or `Protein`, default `DNA`)\n* `additional_cmake_args`: additional arguments to pass to cmake.\n\nExamples:\n\n```\n# compiles metagraph in a docker container for the `DNA` alphabet\nmake build-metagraph env=docker alphabet=DNA\n```\n\n### Update and create a new release\n\nCreating a new version release is done in three steps:\n\n1. Update package.json and set the version\n2. Add a tag with that new version\n3. Make a new release on GitHub\n\n## License\nMetaGraph is distributed under the GPLv3 License (see LICENSE).\nPlease find further information in the AUTHORS and COPYRIGHT files.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fratschlab%2Fmetagraph","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fratschlab%2Fmetagraph","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fratschlab%2Fmetagraph/lists"}