{"id":21473026,"url":"https://github.com/jasperlinthorst/reveal","last_synced_at":"2025-07-15T08:32:06.388Z","repository":{"id":30007422,"uuid":"33555452","full_name":"jasperlinthorst/reveal","owner":"jasperlinthorst","description":"Graph based multi genome aligner","archived":false,"fork":false,"pushed_at":"2021-09-17T07:24:08.000Z","size":6798,"stargazers_count":44,"open_issues_count":6,"forks_count":3,"subscribers_count":13,"default_branch":"master","last_synced_at":"2024-08-09T15:01:11.593Z","etag":null,"topics":["alignment","assembly","genome","gfa","graph","multiple-sequence-alignment","reveal"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jasperlinthorst.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-04-07T16:40:14.000Z","updated_at":"2024-06-27T14:25:42.000Z","dependencies_parsed_at":"2022-08-28T01:51:13.064Z","dependency_job_id":null,"html_url":"https://github.com/jasperlinthorst/reveal","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jasperlinthorst%2Freveal","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jasperlinthorst%2Freveal/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jasperlinthorst%2Freveal/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jasperlinthorst%2Freveal/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jasperlinthorst","download_url":"https://codeload.github.com/jasperlinthorst/
reveal/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226027841,"owners_count":17562139,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alignment","assembly","genome","gfa","graph","multiple-sequence-alignment","reveal"],"created_at":"2024-11-23T10:14:24.185Z","updated_at":"2024-11-23T10:14:25.300Z","avatar_url":"https://github.com/jasperlinthorst.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# REVEAL\n\nREVEAL (REcursiVe Exact-matching ALigner) can be used to (multi) align whole genomes.\n\n## INSTALL\n\nREVEAL is written in Python and C code. To build it, it needs Python version 2.7 and a GCC compiler.\n\nIt uses libdivsufsort for suffix array construction and the probcons code for refinement of the graph.\n\nFurthermore it uses the Python packages networkx (version 2), intervaltree, pysam and matplotlib.\n\nA version of REVEAL can be installed through [pip](https://pypi.org/project/reveal/):\n\n**pip install reveal**\n\nOr through cloning this repository on github and executing the following command:\n\n**python setup.py test install**\n\nTo install without executing the unit tests:\n\n**python setup.py install**\n\nTo install in your user directory:\n\n**python setup.py install --user**\n\n## RUN\n\nTo validate whether everything is correctly installed you can run a test alignment from the tests directory, e.g. 
by executing the following command:\n\n**reveal align tests/1a.fa tests/1b.fa**\n\nThis will output a shell script that outlines the typical steps to generate some graphs. If you're not interested in changing any parameters or the intermediate steps, you can immediately execute the script by piping it into your shell:\n\n**reveal align tests/1a.fa tests/1b.fa | sh**\n\nIf everything ran correctly, various gfa files should have been produced. Most likely you will be interested in 'prg.unzipped.realigned.gfa'. This file contains a reference graph in GFA format (see [GFA](http://lh3.github.io/2014/07/19/a-proposal-of-the-grapical-fragment-assembly-format/)).\n\nBy default reveal will try to anchor the alignment by simultaneously aligning all genomes. If this is unwanted (e.g. due to memory constraints), you can run the following command to generate a shell script that anchors the alignment hierarchically, in batches of, for instance, 5 genomes at a time:\n\n**reveal align tests/1a.fa tests/1b.fa --order=sequential --chunksize=5 | sh**\n\nAll commands in the shell script that fall between the comment lines can be run in parallel, for example on a compute cluster.\n\nThere are other subcommands for reveal, some of which are used by the generated shell script.\n\nTo generate an anchor graph using the recursive exact matching approach for more than two sequences, you can call:\n\n**reveal rem tests/1a.fa tests/1b.fa tests/1c.fa**\n\nor align a sequence to an existing gfa graph:\n\n**reveal rem 1a_1b.gfa tests/1c.fa**\n\nor align two graphs:\n\n**reveal rem 1a_1b.gfa 1c_1d.gfa**\n\nImportant parameters for **reveal rem** are -m and -n; see the subcommand help.\n\nREVEAL assumes a global alignment between chromosome-length assemblies. 
To address the issues that follow from draft assemblies, a 'finish' subcommand is supplied that orders and orients contigs/scaffolds with respect to a reference genome and produces pseudo molecules for the draft assembly.\n\n**reveal finish reference.fasta draft.fasta**\n\nTo address large events (like translocations, inversions, but also misassemblies) that prevent a colinear alignment between two genomes, the following command can be used to transform a structurally rearranged (draft) genome such that it conforms to the layout of the reference sequence.\n\n**reveal finish --order=chains reference.fasta draft.fasta**\n\nHave a look at the various parameters, especially: --mineventsize, --minchainsum and -m.\n\nTo obtain a graph-based representation that encodes the original as well as the 'transformed' genome as separate paths through a graph, use:\n\n**reveal transform reference.fasta draft.fasta**\n\nNote that the resulting graphs may contain cycles, but can still be used in subsequent alignments using REVEAL rem, as only the transformed paths that correspond to the reference layout will be used for segmenting the graph. 
Paths in the graph prefixed with an asterisk (\\*) correspond to the original (non-transformed) input genomes, which are ignored by REVEAL during graph traversal and mainly function as a way to record the structural events that are present in the graph.\n\nTo extract bubbles (a list of source/sink pairs and nodes within the bubble) from a graph, run:\n\n**reveal bubbles 1a\u0026#95;1b.gfa**\n\nSimilar to bubbles, but prints the actual varying sequence:\n\n**reveal variants 1a\u0026#95;1b.gfa**\n\nTo output variants to a vcf file:\n\n**reveal variants 1a\u0026#95;1b.gfa --vcf**\n\nTo output statistics with respect to the number of nodes, bubbles, variants, aligned sequence etc.:\n\n**reveal stats 1a\u0026#95;1b.gfa**\n\nTo realign parts of the graph using a basepair resolution multiple sequence alignment method (instead of MUMs):\n\n**reveal refine \\\u003cgraph\\\u003e \\\u003csource-node\\\u003e \\\u003csink-node\\\u003e**\n\nTo realign all bubbles:\n\n**reveal refine \\\u003cgraph\\\u003e --all**\n\nNote that this won't work for bubbles larger than roughly 10000bp, so have a look at the filtering options (e.g. --maxsize).\n\nAs the boundaries of Maximal Unique Matches are somewhat greedy, more accurate variant calls are obtained by first 'unzipping' bubbles before applying **reveal refine**. 
To unzip all bubbles by 10bp, run:\n\n**reveal unzip \\\u003cgraph\\\u003e -u10**\n\nTo construct an interactive (-i, for zooming purposes) mumplot of two fasta files:\n\n**reveal plot genome1.fasta genome2.fasta -i**\n\nOr to visualise a graph in a mumplot:\n\n**reveal gplot 1a\u0026#95;1b.gfa -i**\n\nNOTE that you need to have matplotlib installed for these commands.\n\nIn case you want to inspect the graph with software like cytoscape or gephi, you can produce a graph in gml format by calling reveal as follows:\n\n**reveal convert prg.gfa**\n\nTo extract a genome/path from the graph:\n\n**reveal extract \\\u003cgraph\\\u003e \\\u003cpathname\\\u003e**\n\nTo extract a subgraph of the graph, for instance to inspect a complex bubble structure:\n\n**reveal subgraph \\\u003cgraph\\\u003e \\\u003cnode1\\\u003e ...  \\\u003cnodeN\\\u003e**\n\nTo merge multiple gfa graphs into a single gfa graph, while maintaining node-id space:\n\n**reveal merge \\\u003cgraph1\\\u003e \\\u003cgraph2\\\u003e ...  \\\u003cgraphN\\\u003e**\n\nOr to do the opposite, split a graph by its connected components:\n\n**reveal split \\\u003cgraph\\\u003e**\n\nFor everything else, most commands print a help message when you run **reveal \\\u003csubcommand\\\u003e -h**\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjasperlinthorst%2Freveal","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjasperlinthorst%2Freveal","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjasperlinthorst%2Freveal/lists"}