{"id":15060521,"url":"https://github.com/gabaldonlab/redundans","last_synced_at":"2025-08-20T09:30:37.878Z","repository":{"id":29996713,"uuid":"33544197","full_name":"Gabaldonlab/redundans","owner":"Gabaldonlab","description":"Redundans is a pipeline that assists an assembly of heterozygous/polymorphic genomes.","archived":false,"fork":false,"pushed_at":"2023-12-15T13:32:22.000Z","size":66804,"stargazers_count":135,"open_issues_count":3,"forks_count":20,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-12-19T09:08:10.790Z","etag":null,"topics":["assembled-contigs","assembly","bioinformatics","closing","contigs","docker-image","fasta","gap","genome-assembly","genomics","heterozygous","mate-pairs","paired-end","pipeline","polymorphic","python","scaffolding"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Gabaldonlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2015-04-07T13:17:59.000Z","updated_at":"2024-12-16T00:36:17.000Z","dependencies_parsed_at":"2023-12-15T14:45:51.755Z","dependency_job_id":null,"html_url":"https://github.com/Gabaldonlab/redundans","commit_stats":{"total_commits":560,"total_committers":12,"mean_commits":"46.666666666666664","dds":"0.34285714285714286","last_synced_commit":"84d45e1774d9ef3a3974dcee919ac8da195a7ef5"},"previous_names":["lpryszcz/redundans"],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Gabaldonlab%2Fredundans","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/Gabaldonlab%2Fredundans/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Gabaldonlab%2Fredundans/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Gabaldonlab%2Fredundans/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Gabaldonlab","download_url":"https://codeload.github.com/Gabaldonlab/redundans/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230408171,"owners_count":18220974,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["assembled-contigs","assembly","bioinformatics","closing","contigs","docker-image","fasta","gap","genome-assembly","genomics","heterozygous","mate-pairs","paired-end","pipeline","polymorphic","python","scaffolding"],"created_at":"2024-09-24T22:59:48.235Z","updated_at":"2025-08-20T09:30:37.871Z","avatar_url":"https://github.com/Gabaldonlab.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"![Latest Version](https://img.shields.io/github/v/tag/gabaldonlab/redundans?label=Latest%20Version)\n[![BioConda Install](https://img.shields.io/conda/dn/bioconda/redundans.svg?style=flag\u0026label=BioConda%20install)](https://anaconda.org/bioconda/redundans/)\n[![GitHub Clones](https://img.shields.io/badge/dynamic/json?color=success\u0026label=Clone\u0026query=count\u0026url=https://gist.githubusercontent.com/Dfupa/0fc9a42bb90e0b6c38767174bce725db/raw/clone.json\u0026logo=github)](https://github.com/MShawon/github-clone-count-badge)\n![Docker 
Pulls](https://img.shields.io/docker/pulls/cgenomics/redundans)\n[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000\u0026logo=docker)](https://hub.docker.com/repository/docker/cgenomics/redundans)\n[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://cloud.sylabs.io/library/cgenomics/redundans/redundans)\n### Table of Contents\n- **[Redundans](#redundans)**  \n  - **[Prerequisites](#prerequisites)**  \n    - **[Official conda package](#official-conda-package)**\n    - **[UNIX installer](#unix-installer)**    \n    - **[Docker image](#docker-image)** \n  - **[Running the pipeline](#running-the-pipeline)**  \n    - **[Parameters](#parameters)**  \n    - **[Test run](#test-run)**  \n  - **[Support](#support)**\n  - **[Citation](#citation)**  \n\n# Redundans\n  \nRedundans pipeline assists **an assembly of heterozygous genomes**.  \nProgram takes [as input](#parameters) **assembled contigs**, **sequencing libraries** and/or **reference sequence** and returns **scaffolded homozygous genome assembly**. Final assembly should be **less fragmented** and with total **size smaller** than the input contigs. In addition, Redundans will automatically **close the gaps** resulting from genome assembly or scaffolding. \n\n\u003cimg align=\"right\" src=\"/docs/redundans_flowchart.png\"\u003e\n\nThe pipeline consists of several steps (modules):  \n1. **de novo contig assembly** (optional if no contigs are given)\n2. **redundancy reduction**: detection and selective removal of redundant contigs from an initial *de novo* assembly \n3. **scaffolding**: joining of genome fragments using paired-end reads, mate-pairs, long reads and/or reference chromosomes \n4. 
**gap closing**: filling the gaps after scaffolding using paired-end and/or mate-pair reads \n\nRedundans is: \n- **fast** \u0026 **lightweight**, multi-core support and memory-optimised, \nso it can be run even on a laptop for small-to-medium size genomes\n- **flexible** toward many sequencing technologies (Illumina, 454, Sanger, PacBio \u0026 Nanopore) and library types (paired-end, mate pairs, fosmids, long reads)\n- **modular**: every step can be omitted or replaced by other tools\n- **reliable**: it has already been used to improve genome assemblies varying in size (several Mb to several Gb) and complexity (fungal, animal \u0026 plants)\n\nFor more information have a look at the [documentation](/docs), [poster](/docs/poster.pdf), [publication](http://nar.oxfordjournals.org/content/44/12/e113), [test dataset](/test) or [manual](http://bit.ly/redundans_manual). \n\n## Prerequisites\nRedundans uses several programs (all except the interpreters and its submodules are provided within this repository):\n\n| Resource | Type | Version |\n| :--- | :--- | :--- |\n| [Python](https://www.python.org/downloads) | Language interpreter | \u003c3.11, ≥ 3.8 |\n| [Platanus](http://platanus.bio.titech.ac.jp/?page_id=14) | Genome assembler | v1.2.4 |\n| [Miniasm](https://github.com/lh3/miniasm) | Genome assembler | ≥ v0.3 (r179) |\n| [Minimap2](https://github.com/lh3/minimap2) | Sequence aligner | ≥ v2.2.4 (r1122) |\n| [LAST](http://last.cbrc.jp/) | Sequence aligner | ≥ v800 |\n| [BWA](http://bio-bwa.sourceforge.net/) | Sequence aligner | ≥ v0.7.12 |\n| [SNAP aligner](https://github.com/amplab/snap) | Sequence aligner | v2.0.1 |\n| [SSPACE3](http://www.baseclear.com/genomics/bioinformatics/basetools/SSPACE) | Scaffolding software | v3.0 |\n| [GapCloser](http://sourceforge.net/projects/soapdenovo2/files/GapCloser/) | Gap-closing software | v1.12 |\n| [GFAstats](https://github.com/vgl-hub/gfastats) | Stats software | ≥ v1.3.6 |\n| [Meryl](https://github.com/marbl/meryl) | K-mer 
counter software | ≥ v1.3 |\n| [Merqury](https://github.com/marbl/merqury) | Assembly evaluation software | v1.3 |\n| [k8](https://github.com/attractivechaos/k8/) | JavaScript shell based on V8 | v0.2.4 |\n| [R](https://cran.r-project.org/) | Language interpreter | ≥ 3.6 |\n| [ggplot2](https://ggplot2.tidyverse.org) | R package | ≥ 3.3.2 |\n| [scales](https://cran.r-project.org/web/packages/scales/) | R package | ≥ 3.3.2 |\n| [argparser](https://cran.r-project.org/web/packages/argparser/) | R package | ≥ 3.6 |\n\n#### WARNING: Some of the third-party requirements are provided precompiled for x86_64 and are not readily available for other architectures.\n\nOn most Linux distros, the installation should be as easy as:\n```\ngit clone --recursive https://github.com/Gabaldonlab/redundans/\ncd redundans \u0026\u0026 bin/.compile.sh\n```\n\nIf it fails, make sure you have the dependencies below installed: \n- Perl [SSPACE3]\n- make, gcc \u0026 g++ [BWA, GFAstats, Miniasm \u0026 LAST] e.g. `sudo apt-get install make gcc g++`\n- [zlib including zlib.h headers](http://zlib.net/) [BWA] e.g. `sudo apt-get install zlib1g-dev`\n- [R ≥ 3.6](https://cran.r-project.org/) and additional packages [ggplot2, scales, argparser] for plotting the Merqury results.\n- optionally, for additional plotting, `numpy` and `matplotlib` e.g. `sudo -H pip install -U matplotlib numpy`\n\nFor user convenience, we provide a [UNIX installer](#unix-installer) and a [Docker image](#docker-image) that can be used instead of manual installation.  
\n\n## Official conda package\nIf you are familiar with conda, this will be by far the easiest way of installing redundans: \n```bash\n# create new Python3 \u003e=3.8,\u003c3.11 environment\nconda create -n redundans python=3.10\n# activate it\nconda activate redundans\n# and install redundans\nconda install -c bioconda redundans \n```\n\n\n## UNIX installer\nUNIX installer will automatically fetch, compile and configure Redundans together with all dependencies.\nIt should work on all modern Linux systems, given Python \u003e= 3, commonly used programmes (ie. wget, make, curl, git, perl, gcc, g++, ldconfig) and libraries (zlib including zlib.h) are installed. \n```bash\nsource \u003c(curl -Ls https://github.com/Gabaldonlab/redundans/raw/master/INSTALL.sh)\n```\n\n### Docker image\nFirst, you  need to install [docker](https://www.docker.com/): `wget -qO- https://get.docker.com/ | sh`  \nThen, you can run the test example by executing: \n```bash\n#Pull the image directly from dockerhub\ndocker pull cgenomics/redundans:latest\n\n# process the data inside the image - all data will be lost at the end\ndocker run -it -w /root/src/redundans cgenomics/redundans:latest ./redundans.py -v -i test/{600,5000}_{1,2}.fq.gz -f test/contigs.fa -o test/run1\n\n# if you wish to process local files, you need to mount the volume with -v\n## make sure you are in redundans repo directory (containing test/ directory)\ndocker run -v `pwd`/test:/test:rw -it cgenomics/redundans:latest /root/src/redundans/redundans.py -v -i test/*.fq.gz -f test/contigs.fa -o test/run1\n```\n### Singularity image\nRedundans is also supported by singularity. First install [singularity](https://docs.sylabs.io/guides/3.1/user-guide/quick_start.html#quick-installation-steps).\n\nYou can either use our singularity repository to build the image or to build the image out of the docker image. 
Then run the first example:\n```\n#Pull from the singularity repo\nsingularity pull --arch amd64 library://cgenomics/redundans/redundans:2.0\n\n#Build the image based on the docker repo\nsingularity build redundans.sif docker://cgenomics/redundans\n\n#Use exec instead of run to account for shell-based wildcards * and ?\nsingularity exec redundans.sif bash -c \"/root/src/redundans/redundans.py -v -i /root/src/redundans/test/*_?.fq.gz -f /root/src/redundans/test/contigs.fa -o /tmp/run1\"\n```\n\n## Running the pipeline\nRedundans input consists of any combination of:\n- **assembled contigs** (FastA)\n- **paired-end and/or mate pairs reads** (FastQ*)\n- **long reads** (FastQ/FastA*) - both PacBio and Nanopore are supported for the scaffolding\n- and/or **reference chromosomes/contigs** (FastA). \n* gzipped files are also accepted.\n\nRedundans will return a **homozygous genome assembly** in `scaffolds.filled.fa` (FastA). It will also report the heterozygous contigs that were not discarded during the reduction step.\nIn addition, the program reports [statistics for every pipeline step](/test#summary-statistics), including the number of contigs that were removed, GC content, N50, N90 and size of gap regions. \n\n### Parameters\nFor user convenience, Redundans is equipped with a wrapper that **automatically estimates run parameters** and executes all steps/modules.\nYou should specify some sequencing libraries (FastA/FastQ) or a reference sequence (FastA) in order to perform scaffolding. \nIf you don't specify `-f` **contigs** (FastA), Redundans will assemble contigs *de novo*, but you'll have to provide **paired-end and/or mate pairs reads** (FastQ).\nMost of the pipeline parameters can be adjusted manually (default values are given in square brackets []):  \n**HINT**: If your run fails, you can try to resume it by adding the `--resume` parameter. 
\n- General options:\n```\n  -h, --help            show this help message and exit\n  -v, --verbose         verbose\n  --version             show program's version number and exit\n  -i FASTQ, --fastq FASTQ\n                        FASTQ PE / MP files\n  -f FASTA, --fasta FASTA\n                        FASTA file with contigs / scaffolds\n  -o OUTDIR, --outdir OUTDIR\n                        output directory [redundans]\n  -t THREADS, --threads THREADS\n                        no. of threads to run [4]\n  --resume              resume previous run\n  --log LOG             output log to [stderr]\n  --nocleaning\n```\n- De novo assembly options:\n```\n  -m MEM, --mem MEM     max memory to allocate (in GB) for the Platanus assembler [2]\n  --tmp TMP             tmp directory [/tmp]\n```\n- Reduction options:\n```\n  --identity IDENTITY   min. identity [0.51]\n  --overlap OVERLAP     min. overlap  [0.80]\n  --minLength MINLENGTH\n                        min. contig length [200]\n  --minimap2reduce      Use minimap2 for the initial and final Reduction step. Recommended for input assembled contigs from long reads or larger contigs using --preset[asm5] by default. By default LASTal is used for Reduction.\n  -x INDEX, --index INDEX\n                        Minimap2 parameter -i used to load at most INDEX target bases into RAM for indexing [4G]. 
It has to be provided as a string INDEX ending with k/K/m/M/g/G.\n  --noreduction         Skip reduction\n```\n- Short-read scaffolding options:\n```\n  -j JOINS, --joins JOINS\n                        min pairs to join contigs [5]\n  -a LINKRATIO, --linkratio LINKRATIO\n                        max link ratio between two best contig pairs [0.7]\n  --limit LIMIT         align subset of reads [0.2]\n  -q MAPQ, --mapq MAPQ  min mapping quality [10]\n  --iters ITERS         iterations per library [2]\n  --noscaffolding       Skip short-read scaffolding\n  -b, --usebwa          use bwa mem for alignment [use snap-aligner]\n```\n- Long-read scaffolding options:\n```\n  -l LONGREADS, --longreads LONGREADS\n                        FastQ/FastA files with long reads\n  -s, --populateScaffolds\n                        Run populateScaffolds mode for long read scaffolding, else generate a dirty assembly for reference-based scaffolding. Not recommended for highly repetitive genomes. Default False.\n  --minimap2scaffold         Use Minimap2 for aligning long reads. Preset usage dependant on file name convention (case insensitive): ont, nanopore, pb, pacbio, hifi, hi_fi, hi-fi. ie: s324_nanopore.fq.gz. Else it uses LASTal.\n```\n- Reference-based scaffolding options:\n```\n  -r REFERENCE, --reference REFERENCE\n                        reference FastA file\n  --norearrangements    high identity mode (rearrangements not allowed)\n  -p PRESET, --preset PRESET\n                        Preset option for Minimap2-based Reduction and/or Reference-based scaffolding. Possible options: asm5 (5 percent sequence divergence), asm10 (10 percent sequence divergence) and asm20(20 percent sequence divergence). Default [asm5]\n```\n- Gap closing options:\n```\n  --nogapclosing                        \n```\n- Meryl and Merqury options:\n```\n  --runmerqury           Run meryldb and merqury for assembly kmer multiplicity stats. 
[False] by default.\n  -k KMER, --kmer KMER  K-mer size for meryl [21]\n```\n\nRedundans is **extremely flexible**. Every step of the pipeline can be omitted using the `--noreduction`, `--noscaffolding` and/or `--nogapclosing` parameters, while the Meryl and Merqury statistics are only computed when `--runmerqury` is given. \n\n### Test run\nTo run the test example, execute: \n```bash\n./redundans.py -v -i test/*_?.fq.gz -f test/contigs.fa -o test/run1\n\n#Test it using minimap2 for the reduction step, increasing performance for large genomes\n./redundans.py -v -i test/*_?.fq.gz -f test/contigs.fa --minimap2reduce -o test/run2\n\n# if your run failed for any reason, you can try to resume it\nrm test/run1/_sspace.2.1.filled.fa\n./redundans.py -v -i test/*_?.fq.gz -f test/contigs.fa -o test/run1 --resume\n\n# if you have no contigs assembled, just run without `-f`\n./redundans.py -v -i test/*_?.fq.gz -o test/run.denovo\n```\n\nNote, the **order of libraries (`-i/--fastq`) is not important**, as long as `read1` and `read2` from each library are given one after another \ni.e. `-i 600_1.fq.gz 600_2.fq.gz 5000_1.fq.gz 5000_2.fq.gz` would be interpreted the same as `-i 5000_1.fq.gz 5000_2.fq.gz 600_1.fq.gz 600_2.fq.gz`.\n\nYou can play with **any combination of inputs** i.e. 
paired-end, mate pairs, long reads and/or reference-based scaffolding, and choose minimap2 for each step instead of the default LASTal, for example:\n```bash\n# reduction, scaffolding with paired-end, mate pairs and long reads used to generate a miniasm assembly to do reference-based scaffolding, and gap closing with paired-end and mate pairs, using minimap2 as the aligner\n./redundans.py -v -i test/*_?.fq.gz -l test/nanopore.fa.gz -f test/contigs.fa -o test/run_short_long_ref --minimap2scaffold\n\n# reduction, scaffolding with paired-end, mate pairs and long reads, and gap closing with paired-end and mate pairs, using the populateScaffolds method with minimap2 as the aligner\n./redundans.py -v -i test/*_?.fq.gz -l test/pacbio.fq.gz test/nanopore.fa.gz -f test/contigs.fa -o test/run_short_long_populatescaffold --minimap2scaffold --populateScaffolds\n\n# scaffolding and gap closing with paired-end and mate pairs (no reduction)\n./redundans.py -v -i test/*_?.fq.gz -f test/contigs.fa -o test/run_short-scaffolding-closing --noreduction\n\n# reduction, reference-based scaffolding and gap closing with paired-end reads (--noscaffolding disables only short-read scaffolding)\n./redundans.py -v -i test/600_?.fq.gz -r test/ref.fa -f test/contigs.fa -o test/run_ref_pe-closing --noscaffolding\n```\n\nFor more details have a look at the [test directory](/test). \n\n## Support \nIf you have any issues or doubts, check the [documentation](/docs) and the [FAQ (Frequently Asked Questions)](/docs#faq). \nYou may also want to sign up to [our forum](https://groups.google.com/d/forum/redundans).\n\n## Citation\nLeszek P. Pryszcz and Toni Gabaldón (2016) Redundans: an assembly pipeline for highly heterozygous genomes. NAR. 
[doi: 10.1093/nar/gkw294](http://nar.oxfordjournals.org/content/44/12/e113)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgabaldonlab%2Fredundans","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgabaldonlab%2Fredundans","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgabaldonlab%2Fredundans/lists"}