{"id":15060276,"url":"https://github.com/bcgsc/abyss","last_synced_at":"2025-04-08T01:37:19.489Z","repository":{"id":2770991,"uuid":"3769753","full_name":"bcgsc/abyss","owner":"bcgsc","description":":microscope: Assemble large genomes using short reads","archived":false,"fork":false,"pushed_at":"2025-03-27T16:08:46.000Z","size":63867,"stargazers_count":317,"open_issues_count":6,"forks_count":110,"subscribers_count":23,"default_branch":"master","last_synced_at":"2025-04-01T00:35:23.319Z","etag":null,"topics":["assembler","bioinformatics","bloom-filter","c-plus-plus","genome","mpi","openmp","scaffold","science"],"latest_commit_sha":null,"homepage":"http://www.bcgsc.ca/platform/bioinfo/software/abyss","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bcgsc.png","metadata":{"files":{"readme":"README.md","changelog":"ChangeLog","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.bib","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2012-03-19T23:13:39.000Z","updated_at":"2025-03-17T13:49:44.000Z","dependencies_parsed_at":"2024-03-19T17:52:40.998Z","dependency_job_id":"120ba396-4672-410e-b43e-893308032761","html_url":"https://github.com/bcgsc/abyss","commit_stats":{"total_commits":5507,"total_committers":49,"mean_commits":"112.38775510204081","dds":0.3628109678590884,"last_synced_commit":"5dc06d676b4c2bd51a5f7e38f79eed273bb6b9fa"},"previous_names":[],"tags_count":72,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bcgsc%2Fabyss","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bcgsc%2Fabyss/tags","releases_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/bcgsc%2Fabyss/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bcgsc%2Fabyss/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bcgsc","download_url":"https://codeload.github.com/bcgsc/abyss/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247761050,"owners_count":20991531,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["assembler","bioinformatics","bloom-filter","c-plus-plus","genome","mpi","openmp","scaffold","science"],"created_at":"2024-09-24T22:55:36.756Z","updated_at":"2025-04-08T01:37:19.470Z","avatar_url":"https://github.com/bcgsc.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Release](https://img.shields.io/github/release/bcgsc/abyss.svg)](https://github.com/bcgsc/abyss/releases)\n[![Downloads](https://img.shields.io/github/downloads/bcgsc/abyss/total?logo=github)](https://github.com/bcgsc/abyss/releases/download/2.3.10/abyss-2.3.10.tar.gz)\n[![Conda](https://img.shields.io/conda/dn/bioconda/abyss?label=Conda)](https://anaconda.org/bioconda/abyss)\n[![Issues](https://img.shields.io/github/issues/bcgsc/abyss.svg)](https://github.com/bcgsc/abyss/issues)\n\nABySS\n================================================================================\n\nABySS is a *de novo* sequence assembler intended for short paired-end reads and genomes of all sizes.\n\nPlease [cite our 
papers](#citation).\n\nContents\n================================================================================\n\n* [Installation](#installation)\n\t* [Install ABySS using Conda](#install-abyss-using-conda-recommended)\n\t* [Install ABySS using Homebrew](#install-abyss-using-homebrew)\n\t* [Install ABySS on Windows](#install-abyss-on-windows)\n* [Dependencies](#dependencies)\n\t* [Dependencies for linked reads](#dependencies-for-linked-reads)\n\t* [Optional dependencies](#optional-dependencies)\n* [Compiling ABySS from source](#compiling-abyss-from-source)\n* [Before starting an assembly](#before-starting-an-assembly)\n* [Modes](#modes)\n\t* [Bloom filter mode](#bloom-filter-mode)\n\t* [MPI mode (legacy)](#mpi-mode-legacy)\n* [Examples](#examples)\n\t* [Assemble a small synthetic data set](#assemble-a-small-synthetic-data-set)\n\t* [Assembling a paired-end library](#assembling-a-paired-end-library)\n\t* [Assembling multiple libraries](#assembling-multiple-libraries)\n\t* [Scaffolding](#scaffolding)\n\t* [Scaffolding with linked reads](#scaffolding-with-linked-reads)\n\t* [Rescaffolding with long sequences](#rescaffolding-with-long-sequences)\n\t* [Assembling using a paired de Bruijn graph](#assembling-using-a-paired-de-bruijn-graph)\n\t* [Assembling a strand-specific RNA-Seq library](#assembling-a-strand-specific-rna-seq-library)\n* [Optimizing the parameters k and kc](#optimizing-the-parameters-k-and-kc)\n* [Running ABySS on a cluster](#running-abyss-on-a-cluster)\n* [Using the DIDA alignment framework](#using-the-dida-alignment-framework)\n* [Assembly Parameters](#assembly-parameters)\n* [ABySS programs](#abyss-programs)\n* [Export to SQLite Database](#export-to-sqlite-database)\n\t* [Database parameters](#database-parameters)\n\t* [Helper programs](#helper-programs)\n* [Citation](#citation)\n* [Related Publications](#related-publications)\n* [Support](#support)\n* 
[Authors](#authors)\n\nInstallation\n================================================================================\n\n## Install ABySS using Conda (recommended)\n\nIf you have the [Conda](https://docs.conda.io/en/latest/) package manager (Linux, MacOS) installed, run:\n\n\tconda install -c bioconda -c conda-forge abyss\n\nOr you can install ABySS in a dedicated environment:\n\n    conda create -n abyss-env\n    conda activate abyss-env\n    conda install -c bioconda -c conda-forge abyss\n\n## Install ABySS using Homebrew\n\nIf you have the [Homebrew](https://brew.sh) package manager (Linux, MacOS) installed, run:\n\n\tbrew install abyss\n\n## Install ABySS on Windows\n\nInstall [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/) from which you can run Conda or Homebrew installation.\n\nDependencies\n============\n\n## Dependencies for linked reads\n\n- [ARCS](https://github.com/bcgsc/arcs) for scaffolding.\n- [Tigmint](https://github.com/bcgsc/tigmint) for correcting assembly errors.\n\nThese can be installed through Conda:\n\n\tconda install -c bioconda arcs tigmint\n\nOr Homebrew:\n\n\tbrew install brewsci/bio/arcs brewsci/bio/links-scaffolder\n\n## Optional dependencies\n\n- [pigz](https://zlib.net/pigz/) for parallel gzip.\n- [samtools](https://samtools.github.io) for reading BAM files.\n- [zsh](https://sourceforge.net/projects/zsh/) for reporting time and memory usage.\n\nConda:\n\n\tconda install -c bioconda samtools\n\tconda install -c conda-forge pigz zsh\n\nHomebrew:\n\n\tbrew install pigz samtools zsh\n\nCompiling ABySS from source\n================================================================================\n\nWhen compiling ABySS from source the following tools are\nrequired:\n\n* [Autoconf](http://www.gnu.org/software/autoconf)\n* [Automake](http://www.gnu.org/software/automake)\n\nABySS requires a C++ compiler that supports\n[OpenMP](http://www.openmp.org) such as [GCC](http://gcc.gnu.org).\n\nThe following libraries 
are required:\n\n* [Boost](http://www.boost.org/)\n* [Open MPI](http://www.open-mpi.org)\n* [sparsehash](https://code.google.com/p/sparsehash/)\n* [btllib](https://github.com/bcgsc/btllib)\n\nConda:\n\n\tconda install -c conda-forge boost openmpi\n\tconda install -c bioconda google-sparsehash btllib\n\nIt is also helpful to install the compilers Conda package that automatically passes the correct compiler flags to use the available Conda packages:\n\n\tconda install -c conda-forge compilers\n\nHomebrew:\n\n\tbrew install boost open-mpi google-sparsehash\n\nABySS fails to compile with Boost 1.51.0 or 1.52.0\nbecause those versions contain a bug. Later versions of Boost compile without error.\n\nTo compile, run the following:\n\n\t./autogen.sh\n\tmkdir build\n\tcd build\n\t../configure --prefix=/path/to/abyss\n\tmake\n\tmake install\n\nYou may also pass the following flags to the `configure` script:\n\n\t--with-boost=PATH\n\t--with-mpi=PATH\n\t--with-sqlite=PATH\n\t--with-sparsehash=PATH\n\t--with-btllib=PATH\n\nWhere PATH is the path to the directory containing the corresponding dependencies. This should only be necessary if `configure` doesn't find the dependencies by default. If you are using Conda, PATH would be the path to the Conda installation. SQLite and MPI are optional dependencies.\n\nThe above steps install ABySS at the provided path, in this case `/path/to/abyss`.\nNot specifying `--prefix` would install in `/usr/local`, which requires\nsudo privileges when running `make install`.\n\nABySS requires a modern compiler such as GCC 6 or greater. If you have multiple\nversions of GCC installed, you can specify a different compiler:\n\n\t../configure CC=gcc-10 CXX=g++-10\n\nWhile OpenMPI is assumed by default, you can switch to LAM/MPI or MPICH\nusing:\n\n        ../configure --enable-lammpi\n        ../configure --enable-mpich\n\nThe default maximum k-mer size is 192 and may be decreased to reduce\nmemory usage or increased at compile time. 
This value must be a\nmultiple of 32 (i.e. 32, 64, 96, 128, etc):\n\n\t../configure --enable-maxk=160\n\nIf you encounter compiler warnings that are not critical, you can allow the compilation to continue:\n\n\t../configure --disable-werror\n\nTo run ABySS, its executables should be found in your `PATH` environment variable. If you\ninstalled ABySS in `/opt/abyss`, add `/opt/abyss/bin` to your `PATH`:\n\n\tPATH=/opt/abyss/bin:$PATH\n\nBefore starting an assembly\n================================================================================\n\nABySS stores temporary files in `TMPDIR`, which is `/tmp` by default on most systems. If your default temporary disk volume is too small, set `TMPDIR` to a larger volume, such as `/var/tmp` or your home directory.\n\n\texport TMPDIR=/var/tmp\n\nModes\n================================================================================\n\n## Bloom filter mode\n\nThe recommended mode of running ABySS is the Bloom filter mode. Specifying\nthe Bloom filter memory budget with the `B` parameter enables this mode, which can\nreduce memory consumption by ten-fold compared to the MPI mode. `B` may be specified\nwith unit suffixes 'k' (kilobytes), 'M' (megabytes), 'G' (gigabytes). If no units\nare specified bytes are assumed. Internally, the Bloom filter assembler allocates\nthe entire memory budget (`B * 8/9`) to a Counting Bloom filter, and an additional\n(`B/9`) memory to another Bloom filter that is used to track k-mers that have previously\nbeen included in contigs.\n\nA good value for `B` depends on a number of factors, but primarily on the\ngenome being assembled. A general guideline is:\n\nP. glauca (~20Gbp): `B=500G`\nH. sapiens (~3.1Gbp): `B=50G`\nC. elegans (~101Mbp): `B=2G`\n\nFor other genome sizes, the value for `B` can be interpolated. Note that\nthere is no downside to using larger than necessary `B` value, except for\nthe memory required. 
To make sure you have selected a correct `B` value,\ninspect the standard error log of the assembly process and ensure that the\nreported FPR value under `Counting Bloom filter stats` is 5% or less. This\nrequires using verbosity level 1 with `v=-v` option.\n\n## MPI mode (legacy)\n\nThis mode is legacy and we do not recommend running ABySS with it.\nTo run ABySS in the MPI mode, you need to specify the `np` parameter,\nwhich specifies the number of processes to use for the parallel MPI job.\nWithout any MPI configuration, this will allow you to use multiple cores\non a single machine. To use multiple machines for assembly, you must create\na `hostfile` for `mpirun`, which is described in the `mpirun` man page.\n\n*Do not* run `mpirun -np 8 abyss-pe`. To run ABySS with 8 threads, use\n`abyss-pe np=8`. The `abyss-pe` driver script will start the MPI\nprocess, like so: `mpirun -np 8 ABYSS-P`.\n\nThe paired-end assembly stage is multithreaded, but must run on a\nsingle machine. The number of threads to use may be specified with the\nparameter `j`. The default value for `j` is the value of `np`.\n\nExamples\n================================================================================\n\n## Assemble a small synthetic data set\n\n\twget http://www.bcgsc.ca/platform/bioinfo/software/abyss/releases/1.3.4/test-data.tar.gz\n\ttar xzvf test-data.tar.gz\n\tabyss-pe k=25 name=test B=1G \\\n\t\tin='test-data/reads1.fastq test-data/reads2.fastq'\n\nCalculate assembly contiguity statistics:\n\n\tabyss-fac test-unitigs.fa test-contigs.fa test-scaffolds.fa\n\n## Assembling a paired-end library\n\nTo assemble paired reads in two files named `reads1.fa` and\n`reads2.fa` into contigs in a file named `ecoli-contigs.fa`, run the\ncommand:\n\n\tabyss-pe name=ecoli k=96 B=2G in='reads1.fa reads2.fa'\n\nThe parameter `in` specifies the input files to read, which may be in\nFASTA, FASTQ, qseq, export, SRA, SAM or BAM format and compressed with\ngz, bz2 or xz and may be tarred. 
The assembled contigs will be stored\nin `${name}-contigs.fa` and the scaffolds will be stored in `${name}-scaffolds.fa`.\n\nA pair of reads must be named with the suffixes `/1` and `/2` to\nidentify the first and second read, or the reads may be named\nidentically. The paired reads may be in separate files or interleaved\nin a single file.\n\nReads without mates should be placed in a file specified by the\nparameter `se` (single-end). Reads without mates in the paired-end\nfiles will slow down the paired-end assembler considerably during the\n`abyss-fixmate` stage.\n\n## Assembling multiple libraries\n\nThe distribution of fragment sizes of each library is calculated\nempirically by aligning paired reads to the contigs produced by the\nsingle-end assembler, and the distribution is stored in a file with\nthe extension `.hist`, such as `ecoli-3.hist`. The N50 of the\nsingle-end assembly must be well over the fragment-size to obtain an\naccurate empirical distribution.\n\nHere's an example scenario of assembling a data set with two different\nfragment libraries and single-end reads. Note that the names of the libraries\n(`pea` and `peb`) are arbitrary.\n\n * Library `pea` has reads in two files,\n   `pea_1.fa` and `pea_2.fa`.\n * Library `peb` has reads in two files,\n   `peb_1.fa` and `peb_2.fa`.\n * Single-end reads are stored in two files, `se1.fa` and `se2.fa`.\n\nThe command line to assemble this example data set is:\n\n\tabyss-pe k=96 B=2G name=ecoli lib='pea peb' \\\n\t\tpea='pea_1.fa pea_2.fa' peb='peb_1.fa peb_2.fa' \\\n\t\tse='se1.fa se2.fa'\n\nThe empirical distribution of fragment sizes will be stored in two\nfiles named `pea-3.hist` and `peb-3.hist`. These files may be\nplotted to check that the empirical distribution agrees with the\nexpected distribution. 
The assembled contigs will be stored in\n`${name}-contigs.fa` and the scaffolds will be stored in `${name}-scaffolds.fa`.\n\n## Scaffolding\n\nLong-distance mate-pair libraries may be used to scaffold an assembly.\nSpecify the names of the mate-pair libraries using the parameter `mp`.\nThe scaffolds will be stored in the file `${name}-scaffolds.fa`.\nHere's an example of assembling a data set with two paired-end\nlibraries and two mate-pair libraries. Note that the names of the libraries\n(`pea`, `peb`, `mpc`, `mpd`) are arbitrary.\n\n\tabyss-pe k=96 B=2G name=ecoli lib='pea peb' mp='mpc mpd' \\\n\t\tpea='pea_1.fa pea_2.fa' peb='peb_1.fa peb_2.fa' \\\n\t\tmpc='mpc_1.fa mpc_2.fa' mpd='mpd_1.fa mpd_2.fa'\n\nThe mate-pair libraries are used only for scaffolding and do not\ncontribute towards the consensus sequence.\n\n## Scaffolding with linked reads\n\nABySS can scaffold using linked reads from 10x Genomics Chromium. The barcodes must first be extracted from the read sequences and added to the `BX:Z` tag of the FASTQ header, typically using the `longranger basic` command of [Long Ranger](https://support.10xgenomics.com/genome-exome/software/overview/welcome) or [EMA preproc](https://github.com/arshajii/ema#readme). The linked reads are used to correct assembly errors, which requires [Tigmint](https://github.com/bcgsc/tigmint). The linked reads are also used for scaffolding, which requires [ARCS](https://github.com/bcgsc/arcs). See [Dependencies](#dependencies) for installation instructions.\n\nABySS can combine paired-end, mate-pair, and linked-read libraries. The `pe` and `lr` libraries will be used to build the de Bruijn graph. The `mp` libraries will be used for paired-end/mate-pair scaffolding. 
The `lr` libraries will be used for misassembly correction using Tigmint and scaffolding using ARCS.\n\n\tabyss-pe k=96 B=2G name=hsapiens \\\n\t\tpe='pea' pea='lra.fastq.gz' \\\n\t\tmp='mpa' mpa='lra.fastq.gz' \\\n\t\tlr='lra' lra='lra.fastq.gz'\n\nABySS performs better with a mixture of paired-end, mate-pair, and linked reads, but it is possible to assemble only linked reads using ABySS, though this mode of operation is experimental.\n\n\tabyss-pe k=96 name=hsapiens lr='lra' lra='lra.fastq.gz'\n\n## Rescaffolding with long sequences\n\nLong sequences such as RNA-Seq contigs can be used to rescaffold an\nassembly. Sequences are aligned using BWA-MEM to the assembled\nscaffolds. Additional scaffolds are then formed between scaffolds that\ncan be linked unambiguously when considering all BWA-MEM alignments.\n\nSimilar to scaffolding, the names of the datasets can be specified with\nthe `long` parameter. These scaffolds will be stored in the file\n`${name}-long-scaffs.fa`. The following is an example of an assembly with PET, MPET and an RNA-Seq assembly. Note that the names of the libraries are arbitrary.\n\n\tabyss-pe k=96 B=2G name=ecoli lib='pe1 pe2' mp='mp1 mp2' long='longa' \\\n\t\tpe1='pe1_1.fa pe1_2.fa' pe2='pe2_1.fa pe2_2.fa' \\\n\t\tmp1='mp1_1.fa mp1_2.fa' mp2='mp2_1.fa mp2_2.fa' \\\n\t\tlonga='longa.fa'\n\n## Assembling using a paired de Bruijn graph\n\nAssemblies may be performed using a _paired de Bruijn graph_ instead\nof a standard de Bruijn graph. In paired de Bruijn graph mode, ABySS\nuses _k-mer pairs_ in place of k-mers, where each k-mer pair consists of\ntwo equal-size k-mers separated by a fixed distance. A k-mer pair\nis functionally similar to a large k-mer spanning the breadth of the k-mer\npair, but uses less memory because the sequence in the gap is not stored.\nTo assemble using paired de Bruijn graph mode, specify both individual\nk-mer size (`K`) and k-mer pair span (`k`). 
For example, to assemble E.\ncoli with an individual k-mer size of 16 and a k-mer pair span of 96:\n\n\tabyss-pe name=ecoli K=16 k=96 in='reads1.fa reads2.fa'\n\nIn this example, the size of the gap between the two k-mers of a pair is\n64 bp (96 - 2\\*16). Note that the `k` parameter takes on a new meaning\nin paired de Bruijn graph mode. `k` indicates k-mer pair span in\npaired de Bruijn graph mode (when `K` is set), whereas `k` indicates\nk-mer size in standard de Bruijn graph mode (when `K` is not set).\n\n## Assembling a strand-specific RNA-Seq library\n\nStrand-specific RNA-Seq libraries can be assembled such that the\nresulting unitigs, contigs and scaffolds are oriented correctly with\nrespect to the original transcripts that were sequenced. In order to\nrun ABySS in strand-specific mode, the `SS` parameter must be used as\nin the following example:\n\n\tabyss-pe name=SS-RNA B=2G k=96 in='reads1.fa reads2.fa' SS=--SS\n\nThe expected orientation for the read sequences with respect to the\noriginal RNA is RF, i.e. the first read in a read pair is always in\nreverse orientation.\n\nOptimizing the parameters k and kc\n================================================================================\n\nIt is standard practice when running ABySS to run multiple assemblies\nto find the optimal values for the `k` and `kc` parameters. `k` determines\nthe k-mer size in the de Bruijn Graph, and `kc` is the k-mer minimum coverage\nmultiplicity cutoff, which filters out erroneous k-mers. The range in which `k`\nshould be tested depends on the read size and read coverage.\n\nAs a rough guide, for 2x150bp reads and 40x coverage, the right `k` value is often around 70 to 90. For 2x250bp reads and 40x coverage, the right value might be around 110 to 140.\n\nFor `kc`, 2 is most often a good value, but can go as high as 4.\n\nThe following shell snippet will assemble for `kc` values 2 and 3, and every eighth value of `k` from 50 to 90. 
In the end, we calculate the contiguity statistics, as a proxy for identifying the optimal assembly. Other metrics can be used, as needed.\n\n\tfor kc in 2 3; do\n\t\tfor k in `seq 50 8 90`; do\n\t\t\tmkdir k${k}-kc${kc}\n\t\t\tabyss-pe -C k${k}-kc${kc} name=ecoli B=2G k=$k kc=$kc in=../reads.fa\n\t\tdone\n\tdone\n\tabyss-fac k*/ecoli-scaffolds.fa\n\nThe default maximum value for `k` is 192. This limit may be changed at\ncompile time using the `--enable-maxk` option of configure. It may be\ndecreased to 32 to decrease memory usage or increased to larger values.\n\nRunning ABySS on a cluster\n================================================================================\n\nABySS integrates well with cluster job schedulers, such as:\n\n * SGE (Sun Grid Engine)\n * Portable Batch System (PBS)\n * Load Sharing Facility (LSF)\n * IBM LoadLeveler\n\nFor example, to submit an array of jobs to assemble every eighth value of\n`k` between 50 and 90 using 64 processes for each job:\n\n\tqsub -N ecoli -pe openmpi 64 -t 50-90:8 \\\n\t\t\u003c\u003c\u003c'mkdir k$SGE_TASK_ID \u0026\u0026 abyss-pe -C k$SGE_TASK_ID in=/data/reads.fa'\n\nUsing the DIDA alignment framework\n================================================================================\n\nABySS supports the use of DIDA (Distributed Indexing Dispatched Alignment),\nan MPI-based framework for computing sequence alignments in parallel across\nmultiple machines. The DIDA software must be separately downloaded and\ninstalled from http://www.bcgsc.ca/platform/bioinfo/software/dida. In\ncomparison to the standard ABySS alignment stages which are constrained\nto a single machine, DIDA offers improved performance and the ability to\nscale to larger targets. 
Please see the DIDA section of the abyss-pe man\npage (in the `doc` subdirectory) for details on usage.\n\nAssembly Parameters\n================================================================================\n\nParameters of the driver script, `abyss-pe`\n\n * `a`: maximum number of branches of a bubble [`2`]\n * `b`: maximum length of a bubble (bp) [`\"\"`]\n * `B`: Bloom filter size (e.g. \"100M\")\n * `c`: minimum mean k-mer coverage of a unitig [`sqrt(median)`]\n * `d`: allowable error of a distance estimate (bp) [`6`]\n * `e`: minimum erosion k-mer coverage [`round(sqrt(median))`]\n * `E`: minimum erosion k-mer coverage per strand [1 if `sqrt(median) \u003e 2` else 0]\n * `G`: genome size, used to calculate NG50\n * `H`: number of Bloom filter hash functions [`4`]\n * `j`: number of threads [`2`]\n * `k`: size of k-mer (when `K` is not set) or the span of a k-mer pair (when `K` is set)\n * `kc`: minimum k-mer count threshold for Bloom filter assembly [`2`]\n * `K`: the length of a single k-mer in a k-mer pair (bp)\n * `l`: minimum alignment length of a read (bp) [`40`]\n * `m`: minimum overlap of two unitigs (bp) [`0` (interpreted as `k - 1`) if `mp` is provided or if `k\u003c=50`, otherwise `50`]\n * `n`: minimum number of pairs required for building contigs [`10`]\n * `N`: minimum number of pairs required for building scaffolds [`15-20`]\n * `np`: number of MPI processes [`1`]\n * `p`: minimum sequence identity of a bubble [`0.9`]\n * `q`: minimum base quality [`3`]\n * `s`: minimum unitig size required for building contigs (bp) [`1000`]\n * `S`: minimum contig size required for building scaffolds (bp) [`100-5000`]\n * `t`: maximum length of blunt contigs to trim [`k`]\n * `v`: use `v=-v` for verbose logging, `v=-vv` for extra verbose\n * `x`: spaced seed (Bloom filter assembly only)\n * `lr_s`: minimum contig size required for building scaffolds with linked reads (bp) [`S`]\n * `lr_n`: minimum number of barcodes required for building scaffolds with linked 
reads [`10`]\n\nEnvironment variables\n================================================================================\n\n`abyss-pe` configuration variables may be set on the command line or from the environment, for example with `export k=96`. It can happen that `abyss-pe` picks up such variables from your environment that you had not intended, and that can cause trouble. To troubleshoot that situation, use the `abyss-pe env` command to print the values of all the `abyss-pe` configuration variables:\n\n\tabyss-pe env [options]\n\nABySS programs\n================================================================================\n\n`abyss-pe` is a driver script implemented as a Makefile. Any option of\n`make` may be used with `abyss-pe`. Particularly useful options are:\n\n * `-C dir`, `--directory=dir`\n   Change to the directory `dir` and store the results there.\n * `-n`, `--dry-run`\n   Print the commands that would be executed, but do not execute\n   them.\n\n`abyss-pe` uses the following programs, which must be found in your\n`PATH`:\n\n * `ABYSS`: de Bruijn graph assembler\n * `ABYSS-P`: parallel (MPI) de Bruijn graph assembler\n * `AdjList`: find overlapping sequences\n * `DistanceEst`: estimate the distance between sequences\n * `MergeContigs`: merge sequences\n * `MergePaths`: merge overlapping paths\n * `Overlap`: find overlapping sequences using paired-end reads\n * `PathConsensus`: find a consensus sequence of ambiguous paths\n * `PathOverlap`: find overlapping paths\n * `PopBubbles`: remove bubbles from the sequence overlap graph\n * `SimpleGraph`: find paths through the overlap graph\n * `abyss-fac`: calculate assembly contiguity statistics\n * `abyss-filtergraph`: remove shim contigs from the overlap graph\n * `abyss-fixmate`: fill the paired-end fields of SAM alignments\n * `abyss-map`: map reads to a reference sequence\n * `abyss-scaffold`: scaffold contigs using distance estimates\n * `abyss-todot`: convert graph formats and merge graphs\n * 
`abyss-rresolver`: resolve repeats using short reads\n\nThis [flowchart](https://github.com/bcgsc/abyss/blob/master/doc/flowchart.pdf) shows the ABySS assembly pipeline and its intermediate files.\n\nExport to SQLite Database\n================================================================================\n\nABySS has built-in support for exporting log values to a SQLite database file and/or `.csv` files at runtime.\n\n## Database parameters\nOf `abyss-pe`:\n * `db`: path to SQLite repository file [`$(name).sqlite`]\n * `species`: name of species to archive [ ]\n * `strain`: name of strain to archive [ ]\n * `library`: name of library to archive [ ]\n\nFor example, to export data of species 'Ecoli', strain 'O121' and library 'pea' into your SQLite database repository named '/abyss/test.sqlite':\n\n\tabyss-pe db=/abyss/test.sqlite species=Ecoli strain=O121 library=pea [other options]\n\n## Helper programs\n\nFound in your `PATH`:\n\n * `abyss-db-txt`: create a flat file showing the entire repository at a glance\n * `abyss-db-csv`: create `.csv` table(s) from the repository\n\nUsage:\n\n    abyss-db-txt /your/repository\n    abyss-db-csv /your/repository program(s)\n\nFor example,\n\n\tabyss-db-txt repo.sqlite\n\tabyss-db-csv repo.sqlite DistanceEst\n\tabyss-db-csv repo.sqlite DistanceEst abyss-scaffold\n\tabyss-db-csv repo.sqlite --all\n\nCitation\n================================================================================\n\n## [ABySS 2.0](http://doi.org/10.1101/gr.214346.116)\n\nShaun D Jackman, Benjamin P Vandervalk, Hamid Mohamadi, Justin Chu, Sarah Yeo, S Austin Hammond, Golnaz Jahesh, Hamza Khan, Lauren Coombe, René L Warren, and Inanc Birol (2017).\n**ABySS 2.0: Resource-efficient assembly of large genomes using a Bloom filter**.\n*Genome research*, 27(5), 768-777.\n[doi:10.1101/gr.214346.116](http://doi.org/10.1101/gr.214346.116)\n\n## [ABySS](http://genome.cshlp.org/content/19/6/1117)\n\nSimpson, Jared T., Kim Wong, Shaun D. 
Jackman, Jacqueline E. Schein,\nSteven JM Jones, and Inanc Birol (2009).\n**ABySS: a parallel assembler for short read sequence data**.\n*Genome research*, 19(6), 1117-1123.\n[doi:10.1101/gr.089532.108](http://dx.doi.org/10.1101/gr.089532.108)\n\nRelated Publications\n================================================================================\n\n## [RResolver](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04790-z)\n\nVladimir Nikolić, Amirhossein Afshinfard, Justin Chu, Johnathan Wong, Lauren Coombe, Ka Ming Nip, René L. Warren \u0026 Inanç Birol (2022).\n**RResolver: efficient short-read repeat resolution within ABySS**.\n*BMC Bioinformatics* 23, Article number: 246 (2022).\n[doi:10.1186/s12859-022-04790-z](https://doi.org/10.1186/s12859-022-04790-z)\n\n## [Trans-ABySS](http://www.nature.com/nmeth/journal/v7/n11/abs/nmeth.1517.html)\n\nRobertson, Gordon, Jacqueline Schein, Readman Chiu, Richard Corbett,\nMatthew Field, Shaun D. Jackman, Karen Mungall, et al (2010).\n**De novo assembly and analysis of RNA-seq data**.\n*Nature methods*, 7(11), 909-912.\n[doi:10.1038/nmeth.1517](http://dx.doi.org/10.1038/nmeth.1517)\n\n## [ABySS-Explorer](http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5290690)\n\nNielsen, Cydney B., Shaun D. 
Jackman, Inanc Birol, and Steven JM Jones (2009).\n**ABySS-Explorer: visualizing genome sequence assemblies**.\n*IEEE Transactions on Visualization and Computer Graphics*, 15(6), 881-888.\n[doi:10.1109/TVCG.2009.116](http://dx.doi.org/10.1109/TVCG.2009.116)\n\nSupport\n================================================================================\n\n[Create a new issue on GitHub.](https://github.com/bcgsc/abyss/issues)\n\n[Ask a question on Biostars.](https://www.biostars.org/tag/abyss/)\n\nSubscribe to the [ABySS mailing list](http://groups.google.com/group/abyss-users), \u003cabyss-users@googlegroups.com\u003e.\n\nFor questions related to transcriptome assembly, contact the [Trans-ABySS mailing list](http://groups.google.com/group/trans-abyss), \u003ctrans-abyss@googlegroups.com\u003e.\n\nAuthors\n================================================================================\n\n+ **[Shaun Jackman](http://sjackman.ca)** - [GitHub/sjackman](https://github.com/sjackman) - [@sjackman](https://twitter.com/sjackman)\n+ **Tony Raymond** - [GitHub/traymond](https://github.com/traymond)\n+ **Ben Vandervalk** - [GitHub/benvvalk ](https://github.com/benvvalk)\n+ **Jared Simpson** - [GitHub/jts](https://github.com/jts)\n+ **Johnathan Wong** - [GitHub/jowong4](https://github.com/jowong4)\n+ **Vladimir Nikolić** - [GitHub/vlad0x00](https://github.com/vlad0x00)\n\nSupervised by [**Dr. Inanc Birol**](http://www.bcgsc.ca/faculty/inanc-birol).\n\nCopyright 2016-present Canada's Michael Smith Genome Sciences Centre\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbcgsc%2Fabyss","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbcgsc%2Fabyss","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbcgsc%2Fabyss/lists"}