An open API service indexing awesome lists of open source software.

https://github.com/maxibor/mgenottate


https://github.com/maxibor/mgenottate

Last synced: 5 months ago
JSON representation

Awesome Lists containing this project

README

          

# maxibor/mgenottate

**Mgenottate**: (Meta) GENOme ANNOTTATion

>Takes genomes as an input, compute completion/contamination QC metrics with Busco, dereplicates with dREP, and provides a summary table in the end.

```mermaid
graph LR
a[genome fasta]--> b[busco quality assesment]
b --> c[dRep genome ANI dereplication]
c --> d[MMSeqs2 genome taxonomic_annotation]
d --> e[Summary table]
```

## Usage

```bash
nextflow run maxibor/mgenottate -profile {conda,docker,singularity} --input genome_sheet.csv --busco_db path/to/busco/db --mmseqs2_db_path path/to/mmseqs/db
```

## Input/output options

Define where the pipeline should find input data and save output data.

| Parameter | Description | Type | Default | Required | Hidden |
|-----------|-----------|-----------|-----------|-----------|-----------|
| `input` | Path to comma-separated file containing information about the samples and genomes See below for more infos. | `string` | | True | |
| `outdir` | The output directory where the results will be saved. You have to use absolute paths to storage on Cloud

> An example input file can be found in [tests/data/test_samplesheet.csv](tests/data/test_samplesheet.csv)

It contains 2 columns, the first one being the sample name to which a genome belog, and the second one the path to a genome in fasta file (compressed or not).

## Databases
| Parameter | Description | Type | Default | Required | Hidden |
|-----------|-----------|-----------|-----------|-----------|-----------|
| `busco_db` | Path to busco database | `string` | | True | |
| `skip_tax_annotation` | Skip taxonomic annotation | `bool` | False | False | |
| `mmseqs2_db_name` | Name of mmseqs prebuilt database (required if not db path is provided) | `string` | | | |
| `mmseqs2_db_path` | Path to mmseqs database (required if no db name is provided)| `string` | | | |

> See [MMSeqs2 wiki](https://github.com/soedinglab/MMseqs2/wiki#downloading-databases) for valid MMSeqs DB names.

## Tools options

| Parameter | Description | Type | Default | Required | Hidden |
|-----------|-----------|-----------|-----------|-----------|-----------|
| `busco_mode` | Busco mode HelpOne of genome, proteins, or transcriptome|
`string` | genome | | |
| `busco_lineage` | Busco lineage. auto for automatic lineage selection | `string` | auto | | |
| `drep_ani` | drep secondary clustering ANI threshold | `number` | 0.99 | | |
| `mmseqs2_mem` | Amount of memory for MMSeqs2 (in Gb) | `string` | '14G' | | |
| `mmseqs2_search_type` | 2 (translated), 3 (nucleotide) or 4 (translated nucleotide backtrace) | `integer` | null(auto) | | |

## Max job request options

Set the top limit for requested resources for any single job.

| Parameter | Description | Type | Default | Required | Hidden |
|-----------|-----------|-----------|-----------|-----------|-----------|
| `max_cpus` | Maximum number of CPUs that can be requested for any single job. HelpUse to set
an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`
| `integer` | 16
| | True |
| `max_memory` | Maximum amount of memory that can be requested for any single job. HelpUse to
set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory
'8.GB'`
| `string` | 128.GB | | True |
| `max_time` | Maximum amount of time that can be requested for any single job. HelpUse to set
an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time
'2.h'`
| `string` | 240.h | | True |

## Generic options

Less common options for the pipeline, typically set in a config file.

| Parameter | Description | Type | Default | Required | Hidden |
|-----------|-----------|-----------|-----------|-----------|-----------|
| `help` | Display help text. | `boolean` | | | True |
| `version` | Display version and exit. | `boolean` | | | True |
| `publish_dir_mode` | Method used to save pipeline results to output directory. HelpThe
Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the
pipeline what method should be used to move these files. See [Nextflow
docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.
| `string` | copy | | True |
| `monochrome_logs` | Do not use coloured log outputs. | `boolean` | | | True |