{"id":41034941,"url":"https://github.com/maxibor/mgenottate","last_synced_at":"2026-01-22T10:31:47.731Z","repository":{"id":246204078,"uuid":"817811636","full_name":"maxibor/mgenottate","owner":"maxibor","description":null,"archived":false,"fork":false,"pushed_at":"2025-03-20T13:11:14.000Z","size":4200,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-12-17T09:42:45.903Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Nextflow","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maxibor.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATIONS.md","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-20T13:43:03.000Z","updated_at":"2025-06-01T13:51:12.000Z","dependencies_parsed_at":null,"dependency_job_id":"983b6eac-7e14-4bb6-8756-04e8cc0073fd","html_url":"https://github.com/maxibor/mgenottate","commit_stats":null,"previous_names":["maxibor/mgenottate"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/maxibor/mgenottate","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxibor%2Fmgenottate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxibor%2Fmgenottate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxibor%2Fmgenottate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxibor%2Fmgenottate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maxibor","download_url":"https://codeload.github.co
m/maxibor/mgenottate/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxibor%2Fmgenottate/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28661874,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-22T01:17:37.254Z","status":"online","status_checked_at":"2026-01-22T02:00:07.137Z","response_time":144,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-22T10:31:46.683Z","updated_at":"2026-01-22T10:31:47.725Z","avatar_url":"https://github.com/maxibor.png","language":"Nextflow","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\n# maxibor/mgenottate\n\n**Mgenottate**: (Meta) GENOme ANNOTTATion\n\n\u003e Takes genomes as input, computes completeness/contamination QC metrics with BUSCO, dereplicates them with dRep, and provides a summary table at the end.\n\n```mermaid\ngraph LR\n    a[genome fasta]--\u003e b[busco quality assessment]\n    b --\u003e c[dRep genome ANI dereplication]\n    c --\u003e d[MMSeqs2 genome taxonomic_annotation]\n    d --\u003e e[Summary table]\n```\n\n## Usage\n\n```bash\nnextflow run maxibor/mgenottate -profile {conda,docker,singularity} --input genome_sheet.csv --busco_db path/to/busco/db --mmseqs2_db_path path/to/mmseqs/db\n```\n\n## Input/output options\n
\nDefine where the pipeline should find input data and save output data.\n\n| Parameter | Description | Type | Default | Required | Hidden |\n|-----------|-----------|-----------|-----------|-----------|-----------|\n| `input` | Path to a comma-separated file containing information about the samples and genomes. See below for more information. | `string` |  | True |  |\n| `outdir` | The output directory where the results will be saved. You have to use absolute paths to storage on Cloud\n\n\n\u003e An example input file can be found in [tests/data/test_samplesheet.csv](tests/data/test_samplesheet.csv)\n\nIt contains 2 columns: the first is the name of the sample to which a genome belongs, and the second is the path to the genome FASTA file (compressed or not).\n\n## Databases\n\n| Parameter | Description | Type | Default | Required | Hidden |\n|-----------|-----------|-----------|-----------|-----------|-----------|\n| `busco_db` | Path to busco database | `string` |  | True |  |
\n| `skip_tax_annotation` | Skip taxonomic annotation | `bool` | False | False |  |\n| `mmseqs2_db_name` | Name of mmseqs prebuilt database (required if no db path is provided) | `string` |  |  |  |\n| `mmseqs2_db_path` | Path to mmseqs database (required if no db name is provided) | `string` |  |  |  |\n\n\u003e See the [MMseqs2 wiki](https://github.com/soedinglab/MMseqs2/wiki#downloading-databases) for valid MMseqs2 database names.\n\n## Tools options\n\n| Parameter | Description | Type | Default | Required | Hidden |\n|-----------|-----------|-----------|-----------|-----------|-----------|\n| `busco_mode` | Busco mode \u003cdetails\u003e\u003csummary\u003eHelp\u003c/summary\u003e\u003csmall\u003eOne of genome, proteins, or transcriptome\u003c/small\u003e\u003c/details\u003e | `string` | genome |  |  |\n| `busco_lineage` | Busco lineage. Use `auto` for automatic lineage selection | `string` | auto |  |  |\n| `drep_ani` | dRep secondary clustering ANI threshold | `number` | 0.99 |  |  |\n| `mmseqs2_mem` | Amount of memory for MMseqs2 (in GB) | `string` | '14G' |  |  |\n| `mmseqs2_search_type` | MMseqs2 search type: 2 (translated), 3 (nucleotide), or 4 (translated nucleotide backtrace) | `integer` | null (auto) |  |  |\n\n## Max job request options\n\nSet the upper limit for requested resources for any single job.\n\n| Parameter | Description | Type | Default | Required | Hidden |\n|-----------|-----------|-----------|-----------|-----------|-----------|\n| `max_cpus` | Maximum number of CPUs that can be requested for any single job. \u003cdetails\u003e\u003csummary\u003eHelp\u003c/summary\u003e\u003csmall\u003eUse to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`\u003c/small\u003e\u003c/details\u003e | `integer` | 16 |  | True |\n| `max_memory` | Maximum amount of memory that can be requested for any single job. \u003cdetails\u003e\u003csummary\u003eHelp\u003c/summary\u003e\u003csmall\u003eUse to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`\u003c/small\u003e\u003c/details\u003e | `string` | 128.GB |  | True |\n| `max_time` | Maximum amount of time that can be requested for any single job. \u003cdetails\u003e\u003csummary\u003eHelp\u003c/summary\u003e\u003csmall\u003eUse to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`\u003c/small\u003e\u003c/details\u003e | `string` | 240.h |  | True |\n\n## Generic options\n\nLess common options for the pipeline, typically set in a config file.\n\n| Parameter | Description | Type | Default | Required | Hidden |\n|-----------|-----------|-----------|-----------|-----------|-----------|\n| `help` | Display help text. | `boolean` |  |  | True |\n| `version` | Display version and exit. | `boolean` |  |  | True |\n| `publish_dir_mode` | Method used to save pipeline results to output directory. \u003cdetails\u003e\u003csummary\u003eHelp\u003c/summary\u003e\u003csmall\u003eThe Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.\u003c/small\u003e\u003c/details\u003e | `string` | copy |  | True |\n| `monochrome_logs` | Do not use coloured log outputs. | `boolean` |  |  | True |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxibor%2Fmgenottate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaxibor%2Fmgenottate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxibor%2Fmgenottate/lists"}