{"id":15489788,"url":"https://github.com/oschwengers/asap","last_synced_at":"2025-04-16T05:57:38.865Z","repository":{"id":33246552,"uuid":"99518358","full_name":"oschwengers/asap","owner":"oschwengers","description":"A scalable bacterial genome assembly, annotation and analysis pipeline","archived":false,"fork":false,"pushed_at":"2023-12-05T22:16:53.000Z","size":14252,"stargazers_count":73,"open_issues_count":9,"forks_count":19,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-04-13T17:22:39.590Z","etag":null,"topics":["amr","annotation","assembly","bacteria","bioinformatics","ngs"],"latest_commit_sha":null,"homepage":"https://doi.org/10.1371/journal.pcbi.1007134","language":"Groovy","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oschwengers.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-08-06T23:09:49.000Z","updated_at":"2025-03-05T11:50:03.000Z","dependencies_parsed_at":"2022-07-07T22:28:49.106Z","dependency_job_id":"08d0957b-4bd0-4083-9117-17d45be6cfdd","html_url":"https://github.com/oschwengers/asap","commit_stats":{"total_commits":274,"total_committers":7,"mean_commits":"39.142857142857146","dds":0.08029197080291972,"last_synced_commit":"12616ce6f8400def1d60f02ada3fb9b6d512774e"},"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oschwengers%2Fasap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oschwengers%2Fasap/tags","re
leases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oschwengers%2Fasap/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oschwengers%2Fasap/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oschwengers","download_url":"https://codeload.github.com/oschwengers/asap/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249205788,"owners_count":21229992,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["amr","annotation","assembly","bacteria","bioinformatics","ngs"],"created_at":"2024-10-02T07:07:58.539Z","updated_at":"2025-04-16T05:57:38.842Z","avatar_url":"https://github.com/oschwengers.png","language":"Groovy","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![DOI:10.1371/journal.pcbi.1007134](https://zenodo.org/badge/DOI/10.1371/journal.pcbi.1007134.svg)](https://doi.org/10.1371/journal.pcbi.1007134)\n[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-brightgreen.svg)](https://github.com/oschwengers/asap/blob/master/LICENSE)\n![Don't judge me](https://img.shields.io/badge/Language-Groovy-blue.svg)\n![GitHub release](https://img.shields.io/github/release/oschwengers/asap.svg)\n[![Docker Pulls](https://img.shields.io/docker/pulls/oschwengers/asap.svg)](https://hub.docker.com/r/oschwengers/asap)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3606299.svg)](https://doi.org/10.5281/zenodo.3606299)\n\n# ASA³P - Automatic Bacterial Isolate Assembly, Annotation and Analyses Pipeline\n\n![ASA³P 
Overview](asap.png)\n\n## Contents\n- [Description](#description)\n- [Features](#features)\n  - [Per Isolate](#per-isolate)\n  - [Comparative](#comparative)\n- [Availability](#availability)\n  - [Docker](#docker)\n  - [Cloud](#cloud-openstack)\n- [Input/Output](#inputoutput)\n- [Citation](#citation)\n- [License](#license)\n- [FAQ](#faq)\n\n## Description\nASA³P is an automatic and highly scalable assembly, annotation and higher-level\nanalyses pipeline for closely related bacterial isolates.\n\nASA³P is a fully automatic, locally executable and scalable assembly, annotation\nand higher-level analysis pipeline creating results in standard bioinformatics\nfile formats as well as sophisticated HTML5 documents. Its main purpose is the\nautomatic processing of NGS WGS data of multiple closely related isolates, thus\ntransforming raw reads into assembled and annotated genomes and finally gathering\nas much information on every single bacterial genome as possible.\nPer-isolate analyses are complemented by comparative insights. Therefore, the\npipeline incorporates many best-in-class open source bioinformatics tools and\nthus minimizes the burden of ever-repeating tasks. Envisaged as a\npreprocessing tool it provides comprehensive insights as well as a general overview\nand comparison of analysed genomes along with all necessary result files for subsequent\ndeeper analyses. 
All results are presented via modern HTML5 documents comprising\ninteractive visualizations.\n\n## Features\n\n### Per isolate\n- quality/adapter clipping\n- assembly (**Illumina**, **PacBio** \u0026 **ONT**)\n- scaffolding\n- annotation\n- taxonomic classification (**Kmer/ANI**, **16S** and **ANI**)\n- multi locus sequence typing (**MLST**)\n- antibiotic resistance detection\n- virulence factor detection\n- reference mapping\n- SNP detection\n\n### Comparative\n- calculation of core/pan genome and singleton genes\n- phylogenetic tree creation\n\n## Availability\nAll necessary files are hosted at Zenodo: [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3606299.svg)](https://doi.org/10.5281/zenodo.3606299)\n\nTargeting different project sizes, i.e. the number of genomes which should be\nanalysed as a single project, we distribute ASA³P in two versions:\n- **Docker**: Linux container image for small to medium projects\n- **OpenStack**: highly scalable cloud version for (very) large projects\n\nFor both, the following files are necessary:\n- ASA³P tarball containing binaries, 3rd party executables and databases: asap.tar.gz\n- configuration template: config.xls\n\nNote:\nAs the ASA³P tarball contains all necessary databases and 3rd party executables,\nit is rather huge (23 Gb zipped, 29 Gb unzipped) and thus download times may be quite long.\nTo unzip the tarball, a decompression tool supporting multithreading might be beneficial,\ne.g. pigz on Linux (`sudo apt install pigz` for Ubuntu).\n\nAdditional files:\n- comprehensive manual: manual.pdf\n- configuration example: config-example.xls\n\nAdditional example and benchmark projects are hosted in a distinct repository at Zenodo: [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3606760.svg)](https://doi.org/10.5281/zenodo.3606760)\n   - 4 public *L. monocytogenes* genomes: example-lmonocytogenes-4.tar.gz\n   - 32 public *L. monocytogenes* genomes: example-lmonocytogenes-32.tar.gz\n   - 8 *E. 
coli* project merely showing support of different input types: example-ecoli-input.tar.gz\n\n### Docker\nFor small to medium projects (up to ~200 isolates) but also for the sake of\nsimplicity, reproducibility and easy distribution, we offer ASA³P as a\n**Docker** image hosted at:\n**Docker Hub** (https://hub.docker.com/r/oschwengers/asap/). Please, follow the\nofficial instructions (https://docs.docker.com/install) to install Docker.\n\nSetup:\n```bash\n$ sudo docker pull oschwengers/asap\n$ wget https://zenodo.org/record/3780003/files/asap.tar.gz\n$ tar -xzf asap.tar.gz\n$ rm asap.tar.gz\n```\n\nRunning an ASA³P Container using the `asap-docker.sh` shell wrapper script:\n```bash\n$ #\u003cASAP_DIR\u003e/asap-docker.sh -p \u003cPROJECT_DIR\u003e [-s \u003cSCRATCH_DIR\u003e] [-a ASAP_DIR] [-z] [-c] [-d]\n$ asap/asap-docker.sh -p example-lmonocytogenes -s /tmp\n```\n\nParameters \u0026 Options:\n* `-p \u003cPROJECT_DIR\u003e`: mandatory: path to the actual project directory (containing `config.xls` and `data` directory)\n* `-a \u003cASAP_DIR\u003e`: optional: path to the ASA³P dir in case the script was moved/copied somewhere else\n* `-s \u003cSCRATCH_DIR\u003e`: optional: path to a distinct scratch/tmp dir\n* `-z`: optional: skip characterization steps\n* `-c`: optional: skip comparative analysis steps\n* `-d`: optional: enable verbose logs for debugging purposes\n\n**Note**\n1. This shell wrapper script should remain within the ASA³P directory in order to\ncorrectly extract related paths. In case the script was moved/copied somewhere else,\nyou have to provide the path via `-a \u003cASAP_DIR\u003e`.\n2. The script gathers user:group ids and passes these to the Docker container thus,\nfiles created by ASA³P automatically have the correct user ownerships instead of sudo ones.\n3. The script will ask for the sudo password as Docker containers can currently only\nbe executed as sudo. 
This is a purely technical necessity unrelated to ASA³P itself.\n\n**Complete example**:\n```bash\n$ sudo docker pull oschwengers/asap\n$ wget https://zenodo.org/record/3780003/files/asap.tar.gz\n$ tar -xzf asap.tar.gz\n$ rm asap.tar.gz\n$ # -O keeps the '?download=1' query string out of the saved file name\n$ wget -O example-lmonocytogenes-4.tar.gz 'https://zenodo.org/record/3606761/files/example-lmonocytogenes-4.tar.gz?download=1'\n$ tar -xzf example-lmonocytogenes-4.tar.gz\n$ rm example-lmonocytogenes-4.tar.gz\n$ asap/asap-docker.sh -p example-lmonocytogenes-4/\n```\n\nFor further information have a look at the Docker readme (docker/DOCKER.md).\n\n### Cloud OpenStack\nASA³P's **OpenStack** based cloud version targets the analysis of hundreds to\neven thousands of bacterial isolates. Therefore, it features automatic creation,\nsetup and orchestration of an **SGE** based compute cluster and its entire\nunderlying infrastructure. To this end, the **OpenStack** cloud version internally\ntakes advantage of the BiBiGrid (https://github.com/BiBiServ/bibigrid) framework.\nHence, analysis of thousands of genomes can be achieved in a highly parallel\nmanner and within an adequate amount of time. ASA³P takes care of all setup and orchestration\naspects and thus hides away as much technical complexity as possible. 
For further\ninformation please have a look at our user manual.\n\nIn order to trigger an **OpenStack** based cloud project, you need the following\nadditional cloud-related file:\n- ASA³P cloud tarball (containing binaries, property files and a customized BiBiGrid version):\n[asap-cloud.tar.gz](https://zenodo.org/record/3606300/files/asap-cloud.tar.gz?download=1) (md5sum: c584dedcaf17963a240dbabfad95f608)\n\nOnce ASA³P is properly set up, you can start it by executing a single shell script:\n```bash\n$ ~/asap-cloud/asap-cloud.sh -i \u003cINSTANCE_ID\u003e -o \u003cOPEN_STACK_RC_FILE\u003e -p \u003cPROJECT_DIR\u003e\n```\n\nParameters:\n* `\u003cINSTANCE_ID\u003e`: VM id of the gateway instance (VM you start ASA³P from)\n* `\u003cOPEN_STACK_RC_FILE\u003e`: OpenStack RC file providing cloud and project information\n* `\u003cPROJECT_DIR\u003e`: path to the actual project directory (containing `config.xls` and `data` directory)\n\nFor a comprehensive and detailed description of how to set up an OpenStack project\nand ASA³P therein, please have a look at our manual stored here:\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3606299.svg)](https://doi.org/10.5281/zenodo.3606299)\n\n## Input/Output\n\n### Input\nASA³P is able to process raw sequencing reads from Illumina (SE/PE), PacBio (bax.h5/ubam) and ONT\n(basecalled as fastq) as well as assembled contigs (fasta) and annotated genomes\n(GenBank/EMBL/GFF).\n\nASA³P expects all input files and information regarding a single execution\n(i.e. a \"project\") within a dedicated directory. 
All necessary information\n(meta information, reference genomes, isolate/sample names and files) is\nprovided via an Excel config file named *config.xls*.\nA corresponding template can be downloaded [here](https://zenodo.org/record/3606300/files/config.xls?download=1).\nFor further details on how to fill out a proper configuration file, please have\na look at the [manual](https://zenodo.org/record/3606300/files/manual.pdf?download=1)\nand the exemplary projects listed above. All input files referenced in a configuration\nspreadsheet need to be placed in a subdirectory called *data*.\n\n**Example**:\n```\nproject-dir\n├── config.xls\n├── data\n│   ├── reference-genome-1.gbk\n│   ├── reference-genome-2.fasta\n│   ├── isolate-1-1.fastq.gz\n│   ├── isolate-1-2.fastq.gz\n│   ├── isolate-2-1.fastq.gz\n│   ├── isolate-2-2.fastq.gz\n│   ├── isolate-3.1.bax.h5\n│   ├── isolate-3.2.bax.h5\n│   ├── isolate-3.3.bax.h5\n│   ├── ...\n```\n\n### Output\n**tl;dr**\nJust open your browser and open the **index.html** file located at:\n```\nproject-dir\n├── reports   (HTML5 reports)\n│   ├── index.html\n```\n\nIn order to provide a first glimpse into the results of the pipeline, we configured\na public login to a static web server for demonstration purposes only at:\nhttps://www.computational.bio.uni-giessen.de/asap/\n```\n$ login: asap-test\n$ password: asap-test\n```\n\nASA³P stores all output files within the specified project directory,\nleaving input files untouched:\n- empty status file indicating ASA³P's current status, one of:\n   - *status.running*\n   - *status.finished*\n   - *status.failed*\n- log file (*asap.log*)\n- internal configuration file (*config.json*)\n- report directory containing **HTML5** report pages (*reports*)\n\nFurthermore, for each analysis ASA³P creates a corresponding subdirectory\ncontaining result files such as:\n- empty status file indicating an analysis' status, one of:\n   - *status.running*\n   - *status.finished*\n   - *status.failed*\n- 
**JSON** file (*info.json*) comprising collected and aggregated information\n- binary result files in standard file formats (**.fasta**, **.gbk**, **.gff**, **.bam**, **.vcf.gz**, etc.)\n\nWhere necessary, ASA³P creates subdirectories for each isolate within an\nanalysis directory.\n\n**Example**:\n```\nproject-dir\n├── [state.running | state.finished | state.failed]\n├── asap.log   (global logging file)\n├── config.xls   (config spreadsheet)\n├── config.json   (internal config)\n├── reports   (HTML5 reports)\n│   ├── index.html\n│   ├── ...\n├── reads_qc   (quality clipped read files)\n│   ├── \u003csample-name\u003e\n│   ├── ├── [state.finished | state.failed]\n│   ├── ├── isolate-1-1.fastq.gz\n│   ├── ├── isolate-1-2.fastq.gz\n│   ├── ├── info.json\n│   ├── ...\n├── assembly   (assemblies)\n│   ├── \u003csample-name\u003e\n│   ├── ├── [state.finished | state.failed]\n│   ├── ├── \u003csample-name\u003e.fasta\n│   ├── ├── \u003csample-name\u003e-discarded.fasta\n│   ├── ├── info.json\n│   ├── ...\n├── scaffolds   (scaffolded contigs)\n│   ├── \u003csample-name\u003e\n│   ├── ├── [state.finished | state.failed]\n│   ├── ├── \u003csample-name\u003e.fasta   (scaffolds)\n│   ├── ├── \u003csample-name\u003e-pseudo.fasta   (pseudo genome)\n│   ├── ├── info.json\n│   ├── ...\n├── annotations\n│   ├── \u003csample-name\u003e\n│   ├── ├── [state.finished | state.failed]\n│   ├── ├── \u003csample-name\u003e.gbk   (Genbank)\n│   ├── ├── \u003csample-name\u003e.gff   (GFF3)\n│   ├── ├── \u003csample-name\u003e.ffn   (gene sequences)\n│   ├── ├── \u003csample-name\u003e.faa   (protein sequences)\n│   ├── ├── info.json\n│   ├── ...\n├── taxonomy   (taxonomic classification results)\n│   ├── [\u003csample-name\u003e.finished | \u003csample-name\u003e.failed]\n│   ├── \u003csample-name\u003e.json\n│   ├── ...\n├── mlst   (multi-locus sequence typing results)\n│   ├── [\u003csample-name\u003e.finished | \u003csample-name\u003e.failed]\n│   ├── \u003csample-name\u003e.json\n│   
├── ...\n├── abr   (antibiotic resistance genes detection)\n│   ├── [\u003csample-name\u003e.finished | \u003csample-name\u003e.failed]\n│   ├── \u003csample-name\u003e.json\n│   ├── ...\n├── vf   (virulence factor detection results)\n│   ├── [\u003csample-name\u003e.finished | \u003csample-name\u003e.failed]\n│   ├── \u003csample-name\u003e.json\n│   ├── ...\n├── mappings   (reference mappings)\n│   ├── [\u003csample-name\u003e.finished | \u003csample-name\u003e.failed]\n│   ├── \u003csample-name\u003e.json\n│   ├── \u003csample-name\u003e.bam\n│   ├── \u003csample-name\u003e.bam.bai\n│   ├── ...\n├── snps   (called single nucleotide polymorphisms)\n│   ├── [\u003csample-name\u003e.finished | \u003csample-name\u003e.failed]\n│   ├── \u003csample-name\u003e.json\n│   ├── \u003csample-name\u003e.consensus.fasta   (mpileup consensus file)\n│   ├── \u003csample-name\u003e.vcf.gz   (SNPs in variant calling format file)\n│   ├── \u003csample-name\u003e.vcf.gz.tbi\n│   ├── \u003csample-name\u003e.chk   (bcftools stats)\n│   ├── \u003csample-name\u003e.csv   (SNPeff per gene statistics)\n│   ├── ...\n├── corepan\n│   ├── [state.finished | state.failed]\n│   ├── info.json\n│   ├── core.fasta   (core genome sequences)\n│   ├── pan.fasta   (pan genome sequences)\n│   ├── pan-matrix.tsv   (pan genome matrix)\n│   ├── \u003csample-name\u003e.json\n│   ├── ...\n├── phylogeny\n│   ├── [state.finished | state.failed]\n│   ├── info.json\n│   ├── tree.nwk   (phylogenetic tree in newick file)\n│   ├── consensus.fasta   (global consensus file)\n├── data\n```\n\n## Citation\n\n\u003e Schwengers et al. (2020). ASA³P: An automatic and scalable pipeline for the assembly, annotation and higher level analysis of closely related bacterial isolates. PLOS Computational Biology 16(3): e1007134. https://doi.org/10.1371/journal.pcbi.1007134\n\n## License\nASA³P itself is published and distributed under the GPL3 license. 
In contrast,\nsome of its dependencies bundled within the ASA³P tarball (asap.tar.gz file) are\npublished under different licenses, e.g. GPL2, BSD, MIT and LGPL.\nA file (README.md) within the ASA³P directory contains a list of all\ndependencies and related licenses.\n\n**NOTE**\nPlease note that some bundled dependencies are published under a\n**free-for-academic** or **free-for-non-commercial** usage license model.\nTo the best of our knowledge, this is true for at least the following databases:\n- CARD: free for academic usage\n- PubMLST: proprietary but free to use\n\n## FAQ\n* __Is there a public example project?__\nJust download and use one of these exemplary projects containing a tiny set of 4 public\n*Listeria monocytogenes* genomes from **SRA**, a larger set of 32 *L. monocytogenes* genomes as well as\na comprehensive *E. coli* set covering all potential input data types:\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3606760.svg)](https://doi.org/10.5281/zenodo.3606760)\n\n* __Why do I have to pre-basecall ONT reads?__\nUnfortunately, there are too many combinations of flow cells, sequencing kits, etc.\nWe would have to ask users to put all this information in the config sheets, which would blow them up.\nTherefore, we decided to outsource this very specific pre-processing step.\n\n* __Can I install ASA³P by myself?__\nYes, you can! Nevertheless, we highly encourage you to use either the **Docker**\ncontainer or the **OpenStack** images. As there are too many possible combinations of\nLinux distributions and software/database versions, we cannot give any support for this.\n\n* __I'm facing an error/bug. 
What shall I do?__\nIf you run into any issues with ASA³P, we'd be happy to hear about it!\nPlease, start the pipeline with `-d` (verbose debugging logs) and do not hesitate\nto file an issue including as much of the following as possible:\n- a detailed description of the issue\n- the `asap.log` file within your project/data directory\n- in case you can already pinpoint the error: the log file of the failed subanalysis\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foschwengers%2Fasap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foschwengers%2Fasap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foschwengers%2Fasap/lists"}